Summary
We propose a simple and general resampling strategy to estimate variances for parameter estimators derived from nonsmooth estimating functions. This approach applies to a wide variety of semiparametric and nonparametric problems in biostatistics. It does not require solving estimating equations and is thus much faster than the existing resampling procedures. Its usefulness is illustrated with heteroscedastic quantile regression and censored data rank regression. Numerical results based on simulated and real data are provided.
Keywords: Bootstrap, Censoring, Quantile regression, Rank regression, Robustness, Variance estimation
1. Introduction
The parameters of interest in biostatistics are typically estimated by minimizing a loss function or more generally by solving an estimating equation. In many nonparametric and semiparametric situations, such as Huber’s (1964) robust estimation of location (with nonsmooth loss functions), quantile regression, and rank regression, the estimating functions are not differentiable. Then, the asymptotic variances of the parameter estimators generally involve unknown density functions and are thus difficult to estimate directly.
In such situations, it is natural to appeal to resampling techniques. The familiar bootstrap (Efron and Tibshirani, 1993) estimates variances by resampling from the empirical distribution function. This approach needs to be justified on a case-by-case basis and may not be appropriate in complex situations. Parzen and others (1994) proposed a resampling technique by equating the observed data estimating function to a random vector which generates the asymptotic distribution of the estimating function. This technique has been applied to numerous biostatistical problems (e.g. Yao and others, 1998; Chen and Jewell, 2001; Cai and others, 2006). Hu and Kalbfleisch (2000) provided a similar procedure for linear estimating functions with independent terms by bootstrapping the individual terms. For estimators that can be written as minimizers of certain U-statistics, Jin and others (2001) developed a resampling approach by incorporating suitable random variables into the minimand. Their approach was adapted by Jin and others (2003, 2006) to the rank and least squares regression with censored data.
All the aforementioned resampling procedures require solving the perturbed estimating equations or minimizing the perturbed loss functions a large number of times. This is computationally very demanding, especially for complex nonlinear functions. In addition, the perturbed estimating equations or loss functions tend to be associated with extreme solutions and are thus unstable. As a result, nonsmooth estimating functions are rarely used in practice.
In the present paper, we propose a new resampling strategy to estimate asymptotic variances of parameter estimators obtained from general nonsmooth estimating functions. Our approach only requires generation of random numbers and evaluation of estimating functions. It does not involve solving any perturbed estimating equations or minimizing any perturbed objective functions; therefore, it is far more efficient and more stable than the existing resampling methods. With our approach, variance estimation for complex nonsmooth estimating functions can be accomplished in a matter of seconds or minutes rather than hours or days. We describe the proposed approach in Section 2. We present simulation results and medical examples in Sections 3 and 4, respectively. We provide some concluding remarks in Section 5.
2. Methods
Let θ0 denote a d-vector of parameters. We estimate θ0 by solving the estimating equation Un(θ) = 0, where Un is a function based on n independent observations such that n−1Un(θ0) → p 0. Suppose that the solution θ̂ exists and is consistent. Suppose also that, uniformly in a neighborhood of θ0,
(2.1) n−1/2Un(θ) = n−1/2∑i=1n Si + n1/2A(θ − θ0) + op(1 + n1/2‖θ − θ0‖),
where Si (i = 1, …, n) are independent zero-mean random vectors, and A is a nonsingular matrix, which is the asymptotic slope of n−1Un(θ) at θ0. This asymptotic expansion holds for a wide variety of estimating functions and can typically be verified through empirical process arguments (van der Vaart and Wellner, 1996, Section 3.3). The Si are the influence functions for Un(θ0). The dependence of Si and A on θ0 is suppressed. Since Un(θ̂) = 0 and θ̂ is consistent, (2.1) implies that θ̂ is n1/2-consistent and that n1/2(θ̂ − θ0) is asymptotically zero-mean normal with covariance matrix A−1V (A−1)T, where V = limn→∞ n−1∑i=1n E(SiSiT). For parametric likelihood, V = −A, where Si is the score for the ith observation and A is the negative information matrix.
We give 2 examples.
Example 1 (Heteroscedastic quantile regression)
For i = 1, …, n, let Yi and Xi denote the response variable and a set of covariates for the ith subject. Assume that the 100τth percentile of Yi is XiTθ0. We may estimate θ0 by solving the equation

Un(θ) ≡ ∑i=1n Xi{τ − I(Yi − XiTθ ≤ 0)} = 0,

where I(·) is the indicator function. The solution θ̂ can be obtained by minimizing the loss function

Ln(θ) = ∑i=1n ρτ(Yi − XiTθ),

where ρτ(υ) is τυ if υ > 0 and (τ − 1)υ if υ ≤ 0. This minimization can be performed by linear programming (Koenker and D’Orey, 1987). Under the assumption that (Yi − XiTθ0) has a unique 100τth percentile at 0 and has a continuous density function fi such that fi(0) is strictly positive, the estimator θ̂ is consistent and the asymptotic expansion (2.1) holds with Si = Xi{τ − I(Yi − XiTθ0 ≤ 0)} (Jin and others, 2001). The slope matrix A involves the density functions fi. Buchinsky (1995) compared various bootstrap procedures for estimating the asymptotic covariance matrix of θ̂.
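As a concrete sketch (our own illustration, not part of the original development; the function names are hypothetical), the quantile-regression estimating function and the plug-in influence-function terms can be evaluated in a few lines of numpy:

```python
import numpy as np

def quantile_U(theta, X, y, tau=0.5):
    """U_n(theta) = sum_i X_i {tau - I(Y_i - X_i'theta <= 0)}."""
    ind = (y - X @ theta <= 0).astype(float)   # I(Y_i - X_i'theta <= 0)
    return X.T @ (tau - ind)

def quantile_V_hat(theta_hat, X, y, tau=0.5):
    """Plug-in estimate V_hat = n^{-1} sum_i S_i S_i', with
    S_i = X_i {tau - I(Y_i - X_i'theta_hat <= 0)} evaluated at theta_hat."""
    S = X * (tau - (y - X @ theta_hat <= 0))[:, None]   # n x d influence terms
    return S.T @ S / len(y)
```

Note that no density estimation appears here: `quantile_V_hat` gives V̂ directly, and it is only the slope matrix A that requires the resampling procedures of this section.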
Example 2 (Rank regression with censored data)
Assume that
(2.2) Yi = β0TXi + εi,
where εi (i = 1, …, n) are independent and identically distributed random variables that are independent of Xi (i = 1, …, n). Suppose that Yi is subject to censoring by Ci. In survival analysis, Yi and Ci are usually expressed on the log scale, and (2.2) is referred to as the accelerated life or accelerated failure time model (Cox and Oakes, 1984, pp. 64–65; Kalbfleisch and Prentice, 2002, pp. 218–219). The data consist of (Ỹi, Δi, Xi) (i = 1, …, n), where Ỹi = min(Yi, Ci) and Δi = I(Yi ≤ Ci). It is assumed that Ci is independent of Yi conditional on Xi. One may estimate β0 by the log-rank estimating equation
(2.3) Un(β) ≡ ∑i=1n Δi{Xi − ∑j=1n XjI(ej(β) ≥ ei(β)) / ∑j=1n I(ej(β) ≥ ei(β))} = 0,

where ei(β) = Ỹi − βTXi.
It is not a trivial matter to solve this discrete equation, especially when d is large. One may use bisection search or optimization algorithms, such as simulated annealing (Lin and Geyer, 1992). Recently, Jin and others (2003) showed that linear programming can be used to obtain an approximation to the log-rank estimate. Under mild conditions (Tsiatis, 1990; Ying, 1993), expansion (2.1) holds with

Si = ∫ {Xi − Γ1(t)/Γ0(t)} dMi(t),

where Mi(t) = ΔiI(ei ≤ t) − ∫−∞t I(ei ≥ s) dΛ0(s) with ei = Ỹi − β0TXi, Γ0(t) = limn→∞ n−1∑j=1n E{I(ej ≥ t)}, Γ1(t) = limn→∞ n−1∑j=1n E{XjI(ej ≥ t)}, and Λ0 is the cumulative hazard function of εi. In this case, direct estimation of A would require estimation of the hazard function or density function of εi.
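To make (2.3) concrete, here is a direct loop-based numpy sketch of the log-rank estimating function (our own hypothetical helper; it is O(n2) and a production version would sort the residuals, but it shows that evaluating Un is cheap relative to solving Un = 0):

```python
import numpy as np

def logrank_U(beta, X, y_tilde, delta):
    """Log-rank estimating function (2.3): for each uncensored residual e_i,
    compare X_i with the average covariate among subjects still at risk."""
    e = y_tilde - X @ beta                 # residuals e_i(beta)
    U = np.zeros(X.shape[1])
    for i in np.flatnonzero(delta):        # sum over uncensored observations
        at_risk = e >= e[i]                # risk set on the residual time scale
        U += X[i] - X[at_risk].mean(axis=0)
    return U
```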
It is natural to estimate V directly by V̂ = n−1∑i=1n ŜiŜiT, where Ŝi is obtained from Si by replacing the unknown quantities by their sample estimators. In Example 1, only θ0 is unknown; in Example 2, the unknown quantities include β0, Γ0(·), Γ1(·), and Λ0(·). The consistency of V̂ can typically be established by empirical process arguments.
When the Ŝi have complicated expressions, it is more convenient and perhaps more accurate to bootstrap from the data. Let Un*(θ) denote the estimating function based on the bootstrap sample. It follows from (2.1) that

n−1/2Un*(θ̂) = n−1/2∑i=1n MiSi + n1/2A(θ̂ − θ0) + op(1),
where Mi is the number of times the ith observation appears in the bootstrap sample. Since Un(θ̂) = 0 by definition, we obtain

n−1/2Un*(θ̂) = n−1/2∑i=1n (Mi − 1)Si + op(1).
By Lemma 3.6.15 of van der Vaart and Wellner (1996), the conditional distribution of n−1/2Un*(θ̂) given the data is asymptotically zero-mean normal with covariance matrix V, provided that the remainder term in the above display is op(1) uniformly in the bootstrap samples. It is straightforward to verify the required condition for Examples 1 and 2. The bootstrap estimator of V is also denoted by V̂.
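The bootstrap estimator of V just described can be sketched as follows (a self-contained illustration under our own naming; `U_fn` is any user-supplied estimating function acting on a data matrix). Note that the estimating function is only evaluated at θ̂, never solved:

```python
import numpy as np

def bootstrap_V_hat(U_fn, theta_hat, data, B=500, seed=0):
    """Sample covariance of n^{-1/2} U_n*(theta_hat) over B bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = len(data)
    draws = np.empty((B, len(theta_hat)))
    for b in range(B):
        idx = rng.integers(0, n, size=n)          # resample rows with replacement
        draws[b] = U_fn(theta_hat, data[idx]) / np.sqrt(n)
    return np.atleast_2d(np.cov(draws, rowvar=False))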
To avoid nonparametric density estimation, we propose efficient resampling procedures to estimate A and consequently the asymptotic covariance matrix of n1/2(θ̂ − θ0). Let θ̃ = θ̂ + n−1/2Z, where Z is a zero-mean random vector independent of the data. It follows from (2.1) that

n−1/2Un(θ̃) = n−1/2∑i=1n Si + n1/2A(θ̃ − θ0) + op(1).
Since Un(θ̂) = 0 and θ̃ − θ̂ = n−1/2Z, we have

(2.4) n−1/2Un(θ̃) = AZ + op(1).
Thus, we propose the following resampling procedure based on least squares.
LS method
Step 1
Generate B realizations of Z, denoted by Z1, …, ZB.
Step 2
Calculate n−1/2Un(θ̂ + n−1/2Zb) (b = 1, …, B).
Step 3
For j = 1, …, d, calculate the least squares estimate from the regression of n−1/2Ujn(θ̂ + n−1/2Zb) (b = 1, …, B) on Zb (b = 1, …, B), where Ujn denotes the jth component of Un. Let Â be the matrix whose jth row is the jth least squares slope estimate.
Step 4
Estimate the covariance matrix of n1/2(θ̂ − θ0) by Â−1V̂ (Â−1)T.
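The 4 steps above can be sketched in a short function (our own illustration; `U_fn` and `ls_covariance` are hypothetical names, and the intercept column is our choice, included to absorb any finite-sample nonzero value of Un(θ̂)):

```python
import numpy as np

def ls_covariance(U_fn, theta_hat, n, V_hat, B=2000, seed=0):
    """LS method: estimate A by regressing n^{-1/2} U_n(theta_hat + n^{-1/2} Z_b)
    on Z_b, then return A_hat^{-1} V_hat (A_hat^{-1})'."""
    rng = np.random.default_rng(seed)
    d = len(theta_hat)
    Z = rng.standard_normal((B, d))                              # Step 1
    Ub = np.array([U_fn(theta_hat + z / np.sqrt(n)) for z in Z]) / np.sqrt(n)  # Step 2
    design = np.column_stack([np.ones(B), Z])                    # intercept + Z_b
    coef, *_ = np.linalg.lstsq(design, Ub, rcond=None)           # Step 3
    A_hat = coef[1:].T                                           # jth row: slopes of U_jn
    A_inv = np.linalg.inv(A_hat)
    return A_inv @ V_hat @ A_inv.T                               # Step 4
```

For the one-dimensional median of standard normal observations, for instance, V = 1/4 and the output should be near the true asymptotic variance 1/{4f(0)2} = π/2 ≈ 1.57, even though the estimating function is a step function with no usable pointwise derivative.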
In many situations, A is symmetric, in which case a simpler resampling procedure can be obtained. If the covariance matrix of Z is V −1, then (2.4) implies that Cov (n−1/2 Un(θ̃)|data) = AV−1 AT + op(1). The inverse of this covariance matrix is equal to A−1V (A−1)T when A is symmetric. Thus, we propose the following resampling procedure based on the sample variance of n−1/2Un(θ̃).
SV method
Step 1
Generate θ̃b ≡ θ̂ + n−1/2Zb (b = 1, …, B), where Zb is a zero-mean random vector with covariance matrix V̂−1.
Step 2
Calculate the sample covariance matrix of n−1/2Un(θ̃b) (b = 1, …, B) and denote it by Σ̂.
Step 3
Estimate the covariance matrix of n1/2(θ̂ − θ0) by Σ̂−1.
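The SV method is even shorter (again a sketch with hypothetical names; it presumes A is symmetric, as in both examples of Section 2):

```python
import numpy as np

def sv_covariance(U_fn, theta_hat, n, V_hat, B=2000, seed=0):
    """SV method: draw Z_b with covariance V_hat^{-1}, take the sample
    covariance of n^{-1/2} U_n(theta_hat + n^{-1/2} Z_b), and invert it."""
    rng = np.random.default_rng(seed)
    d = len(theta_hat)
    L = np.linalg.cholesky(np.linalg.inv(V_hat))     # Cov(Z* L') = V_hat^{-1}
    Z = rng.standard_normal((B, d)) @ L.T
    Ub = np.array([U_fn(theta_hat + z / np.sqrt(n)) for z in Z]) / np.sqrt(n)
    Sigma = np.atleast_2d(np.cov(Ub, rowvar=False))  # ~ A V_hat^{-1} A'
    return np.linalg.inv(Sigma)                      # = A^{-1} V_hat (A^{-1})' if A = A'
```

By (2.4), Σ̂ estimates AV̂−1AT, so its inverse estimates A−1V̂(A−1)T exactly when A is symmetric; no least squares fit is needed.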
Unlike the existing resampling methods, the least squares (LS) and sample variance (SV) methods do not require solving estimating equations. This is an important advantage since it is computationally intensive to solve complex nonsmooth estimating equations. Although we have suggested the possible use of bootstrap to estimate V, that procedure is different from bootstrap estimation of the variance of θ̂ and does not involve solving equations.
3. Simulation studies
We conducted extensive simulation studies to assess the performance of the proposed resampling methods. For both the LS and the SV methods, we estimated V either by direct evaluation or by bootstrap. We set Z to V̂−1/2Z*, where Z* is either a d-variate standard normal random vector or a d-vector of independent centered Bernoulli random variables taking the values −1 and 1 with equal probability. Thus, 8 different variants of the methods were considered.
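The 2 samplers for Z used here can be sketched as follows (our own hypothetical helper; the symmetric inverse square root V̂−1/2 is formed from an eigendecomposition):

```python
import numpy as np

def draw_Z(V_hat, B, kind="normal", seed=0):
    """Draw B copies of Z = V_hat^{-1/2} Z*, with Z* either standard normal
    or with independent centered Bernoulli (+/-1) components."""
    rng = np.random.default_rng(seed)
    d = V_hat.shape[0]
    if kind == "normal":
        Z_star = rng.standard_normal((B, d))
    else:                                        # centered Bernoulli components
        Z_star = rng.choice([-1.0, 1.0], size=(B, d))
    w, Q = np.linalg.eigh(V_hat)                 # V_hat = Q diag(w) Q'
    V_inv_sqrt = Q @ np.diag(w ** -0.5) @ Q.T    # symmetric inverse square root
    return Z_star @ V_inv_sqrt
```

Either way, Cov(Z) = V̂−1/2Cov(Z*)V̂−1/2 = V̂−1, as required by the SV method.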
The first set of studies mimics the simulation studies on median regression reported in Section 3 of Parzen and others (1994). We generated data from the model Yi = X1i + X2i + εi, where X1i and X2i are independent standard normal and Bernoulli with 0.5 success probability, respectively, and εi is normal with mean 0 and variance |X1i|. We obtained the parameter estimates through linear programing.
The second set of studies is similar to those of Jin and others (2003). We generated survival times from model (2.2) in which X1 and X2 are independent Uniform(0, 1) and Bernoulli (0.5 success probability) variables, β0 = (1, −1)T, and the error distribution is either extreme-value or zero-mean normal with standard deviation 0.5. We generated censoring times from a uniform distribution to yield a censoring rate of 25%. We obtained the log-rank estimates through bisection search.
The results from the above 2 sets of studies are summarized in Tables 1 and 2. The results of Table 1 pertain to the continuous covariate. Each entry in the tables is based on 10 000 simulated data sets and B = 10 000. Clearly, all 8 variants of the resampling methods work well in that the variance estimators accurately reflect the true variations and the associated confidence intervals have proper coverage probabilities. There are virtually no differences between the LS and SV methods or between the direct and bootstrap estimation of V. For the rank regression under the normal error distribution, the Bernoulli sampling appears to be slightly better than the normal sampling. For median regression, the new resampling method is approximately 100 times faster than bootstrap (with 10 000 resamples); for rank regression, it is approximately 1000 times faster.
Table 1.
| n | Bias | SE |  | LS, Normal Z SEE | CP | LS, Bernoulli Z SEE | CP | SV, Normal Z SEE | CP | SV, Bernoulli Z SEE | CP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | −0.000 | 0.209 | V̂1 | 0.228 | 0.957 | 0.225 | 0.948 | 0.215 | 0.948 | 0.216 | 0.943 |
|  |  |  | V̂2 | 0.228 | 0.957 | 0.224 | 0.947 | 0.215 | 0.948 | 0.216 | 0.942 |
| 100 | 0.001 | 0.147 | V̂1 | 0.152 | 0.946 | 0.151 | 0.937 | 0.147 | 0.937 | 0.147 | 0.932 |
|  |  |  | V̂2 | 0.152 | 0.945 | 0.151 | 0.937 | 0.146 | 0.937 | 0.147 | 0.932 |
| 200 | −0.000 | 0.102 | V̂1 | 0.105 | 0.947 | 0.102 | 0.943 | 0.102 | 0.941 | 0.102 | 0.939 |
|  |  |  | V̂2 | 0.105 | 0.947 | 0.104 | 0.942 | 0.102 | 0.941 | 0.102 | 0.939 |
Note: Bias and SE are the bias and standard error of the parameter estimator, respectively; SEE and CP denote the mean of the standard error estimator and the coverage probability of the 95% confidence interval, respectively; V̂1 and V̂2 denote the direct estimation of V and the bootstrap estimation of V, respectively.
Table 2.
| n | Parameter | Bias | SE |  | LS, Normal Z SEE | CP | LS, Bernoulli Z SEE | CP | SV, Normal Z SEE | CP | SV, Bernoulli Z SEE | CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Extreme-value error |  |  |  |  |  |  |  |  |  |  |  |  |
| 100 | β1 | 0.010 | 0.428 | V̂1 | 0.428 | 0.949 | 0.419 | 0.941 | 0.426 | 0.948 | 0.419 | 0.941 |
|  |  |  |  | V̂2 | 0.423 | 0.947 | 0.415 | 0.938 | 0.421 | 0.945 | 0.415 | 0.938 |
|  | β2 | −0.003 | 0.248 | V̂1 | 0.250 | 0.948 | 0.245 | 0.941 | 0.248 | 0.947 | 0.245 | 0.941 |
|  |  |  |  | V̂2 | 0.247 | 0.945 | 0.243 | 0.939 | 0.246 | 0.943 | 0.242 | 0.939 |
| 200 | β1 | 0.000 | 0.295 | V̂1 | 0.295 | 0.948 | 0.292 | 0.944 | 0.295 | 0.947 | 0.295 | 0.944 |
|  |  |  |  | V̂2 | 0.293 | 0.946 | 0.290 | 0.943 | 0.293 | 0.946 | 0.290 | 0.943 |
|  | β2 | −0.002 | 0.170 | V̂1 | 0.173 | 0.953 | 0.171 | 0.950 | 0.170 | 0.952 | 0.171 | 0.950 |
|  |  |  |  | V̂2 | 0.172 | 0.951 | 0.170 | 0.949 | 0.171 | 0.951 | 0.170 | 0.949 |
| Normal error |  |  |  |  |  |  |  |  |  |  |  |  |
| 100 | β1 | 0.005 | 0.217 | V̂1 | 0.237 | 0.963 | 0.225 | 0.951 | 0.235 | 0.962 | 0.225 | 0.951 |
|  |  |  |  | V̂2 | 0.235 | 0.962 | 0.222 | 0.949 | 0.233 | 0.961 | 0.222 | 0.949 |
|  | β2 | −0.001 | 0.126 | V̂1 | 0.138 | 0.966 | 0.130 | 0.956 | 0.137 | 0.965 | 0.130 | 0.956 |
|  |  |  |  | V̂2 | 0.136 | 0.964 | 0.129 | 0.954 | 0.135 | 0.964 | 0.129 | 0.953 |
| 200 | β1 | 0.004 | 0.153 | V̂1 | 0.160 | 0.956 | 0.155 | 0.950 | 0.159 | 0.956 | 0.155 | 0.949 |
|  |  |  |  | V̂2 | 0.158 | 0.955 | 0.154 | 0.948 | 0.158 | 0.955 | 0.154 | 0.948 |
|  | β2 | −0.001 | 0.086 | V̂1 | 0.090 | 0.961 | 0.090 | 0.957 | 0.092 | 0.960 | 0.090 | 0.957 |
|  |  |  |  | V̂2 | 0.092 | 0.960 | 0.089 | 0.956 | 0.091 | 0.960 | 0.089 | 0.956 |
Note: see the note to Table 1.
4. Applications
4.1 Multiple myeloma study
We applied the proposed resampling methods to a multiple myeloma study (Krall and others, 1975). Out of the 65 patients who were treated with alkylating agents, 48 died during the study. Following Jin and others (2003), we fitted model (2.2) with hemoglobin and the logarithm of blood urea nitrogen as the covariates by using both the log-rank and the Gehan estimators. The Gehan estimator is obtained by incorporating the weight function n−1∑j=1n I(ej(β) ≥ ei(β)) into (2.3). We considered the 8 variants of the resampling methods evaluated in the simulation studies. The differences are negligible between the LS and the SV methods and between the direct and the bootstrap methods of estimating V.
The results based on the SV method and direct evaluation of V are shown in Table 3. These results are comparable to those of Jin and others (2003) but were obtained in far less time.
Table 3.
| Covariate | Estimate | Normal Z standard error | Normal Z 95% interval | Bernoulli Z standard error | Bernoulli Z 95% interval |
|---|---|---|---|---|---|
| Hemoglobin |  |  |  |  |  |
| Log-rank | 0.268 | 0.164 | (−0.055, 0.587) | 0.158 | (−0.044, 0.576) |
| Gehan | 0.292 | 0.183 | (−0.067, 0.651) | 0.176 | (−0.054, 0.638) |
| Blood urea nitrogen |  |  |  |  |  |
| Log-rank | −0.505 | 0.162 | (−0.827, −0.191) | 0.161 | (−0.825, −0.193) |
| Gehan | −0.532 | 0.154 | (−0.834, −0.230) | 0.149 | (−0.823, −0.241) |
4.2 Atherosclerosis Risk in Communities Study
We also applied our methods to the Atherosclerosis Risk in Communities Study (The ARIC Investigators, 1989), which is an epidemiologic cohort study of 15 792 subjects aged 45–64 years to investigate the etiology of atherosclerosis and other diseases. We considered all incident coronary heart disease (CHD) cases occurring between 1987 and 2001. We focused on the Caucasian sample, which consists of 11 526 subjects with 774 cases. We used model (2.2) to study the effects of 5 covariates, including smoking status (ever smoke = 1, never smoke = 0), 2 dummy variables contrasting Minnesota and Washington states to North Carolina, gender (male = 1, female = 0), and standardized age at the baseline, on the time to the occurrence of CHD. For large data sets such as this one, the methods of Jin and others (2003, 2006) are not computationally feasible. We used the Nelder–Mead algorithm as implemented in MATLAB to calculate the log-rank and Buckley–James estimates. The results based on the LS and SV methods with direct evaluation of V and 10 000 normal random samples are displayed in Table 4. For comparison, we also report the results of the method of Parzen and others (1994) with B = 10 000. The standard error estimates are very similar between the LS and the SV methods, whereas those of the method of Parzen and others tend to be slightly larger. The larger standard error estimates by the method of Parzen and others are likely due to the instability of the perturbed estimating equations. Indeed, the method of Parzen and others produced 7 extreme estimates in the Buckley–James estimation of the gender effect, which were excluded in the standard error calculations. For the new resampling approach, it took approximately 1 and 3 min on an IBM BladeCenter HS20 machine to estimate the standard errors for the log-rank and Buckley–James estimators, respectively, whereas the method of Parzen and others consumed 10 and 24 h, respectively.
Table 4.
| Covariate | Estimate | LS standard error | SV standard error | Parzen standard error |
|---|---|---|---|---|
| Smoking status |  |  |  |  |
| Log-rank | −0.411 | 0.060 | 0.060 | 0.060 |
| Buckley–James | −0.363 | 0.087 | 0.090 | 0.092 |
| Minnesota |  |  |  |  |
| Log-rank | 0.121 | 0.064 | 0.064 | 0.065 |
| Buckley–James | 0.093 | 0.065 | 0.065 | 0.068 |
| Washington |  |  |  |  |
| Log-rank | −0.165 | 0.061 | 0.061 | 0.062 |
| Buckley–James | −0.147 | 0.067 | 0.067 | 0.070 |
| Age |  |  |  |  |
| Log-rank | −0.292 | 0.028 | 0.028 | 0.028 |
| Buckley–James | −0.264 | 0.055 | 0.054 | 0.058 |
| Gender |  |  |  |  |
| Log-rank | −0.893 | 0.065 | 0.065 | 0.067 |
| Buckley–James | −0.842 | 0.176 | 0.172 | 0.195 |
5. Discussion
The existing resampling methods require solving estimating equations or minimizing loss functions repeatedly, whereas the proposed methods only involve the evaluation of estimating functions. In complex situations, such as rank regression and least squares regression with censored data, the amount of time required to evaluate an estimating function is negligible as compared to solving the corresponding estimating equation. Then, the proposed methods are orders of magnitude faster than the existing resampling methods. Despite the continuing improvement in computer power, this degree of saving is very important, especially for large data sets and for simulation studies. Adopting the proposed resampling procedures will not only enhance the utility of many existing nonparametric and semiparametric estimators but also facilitate the development and evaluation of new methods for complex biostatistical problems.
The approach of Hu and Kalbfleisch (2000) does not require solving estimating equations repeatedly in order to construct confidence intervals, but it does require doing so to estimate the variances of parameter estimators. It is restricted to linear estimating functions with independent terms and thus would be applicable to quantile regression, but not to rank regression or Buckley–James estimation.
Our method can be viewed as a version of Monte Carlo numerical differentiation. In contrast to the usual numerical differentiation that uses fixed step sizes, the new method generates random step sizes Z, exploring a broad range of step sizes and producing stable estimates. Numerical results indicate that our method is not sensitive to the choice of the distribution of Z.
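This analogy can be seen on any smooth function: regressing f(x0 + n−1/2Z) on the random steps recovers f′(x0) (a toy illustration with numbers of our own choosing):

```python
import numpy as np

# Monte Carlo numerical differentiation: the least squares slope of function
# values at randomly perturbed points, regressed on the steps, estimates f'(x0).
rng = np.random.default_rng(8)
f, x0, n = np.tanh, 0.3, 10_000                  # smooth test function
steps = rng.standard_normal(2000) / np.sqrt(n)   # random step sizes n^{-1/2} Z
slope = np.polyfit(steps, f(x0 + steps), 1)[0]   # least squares slope ~ f'(x0)
exact = 1.0 - np.tanh(x0) ** 2                   # analytic derivative of tanh
```

For a nonsmooth estimating function there is no pointwise derivative to difference, but the same regression over a cloud of random steps still recovers the asymptotic slope A.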
The proposed methods have very broad applications and are particularly applicable to the situations in which the method of Parzen and others has been used. We have focused our attention on nonsmooth estimating functions. In some situations, the estimating functions are differentiable, but the derivatives are difficult to calculate. Then, the proposed resampling methods would also be appealing.
The results of Section 2 continue to hold if (2.1) is replaced by the more general expansion

n−1/2Un(θ) = G + n1/2A(θ − θ0) + op(1 + n1/2‖θ − θ0‖),
where G is a zero-mean random vector whose covariance matrix can be consistently estimated. Thus, the proposed resampling methods can be applied to multivariate responses, biased sampling, and time series data among others. Indeed, the n1/2 convergence rate is not essential. Furthermore, our approach can potentially be extended to semiparametric situations in which infinite-dimensional parameters are part of θ.
Acknowledgments
The authors thank the reviewers for helpful comments.
Funding
National Institutes of Health.
Footnotes
Conflict of Interest: None declared.
References
- Buchinsky M. Estimating the asymptotic covariance matrix for quantile regression models: a Monte Carlo study. Journal of Econometrics. 1995;68:303–338.
- Cai TX, Pepe MS, Zheng YY, Lumley T, Jenny NS. The sensitivity and specificity of markers for event times. Biostatistics. 2006;7:182–197.
- Chen YQ, Jewell NP. On a general class of semiparametric hazards regression models. Biometrika. 2001;88:687–702.
- Cox DR, Oakes D. Analysis of Survival Data. London: Chapman and Hall; 1984.
- Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman and Hall; 1993.
- Hu F, Kalbfleisch JD. The estimating function bootstrap (with discussion). Canadian Journal of Statistics. 2000;28:449–499.
- Huber PJ. Robust estimation of a location parameter. The Annals of Mathematical Statistics. 1964;35:73–101.
- Jin Z, Lin DY, Wei LJ, Ying Z. Rank-based inference for the accelerated failure time model. Biometrika. 2003;90:341–353.
- Jin Z, Lin DY, Ying Z. On the least squares regression with censored data. Biometrika. 2006;93:147–162.
- Jin Z, Ying Z, Wei LJ. A simple resampling method by perturbing the minimand. Biometrika. 2001;88:381–390.
- Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2nd edition. Hoboken, NJ: Wiley; 2002.
- Koenker R, D’Orey V. Computing regression quantiles. Applied Statistics. 1987;36:383–393.
- Krall JM, Uthoff VA, Harley JB. A step-up procedure for selecting variables associated with survival. Biometrics. 1975;31:49–57.
- Lin DY, Geyer CJ. Computational methods for semiparametric linear regression with censored data. Journal of Computational and Graphical Statistics. 1992;1:77–90.
- Parzen MI, Wei LJ, Ying Z. A resampling method based on pivotal estimating functions. Biometrika. 1994;81:341–350.
- The ARIC Investigators. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. American Journal of Epidemiology. 1989;129:687–702.
- Tsiatis AA. Estimating regression parameters using linear rank tests for censored data. The Annals of Statistics. 1990;18:354–372.
- van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. New York: Springer; 1996.
- Yao Q, Wei LJ, Hogan JW. Analysis of incomplete repeated measurements with dependent censoring times. Biometrika. 1998;85:139–149.
- Ying Z. A large sample study of rank estimation for censored regression data. The Annals of Statistics. 1993;21:76–99.