Published in final edited form as: Biostatistics. 2007 Oct 8;9(2):355–363. doi: 10.1093/biostatistics/kxm034

Efficient resampling methods for nonsmooth estimating functions

DONGLIN ZENG and D. Y. LIN

Summary

We propose a simple and general resampling strategy to estimate variances for parameter estimators derived from nonsmooth estimating functions. This approach applies to a wide variety of semiparametric and nonparametric problems in biostatistics. It does not require solving estimating equations and is thus much faster than the existing resampling procedures. Its usefulness is illustrated with heteroscedastic quantile regression and censored data rank regression. Numerical results based on simulated and real data are provided.

Keywords: Bootstrap, Censoring, Quantile regression, Rank regression, Robustness, Variance estimation

1. Introduction

The parameters of interest in biostatistics are typically estimated by minimizing a loss function or more generally by solving an estimating equation. In many nonparametric and semiparametric situations, such as Huber’s (1964) robust estimation of location (with nonsmooth loss functions), quantile regression, and rank regression, the estimating functions are not differentiable. Then, the asymptotic variances of the parameter estimators generally involve unknown density functions and are thus difficult to estimate directly.

In such situations, it is natural to appeal to resampling techniques. The familiar bootstrap (Efron and Tibshirani, 1993) estimates variances by resampling from the empirical distribution function. This approach needs to be justified on a case-by-case basis and may not be appropriate in complex situations. Parzen and others (1994) proposed a resampling technique that equates the observed-data estimating function to a random vector which generates the asymptotic distribution of the estimating function. This technique has been applied to numerous biostatistical problems (e.g. Yao and others, 1998; Chen and Jewell, 2001; Cai and others, 2006). Hu and Kalbfleisch (2000) provided a similar procedure for linear estimating functions with independent terms by bootstrapping the individual terms. For estimators that can be written as minimizers of certain U-statistics, Jin and others (2001) developed a resampling approach by incorporating suitable random variables into the minimand. Their approach was adapted by Jin and others (2003, 2006) to rank regression and least squares regression with censored data.

All the aforementioned resampling procedures require solving the perturbed estimating equations or minimizing the perturbed loss functions a large number of times. This is computationally very demanding, especially for complex nonlinear functions. In addition, the perturbed estimating equations or loss functions tend to be associated with extreme solutions and are thus unstable. As a result, nonsmooth estimating functions are rarely used in practice.

In the present paper, we propose a new resampling strategy to estimate asymptotic variances of parameter estimators obtained from general nonsmooth estimating functions. Our approach only requires generation of random numbers and evaluation of estimating functions. It does not involve solving any perturbed estimating equations or minimizing any perturbed objective functions; therefore, it is far more efficient and more stable than the existing resampling methods. With our approach, variance estimation for complex nonsmooth estimating functions can be accomplished in a matter of seconds or minutes rather than hours or days. We describe the proposed approach in Section 2. We present simulation results and medical examples in Sections 3 and 4, respectively. We provide some concluding remarks in Section 5.

2. Methods

Let θ0 denote a d-vector of parameters. We estimate θ0 by solving the estimating equation Un(θ) = 0, where Un is a function based on n independent observations such that n−1Un(θ0) converges in probability to 0. Suppose that the solution θ̂ exists and is consistent. Suppose also that, uniformly in a neighborhood of θ0,

n^{-1/2} U_n(\theta) = n^{-1/2} \sum_{i=1}^{n} S_i + A\, n^{1/2}(\theta - \theta_0) + o_p\bigl(1 + n^{1/2}\|\theta - \theta_0\|\bigr),   (2.1)

where Si (i = 1, …, n) are independent zero-mean random vectors, and A is a nonsingular matrix, which is the asymptotic slope of n−1Un(θ0). This asymptotic expansion holds for a wide variety of estimating functions and can typically be verified through empirical process arguments (van der Vaart and Wellner, 1996, Section 3.3). The Si are the influence functions for Un(θ0). The dependence of Si and A on θ0 is suppressed. Since Un(θ̂) = 0 and θ̂ is consistent, (2.1) implies that θ̂ is n1/2-consistent and n1/2(θ̂ − θ0) is asymptotically zero-mean normal with covariance matrix A−1V(A−1)T, where V = limn→∞ n−1 Σi=1n SiSiT. For parametric likelihood, Un(θ0) = Σi=1n Si and V = −A, where Si is the score for the ith observation and A is the negative information matrix.

We give 2 examples.

Example 1 (Heteroscedastic quantile regression)

For i = 1, …, n, let Yi and Xi denote the response variable and a set of covariates for the ith subject. Assume that the 100τth percentile of Yi is α0 + β0TXi. We may estimate θ0 = (α0, β0T)T by solving the equation

\sum_{i=1}^{n} \bigl\{ I(Y_i - \alpha - \beta^T X_i \le 0) - \tau \bigr\} (1, X_i^T)^T = 0,

where I (·) is the indicator function. The solution θ̂ can be obtained by minimizing the loss function

\sum_{i=1}^{n} \rho_\tau (Y_i - \alpha - \beta^T X_i),

where ρτ(υ) is τυ if υ > 0 and (τ − 1)υ if υ ≤ 0. This minimization can be performed by linear programming (Koenker and D’Orey, 1987). Under the assumption that (Yi − α0 − β0TXi) has a unique 100τth percentile at 0 and has a continuous density function fi such that fi(0) is strictly positive, the estimator θ̂ is consistent and the asymptotic expansion (2.1) holds with Si = {I(Yi − α0 − β0TXi ≤ 0) − τ}(1, XiT)T (Jin and others, 2001). The slope matrix A involves the density functions fi. Buchinsky (1995) compared various bootstrap procedures for estimating the asymptotic covariance matrix of θ̂.
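
Evaluating this estimating function at a given θ is all that the resampling procedures proposed below will require. A minimal sketch in Python/NumPy (our own illustration with hypothetical names, not code from the paper) might look as follows.

```python
import numpy as np

def quantile_U(theta, Y, X, tau):
    """Evaluate the quantile regression estimating function of Example 1:
    U_n(theta) = sum_i {I(Y_i - alpha - beta'X_i <= 0) - tau} (1, X_i')'.
    Y is an (n,) response vector and X an (n, p) covariate matrix."""
    alpha, beta = theta[0], theta[1:]
    resid = Y - alpha - X @ beta
    design = np.column_stack([np.ones(len(Y)), X])      # rows are (1, X_i')
    return ((resid <= 0).astype(float) - tau) @ design  # d-vector, d = p + 1
```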

Example 2 (Rank regression with censored data)

Assume that

Y_i = \beta_0^T X_i + \varepsilon_i,   (2.2)

where εi (i = 1, …, n) are independent and identically distributed random variables that are independent of Xi (i = 1, …, n). Suppose that Yi is subject to censoring by Ci. In survival analysis, Yi and Ci are usually expressed on the log-scale and (2.2) is referred to as the accelerated life or accelerated failure time model (Cox and Oakes, 1984, pp. 64–65; Kalbfleisch and Prentice, 2002, pp. 218–219). The data consist of (Ỹi, Δi, Xi) (i = 1, …, n), where Ỹi = min(Yi, Ci) and Δi = I(Yi ≤ Ci). It is assumed that Ci is independent of Yi conditional on Xi. One may estimate β0 by the log-rank estimating equation

\sum_{i=1}^{n} \Delta_i \left\{ X_i - \frac{\sum_{j=1}^{n} I(\tilde{Y}_j - \beta^T X_j \ge \tilde{Y}_i - \beta^T X_i)\, X_j}{\sum_{j=1}^{n} I(\tilde{Y}_j - \beta^T X_j \ge \tilde{Y}_i - \beta^T X_i)} \right\} = 0.   (2.3)

It is not a trivial matter to solve this discrete equation, especially when d is large. One may use bisection search or optimization algorithms, such as simulated annealing (Lin and Geyer, 1992). Recently, Jin and others (2003) showed that linear programming can be used to obtain an approximation to the log-rank estimate. Under mild conditions (Tsiatis, 1990; Ying, 1993), expansion (2.1) holds with

S_i = \Delta_i \left\{ X_i - \frac{\Gamma_1(\tilde{Y}_i - \beta_0^T X_i)}{\Gamma_0(\tilde{Y}_i - \beta_0^T X_i)} \right\} - \int_{-\infty}^{\tilde{Y}_i - \beta_0^T X_i} \left\{ X_i - \frac{\Gamma_1(t)}{\Gamma_0(t)} \right\} \, \mathrm{d}\Lambda_0(t),

where

\Gamma_0(t) = \lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} I(\tilde{Y}_i - \beta_0^T X_i \ge t), \qquad \Gamma_1(t) = \lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} I(\tilde{Y}_i - \beta_0^T X_i \ge t)\, X_i,

and Λ0 is the cumulative hazard function of εi. In this case, direct estimation of A would require estimation of the hazard function or density function of εi.
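
Although solving (2.3) is difficult, evaluating it at a given β is cheap, and that is all the proposed resampling procedures need. A minimal sketch in Python/NumPy (again our own illustration with hypothetical names):

```python
import numpy as np

def logrank_U(beta, Ytilde, Delta, X):
    """Evaluate the log-rank estimating function (2.3) at beta.
    Ytilde: observed times min(Y_i, C_i); Delta: event indicators; X: (n, d)."""
    e = Ytilde - X @ beta                 # residuals e_i = Ytilde_i - beta'X_i
    U = np.zeros(X.shape[1])
    for i in np.flatnonzero(Delta):       # sum over uncensored observations only
        at_risk = e >= e[i]               # risk set {j : e_j >= e_i}
        U += X[i] - X[at_risk].mean(axis=0)
    return U
```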

It is natural to estimate V directly by V̂ = n−1 Σi=1n ŜiŜiT, where Ŝi is obtained from Si by replacing the unknown quantities by their sample estimators. In Example 1, only θ0 is unknown; in Example 2, the unknown quantities include β0, Γ0(·), Γ1(·), and Λ0(·). The consistency of V̂ can typically be established by empirical process arguments.
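
In Example 1, for instance, Ŝi involves only θ̂, so V̂ can be computed directly. A possible sketch (a hypothetical helper in the same Python setting as above):

```python
def quantile_V_hat(theta_hat, Y, X, tau):
    """Plug-in estimate of V for Example 1: n^{-1} sum_i S_i-hat S_i-hat',
    with S_i-hat = {I(Y_i - alpha-hat - beta-hat'X_i <= 0) - tau} (1, X_i')'."""
    alpha, beta = theta_hat[0], theta_hat[1:]
    design = np.column_stack([np.ones(len(Y)), X])
    S = ((Y - alpha - X @ beta <= 0).astype(float) - tau)[:, None] * design
    return S.T @ S / len(Y)                       # d x d matrix V-hat
```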

When the Ŝi have complicated expressions, it is more convenient and perhaps more accurate to bootstrap from the data. Let Un*(θ) denote the estimating function based on the bootstrap sample. It follows from (2.1) that

n^{-1/2} U_n^*(\theta) = n^{-1/2} \sum_{i=1}^{n} M_i S_i + A\, n^{1/2}(\theta - \theta_0) + o_p\bigl(1 + n^{1/2}\|\theta - \theta_0\|\bigr),

where Mi is the number of times the ith observation appears in the bootstrap sample. Since Un(θ̂) = 0 by definition, we obtain

n^{-1/2} U_n^*(\hat{\theta}) = n^{-1/2} U_n^*(\hat{\theta}) - n^{-1/2} U_n(\hat{\theta}) = n^{-1/2} \sum_{i=1}^{n} (M_i - 1) S_i + o_p\bigl(1 + n^{1/2}\|\hat{\theta} - \theta_0\|\bigr).

By Lemma 3.6.15 of van der Vaart and Wellner (1996), the conditional distribution of n−1/2Un*(θ̂) given the data is asymptotically zero-mean normal with covariance matrix V, provided that the remainder term in the above display is op(1) uniformly in the bootstrap samples. It is straightforward to verify the required condition for Examples 1 and 2. The bootstrap estimator of V is also denoted by V̂.
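
A generic sketch of this bootstrap estimator of V, assuming only a routine U(theta, *arrays) that evaluates the estimating function on a data set (the helper name and interface are our own, not the authors'):

```python
import numpy as np

def bootstrap_V(U, theta_hat, arrays, B=1000, seed=None):
    """Bootstrap estimate of V: resample the observations, evaluate
    n^{-1/2} U_n*(theta-hat), and return the sample covariance of the B draws."""
    rng = np.random.default_rng(seed)
    n = len(arrays[0])
    draws = np.empty((B, len(theta_hat)))
    for b in range(B):
        idx = rng.integers(0, n, size=n)                        # bootstrap resample
        draws[b] = U(theta_hat, *(a[idx] for a in arrays)) / np.sqrt(n)
    return np.cov(draws, rowvar=False)                          # estimates V
```

In Example 1 one could pass, say, U = lambda t, Y, X: quantile_U(t, Y, X, tau) and arrays = (Y, X); no estimating equation is solved at any point.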

To avoid nonparametric density estimation, we propose efficient resampling procedures to estimate A and consequently the asymptotic covariance matrix of n1/2(θ̂ − θ0). Let θ̃ = θ̂ + n−1/2Z, where Z is a zero-mean random vector independent of the data. It follows from (2.1) that

n^{-1/2} U_n(\tilde{\theta}) - n^{-1/2} U_n(\hat{\theta}) = A\, n^{1/2}(\tilde{\theta} - \hat{\theta}) + o_p(1).

Since Un(θ̂) = 0 and θ̃ − θ̂ = n−1/2Z, we have

n^{-1/2} U_n(\tilde{\theta}) = A Z + o_p(1).   (2.4)

Thus, we propose the following resampling procedure based on least squares.

LS method

Step 1

Generate B realizations of Z, denoted by Z1, …, ZB.

Step 2

Calculate n−1/2Un(θ̂ + n−1/2Zb) (b = 1, …, B).

Step 3

For j = 1, …, d, calculate the least squares estimate from the regression of n−1/2Ujn(θ̂ + n−1/2Zb) (b = 1, …, B) on Zb (b = 1, …, B), where Ujn denotes the jth component of Un. Let Â be the matrix whose jth row is the jth least squares estimate.

Step 4

Estimate the covariance matrix of n1/2(θ̂ − θ0) by Â−1V̂(Â−1)T.
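
Steps 1–4 translate almost line for line into code. The following Python/NumPy sketch (our own helper, assuming an estimating-function routine U(theta) for the observed data and a V̂ computed as above) returns the estimated covariance matrix of n1/2(θ̂ − θ0); dividing by n gives the variance estimate for θ̂ itself.

```python
import numpy as np

def ls_covariance(U, theta_hat, V_hat, n, B=1000, seed=None):
    """LS method: regress n^{-1/2} U_n(theta-hat + n^{-1/2} Z_b) on Z_b to
    estimate A, then return A-hat^{-1} V-hat (A-hat^{-1})'."""
    rng = np.random.default_rng(seed)
    d = len(theta_hat)
    Z = rng.standard_normal((B, d))                            # Step 1: normal Z_b
    Un = np.array([U(theta_hat + z / np.sqrt(n)) / np.sqrt(n)  # Step 2
                   for z in Z])
    coef, *_ = np.linalg.lstsq(Z, Un, rcond=None)              # Step 3: LS fit
    A_hat = coef.T                                             # jth row: slope for U_jn
    A_inv = np.linalg.inv(A_hat)
    return A_inv @ V_hat @ A_inv.T                             # Step 4
```

For Example 1 one could call ls_covariance(lambda t: quantile_U(t, Y, X, tau), theta_hat, quantile_V_hat(theta_hat, Y, X, tau), len(Y)).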

In many situations, A is symmetric, in which case a simpler resampling procedure can be obtained. If the covariance matrix of Z is V−1, then (2.4) implies that Cov(n−1/2Un(θ̃) | data) = AV−1AT + op(1). The inverse of this covariance matrix is (AT)−1VA−1, which equals A−1V(A−1)T when A is symmetric. Thus, we propose the following resampling procedure based on the sample variance of n−1/2Un(θ̃).

SV method

Step 1

Generate θ̃b = θ̂ + n−1/2Zb (b = 1, …, B), where Zb is a zero-mean random vector with covariance matrix V̂−1.

Step 2

Calculate the sample covariance matrix of n−1/2Un(θ̃b) (b = 1, …, B) and denote it by Σ̂.

Step 3

Estimate the covariance matrix of n1/2(θ̂ − θ0) by Σ̂−1.
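
A corresponding sketch of the SV method under the same assumed U(theta) interface (again our own illustration; normal Z shown):

```python
import numpy as np

def sv_covariance(U, theta_hat, V_hat, n, B=1000, seed=None):
    """SV method: perturb theta-hat with Z_b having covariance V-hat^{-1},
    take the sample covariance of n^{-1/2} U_n(theta-tilde_b), and invert it."""
    rng = np.random.default_rng(seed)
    d = len(theta_hat)
    L = np.linalg.cholesky(np.linalg.inv(V_hat))     # so that Cov(Z_b) = V-hat^{-1}
    Z = rng.standard_normal((B, d)) @ L.T            # Step 1
    Un = np.array([U(theta_hat + z / np.sqrt(n)) / np.sqrt(n) for z in Z])
    Sigma_hat = np.cov(Un, rowvar=False)             # Step 2
    return np.linalg.inv(Sigma_hat)                  # Step 3
```

Both sketches only evaluate U; neither solves a perturbed estimating equation.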

Unlike the existing resampling methods, the least squares (LS) and sample variance (SV) methods do not require solving estimating equations. This is an important advantage since it is computationally intensive to solve complex nonsmooth estimating equations. Although we have suggested the possible use of bootstrap to estimate V, that procedure is different from bootstrap estimation of the variance of θ̂ and does not involve solving equations.

3. Simulation studies

We conducted extensive simulation studies to assess the performance of the proposed resampling methods. For both the LS and the SV methods, we estimated V either by direct evaluation or by bootstrap. We set Z to V̂−1/2Z*, where Z* is either a d-variate standard normal random vector or a d-vector of independent centered Bernoulli random variables with equal probabilities at −1 and 1. Thus, 8 different variants of the methods were considered.

The first set of studies mimics the simulation studies on median regression reported in Section 3 of Parzen and others (1994). We generated data from the model Yi = X1i + X2i + εi, where X1i and X2i are independent standard normal and Bernoulli with 0.5 success probability, respectively, and εi is normal with mean 0 and variance |X1i|. We obtained the parameter estimates through linear programming.
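
For reference, the data-generating step of this study can be sketched as follows (Python; the estimation step, e.g. by linear programming, is not shown).

```python
import numpy as np

def simulate_median_regression(n, seed=None):
    """Generate one data set from Y = X1 + X2 + eps with Var(eps | X1) = |X1|,
    as in the first simulation study."""
    rng = np.random.default_rng(seed)
    X1 = rng.standard_normal(n)                   # standard normal covariate
    X2 = rng.binomial(1, 0.5, size=n)             # Bernoulli(0.5) covariate
    eps = rng.normal(0.0, np.sqrt(np.abs(X1)))    # heteroscedastic normal error
    Y = X1 + X2 + eps
    return Y, np.column_stack([X1, X2])
```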

The second set of studies is similar to those of Jin and others (2003). We generated survival times from model (2.2), in which X1 is a Uniform(0, 1) variable, X2 is an independent Bernoulli variable with 0.5 success probability, β0 = (1, −1)T, and the error distribution is either extreme-value or zero-mean normal with standard deviation 0.5. We generated censoring times from a uniform distribution to yield a censoring rate of 25%. We obtained the log-rank estimates through bisection search.

The results from the above 2 sets of studies are summarized in Tables 1 and 2. The results of Table 1 pertain to the continuous covariate. Each entry in the tables is based on 10 000 simulated data sets and B = 10 000. Clearly, all 8 variants of the resampling methods work well in that the variance estimators accurately reflect the true variations and the associated confidence intervals have proper coverage probabilities. There are virtually no differences between the LS and SV methods or between the direct and bootstrap estimation of V. For the rank regression under the normal error distribution, the Bernoulli sampling appears to be slightly better than the normal sampling. For median regression, the new resampling method is approximately 100 times faster than bootstrap (with 10 000 resamples); for rank regression, it is approximately 1000 times faster.

Table 1.

Simulation results for heteroscedastic median regression

                         LS method                          SV method
                     Normal Z       Bernoulli Z     Normal Z       Bernoulli Z
  n    Bias     SE   V   SEE    CP     SEE    CP     SEE    CP     SEE    CP
  50  −0.000  0.209  1   0.228  0.957  0.225  0.948  0.215  0.948  0.216  0.943
                     2   0.228  0.957  0.224  0.947  0.215  0.948  0.216  0.942
 100   0.001  0.147  1   0.152  0.946  0.151  0.937  0.147  0.937  0.147  0.932
                     2   0.152  0.945  0.151  0.937  0.146  0.937  0.147  0.932
 200  −0.000  0.102  1   0.105  0.947  0.102  0.943  0.102  0.941  0.102  0.939
                     2   0.105  0.947  0.104  0.942  0.102  0.941  0.102  0.939

Note: Bias and SE are the bias and standard error of the parameter estimator, respectively; SEE and CP denote the mean of the standard error estimator and the coverage probability of the 95% confidence interval, respectively; 1 and 2 denote the direct estimation of V and the bootstrap estimation of V, respectively.

Table 2.

Simulation results for rank regression with censored data

                              LS method                          SV method
                          Normal Z       Bernoulli Z     Normal Z       Bernoulli Z
  n        Bias     SE   V   SEE    CP     SEE    CP     SEE    CP     SEE    CP
 Extreme-value error
 100  β1   0.010  0.428  1   0.428  0.949  0.419  0.941  0.426  0.948  0.419  0.941
                         2   0.423  0.947  0.415  0.938  0.421  0.945  0.415  0.938
      β2  −0.003  0.248  1   0.250  0.948  0.245  0.941  0.248  0.947  0.245  0.941
                         2   0.247  0.945  0.243  0.939  0.246  0.943  0.242  0.939
 200  β1   0.000  0.295  1   0.295  0.948  0.292  0.944  0.295  0.947  0.295  0.944
                         2   0.293  0.946  0.290  0.943  0.293  0.946  0.290  0.943
      β2  −0.002  0.170  1   0.173  0.953  0.171  0.950  0.170  0.952  0.171  0.950
                         2   0.172  0.951  0.170  0.949  0.171  0.951  0.170  0.949
 Normal error
 100  β1   0.005  0.217  1   0.237  0.963  0.225  0.951  0.235  0.962  0.225  0.951
                         2   0.235  0.962  0.222  0.949  0.233  0.961  0.222  0.949
      β2  −0.001  0.126  1   0.138  0.966  0.130  0.956  0.137  0.965  0.130  0.956
                         2   0.136  0.964  0.129  0.954  0.135  0.964  0.129  0.953
 200  β1   0.004  0.153  1   0.160  0.956  0.155  0.950  0.159  0.956  0.155  0.949
                         2   0.158  0.955  0.154  0.948  0.158  0.955  0.154  0.948
      β2  −0.001  0.086  1   0.090  0.961  0.090  0.957  0.092  0.960  0.090  0.957
                         2   0.092  0.960  0.089  0.956  0.091  0.960  0.089  0.956

Note: see the note to Table 1.

4. Applications

4.1 Multiple myeloma study

We applied the proposed resampling methods to a multiple myeloma study (Krall and others, 1975). Out of the 65 patients who were treated with alkylating agents, 48 died during the study. Following Jin and others (2003), we fitted model (2.2) with hemoglobin and the logarithm of blood urea nitrogen as the covariates, using both the log-rank and the Gehan estimators. The Gehan estimator is obtained by incorporating the weight function n−1 Σj=1n I(Ỹj − βTXj ≥ Ỹi − βTXi) into (2.3). We considered the 8 variants of the resampling methods evaluated in the simulation studies. The differences are negligible between the LS and the SV methods and between the direct and the bootstrap methods of estimating V.

The results based on the SV method and direct evaluation of V are shown in Table 3. These results are comparable to those of Jin and others (2003) but were obtained in much less time.

Table 3.

Rank regression analysis of the myeloma data

                               Normal Z                          Bernoulli Z
 Covariate           Estimate  Standard error  95% interval      Standard error  95% interval
 Hemoglobin
  Log-rank            0.268    0.164           (−0.055, 0.587)   0.158           (−0.044, 0.576)
  Gehan               0.292    0.183           (−0.067, 0.651)   0.176           (−0.054, 0.638)
 Blood urea nitrogen
  Log-rank           −0.505    0.162           (−0.827, −0.191)  0.161           (−0.825, −0.193)
  Gehan              −0.532    0.154           (−0.834, −0.230)  0.149           (−0.823, −0.241)

4.2 Atherosclerosis Risk in Communities Study

We also applied our methods to the Atherosclerosis Risk in Communities Study (The ARIC Investigators, 1989), which is an epidemiologic cohort study of 15 792 subjects aged 45–64 years to investigate the etiology of atherosclerosis and other diseases. We considered all incident coronary heart disease (CHD) cases occurring between 1987 and 2001. We focused on the Caucasian sample, which consists of 11 526 subjects with 774 cases. We used model (2.2) to study the effects of 5 covariates, including smoking status (ever smoke = 1, never smoke = 0), 2 dummy variables contrasting Minnesota and Washington states to North Carolina, gender (male = 1, female = 0), and standardized age at baseline, on the time to the occurrence of CHD. For large data sets such as this one, the methods of Jin and others (2003, 2006) are not computationally feasible. We used the Nelder–Mead algorithm as implemented in MATLAB to calculate the log-rank and Buckley–James estimates. The results based on the LS and SV methods with direct evaluation of V and 10 000 normal random samples are displayed in Table 4. For comparison, we also report the results of the method of Parzen and others (1994) with B = 10 000. The standard error estimates are very similar between the LS and the SV methods, whereas those of the method of Parzen and others tend to be slightly larger. The larger standard error estimates of the method of Parzen and others are likely due to the instability of the perturbed estimating equations. Indeed, the method of Parzen and others produced 7 extreme estimates in the Buckley–James estimation of the gender effect, which were excluded from the standard error calculations. For the new resampling approach, it took approximately 1 and 3 min on an IBM BladeCenter HS20 machine to estimate the standard errors for the log-rank and Buckley–James estimators, respectively, whereas the method of Parzen and others consumed 10 and 24 h, respectively.

Table 4.

Accelerated failure time regression for the Atherosclerosis Risk in Communities data

                               Standard error estimate
 Covariate          Estimate   LS      SV      Parzen
 Smoking status
  Log-rank          −0.411     0.060   0.060   0.060
  Buckley–James     −0.363     0.087   0.090   0.092
 Minnesota
  Log-rank           0.121     0.064   0.064   0.065
  Buckley–James      0.093     0.065   0.065   0.068
 Washington
  Log-rank          −0.165     0.061   0.061   0.062
  Buckley–James     −0.147     0.067   0.067   0.070
 Age
  Log-rank          −0.292     0.028   0.028   0.028
  Buckley–James     −0.264     0.055   0.054   0.058
 Gender
  Log-rank          −0.893     0.065   0.065   0.067
  Buckley–James     −0.842     0.176   0.172   0.195

5. Discussion

The existing resampling methods require solving estimating equations or minimizing loss functions repeatedly, whereas the proposed methods only involve the evaluation of estimating functions. In complex situations, such as rank regression and least squares regression with censored data, the amount of time required to evaluate an estimating function is negligible compared with that required to solve the corresponding estimating equation. In such cases, the proposed methods are orders of magnitude faster than the existing resampling methods. Despite the continuing improvement in computer power, this degree of saving is very important, especially for large data sets and for simulation studies. Adopting the proposed resampling procedures will not only enhance the utility of many existing nonparametric and semiparametric estimators but also facilitate the development and evaluation of new methods for complex biostatistical problems.

The approach of Hu and Kalbfleisch (2000) does not require solving estimating equations repeatedly in order to construct confidence intervals, but it does require doing so to estimate the variances of parameter estimators. It is restricted to linear estimating functions with independent terms and thus would be applicable to quantile regression, but not to rank regression or Buckley–James estimation.

Our method can be viewed as a version of Monte Carlo numerical differentiation. In contrast to the usual numerical differentiation that uses fixed step sizes, the new method generates random step sizes Z, exploring a broad range of step sizes and producing stable estimates. Numerical results indicate that our method is not sensitive to the choice of the distribution of Z.

The proposed methods have very broad applications and are particularly useful in the situations in which the method of Parzen and others has been used. We have focused our attention on nonsmooth estimating functions. In some situations, the estimating functions are differentiable, but the derivatives are difficult to calculate. The proposed resampling methods would also be appealing in such cases.

The results of Section 2 continue to hold if (2.1) is replaced by the more general expansion

n^{-1/2} U_n(\theta) = G + A\, n^{1/2}(\theta - \theta_0) + o_p\bigl(1 + n^{1/2}\|\theta - \theta_0\|\bigr),

where G is a zero-mean random vector whose covariance matrix can be consistently estimated. Thus, the proposed resampling methods can be applied to multivariate responses, biased sampling, and time series data, among others. Indeed, the n1/2 convergence rate is not essential. Furthermore, our approach can potentially be extended to semiparametric situations in which infinite-dimensional parameters are part of θ.

Acknowledgments

The authors thank the reviewers for helpful comments.

Funding

National Institutes of Health.

Footnotes

Conflict of Interest: None declared.

References

1. Buchinsky M. Estimating the asymptotic covariance matrix for quantile regression models: a Monte Carlo study. Journal of Econometrics. 1995;68:303–338.
2. Cai TX, Pepe MS, Zheng YY, Lumley T, Jenny NS. The sensitivity and specificity of markers for event times. Biostatistics. 2006;7:182–197. doi: 10.1093/biostatistics/kxi047.
3. Chen YQ, Jewell NP. On a general class of semiparametric hazards regression models. Biometrika. 2001;88:687–702.
4. Cox DR, Oakes D. Analysis of Survival Data. London: Chapman and Hall; 1984.
5. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman and Hall; 1993.
6. Hu F, Kalbfleisch JD. The estimating function bootstrap (with discussion). Canadian Journal of Statistics. 2000;28:449–499.
7. Huber PJ. Robust estimation of a location parameter. The Annals of Mathematical Statistics. 1964;35:73–101.
8. Jin Z, Lin DY, Wei LJ, Ying Z. Rank-based inference for the accelerated failure time model. Biometrika. 2003;90:341–353.
9. Jin Z, Lin DY, Ying Z. On the least squares regression with censored data. Biometrika. 2006;93:147–162.
10. Jin Z, Ying Z, Wei LJ. A simple resampling method by perturbing the minimand. Biometrika. 2001;88:381–390.
11. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2nd edition. Hoboken, NJ: Wiley; 2002.
12. Koenker R, D’Orey V. Computing regression quantiles. Applied Statistics. 1987;36:383–393.
13. Krall JM, Uthoff VA, Harley JB. A step-up procedure for selecting variables associated with survival. Biometrics. 1975;31:49–57.
14. Lin DY, Geyer CJ. Computational methods for semiparametric linear regression with censored data. Journal of Computational and Graphical Statistics. 1992;1:77–90.
15. Parzen MI, Wei LJ, Ying Z. A resampling method based on pivotal estimating functions. Biometrika. 1994;81:341–350.
16. The ARIC Investigators. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. American Journal of Epidemiology. 1989;129:687–702.
17. Tsiatis AA. Estimating regression parameters using linear rank tests for censored data. The Annals of Statistics. 1990;18:354–372.
18. van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. New York: Springer; 1996.
19. Yao Q, Wei LJ, Hogan JW. Analysis of incomplete repeated measurements with dependent censoring times. Biometrika. 1998;85:139–149.
20. Ying Z. A large sample study of rank estimation for censored regression data. The Annals of Statistics. 1993;21:76–99.
