Abstract
An important issue in statistical inference for semiparametric models is how to provide reliable and consistent variance estimation. Brown and Wang (2005. Standard errors and covariance matrices for smoothed rank estimators. Biometrika 92, 732–746) proposed a variance estimation procedure based on an induced smoothing for non-smooth estimating functions. Herein a Monte Carlo version is developed that does not require any explicit form for the estimating function itself, as long as numerical evaluation can be carried out. A general convergence theory is established, showing that any one-step iteration leads to a consistent variance estimator and continuation of the iterations converges at an exponential rate. The method is demonstrated through the Buckley–James estimator and the weighted log-rank estimators for censored linear regression, and rank estimation for multiple event times data.
Keywords: Accelerated failure time model, Asymptotic fiducial distribution, Buckley–James estimator, Censored data, Contraction mapping, Estimating function, Kaplan–Meier estimator, Monte Carlo integration, Rank estimator
1. Introduction
Many important estimators involve solving non-smooth and perhaps discontinuous estimating functions. Examples include the least absolute deviation estimator (Bloomfield and Steiger, 1983), various rank estimators (Hettmansperger and McKean, 1998), and parameter estimators in censored linear regression (Buckley and James, 1979; Ritov, 1990; Tsiatis, 1990; Lai and Ying, 1991; Ying, 1993). Such estimators are also common in the econometrics literature, where robust procedures are advocated and censoring and sampling bias may occur. Koenker and Bassett (1978) pioneered the quantile regression method and Powell (1984) developed an extended least absolute deviation estimator for censored regression, all involving non-smooth estimating equations.
When the estimating functions are non-smooth, the limiting distributions of the resulting estimators often involve density functions, as exhibited in the above-cited examples. It is therefore desirable to develop methods for variance estimation that bypass density estimation. An interesting development is due to Brown and Wang (2005), who used a pseudo-Bayesian approach to obtain a naturally induced smoothed version of the estimating function, from which a consistent variance estimator can be obtained through an iterative procedure. They demonstrated the usefulness of their method through rank estimators. Brown and Wang (2007) extended the method to censored linear regression using the Gehan estimating function. Wang and Zhao (2008), Johnson and Strawderman (2009), and Fu and others (2010) further applied the procedure to the analysis of clustered data. Recently, the procedure has been successfully applied to other regression models: Pang and others (2012) to the censored quantile regression model, Li and others (2012) to the accelerated hazards model, and Lin and Peng (2013) to the linear transformation model.
It appears that a key component in the implementation and theoretical analysis in Brown and Wang (2005, 2007) is that the smoothed version of the estimating function, i.e. integration of the original estimating function with respect to a normal kernel, has a closed analytic form from which the iterative algorithm and convergence analysis can be carried out. This would exclude many well-known estimators whose estimating functions are non-smooth and non-monotone. In particular, it excludes all weighted log-rank estimators (except for the case of Gehan) and the Buckley–James estimator for censored linear regression. They also noted that their approach is effective only when the underlying estimating function is monotone.
Motivated by Brown and Wang (2005), this paper develops a general way of approximating standard errors. It is based on the use of numerical approximations to integrals and modifies the Brown–Wang method so that the scope and applicability are substantially expanded. This new development is particularly appealing in complicated situations, such as the Buckley–James and weighted log-rank estimators in censored linear regression, where the estimating functions take complex forms. The lack of an explicit form for the smoothed estimating function in the general situation also entails that new analytic tools are needed to ensure general convergence of the iterative algorithm. Indeed, we show that the so-called contraction mapping theorem is applicable stochastically and, in consequence, an exponential rate of convergence is established.
The paper is organized as follows. In the next section, we describe our proposed method along with theory. In Section 3, we apply the proposed method to estimate the variance of the parameter estimators in rank estimation and least squares estimation for censored regression, and discuss its extension to the multivariate cases. In Section 4, we present several simulation studies and real examples. We conclude with a discussion in Section 5. Supplementary material available at Biostatistics online outlines all theoretical proofs.
2. Method and theory
Let $\beta$ be a $p$-dimensional vector of parameters that is related to the observations. We shall use $U_n(\beta)$ to denote a vector of estimating functions for $\beta$. The resulting estimator $\hat\beta$ is obtained by solving $U_n(\beta) = 0$. Without loss of generality, we assume that $U_n$ is properly scaled so that $U_n(\beta)$ converges to a non-random function $u(\beta)$ and $u(\beta_0) = 0$, where $\beta_0$ denotes the true parameter. The first two basic assumptions are as follows.
Assumption A1 — $n^{1/2}U_n(\beta_0)$ is asymptotically normal with mean $0$ and a covariance matrix $V$, i.e.
$$n^{1/2}U_n(\beta_0) \to N(0, V) \quad \text{in distribution}. \tag{2.1}$$
Assumption A2 — The estimator $\hat\beta$ obtained by solving $U_n(\beta) = 0$ is $n^{1/2}$-consistent, and $n^{1/2}(\hat\beta - \beta_0)$ is asymptotically normal with mean $0$ and covariance matrix $\Sigma$.
Assumptions A1 and A2 are satisfied by many estimating functions. Inference on $\beta_0$, e.g. construction of a confidence set, requires a consistent estimate of $\Sigma$. As noted in Jin and others (2001) and Brown and Wang (2005), a consistent estimator $\hat V$ of $V$ is usually easy to obtain because it only involves $U_n(\beta)$. If $U_n(\beta)$ has a continuous derivative $\partial U_n(\beta)/\partial\beta^{T}$, the typical asymptotic arguments, as discussed in Brown and Wang (2005), would lead to
$$n^{1/2}(\hat\beta - \beta_0) = -A^{-1}\, n^{1/2}\, U_n(\beta_0) + o_p(1), \tag{2.2}$$
that is, $n^{1/2}(\hat\beta - \beta_0)$ is asymptotically normal with mean $0$ and covariance matrix $\Sigma = A^{-1}V(A^{-1})^{T}$, where $A = \lim_{n\to\infty}\partial U_n(\beta_0)/\partial\beta^{T}$. Thus, $A$ can be estimated by $\hat A = \partial U_n(\hat\beta)/\partial\beta^{T}$ and $\Sigma$ can be estimated by $\hat A^{-1}\hat V(\hat A^{-1})^{T}$. However, the estimating function $U_n(\beta)$ is often non-smooth, thus one cannot estimate the slope matrix by simply taking partial derivatives. As a consequence, variance estimation for $\hat\beta$ can be a challenging issue.
The idea behind the elegant approach of Brown and Wang (2005) is to use the asymptotic fiducial distribution as the basis for an induced smoothing kernel. Specifically, $\beta_0$ is in distribution approximately equal to $\hat\beta + n^{-1/2}\Sigma^{1/2}Z$, where $Z$ is the standard $p$-variate normal random vector. It induces the following smoothed version of the estimating function:
$$\tilde U_n(\beta) = E_Z\left[U_n\left(\beta + n^{-1/2}\Sigma^{1/2}Z\right)\right], \tag{2.3}$$
where $E_Z$ denotes the expectation with respect to $Z$. Brown and Wang (2005) then suggested obtaining $\tilde\beta$ and $\tilde\Sigma$ by jointly solving $\tilde U_n(\beta) = 0$ and the following equation:
$$\Sigma = \tilde A_n^{-1}(\beta)\,\hat V\,\{\tilde A_n^{-1}(\beta)\}^{T}, \tag{2.4}$$
where $\tilde A_n(\beta) = \partial\tilde U_n(\beta)/\partial\beta^{T}$ and $\hat V$ is a consistent estimator of $V$. Because both sides of (2.4) involve $\Sigma$, an iterative algorithm for solving for $\Sigma$ results.
For certain rank estimators and quantiles, Brown and Wang (2005, 2007) were able to obtain manageable analytic forms for the smoothed estimating function $\tilde U_n(\beta)$ and its derivative $\tilde A_n(\beta)$ and showed that their proposed iterative algorithm converges numerically. The approach, however, is not applicable if the estimating functions $U_n(\beta)$ are non-smooth and their smoothed version $\tilde U_n(\beta)$ is too complicated to be written out in simple analytic form. Examples of this kind include all weighted log-rank estimators (except the Gehan estimator) and the Buckley–James estimator for censored linear regression. Thus, there is a need to develop a simple and more generally applicable algorithm to estimate the variance $\Sigma$. It is also desirable to investigate the convergence property of the iterative algorithms under more general conditions.
Next we state another assumption on local asymptotic linearity (LAL), which is generally satisfied even for estimating functions $U_n(\beta)$ that are non-smooth and/or non-monotone. In fact, LAL is a commonly used assumption for proving asymptotic normality. In particular, all examples considered in Brown and Wang (2005, 2007) and in this paper satisfy the following LAL assumption.
Assumption A3 —
is locally asymptotically linear (LAL) at
, i.e.
(2.5) where
is a non-degenerate slope matrix,
denotes the Euclidean norm, and
is some small neighborhood of
.
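Note that Assumptions A1–A3 imply (2.2) with $A$ as defined in (2.5). That is, $n^{1/2}(\hat\beta - \beta_0) = -A^{-1}n^{1/2}U_n(\beta_0) + o_p(1)$. It follows that $n^{1/2}(\hat\beta - \beta_0)$ is asymptotically normal with mean $0$ and covariance matrix $\Sigma = A^{-1}V(A^{-1})^{T}$.
Using either Stein's Identity (Stein, 1981) or a simple integration by parts argument, the derivative of the smoothed estimating equation $\tilde U_n(\beta) = E_Z[U_n(\beta + n^{-1/2}\Sigma^{1/2}Z)]$ defined in (2.3) satisfies the following equation:
$$\frac{\partial\tilde U_n(\beta)}{\partial\beta^{T}} = n^{1/2}\,E_Z\left[U_n\left(\beta + n^{-1/2}\Sigma^{1/2}Z\right)Z^{T}\right]\Sigma^{-1/2}. \tag{2.6}$$
Remark 2.1 — The validity of (2.6) does not require the existence of the partial derivative of $U_n(\beta)$. It is valid as long as the order of the partial derivative $\partial/\partial\beta$ and the expectation $E_Z$ is exchangeable, which can be checked with Fubini's Theorem in measure theory; see supplementary material available at Biostatistics online for a proof.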
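As an informal numerical check of the identity behind (2.6), the sketch below replaces the estimating function by a one-dimensional step function, `np.sign`, which has no derivative at zero. The toy function, the scale `s` (standing in for $n^{-1/2}\Sigma^{1/2}$) and all names are illustrative assumptions, not part of the paper. For this choice the smoothed function is $2\Phi(b/s) - 1$, so its derivative is available in closed form and can be compared with the Monte Carlo average of $f(b + sZ)Z/s$.

```python
import math
import numpy as np

def stein_slope(f, b, s, z):
    """Monte Carlo estimate of d/db E[f(b + s*Z)] computed as E[f(b + s*Z) * Z] / s."""
    return float(np.mean(f(b + s * z) * z) / s)

rng = np.random.default_rng(0)
z = rng.standard_normal(400_000)      # draws of the standard normal Z
b, s = 0.3, 0.5                       # evaluation point and smoothing scale (arbitrary)

mc = stein_slope(np.sign, b, s, z)

# Closed form for this toy case: E[sign(b + s*Z)] = 2*Phi(b/s) - 1,
# whose derivative in b is 2*phi(b/s)/s, with phi the standard normal density.
exact = 2.0 * math.exp(-0.5 * (b / s) ** 2) / (math.sqrt(2.0 * math.pi) * s)

print(mc, exact)
```

The two printed numbers should agree up to Monte Carlo error, even though the step function itself is not differentiable at the origin.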
Under the above three general Assumptions A1–A3, a very simple consistent estimate of $\Sigma$ is given by $\hat A^{-1}\hat V(\hat A^{-1})^{T}$, where $\hat A = n^{1/2}E_Z[U_n(\hat\beta + n^{-1/2}Z)Z^{T}]$ is the right-hand side of (2.6) evaluated at $\beta = \hat\beta$ and $\Sigma = I_p$, with $I_p$ being the identity matrix. Note that the two integrals in (2.3) and (2.6) can be numerically approximated arbitrarily well by a simple Monte Carlo method (MCM) or the Gaussian quadrature method (GQM). More specifically, we propose the following two numerical methods to provide simple consistent estimates of the variance $\Sigma$.
An MCM
Step 1: Calculate a consistent estimator
of the matrix
.
Step 2: Choose
and a large number
.
Step 3: For the
th step (
), generate
,
from multivariate normal distribution
. Estimate
by
![]() |
Step 4: Calculate
and define
.
Step 5: Repeat Steps 3 and 4 for next
until
and
converge.
The covariance matrix of
will be estimated using the
at the convergence in the above iterative algorithm.
The convergence of
and
can be assessed by commonly used matrix convergence criteria, such as the difference or relative change. One may choose a very large
to ensure a good approximation for the Gaussian integral.
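The iteration described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `mcm_variance`, its arguments, the use of a Cholesky factor in place of a symmetric square root of $\Sigma$, the reuse of one fixed set of standard normal draws across iterations, and the maximum-absolute-difference stopping rule are all choices made here. The toy example uses the sample median, whose scaled estimating function is a step function with $V = 1/4$.

```python
import numpy as np

def mcm_variance(U, beta_hat, V_hat, n, M=5000, tol=1e-4, max_iter=50, seed=0):
    """Iterate Sigma -> A^{-1} V A^{-T}, with the slope A estimated by Monte Carlo.

    U        : callable mapping a length-p array to a length-p array (scalar if p = 1)
    beta_hat : root of U
    V_hat    : (p, p) estimate of the variance of sqrt(n) * U(beta_0)
    """
    beta_hat = np.atleast_1d(np.asarray(beta_hat, dtype=float))
    V_hat = np.atleast_2d(np.asarray(V_hat, dtype=float))
    p = beta_hat.size
    # Assumption: standard normal draws are generated once and reused in every iteration.
    Z = np.random.default_rng(seed).standard_normal((M, p))
    Sigma = np.eye(p)                                   # starting value: identity matrix
    for k in range(1, max_iter + 1):
        Gamma = np.linalg.cholesky(Sigma)               # Sigma = Gamma @ Gamma.T
        shifts = Z @ Gamma.T / np.sqrt(n)               # n^{-1/2} * Gamma @ Z_j
        Uvals = np.array([np.atleast_1d(U(beta_hat + d)) for d in shifts])
        # Monte Carlo slope: sqrt(n) * mean_j(U_j Z_j^T) @ Gamma^{-1}
        A = np.sqrt(n) * (Uvals.T @ Z / M) @ np.linalg.inv(Gamma)
        A_inv = np.linalg.inv(A)
        Sigma_new = A_inv @ V_hat @ A_inv.T             # sandwich update
        Sigma_new = 0.5 * (Sigma_new + Sigma_new.T)     # remove rounding asymmetry
        change = np.max(np.abs(Sigma_new - Sigma))
        Sigma = Sigma_new
        if change < tol:
            break
    return Sigma, A, k

# Toy example (an assumption for illustration): the median of N(0, 1) data.
rng = np.random.default_rng(1)
x = np.sort(rng.standard_normal(20000))
n = x.size

def U_median(b):
    # scaled estimating function for the median: 1/2 - (fraction of x_i <= b)
    return 0.5 - np.searchsorted(x, b[0], side="right") / n

Sigma, A, iters = mcm_variance(U_median, [np.median(x)], [[0.25]], n)
print(Sigma, A, iters)
```

For standard normal data the asymptotic variance of $n^{1/2}$ times the sample median is $\pi/2$, so the printed `Sigma` can be compared with that value, allowing for sampling and Monte Carlo error. Reusing the same draws makes each update a deterministic map of the previous `Sigma`, which is one way to make a small convergence tolerance meaningful.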
A GQM
Replace Step 3 in MCM with
Step 3
: Choose grid
-vector points
,
, based on a pre-specified accuracy criterion, and calculate
![]() |
where
are the Gaussian quadrature weights. One choice of
is based on 1D Gauss–Hermite quadrature calculations.
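A quadrature version of the slope calculation might look like the sketch below. It builds a tensor-product grid from the one-dimensional rule returned by NumPy's `hermegauss` (the Gauss–Hermite rule for the weight $\exp(-x^2/2)$); the function names `gauss_hermite_grid` and `gqm_slope`, the parameter `deg`, and the Cholesky factorisation are assumptions made for illustration. The example uses a linear estimating function, for which the quadrature is exact, so the computed slope can be checked directly.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gauss_hermite_grid(p, deg):
    """Tensor-product Gauss-Hermite nodes and weights for a standard p-variate normal."""
    nodes, weights = hermegauss(deg)             # 1D rule for the weight exp(-x^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)     # rescale so the 1D weights sum to 1
    Z = np.array(list(itertools.product(nodes, repeat=p)))
    w = np.array(list(itertools.product(weights, repeat=p))).prod(axis=1)
    return Z, w

def gqm_slope(U, beta_hat, Sigma, n, deg=15):
    """Quadrature estimate of sqrt(n) * E[U(beta_hat + n^{-1/2} Gamma Z) Z^T] Gamma^{-1}."""
    beta_hat = np.atleast_1d(np.asarray(beta_hat, dtype=float))
    Gamma = np.linalg.cholesky(np.atleast_2d(Sigma))     # Sigma = Gamma @ Gamma.T
    Z, w = gauss_hermite_grid(beta_hat.size, deg)
    shifts = Z @ Gamma.T / np.sqrt(n)
    Uvals = np.array([np.atleast_1d(U(beta_hat + d)) for d in shifts])
    # weighted sum over grid points replaces the Monte Carlo average
    return np.sqrt(n) * (Uvals.T @ (w[:, None] * Z)) @ np.linalg.inv(Gamma)

# Linear toy estimating function with known slope -S (illustrative only).
S = np.array([[2.0, 0.5], [0.5, 1.0]])
b0 = np.array([1.0, -1.0])
A = gqm_slope(lambda b: -S @ (b - b0), b0, np.eye(2), n=100)
print(A)
```

The grid size grows as `deg ** p`, so this construction is practical mainly for small $p$. For step-like estimating functions the polynomial exactness of the rule no longer applies, and a larger `deg` or the Monte Carlo version may be the safer choice.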
The following theorem justifies the convergence of the algorithms in MCM and GQM. A proof of the theorem can be found in supplementary material available at Biostatistics online.
Theorem 2.2 — Under Assumptions A1–A3, the one-step ($k = 1$) estimates of $A$ and $\Sigma$ in the MCM or in the GQM with a large $M$ are consistent as $n \to \infty$. Moreover, the iteration algorithm in either MCM or GQM converges under Assumptions A1–A3.
3. Examples
We illustrate the methods with three examples: weighted log-rank estimators, the Buckley–James estimators for censored linear regression, and their extensions to multivariate data. It should be noted that the approach of Brown and Wang (2005, 2007) is not applicable to any of the three examples, as the corresponding estimating functions are non-monotone and do not have a form simple enough to permit explicit evaluation of the induced smoothed versions.
3.1. Weighted log-rank estimators
Consider the accelerated failure time (AFT) model for survival times (Kalbfleisch and Prentice, 2002). Let $T_i$ be the failure time and $X_i$ be the $p$-vector of covariates for the $i$th individual, $i = 1, \ldots, n$. The AFT model relates the logarithm of the failure time, $\log T_i$, linearly to the covariates:
$$\log T_i = X_i^{T}\beta_0 + \varepsilon_i, \tag{3.1}$$
where $\varepsilon_i$ are independent and identically distributed random errors with unknown distribution function $F$ and $\beta_0$ is a $p$-vector of unknown regression parameters. Instead of the $T_i$, we observe $\tilde T_i = \min(T_i, C_i)$ and $\delta_i = 1(T_i \le C_i)$, where $C_i$ are censoring times. We assume that observations $(\tilde T_i, \delta_i, X_i)$, $i = 1, \ldots, n$, are independent and identically distributed.
The general weighted log-rank estimating function with weight function
takes the form
![]() |
(3.2) |
where
. The corresponding estimate solves
.
In general, the
can be neither smooth nor monotone. Step 1 in the previous section can be easily implemented as the variance of
;
can be estimated through the usual plug-in estimator
![]() |
where
for a matrix
; cf. Ying and others (1992). Steps 2–5 can also be implemented straightforwardly.
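To make the numerical evaluation concrete, the following sketch computes a weighted log-rank estimating function and a plug-in variance estimate in the standard textbook form, $n^{-1}\sum_i \delta_i\phi_i\{X_i - \bar X(e_i)\}$ with residuals $e_i = \log\tilde T_i - X_i^{T}\beta$ and $\bar X(t)$ the average covariate among subjects with $e_j \ge t$. This form, the $1/n$ scaling, the function name `weighted_logrank` and the two weight options are assumptions made here for illustration and may differ in detail from (3.2).

```python
import numpy as np

def weighted_logrank(beta, logY, delta, X, weight="logrank"):
    """Return (U, V): scaled estimating function and a plug-in variance estimate."""
    n, p = X.shape
    e = logY - X @ beta                                   # residuals e_i(beta)
    at_risk = (e[None, :] >= e[:, None]).astype(float)    # at_risk[i, j] = 1{e_j >= e_i}
    S0 = at_risk.sum(axis=1) / n                          # risk-set proportion at e_i
    S1 = at_risk @ X / n                                  # risk-set covariate sums
    S2 = np.einsum("ij,jk,jl->ikl", at_risk, X, X) / n    # risk-set outer-product sums
    xbar = S1 / S0[:, None]
    if weight == "logrank":
        phi = np.ones(n)
    elif weight == "gehan":
        phi = S0
    else:
        raise ValueError("weight must be 'logrank' or 'gehan'")
    U = (delta * phi) @ (X - xbar) / n
    # risk-set covariance of the covariates at each residual
    cov_i = S2 / S0[:, None, None] - np.einsum("ik,il->ikl", xbar, xbar)
    V = np.einsum("i,ikl->kl", delta * phi ** 2, cov_i) / n
    return U, V

# Small synthetic data set (all values below are arbitrary illustrations).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([rng.binomial(1, 0.5, n), rng.normal(0.0, 0.5, n)])
log_T = 2.0 + X @ np.array([1.0, 1.0]) + rng.standard_normal(n)
log_C = np.log(rng.uniform(0.0, 60.0, n))
logY, delta = np.minimum(log_T, log_C), (log_T <= log_C).astype(float)

U, V = weighted_logrank(np.array([1.0, 1.0]), logY, delta, X, weight="gehan")
print(U, V)
```

Because the function only needs to be evaluated at perturbed parameter values, it can be passed unchanged to a Monte Carlo or quadrature smoothing routine; no closed form of the smoothed version is required.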
3.2. Buckley–James estimators
For the censored linear regression model in Section 3.1, Buckley and James (1979) considered an estimating equation based on the least squares principle with
![]() |
where
and
is the left-continuous version of the Kaplan–Meier estimate of
based on
. The estimator
can be obtained by the method of Jin and others (2006a).
Step 1 in Section 2 can be easily implemented as the variance of
;
can be estimated by the method in Ying and others (1992).
![]() |
The remaining steps are straightforward.
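A Buckley–James-type estimating function can likewise be evaluated numerically. The sketch below follows the usual construction: censored residuals are replaced by their conditional means under the Kaplan–Meier estimate of the residual distribution, and the result is combined with centred covariates. The function name, the $1/n$ scaling, the convention of treating the largest residual as uncensored, and the neglect of ties are assumptions made for illustration and are not taken from the display above.

```python
import numpy as np

def buckley_james(beta, logY, delta, X):
    """Scaled Buckley-James-type estimating function n^{-1} sum (X_i - Xbar) * e_i^*."""
    n = logY.size
    e = logY - X @ beta                              # residuals e_i(beta)
    order = np.argsort(e, kind="stable")
    e_s = e[order]
    d_s = np.asarray(delta, dtype=float)[order]      # indexing makes a copy of delta
    d_s[-1] = 1.0                                    # convention: largest residual uncensored
    at_risk = n - np.arange(n)                       # n, n-1, ..., 1 (no ties assumed)
    surv = np.cumprod(1.0 - d_s / at_risk)           # Kaplan-Meier survival at e_(k)
    surv_left = np.concatenate(([1.0], surv[:-1]))   # survival just before e_(k)
    mass = surv_left * d_s / at_risk                 # jump of the estimated F at e_(k)
    # sums of mass and mass * residual over strictly larger residuals
    tail_mass = np.cumsum(mass[::-1])[::-1] - mass
    tail_num = np.cumsum((mass * e_s)[::-1])[::-1] - mass * e_s
    cond_mean = np.divide(tail_num, tail_mass, out=e_s.copy(), where=tail_mass > 0)
    e_imp_s = np.where(d_s == 1.0, e_s, cond_mean)   # impute censored residuals only
    e_imp = np.empty(n)
    e_imp[order] = e_imp_s                           # back to the original ordering
    Xc = X - X.mean(axis=0)
    return Xc.T @ e_imp / n

# Three-point example that can be followed by hand (illustrative values).
logY = np.array([1.0, 2.0, 3.0])
delta = np.array([1, 0, 1])
X = np.array([[0.0], [1.0], [2.0]])
U_bj = buckley_james(np.array([0.0]), logY, delta, X)
print(U_bj)
```

In the three-point example the censored residual 2 is replaced by 3, the only residual mass beyond it, so the imputed residuals are (1, 3, 3). With no censoring at all, the function reduces to the slope part of the ordinary least-squares normal equations.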
3.3. Rank estimation for multiple event times data
Jin and others (2006b) considered the extension of the rank estimation to multiple events data, recurrent events data, and clustered failure time data and developed resampling approaches to estimate the limiting covariance matrices without non-parametric density estimation or evaluation of numerical derivatives. However, their implementation is computationally intensive. The approach developed in this paper offers a rather simple way of estimating the limiting covariance matrices. We illustrate the use of the proposed method with the rank estimation in multiple events data.
Suppose that a subject can potentially experience
types of events. For the
th subject,
and
, let
be the time to the
th event,
be the corresponding censoring time, and
be the corresponding
vector of covariates. The observed data consist of
, where
and
.
Jin and others (2006b) considered the marginal distributions of the
types of events with AFT models while leaving the dependence structures unspecified.
![]() |
where
is a
vector of unknown regression parameters, and
are independent random vectors with a common, but completely unspecified, joint distribution that are independent of the
.
Let
and
. The weighted log-rank estimating function for
is given by
![]() |
where
, and
is a weight function. The resulting estimator is denoted by
. Note that the choices of
,
and
being the Kaplan–Meier estimator based on
as
correspond to the log-rank, Gehan–Wilcoxon, and Prentice–Wilcoxon statistics, respectively.
Let
and
. The random vector
is asymptotically zero-mean normal with covariance matrix
.
The
can be estimated by the empirical estimator of covariance matrix between
and
.
Let
,
. Denote the empirical estimator of covariance matrix of
as
; then the
can be estimated as follows.
Step 1: Generate
,
from
-dimensional multivariate normal distribution
.
Step 3: Choose
. Then estimate
by
![]() |
Step 4: Calculate
and denote
.
Step 5: Replace
with
; then iterate between Steps 3 and 4 until
converges.
The covariance matrix of
will be
.
4. Simulations and application to real data
Simulation studies were conducted to assess the performance of the proposed methods. Here we present simulation results for censored linear regression model (3.1) using the Gehan, the log-rank, and the Buckley–James least squares estimating equations. Following Jin and others (2006a), we generate failure times from the model
![]() |
where $X_1$ is Bernoulli with success probability 0.5, $X_2$ is normal with mean 0 and standard deviation 0.5, and $\varepsilon$ has either the standard normal or the standard extreme-value distribution. The censoring times were generated from the uniform Un$(0, c)$ distribution, where $c$ was chosen to yield a desired level of censoring. We estimated $\beta_1$ and $\beta_2$ with the log-rank estimation method as in Jin and others (2003) and the least squares method as in Jin and others (2006a). The 1000 Monte Carlo standard 2D normal random vectors were used and
was set to be
.
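A data-generating routine for a design of this kind could be sketched as follows. The intercept and slope defaults are placeholders chosen here, since the model equation is not reproduced in the text; the "extreme" option uses the logarithm of a unit exponential variable, which is one common convention for the standard extreme-value distribution; and the bisection used to pick $c$ is likewise an assumption about how a target censoring level might be reached.

```python
import numpy as np

def draw_log_failure(n, rng, error="normal", intercept=2.0, slopes=(1.0, 1.0)):
    """Covariates and log failure times; intercept and slopes are placeholder values."""
    x1 = rng.binomial(1, 0.5, size=n)               # Bernoulli(0.5)
    x2 = rng.normal(0.0, 0.5, size=n)               # normal, mean 0, sd 0.5
    if error == "normal":
        eps = rng.standard_normal(n)
    elif error == "extreme":
        eps = np.log(rng.exponential(1.0, size=n))  # standard (minimum) extreme value
    else:
        raise ValueError("error must be 'normal' or 'extreme'")
    return intercept + slopes[0] * x1 + slopes[1] * x2 + eps, np.column_stack([x1, x2])

def simulate_aft(n, c, rng, error="normal"):
    """Right-censored sample with censoring times uniform on (0, c)."""
    log_T, X = draw_log_failure(n, rng, error)
    log_C = np.log(rng.uniform(0.0, c, size=n))
    delta = (log_T <= log_C).astype(int)            # 1 = failure observed
    return np.minimum(log_T, log_C), delta, X

def calibrate_c(target, rng, error="normal", pilot=200_000):
    """Bisection (on the log scale) for c giving censoring probability near `target`."""
    log_T, _ = draw_log_failure(pilot, rng, error)
    T = np.exp(log_T)
    lo, hi = 1e-6, 1e8
    for _ in range(100):
        mid = np.sqrt(lo * hi)
        # given T, a Un(0, c) censoring time falls below T with probability min(T, c) / c
        if np.mean(np.minimum(T, mid)) / mid > target:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)

rng = np.random.default_rng(0)
c25 = calibrate_c(0.25, rng)
logY, delta, X = simulate_aft(20000, c25, rng)
print(c25, 1.0 - delta.mean())
```

The printed censoring fraction should be close to the requested 25%; the 0% case needs no calibration because it corresponds to dropping the censoring step altogether.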
The results for a sample size of 100 based on 1000 simulated datasets are summarized in Tables 1 and 2. In all cases, the proposed procedure accurately estimates the variability of the parameter estimator, and the confidence intervals have proper coverage probabilities.
Table 1.
Summary statistics for the simulation studies: normal error
| | | Gehan estimator | | | | Log-rank | | | | Least squares | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Parameter | Censoring (%) | Bias | SE | SEE | CP | Bias | SE | SEE | CP | Bias | SE | SEE | CP |
| $\beta_1$ | 0 | 0.002 | 0.207 | 0.209 | 0.945 | 0.006 | 0.222 | 0.225 | 0.941 | 0.003 | 0.202 | 0.213 | 0.957 |
| | 25 | | 0.230 | 0.227 | 0.947 | 0.001 | 0.246 | 0.245 | 0.940 | 0.000 | 0.226 | 0.235 | 0.949 |
| | 50 | | 0.257 | 0.260 | 0.949 | | 0.274 | 0.280 | 0.948 | | 0.254 | 0.267 | 0.953 |
| $\beta_2$ | 0 | | 0.213 | 0.211 | 0.943 | | 0.223 | 0.227 | 0.955 | | 0.208 | 0.215 | 0.954 |
| | 25 | | 0.233 | 0.231 | 0.938 | | 0.242 | 0.247 | 0.951 | 0.000 | 0.226 | 0.239 | 0.959 |
| | 50 | 0.005 | 0.263 | 0.267 | 0.955 | | 0.277 | 0.284 | 0.948 | 0.003 | 0.258 | 0.274 | 0.963 |
Bias, bias of the parameter estimator; SE, standard error of the parameter estimator; SEE, mean of the standard error estimator; CP, coverage probability of the 95% confidence interval.
Table 2.
Summary statistics for the simulation studies: extreme-value error
| | | Gehan estimator | | | | Log-rank | | | | Least squares | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Parameter | Censoring (%) | Bias | SE | SEE | CP | Bias | SE | SEE | CP | Bias | SE | SEE | CP |
| $\beta_1$ | 0 | | 0.236 | 0.238 | 0.953 | | 0.204 | 0.211 | 0.952 | | 0.262 | 0.278 | 0.957 |
| | 25 | 0.000 | 0.289 | 0.282 | 0.941 | | 0.243 | 0.247 | 0.957 | | 0.306 | 0.307 | 0.949 |
| | 50 | 0.000 | 0.368 | 0.362 | 0.951 | | 0.318 | 0.321 | 0.956 | | 0.379 | 0.370 | 0.945 |
| $\beta_2$ | 0 | 0.005 | 0.243 | 0.241 | 0.945 | | 0.216 | 0.213 | 0.938 | 0.005 | 0.266 | 0.281 | 0.956 |
| | 25 | 0.003 | 0.286 | 0.288 | 0.945 | | 0.249 | 0.250 | 0.949 | 0.002 | 0.300 | 0.314 | 0.957 |
| | 50 | 0.008 | 0.361 | 0.373 | 0.945 | 0.004 | 0.318 | 0.324 | 0.947 | 0.007 | 0.366 | 0.381 | 0.955 |
Bias, bias of the parameter estimator; SE, standard error of the parameter estimator; SEE, mean of the standard error estimator; CP, coverage probability of the 95% confidence interval.
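The four summary measures defined in the table footnotes can be computed from simulation replicates along the following lines. The function name `summarize`, the use of the sample standard deviation with `ddof=1`, and the Wald-type interval (estimate plus or minus 1.96 standard errors) are assumptions made for illustration; the small input arrays are made-up numbers, not results from the paper.

```python
import numpy as np

def summarize(estimates, std_errs, truth):
    """Bias, SE, SEE and 95% coverage from replicate estimates of one parameter."""
    estimates = np.asarray(estimates, dtype=float)
    std_errs = np.asarray(std_errs, dtype=float)
    z = 1.959963984540054                         # 97.5% standard normal quantile
    bias = estimates.mean() - truth               # Bias
    se = estimates.std(ddof=1)                    # SE: empirical sd of the estimates
    see = std_errs.mean()                         # SEE: mean of estimated standard errors
    cp = np.mean(np.abs(estimates - truth) <= z * std_errs)   # CP of the 95% interval
    return bias, se, see, cp

bias, se, see, cp = summarize([0.9, 1.1, 1.0, 1.4], [0.1, 0.1, 0.1, 0.1], truth=1.0)
print(bias, se, see, cp)
```

Agreement between SE and SEE, together with CP near 0.95, is the pattern the tables are read for.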
We applied the method to the data on multiple myeloma reported by Krall and others (1975), which is the main example in SAS PROC PHREG (SAS Institute, 1999). Two standardized covariates
(BUN) and hemoglobin at diagnosis (HGB) were considered for the censored regression model in Section 3.1. The 10 000 Monte Carlo standard 2D normal random vectors were used,
was set to be
, and 0.0001 was used as the convergence criterion between successive estimates. Convergence was reached after three or four iterations. It yielded standard errors (0.142, 0.168) for the Gehan estimate (
, 0.292), (0.173, 0.158) for the log-rank estimate (
, 0.268), and (0.122, 0.146) for the least-squares estimate (
, 0.281). The results are similar to those obtained with the resampling approach in Jin and others (2003, 2006a).
We also reanalyzed the Stanford heart transplantation data in Miller and Halpern (1982) by regressing the base-10 logarithm of the survival time on the patient's age and the T5 mismatch score for the 157 patients with complete records on the T5 mismatch score; 10 000 Monte Carlo standard 2D normal random vectors were used,
was set to be
and 0.0001 was used as the convergence criterion between successive estimates. After convergence in four iterations, it yielded standard errors (0.0090, 0.1565) for the Gehan estimate (
,
) and (0.0089, 0.1540) for the least-squares estimate (
,
). The results are similar to those obtained with the resampling approach in Jin and others (2006a).
In our numerical studies, the proposed MCM used significantly less computational time compared with the resampling method for a similar accuracy in results.
5. Discussion
Variance estimation is an important aspect in semiparametric inference. It can be a thorny issue when the corresponding estimating functions are non-smooth. The Brown–Wang approach provides a simple solution through an induced smoothing, and can be easily implemented and justified when the closed form of the induced smoothed estimating function is available.
The present paper expands the scope and applicability of the Brown–Wang approach by recognizing that smoothing can be carried out via Monte Carlo approximations. This is especially crucial when the underlying estimating equations involve the empirical version of the infinite-dimensional parameter in the semiparametric model, as demonstrated through several examples that are common in semiparametric analysis of failure time data.
The paper focuses on the parametric component of the semiparametric model. It is certainly of interest to extend the approach so that inference for the non-parametric component can be carried out properly. This may require effective handling of estimation for the non-parametric part without creating a large number of estimating equations that increases with the sample size.
Supplementary material
Supplementary Material is available at http://biostatistics.oxfordjournals.org.
Funding
Y.S.'s research is partially supported by the NYU Cancer Center Support Grant 2P30 CA16087.
Acknowledgments
We thank an associate editor and two referees for their careful reading and valuable comments. Conflict of Interest: None declared.
References
- Bloomfield P., Steiger W. L. (1983). Least Absolute Deviations: Theory, Applications, and Algorithms. Progress in Probability and Statistics 6. Boston, MA: Birkhäuser Boston.
- Brown B. M., Wang Y.-G. (2005). Standard errors and covariance matrices for smoothed rank estimators. Biometrika 92, 732–746.
- Brown B. M., Wang Y.-G. (2007). Induced smoothing for rank regression with censored survival times. Statistics in Medicine 26, 828–836.
- Buckley J., James I. (1979). Linear regression with censored data. Biometrika 66, 429–436.
- Fu L., Wang Y., Bai Z. (2010). Rank regression for analysis of clustered data: a natural induced smoothing approach. Computational Statistics and Data Analysis 54, 1036–1050.
- Hettmansperger T. P., McKean J. W. (1998). Robust Nonparametric Statistical Methods. London: Arnold.
- Jin Z., Lin D. Y., Wei L. J., Ying Z. (2003). Rank-based inference for the accelerated failure time model. Biometrika 90, 341–353.
- Jin Z., Lin D. Y., Ying Z. (2006a). On least-squares regression with censored data. Biometrika 93, 147–161.
- Jin Z., Lin D. Y., Ying Z. (2006b). Rank regression analysis of multivariate failure time data based on marginal linear models. Scandinavian Journal of Statistics 33, 1–23.
- Jin Z., Ying Z., Wei L. J. (2001). A simple resampling method by perturbing the minimand. Biometrika 88, 381–390.
- Johnson L. M., Strawderman R. L. (2009). Induced smoothing for the semiparametric accelerated failure time model: asymptotics and extensions to clustered data. Biometrika 96, 577–590.
- Kalbfleisch J. D., Prentice R. L. (2002). The Statistical Analysis of Failure Time Data, 2nd edition. Hoboken: John Wiley.
- Koenker R., Bassett G. (1978). Regression quantiles. Econometrica 46, 33–50.
- Krall J. M., Uthoff V. A., Harley J. B. (1975). A step-up procedure for selecting variables associated with survival. Biometrics 31, 49–57.
- Lai T. L., Ying Z. (1991). Large sample theory of a modified Buckley–James estimator for regression analysis with censored data. The Annals of Statistics 19, 1370–1402.
- Li H., Zhang J., Tang Y. (2012). Induced smoothing for the semiparametric accelerated hazards model. Computational Statistics and Data Analysis 56, 4312–4319.
- Lin H., Peng H. (2013). Smoothed rank correlation of the linear transformation regression model. Computational Statistics and Data Analysis 57, 615–630.
- Miller R., Halpern J. (1982). Regression with censored data. Biometrika 69, 521–531.
- Pang L., Lu W., Wang H. J. (2012). Variance estimation in censored quantile regression via induced smoothing. Computational Statistics and Data Analysis 56, 785–796.
- Powell J. L. (1984). Least absolute deviations estimation for the censored regression model. Journal of Econometrics 25, 303–325.
- Ritov Y. (1990). Estimation in a linear regression model with censored data. The Annals of Statistics 18, 303–328.
- SAS Institute. (1999). SAS/STAT User's Guide, Version 8. Cary, NC: SAS Institute Inc.
- Stein C. M. (1981). Estimation of the mean of a multivariate normal distribution. The Annals of Statistics 9, 1135–1151.
- Tsiatis A. A. (1990). Estimating regression parameters using linear rank tests for censored data. The Annals of Statistics 18, 354–372.
- Wang Y.-G., Zhao Y. (2008). Weighted rank regression for clustered data analysis. Biometrics 64, 39–45.
- Ying Z. (1993). A large sample study of rank estimation for censored regression data. The Annals of Statistics 21, 76–99.
- Ying Z., Wei L. J., Lin J. S. (1992). Prediction of survival probability based on a linear regression model. Biometrika 79, 205–209.