Author manuscript; available in PMC: 2014 Dec 1.
Published in final edited form as: Stat Comput. 2013 Jul 5;24(5):853–869. doi: 10.1007/s11222-013-9406-4

Shrinkage Estimation of Varying Covariate Effects Based On Quantile Regression

Limin Peng 1, Jinfeng Xu 2, Nancy Kutner 3
PMCID: PMC4201656  NIHMSID: NIHMS473651  PMID: 25332515

Abstract

Varying covariate effects often manifest meaningful heterogeneity in covariate-response associations. In this paper, we adopt a quantile regression model that assumes linearity at a continuous range of quantile levels as a tool to explore such data dynamics. The consideration of potential non-constancy of covariate effects necessitates a new perspective for variable selection, which, under the assumed quantile regression model, is to retain variables that have effects on all quantiles of interest as well as those that influence only part of the quantiles considered. Current work on l1-penalized quantile regression either does not concern varying covariate effects or may not produce consistent variable selection in the presence of covariates with partial effects, a practical scenario of interest. In this work, we propose a shrinkage approach by adopting a novel uniform adaptive LASSO penalty. The new approach enjoys easy implementation without requiring smoothing. Moreover, it can consistently identify the true model (uniformly across quantiles) and achieve the oracle estimation efficiency. We further extend the proposed shrinkage method to the case where responses are subject to random right censoring. Numerical studies confirm the theoretical results and support the utility of our proposals.

Keywords: Adaptive-LASSO, Censoring, Quantile regression, Shrinkage estimation, Variable selection, Varying covariate effects

1 Introduction

In regression analysis, there has been growing awareness that the association under study may be prone to considerable heterogeneity. Inhomogeneous associations, embodied by varying covariate effects, are often scientifically plausible and carry important practical implications. There have been numerous reports in the literature, for example, Kaslow et al (1987), Dickson et al (1989), Thorogood et al (1990), Verweij and Van Houwelingen (1995), Carey et al (1995), and Jensen et al (1997). As a more recent example, an analysis presented in Peng and Huang (2008) suggested that the severity of restless legs syndrome (RLS) symptoms may have prognostic power only in dialysis patients at high risk of mortality but not in long-term survivors. Such a varying association pattern offers useful insight into dialysis prognosis but is precluded by traditional linear regression with its constraint of constant coefficients.

Quantile regression (Koenker and Bassett, 1978), by its modeling strategy, offers an excellent platform to accommodate and identify inhomogeneous associations. For example, a linear quantile regression model may take the form

Q_Y(τ ∣ Z) = Z^T β_0(τ),   τ ∈ (τ_L, τ_U),   (1)

where Y is a response variable, Z ≡ (1, Z̃^T)^T is a p × 1 vector of covariates, β_0(τ) ≡ (α_0(τ), β̰_0(τ)^T)^T is a p × 1 vector of unknown coefficients, Q_Y(τ ∣ Z) ≡ inf{y : pr(Y ≤ y ∣ Z) ≥ τ} denotes the τ-th conditional quantile of Y given Z, and (τ_L, τ_U) specifies the quantile level range of interest with 0 < τ_L < τ_U < 1. Here α_0(τ) and β̰_0(τ) represent the intercept and the covariate effects on the τ-th quantile of Y respectively, both of which are allowed to change with τ. As commented by Portnoy (2003), nonconstant coefficients in β̰_0(τ) across τ reflect sources of heterogeneity that affect Y but are not captured by the covariates in Z̃. Throughout this paper, we shall refer to model (1) as a “global” linear quantile regression model, reflecting the fact that the linear quantile assumption is imposed over a τ-continuum, in contrast to a “local” model that asserts quantile linearity only at a single or a finite number of specified τ’s. It is worth emphasizing that model (1) is a strict extension of the traditional linear model, Y = Z^T β + ε, with or without distributional assumptions on the error term ε.

In practice, a large number of covariates are often collected at the initial stage of a study. An appropriate variable selection procedure is thus crucial for building a model that excels in both interpretation and prediction accuracy. The consideration of varying covariate effects poses some conceptual subtlety for variable selection. Take model (1) for instance. Some covariates may exhibit a strong effect throughout the whole τ-range, namely a full effect, while others may impact some but not all quantiles of interest and thus demonstrate a partial effect. In the dialysis example, RLS shows strong effects on the lower quantiles of survival time but its effects seem to vanish for the upper quantiles. It is scientifically important to retain this type of covariate in the final model because they can contribute to a better understanding of the underlying response-covariate association and facilitate more refined predictions. This need calls for a new perspective on variable selection, namely to identify covariates with full effects as well as those with partial effects. Numerical studies presented in Section 3 indicate that existing variable selection methods that assume constant effects are incapable of fulfilling this need.

In this work, we adopt the varying coefficient quantile regression model (1) as the vehicle for tackling the variable selection problem in the presence of potentially inhomogeneous associations. Penalization methods, as a special case of general shrinkage methods, have been actively studied for “local” quantile regression models. For example, Koenker (2005) gave a summary of the use of the LASSO penalty (Tibshirani, 1996) and other bridge penalties (Frank and Friedman, 1993) in the context of quantile regression. Li and Zhu (2005) developed the solution path of l1-penalized quantile regression. Zou and Yuan (2008) proposed a regression method, called composite quantile regression, which was shown to have some advantage in estimation efficiency over least squares methods. Wu and Liu (2009) presented a detailed study of the adaptation of the SCAD (Fan and Li, 2001) and adaptive-LASSO (Zou, 2006) penalties to quantile regression. Rocha et al (2009) extended the work of Knight and Fu (2000) to more general l1-penalized M-estimation settings and provided elegant theoretical results that can be applied to penalized quantile regression with fixed τ. However, all these approaches only concern covariate effects at a single or at multiple specified quantile levels. They therefore have a high chance of missing important variables when the quantile level targeted by the “local” quantile regression model is not in the τ-range where some covariates have influence. For example, a covariate that only influences the response from its 20th to 40th percentiles can be missed by variable selection based on median regression. An intuitive remedy for this problem may be to fit penalized “local” quantile regression at all possible τ’s of interest and remove variables whose coefficients are shrunk to zero at all τ’s considered. However, we shall show that this approach, which we call the pointwise penalization approach hereafter, does not produce consistent variable selection.

Shrinkage estimation has been studied for other varying-coefficient models. For example, Wang and Xia (2009) considered a linear model with coefficients formulated as functions of a univariate index variable. More recently, Kai et al (2011) extended the idea of composite quantile regression to estimate a similar type of varying coefficient model that contains both nonparametric and parametric coefficients. Both of these models permit extra data dynamics relative to a standard linear model by allowing coefficients to vary smoothly with the index variable. Of note, they require estimation based on kernel smoothing. As a result, the corresponding penalized estimators involve two levels of regularization: one for the kernel bandwidth and one for the penalization parameter. This may incur complexity in computation and theory.

The quantile regression model (1) considered in this work offers an alternative for accommodating varying associations. It offers a straightforward interpretation of the coefficient functions in β0(τ), and has gained increased popularity recently, for example, in survival analysis (Portnoy, 2003; Neocleous et al, 2006; Peng and Huang, 2008; Portnoy and Lin, 2010; Huang, 2010). The fact that the estimation of model (1) does not require smoothing is expected to alleviate the computational complexity of the shrinkage estimation of β0(·). The technical challenges now largely come from the need to deal with nonparametric coefficient functions and the non-differentiability of the loss functions tied to the estimation of β0(τ).

Belloni and Chernozhukov (2011) studied an l1-penalized estimator for the “global” quantile regression model in the high-dimensional sparse case, allowing for τ-varying coefficients. They established an elegant near-oracle convergence rate (uniformly in τ) and characterized conditions under which the selected model contains the true model as a submodel with finite-sample data. One critical condition required for their model selection properties is that the absolute value of any non-zero coefficient function must be bounded below by a positive separation bound uniformly in τ. While the separation bound can go to zero as n increases, it is not clear how their finite-sample result on model selection can be extended to imply consistent model selection in the large-sample sense without precluding the presence of covariates with partial effects (i.e. coefficients that are zero for some but not all τ’s), an important practical scenario, as in the dialysis example discussed earlier.

In this paper, we develop a shrinkage estimation procedure based on the quantile regression model (1), which provides a useful variable selection tool that can flexibly accommodate inhomogeneous associations. Specifically, we follow the idea of adaptive LASSO (Zou, 2006) and propose a new penalization strategy to capture covariate effects’ “global” departure from null. We show that such a strategy can lead to a shrinkage estimator of β_0(·) that enjoys the following oracle properties (Fan and Li, 2001): (i) the probability of correctly identifying all zero and nonzero coefficient functions approaches 1, which means consistent variable selection; (ii) the proposed estimator converges weakly to a Gaussian process in the same manner as if the set of irrelevant covariates were known, which entails efficient estimation. Note that this is a stronger result than the oracle properties attained under a “local” quantile regression model, as it asserts the uniform behavior of the coefficient functions. We do not require the full separation of non-zero coefficient functions from zero across τ, and thus the new method is capable of detecting covariates that have effects on a partial range of quantile indices. Furthermore, we present an adaptation of the new shrinkage approach for randomly right censored data. This extension greatly enhances the applicability of this work in biomedical study settings, where censoring often arises. Findings from our numerical studies provide evidence for the utility of the proposed methods as well as for the importance of accounting for the presence of varying covariate effects in variable selection. In this paper, we only consider the situation where p is fixed. The investigation of the case with a diverging number of covariates will be reported in separate work.

2 The Proposed Methods

2.1 Uniform adaptive-LASSO estimation

Suppose the observed data include n independent and identically distributed replicates of (Y, Z), denoted by {(Y_i, Z_i)}_{i=1}^n. We use u^{(j)} to denote the jth component of the vector u, and adopt the shorthand Δ for the quantile level range of interest, (τ_L, τ_U). Without loss of generality, we assume that sup_{τ∈Δ} |β_0^{(j)}(τ)| ≠ 0 for 2 ≤ j ≤ s and sup_{τ∈Δ} |β_0^{(j)}(τ)| = 0 for s + 1 ≤ j ≤ p. Based on model (1), this means the first s − 1 covariates in Z̃ have some effects on Y while the rest do not.

It is worth noting that here we adopt a global sparsity assumption, |{2 ≤ j ≤ p : sup_{τ∈Δ} |β_0^{(j)}(τ)| > 0}| = s − 1 < ∞. This distinguishes this work from penalized local quantile regression, which imposes sparsity only on β_0(τ) at a given τ (i.e. local sparsity). It is clear that global sparsity implies local sparsity but not vice versa.

We begin with an unpenalized estimator of β0(τ) (Koenker and Bassett, 1978), defined as β̃n (τ) = arg minβWn(β; τ), where

W_n(β; τ) ≡ Σ_{i=1}^n ρ(Y_i − Z_i^T β; τ),

and ρ(u; τ) = u{τ − I(u < 0)} is the so-called check function. Despite the lack of a closed form, β̃_n(τ) can be efficiently computed, even in the presence of a large number of covariates, by various algorithms, for example, Koenker and d’Orey (1987), Lustig et al (1994), and Madsen and Nielsen (1993).
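To make the computation concrete: minimizing W_n(β; τ) can be cast as a linear program by splitting each residual into its positive and negative parts. The following Python sketch (function names are ours, not from the paper) illustrates this with scipy:

```python
import numpy as np
from scipy.optimize import linprog

def check_loss(u, tau):
    """The check function rho(u; tau) = u * (tau - I(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def rq_fit(X, y, tau):
    """Unpenalized quantile regression at level tau, solved as a linear
    program: write each residual as u_plus - u_minus (both nonnegative),
    so the check loss becomes tau * sum(u_plus) + (1 - tau) * sum(u_minus),
    subject to X @ beta + u_plus - u_minus = y, with beta unconstrained."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]
```

Production code would instead rely on the specialized interior-point or simplex algorithms cited above, which exploit the problem's structure.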

To proceed with the shrinkage estimation of β_0(τ), we adopt the adaptive LASSO idea from Zou (2006). In a general form, we present the penalized estimator as

β̂_{n,λ_n}(τ) = argmin_β W_{n,λ_n}(β; τ),

where

W_{n,λ}(β; τ) = W_n(β; τ) + λ Σ_{j=2}^p w_{n,j}(τ)·|β^{(j)}|,

and w_{n,j}(τ) stands for the weight assigned to |β^{(j)}|. Note that when w_{n,j}(τ) takes the form |β̃_n^{(j)}(τ)|^{−γ} with γ > 0, the above estimator becomes the adaptive-LASSO estimator studied by Wu and Liu (2009) for a “local” linear quantile regression model assumed at a fixed τ. The naive pointwise penalization approach discussed in Section 1 computes this type of estimator with γ = 1 separately for each τ ∈ Δ to form a function estimator of β_0(·). As shown by our simulation studies, this approach may yield problematic variable selection results.

The proposed shrinkage estimator is β̂_{n,λ_n}^{US}(τ), obtained by setting w_{n,j}(τ) = w_{n,j}^{US}, where

w_{n,j}^{US} = 1 / (sup_{τ∈Δ} |β̃_n^{(j)}(τ)|).

The key idea lies in the new penalization strategy, by which we assign a high penalty to covariates whose effects exhibit a small departure from zero for all τ, but not to those that have small effects at some τ’s and large effects at others. Using the uniform adaptive LASSO penalty w_{n,j}^{US} in W_{n,λ}(β; τ) is expected to effectively differentiate coefficient functions that are zero for all τ from those that are nonzero for some τ ∈ Δ. As elucidated in Section 2.2, uniform shrinkage behavior across τ can be achieved by the proposed uniform adaptive LASSO estimator, β̂_{n,λ_n}^{US}(τ).
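The weighting scheme can be sketched in a few lines, assuming the unpenalized estimates have already been evaluated on a τ-grid (the helper name is ours). Note how a covariate with a large effect at even one τ receives a small penalty, while a coefficient that is uniformly tiny receives a large one:

```python
import numpy as np

def uniform_alasso_weights(beta_grid):
    """Uniform adaptive-LASSO weights w_j^US = 1 / sup_tau |beta_tilde_j(tau)|,
    computed from unpenalized fits on a tau-grid.  beta_grid is an (R, p)
    array whose row r is the estimate at tau_r; column 0 (the intercept) is
    never penalized and is skipped."""
    sup_abs = np.max(np.abs(beta_grid[:, 1:]), axis=0)
    return 1.0 / sup_abs
```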

To compute the proposed penalized estimator, β̂_{n,λ_n}^{US}(τ), one can adopt the existing routines for quantile regression, which have been well framed under linear programming (Koenker, 2005; Li and Zhu, 2005; Wu and Liu, 2009). More specifically, one may use the rq() function in the R package quantreg (Koenker, 2011) to obtain β̃_n(·), the unpenalized estimator of β_0(τ) for τ ∈ (0, 1). Given that β̃_n(·) is piecewise constant, applying the rq() function yields a finite number of function values of β̃_n(τ) along with the corresponding τ-jump points, say 0 ≡ τ_0 < τ_1 < … < τ_R < τ_{R+1} ≡ 1. Based on these results, one can readily evaluate sup_{τ∈Δ} |β̃_n^{(j)}(τ)| and thus w_{n,j}^{US}. Next, for any given τ ∈ (0, 1), one may compute β̂_{n,λ_n}^{US}(τ) by using the rq.fit.lasso() function in the R package quantreg with the penalty parameter specified as λ_n·w_{n,j}^{US}.
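The penalized step can also be sketched outside R. The function below (rq_lasso is our hypothetical stand-in for quantreg's rq.fit.lasso(), not its actual implementation) solves the weighted-l1-penalized check-loss problem as a linear program:

```python
import numpy as np
from scipy.optimize import linprog

def rq_lasso(X, y, tau, lam, w):
    """Weighted-l1-penalized quantile regression: minimize the check loss
    plus lam * sum_j w[j] * |beta_j|.  Split beta into positive and negative
    parts so the penalty becomes linear; set w[0] = 0 to leave the intercept
    unpenalized, mirroring the penalty lam_n * w_j^US described in the text."""
    n, p = X.shape
    pen = lam * np.asarray(w, dtype=float)
    c = np.concatenate([pen, pen, tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    bounds = [(0, None)] * (2 * p + 2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p] - res.x[p:2 * p]
```

Because the penalty is linear in the split coefficients, LP solutions typically set irrelevant coefficients exactly to zero rather than merely shrinking them.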

It is worth pointing out that, in theory, β̂_{n,λ_n}^{US}(·) obtained from a finite-size dataset is also piecewise constant. However, it is not straightforward to locate the τ-points where β̂_{n,λ_n}^{US}(·) jumps. To circumvent this complication, one can approximate β̂_{n,λ_n}^{US}(·) by a càdlàg step function that jumps at a sufficiently fine pre-specified grid in Δ. For example, in our numerical studies, we chose the grid points as (τ_j + τ_{j+1})/2, j = 0, …, R. Our theoretical studies suggest that adopting a grid of size O(n^{−1/2}) leads to only asymptotically negligible deviations from the exact β̂_{n,λ_n}^{US}(·).

2.2 Asymptotic results

To study the asymptotic properties of β^n,λnUS(·), we require the following regularity conditions:

  • C1

(i) Z is bounded; that is, sup_i ∥Z_i∥ < ∞; (ii) E(Z Z^T) > 0; (iii) β_0(τ) is Lipschitz in τ for τ ∈ Δ.

  • C2

(i) inf_{τ∈Δ, z} f_τ(0∣z) > 0; (ii) for some A_0 > 0 and δ_1 > 0, sup_{τ∈Δ, z} |f_τ(x∣z) − f_τ(0∣z)| ≤ A_0|x| for |x| < δ_1. Here f_τ(x∣z) denotes the density of Y − Z^T β_0(τ) given Z = z.

Both C1 and C2 impose rather mild data assumptions. Here, we assume bounded Z to simplify the asymptotic arguments. This condition can be relaxed to E(∥Z∥²) < ∞, in line with the theoretical studies of the accelerated failure time (AFT) model (Ying, 1993). The positive definiteness assumption stated in C1(ii) is critical for the identifiability of β_0(τ), and is common in regression analysis. Condition C2 requires the smoothness and positivity of f_τ(x∣z) in a neighborhood of x = 0, which is often met in practice. Here and hereafter, we let ℓ_{2n} be shorthand for log log n, and ∥·∥ denote the Euclidean norm.

Theorem 1

Assume that (n^{1/2} ℓ_{2n})^{−1} λ_n = O(1). Under regularity conditions C1–C2, we have sup_{τ∈Δ} ∥β̂_{n,λ_n}^{US}(τ) − β_0(τ)∥ = O_p(n^{−1/2} ℓ_{2n}).

The uniform upper bound for ∥β̂_{n,λ_n}^{US}(τ) − β_0(τ)∥ obtained in Theorem 1 implies the uniform consistency of β̂_{n,λ_n}^{US}(τ). This result also helps establish the oracle property of β̂_{n,λ_n}^{US}(τ) stated in the following theorem.

Theorem 2

Assume that lim_{n→∞} (ℓ_{2n})^{−2} λ_n = ∞ and lim_{n→∞} n^{−1/2} λ_n = 0. Under regularity conditions C1–C2, we have

  1. (selection consistency) lim_{n→∞} pr({2 ≤ j ≤ p : sup_{τ∈Δ} |β̂_{n,λ_n}^{US(j)}(τ)| = 0} = {s + 1, …, p}) = 1;

  2. (weak convergence) n^{1/2}{β̂_{n,λ_n}^{US(1:s)}(τ) − β_0^{(1:s)}(τ)} converges weakly to a mean zero Gaussian process with a covariance matrix Σ_oracle(τ, τ′). Here, the superscript (1:s) denotes the first s components of a vector, and Σ_oracle(τ, τ′) is the same as the limit covariance matrix of n^{1/2}{β̂_oracle(τ) − β_0^{(1:s)}(τ)}, where β̂_oracle(τ) is the unpenalized estimator of β_0^{(1:s)}(τ) obtained as the minimizer of W_n(β; τ) with β_0^{(j)}(τ) (s + 1 ≤ j ≤ p) known to be zero functions.

The oracle property established in Theorem 2 ensures the uniform shrinkage behavior of β̂_{n,λ_n}^{US}(τ) over τ ∈ Δ, which is needed in a context that allows for varying covariate effects. To prove both theorems, we utilize empirical process theory to examine the asymptotic behavior of β̂_{n,λ_n}^{US} as a process in τ. Delicate probabilistic arguments are also used to handle the non-differentiability of W_{n,λ_n}(β; τ). Proofs are sketched in Appendices A and B, with a more detailed version presented in Supplementary Materials (available upon request).

Theorem 2 indicates the asymptotic order of λ_n that yields consistent selection and efficient estimation. For a real dataset with finite sample size, we propose a BIC-like criterion to select a good penalization parameter λ_n. Specifically, we define the BIC criterion as the sum of two terms:

BIC(λ) = ∫_Δ log σ̂_λ(τ) dτ + (log n / n)·|S_λ|,

where S_λ = {2 ≤ j ≤ p : sup_{τ∈Δ} |β̂_{n,λ}^{US(j)}(τ)| ≠ 0}, |S_λ| denotes the size of S_λ, and

σ̂_λ(τ) = n^{−1} Σ_{i=1}^n ρ(Y_i − Z_i^T β̂_{n,λ}^{US}(τ); τ).

Note that the first term in BIC(λ) quantifies the goodness of fit and the second term measures the model complexity. The use of log σ̂_λ in BIC was motivated by Koenker (2005)’s model selection criteria for quantile regression (see page 135). A similar form of BIC was also used by other authors, for example, Wang and Leng (2007). We select the penalization parameter as λ̂_n = argmin_λ BIC(λ). Simulations reported in Section 3.1 demonstrate satisfactory performance of λ̂_n. We have also tested other methods, such as AIC and generalized cross validation (GCV), for selecting λ_n. The simulation results presented in Supplementary Materials (available upon request) show that, compared to the BIC-based procedure, the AIC- and GCV-based procedures have similar performance in terms of model error but appear to select the true model at substantially lower rates. It would be important to further investigate the theoretical justification for selecting the penalty parameter based on BIC.
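The criterion for one candidate λ can be sketched as follows, assuming the penalized fits have already been computed on a τ-grid and approximating the integral over Δ by the trapezoid rule (function and argument names are ours):

```python
import numpy as np

def bic_criterion(beta_fits, X, y, tau_grid, tol=1e-8):
    """BIC-like criterion for one candidate lambda: the integral over Delta
    of log sigma_hat_lambda(tau) (trapezoid rule on tau_grid) plus
    (log n / n) * |S_lambda|.  beta_fits is an (R, p) array whose row r is
    the penalized estimate at tau_grid[r]; column 0 is the intercept."""
    n = len(y)
    log_sig = np.empty(len(tau_grid))
    for r, tau in enumerate(tau_grid):
        u = y - X @ beta_fits[r]
        log_sig[r] = np.log(np.mean(u * (tau - (u < 0))))  # mean check loss
    # |S_lambda|: covariates whose coefficient is nonzero at some tau
    if beta_fits.shape[1] > 1:
        s_size = int(np.sum(np.max(np.abs(beta_fits[:, 1:]), axis=0) > tol))
    else:
        s_size = 0
    integral = np.sum((log_sig[1:] + log_sig[:-1]) / 2 * np.diff(tau_grid))
    return integral + np.log(n) / n * s_size
```

One would evaluate this over a grid of λ values and keep the minimizer, as described in the text.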

2.3 Inference

The oracle property stated in Theorem 2 facilitates the statistical inference for β0(τ) based on β^n,λnUS(τ). In this subsection, we outline the procedures for variance estimation and construction of confidence intervals.

Let C_n = {1} ∪ S_{λ̂_n} and q_n = |C_n|. For a vector u and an index set Φ, we let u^{(Φ)} denote the subvector of u with Φ indicating the components of u included in u^{(Φ)}. In the rest of this subsection, we use β̂(τ) as shorthand for β̂_{n,λ_n}^{US}(τ). Define r_n(τ) = (w_{n,1}(τ) sgn{β̂^{(1)}(τ)}, …, w_{n,p}(τ) sgn{β̂^{(p)}(τ)})^T, ι̂_j(τ) = Z_j^{(C_n)} [τ − I{Y_j − Z_j^T β̂(τ) < 0}] − n^{−1} λ_n r_n^{(C_n)}(τ), and Ω_n(τ, τ) = n^{−1} Σ_{j=1}^n ι̂_j(τ)^{⊗2}.

We propose the following steps to obtain variance estimates for the regression coefficients that are not shrunk to zero for all τ ∈ Δ:

  1. Find a symmetric and nonsingular q_n × q_n matrix E_n(τ) ≡ {e_{n,1}(τ), …, e_{n,q_n}(τ)} such that Ω_n(τ, τ) = {E_n(τ)}².

  2. Calculate the q_n × q_n matrix
     D_n(τ) = (b̂_{e,1}(τ) − β̂^{(C_n)}(τ), …, b̂_{e,q_n}(τ) − β̂^{(C_n)}(τ)),
     where
     b̂_{e,k}(τ) = argmin_{b∈R^{q_n}} {Σ_{i=1}^n ρ(Y_i − Z_i^{(C_n)T} b; τ) + λ_n Σ_{j=2}^{q_n} w_{n,C_n(j)}·|b^{(j)}| − n^{1/2} b^T e_{n,k}(τ)},
     with C_n(j) standing for the jth element of C_n.

  3. Estimate the variance matrix of n^{1/2}{β̂^{(C_n)}(τ) − β_0^{(C_n)}(τ)} by n{D_n(τ)}^{⊗2}, where u^{⊗2} = uu^T.
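Step 1 amounts to taking a symmetric matrix square root of Ω_n(τ, τ), which can be obtained from an eigendecomposition; a minimal sketch (the remaining steps then re-solve the penalized problem perturbed by each column of this root):

```python
import numpy as np

def sym_sqrt(omega):
    """Symmetric square root E with E @ E = omega, computed via the
    eigendecomposition of the symmetric positive definite matrix omega
    (step 1 of the variance estimation recipe)."""
    vals, vecs = np.linalg.eigh(omega)
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T
```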

The technique used for variance estimation is an adaptation of the methods of Huang (2002) and Peng and Fine (2009). Based on the proof of Theorem 2 (ii), specifically equation (17) in Appendix B, and the uniform consistency of β̂(τ) from Theorem 1, we can readily follow the lines of Peng and Fine (2009) to show that n{D_n(τ)}^{⊗2} is a consistent variance estimator for β̂^{(1:s)}(τ) when C_n = {1, …, s}. This, coupled with Theorem 2 (i), implies that for any ε > 0,

lim_{n→∞} pr(C_n = {1, …, s} and ∥n{D_n(τ)}^{⊗2} − Σ_oracle(τ, τ)∥ ≤ ε)
  ≥ lim_{n→∞} {1 − pr(C_n ≠ {1, …, s}) − pr(C_n = {1, …, s} and ∥n{D_n(τ)}^{⊗2} − Σ_oracle(τ, τ)∥ > ε)} = 1.

The above arguments justify the proposed variance estimation. Likewise, we propose to construct a 100(1 − α)% confidence interval for β_0^{(j)}(τ) as (β̂^{(j)}(τ) − z_{α/2} σ̂_j(τ), β̂^{(j)}(τ) + z_{α/2} σ̂_j(τ)) if j ∈ C_n and as {0} otherwise, where σ̂_j² is the jth diagonal element of {D_n(τ)}^{⊗2} and z_α is the 100(1 − α)th percentile of N(0, 1). Denote the resulting confidence interval by I_n. For 1 ≤ j ≤ s, given that pr(j ∉ C_n) → 0 and n^{1/2}{β̂^{(j)}(τ) − β_0^{(j)}(τ)} is asymptotically normal with n σ̂_j(τ)² being a consistent variance estimate, we have

lim_{n→∞} pr(β_0^{(j)}(τ) ∈ I_n) = lim_{n→∞} pr(|β_0^{(j)}(τ) − β̂^{(j)}(τ)| < z_{α/2} σ̂_j(τ) and j ∈ C_n) = lim_{n→∞} pr(|β_0^{(j)}(τ) − β̂^{(j)}(τ)| < z_{α/2} σ̂_j(τ)) = 1 − α.

This implies that the asymptotic confidence level of In is (1 − α).

2.4 An extension to randomly censored data

Censoring often occurs in biomedical study settings. In this subsection, we propose an extension of the shrinkage method of Section 2.1 to accommodate scenarios with random censoring. Specifically, let C denote the censoring variable for the response Y. We assume C is independent of (Y, Z). Note that this independence assumption is stronger than the standard random censoring assumption (i.e. C is independent of Y given Z). Define X = Y ∧ C and δ = I(Y ≤ C), where ∧ is the minimum operator. The observed data include n independent and identically distributed replicates of (X, δ, Z), denoted by {(X_i, δ_i, Z_i)}_{i=1}^n.

For unpenalized estimation of model (1) in the presence of random censoring by C, we consider an adaptation of Ma and Yin (2010)’s estimator, which can be obtained as

β̃_{C,n}(τ) = argmin_β V_n(β; τ),

where

V_n(β; τ) = Σ_{i=1}^n [δ_i / Ĝ(X_i)] ρ(X_i − Z_i^T β; τ).

Here Ĝ(·) is the Kaplan-Meier estimator of G(x) = pr(C ≥ x). One can use available software, for example, the R package quantreg, to compute β̃_{C,n}(τ) as weighted regression quantiles.
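The inverse-probability-of-censoring weights δ_i/Ĝ(X_i) can be sketched as below, with Ĝ the Kaplan-Meier estimator applied to the censoring distribution (i.e., treating δ = 0 as the event; helper names are ours, and no ties among the X_i are assumed):

```python
import numpy as np

def ipcw_weights(x, delta):
    """Inverse-probability-of-censoring weights delta_i / G_hat(X_i), with
    G_hat the Kaplan-Meier estimator of the censoring survivor function
    (delta == 0 counts as the event).  Assumes no ties among the x values."""
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta)
    order = np.argsort(x, kind="stable")
    n = len(x)
    at_risk = n - np.arange(n)                     # risk-set sizes in sorted order
    factors = 1.0 - (delta[order] == 0) / at_risk  # KM factors at censoring times
    # G_hat just before each ordered time: product over strictly earlier times
    G_sorted = np.concatenate([[1.0], np.cumprod(factors)[:-1]])
    G = np.empty(n)
    G[order] = G_sorted
    return delta / G
```

Censored observations receive weight zero, and the mass they would have carried is redistributed to uncensored observations at risk later, so the weights still sum to roughly n.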

Using the same reasoning as in Section 2.1, we propose the shrinkage estimator of β_0(τ) as β̂_{C,n,λ_{C,n}}(τ) = argmin_β V_{n,λ_{C,n}}(β; τ), where

V_{n,λ}(β; τ) = V_n(β; τ) + λ Σ_{j=2}^p w_{C,n,j}(τ)·|β^{(j)}|

with w_{C,n,j}(τ) = (sup_{τ∈Δ} |β̃_{C,n}^{(j)}(τ)|)^{−1}.

Given the analogy between V_n(β; τ) and a standard weighted quantile regression problem, the implementation of β̂_{C,n,λ_{C,n}}(τ) is expected to closely parallel our proposal for the uncensored shrinkage estimator β̂_{n,λ_n}^{US}(τ); the same holds for the asymptotic analysis. With some additional regularity condition on censoring, for example,

  • C3

    There exists ν > 0 such that pr(C = ν) > 0 and pr(C > ν) = 0;

we would have the uniform consistency of Ĝ(t) for t < ν, and thus can obtain the oracle property of β̂_{C,n,λ_{C,n}}(τ) as in the uncensored case. Note that condition C3 is a rather mild assumption and is often satisfied in practice in the presence of administrative censoring. As pointed out by Peng and Fine (2009), artificial truncation may be imposed to ensure this condition is always met. The detailed results are stated in the theorem below with the proof omitted.

Theorem 3

Assume that lim_{n→∞} (ℓ_{2n})^{−2} λ_{C,n} = ∞ and lim_{n→∞} n^{−1/2} λ_{C,n} = 0. Under regularity conditions C1–C2 and mild assumptions on censoring such as those in Ma and Yin (2010), we have

  1. (selection consistency) lim_{n→∞} pr({2 ≤ j ≤ p : sup_{τ∈Δ} |β̂_{C,n,λ_{C,n}}^{(j)}(τ)| = 0} = {s + 1, …, p}) = 1;

  2. (weak convergence) n^{1/2}{β̂_{C,n,λ_{C,n}}^{(1:s)}(τ) − β_0^{(1:s)}(τ)} converges weakly to a mean zero Gaussian process with a covariance matrix Σ_{C,oracle}(τ, τ′). Here, Σ_{C,oracle}(τ, τ′) is the same as the limit covariance matrix of n^{1/2}{β̂_{C,oracle}(τ) − β_0^{(1:s)}(τ)}, where β̂_{C,oracle}(τ) is the unpenalized estimator of β_0^{(1:s)}(τ) obtained as the minimizer of V_n(β; τ) with β_0^{(j)}(τ) (s + 1 ≤ j ≤ p) known to be zero functions.

Similarly, we propose to select λ_{C,n} based on a BIC-like criterion:

BIC(λ) = ∫_Δ log σ̂_{C,λ}(τ) dτ + (log n / n)·|S_λ|,

where

σ̂_{C,λ}(τ) = [Σ_{i=1}^n δ_i ρ(X_i − Z_i^T β̂_{C,n,λ}(τ); τ) / Ĝ(X_i)] / [Σ_{i=1}^n δ_i / Ĝ(X_i)].

Note that σ̂_{C,λ}, like σ̂_λ, serves as an approximation to n^{−1} Σ_{i=1}^n ρ(Y_i − Z_i^T β_0(τ); τ) with the censoring of the Y_i properly handled. For a given dataset, we choose λ_{C,n} as λ̂_{C,n} = argmin_λ BIC(λ). Inference, such as variance estimation and confidence intervals, may mimic the procedures proposed for the uncensored case.

3 Numerical Studies

3.1 Monte-Carlo simulations

Extensive simulations were conducted to evaluate the finite-sample performance of the proposed shrinkage estimators in both uncensored and censored cases.

We first considered situations where there is no censoring of the response Y. We generated Y based on model (1) with eight covariates, Z^{(1)}, …, Z^{(8)}. Only Z^{(1)}, Z^{(2)} and Z^{(5)} have non-zero effects, and the coefficients for the other covariates are zero functions of τ. We considered three set-ups, denoted by (I), (II), and (III). For each of these set-ups, we plot the non-zero coefficients for Z^{(1)}, Z^{(2)}, and Z^{(5)} in Figure 1 in bold, dashed and dotted lines respectively. The intercept coefficient function in set-up (I) is 2 times the quantile function of the standard normal distribution, and that in set-up (II) or (III) is set as a zero function of τ. It is easy to see from the specification of β_0(τ) that a standard linear model with normal random errors holds in set-up (I) but not in set-ups (II) and (III). In set-up (I), Z^{(1)}, …, Z^{(8)} were generated as standard normal variates with the correlation between Z^{(i)} and Z^{(j)} equal to 0.5^{|i−j|} (1 ≤ i, j ≤ 8), and were truncated to lie between −3 and 3. In set-ups (II) and (III), Z^{(i)} = Φ(ξ_i) (i = 1, …, 8), where ξ_1, …, ξ_8 are standard normal variates with the correlation between ξ_i and ξ_j equal to 0.5^{|i−j|} (1 ≤ i, j ≤ 8). Additional simulations were also conducted for cases with a larger number of covariates (i.e. p = 40) and are reported in Supplementary Materials (available upon request).
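The covariate mechanism of set-ups (II) and (III) can be sketched as follows (an illustration of the stated design, not the authors' simulation code; the function name and seed argument are ours):

```python
import numpy as np
from scipy.stats import norm

def simulate_covariates(n, p=8, rho=0.5, seed=0):
    """Covariates as in set-ups (II) and (III): Z_j = Phi(xi_j), where
    (xi_1, ..., xi_p) is multivariate standard normal with an AR(1)-type
    correlation corr(xi_i, xi_j) = rho ** |i - j|, and Phi is the standard
    normal CDF, so each Z_j is marginally Uniform(0, 1)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    xi = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    return norm.cdf(xi)
```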

Fig. 1. True coefficient functions in simulation set-ups.

We assessed the proposed shrinkage estimation, denoted by US, and compared it with several other methods: (i) the naive pointwise penalization method described in Sections 1 and 2.1, denoted by PS; (ii) the adaptive-LASSO estimator for “local” quantile regression at a single predetermined quantile level τ, denoted by SS(τ) with τ = 0.25, 0.50 or 0.75; (iii) the adaptive-LASSO estimator for the least-squares estimation of the linear regression of Y on Z, denoted by LS. The LS and SS(τ) approaches implicitly assume that the importance of a covariate can be adequately captured by its effect on the conditional mean or on a conditional quantile at a known τ. We set τ_L = 0.1 and τ_U = 0.9. For each approach under comparison, the tuning parameter λ_n is selected as the minimizer of the corresponding BIC criterion among the 25(n/100) equally spaced grid points between 0 and 12(n/100).

In Table 1, we present results based on 400 simulation replications for the proposed method and all the other methods described above. Specifically, we report the mean number of correctly identified zero effects (NC), the mean number of incorrectly identified zero effects (NIC), the percentage of under-fitted models (PUF), the percentage of correctly fitted models (PCF), and the percentage of over-fitted models (POF). As a further clarification, NC counts the number of unselected covariates that have zero coefficient functions (i.e. zero for all τ’s). In set-ups (I)–(III), we hope NC is as close to 5 as possible. We also evaluated the estimation accuracy of the different estimators by examining their relative estimation errors (REE) with respect to the unpenalized estimator β̃(τ) or the oracle estimator β̂_oracle(τ). The REE of an estimator β̌(·) is defined as

REE(β̌) = 100 × [∫_Δ Σ_{i=1}^n Σ_{j=1}^p |β̌^{(j)}(τ) − β_0^{(j)}(τ)| dτ] / [∫_Δ Σ_{i=1}^n Σ_{j=1}^p |β̄^{(j)}(τ) − β_0^{(j)}(τ)| dτ],

where β̄(τ) is either β̃(τ) or β̂_oracle(τ). For the SS(τ) and LS approaches, the estimate of β_0(τ) was taken as constant over τ, equal to the fitted coefficient vector. The median REE with respect to the unpenalized estimator (MREEF) and the median REE with respect to the oracle estimator (MREEO) are presented in Table 1.
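On a τ-grid, the REE can be sketched as below (the i-summation in the definition is common to the numerator and denominator and so is omitted here; the function name is ours):

```python
import numpy as np

def ree(beta_hat, beta_ref, beta0, tau_grid):
    """Relative estimation error (in percent) of beta_hat against a
    reference estimator beta_ref: the integrated componentwise l1 error
    |beta_hat_j(tau) - beta0_j(tau)| summed over j, divided by the same
    quantity for beta_ref, times 100.  All three estimators are (R, p)
    arrays on tau_grid; the integral uses the trapezoid rule."""
    def l1_err(b):
        e = np.sum(np.abs(b - beta0), axis=1)
        return np.sum((e[1:] + e[:-1]) / 2 * np.diff(tau_grid))
    return 100.0 * l1_err(beta_hat) / l1_err(beta_ref)
```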

Table 1.

Simulation results for uncensored data

Set-up Method n NC NIC PUF (%) PCF (%) POF (%) MREEF (%) MREEO (%)
(I) US 200 4.95 .00 .0 95.0 5.0 36.5 110.7
400 5.00 .00 .0 99.5 .5 35.4 104.8
PS 200 4.19 .00 .0 46.5 53.5 36.2 106.7
400 4.49 .00 .0 59.8 40.2 34.6 103.4
SS(0.25) 200 4.92 .00 .0 93.2 6.8 34.3 107.0
400 4.96 .00 .0 95.8 4.2 32.1 99.1
SS(0.50) 200 4.95 .00 .0 95.5 4.5 31.2 94.4
400 4.97 .00 .0 97.5 2.5 29.4 91.9
SS(0.75) 200 4.91 .00 .0 91.8 8.2 33.2 104.3
400 4.97 .00 .0 97.8 2.2 31.7 100.7
LS 200 4.90 .00 .0 90.8 9.2 24.3 73.8
400 4.95 .00 .0 95.2 4.8 24.3 74.4

(II) US 200 4.93 .00 .2 92.5 7.2 50.8 160.3
400 4.99 .00 .0 99.2 .8 42.6 122.8
PS 200 4.32 .00 .0 52.5 47.5 48.2 147.7
400 4.76 .00 .0 78.8 21.2 44.7 129.4
SS(0.25) 200 4.93 .19 13.8 80.5 5.8 115.4 364.9
400 4.98 .00 .0 98.2 1.8 166.5 500.0
SS(0.50) 200 4.99 .98 98.0 2.0 .0 84.7 267.6
400 5.00 1.00 99.5 .5 .0 127.6 371.1
SS(0.75) 200 4.96 .03 3.0 93.5 3.5 111.5 356.4
400 5.00 .00 .2 99.8 .0 166.6 490.4
LS 200 4.82 0.96 95.2 2.5 2.2 101.9 318.5
400 4.93 0.98 98.2 1.8 0.0 142.1 423.2

(III) US 200 4.94 0.01 1.0 93.8 5.2 33.5 105.1
400 4.99 .00 .0 99.0 1.0 32.3 96.2
PS 200 4.42 .00 .0 58.2 41.8 28.1 89.5
400 4.81 .00 .0 83.2 16.8 27.1 82.5
SS(0.25) 200 4.95 1.00 99.2 .8 .0 94.5 296.1
400 4.98 .99 99.0 .8 .2 141.2 429.6
SS(0.50) 200 4.98 1.94 100.0 .0 .0 78.5 242.8
400 4.99 1.94 99.8 0.2 .0 113.9 341.5
SS(0.75) 200 4.96 0.98 98.0 1.8 0.2 95.2 297.3
400 4.98 1.00 99.8 .2 .0 143.9 428.3
LS 200 4.78 1.05 63.2 24.5 12.2 90.1 277.0
400 4.79 .36 24.0 60.5 15.5 133.1 394.6

From Table 1, we see that in set-up (I), where a linear model with normal random errors holds, US, SS(0.25), SS(0.50), SS(0.75) and LS have quite similar performance in terms of model selection. The PCF’s are all above 90%. Among them, LS shows the best estimation accuracy. Nevertheless, its advantage over the other approaches is only moderate. Note that the oracle estimator adopted here is defined based on the quantile regression model (1). Hence it is reasonable to observe MREEO below 80% for LS. Our finding on the superior estimation accuracy of the LS estimator is consistent with the theory in Zou and Yuan (2008). It is important to note that the PS approach, though demonstrating estimation accuracy comparable to the other methods, shows a rather high tendency to produce an overfitted model, with POF = 53.5% and 40.2% for n = 200 and 400 respectively. This strongly suggests that the intuitive PS approach may fail to produce consistent model selection.

In set-ups (II) and (III), where a standard linear model does not hold, LS performs poorly in identifying covariates that affect the response, with PCF below 65% even when n = 400. The inconsistent shrinkage behavior also inflates estimation errors; for example, MREEF for the LS estimator exceeds 100% when n = 400. As in set-up (I), PS continues to exhibit the overfitting problem in these configurations. When covariates with partial effects are present, the penalized "local" quantile regression method SS(τ) can be troublesome. For example, in set-up (II), Z(1) impacts Y at all quantile levels except τ = 0.5. When one simply applies penalized median regression, Z(1) is identified as an unimportant covariate and its effects on the other quantiles of Y are overlooked. This, reflected by NIC ≈ 1 and a PCF as low as 2%, indicates the need for a more thorough assessment of covariates' impact across quantiles. This message is even clearer in set-up (III), where Z(1) and Z(2) have partial effects on two disjoint quantile ranges. Finally, we note that in both set-ups (II) and (III), US demonstrates satisfactory performance; PCF is at least 99% when n = 400 and MREEO approaches 100% as n increases.
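The selection summaries used above (NC, NIC, and the under-/correct-/over-fitting proportions PUF/PCF/POF) can be computed directly from replicated selection indicators. The sketch below is a minimal illustration with our own (hypothetical) function and variable names, following the usual conventions; the paper's exact tabulation code is not available.

```python
import numpy as np

def selection_metrics(selected, truly_nonzero):
    """Summarize variable-selection performance over R replications.

    selected:      (R, p) boolean array; True if covariate j was retained.
    truly_nonzero: (p,)   boolean array; True if covariate j has a real effect.
    """
    sel = np.asarray(selected, dtype=bool)
    truth = np.asarray(truly_nonzero, dtype=bool)
    nc = (sel & truth).sum(axis=1).mean()        # avg # of correct inclusions
    nic = (~sel & truth).sum(axis=1).mean()      # avg # of important misses
    underfit = (~sel & truth).any(axis=1)        # missed >= 1 important one
    exact = (sel == truth).all(axis=1)           # selected exactly the truth
    overfit = ~underfit & ~exact                 # kept all important + extras
    return {"NC": nc, "NIC": nic,
            "PUF": 100 * underfit.mean(),
            "PCF": 100 * exact.mean(),
            "POF": 100 * overfit.mean()}

# Toy check: 3 replications, truth = first two of three covariates.
truth = [True, True, False]
sel = [[True, True, False],   # correct fit
       [True, True, True],    # overfit
       [True, False, False]]  # underfit
print(selection_metrics(sel, truth))
```

With one correct fit, one overfit, and one underfit replication, each of PUF, PCF, POF is one third.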

We also considered situations where Y is subject to random censoring. In the new set-ups, labeled (IV), (V), and (VI), Y and Z were generated in the same way as in set-ups (I), (II), and (III) respectively, except that the intercept coefficient function in set-up (IV) is set as the quantile function of the extreme value distribution. As a result, the relationship between Y and Z in set-up (IV) satisfies a linear model with extreme value error distribution and also a proportional hazards model. The censoring variable, C, was generated from Unif(−3, 14), Unif(−2, 19), and Unif(−2, 16) in set-ups (IV), (V), and (VI) respectively to produce 20% censoring. We set τL = 0.1 and τU = 0.8.
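As an illustration of this data-generating scheme, the sketch below simulates right-censored responses under uniform censoring. The coefficients and Unif endpoints are arbitrary placeholders, not the paper's exact configurations; in practice the endpoints are tuned, as above, to hit a target censoring rate.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400

# Hypothetical design: two binary covariates and a linear quantile model
# with standard normal errors (placeholder for the paper's set-ups).
Z = rng.binomial(1, 0.5, size=(n, 2)).astype(float)
Y = 1.0 + Z @ np.array([1.0, -0.5]) + rng.normal(size=n)

# Uniform censoring; the endpoints below are illustrative only.
C = rng.uniform(-2.0, 8.0, size=n)

X = np.minimum(Y, C)             # observed, possibly censored response
delta = (Y <= C).astype(int)     # 1 = event observed, 0 = censored

print("empirical censoring rate:", 1 - delta.mean())
```

Only (X, delta, Z) would be passed to a censored-data estimator; Y itself is unobserved whenever delta = 0.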

The simulation results for the censored cases are presented in Table 2 in the same manner as in Table 1. The proposed method for censored data is now denoted by USC, and we compared it with PSC, SSC(0.25), SSC(0.50), and SSC(0.75), which are PS, SS(0.25), SS(0.50), and SS(0.75) modified to adjust for censoring. As a counterpart of LS in the censored case, we considered the adaptive LASSO for the accelerated failure time (AFT) model, implemented by the method of least squares approximation (Wang and Leng, 2007) based on the log-rank estimator (Tsiatis, 1990; Wei and Gail, 1983); this approach is denoted by AAFT. In addition, we considered the adaptive-LASSO estimator for the Cox proportional hazards model (Zhang and Lu, 2007), denoted by PH. From Table 2, we see that the empirical results support the oracle properties of the proposed censored-data estimator. Though the PCF's may be lower than those observed in the uncensored case, which is not surprising given the information loss due to censoring, there is still a clear trend of PCF approaching 100% as n increases. The pointwise penalization approach continues to demonstrate a tendency to produce an overfitted model. Findings on the penalized local quantile regression approach are the same as those from Table 1. AAFT and PH perform well in set-up (IV), where the data satisfy a linear model and a proportional hazards model. However, when the constant effect assumption is violated, as in set-ups (V) and (VI), these methods can either overfit or underfit the model. Comparing PCF, MREEF and MREEO between USC and AAFT shows that assuming covariate effects are constant when they are in fact varying can lead not only to misleading variable selection results but also to substantially inflated estimation errors, which entail severely biased prediction of the quantile profile of Y under model (1).

Table 2.

Simulation results for censored data

Set-up Method n NC NIC PUF (%) PCF (%) POF (%) MREEF (%) MREEO (%)
(IV) USC 200 4.77 .00 .0 83.0 17.0 38.3 113.2
400 4.82 .00 .0 93.0 7.0 35.7 107.6
PSC 200 2.62 .00 .0 9.5 90.5 42.6 121.9
400 2.45 .00 .0 8.2 91.8 42.0 118.1
SSC(0.25) 200 4.63 .00 .0 73.8 26.2 46.8 132.9
400 4.62 .00 .0 76.5 23.5 42.8 127.4
SSC(0.50) 200 4.76 .00 .0 83.2 16.8 31.7 97.0
400 4.72 .00 .0 81.0 19.0 33.4 99.8
SSC(0.75) 200 4.66 .00 .0 76.0 24.0 32.1 94.6
400 4.7 .00 .0 80.8 19.2 31.7 93.5
AAFT 200 4.75 .00 .0 77.8 22.2 26.4 78.1
400 4.84 .00 .0 85.0 15.0 24.5 72.4
PH 200 4.85 .00 .0 86.2 13.8 37.9 112.2
400 4.91 .00 .0 91.0 9.0 33.6 99.4

(V) USC 200 4.88 .00 .2 89.5 10.2 54.1 169.5
400 4.97 .00 .0 97.5 2.5 44.1 128.5
PSC 200 3.96 .00 .0 38.2 61.8 48.9 151.1
400 4.48 .00 .0 59.0 41.0 46.1 136.9
SSC(0.25) 200 4.93 .31 21.5 73.5 5.0 93.1 302.7
400 4.98 .02 1.8 96.0 2.2 124.5 379.3
SSC(0.50) 200 4.97 .95 94.5 5.2 .2 70.6 226.3
400 5.00 .97 96.8 3.2 .0 98.9 300.2
SSC(0.75) 200 4.78 .06 6.5 75.0 18.5 112.2 360.0
400 4.94 .00 .2 94.0 5.8 158.1 477.5
AAFT 200 4.78 0.56 55.5 33.8 10.8 100.9 320.4
400 4.80 0.38 38.0 46.5 15.5 143.1 438.4
PH 200 4.63 .26 25.8 51.5 22.8
400 4.75 .02 2.5 75.8 21.8

(VI) USC 200 4.86 .11 11.2 78.0 10.8 34.9 111.5
400 4.98 .02 2.0 95.8 2.2 30.9 92.6
PSC 200 3.64 .00 .2 24.5 75.2 32.3 101.5
400 4.05 .00 .0 35.2 64.8 30.7 93.8
SSC(0.25) 200 4.96 1.05 98.5 1.5 .0 68.6 221.1
400 4.99 1.00 100.0 .0 .0 103.4 309.2
SSC(0.50) 200 4.91 1.8 98.5 1.2 .2 63.5 200.9
400 4.96 1.83 99.0 1.0 .0 93.6 277.3
SSC(0.75) 200 4.69 .93 92.0 4.2 3.8 92.6 294.6
400 4.86 .95 95.2 3.2 1.5 136.2 414.2
AAFT 200 4.60 1.03 89.2 4.2 6.5 81.9 261.6
400 4.75 0.92 87.0 7.8 5.2 120.0 349.5
PH 200 4.82 .99 89.8 6.8 3.5
400 4.87 .80 79.8 15.5 4.8

3.2 Applications to real datasets

Example 1

We considered the Air Pollution to Mortality Data (McDonald and Schwing, 1973), which are publicly available in the R package SMPracticals. This dataset contains a measure of total mortality (y) and 15 independent variables for 60 US metropolitan areas in 1959-1961. The 15 variables are precipitation (x1), January temperature (x2), July temperature (x3), the percentage of the population aged 60 or older (x4), population per household (x5), education (x6), the percentage of sound housing units (x7), population per square mile (x8), the percentage of non-white population (x9), the percentage of white-collar workers (x10), the percentage of families with income under 3000 dollars (x11), HC potential (x12), NOx potential (x13), SO2 potential (x14), and relative humidity (x15). The scientific goal is to predict y given x1, …, x15.

We applied the methods, including US, PS, SS(0.25), SS(0.50), SS(0.75) and LS, to the air pollution dataset. To evaluate the different methods, we computed prediction errors (PE) as follows. First, we randomly split the data into a training set comprising 30 observations and a test set containing the remaining 30 observations. We applied the method under evaluation to the training dataset and obtained a shrinkage estimator of β0(τ), denoted by β̂_train(τ). Then the prediction error was calculated as

\[
\mathrm{PE}=\frac{\sum_{i=1}^{n} I(i\in\text{Test Set})\int_{\Delta}\rho\{Y_i-Z_i^{T}\hat{\beta}_{\text{train}}(\tau);\tau\}\,d\tau}{\sum_{i=1}^{n} I(i\in\text{Test Set})}.
\]

In our analysis, all variables were standardized to have mean zero and unit variance. We set Δ = [0.1, 0.9]. In Table 3, we present the selected variables and the average PE (APE) based on 10,000 replications of the random training/test split.
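In code, PE amounts to averaging the τ-integrated check loss over test subjects, with the integral over Δ approximated on a grid. The sketch below is our own illustration (names such as `beta_hat` are placeholders for the fitted coefficient function; the trapezoidal rule stands in for the exact integral):

```python
import numpy as np

def check_loss(u, tau):
    # Quantile check function rho(u; tau) = u * {tau - I(u < 0)}.
    return u * (tau - (u < 0))

def prediction_error(Y_test, Z_test, beta_hat, taus):
    """Average of int_Delta rho(Y - Z'beta_hat(tau); tau) dtau over the test set.

    beta_hat: callable mapping tau to a coefficient vector (the intercept is
    carried as a column of ones in Z_test); taus: increasing grid over Delta.
    """
    loss = np.stack(
        [check_loss(Y_test - Z_test @ beta_hat(t), t) for t in taus], axis=1)
    # Trapezoidal rule for the integral over Delta, per test subject.
    integrated = ((loss[:, 1:] + loss[:, :-1]) / 2 * np.diff(taus)).sum(axis=1)
    return integrated.mean()

# Toy usage: an intercept-only "model" predicting 0.5 at every quantile level.
Y_test = np.array([0.0, 1.0])
Z_test = np.ones((2, 1))
taus = np.linspace(0.1, 0.9, 9)
pe = prediction_error(Y_test, Z_test, lambda t: np.array([0.5]), taus)
print(pe)  # -> 0.2 (trapezoid is exact here, since the loss is linear in tau)
```

In a real analysis `beta_hat` would be the shrinkage estimator fitted on the training half, evaluated on a fine τ-grid over Δ.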

Table 3.

Example 1: Analysis of the air pollution to mortality data

Method Selected variables APE
US x1, x9, x10, x14 0.215
PS x1, x2, x3, x5, x6, x7, x8, x9, x10, x12, x13, x14, x15 0.345
LS x1, x2, x6, x9, x13 0.277
SS(0.25) x1, x3, x9, x10, x13, x14 0.243
SS(0.5) x1, x8, x9, x10, x12, x14 0.239
SS(0.75) x6, x8, x9, x12, x14 0.255

From Table 3, we first note that PS may retain too many variables; a total of 13 variables are included in the final model. This result is consistent with the overfitting behavior of PS observed in the simulation studies. The proposed method, US, yields the most parsimonious model, with only 4 variables, and is associated with the smallest APE. To better understand the discrepancy in selected variables among the different approaches, we plot in Figure 2 the coefficient functions from fitting model (1) with all covariates, along with the corresponding pointwise 95% confidence intervals. We also depict the least squares coefficient estimates and the corresponding 95% confidence intervals by horizontal lines. It is observed from the regression quantile curves that the effects of x10 and x14 demonstrate apparent non-constant patterns, with different signs at lower and upper quantiles, while x1 and x9 exhibit fairly consistent significant effects over the whole τ-range of interest. This finding may help explain why LS selected x1 and x9 but not x10 and x14. The variable selection results from penalized local quantile regression are closer to those from US than to those from PS and LS. The APE's from SS(0.25), SS(0.50), and SS(0.75) are all slightly larger than the APE from US.

Fig. 2.

Fig. 2

Air pollution example: Regression quantiles (bold dotted lines) and the corresponding pointwise 95% confidence intervals (shaded areas), and least squares estimates (horizontal solid lines) and the corresponding pointwise 95% confidence intervals (horizontal dashed lines).

Example 2

We also applied the proposed method for randomly censored data to the dialysis example discussed in §1. A more detailed study background can be found in Kutner et al (2002) and Peng and Huang (2008). The dataset contains information on the vital status of 191 incident dialysis patients up to December 2005, along with 14 covariates: severity of restless leg syndrome (x1), patient's age at study admission (x2), patient's BMI at study admission (x3), baseline dialysis modality (x4), baseline hematocrit (x5), baseline mental health scale score (x6), baseline ferritin concentration (x7), baseline serum albumin (x8), primary diagnosis of diabetes (x9), indicator for cardiovascular comorbidity (x10), indicator for high education (x11), gender (x12), indicator for fish eater (x13), and race (x14). One goal of this study is to investigate the risk factors for dialysis mortality, assessed by the time from study admission to death, which is subject to censoring due to renal transplantation or the end of the study. The censoring rate in this dataset is 35%. In our analysis, the response variable y is taken as the log-transformed survival time and Δ is set as [0.1, 0.9]. We excluded ten subjects with missing covariates. The calculation of the prediction error is slightly modified for the censored case. Specifically, the training set consists of 91 randomly selected observations, and

\[
\mathrm{PE}=\frac{\sum_{i=1}^{n} I(i\in\text{Test Set})\int_{\Delta}\delta_i\,\rho\{X_i-Z_i^{T}\hat{\beta}_{\text{train}}(\tau);\tau\}/\hat{G}(X_i)\,d\tau}{\sum_{i=1}^{n}\delta_i\, I(i\in\text{Test Set})/\hat{G}(X_i)}.
\]

Here Ĝ(·), the Kaplan–Meier estimator of the survival function of the censoring variable, is computed based on the test dataset.
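The inverse-probability-of-censoring weights δi/Ĝ(Xi) in the formula above require a Kaplan–Meier estimate in which censoring plays the role of the event. A minimal sketch follows; `km_censoring_survival` is our own helper name, and ties and left-continuity conventions are glossed over.

```python
import numpy as np

def km_censoring_survival(X, delta):
    """Kaplan-Meier estimate of G(t) = pr(C > t).

    X: observed times min(Y, C); delta: 1 if the event (e.g. death) was
    observed, 0 if censored. Censoring is treated as the "event" here,
    so the survival curve jumps at the censored observations.
    """
    order = np.argsort(X)
    X_s = X[order]
    d_s = 1 - np.asarray(delta)[order]     # event = being censored
    at_risk = len(X) - np.arange(len(X))   # risk-set sizes at sorted times
    surv = np.cumprod(1.0 - d_s / at_risk)

    def G(t):
        idx = np.searchsorted(X_s, t, side="right") - 1
        return 1.0 if idx < 0 else max(surv[idx], 1e-10)  # guard tiny weights
    return G

# Toy usage: censoring occurs at time 2 only.
G = km_censoring_survival(np.array([1.0, 2.0, 3.0]), np.array([1, 0, 1]))
print(G(1.5), G(2.5))  # -> 1.0 0.5
```

Each uncensored test subject then contributes its integrated check loss weighted by 1/G(Xi), both in the numerator and in the normalizing denominator of PE.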

In Table 4, we summarize the results from the methods US, PS, SS(0.25), SS(0.50), SS(0.75), AAFT, and PH. In Figure 3, we plot the regression quantiles and the estimates of the AFT model coefficients when all covariates are included. It is seen from Table 4 that, as in Example 1, PS tends to select more variables than the other methods, while US yields the smallest APE. With particular interest in the impact of the severity of restless leg syndrome on dialysis survival, we note that x1 was selected by US and SS(0.25) but not by SS(0.50), SS(0.75), AAFT, or PH. This result may be explained by the regression quantile curve for x1 in Figure 3: the effect of x1 on the τ-th quantile of y appears to change with τ, with the coefficient magnitude decreasing and approaching 0 around τ = 0.6. As a result, a penalization method based on local quantile regression at a large τ would fail to identify restless leg syndrome as an important risk factor for predicting the mortality of dialysis patients. Fitting models that only allow for constant effects, such as the Cox proportional hazards model and the AFT model, also appears to attenuate the influence of restless leg syndrome on dialysis survival, thereby excluding x1 from the final model. This example again demonstrates the importance of appropriately accounting for varying covariate effects when identifying important variables and building parsimonious predictive models.

Table 4.

Example 2: Analysis of the dialysis data

Method Selected variables APE
US x1, x2, x5, x14 0.236
PS x1, x2, x3, x4, x5, x11, x14 0.244
SS(0.25) x1, x14 0.244
SS(0.50) x2, x14 0.242
SS(0.75) x2, x11, x14 0.243
AAFT x2, x4, x10, x13, x14 0.252
PH x2, x4, x8, x10, x11, x13, x14 0.258
Fig. 3.

Fig. 3

Dialysis example: Regression quantiles (dotted lines) and the corresponding pointwise 95% confidence intervals (shaded areas), and penalized AFT estimates (horizontal solid lines) and the corresponding pointwise 95% confidence intervals (horizontal dashed lines).

4 Remarks

Meaningful heterogeneous associations between covariates and a response are naturally embedded in many real datasets but are precluded by common regression models with constant coefficients. Some of the empirical work presented in this paper shows that making inferences based on a constant-coefficient model in the presence of varying covariate effects can lead to misleading variable selection results as well as degraded prediction accuracy.

In this work, we adopt a global quantile regression model as the platform to address potential inhomogeneity in the association of interest. This choice of model facilitates the development of the proposed variable selection method by avoiding smoothing and enabling stable and efficient computation.

Our empirical work also strongly suggests that fitting penalized "local" quantile regression pointwise often leads to an overfitted model. The intuition for this phenomenon shares a similar flavor with the multiple comparison problem. To be concrete, consider testing the null hypothesis H0: β0^(j)(τ) = 0 for all τ ∈ Δ. Estimating β0^(j)(·) by the pointwise penalization approach is analogous to a testing procedure that tests H0(τ): β0^(j)(τ) = 0 for each τ ∈ Δ and rejects H0 when H0(τ) is rejected at any τ ∈ Δ. In this view, the chance that the pointwise estimator of a truly zero coefficient function fails to shrink uniformly in τ is analogous to the type-I error of the above naive testing procedure, which can be severely inflated, particularly when Δ contains infinitely many τ.
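The inflation can be quantified in the idealized case of independent pointwise tests: with K grid points each tested at level α, the chance of at least one spurious rejection is 1 − (1 − α)^K. Pointwise quantile statistics are in fact positively correlated across τ, which tempers but does not remove the inflation; the numbers below are for illustration only.

```python
# Familywise type-I error of the naive tau-by-tau testing procedure,
# assuming (for illustration only) independent level-alpha tests at K
# quantile grid points.
alpha = 0.05
for K in (1, 5, 20, 100):
    print(K, round(1 - (1 - alpha) ** K, 3))
```

Already at K = 20 grid points the familywise error exceeds 60%, which mirrors the high POF rates of the pointwise PS approach in the simulations.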

For randomly censored data, the proposed shrinkage method, which assumes independence between the censoring variable C and Z, can be generalized to allow for covariate-dependent censoring. For example, one may replace Ĝ(Xi) in Vn(β; τ) by an ad hoc estimator of G(t∣Z) ≡ pr(C > t∣Z) obtained via reasonable parametric or semiparametric modeling of G(·∣Z).

Acknowledgments

The authors are grateful to the editor, associate editor, and the two referees for many helpful comments. This research has been supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health under Award Number R01HL113548 (to the first author). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Appendix A: Proof of Theorem 1

Define B_{n,τ}(C) = {β : β = β₀(τ) + (n^{-1/2}ℓ_{2n})u, ∥u∥ ≤ C}, and let ∂B_{n,τ}(C) denote the boundary of B_{n,τ}(C). Since W_{n,λ_n}(β; τ) is convex in β for all τ ∈ Δ, it suffices to show that for any ε > 0, there exist C₀ > 0 and N₀ > 0 such that for n ≥ N₀,

\[
\operatorname{pr}\Big(\inf_{\tau\in\Delta}\Big[\inf_{\beta\in\partial B_{n,\tau}(C_0)}W_{n,\lambda_n}(\beta;\tau)-W_{n,\lambda_n}\{\beta_0(\tau);\tau\}\Big]>0\Big)\ \ge\ 1-\varepsilon. \qquad (2)
\]

To show equation (2), first note that

\[
\begin{aligned}
&n^{-1}W_{n,\lambda_n}(\beta;\tau)-n^{-1}W_{n,\lambda_n}\{\beta_0(\tau);\tau\}\\
&\quad=n^{-1}\sum_{i=1}^{n}\big[\rho(Y_i-Z_i^{T}\beta;\tau)-\rho\{Y_i-Z_i^{T}\beta_0(\tau);\tau\}\big]
+n^{-1}\lambda_n\sum_{j=2}^{p}\big\{|\beta^{(j)}|\,w_{n,j}(\tau)-|\beta_0^{(j)}(\tau)|\,w_{n,j}(\tau)\big\}\\
&\quad\equiv \mathrm{I}+\mathrm{II}.
\end{aligned}
\]

Write e_i(τ) = Y_i − Z_i^Tβ₀(τ) and u(τ; β) = β − β₀(τ), and let D and D_i respectively denote the operators D(U) = U − E(U) and D_i(U) = U − E(U ∣ Z_i) for a random variable U. We can further decompose the term I as I = III + IV, where

\[
\begin{aligned}
\mathrm{III}={}&E\big[I\{0<e_i(\tau)<Z_i^{T}u(\tau;\beta)\}\{-e_i(\tau)+Z_i^{T}u(\tau;\beta)\}\big]\\
&+E\big[I\{Z_i^{T}u(\tau;\beta)<e_i(\tau)<0\}\{e_i(\tau)-Z_i^{T}u(\tau;\beta)\}\big],
\end{aligned}
\]
and
\[
\begin{aligned}
n\cdot(\mathrm{IV})={}&\sum_{i=1}^{n} D\Big(I\{0<e_i(\tau)<Z_i^{T}u(\tau;\beta)\}\,\frac{Z_i^{T}u(\tau;\beta)-e_i(\tau)}{Z_i^{T}u(\tau;\beta)}\,Z_i^{T}\Big)u(\tau;\beta)\\
&+\sum_{i=1}^{n} D\Big(I\{Z_i^{T}u(\tau;\beta)<e_i(\tau)<0\}\,\frac{e_i(\tau)-Z_i^{T}u(\tau;\beta)}{-Z_i^{T}u(\tau;\beta)}\,(-Z_i^{T})\Big)u(\tau;\beta)\\
&+\sum_{i=1}^{n} I\{Z_i^{T}u(\tau;\beta)>0\}\,D_i\big(I\{e_i(\tau)>Z_i^{T}u(\tau;\beta)\}\big)\{-\tau Z_i^{T}u(\tau;\beta)\}\\
&+\sum_{i=1}^{n} I\{Z_i^{T}u(\tau;\beta)>0\}\,D_i\big(I\{e_i(\tau)<0\}\big)\{(1-\tau)Z_i^{T}u(\tau;\beta)\}\\
&+\sum_{i=1}^{n} I\{Z_i^{T}u(\tau;\beta)>0\}\,D_i\big(I\{0<e_i(\tau)<Z_i^{T}u(\tau;\beta)\}\big)\{-\tau Z_i^{T}u(\tau;\beta)\}\\
&+\sum_{i=1}^{n} I\{Z_i^{T}u(\tau;\beta)<0\}\,D_i\big(I\{e_i(\tau)<Z_i^{T}u(\tau;\beta)\}\big)\{(1-\tau)Z_i^{T}u(\tau;\beta)\}\\
&+\sum_{i=1}^{n} I\{Z_i^{T}u(\tau;\beta)<0\}\,D_i\big(I\{e_i(\tau)>0\}\big)\{-\tau Z_i^{T}u(\tau;\beta)\}\\
&+\sum_{i=1}^{n} I\{Z_i^{T}u(\tau;\beta)<0\}\,D_i\big(I\{Z_i^{T}u(\tau;\beta)<e_i(\tau)<0\}\big)\{(1-\tau)Z_i^{T}u(\tau;\beta)\}\\
\equiv{}&\sum_{j=1}^{8} n\cdot(\mathrm{IV}_j).
\end{aligned}
\]
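The decomposition I = III + IV above can be organized through Knight's identity for the check function, a standard tool in quantile regression asymptotics, applied with u = e_i(τ) and v = Z_i^T u(τ; β):

```latex
\rho(u-v;\tau)-\rho(u;\tau)
  = -v\{\tau - I(u<0)\}
  + \int_{0}^{v}\{I(u\le s)-I(u<0)\}\,ds .
```

The first term on the right has conditional mean zero given Z_i (since pr{e_i(τ) < 0 ∣ Z_i} = τ), while the integral term generates the indicator regions I{0 < e_i(τ) < Z_i^T u(τ;β)} and I{Z_i^T u(τ;β) < e_i(τ) < 0} appearing in III and IV.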

For the term IV₁, we can show that the function class
\[
\mathcal{F}=\Big\{I\{0<e_i(\tau)<Z_i^{T}u(\tau;\beta)\}\,\frac{Z_i^{T}u(\tau;\beta)-e_i(\tau)}{Z_i^{T}u(\tau;\beta)}\,Z_i:\ \tau\in\Delta,\ \beta\in A(C_0)\Big\}
\]
is a Donsker class (page 81 in Van der Vaart and Wellner, 2000), where A(C₀) = {b ∈ R^p : inf_{τ∈Δ} ∥b − β₀(τ)∥ ≤ C₀}. Given the uniform boundedness of the functional class 𝓕, and since A(C₀) covers B_{n,τ}(C₀) for all τ ∈ Δ, we can apply the functional law of the iterated logarithm (LIL) (Goodman et al, 1981) to
\[
n^{-1}\sum_{i=1}^{n}D\Big(I\{0<e_i(\tau)<Z_i^{T}u(\tau;\beta)\}\,\frac{Z_i^{T}u(\tau;\beta)-e_i(\tau)}{Z_i^{T}u(\tau;\beta)}\,Z_i^{T}\Big)
\]
and get

\[
\sup_{\tau\in\Delta,\ \beta\in\partial B_{n,\tau}(C_0)}|\mathrm{IV}_1|\ \le\ O_p(n^{-1/2}\ell_{2n})\,(C_0\, n^{-1/2}\ell_{2n}).
\]

Similarly, we can show that, for j = 2, …, 8,

\[
\sup_{\tau\in\Delta,\ \beta\in\partial B_{n,\tau}(C_0)}|\mathrm{IV}_j|\ \le\ O_p(n^{-1/2}\ell_{2n})\,(C_0\, n^{-1/2}\ell_{2n}).
\]

Therefore, we have

\[
\sup_{\tau\in\Delta,\ \beta\in\partial B_{n,\tau}(C_0)}|\mathrm{IV}|\ \le\ O_p(n^{-1/2}\ell_{2n})\,(C_0\, n^{-1/2}\ell_{2n}). \qquad (3)
\]

For the term III, note that, under condition C2 (i), inf_{τ∈Δ, z} f_τ(0∣z) > 0. By the definition of B_{n,τ}(C₀), there exists N₁ > 0 such that for n ≥ N₁,

\[
\sup_{\beta\in B_{n,\tau}(C_0),\ z}\big|z^{T}u(\tau;\beta)\big|\ \le\ \delta_1\ \equiv\ \inf_{\tau\in\Delta,\ z}\{f_\tau(0\,|\,z)/(2A_0)\}.
\]

This, coupled with condition C2 (ii), implies that for n > N₁, f_τ(x∣z) > inf_{τ∈Δ, z} f_τ(0∣z)/2 for any x ∈ (0, z^Tu(τ; β)) with β ∈ ∂B_{n,τ}(C₀). Let δ₂ ≡ inf_{τ∈Δ, z} f_τ(0∣z)/2 and δ₃ ≡ eig_min E(ZZ^T), where eig_min(·) denotes the minimum eigenvalue of a matrix. Let E_Z(·) denote the expectation with respect to Z. We get, for any τ ∈ Δ and β ∈ ∂B_{n,τ}(C₀),

\[
\begin{aligned}
E\big[I\{0<e_i(\tau)<Z_i^{T}u(\tau;\beta)\}\{-e_i(\tau)+Z_i^{T}u(\tau;\beta)\}\big]
&=E_Z\Big[\int_0^{Z^{T}u(\tau;\beta)}\{-x+Z^{T}u(\tau;\beta)\}f_\tau(x\,|\,Z)\,dx\Big]\\
&\ge \delta_2\,E_Z\Big[\int_0^{Z^{T}u(\tau;\beta)}\{-x+Z^{T}u(\tau;\beta)\}\,dx\Big]\\
&=\delta_2\,E_Z\Big\{\tfrac12\,u(\tau;\beta)^{T}(ZZ^{T})u(\tau;\beta)\Big\}\\
&\ge \tfrac12\,\delta_2\delta_3 C_0^{2}\,(n^{-1/2}\ell_{2n})^{2}\ >\ 0.
\end{aligned}
\]

Since a similar result can be shown for the second term in III, it follows that

\[
\inf_{\beta\in\partial B_{n,\tau}(C_0),\ \tau\in\Delta}\mathrm{III}\ \ge\ C_0^{2}\cdot O_p\big(\ell_{2n}^{2}/n\big). \qquad (4)
\]

For the term II, it is easy to see that

\[
\mathrm{II}\ \ge\ -\frac{\lambda_n}{n}\sum_{j=2}^{s}\big|\beta^{(j)}-\beta_0^{(j)}(\tau)\big|\,w_{n,j}(\tau).
\]

By the uniform consistency of β̃_n(τ), for 2 ≤ j ≤ s, sup_{τ∈Δ} |w_{n,j}(τ)| = (inf_{τ∈Δ}|β̃_n^(j)(τ)|)^{-1} = O_p(1). Therefore, with (n^{1/2}ℓ_{2n})^{-1} λ_n = O(1),

\[
\inf_{\beta\in\partial B_{n,\tau}(C_0),\ \tau\in\Delta}\mathrm{II}\ \ge\ -C_0\cdot O_p\big(\ell_{2n}^{2}/n\big). \qquad (5)
\]

Based on equations (3), (4), and (5), it follows that

\[
\inf_{\tau\in\Delta,\ \beta\in\partial B_{n,\tau}(C_0)}\big[n^{-1}W_{n,\lambda_n}(\beta;\tau)-n^{-1}W_{n,\lambda_n}\{\beta_0(\tau);\tau\}\big]\ \ge\ C_0^{2}\cdot O_p\big(\ell_{2n}^{2}/n\big)-C_0\cdot O_p\big(\ell_{2n}^{2}/n\big).
\]

Therefore, (2) holds if we choose C0 large enough. This completes the proof of Theorem 1.

Appendix B: Proof of Theorem 2

Define sgn(x) = I(x > 0) − I(x < 0) and U_{n,j}(β; τ) = ∂W_{n,λ_n}(β; τ)/∂β^(j), and let U_n(β; τ) = (U_{n,1}(β; τ), …, U_{n,p}(β; τ))^T. Define μ_j(β) = E[Z_i^(j){τ − I(Y_i − Z_i^Tβ < 0)}], μ(β) = (μ₁(β), …, μ_p(β))^T, and A(β) = ∂μ(β)/∂β ≡ (A₁(β)^T, …, A_p(β)^T)^T.
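For reading the displays below, it may help to write U_{n,j} out explicitly. Differentiating the objective W_{n,λ_n}(β; τ) = Σ_i ρ(Y_i − Z_i^Tβ; τ) + λ_n Σ_{j=2}^p w_{n,j}(τ)|β^(j)| (whose two pieces appear as terms I and II in Appendix A) gives, at points of differentiability,

```latex
U_{n,j}(\beta;\tau)
  = -\sum_{i=1}^{n} Z_i^{(j)}\big\{\tau - I(Y_i - Z_i^{T}\beta < 0)\big\}
  + \lambda_n\, w_{n,j}(\tau)\,\mathrm{sgn}\big(\beta^{(j)}\big)\, I(j \ge 2),
```

so that the unpenalized part of n^{-1/2}U_{n,j} is the usual quantile score process, whose behavior is controlled by (7) and the functional LIL.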

First, by Theorem 1, for j = 2, …, s, we have

\[
\lim_{n\to\infty}\operatorname{pr}\Big(\sup_{\tau\in\Delta}\big|\hat\beta^{US(j)}_{n,\lambda_n}(\tau)\big|=0\Big)\ \le\ \lim_{n\to\infty}\operatorname{pr}\Big(\sup_{\tau\in\Delta}\big|\hat\beta^{US(j)}_{n,\lambda_n}(\tau)-\beta_0^{(j)}(\tau)\big|\ \ge\ \sup_{\tau\in\Delta}\big|\beta_0^{(j)}(\tau)\big|\Big)=0. \qquad (6)
\]

Next, we note that given the uniform consistency of β^n,λnUS(τ) implied by Theorem 1, it can be shown by following the lines of Lemma 1 in Peng and Huang (2008) that

\[
\begin{aligned}
\sup_{\tau\in\Delta}\Big|\,n^{-1/2}\sum_{i=1}^{n}Z_i^{(j)}\big[\tau-I\{Y_i-Z_i^{T}\hat\beta^{US}_{n,\lambda_n}(\tau)<0\}\big]-n^{-1/2}\sum_{i=1}^{n}Z_i^{(j)}\big[\tau-I\{Y_i-Z_i^{T}\beta_0(\tau)<0\}\big]&\qquad (7)\\
-\,n^{1/2}\big[\mu_j\{\hat\beta^{US}_{n,\lambda_n}(\tau)\}-\mu_j\{\beta_0(\tau)\}\big]\Big|=o_p(1),\qquad j=1,\dots,p.&\qquad (8)
\end{aligned}
\]

This implies, for j = s + 1, …, p,

\[
\begin{aligned}
\sup_{\tau\in\Delta}|E_{n,j}(\tau)|\equiv\sup_{\tau\in\Delta}\Big|\,&n^{-1/2}U_{n,j}\{\hat\beta^{US}_{n,\lambda_n}(\tau);\tau\}-n^{-1/2}U_{n,j}\{\beta_0(\tau);\tau\}\\
&+n^{1/2}\big[\mu_j\{\hat\beta^{US}_{n,\lambda_n}(\tau)\}-\mu_j\{\beta_0(\tau)\}\big]-n^{-1/2}\lambda_n w_{n,j}(\tau)\,\mathrm{sgn}\{\hat\beta^{US(j)}_{n,\lambda_n}(\tau)\}\,\Big|=o_p(1). \qquad (9)
\end{aligned}
\]

By the definition of β̂_{n,λ_n}^{US}(τ),

\[
\sup_{\tau\in\Delta}\big|n^{-1/2}U_{n,j}\{\hat\beta^{US}_{n,\lambda_n}(\tau);\tau\}\big|=o_p(1). \qquad (10)
\]

Applying the functional LIL to n^{-1}U_{n,j}{β₀(τ); τ} gives

\[
\sup_{\tau\in\Delta}\big|n^{-1/2}U_{n,j}\{\beta_0(\tau);\tau\}\big|=O_p(\ell_{2n}). \qquad (11)
\]

An application of Taylor expansion to μ_j{β̂_{n,λ_n}^{US}(τ)} − μ_j{β₀(τ)}, coupled with Theorem 1 and the uniform boundedness of A_j(β), shows that

\[
\sup_{\tau\in\Delta}\big|n^{1/2}\big[\mu_j\{\hat\beta^{US}_{n,\lambda_n}(\tau)\}-\mu_j\{\beta_0(\tau)\}\big]\big|=O_p(\ell_{2n}). \qquad (12)
\]

In addition, the proof of Theorem 1 can be used to justify that sup_{τ∈Δ}|β̃_n^(j)(τ)| = O_p(n^{-1/2}ℓ_{2n}) for j = s + 1, …, p, and thus

\[
\lim_{n\to\infty}\ell_{2n}^{-1}\big(n^{-1/2}\lambda_n w_{n,j}(\tau)\big)=\infty \quad\text{in probability}. \qquad (13)
\]

By (10), (11), (12), and (13), for any fixed M > 0, lim_{n→∞} pr(sup_{τ∈Δ} |E_{n,j}(τ)| ≤ M, sup_{τ∈Δ}|β̂_{n,λ_n}^{US(j)}(τ)| ≠ 0) = 0. This, coupled with (9), implies that, for j = s + 1, …, p,

\[
\begin{aligned}
\lim_{n\to\infty}\operatorname{pr}\Big(\sup_{\tau\in\Delta}\big|\hat\beta^{US(j)}_{n,\lambda_n}(\tau)\big|\ne 0\Big)\le\lim_{n\to\infty}\Big\{&\operatorname{pr}\Big(\sup_{\tau\in\Delta}|E_{n,j}(\tau)|\le M,\ \sup_{\tau\in\Delta}\big|\hat\beta^{US(j)}_{n,\lambda_n}(\tau)\big|\ne 0\Big)&\qquad (14)\\
&+\operatorname{pr}\Big(\sup_{\tau\in\Delta}|E_{n,j}(\tau)|>M\Big)\Big\}=0. &\qquad (15)
\end{aligned}
\]

The proof of Theorem 2 (i) is completed based on (6) and (14).

Let ≐ denote asymptotic equivalence, in the sense that the difference converges to zero in probability uniformly in τ ∈ Δ. Define β̄_n(τ) = (β̂_{n,λ_n}^{US(1:s)}(τ)^T, 0^T)^T. By the result in Theorem 2 (i), we have

\[
n^{-1/2}U_n(\bar\beta_n;\tau)\ \doteq\ n^{-1/2}U_n(\hat\beta^{US}_{n,\lambda_n};\tau),
\]

and hence n^{-1/2}U_n(β̄_n; τ) ≐ 0. Using the result in (7), and applying Taylor expansion to μ_j{β̄_n(τ)} − μ_j{β₀(τ)}, we get

\[
0\ \doteq\ n^{-1/2}U_n(\bar\beta_n;\tau)\ \doteq\ -n^{-1/2}\sum_{i=1}^{n}Z_i\big[\tau-I\{Y_i-Z_i^{T}\beta_0(\tau)<0\}\big]-\big[A\{\breve\beta(\tau)\}\big]\,n^{1/2}\{\bar\beta_n(\tau)-\beta_0(\tau)\}+n^{-1/2}\lambda_n b_n(\tau), \qquad (16)
\]

where β̆(τ) lies on the line segment between β₀(τ) and β̄_n(τ), and b_n(τ) = (w_{n,1}(τ)sgn{β̂_{n,λ_n}^{US(1)}(τ)}, …, w_{n,s}(τ)sgn{β̂_{n,λ_n}^{US(s)}(τ)}, 0, …, 0)^T.

Since lim_{n→∞} n^{-1/2}λ_n = 0, sup_{τ∈Δ} |w_{n,j}(τ)| = O_p(1) for j = 1, …, s, and A{β̆(τ)} ≐ A{β₀(τ)} given the uniform convergence of β̂_{n,λ_n}^{US}(τ) to β₀(τ), it follows from (16) that

\[
n^{1/2}\big\{\hat\beta^{US(1:s)}_{n,\lambda_n}(\tau)-\beta_0^{(1:s)}(\tau)\big\}\ \doteq\ A_{11}^{-1}\{\beta_0(\tau)\}\,n^{-1/2}\sum_{i=1}^{n}Z_i^{(1:s)}\big[I\{Y_i-Z_i^{T}\beta_0(\tau)<0\}-\tau\big], \qquad (17)
\]

where A₁₁(·) stands for the submatrix of A(·) formed by its first s rows and columns. An application of the Donsker theorem based on (17) then shows that n^{1/2}{β̂_{n,λ_n}^{US(1:s)}(τ) − β₀^{(1:s)}(τ)} converges weakly to a mean-zero Gaussian process with covariance matrix

\[
\Sigma(\tau,\tau')=E\Big[A_{11}^{-1}\{\beta_0(\tau)\}\,Z_i^{(1:s)}Z_i^{(1:s)T}\,\big[A_{11}^{-1}\{\beta_0(\tau')\}\big]^{T}\,\big[I\{Y_i-Z_i^{T}\beta_0(\tau)<0\}-\tau\big]\big[I\{Y_i-Z_i^{T}\beta_0(\tau')<0\}-\tau'\big]\Big].
\]

Using similar steps, we can show that (17) still holds with β̂_oracle(τ) in place of β̂_{n,λ_n}^{US(1:s)}(τ). This completes the proof of Theorem 2.

Contributor Information

Limin Peng, Department of Biostatistics and Bioinformatics, Emory University, 1518 Clifton Rd. NE, Atlanta, USA.

Jinfeng Xu, Department of Statistics and Applied Probability, National University of Singapore, Singapore 117546.

Nancy Kutner, Department of Rehabilitation Medicine, Emory University, Atlanta, USA 30322.

References

  1. Belloni A, Chernozhukov V. l1-penalized quantile regression in high-dimensional sparse models. Annals of Statistics. 2011;39:82–130.
  2. Carey JR, Liedo P, Orozco D, Tatar M, Vaupel JW. A male-female longevity paradox in medfly cohorts. The Journal of Animal Ecology. 1995;64:107–116.
  3. Dickson E, Grambsch P, Fleming T, Fisher L, Langworthy A. Prognosis in primary biliary cirrhosis: model for decision making. Hepatology. 1989;10:1–7. doi: 10.1002/hep.1840100102.
  4. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association. 2001;96:1348–1360.
  5. Frank I, Friedman J. A statistical view of some chemometrics regression tools. Technometrics. 1993.
  6. Goodman V, Kuelbs J, Zinn J. Some results on the LIL in Banach space with applications to weighted empirical processes. Annals of Probability. 1981;9:713–752.
  7. Huang Y. Calibration regression of censored lifetime medical cost. Journal of the American Statistical Association. 2002;98.
  8. Huang Y. Quantile calculus and censored regression. The Annals of Statistics. 2010;38:1607–1637. doi: 10.1214/09-aos771.
  9. Jensen G, Torp-Pedersen C, Hildebrandt P, Kober L, Nielsen F, Melchior T, Joen T, Andersen P. Does in-hospital fibrillation affect prognosis after myocardial infarction? European Heart Journal. 1997;18:919–924. doi: 10.1093/oxfordjournals.eurheartj.a015379.
  10. Kai B, Li R, Zou H. New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Annals of Statistics. 2011;39:305–332. doi: 10.1214/10-AOS842.
  11. Kaslow R, Ostrow D, Detels R, Phair J, Polk B, Rinaldo C. The Multicenter AIDS Cohort Study: rationale, organization and selected characteristics of the participants. American Journal of Epidemiology. 1987;126:310–318. doi: 10.1093/aje/126.2.310.
  12. Knight K, Fu W. Asymptotics for lasso-type estimators. The Annals of Statistics. 2000;28:1356–1378.
  13. Koenker R. Quantile Regression. Cambridge University Press; 2005.
  14. Koenker R. quantreg: Quantile regression. R package. 2011. http://www.r-project.org
  15. Koenker R, Bassett G. Regression quantiles. Econometrica. 1978;46:33–50.
  16. Koenker R, d'Orey V. Computing regression quantiles. Applied Statistics. 1987;36:383–393.
  17. Kutner N, Clow P, Zhang R, Aviles X. Association of fish intake and survival in a cohort of incident dialysis patients. American Journal of Kidney Diseases. 2002;39:1018–1024. doi: 10.1053/ajkd.2002.32775.
  18. Li Y, Zhu J. Quantile regression in reproducing kernel Hilbert spaces. Journal of Computational and Graphical Statistics. 2005;17:163–185.
  19. Lustig I, Marsden R, Shanno D. Interior point methods for linear programming: computational state of the art (with discussion). ORSA Journal on Computing. 1994;6:1–14.
  20. Ma Y, Yin G. Semiparametric median residual life model and inference. Canadian Journal of Statistics. 2010;38:665–679.
  21. Madsen K, Nielsen H. A finite smoothing algorithm for linear l1 estimation. SIAM Journal on Optimization. 1993;3:223–235.
  22. McDonald G, Schwing R. Instabilities of regression estimates relating air pollution to mortality. Technometrics. 1973;15:463–481.
  23. Neocleous T, Vanden Branden K, Portnoy S. Correction to "Censored regression quantiles" by S. Portnoy (2003). Journal of the American Statistical Association. 2006;101:860–861.
  24. Peng L, Fine J. Competing risks quantile regression. Journal of the American Statistical Association. 2009;104.
  25. Peng L, Huang Y. Survival analysis with quantile regression models. Journal of the American Statistical Association. 2008;103:637–649.
  26. Portnoy S. Censored regression quantiles. Journal of the American Statistical Association. 2003;98:1001–1012.
  27. Portnoy S, Lin G. Asymptotics for censored regression quantiles. 2010.
  28. Rocha G, Wang X, Yu B. Asymptotic distribution and sparsistency for l1-penalized parametric M-estimators with applications to linear SVM and logistic regression. 2009. http://arxiv.org/abs/0908.1940
  29. Thorogood J, Persijn G, Schreuder G, D'Amaro J, Zantvoort F, Van Houwelingen J, Van Rood J. The effect of HLA matching on kidney graft survival in separate posttransplantation intervals. Transplantation. 1990;50:146–150. doi: 10.1097/00007890-199007000-00027.
  30. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B. 1996;58:267–288.
  31. Tsiatis AA. Estimating regression parameters using linear rank tests for censored data. The Annals of Statistics. 1990;18:354–372.
  32. Van der Vaart A, Wellner J. Weak Convergence and Empirical Processes: with Applications to Statistics. Springer-Verlag; 2000.
  33. Verweij P, Van Houwelingen H. Time-dependent effects of fixed covariates in Cox regression. Biometrics. 1995;51:1550–1556.
  34. Wang H, Leng C. Unified LASSO estimation by least squares approximation. Journal of the American Statistical Association. 2007;102:1039–1048.
  35. Wang H, Xia Y. Shrinkage estimation of the varying coefficient model. Journal of the American Statistical Association. 2009;104:747–757.
  36. Wei LJ, Gail MH. Nonparametric estimation for a scale-change with censored observations. Journal of the American Statistical Association. 1983;78:382–388.
  37. Wu Y, Liu Y. Variable selection in quantile regression. Statistica Sinica. 2009;19:801–817.
  38. Ying Z. A large sample study of rank estimation for censored data. The Annals of Statistics. 1993;21:76–99.
  39. Zhang H, Lu W. Adaptive Lasso for Cox's proportional hazards model. Biometrika. 2007;94:691–703.
  40. Zou H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association. 2006;101:1418–1429.
  41. Zou H, Yuan M. Composite quantile regression and the oracle model selection theory. Annals of Statistics. 2008;36:1108–1126.
