Author manuscript; available in PMC: 2012 Jul 25.
Published in final edited form as: J Am Stat Assoc. 2012 Jan 24;106(496):1371–1382. doi: 10.1198/jasa.2011.tm10382

A Perturbation Method for Inference on Regularized Regression Estimates

Jessica Minnier 1, Lu Tian 2, Tianxi Cai 3
PMCID: PMC3404855  NIHMSID: NIHMS390937  PMID: 22844171

Abstract

Analysis of high dimensional data often seeks to identify a subset of important features and assess their effects on the outcome. Traditional statistical inference procedures based on standard regression methods often fail in the presence of high-dimensional features. In recent years, regularization methods have emerged as promising tools for analyzing high dimensional data. These methods simultaneously select important features and provide stable estimation of their effects. The adaptive LASSO and SCAD, for instance, give consistent and asymptotically normal estimates with oracle properties. However, in finite samples, it remains difficult to obtain interval estimators for the regression parameters. In this paper, we propose perturbation resampling based procedures to approximate the distribution of a general class of penalized parameter estimates. Our proposal, justified by asymptotic theory, provides a simple way to estimate the covariance matrix and confidence regions. Through finite sample simulations, we verify the ability of this method to give accurate inference and compare it to other widely used standard deviation and confidence interval estimates. We also illustrate our proposals with a data set used to study the association of HIV drug resistance and a large number of genetic mutations.

Keywords: High dimensional regression, Interval estimation, Oracle property, Regularized estimation, Resampling methods

1. INTRODUCTION

Accurate prediction of disease outcomes is fundamental for successful disease prevention and treatment selection. Recent advancement in biological and genomic research has led to the discovery of a vast number of new markers that can potentially be used to develop molecular disease prevention and intervention strategies. For example, gene expression analyses have identified molecular subtypes that are associated with differential prognosis and response to treatment for breast cancer patients (Perou et al. 2000; Dent et al. 2007). For non-small cell lung cancer patients, a composite score consisting of several biological markers including cyclin E and Ki-67 was shown to be highly predictive of patient survival (Dosaka-Akita et al. 2001). However, construction of accurate prediction models with a panel of markers is a difficult task in general. For example, statistical models for calculating individual cancer risk have been developed for a few types of cancer in the past two decades (Gail et al. 1989; Thompson et al. 2006; Cassidy et al. 2008; Freedman et al. 2009). However, much refinement is needed even for the best of these models due to their limited discriminatory accuracy (Spiegelman et al. 1994; Gail and Costantino 2001).

The increasing availability of new potential markers, while holding great promise for better prediction of disease outcomes, imposes challenges to model development due to the high dimensionality in the feature space and the relatively small sample size. To improve prediction with a large number of promising genomic or biological markers, an important step is to build a parsimonious model that only includes important markers. Such a model could reduce the cost associated with unnecessary marker measurements and improve the prediction precision for future patients. For such purposes, various regularization procedures such as the LASSO (Tibshirani 1996; Knight and Fu 2000), the SCAD (Fan and Li 2001, 2002, 2004; Zhang et al. 2006), the adaptive LASSO (ALASSO; Zou 2006; Wang and Leng 2007), the Elastic Net (Zou and Hastie 2005; Zou and Zhang 2009), and one-step local linear approximation (LLA; Zou and Li 2008) have been developed in recent years. These procedures simultaneously identify non-informative variables and produce coefficient estimates for the selected variables to induce a model for prediction.

These regularization procedures, while effective for variable selection and stable estimation, yield estimators whose distributions are difficult to approximate. LASSO type estimators have a non-standard limiting distribution that depends on which components of the coefficient vector are zero. Since the LASSO type estimator is not consistent in variable selection, the limiting distribution cannot be estimated directly. Furthermore, standard bootstrap methods fail when the true coefficient vector is sparse (Knight and Fu 2000). Recently, Chatterjee and Lahiri (2010) proposed a truncated LASSO estimator whose distribution can be approximated using a residual bootstrap procedure. To overcome the difficulties in LASSO estimators, other regularized procedures such as the SCAD and ALASSO have been proposed. These estimators possess asymptotic oracle properties including perfect variable selection and super efficiency. However, our simulation results suggest that in finite samples, such oracle properties are far from being true, and inference procedures based on asymptotic properties such as those given in Zou (2006) perform poorly, especially when the signal to noise ratio (SNR) is high and the between-covariate correlations are not low. Recently, Pötscher and Schneider (2009, 2010) developed theory on the coverage probabilities of the confidence intervals for ALASSO type estimators under the orthogonal design. It was shown that estimating the distribution function of the ALASSO estimator is not feasible when the true parameter is of similar magnitude to n^{-1/2}, where n is the sample size. It is thus generally difficult to develop well-performing confidence regions (CRs) and hypothesis testing procedures based on these regularized estimators. Such difficulties limit applicability to clinical studies where confidence in statistical evidence is crucial for clinical decision making.

In this paper, we propose resampling methods to derive CR and testing procedures for marker effects estimated from regularized procedures such as the ALASSO and one-step SCAD estimator when the true parameter is fixed. Our preliminary studies suggest that CRs constructed from such resampling procedures perform much better than their asymptotic based counterparts. When the fitted model is merely a working model, many frequently used estimation procedures may fail to produce stable parameter estimates. Procedures that can provide stable parameter estimates and valid interval estimates under a possibly misspecified working model are highly valuable when building a prediction model with high dimensional data. Our proposed procedures remain valid even if the fitted model fails to hold, provided that the employed objective function satisfies mild regularity conditions. The rest of the paper is organized as follows. In Section 2, we introduce the proposed perturbation resampling procedures and describe various methods for constructing confidence regions. In Section 3, we demonstrate the validity of the proposed procedures in finite samples via simulation studies. In Section 4, we illustrate our proposed procedure with an HIV drug resistance study where the goal is to predict phenotypic drug resistance levels using genotypic viral mutations.

2. RESAMPLING PROCEDURES

Suppose that y = (y_1, …, y_n)^T is the n × 1 vector of response variables and x_i = (x_{1i}, …, x_{pi})^T, i = 1, …, n, are the predictors. Let X = (x_1, …, x_n)^T be the n × p matrix of these covariates. Assume that the effect of x on y is determined via an objective function L(θ; D) = ℓ(y, α + β^T x), where θ = (α, β^T)^T, α is an unknown location parameter, β is an unknown p × 1 vector of covariate effects, and D = (y, x^T)^T. To assess the association between x and y, let L(θ) = n^{-1} Σ_{i=1}^n L(θ; D_i) be the objective function used to fit a regression model and θ̃ = (α̃, β̃^T)^T = argmin_θ L(θ). To obtain a regularized estimator for θ_0, we minimize the regularized objective function

L̂(θ) = L(θ) + Σ_{j=1}^p p′_{λ_{nj}}(|β̃_j|) |β_j|   (1)

where p′_{λ_{nj}}(|β̃_j|) is the derivative of a penalty p_{λ_{nj}}(|β_j|) evaluated at the initial estimate β̃_j of β_{0j}. We consider the cases where p_{λ_{nj}}(|β_j|) is the concave SCAD penalty or the L_q penalty for 0 < q < 1, and utilize a one-step estimator of these penalties with the local linear approximation (LLA) method proposed by Zou and Li (2008). Additionally, we consider the ALASSO penalty of Zou (2006), which arises when p′_{λ_{nj}}(|β̃_j|) = n^{-1/2} λ_n |β̃_j|^{-1}.
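For illustration, with squared-error loss the ALASSO fit in (1) can be obtained from standard LASSO software by rescaling the design matrix by the initial estimates. A minimal Python sketch (scikit-learn is used for convenience; the tuning value 0.05 is illustrative rather than the BIC-selected λ_n used later in the paper):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta0 = np.array([1, 1, 0.5, 0.5] + [0.0] * (p - 4))
y = X @ beta0 + rng.normal(size=n)

# Initial (OLS) estimate of beta, used to build the adaptive weights
beta_init = LinearRegression().fit(X, y).coef_

# ALASSO via a weighted LASSO: rescaling column j by |beta_init_j| makes the
# uniform L1 penalty act as penalty / |beta_init_j| on the original coefficient
w = np.abs(beta_init)
fit = Lasso(alpha=0.05).fit(X * w, y)
beta_hat = fit.coef_ * w  # back-transform to the original scale
```

Coefficients whose initial estimates are small receive a heavy effective penalty and are typically set exactly to zero, while large signals are only lightly shrunk.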

2.1 Regularity Conditions

To ensure the asymptotic oracle properties of the regularized estimators and the validity of the proposed resampling procedures, we require the following set of conditions:

  • C1

ℙ{L(θ; D)} has a unique minimum at θ_0 and a continuous second derivative with a positive definite A = ∂²ℙ{L(θ; D)}/∂θ∂θ^T |_{θ=θ_0} > 0, where ℙ is the probability measure generated by the data 𝒟 = {D_i, i = 1, …, n}.

  • C2

The class of functions indexed by θ, {L(θ; D) | θ ∈ Ω}, is Glivenko–Cantelli (Kosorok 2008), where D = (y, x^T)^T and Ω is the compact parameter space containing θ_0.

  • C3

There exists a “quasi-derivative” function U(θ; D) for L(θ; D) such that for any positive sequence δ_n → 0:

1. ℙ{U(θ_0; D) U(θ_0; D)^T} = B, a positive definite matrix.

2. ℙ{L(θ; D) − L(θ_0; D) − U(θ_0; D)^T(θ − θ_0)} = ½(θ − θ_0)^T A (θ − θ_0) + o(‖θ − θ_0‖²), where ‖θ − θ_0‖ ≤ δ_n.

3. ℙ_n{L(θ_1; D) − L(θ_2; D) − U(θ_2; D)^T(θ_1 − θ_2)} = ½(θ_1 − θ_2)^T A (θ_1 − θ_2) + o(‖θ_1 − θ_2‖² + n^{-1/2}‖θ_1 − θ_2‖), almost surely, uniformly in ‖θ_1 − θ_0‖ ≤ δ_n, ‖θ_2 − θ_0‖ ≤ δ_n.

These conditions are parallel to the conditions required in Propositions A1–A3 in Jin et al. (2001). These regularity conditions hold for the commonly used L_2 minimization with L(β; D) = (y − β^T x)² and L_1 minimization with L(β; D) = |y − β^T x|. Details of the justification for these two cases can be found in Section 3 of Jin et al. (2001). These conditions also guarantee that θ̃ is a consistent estimator of θ_0 and that n^{1/2}(θ̃ − θ_0) converges in distribution to N(0, A^{-1} B A^{-1}). Let 𝒜 = {j : β_{0j} ≠ 0}, of size q, and 𝒜^c = {j : β_{0j} = 0}, where a_j denotes the jth component of a vector a.

Following arguments similar to those given in Zou (2006), Zou and Li (2008), and the unconditional arguments given in the appendix, θ̂ = argmin_θ L̂(θ) has ‘good’ properties for certain choices of λ_n, including the oracle property:

Lemma 1: (Oracle properties)

Suppose that λ_n → 0 and n^{1/2}λ_n → ∞. Then the regularized estimates must satisfy the following:

1. Consistency in variable selection: lim_{n→∞} ℙ{I(𝒜̂ = 𝒜) = 1} = 1, where 𝒜̂ = {j : β̂_j ≠ 0}.

2. Asymptotic normality: n^{1/2}(θ̂_𝒜 − θ_{0𝒜}) →_d N(0, A_{11}^{-1} B_{11} A_{11}^{-1}), where A_{11} and B_{11} are the respective q × q submatrices of A and B corresponding to 𝒜.

This lemma guarantees that the regularized estimate asymptotically chooses the correct model and has the optimal estimation rate. However, estimating the distribution of n^{1/2}(θ̂ − θ_0) in finite samples remains difficult. To estimate the standard errors of the SCAD estimates θ̂ = argmin_θ {L(θ) + Σ_{j=1}^p p_{λ_{nj}}(|β_j|)} when L(θ) = n^{-1} Σ_{i=1}^n L(θ; D_i) is smooth in θ, Fan and Li (2001) proposed a local quadratic approximation (LQA) method. This gives a sandwich estimator for the covariance matrix of the estimated nonzero parameters:

côv(θ̂_𝒜̂) = {∇²L(θ̂_𝒜̂) + Σ_λ(θ̂_𝒜̂)}^{-1} côv{∇L(θ̂_𝒜̂)} {∇²L(θ̂_𝒜̂) + Σ_λ(θ̂_𝒜̂)}^{-1}   (2)

where ∇L(θ̂_𝒜̂) = ∂L(θ̂_𝒜̂)/∂θ, ∇²L(θ̂_𝒜̂) = ∂²L(θ̂_𝒜̂)/∂θ∂θ^T, and Σ_λ(θ̂_𝒜̂) is a diagonal matrix with (j, j)th element I(β̂_j ≠ 0) p′_{λ_{nj}}(|β̂_j|)/|β̂_j|. The LQA approach can also be used to construct a covariance estimate for the ALASSO estimates, where p′_{λ_{nj}}(|β̂_j|) = n^{-1/2}λ_n|β̂_j|^{-1}. Similar to the covariance estimates in Tibshirani (1996) and Fan and Li (2001) for penalized estimates, this procedure estimates the standard errors for variables with β̂_j = 0 as 0. Although this sandwich estimator has been proven to be consistent (Fan and Peng 2004) under the linear regression model, it tends to underestimate the standard errors, and normal confidence regions (CRs) using this estimate often do not provide acceptable coverage in finite samples.
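Under squared-error loss, the sandwich formula (2) is straightforward to evaluate numerically. The sketch below is illustrative rather than the paper's exact implementation: the scaling conventions and the ALASSO-type form of Σ_λ (with derivative λ/|β̂_j|) are assumptions made for concreteness.

```python
import numpy as np

def lqa_sandwich(X, y, beta_hat, lam):
    """Sandwich covariance in the spirit of (2), restricted to the selected
    (nonzero) coefficients, for L(beta; D) = (y - x'beta)^2."""
    A = np.flatnonzero(beta_hat)         # estimated active set
    Xa, ba = X[:, A], beta_hat[A]
    n = len(y)
    H = 2 * Xa.T @ Xa / n                # Hessian of the average loss
    Sig = np.diag(lam / ba**2)           # Sigma_lambda: p'(|b_j|)/|b_j| with p'(|b_j|) = lam/|b_j|
    r = y - Xa @ ba                      # residuals
    U = -2 * Xa * r[:, None]             # per-observation score vectors
    B = np.cov(U, rowvar=False) / n      # covariance of the averaged score
    M = np.linalg.inv(H + Sig)
    return M @ B @ M

rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 4))
y = X @ np.array([1.0, 0.0, 0.5, 0.0]) + rng.normal(size=n)
V = lqa_sandwich(X, y, np.array([0.95, 0.0, 0.45, 0.0]), lam=0.05)
```

Only the selected coefficients receive a variance estimate; zeroed coefficients implicitly get standard error 0, which is exactly the behavior the perturbation method below is designed to improve upon.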

To approximate the covariance of θ̂ more accurately, we propose a perturbation method to estimate the distribution of n^{1/2}(θ̂ − θ_0) for a general class of objective functions and penalties. Let 𝒢 = {G_i, i = 1, …, n} be a set of independent and identically distributed (i.i.d.) positive random variables with mean and variance equal to one. We first perturb the initial objective function and obtain

L*(θ) = n^{-1} Σ_{i=1}^n L(θ; D_i)G_i,  and  θ̃* = argmin_θ L*(θ).   (3)

Then with the same set Inline graphic, we obtain the minimizer of a stochastically perturbed version of the regularized objective function:

L̂*(θ) = L*(θ) + Σ_{j=1}^p p′_{λ*_{nj}}(|β̃*_j|) |β_j|   (4)

where λ*_n satisfies the same order constraints as λ_n, as discussed in Lemma 1. In practice, one may select λ_n and λ*_n based on the BIC criterion detailed in the appendix with the corresponding objective functions. In the appendix we first show that n^{1/2}(θ̂*_𝒜 − θ_{0𝒜}) converges in distribution to N(0, A_{11}^{-1} B_{11} A_{11}^{-1}), the same limiting distribution as that of n^{1/2}(θ̂_𝒜 − θ_{0𝒜}). Furthermore, ℙ*(θ̂*_{𝒜^c} = 0) → 1, where ℙ* is the probability measure generated by both 𝒟 and 𝒢. In addition, we show that the distribution of n^{1/2}(θ̂*_𝒜 − θ̂_𝒜) conditional on the data can be used to approximate the unconditional distribution of n^{1/2}(θ̂_𝒜 − θ_{0𝒜}), and that ℙ*(θ̂*_{𝒜^c} = 0 | 𝒟) → 1. In practice, these results allow us to estimate the distribution of n^{1/2}(θ̂ − θ_0) by generating a large number, M say, of random samples of 𝒢. We obtain θ̂*_m by minimizing the perturbed objective function for each sample m = 1, …, M, and then approximate the distribution of θ̂ by the empirical distribution of {θ̂*_m, m = 1, …, M}. Specifically, the covariance matrix of θ̂ can be estimated by the sample covariance matrix constructed from {θ̂*_m, m = 1, …, M}.
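A hedged sketch of the perturbation scheme in (3)–(4) for least squares with the ALASSO penalty: each observation's contribution to the objective is reweighted by G_i ~ Exp(1) (mean = variance = 1) and the fit is repeated M times. For simplicity the penalty level is held at an illustrative fixed value rather than reselected by BIC within each perturbation.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def alasso(X, y, lam, weights=None):
    # ALASSO via a weighted LASSO; the initial estimate is (weighted) OLS
    init = LinearRegression().fit(X, y, sample_weight=weights).coef_
    w = np.abs(init)
    fit = Lasso(alpha=lam).fit(X * w, y, sample_weight=weights)
    return fit.coef_ * w

rng = np.random.default_rng(1)
n, p, M = 200, 10, 200
X = rng.normal(size=(n, p))
beta0 = np.array([1, 1, 0.5, 0.5] + [0.0] * (p - 4))
y = X @ beta0 + rng.normal(size=n)

beta_hat = alasso(X, y, lam=0.05)

# Perturb: multiply each observation's loss by G_i ~ Exp(1) and refit
boots = np.empty((M, p))
for m in range(M):
    G = rng.exponential(1.0, size=n)
    boots[m] = alasso(X, y, lam=0.05, weights=G)

se = boots.std(axis=0)          # perturbation standard error estimates
p0 = (boots == 0).mean(axis=0)  # proportion of perturbed estimates set to 0
```

The empirical covariance of the rows of `boots` estimates the covariance of θ̂, and the proportions `p0` drive the thresholded confidence regions described next.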

Estimating the distribution of n^{1/2}(θ̂ − θ_0) based on the distribution of n^{1/2}(θ̂* − θ̂) | 𝒟 leads to the construction of three possible (1 − α)100% confidence regions for θ_0. For the first, let σ̂*_j² = M^{-1} Σ_{m=1}^M (β̂*_{mj} − β̂_j)². We construct a normal CR for β_{0j}, CR_j^N, centered at β̂_j with standard deviation σ̂*_j. Since n^{1/2}(θ̂* − θ̂) | 𝒟 and n^{1/2}(θ̂ − θ_0) converge to the same normal distribution, nσ̂*_j² consistently estimates the variance of n^{1/2}(β̂_j − β_{0j}). This method is in contrast to CR^Asym, obtained with standard deviations σ̂_j^Asym estimated with the asymptotically consistent LQA sandwich estimator in Fan and Li (2001) and Zou (2006). In contrast to setting the standard error to 0 when β̂_j = 0, we set CR_j^N = {0} if the proportion of the β̂*_j equal to 0 is larger than a threshold p̂_high, where p̂_high → p_high < 1. This method accounts for the superefficiency due to the oracle property and results in a shorter interval with valid coverage. Secondly, we simply take the (α/2)100th and (1 − α/2)100th quantiles of the β̂*_j as the lower and upper bounds of CR_j^Q. For the third, we estimate the density of the β̂*_j with a kernel density estimator and choose the (1 − α)100% highest density region, CR_j^HDR. We estimate the density of β̂*_j | 𝒟 as a mixed distribution f̂_j(β) = P̂_{0j} I(β = 0) + (1 − P̂_{0j}) f̃_j(β), where P̂_{0j} is the proportion of the β̂*_j set to 0 and f̃_j(β) is the unknown density of β̂*_j given that it is not set to 0. To construct a highest density confidence region with accurate coverage under this mixed distribution, we adjust the definition of the region depending on thresholds that reflect the strength of evidence for β_{0j} = 0. Our highest density confidence region CR_j^HDR is defined as

CR_j^HDR =
  {0}                              if P̂_{0j} ≥ p̂_high
  {β : f̃_j(β) ≥ ĉ1} ∪ {0}          if p̂_low ≤ P̂_{0j} < p̂_high
  {β : f̃_j(β) ≥ ĉ2} ∪ {0}          if α ≤ P̂_{0j} < max(α, p̂_low)
  {β : f̃_j(β) ≥ ĉ3}                if P̂_{0j} < α

where ĉ1, ĉ2, and ĉ3 are chosen such that, for H(c) = ∫ I{f̃_j(β) ≥ c} f̃_j(β) dβ, we have H(ĉ1) = (1 − α − P̂_{0j})/(1 − P̂_{0j}), H(ĉ2) = 1 − α + α(P̂_{0j} + p̂_low), and H(ĉ3) = 1 − α, while p̂_low → 0 and p̂_high → p_high = 1 − α. When P̂_{0j}, the proportion of the β̂*_j set to zero, is greater than the upper threshold p̂_high, we have strong evidence that β_{0j} = 0 and thus take {0} as the confidence interval. When P̂_{0j} is between the low and high thresholds, we have moderately strong evidence that β_{0j} = 0 and thus take the mass at 0 together with a highest density region of mass H(ĉ1) from the samples with β̂*_j ≠ 0. The occurrence of α ≤ P̂_{0j} < max(α, p̂_low) suggests that β_{0j} is likely to be a weak signal. For such cases, it would be difficult to make inference about β_{0j} due to shrinkage. Thus, we inflate the highest density region from the samples with β̂*_j ≠ 0. Finally, when P̂_{0j} < α, we have strong evidence that β_{0j} is nonzero, and so we take the (1 − α) highest density region of the continuous empirical distribution of the nonzero β̂*_j samples. The justification of this method and the choices of p̂_high and p̂_low are relegated to the appendix.
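The four-way decision rule above can be summarized in code. This sketch (assuming p̂_low ≥ α) only classifies which case of CR_j^HDR applies for a given coefficient; in the full procedure the thresholds ĉ1, ĉ2, ĉ3 would be computed from a kernel density estimate of the nonzero perturbed draws.

```python
import numpy as np

def hdr_region_case(boots_j, p_low, p_high, alpha=0.05):
    """Classify which case of CR_j^HDR applies, given the perturbed
    draws for coefficient j (assumes p_low >= alpha)."""
    p0 = np.mean(boots_j == 0)     # hat P_{0j}: proportion set exactly to 0
    if p0 >= p_high:
        return "{0}"               # strong evidence that beta_0j = 0
    if p0 >= p_low:
        return "{f >= c1} U {0}"   # moderately strong evidence: HDR plus {0}
    if p0 >= alpha:
        return "{f >= c2} U {0}"   # likely weak signal: inflated HDR plus {0}
    return "{f >= c3}"             # strong evidence that beta_0j != 0
```

For example, a coefficient whose perturbed draws are all zero falls in the first case, while one whose draws are never zero falls in the last.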

In practice, when assessing the effects of multiple features, it is often important to adjust for multiple comparisons. For interval estimation, we may construct a (1 − α)100% simultaneous confidence region to cover the entire parameter vector θ_0. We may then make statements about the importance of each of the covariates in the presence of the other covariates while maintaining a type I error of α. For the regularized estimator, we define the normal simultaneous region as CR^Sim = Π_{j∉𝒜̂} {0} × Π_{j∈𝒜̂} (β̂_j − γ_α σ̂*_j, β̂_j + γ_α σ̂*_j), where 𝒜̂ = {j : P̂_{0j} < p̂_high} and γ_α is the (1 − α)100% quantile of {max_{j∈𝒜̂} |β̂*_{mj} − β̂_j|/σ̂*_j}_{m=1}^M. We define the (1 − α)100% HDR simultaneous region as CR^SimHDR = Π_j CR_{j,α_s}^HDR, where CR_{j,α_s}^HDR is the (1 − α_s) CR_j^HDR for β̂_j and α_s = 2(1 − Φ(γ_α)). We compare the performance of these confidence regions with numerical examples in Sections 3 and 4.
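The construction of the normal simultaneous region from the perturbed draws can be sketched as follows; the function name is ours, standard errors are taken as the sample standard deviations of the draws, and γ_α is the empirical quantile of the max standardized deviation over the active set, as above.

```python
import numpy as np
from math import erf, sqrt

def simultaneous_normal_region(boots, beta_hat, p_high, alpha=0.05):
    """CR^Sim from M x p perturbed draws: coefficients with hat P_{0j} >= p_high
    collapse to {0}; the rest get +/- gamma_alpha * se_j intervals."""
    p0 = (boots == 0).mean(axis=0)
    se = boots.std(axis=0)
    active = np.flatnonzero(p0 < p_high)            # estimated active set
    Z = np.abs(boots[:, active] - beta_hat[active]) / se[active]
    gamma = np.quantile(Z.max(axis=1), 1 - alpha)   # gamma_alpha
    lower = beta_hat[active] - gamma * se[active]
    upper = beta_hat[active] + gamma * se[active]
    alpha_s = 1 - erf(gamma / sqrt(2))              # = 2 * (1 - Phi(gamma))
    return active, lower, upper, gamma, alpha_s
```

The returned α_s is the per-coefficient level used to assemble the corresponding HDR simultaneous region.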

3. SIMULATION STUDIES

To examine the validity of our procedures in finite samples, we performed simulation studies to assess the performance of the corresponding confidence regions. For each setting, we simulated 1500 data sets with n observations generated under the linear model y = Xβ + ε, where x_{ij} ~ N(0, 1), the pairwise correlation between the covariates was set to cor(x_j, x_k) = ρ, ε_i ~ N(0, σ²), and β, ρ, and σ were varied between settings. In each setting, β was sparse and included medium and high signals. We obtained ALASSO estimators via LARS (Efron et al. 2004) for each simulated data set with OLS initial estimates and λ chosen by the BIC as described in the appendix, and then obtained M = 500 perturbed samples using our proposed method with 𝒢 generated from a mean 1 exponential distribution. The sample size n was set to 100, 200, 400, or 1000; ρ was 0, 0.2, or 0.5; and σ was 1 or 2. To compute the highest density regions CR*^HDR we utilized the hdrcde package in R with the “nrd” bandwidth estimator presented in Scott (1992), based on Silverman’s rule of thumb (Silverman 1986). We chose p̂_low = min{√(2/π) exp(−nλ/(4σ̂²)), 0.49} and p̂_high = min{1 − √(2/π) exp(−nλ/σ̂²), 0.95}, as justified in the appendix, for CR*^HDR, CR*^N, CR*^Sim, and CR*^SimHDR. We substituted the σ used in the standard deviation estimate from Zou (2006), analogous to equation (2), with the known σ from the simulations. We present the results for simulations with n = 100, 200, and 400 when σ = 1 or 2 and p = 10 or 20. In these cases, the true β_0 contains two large effects of β_{0j} = 1, two moderate effects of β_{0j} = 0.5, and six (for p = 10) or sixteen (for p = 20) noise parameters with β_{0j} = 0. To examine the effect of regularization, we compare our CRs for the regularized estimators to CR^OLS, the normal CR based on the empirical standard error of the perturbed ordinary least squares (OLS) estimates.
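The data-generating design above can be reproduced in outline as follows. This is a Python sketch (the paper's computations used R); the function name and the placement of the nonzero coefficients within β are illustrative assumptions.

```python
import numpy as np

def simulate_setting(n, p, rho, sigma, seed=0):
    """One simulated data set: compound-symmetric cor(x_j, x_k) = rho,
    errors N(0, sigma^2), and a sparse beta with two large (1), two
    moderate (0.5), and p - 4 zero coefficients."""
    rng = np.random.default_rng(seed)
    Sigma = np.full((p, p), float(rho))
    np.fill_diagonal(Sigma, 1.0)
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta = np.array([1, 1, 0.5, 0.5] + [0.0] * (p - 4))
    y = X @ beta + sigma * rng.normal(size=n)
    return X, y, beta
```

Each of the paper's settings corresponds to one choice of (n, p, ρ, σ), with 1500 replicates per setting.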

In Tables 1 and 2 we see that when σ = 1, most regions perform similarly for nonzero parameters. When σ = 2, the perturbation regions usually have higher coverage than CR^Asym and sacrifice very little in length. The asymmetric CR*^HDR has the shortest length when β_{0j} = 0 in all settings. Coverage for CR*^HDR and the simultaneous confidence regions can be low when n = 100 due to the difficulty of estimating P̂_{0j} at such a small sample size, but coverage reaches nominal levels by n = 200. The standard deviation estimate from Zou (2006), σ̂^Asym (also see Table 3), is not large enough to cover β_{0j} sufficiently, and while the coverage probability of CR^OLS is not extremely low, it is notably outperformed by the other confidence regions when β_{0j} = 0. We omit the results from the settings where n = 1000 because they show similar patterns to those with n = 400. For these large sample cases with n ≥ 400 we saw convergence to 95% coverage for the normal CRs, highest density regions, and OLS CRs in all settings when the true parameter was nonzero. For true zero parameters, the coverage probabilities of our confidence regions converged to 1, while the OLS CR converged to 0.95. A tradeoff associated with our method is that while the coverage of our perturbation confidence regions tends to be higher than that of CR^OLS and CR^Asym, some power is sacrificed for moderate signals of β_{0j} = 0.5. This loss is minimal, however, and only appears in difficult cases when the sample size is low and ρ and σ are high. When β_{0j} = 0, CR^OLS has coverage lower than 95% for small samples, while our methods produce regions with coverage probability near 1 and very short lengths, reflecting the oracle properties. Overall, the greatest disparity between our methods and previous methods is seen when the SNR is low.

Table 1.

Coverage probabilities (lengths) of confidence regions when σ = 1.

p β0 n = 100 n = 200 n = 400
ρ = 0 ρ = 0.2 ρ = 0.5 ρ = 0 ρ = 0.2 ρ = 0.5 ρ = 0 ρ = 0.2 ρ = 0.5
10 1 CR*N 91.6 (38) 92.7 (41) 92.5 (52) 92.9 (27) 94.1 (29) 94.2 (37) 93.9 (19) 95.1 (21) 94.9 (26)
CR*HDR 91.4 (38) 91.7 (40) 91.5 (51) 92.4 (27) 93.7 (29) 93.9 (36) 93.4 (19) 94.5 (21) 94.1 (26)
CR*Q 91.7 (38) 91.0 (40) 91.5 (51) 92.5 (27) 93.6 (29) 93.8 (36) 93.5 (19) 94.3 (20) 93.9 (26)
CRAsym 93.9 (41) 94.1 (43) 93.7 (53) 94.1 (28) 94.2 (30) 94.1 (36) 94.4 (20) 95.0 (21) 94.9 (25)
CROLS 91.4 (38) 91.7 (40) 90.6 (51) 93.0 (27) 93.6 (29) 93.9 (36) 93.0 (19) 94.3 (21) 93.7 (26)
0.5 CR*N 93.0 (40) 93.3 (43) 92.0 (54) 93.9 (28) 94.6 (30) 95.1 (38) 93.7 (20) 94.2 (21) 95.2 (27)
CR*HDR 91.9 (38) 92.5 (41) 93.4 (51) 93.5 (27) 94.0 (29) 93.8 (37) 93.7 (19) 93.8 (21) 94.7 (26)
CR*Q 91.7 (39) 92.3 (42) 90.7 (53) 93.3 (27) 93.7 (29) 93.9 (37) 93.8 (19) 93.6 (21) 94.7 (26)
CRAsym 93.3 (41) 93.5 (43) 91.5 (52) 94.5 (28) 94.3 (30) 93.7 (36) 95.0 (20) 94.0 (21) 94.1 (25)
CROLS 92.4 (38) 91.6 (41) 90.9 (50) 93.5 (27) 94.3 (29) 93.8 (36) 94.3 (19) 93.7 (21) 94.9 (26)
0 CR*N 97.6 (23) 98.5 (25) 98.1 (31) 98.4 (17) 98.3 (19) 97.9 (23) 98.7 (13) 98.7 (13) 98.7 (16)
CR*HDR 99.1 (17) 99.3 (18) 99.4 (23) 99.6 (12) 99.3 (13) 99.0 (16) 99.7 (8) 99.3 (8) 99.5 (11)
CR*Q 99.5 (31) 99.7 (33) 99.8 (43) 99.7 (22) 99.7 (24) 99.7 (30) 99.8 (16) 99.7 (17) 99.8 (21)
CROLS 92.9 (38) 92.9 (40) 91.7 (51) 93.4 (27) 93.0 (29) 92.6 (36) 93.4 (19) 93.7 (21) 93.6 (26)
CR*SimHDR 91.9 (36) 92.5 (39) 91.7 (49) 93.1 (26) 94.9 (28) 93.8 (36) 94.9 (19) 94.8 (20) 96.0 (25)
CR*Sim 92.5 (42) 92.9 (46) 91.9 (58) 93.7 (31) 95.5 (33) 95.2 (42) 95.5 (23) 95.7 (24) 96.6 (30)
CR*SimOLS 87.1 (54) 86.5 (58) 85.9 (72) 89.8 (38) 90.0 (41) 90.9 (52) 92.3 (28) 91.6 (30) 92.6 (37)
20 1 CR*N 91.7 (38) 92.4 (42) 92.5 (53) 93.3 (27) 92.9 (30) 94.3 (38) 95.4 (19) 94.6 (21) 94.5 (27)
CR*HDR 90.2 (37) 90.9 (41) 90.8 (51) 92.4 (26) 91.9 (29) 91.9 (36) 95.1 (19) 93.6 (21) 93.3 (26)
CR*Q 90.2 (37) 90.7 (41) 90.7 (51) 92.3 (26) 92.3 (29) 91.9 (36) 95.1 (19) 93.2 (21) 93.1 (26)
CRAsym 93.9 (41) 94.5 (44) 93.3 (54) 95.1 (28) 93.9 (30) 94.0 (37) 95.9 (20) 94.7 (21) 93.3 (26)
CROLS 90.3 (38) 89.9 (42) 90.0 (52) 92.6 (27) 92.1 (29) 92.0 (37) 95.3 (19) 93.4 (21) 93.3 (26)
0.5 CR*N 91.1 (40) 91.5 (44) 91.0 (56) 93.7 (28) 93.5 (30) 92.7 (39) 93.1 (20) 94.7 (21) 95.0 (27)
CR*HDR 89.7 (38) 90.3 (42) 92.5 (52) 93.1 (27) 93.1 (30) 91.7 (38) 92.9 (19) 93.3 (21) 94.4 (27)
CR*Q 89.7 (39) 89.7 (43) 89.2 (54) 92.7 (27) 92.5 (30) 91.5 (38) 92.7 (19) 93.5 (21) 94.5 (27)
CRAsym 91.7 (41) 92.0 (44) 89.7 (53) 94.5 (28) 93.2 (30) 92.3 (37) 93.7 (20) 94.3 (21) 94.2 (26)
CROLS 89.7 (38) 89.7 (42) 89.6 (52) 92.9 (27) 92.9 (29) 91.8 (37) 92.9 (19) 92.7 (21) 94.2 (26)
0 CR*N 96.6 (29) 97.3 (32) 96.8 (40) 98.6 (21) 98.5 (23) 98.7 (29) 98.8 (15) 99.1 (17) 99.0 (21)
CR*HDR 97.7 (25) 98.3 (28) 98.3 (34) 98.9 (17) 99.3 (19) 99.1 (24) 99.3 (12) 99.5 (13) 99.4 (16)
CR*Q 99.0 (31) 99.4 (34) 99.2 (43) 99.5 (22) 99.9 (24) 99.5 (30) 99.8 (15) 99.9 (17) 99.7 (21)
CROLS 89.5 (38) 90.1 (41) 90.1 (52) 92.4 (27) 92.2 (29) 92.6 (37) 94.7 (19) 93.8 (21) 94.2 (26)
CR*SimHDR 90.5 (42) 92.0 (47) 92.1 (58) 95.4 (31) 95.7 (34) 95.3 (44) 96.9 (22) 96.9 (25) 96.9 (32)
CR*Sim 92.5 (51) 91.9 (57) 91.7 (71) 96.5 (38) 97.5 (42) 96.5 (54) 97.7 (28) 97.8 (31) 98.1 (40)
CR*SimOLS 80.1 (59) 78.4 (64) 77.9 (80) 87.5 (41) 87.1 (45) 84.9 (57) 90.6 (29) 90.0 (32) 91.1 (41)

NOTE: We multiply values by 100. The lengths of the simultaneous confidence regions are averaged over the number of parameters.

Table 2.

Coverage probabilities (lengths) of confidence regions when σ = 2.

p β0 n = 100 n = 200 n = 400
ρ = 0 ρ = 0.2 ρ = 0.5 ρ = 0 ρ = 0.2 ρ = 0.5 ρ = 0 ρ = 0.2 ρ = 0.5
10 1 CR*N 92.6 (79) 94.3 (85) 92.9 (110) 93.4 (55) 94.4 (59) 94.9 (76) 94.4 (39) 94.2 (42) 94.7 (53)
CR*HDR 91.7 (76) 93.9 (82) 94.0 (104) 92.7 (55) 94.2 (59) 94.4 (74) 94.0 (39) 93.6 (42) 93.7 (52)
CR*Q 91.9 (77) 93.7 (84) 91.4 (107) 92.7 (55) 93.8 (59) 94.3 (75) 93.9 (39) 93.4 (42) 93.8 (52)
CRAsym 80.7 (57) 82.7 (60) 78.8 (73) 82.3 (40) 83.6 (42) 80.5 (51) 83.7 (28) 83.5 (30) 80.9 (36)
CROLS 91.7 (75) 93.4 (81) 91.1 (102) 92.7 (54) 93.4 (58) 94.0 (73) 94.0 (39) 93.7 (42) 93.4 (52)
0.5 CR*N 87.5 (80) 87.6 (86) 81.3 (100) 93.5 (59) 94.2 (63) 90.3 (79) 94.7 (41) 95.7 (44) 94.5 (57)
CR*HDR 90.3 (71) 91.4 (76) 83.9 (88) 95.9 (54) 96.5 (58) 92.3 (69) 93.5 (39) 94.5 (42) 96.5 (52)
CR*Q 91.5 (76) 92.3 (81) 90.5 (97) 93.4 (57) 93.9 (61) 92.8 (75) 94.0 (40) 94.6 (43) 94.0 (56)
CRAsym 76.5 (51) 76.3 (53) 67.3 (57) 78.8 (39) 78.8 (42) 78.3 (48) 79.9 (28) 80.9 (29) 78.1 (36)
CROLS 91.1 (75) 92.1 (81) 90.7 (101) 93.0 (54) 93.5 (58) 93.1 (72) 94.1 (38) 94.5 (41) 93.5 (52)
0 CR*N 97.1 (47) 97.1 (50) 97.7 (65) 98.1 (33) 98.0 (37) 98.1 (47) 98.3 (25) 98.4 (27) 98.8 (34)
CR*HDR 98.3 (40) 98.2 (40) 99.1 (52) 99.0 (25) 99.4 (28) 99.5 (36) 99.4 (17) 99.5 (19) 99.6 (24)
CR*Q 99.1 (64) 98.9 (68) 99.4 (86) 99.7 (45) 99.7 (49) 99.7 (61) 99.9 (32) 99.7 (34) 99.9 (43)
CROLS 91.3 (75) 91.2 (81) 91.7 (102) 92.9 (54) 92.5 (58) 92.3 (73) 94.1 (39) 94.7 (42) 94.2 (52)
CR*SimHDR 85.3 (71) 85.7 (77) 75.1 (96) 94.1 (51) 94.4 (56) 90.8 (70) 96.3 (38) 95.2 (41) 95.2 (52)
CR*Sim 84.1 (83) 84.1 (91) 73.9 (116) 93.4 (60) 93.5 (66) 90.1 (84) 96.6 (45) 95.9 (49) 95.8 (62)
CR*SimOLS 85.2 (108) 86.9 (116) 87.5 (146) 90.9 (77) 90.7 (83) 91.1 (104) 92.7 (55) 92.5 (59) 92.6 (74)
20 1 CR*N 91.2 (80) 91.7 (87) 90.0 (112) 92.1 (55) 93.9 (60) 92.9 (78) 94.3 (39) 93.7 (43) 94.6 (54)
CR*HDR 90.2 (76) 91.2 (83) 92.1 (104) 92.1 (54) 93.1 (59) 91.7 (75) 94.1 (38) 93.1 (42) 93.9 (53)
CR*Q 90.1 (77) 90.7 (84) 89.6 (107) 92.1 (54) 93.3 (59) 91.9 (76) 94.4 (38) 92.6 (42) 93.8 (53)
CRAsym 79.2 (58) 78.9 (62) 75.9 (76) 80.3 (40) 82.2 (43) 77.3 (53) 83.5 (28) 80.5 (30) 81.9 (37)
CROLS 89.7 (76) 90.1 (83) 89.7 (104) 92.3 (54) 92.9 (58) 91.3 (74) 94.4 (38) 92.9 (42) 93.8 (53)
0.5 CR*N 86.1 (81) 82.7 (86) 80.7 (103) 91.6 (59) 91.9 (64) 88.4 (81) 93.9 (41) 94.9 (45) 93.1 (58)
CR*HDR 91.0 (74) 87.8 (79) 84.7 (94) 95.7 (55) 94.7 (59) 92.1 (73) 93.3 (39) 94.3 (43) 96.1 (54)
CR*Q 89.1 (75) 88.8 (80) 88.6 (97) 92.0 (57) 92.1 (61) 90.9 (75) 93.3 (40) 94.1 (44) 92.5 (56)
CRAsym 72.5 (50) 71.6 (51) 64.2 (57) 77.2 (39) 76.6 (41) 73.8 (48) 81.3 (28) 79.2 (30) 76.8 (36)
CROLS 89.5 (76) 88.9 (82) 89.3 (104) 92.5 (54) 92.5 (59) 91.8 (74) 94.1 (38) 94.4 (42) 93.1 (53)
0 CR*N 97.3 (57) 96.8 (61) 97.0 (79) 98.5 (40) 97.3 (45) 97.7 (58) 98.8 (29) 98.9 (33) 98.9 (41)
CR*HDR 97.9 (53) 97.5 (55) 97.7 (70) 99.1 (35) 98.2 (39) 98.7 (50) 99.3 (24) 99.3 (27) 99.3 (33)
CR*Q 98.9 (63) 98.7 (69) 98.8 (87) 99.6 (44) 99.2 (49) 99.3 (62) 99.6 (31) 99.7 (34) 99.7 (43)
CROLS 90.1 (76) 90.2 (83) 89.8 (104) 92.8 (54) 91.9 (59) 91.9 (74) 93.5 (38) 93.5 (42) 93.7 (53)
CR*SimHDR 87.1 (81) 83.6 (89) 78.1 (112) 95.1 (60) 94.4 (66) 93.7 (84) 97.1 (45) 97.1 (50) 97.3 (62)
CR*Sim 86.1 (99) 82.8 (109) 78.3 (138) 94.3 (73) 94.4 (81) 94.1 (103) 97.8 (55) 98.3 (61) 97.1 (77)
CR*SimOLS 77.7 (117) 76.3 (128) 76.6 (160) 86.1 (82) 85.4 (90) 86.9 (114) 90.5 (59) 89.8 (64) 90.3 (81)

NOTE: We multiply values by 100. The lengths of the simultaneous confidence regions are averaged over the number of parameters.

Table 3.

Empirical s.d. of the parameter estimates (σ̃) and average s.e. estimates (σ̂).

p β0 n = 100 n = 200 n = 400
ρ= 0 ρ = 0.2 ρ = 0.5 ρ = 0 ρ = 0.2 ρ = 0.5 ρ = 0 ρ = 0.2 ρ = 0.5
10 1 σ̃ 21.7 22.2 29.9 14.7 15.2 19.3 10.1 10.8 13.7
σ̃OLS 21.1 21.6 28.5 14.4 15.2 19.1 10.1 10.9 13.9
σ̂* 20.0 21.7 28.1 14.1 15.2 19.4 10.0 10.8 13.5
σ̂OLS 19.1 20.6 26.0 13.8 14.8 18.5 9.8 10.6 13.2
σ̂Asym 14.7 15.4 18.7 10.2 10.8 13.1 7.1 7.5 9.2
0.5 σ̃ 24.2 25.5 32.3 16.1 16.8 21.8 10.6 11.2 14.7
σ̃OLS 21.6 22.8 29.2 14.8 15.5 19.4 10.2 10.8 13.9
σ̂* 21.1 22.8 27.9 15.1 16.2 20.5 10.3 11.2 14.5
σ̂OLS 19.1 20.7 25.9 13.8 14.8 18.4 9.8 10.6 13.2
σ̂Asym 13.0 13.6 14.5 10.0 10.6 12.1 7.1 7.5 9.1
0 σ̃ 17.0 17.2 20.3 10.7 11.5 14.2 6.9 7.2 9.2
σ̃OLS 22.4 23.5 28.4 14.9 16.2 19.9 10.2 10.9 13.8
σ̂* 18.6 19.9 25.1 13.2 14.2 17.8 9.4 10.1 12.6
σ̂OLS 19.2 20.6 26.1 13.8 14.9 18.5 9.9 10.6 13.3
σ̂Asym 5.0 4.5 5.4 2.6 2.9 3.5 1.5 1.5 2.0
20 1 σ̃ 23.2 24.8 33.4 15.4 16.2 21.6 9.9 11.4 14.0
σ̃OLS 23.0 24.6 31.7 15.4 16.1 21.3 9.8 11.4 13.9
σ̂* 20.3 22.2 28.5 14.1 15.3 19.9 9.9 10.9 13.9
σ̂OLS 19.3 21.1 26.5 13.7 14.9 18.9 9.7 10.7 13.4
σ̂Asym 14.9 15.9 19.4 10.3 10.9 13.5 7.1 7.6 9.4
0.5 σ̃ 24.7 27.1 32.8 16.4 18.1 23.1 10.5 11.7 15.7
σ̃OLS 22.8 25.3 31.6 15.1 16.6 20.8 10.0 11.2 14.4
σ̂* 21.2 22.8 28.1 15.2 16.5 20.8 10.5 11.5 14.9
σ̂OLS 19.3 21.0 26.4 13.7 14.9 18.8 9.8 10.7 13.5
σ̂Asym 12.8 13.1 14.5 10.1 10.5 12.2 7.2 7.6 9.2
0 σ̃ 15.3 16.7 21.1 9.3 10.8 13.7 6.1 6.6 8.1
σ̃OLS 22.5 25.0 31.7 14.7 16.6 21.2 10.2 11.4 14.1
σ̂* 18.5 20.2 25.6 12.9 14.2 18.1 9.1 10.2 12.7
σ̂Asym 4.7 5.1 6.1 2.6 2.8 3.5 1.4 1.6 1.9

NOTE: We present results for settings where σ, the standard deviation of ε, is 2. All values are multiplied by 100. Note that σ̂_j^Asym = 0 when β̂_j = 0, but β̂_j and β̂*_j are not always 0 in the simulations, and therefore the average σ̂_j^Asym is nonzero.

The coverage probabilities and lengths of our simultaneous confidence regions are also displayed in Tables 1 and 2. We compared our methods to CR*^SimOLS, constructed analogously to CR*^Sim except that 𝒜̂ = {j : j = 1, …, p}, CR*^SimOLS is centered at the OLS estimates, and the standard error is the sample standard deviation of the perturbed OLS estimates. Our regularized CR*^Sim and CR*^SimHDR have the advantage of shrinking the dimension of the region by reducing some CRs to the point {0} when P̂_{0j} is large. We see that our CR*^Sim and CR*^SimHDR outperform CR*^SimOLS in coverage and have shorter lengths. For large sample settings with n = 1000, CR*^SimOLS converges further toward 95% coverage, with levels around 90% for p = 20, while CR*^Sim and CR*^SimHDR have coverage almost always over 95%.

In Table 3 we also present the standard error estimates when σ = 2. For notation, let the empirical standard deviations of the estimators β̂j and β̃j be denoted by σ̃j and σ̃jOLS, respectively. We see that our perturbation-based standard error estimate σ̂*j does well in estimating σ̃j. However, the standard error σ̂jAsym proposed by Zou (2006) underestimates the true standard error of the parameter estimates, especially when σ = 2 and β0j = 0.5 or 0. When the SNR is higher, σ̂jAsym estimates σ̃j well except when β0j = 0, because σ̂jAsym = 0 there whereas σ̃j and σ̂*j are clearly nonzero.

4. EXAMPLE: HIV DRUG RESISTANCE

We illustrate our methods in a real example using the HIV antiretroviral drug susceptibility data described in Rhee et al. (2003). This dataset was refined from the Stanford HIV Drug Resistance Database (available at http://hivdb.stanford.edu/), and is used to study the association of protease mutations with susceptibility to the protease inhibitor antiretroviral (ARV) drug amprenavir. The data consist of mutation information at 99 protease codons in the viral genome, of which 79 contain mutations, and ARV drug resistance assays for n = 702 HIV infected patients. Drug resistance was measured in units of IC50, the amount of drug needed to inhibit viral replication by 50%, expressed as fold increase compared to drug-sensitive wildtype virus. Researchers are interested in determining which protease mutations are associated with ARV resistance so that they may develop a genotype test for resistance that looks for these mutations in the patient's infecting HIV strain. Therefore, we aim to examine the effect of the presence of any of the mutations at the 79 codons on IC50, where higher IC50 measurements indicate higher levels of drug resistance. We log-transformed the non-negative IC50 outcome and represented the presence of each mutation as a binary predictor in our regression model. We removed the fifteen mutations that occurred in less than 0.5% of the samples. Recently, Wu (2009) analyzed these data with a permutation test for the regression coefficients of LASSO. Here, we analyze the data using ALASSO and draw inference by using our perturbation methods to construct CRs and standard errors.

For this analysis, we used LARS to fit an ALASSO linear model with initial parameters β̃ estimated by OLS, and with λ and λ* chosen to minimize the BIC as described in the appendix. We generated M = 500 perturbation sets, each consisting of n = 702 i.i.d. variables from an exponential distribution with mean and variance equal to 1, and for each set we minimized the perturbed objective function to obtain β̂*(m). We constructed 95% CRs using our perturbation method and compared the inference from CR*N, CR*HDR, and CR*Q to that from CRAsym and CROLS. We estimated the σ used in the standard deviation estimate of Zou (2006), analogous to equation (2), with the residual variance from the unregularized linear regression model, and chose p̂high and p̂low as described in the simulation studies.
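A minimal sketch of this perturbation scheme, assuming simulated data in place of the HIV dataset and a plain coordinate-descent LASSO solver in place of LARS (all names and settings here are illustrative, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)

def lasso_cd(X, y, lam, n_iter=200):
    """Plain LASSO via cyclic coordinate descent for ||y - Xb||^2 + lam*sum|b_j|."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    r = y.astype(float).copy()                       # residual for b = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            r += X[:, j] * b[j]                      # add coordinate back
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / col_ss[j]
            r -= X[:, j] * b[j]
    return b

def perturbed_alasso(X, y, lam, G=None):
    """One ALASSO fit of sum_i G_i (y_i - x_i'b)^2 + lam * sum_j |b_j|/|btilde_j|,
    with btilde the (perturbed) initial OLS estimate; G=None gives the unperturbed fit."""
    n = len(y)
    w = np.sqrt(np.ones(n) if G is None else G)
    Xw, yw = X * w[:, None], y * w                   # absorb weights G_i
    btilde = np.linalg.lstsq(Xw, yw, rcond=None)[0]  # initial estimate
    Z = Xw * np.abs(btilde)                          # column-rescaling trick
    return lasso_cd(Z, yw, lam) * np.abs(btilde)     # back to original scale

# Toy data and a small perturbation run (M kept small for speed)
n, p = 200, 5
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + 0.5 * rng.standard_normal(n)
beta_hat = perturbed_alasso(X, y, lam=2.0)
beta_star = np.array([perturbed_alasso(X, y, 2.0, rng.exponential(1.0, n))
                      for _ in range(100)])
se_perturb = beta_star.std(axis=0, ddof=1)           # perturbation-based SEs
```

The column-rescaling step turns the adaptively weighted penalty into an ordinary LASSO problem, which is why a single generic solver suffices for every perturbed fit.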

We present a graphical summary of the analysis results in Figure 1. Previous studies by Prado et al. (2002) and results collected by Johnson et al. (2005) found that mutations at codons 10, 32, 46, 47, 50, 54, 73, 82, 84 and 90 emerge in amprenavir-resistant viral genomes. Using a permutation-based p-value adjusted for multiple testing, Wu (2009) determined these mutations (except 73 and 82), as well as additional codon mutations, to be significantly associated with amprenavir susceptibility at the α = 0.05 level, for a total of thirteen significant associations. The ALASSO estimator obtained with λ = 0.56 from BIC estimated 36 coefficients as nonzero. The confidence region from nonregularized estimates, CROLS, was significant for twenty-six mutations. Our perturbation-based CR*N, CR*HDR, and CR*Q for the mutations found significant by Wu (2009) did not include zero, and three new mutations (37, 64, 93) had significant perturbation confidence regions. We see in Figure 2 that the parameters for codons 71 and 89 have marginally significant Normal and HDR confidence regions and marginally nonsignificant quantile confidence regions; the corresponding proportion of zero perturbed estimates is marginally close to 0.05.

Figure 1. Perturbation methods results denoting significant associations between genetic mutations and drug susceptibility.

Figure 2. 95% perturbation CRs (CR*N and CR*HDR) for the association between genetic mutations and antiretroviral drug susceptibility. Estimated coefficients β̂j are represented with a circle on each CR line, and a star at zero signifies that the CR includes the point mass at zero. The shaded region denotes the simultaneous confidence regions CR*Sim and CR*SimHDR. Note that even coefficients estimated as zero may have CRs around their estimates, and that CR*HDR may be asymmetric and noncontiguous.

Our use of ALASSO provides estimates of the effect of each mutation while adjusting for the presence of other mutations. Several studies have shown that mutations associated with resistance to protease inhibitors can have varying effects when combined with other mutations (Schumi and DeGruttola 2008; Van Marck et al. 2009). For instance, the mutation at codon 32 has been found to have no effect on resistance to the protease inhibitor drug darunavir when a mutation at codon 84 is present (Van Marck et al. 2009). Our method allows us to estimate the size of associations without orthogonalizing predictors, and we adjust for multiple testing with the simultaneous confidence region CR*Sim. Differences from previous results may reflect that some studies summarized in Johnson et al. (2005) did not adjust for other mutations, and that Wu (2009) used LASSO estimators, which do not have oracle properties. Our methods highlight three new mutations that had not previously been found to be associated with drug susceptibility. Furthermore, our methods produce CRs for the coefficients of mutations that were estimated as zero. These CRs quantify the uncertainty in our estimation and can aid scientists who wish to conduct future drug therapy studies involving these codons.

5. DISCUSSION

In this paper, we address the problem of constructing a covariance estimate for parameter estimates obtained with a general objective function and concave penalty functions including adaptive LASSO and SCAD. The proposed methods for covariance estimates are simple to implement and possess the attractive property that parameters estimated as zero have nonzero standard errors. We may then construct confidence regions for each parameter estimate and obtain more meaningful inference.

We have shown through extensive simulation studies using the ALASSO penalty that our perturbation method results in confidence regions with accurate coverage probabilities. The perturbation-based Normal CR sacrifices little in length and has reasonable coverage for small sample sizes. We set the CR to {0} when the proportion of perturbed estimates set to 0 exceeds a threshold, thereby shortening the length by exploiting the oracle property. The perturbation-based highest density region has even shorter length and good coverage probability, especially for the moderate signal β0j = 0.5, in comparison to all other confidence regions. The asymptotic Normal interval based on the standard error estimate of Zou (2006) fails to reach nominal coverage levels due to underestimation of the standard error, most notably because the standard error is estimated as 0 whenever β̂j = 0. In contrast, our perturbation-based estimate of the standard error is close to the empirical standard error of the ALASSO estimates, even for parameters estimated as 0. Additionally, we propose two types of simultaneous CRs that adjust for multiple comparisons. We again utilize the oracle property and reduce the dimension of the region by setting component intervals to {0} when the proportion of zero perturbed parameter estimates is high. Therefore, the average length of our Normal simultaneous region will often be shorter than that of the simultaneous OLS region. For instance, when all covariates are independent, the OLS length is asymptotically proportional to γOLS = max_{1≤j≤p} |(β̃j − β0j)/σ|, whereas the perturbation region length is asymptotically proportional to (q/p)γ, where γ = max_{j: β0j ≠ 0} |(β̂j − β0j)/σ|. Note that γ ≤ γOLS, and so the length of the perturbation region will be shorter than the OLS length when the true model is sparse. Similarly, when the covariates are not independent, {(β̃j − β0j)/σ}_{j=1}^p is approximately N(0, Corr(β̃)), and the perturbation region generally has shorter average length than the OLS region. Simple simulations show that when q parameters are estimated as nonzero, we expect the perturbation region length to be approximately 0.36 times the OLS region length when p = 10 and q = 4, and approximately 0.16 times the OLS region length when p = 20 and q = 4, in both the independent case and the compound symmetry case with ρ = 0.5 and σ = 1. However, in finite samples, the gain in interval length for the shrinkage estimators may be substantially less than the theoretical gain, as the oracle properties may be far from holding and the intervals may need to be enlarged to ensure proper coverage levels.
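A rough Monte Carlo check of the independent-case length comparison (our own sketch; it approximates the maxima by independent standard normals and ignores finite-sample shrinkage effects, so the resulting ratio is of the same order as, though not identical to, the figures quoted above):

```python
import numpy as np

rng = np.random.default_rng(2)

# Independent case: the OLS simultaneous length scales with E max_{j<=p}|Z_j|,
# while the perturbation region's average length scales with (q/p) times the
# expected max over the q nonzero components (zero components collapse to {0}).
p, q, B = 10, 4, 20000
Z = np.abs(rng.standard_normal((B, p)))
gamma_ols = Z.max(axis=1).mean()          # E max over all p components
gamma = Z[:, :q].max(axis=1).mean()       # E max over the q nonzero ones
ratio = (q / p) * gamma / gamma_ols       # expected length ratio
```

The ratio shrinks both because the maximum over fewer components is stochastically smaller and because only q of the p intervals have nonzero length.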

When the SNR is low, much larger sample sizes may be required for the resampling procedure to yield confidence intervals with proper coverage levels. We conducted further simulations for the case β = (1, 1, 0.1, 0.1, 0_{1×(p−4)}). In general, we find that the standard error estimates perform well even with sample sizes around 100. The confidence intervals have reasonable coverage levels for β3 when σ = 1 and the correlation ρ is low, with sample size 400 or larger. For example, when σ = 1, ρ = 0.2, and p = 20, the coverage level of the 95% CR*HDR of β3 is about 90% for n = 400 and 94% for n = 1000. As we increase ρ and σ, interval estimation of β3 becomes more difficult. For example, in the most difficult case with σ = 2, ρ = 0.5, and p = 20, the empirical coverage level of CR*HDR is about 60%, 84%, and 90% for n = 400, 1000, and 2000, respectively. This is a particularly hard case, as it has been shown that estimating the distribution function of an ALASSO-type estimator is not feasible when the effect size is of magnitude similar to n^{−1/2} (Pötscher and Schneider 2009). Note that when σ = 2, the effect size corresponding to β3 is 0.05, whereas n^{−1/2} = 0.1 when n = 100 and n^{−1/2} ≈ 0.032 when n = 1000.

Additionally, it is well known that regularized estimators, while possessing asymptotic oracle properties, are prone to bias in finite samples. Bias correction for the ALASSO estimator can be achieved based on our perturbation samples; we present the technical details of the bias estimation in the appendix. We find that this bias correction works well in practice, especially when the signal is small or moderate, as when β0j = 0.5. For example, in our simulations with p = 20, n = 200, ρ = 0.2, σ = 2, and β0j = 0.5, the bias of β̂j is −0.067 while the bias of β̂jBC is −0.034. Similar gains are seen in most settings. The bias-corrected estimator has empirical standard error similar to that of the original ALASSO estimator but with smaller bias. Analogous bias-corrected estimators could be constructed for other penalties and objective functions. The model size with the ALASSO and bias-corrected ALASSO estimators in our simulations is close to 5 when σ = 1, except in the difficult cases with n = 100 and p = 20, for which the average model size is closer to 5.5. In the low-SNR settings with σ = 2, the oracle property is weak in finite samples, and the model size is between 5 and 6 when p = 10 and between 6 and 9 when p = 20.

We note that when p is large relative to n, initial parameter estimates obtained with ridge regression can produce more stable results. Furthermore, our results may be extended to the case where p tends to ∞ at some rate slower than n. We expect that the theory could be derived using similar arguments as given in Fan and Peng (2004) and Zou and Zhang (2009). Lastly, we note that our methods are robust to misspecification of the model and are valid provided that regularity conditions given in Section 2 hold.

Acknowledgments

The authors thank the editor, the associate editor, and two referees for their insightful and constructive comments that greatly improved the article.

This research was supported by National Institutes of Health Grants T32 AI007358, R01 GM079330, R01 HL089778 and DMS 0854970.

APPENDIX

A.1 Justification for the resampling method

To show that the distribution of n^{1/2}(θ̂ − θ0) can be estimated by that of n^{1/2}(θ̂* − θ̂) | X under conditions C1–C3, we first consider the distribution of n^{1/2}(θ̂* − θ0) under the product probability measure ℙ* generated by the data 𝒟 and the perturbation variables 𝒢 = {G_i, i = 1, …, n}. Throughout, we assume that the parameter space for θ, denoted by Ω, is a compact set and that θ0 is an interior point of Ω. Note that this compactness condition may be nontrivial in practice. The condition is needed for this proof of our proposed method, and the validity of the method without it warrants further investigation. We let ℙn denote the empirical measure generated by 𝒟, and 𝔾n = n^{1/2}(ℙn − ℙ). We use →p to denote convergence in probability.

We first show that θ̃* →p θ0, where θ̃* is the perturbed initial parameter estimate obtained by minimizing the perturbed unregularized objective function L̃*(θ) in (3). For f ∈ {L(θ; D) : θ ∈ Ω}, denote the perturbed empirical measure by ℙ*n f = n^{−1}Σ_{i=1}^n G_i f(X_i), and let L̃(θ) = ℙn{L(θ; D)}. Since {L(θ; D) : θ ∈ Ω} is ℙ-Glivenko–Cantelli, by Corollary 10.14 of Kosorok (2008),

sup_θ |L̃*(θ) − ℙ{L(θ; D)}| ≤ sup_θ |L̃*(θ) − L̃(θ)| + sup_θ |L̃(θ) − ℙ{L(θ; D)}| = sup_θ |(ℙ*n − ℙn){L(θ; D)}| + sup_θ |(ℙn − ℙ){L(θ; D)}| → 0.

Then, under condition C1, ℙ{L(θ; D)} has a unique minimum at θ0, and so θ̃* →p θ0 (Newey and McFadden 1994, Theorem 2.1).

We now show that θ̂* →p θ0. First note that sup_θ Σ_{j=1}^p pλn,j(|β̃*j|)|βj| → 0 in probability. When the penalty is Lq, pλn,j(|β̃*j|) = λn|β̃*j|^q; here |β̃*j|^q →p |β0j|^q by the continuous mapping theorem, and λn → 0. For the SCAD penalty, pλn,j(|β̃*j|) = λn I(|β̃*j| ≤ λn) + (aλn − |β̃*j|)+ I(|β̃*j| > λn)/(a − 1). We consider two cases: (i) β0j ≠ 0, and (ii) β0j = 0. In case (i), λn → 0 and β̃*j →p β0j, so I(|β̃*j| ≤ λn) →p 0 and (aλn − |β̃*j|)+ →p 0. In case (ii), λn → 0 and (aλn − |β̃*j|)+ ≤ aλn → 0. Finally, for the ALASSO penalty, pλn,j(|β̃*j|) = λn(n^{1/2}|β̃*j|)^{−1}, where (n^{1/2}|β̃*j|)^{−1} = OP(1) and λn → 0. Then, since θ lies in a compact set, Σ_{j=1}^p pλn,j(|β̃*j|)|βj| ≤ τ Σ_{j=1}^p pλn,j(|β̃*j|) ≡ τBn, where τ = max_j sup_{θ∈Ω} |βj| and Bn = oP(1), since pλn,j(|β̃*j|) →p 0 for each j; hence sup_θ Σ_{j=1}^p pλn,j(|β̃*j|)|βj| →p 0 (Newey and McFadden 1994, Lemma 2.9). Now, with arguments similar to those used for θ̃* →p θ0, sup_θ |L̂*(θ) − ℙ{L(θ; D)}| ≤ sup_θ |L̃*(θ) − ℙ{L(θ; D)}| + sup_θ Σ_{j=1}^p pλn,j(|β̃*j|)|βj| → 0 in probability. This implies θ̂* →p θ0.

We next show that ||θ̂* − θ0|| = OP(n^{−1/2}). It suffices to show that for any ε > 0, there exists C > 0 such that

ℙ*{ inf_{||θ − θ0|| = Cn^{−1/2}} L̂*(θ) > L̂*(θ0) } > 1 − ε.  (A.1)

Consider θ = θ0 + n^{−1/2}u. Condition C3(c) implies that

[ℙn{L(θ0 + n^{−1/2}u; D) − L(θ0; D) − n^{−1/2}u′U(θ0; D)} − ½n^{−1}u′Au] / ||n^{−1/2}u|| = oP(1)

uniformly in u. By the multiplier central limit theorem (Kosorok 2008, Theorem 10.1),

[ℙn({L(θ0 + n^{−1/2}u; D) − L(θ0; D) − n^{−1/2}u′U(θ0; D)}G) − ½n^{−1}u′Au] / ||n^{−1/2}u|| = oP*(1)

uniformly in u. It follows that, uniformly for θ ∈ {θ : ||θ − θ0|| ≤ n^{−1/2}||u||},

L̃*(θ0 + n^{−1/2}u) − L̃*(θ0) = n^{−1/2}ℙn{U(θ0; D)G}′u + ½n^{−1}u′Au + oP*(n^{−1}||u||),  (A.2)

and thus we may approximate n{L̂*(θ0 + n^{−1/2}u) − L̂*(θ0)} with 𝔾n{U(θ0; D)G}′u + ½u′Au + nΣ_{j=1}^p pλn,j(|β̃*j|)(|β0j + n^{−1/2}uj| − |β0j|) + oP*(||u||² + ||u||).

Now we show the consistency of variable selection, that is, ℙ*(θ̂*𝒜c = 0) → 1 as n → ∞. It suffices to show that for any constant C and any given θ̃𝒜 such that ||θ̃𝒜 − θ0𝒜|| = OP*(n^{−1/2}),

ℙ*[ argmin_{||θ𝒜c|| ≤ Cn^{−1/2}} L̂*{(θ̃𝒜, θ𝒜c)} = 0 ] → 1.  (A.3)

Let ũ𝒜 and u𝒜c denote n^{1/2}(θ̃𝒜 − θ0𝒜) and n^{1/2}θ𝒜c, respectively. It follows from (A.2) that

n[L̂*{(θ0𝒜 + n^{−1/2}ũ𝒜, n^{−1/2}u𝒜c)} − L̂*{(θ0𝒜 + n^{−1/2}ũ𝒜, 0)}] = [𝔾n{U(θ0; D)𝒜c G}′ + ũ𝒜′A12]u𝒜c + ½u𝒜c′A22 u𝒜c + nΣ_{j∈𝒜c} pλn,j(|β̃*j|)n^{−1/2}|uj| + oP*(||u𝒜c||² + ||u𝒜c||) = Σ_{j∈𝒜c} n^{1/2}pλn,j(|β̃*j|)|uj| + Rn(u𝒜c),

where sup_{||u𝒜c|| ≤ C} |Rn(u𝒜c)|/(||u𝒜c||² + ||u𝒜c||) = OP*(1). Zou and Li (2008) consider the limiting behavior of n^{1/2}pλn,j(|β̃j|) for the SCAD and Lq penalties in their proof of the oracle properties of the one-step LLA estimator. They show that in both cases, when j ∈ 𝒜c, n^{1/2}pλn,j(|β̃*j|) →p ∞. Additionally, for the ALASSO penalty, n^{1/2}pλn,j(|β̃*j|) = n^{1/2}λn(n^{1/2}|β̃*j|)^{−1}; when j ∈ 𝒜c, we have n^{1/2}λn → ∞ and n^{1/2}β̃*j = OP*(1). Hence, for all three types of penalties, n^{1/2}pλn,j(|β̃*j|) →p ∞. Thus, for any ε > 0, there exist C1 > C0 > 0 and N0 such that ℙ*{Σ_{j∈𝒜c} n^{1/2}pλn,j(|β̃*j|)|uj| ≥ C1 Σ_{j∈𝒜c}|uj|} ≥ 1 − ε and ℙ*{C0 Σ_{j∈𝒜c}|uj| ≥ |Rn(u𝒜c)|} ≥ 1 − ε for ||u𝒜c|| ≤ C and n ≥ N0. This implies that, with probability greater than 1 − 2ε, n[L̂*{(θ̃𝒜, n^{−1/2}u𝒜c)} − L̂*{(θ̃𝒜, 0)}] ≥ (C1 − C0)Σ_{j∈𝒜c}|uj| ≥ 0, which implies (A.3).

Lastly, we justify the oracle property of θ̂*𝒜. Since ℙ*(θ̂*𝒜c = 0) → 1, θ̂*𝒜 can be considered as the minimizer of L̂*𝒜(θ𝒜) = L̂*{(θ𝒜, 0)}. Following the approach of Zou (2006), we consider the reparametrization

L̂*𝒜(θ0𝒜 + n^{−1/2}u𝒜) = ℙn[ L{(θ0𝒜 + n^{−1/2}u𝒜, 0); D}G ] + Σ_{j∈𝒜} pλn,j(|β̃*j|)|β0j + n^{−1/2}uj|.  (A.4)

Let û*𝒜(n) = argmin_{u𝒜} L̂*𝒜(θ0𝒜 + n^{−1/2}u𝒜). Note that û*𝒜(n) = n^{1/2}(θ̂*𝒜 − θ0𝒜) is also the minimizer of Vn(u𝒜) ≡ L̂*𝒜(θ0𝒜 + n^{−1/2}u𝒜) − L̂*(θ0), as L̂*(θ0) is a constant. Again, it follows from (A.2) that

Vn(u𝒜) = n^{1/2}u𝒜′ℙn{U𝒜(θ0; D)G} + ½u𝒜′A11 u𝒜 + nΣ_{j∈𝒜} pλn,j(|β̃*j|)(|β0j + n^{−1/2}uj| − |β0j|) + oP*(||u𝒜||² + ||u𝒜||).

To examine the limiting behavior of the third term of Vn(u𝒜), note that for j ∈ 𝒜 we have β0j ≠ 0 and n^{1/2}(|β0j + n^{−1/2}uj| − |β0j|) → uj sgn(β0j). Also, as Zou and Li (2008) proved in their appendix, n^{1/2}pλn,j(|β̃*j|) →p 0 when j ∈ 𝒜 for the SCAD and Lq penalties. For the ALASSO penalty, n^{1/2}pλn,j(|β̃*j|) = λn|β̃*j|^{−1}, with λn → 0 and |β̃*j|^{−1} →p |β0j|^{−1} for β0j ≠ 0. Therefore, by Slutsky's theorem, nΣ_{j∈𝒜} pλn,j(|β̃*j|)(|β0j + n^{−1/2}uj| − |β0j|) = oP*(1) and

Vn(u𝒜) = u𝒜′𝔾n{U𝒜(θ0; D)G} + ½u𝒜′A11 u𝒜 + oP*(1 + ||u𝒜||² + ||u𝒜||).

Thus, û*𝒜(n) = −A11^{−1}𝔾n{U𝒜(θ0; D)G} + oP*(1). Since 𝔾n{U𝒜(θ0; D)G} converges in distribution to N(0, B11), we have n^{1/2}(θ̂*𝒜 − θ0𝒜) →d N(0, A11^{−1}B11 A11^{−1}) and ℙ*(θ̂*𝒜c = 0) → 1. Thus the perturbed regularized estimator θ̂* is asymptotically normal on the set of true nonzero parameters.

Similar arguments as given above, along with the conditional multiplier central limit theorem (Kosorok 2008, Theorem 10.4), can be used to justify that the distribution of n^{1/2}(θ̂* − θ̂) | X approximates that of n^{1/2}(θ̂ − θ0). Specifically, we can similarly obtain n^{1/2}(θ̂𝒜 − θ0𝒜) = −A11^{−1}𝔾n{U𝒜(θ0; D)} + oP(1) and ℙ*(θ̂*𝒜c = 0) → 1. Therefore, n^{1/2}(θ̂*𝒜 − θ̂𝒜) = −A11^{−1}𝔾n{U𝒜(θ0; D)(G − 1)} + oP*(1). Since −A11^{−1}𝔾n{U𝒜(θ0; D)(G − 1)} | X →d N(0, A11^{−1}B̂11 A11^{−1}) and B̂11 →p B11, n^{1/2}(θ̂*𝒜 − θ̂𝒜) | X and n^{1/2}(θ̂𝒜 − θ0𝒜) converge in distribution to the same limit. Furthermore, ℙ*(θ̂*𝒜c = 0 | X) → 1.
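The multiplier idea underlying this argument can be seen in a toy example with no penalty at all, the sample mean, where the conditional spread of the perturbed estimator around θ̂ matches the sampling variability of θ̂ around θ0 (our own illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy multiplier bootstrap for the sample mean: conditional on the data,
# theta* = sum_i G_i X_i / sum_i G_i fluctuates around theta_hat with the
# same sqrt(n)-scale variability that theta_hat has around theta_0.
n, M = 400, 2000
x = rng.exponential(2.0, size=n)                 # data, theta_0 = 2
theta_hat = x.mean()
G = rng.exponential(1.0, size=(M, n))            # multipliers, E G = Var G = 1
theta_star = (G * x).sum(axis=1) / G.sum(axis=1)
se_perturb = theta_star.std(ddof=1)              # perturbation-based SE
se_plugin = x.std(ddof=1) / np.sqrt(n)           # analytic benchmark
```

The two standard errors agree closely, which is the one-dimensional, penalty-free version of the equivalence of limits established above.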

A.2 Choice of thresholding values high and low for confidence regions

We choose the thresholding values p̂high and p̂low to converge at rates tied to the order of the tuning parameter λ and bounded using the probability that the perturbed estimates are set to zero. For illustration, consider the univariate β case with one predictor under orthonormal design. Consider the standardized parameters γ̂ = β̂/σ, γ = β/σ, and λ̃n = λn/σ², where λn → 0 and n^{1/2}λn → ∞. Then

γ̂ ∼̇ N(γ0, n^{−1}),  γ̂1 = γ̂(1 − λ̃n/γ̂²)+,  γ*1 ∼̇ γ*(1 − λ̃n/γ*²)+,

where γ* ∼̇ N(γ̂, 1/n). Thus γ*1 = 0 with probability

P̂0 = ℙ*{|γ*| < λ̃n^{1/2}} = Φ{n^{1/2}(λ̃n^{1/2} − γ̂)} − Φ{−n^{1/2}(λ̃n^{1/2} + γ̂)}.

First consider nonzero parameters. Without loss of generality, assume that γ0 > 0. Let ε = 2λ̃n^{1/2} and assume that γ0 > 2ε. Then

E(P̂0) = E[Φ{n^{1/2}(λ̃n^{1/2} − γ̂)} − Φ{n^{1/2}(−λ̃n^{1/2} − γ̂)}] ≤ [Φ{n^{1/2}(λ̃n^{1/2} − ε)} − Φ{n^{1/2}(−λ̃n^{1/2} − ε)}]ℙ(γ̂ > ε) + ℙ(γ̂ ≤ ε) ≤ 2Φ(−n^{1/2}ε) ≤ (2/π)^{1/2} exp(−nλ̃n) = (2/π)^{1/2} exp(−nλn/σ²).

Thus, we propose the lower threshold p̂low = min[0.49, (2/π)^{1/2} exp{−nλn/(4σ²)}], so that p̂low ≥ (2/π)^{1/2} exp(−nλn/σ²). On the other hand, if γ0 = 0, then

E(1 − P̂0) = E[Φ{n^{1/2}(−λ̃n^{1/2} + γ̂)} + Φ{n^{1/2}(−λ̃n^{1/2} − γ̂)}] ≤ 2Φ(−n^{1/2}λ̃n^{1/2}/2) ≤ (2/π)^{1/2} exp(−nλ̃n/4) = (2/π)^{1/2} exp{−nλn/(4σ²)}.

Thus, we choose p̂high = 1 − (2/π)^{1/2} exp(−nλn/σ²), so that p̂high ≥ 1 − (2/π)^{1/2} exp{−nλn/(4σ²)}. Note that we chose p̂low and p̂high such that p̂low goes to 0 at a much slower rate than P̂0 when γ0 ≠ 0. On the other hand, when γ0 = 0, P̂0 goes to 1 at a much faster rate than p̂high, and thus P̂0 > min(1 − α, p̂high) occurs with probability approaching 1 as n → ∞, for any fixed α > 0. Consequently, P̂0 > p̂high indicates strong evidence that γ0 = 0. When σ is unknown, it is replaced by a consistent estimate σ̂.
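The thresholding probability P̂0 in this subsection can be evaluated in closed form; the following sketch uses an illustrative rate λ̃n = n^{−0.4}, chosen only so that λ̃n → 0 and n^{1/2}λ̃n → ∞ hold along the sequence:

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def p_zero(gamma_hat, lam_tilde, n):
    """P(gamma*_1 = 0) from the display above: the probability that the
    perturbed univariate ALASSO estimate is thresholded to zero."""
    r = sqrt(lam_tilde)
    rn = sqrt(n)
    return Phi(rn * (r - gamma_hat)) - Phi(-rn * (r + gamma_hat))

# Illustrative rate: lam_tilde = n^{-0.4} -> 0 while sqrt(n)*lam_tilde -> inf
n = 200
lt = n ** -0.4
```

At this sample size the probability is essentially 1 for a zero parameter and essentially 0 for a parameter well separated from zero, which is what makes the p̂low and p̂high thresholds discriminate between the two cases.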

A.3 Justification of highest density region and bias estimate

For j ∈ 𝒜c, ℙ*(β̂*j = 0) → 1, and thus for any α > 0, ℙ*(P̂j0 > α) → 1 and ℙ(P̂j0 < p̂high) + ℙ(P̂j0 < p̂low) → 0, where P̂j0 denotes the proportion of perturbed estimates of βj equal to zero. Hence, ℙ(0 ∈ CR*jHDR) → 1; we include {0} in our CR when P̂j0 > p̂low, and the coverage of CR*jHDR converges to 1 when β0j = 0. For j ∈ 𝒜, P̂j0 →p 0, and the perturbed estimates converge to a continuous distribution; specifically, n^{1/2}(β̂*j − β̂j) | X →d N(0, σj²), where σj² is the asymptotic variance of n^{1/2}(β̂j − β0j). It follows that sup_x |n^{−1/2}f̂j(β̂j + n^{−1/2}x) − φσj(x)| →p 0, where φσ(x) = φ(x/σ)/σ and φ(·) is the density function of the standard normal. Therefore, sup_β |n^{−1/2}f̂j(β) − φσj{n^{1/2}(β − β̂j)}| →p 0 and n^{−1/2}ĉ3 →p c30, where c30 is the solution to ∫ I{φσj(x) > c30}φσj(x)dx = 1 − α. It follows that the coverage of our CR converges to the nominal level since, with respect to the probability measure ℙ*, pr(β0j ∈ CR*jHDR) = pr{f̂j(β0j) ≥ ĉ3} + oP(1) = pr{n^{−1/2}f̂j(β0j) ≥ n^{−1/2}ĉ3} + oP(1) = pr[φσj{n^{1/2}(β0j − β̂j)} ≥ c30] + oP(1) → 1 − α.
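A self-contained sketch of building such a highest-density region from perturbed draws (our own simplified construction: a hand-rolled Gaussian KDE with Silverman's bandwidth and an empirical density cutoff; it reports the hull of the retained points even though a true HDR may be noncontiguous):

```python
import numpy as np

rng = np.random.default_rng(3)

def hdr_pieces(samples, alpha=0.05, p_low=0.05):
    """Sketch of an HDR from perturbed draws: handle the point mass at zero
    separately, then keep the continuous draws whose estimated density
    exceeds the alpha-quantile of the density values at the draws."""
    samples = np.asarray(samples, dtype=float)
    p0 = np.mean(samples == 0.0)                     # mass at exactly zero
    nz = samples[samples != 0.0]
    h = 1.06 * nz.std(ddof=1) * len(nz) ** (-0.2)    # Silverman bandwidth
    dens = np.exp(-0.5 * ((nz[:, None] - nz) / h) ** 2).mean(axis=1)
    dens /= h * np.sqrt(2.0 * np.pi)                 # Gaussian KDE at draws
    c = np.quantile(dens, alpha)                     # density cutoff
    kept = nz[dens >= c]
    return (kept.min(), kept.max()), p0 > p_low      # interval, include {0}?

draws = np.concatenate([np.zeros(50), rng.normal(1.0, 0.1, 450)])
(lo, hi), with_zero = hdr_pieces(draws)
```

Cutting at a density level rather than at symmetric quantiles is what allows the region to be short and, when the perturbed distribution is skewed or bimodal, asymmetric or noncontiguous.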

Here we define our bias-corrected estimator for β0j as β̂jBC = β̂j + I(β̂j ≠ 0)·b̂iasj, where b̂iasj = (M^{−1}Σ_{m=1}^M β̂*j,m) × (−1)^{I[Σ_{m=1}^M {I(β̂*j,m > 0) − I(β̂*j,m < 0)} < 0]} × (Âλ^{−1})jj/{n max(ξ̂7.5, ξ̂97.5)}, Âλ = n^{−1}(X𝒜̂′X𝒜̂ + n^{−1/2}λn diag{1/β̃j²}_{j=1}^p), and ξ̂r is the rth percentile of {β̂*j,m, m = 1, …, M}. We estimate A for ALASSO with Âλ following the methods of Cai et al. (2009), where a stabilized estimate of the covariance of coefficients from an accelerated failure time model is used.

A.4 Selection of λ with Bayes Information Criterion

In Section 2, we suggest choosing the tuning parameter λn by minimizing the BIC. Here we explicitly present the BIC for the linear regression objective function and the ALASSO penalty used in the simulations and data example of Sections 3 and 4. First, assume the data have been centered so that there is no intercept. We implement a least-squares approximation of the likelihood for BIC(λ), as in Wang and Leng (2007). For a given λ,

BIC(λ) = (β̂(λ) − β̃)′Σ̂λ^{−1}(β̂(λ) − β̃) + q̂λ ωn,

where β̂(λ) minimizes the least-squares objective function L̂(β) = (y − Xβ)′(y − Xβ) + Σ_{j=1}^p pλ(|β̃j|)|βj| based on (1), Σ̂λ^{−1} = (σ̂²n)^{−1}[X′X + λ diag{I(β̂j(λ) ≠ 0)/|β̃j β̂j(λ)|}_{j=1}^p] is a stabilized estimate of the inverse covariance similar to that in Zou (2006), σ̂² is a consistent estimate of σ² based on the residual variance from the linear regression model, and q̂λ estimates the degrees of freedom of ALASSO by the number of nonzero elements of β̂(λ) (Zou et al. 2007). We choose ωn = min(n^{0.1}, log(n)) because numerical results suggest that log(n) is much greater than n^{0.1} for practical sample sizes and leads to excessive shrinkage of moderately sized parameters in finite samples.
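A simplified stand-in for this selection rule (our own sketch: it replaces the LSA quadratic form with the classical residual-based log-likelihood term but keeps q̂λ and ωn = min(n^{0.1}, log n) as described above):

```python
import numpy as np

rng = np.random.default_rng(4)

def bic_score(X, y, beta):
    """Simplified BIC-type score for a candidate fit beta: a residual
    log-likelihood term plus q_hat * omega_n (a stand-in for the
    LSA-based criterion in the text, not the exact formula)."""
    n = len(y)
    rss = float(np.sum((y - X @ beta) ** 2))
    q_hat = int(np.count_nonzero(beta))
    omega_n = min(n ** 0.1, np.log(n))
    return n * np.log(rss / n) + q_hat * omega_n

# Pick the candidate fit with the smallest score on toy data
n = 100
X = rng.standard_normal((n, 3))
y = X[:, 0] + 0.1 * rng.standard_normal(n)
candidates = [np.array([1.0, 0.0, 0.0]),     # true sparse model
              np.array([1.0, 0.5, 0.5]),     # badly overfit dense guess
              np.zeros(3)]                   # null model
best = min(candidates, key=lambda b: bic_score(X, y, b))
```

In practice one would evaluate the criterion along a grid or path of λ values and keep the fit β̂(λ) that minimizes the score.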

Contributor Information

Jessica Minnier, Email: jminnier@hsph.harvard.edu, Ph.D. candidate, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115.

Lu Tian, Email: lutian@stanford.edu, Assistant Professor, Department of Health Research & Policy, Stanford University School of Medicine, Palo Alto, CA 94304.

Tianxi Cai, Email: tcai@hsph.harvard.edu, Associate Professor, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115.

References

1. Cai T, Huang J, Tian L. Regularized estimation for the accelerated failure time model. Biometrics. 2009;65:394–404. doi:10.1111/j.1541-0420.2008.01074.x.
2. Cassidy A, Myles J, van Tongeren M, Page R, Liloglou T, Duffy S, Field J. The LLP risk model: an individual risk prediction model for lung cancer. British Journal of Cancer. 2008;98:270. doi:10.1038/sj.bjc.6604158.
3. Chatterjee A, Lahiri S. Asymptotic properties of the residual bootstrap for lasso estimators. Proceedings of the American Mathematical Society. 2010 (accepted).
4. Dent R, Trudeau M, Pritchard K, Hanna W, Kahn H, Sawka C, Lickley L, Rawlinson E, Sun P, Narod S. Triple-negative breast cancer: clinical features and patterns of recurrence. Clinical Cancer Research. 2007;13:4429. doi:10.1158/1078-0432.CCR-06-3045.
5. Dosaka-Akita H, Hommura F, Mishina T, Ogura S, Shimizu M, Katoh H, Kawakami Y. A risk-stratification model of non-small cell lung cancers using cyclin E, Ki-67, and ras p21: different roles of G1 cyclins in cell proliferation and prognosis. Cancer Research. 2001;61:2500.
6. Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Annals of Statistics. 2004;32:407–451.
7. Fan J, Li R. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Journal of the American Statistical Association. 2001;96:1348–1360.
8. Fan J, Li R. Variable Selection for Cox's Proportional Hazards Model and Frailty Model. The Annals of Statistics. 2002;30:74–99.
9. Fan J, Li R. New Estimation and Model Selection Procedures for Semiparametric Modeling in Longitudinal Data Analysis. Journal of the American Statistical Association. 2004;99:710–723.
10. Fan J, Peng H. On Nonconcave Penalized Likelihood With Diverging Number of Parameters. The Annals of Statistics. 2004;32:928–961.
11. Freedman A, Slattery M, Ballard-Barbash R, Willis G, Cann B, Pee D, Gail M, Pfeiffer R. Colorectal cancer risk prediction tool for white men and women without known susceptibility. Journal of Clinical Oncology. 2009;27:686. doi:10.1200/JCO.2008.17.4797.
12. Gail M, Brinton L, Byar D, Corle D, Green S, Schairer C, Mulvihill J. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. Journal of the National Cancer Institute. 1989;81:1879. doi:10.1093/jnci/81.24.1879.
13. Gail M, Costantino J. Validating and improving models for projecting the absolute risk of breast cancer. Journal of the National Cancer Institute. 2001;93:334. doi:10.1093/jnci/93.5.334.
14. Jin Z, Ying Z, Wei L. A simple resampling method by perturbing the minimand. Biometrika. 2001;88:381–390.
15. Johnson V, Brun-Vézinet F, Clotet B, Conway B, Kuritzkes D, Pillay D, Schapiro J, Telenti A, Richman D. Update of the drug resistance mutations in HIV-1: Fall 2005. Top HIV Med. 2005;13:125–131.
16. Knight K, Fu W. Asymptotics for Lasso-Type Estimators. The Annals of Statistics. 2000;28:1356–1378.
17. Kosorok M. Introduction to Empirical Processes and Semiparametric Inference. New York: Springer; 2008.
18. Newey W, McFadden D. Large sample estimation and hypothesis testing. Handbook of Econometrics. 1994;4:2111–2245.
19. Perou C, Sørlie T, Eisen M, van de Rijn M, Jeffrey S, Rees C, Pollack J, Ross D, Johnsen H, Akslen L, et al. Molecular portraits of human breast tumours. Nature. 2000;406:747–752. doi:10.1038/35021093.
20. Pötscher BM, Schneider U. On the distribution of the adaptive LASSO estimator. Journal of Statistical Planning and Inference. 2009;139:2775–2790.
21. Pötscher BM, Schneider U. Confidence Sets Based on Penalized Maximum Likelihood Estimators in Gaussian Regression. Electronic Journal of Statistics. 2010;4:334–360.
22. Prado J, Wrin T, Beauchaine J, Ruiz L, Petropoulos C, Frost S, Clotet B, D'Aquila R, Martinez-Picado J. Amprenavir-resistant HIV-1 exhibits lopinavir cross-resistance and reduced replication capacity. AIDS. 2002;16:1009. doi:10.1097/00002030-200205030-00007.
23. Rhee S, Gonzales M, Kantor R, Betts B, Ravela J, Shafer R. HIV reverse transcriptase and sequence database. Nucleic Acids Res. 2003;31:298–303. doi:10.1093/nar/gkg100.
24. Schumi J, DeGruttola V. Resampling-based analyses of the effects of combinations of HIV genetic mutations on drug susceptibility. Statistics in Medicine. 2008;27. doi:10.1002/sim.3181.
25. Scott D. Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley-Interscience; 1992.
26. Silverman B. Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC; 1986.
27. Spiegelman D, Colditz G, Hunter D, Hertzmark E. Validation of the Gail et al. model for predicting individual breast cancer risk. Journal of the National Cancer Institute. 1994;86:600. doi:10.1093/jnci/86.8.600.
28. Thompson I, Ankerst D, Chi C, Goodman P, Tangen C, Lucia M, Feng Z, Parnes H, Coltman C, Jr. Assessing prostate cancer risk: results from the Prostate Cancer Prevention Trial. Journal of the National Cancer Institute. 2006;98:529. doi:10.1093/jnci/djj131.
29. Tibshirani R. Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society, Series B. 1996;58:267–288.
30. Van Marck H, Dierynck I, Kraus G, Hallenberger S, Pattery T, Muyldermans G, Geeraert L, Borozdina L, Bonesteel R, Aston C, et al. The Impact of Individual Human Immunodeficiency Virus Type 1 Protease Mutations on Drug Susceptibility Is Highly Influenced by Complex Interactions with the Background Protease Sequence. Journal of Virology. 2009;83:9512. doi:10.1128/JVI.00291-09.
31. Wang H, Leng C. Unified LASSO estimation via least squares approximation. Journal of the American Statistical Association. 2007;102:1039–1048.
32. Wu M. A parametric permutation test for regression coefficients in LASSO regularized regression. PhD thesis, Department of Biostatistics, Harvard School of Public Health, Boston, MA; 2009.
33. Zhang H, Ahn J, Lin X, Park C. Gene Selection using Support Vector Machines with Non-convex Penalty. Bioinformatics. 2006;22:88–95. doi:10.1093/bioinformatics/bti736.
34. Zou H. The Adaptive Lasso and Its Oracle Properties. Journal of the American Statistical Association. 2006;101:1418–1429.
35. Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B. 2005;67:301–320.
36. Zou H, Hastie T, Tibshirani R. On the "degrees of freedom" of the lasso. Annals of Statistics. 2007;35:2173–2192.
37. Zou H, Li R. One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics. 2008;36:1509–1533. doi:10.1214/009053607000000802.
38. Zou H, Zhang H. On the adaptive elastic-net with a diverging number of parameters. The Annals of Statistics. 2009;37:1733–1751. doi:10.1214/08-AOS625.
