Abstract
In quantile linear regression with ultra-high dimensional data, we propose an algorithm for screening all candidate variables and subsequently selecting relevant predictors. Specifically, we first employ quantile partial correlation for screening, and then we apply the extended Bayesian information criterion (EBIC) for best subset selection. Our proposed method can successfully select predictors when the variables are highly correlated, and it can also identify variables that make a contribution to the conditional quantiles but are marginally uncorrelated or weakly correlated with the response. Theoretical results show that the proposed algorithm can yield the sure screening set. By controlling the false selection rate, model selection consistency can be achieved theoretically. In practice, we propose using EBIC for best subset selection so that the resulting model is screening consistent. Simulation studies demonstrate that the proposed algorithm performs well, and an empirical example is presented.
Keywords: Quantile correlation, Quantile partial correlation, Screening, Variable selection
1 Introduction
Advances in modern technology have enabled people to collect massive data with a large number of variables, many of which may be irrelevant to the response variable. Examples can be found in gene expression microarray data, single nucleotide polymorphism (SNP) data, imaging data, high-frequency financial data, and others. Hence, extracting useful variables for the prediction of the response in high-dimensional data has become a focal research area in the past two decades. Clearly, traditional variable selection methods such as best subset selection and backward elimination become computationally infeasible when the number of predictors is large. As a result, a variety of penalization methods have been developed. These methods include, but are not limited to, Lasso and Adaptive Lasso (Tibshirani, 1996; Zou, 2006; Huang et al., 2008a), bridge regression (Huang et al., 2008b), SCAD (Fan and Li, 2001), elastic net (Zou and Hastie, 2005), and MCP (Zhang, 2010).
When the dimension is much larger than the sample size, penalized estimation can perform poorly or even become infeasible (Fan and Lv, 2010). Variable screening then becomes a natural approach in this context; it assumes that the relevant features lie in a low-dimensional manifold, so that the ultrahigh-dimensional problem can be greatly simplified into a low-dimensional one. Recently, Fan and Lv (2008) introduced the marginal screening method (sure independence screening, SIS) to select relevant variables based on the marginal correlation of each variable with the response. Its good numerical performance and novel theoretical properties have made SIS popular in ultrahigh dimensional analysis. As a result, SIS and its extensions have been applied to many important settings including generalized linear models (Fan and Song, 2010), multi-index semi-parametric models (Zhu et al., 2011), nonparametric regression (Fan et al., 2011; Liu et al., 2014), quantile regression (He et al., 2013; Wu and Yin, 2015), and so forth.
The marginal screening method employs the marginal correlation to measure the strength of association between predictors and the response. Hence, it can miss relevant variables that are associated with the response conditionally but not marginally. Furthermore, the marginal correlation can be misleading when there exist non-negligible correlations among the predictors. As a result, an irrelevant variable can be selected ahead of relevant variables. Moreover, the issue of collinearity can yield spurious phenomena in high-dimensional data, as demonstrated by Fan and Lv (2008).
To handle the problem of high correlations between predictors, several methods have been developed in the literature. Bühlmann et al. (2009) proposed the PC-Simple algorithm, which uses partial correlation as a criterion to measure the association of each predictor with the response. Wang (2009) applied the forward selection method in the ultrahigh-dimensional setting and developed a forward regression (FR) algorithm to select the most relevant variable in each step sequentially by removing the confounding effects of the selected variables from the previous steps. It can be shown that Wang’s algorithm is also based on the partial correlation measure. Moreover, Cho and Fryzlewicz (2012) proposed a ‘tilted’ correlation to measure the contribution of each predictor to the response. Based on simulation studies, they found that their proposed tilted correlation screening (TCS2) algorithm performs well. The above studies demonstrate that the partial correlation plays an important role in the screening process.
In the area of quantile linear regression with low dimensional data, theoretical properties and practical applications have been well developed; see Koenker (2005). For high-dimensional data, however, the development is far from complete. Recently, Wang et al. (2012) and Lee et al. (2014), respectively, extended the penalized approach and the Bayesian selection method from the classical mean regression model to the quantile regression model. Their generalizations motivate us to propose a screening method for the high-dimensional quantile regression model. Specifically, we adopt Li et al.’s (2015) quantile partial correlation (QPCOR) as a criterion to measure the association of each predictor with the response at each quantile, and then introduce a new screening procedure based on the sample QPCOR. Our goal is to identify a sparse set of ultra-high dimensional variables X = (X1, …, Xp)T that are relevant for modeling the conditional quantile of the response Y.
To employ QPCOR, we transform each predictor Xj by projecting it onto a set of variables, denoted by 𝒮j, which is either the union of its related variables and the previously selected variables or only the previously selected variables. We then introduce an adaptive approach to choose the subset of related variables by adopting a sequential testing method based on the partial correlations of related variables. It is worth noting that the size of 𝒮j cannot be too large since it would distort the association between Xj and Y. To this end, we suggest a hard threshold for determining which variables are related to Xj, and then obtain the upper bound of the maximal cardinality of the subsets 𝒮j. In addition, we derive a uniform bound of the difference between the sample QPCOR and the population QPCOR, and subsequently establish the sure screening property of the proposed procedure as needed in screening methods (Fan and Lv, 2008; Fan et al., 2011; He et al., 2013). Moreover, we generalize Wang’s (2009) FR algorithm and Cho and Fryzlewicz’s (2012) TCS2 algorithm to the quantile regression model. After the screening procedure, we apply the extended Bayesian information criterion (EBIC) (Chen and Chen, 2008; Wang and Leng, 2009; Lee et al., 2014) for best subset selection. Consequently, our proposed approach not only selects relevant variables when the variables are highly correlated, but also identifies the variables that are marginally uncorrelated or weakly correlated with the response.
The paper is organized as follows. Section 2 introduces quantile partial correlation. Section 3 provides the theoretical properties of the quantile screening procedure including the sure screening property. Section 4 presents three algorithms, which consist of our proposed algorithm and the quantile version of the forward regression and tilted correlation screening algorithms. We also introduce the extended BIC criterion for best subset selection. Section 5 conducts simulation studies, while Section 6 illustrates the usefulness of the proposed method through the analysis of gene expression data. A discussion is given in Section 7. All technical proofs are relegated to the Appendix and Supplemental Materials, and additional simulation results are presented in the Supplemental Materials.
2 Quantile partial correlation (QPCOR)
Before we present the quantile partial correlation (QPCOR), we review the quantile correlation (QCOR) and its connection to regression coefficients in the linear quantile regression model.
Quantile correlation
For mean regression models with ultra-high dimensional covariates, Fan and Lv (2008) proposed the SIS procedure to select variables according to the magnitudes of their marginal Pearson correlations with the response. Analogously, in the quantile regression context, we introduce Li et al.’s (2015) quantile correlation of Y and Xj:
qcorτ{Y, Xj} = qcovτ{Y, Xj}/[var{ψτ(Y − Qτ(Y))} var(Xj)]^{1/2},  where qcovτ{Y, Xj} = cov{ψτ(Y − Qτ(Y)), Xj} = E[ψτ{Y − Qτ(Y)}{Xj − E(Xj)}],  (1)
for 1 ≤ j ≤ p, where Qτ(Y) is the τth unconditional quantile of Y such that P(Y < Qτ(Y)) = τ and ψτ(w) = τ − I(w < 0). As a result, −1 ≤qcorτ{Y, Xj} ≤ 1. As shown by Li et al. (2015), there is a nice relationship between the quantile correlation (1) and the slope of the τ-th quantile linear regression line with Y and Xj being the response and predictor, respectively. Consider the following minimizers:
(ajτ, bjτ) = argmin(a,b) E[ρτ(Y − a − bXj)],  (2)
where ρτ(w) = wτ − wI(w < 0) is the quantile loss function (see Koenker, 2005). Then qcorτ{Y, Xj} = ϱ(bjτ), where ϱ(bjτ) is a continuous and increasing function, and ϱ(bjτ) = 0 if and only if bjτ = 0. Accordingly, we can adopt the SIS procedure of Fan and Lv (2008) to rank the significance of predictors on the quantile of Y via the marginal quantile correlation qcorτ{Y, Xj}. However, this marginal approach ignores possible effects from other variables and may yield misleading results when the predictors are correlated. To illustrate this phenomenon, we first introduce the quantile multiple regression model and its associated estimators given below.
Let Y and X = (X1, …, Xp)T be the response and predictors, respectively. Consider a linear quantile model:
Y = β0 + XTβ + ε,  (3)
where β = (β1, …, βp)T and the error term satisfies P(ε < 0|X) = τ. Then, the τth conditional quantile of Y given X is Qτ(Y|X) = β0 + XTβ. Without loss of generality, we assume that E(Xj) = 0 and var(Xj) = 1 for all j = 1, …, p. Furthermore, let fε(u|x) and fY(y|x) denote the conditional densities of ε and Y given X = x, respectively.
Assuming that the conditional density fY (y|x) exists, we can follow the same procedure as given in Theorem 2 of Angrist et al. (2006) and obtain the coefficient that is the minimizer of the weighted least squares:
where , and for j = 1, ⋯, p. As a result,
where
The term djτ can be viewed as the “bias” of the quantile estimator. It can be considerably large when the components E(w̃τ(X)XjXk) are non-negligible. Thus, QCOR may lead to inaccurate screening results. This motivates us to propose a screening procedure, based on the quantile partial correlation, to reduce the confounding effects from other predictors that are highly related to Xj.
Quantile partial correlation
To reduce the confounding effects, consider X−j = (1, {Xk, k ≠ j}T)T. Then, let and so that . Adopting Li et al.’s (2015) approach, we define the quantile partial correlation (QPCOR) as follows:
| (4) |
where . Based on the result after equation (2.2) on page 247 of Li et al. (2015), we have that , where is a continuous and increasing function of , and if and only if , where
| (5) |
In the general situation, the coefficients and in models (5) and (3) are not equal to each other. However, Lemma A.1 in the Appendix shows that if and only if . Hence, if and only if . Therefore, we use the QPCOR to select relevant variables in our screening procedure.
In general, the estimates of and cannot be obtained when the dimension of X−j is high. To this end, we remove the confounding effects from Xj that are induced by a subset of {k : k ≠ j}, and we denote the resulting set by 𝒮j and name it the conditional set. Then, we propose a screening method via a sequential procedure. In each sequential step, let 𝒮j contain either the union of the previously selected variables and the variables related to Xj or only the previously selected variables, which will be discussed in Section 4. For any arbitrary subset 𝒮 ⊂ {1, …, p}, we denote by X𝒮 the subvector of X associated with 𝒮. Accordingly, X𝒮j = (X0, {Xk, k ∈ 𝒮j}T)T with X0 = 1. For the sake of screening, we modify QPCOR given in (4) as
| (6) |
where , and |𝒮| denotes the cardinality of a set 𝒮.
In practice, QCOR and QPCOR are unknown, and we employ the sample estimates of QCOR and QPCOR to study the screening process. These sample estimates are defined as follows. Let {(Yi, XiT)T : i = 1, …, n} be a data set of n random samples from the distribution of (Y, XT)T, where Xi = (Xi1, …, Xip)T. In this paper, we focus on the scenario in which p ≫ n, and we sometimes denote p by pn since it can be a function of n. In addition, let Xi,𝒮 be the subvector of Xi for any subset 𝒮. The sample estimate of QCOR in (1) is defined as
| (7) |
where Q̂τ(Y) = inf{y : Fn(y) ≥ τ} is the sample τth quantile of Y1, …, Yn. Additionally, is the empirical distribution function, , and . The sample estimate of QPCOR in (6) is given as
| (8) |
where , and . We next study the asymptotic property of the sample estimate of QPCOR and the screening property of the selected variables via this estimate.
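Before turning to the theory, the sample estimators in (7) and (8) can be illustrated with a short sketch. The code below is ours, not the authors’; it assumes the Li et al. (2015) forms in which the quantile (partial) covariance is the sample average of ψτ of a quantile-regression residual times a centered (or least-squares) residual of Xj, standardized by {τ(1 − τ)}1/2 and the standard deviation of that residual. The quantile regression fit uses the statsmodels package.

```python
import numpy as np
import statsmodels.api as sm

def psi(w, tau):
    """Quantile score function psi_tau(w) = tau - I(w < 0)."""
    return tau - (np.asarray(w) < 0).astype(float)

def sample_qcor(y, xj, tau):
    """Sample quantile correlation of (Y, X_j); assumed Li et al. (2015) form."""
    q_tau = np.quantile(y, tau)                       # sample tau-th quantile of Y
    qcov = np.mean(psi(y - q_tau, tau) * (xj - xj.mean()))
    return qcov / np.sqrt(tau * (1 - tau) * np.var(xj))

def sample_qpcor(y, xj, x_s, tau):
    """Sample quantile partial correlation of (Y, X_j) given the conditional set X_S.

    x_s : n x |S| array of conditioning covariates (the set S_j in the text).
    """
    xs1 = sm.add_constant(x_s)                        # include an intercept
    # Least-squares residual of X_j after projecting on X_S.
    theta = np.linalg.lstsq(xs1, xj, rcond=None)[0]
    xj_res = xj - xs1 @ theta
    # Quantile-regression residual of Y on X_S at level tau.
    beta = np.asarray(sm.QuantReg(y, xs1).fit(q=tau).params)
    y_res = y - xs1 @ beta
    qpcov = np.mean(psi(y_res, tau) * xj_res)
    return qpcov / np.sqrt(tau * (1 - tau) * np.var(xj_res))
```

For instance, sample_qpcor(Y, X[:, j], X[:, list(S_j)], 0.5) is the quantity used in Section 4 to rank candidate variables at the median.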
3 Theoretical properties
To use the sample estimate of QPCOR given in (8) as a criterion to identify important variables sequentially, we need to establish the uniform convergence of the sample QPCOR to its population counterpart qpcorτ{Y, Xj |X𝒮j}. Let rn = max1≤j≤p |𝒮j| be the maximal cardinality of the subsets 𝒮j (j = 1, ⋯, p) given in the screening procedure, and allow rn to increase with the sample size n. Note that 𝒮j represents every possible conditional set. In addition, let λmax(A) and λmin(A) be the largest and smallest eigenvalues of the symmetric matrix A, respectively, and let ‖a‖ denote the L2 norm of any vector a = (a1, …, ap)T. We then make the following assumptions to facilitate the technical proofs, although they may not be the weakest possible.
(C1) The conditional density fY|X=x (y) of y given X = x satisfies the Lipschitz condition of order 1 and fY|X=x (y) > 0 for any y in a neighborhood of , for 1 ≤ j ≤ p.
(C2) The predictors satisfy: (i) supi,j |Xij| ≤ M1, , and for some positive finite constants M1, M2, M3, and M4;
(ii) For 1 ≤ j ≤ p, there exist positive finite constants m and M such that m ≤ λmin{E(X𝒮jX𝒮jT)} ≤ λmax{E(X𝒮jX𝒮jT)} ≤ M for every conditional set 𝒮j with |𝒮j| ≤ rn.
Condition (C1) is a standard condition in the literature on quantile regression. Condition (i) in (C2) assumes that the absolute values of the predictors are bounded, which is commonly assumed in high-dimensional analysis, see Wang et al. (2012) and Lee et al. (2014). This assumption can be relaxed to the moment condition given in Li et al. (2012) and Zhu et al. (2011) that there exists a positive constant t0 such that max1≤j≤p E{exp(tXj)} < ∞ for 0 < t ≤ t0. In this case, our theoretical results still hold with some modification to the proofs. To mitigate notational complexity and facilitate mathematical derivations, we assume that covariates are bounded. We also assume that, for each subset 𝒮j used as a conditional set for Xj, the L2 norm of the correlation vector is bounded. Condition (ii) in (C2) is the sparse Riesz condition (Chen and Chen, 2008; Lee et al., 2014), which is used for dealing with a large number of regressors. We next demonstrate the uniform convergence of to its population counterpart, qpcorτ{Y, Xj|X𝒮j}.
Theorem 1
Under Conditions (C1) and (C2), for any C1 > 0, there exist some positive constants C2, C3, and such that, for 0 < κ < 1/2 and rn = Cnω for some 0 ≤ ω < min((1 − 2κ), 2κ) and a positive constant C, we have
| (9) |
when n is sufficiently large.
Remark 1
To handle ultra-high dimensional data, Theorem 1 indicates that we need to have . Accordingly, pn grows with the sample size n at an exponential rate.
To study the screening property via the quantile partial correlation , we consider , which is the set of indices associated with the nonzero coefficients in the true sparse model (3) with nonsparsity size sn = |ℳ*|. Furthermore, we assume that the population QPCORs with nonzero coefficients in ℳ* satisfy the following condition:
(C3) minj∈ℳ* |qpcorτ{Y, Xj|X𝒮j}| ≥ C0 n^{−κ} for some 0 < κ < 1/2 and C0 > 0.
In our proposed algorithm, we select variables sequentially by finding the variable with the maximal sample QPCOR and then adding it to the selected active set in each step. Let the resulting active set via the screening procedure be ℳ̂νn such that the sample QPCORs of the selected variables in ℳ̂νn are greater than a threshold. That is,
where νn is a threshold value. The theorem below presents the sure screening property.
Theorem 2
Under the conditions in Theorem 1 and Condition (C3), taking C2, C3, , and κ as given in Theorem 1 and letting νn = C4 n^{−κ} with C4 ≤ C0/2, we have that
when n is sufficiently large.
It is worth noting that Theorem 2 indicates that the probability bound for the sure screening property depends on the number of nonzero coefficients sn, but not on the number of covariates pn. It also depends on rn.
In addition to ensuring that relevant variables are selected, controlling the false selection rate is also critical. Ideally, we could assume that and then employ Theorem 1, to find that, with probability tending to one, for any constant C1 > 0. Accordingly, by the choice of νn given in Theorem 2, we obtain model selection consistency,
However, this ideal assumption may not be met in general. Hence, we consider a more practical assumption, for some ς > 0. Under this assumption, for any c > 0, the cardinality of is no greater than and 0 ≤ ω < min((1−2κ), 2κ). Furthermore, on the set
we have
| (10) |
for some constant 0 < C* < ∞. As a result, we obtain the following property which is used to control the size of the selected model.
Proposition 1
Under the conditions in Theorem 1 and Condition (C3), taking C2, C3, , and κ as given in Theorem 1, letting νn = C4 n^{−κ} with C4 ≤ C0/2, and assuming for some ς > 0, we have that
for some constant 0 < C* < ∞, when n is sufficiently large.
Remark 2
Proposition 1 indicates that the proposed screening procedure via the quantile partial correlation can reduce the ultra-high dimensionality of the original model to a selected model size of polynomial order in n. This proposition and Theorem 2 imply that if we choose the first d variables sequentially based on the sample QPCOR with d = [n^{𝜘+ς+κ−ω/2}/log(n)] for some 𝜘 > 0, then all relevant variables will be selected with high probability. Note that [b] stands for the integer part of b. By assuming ς < 1 + ω/2 − κ and letting 𝜘 = 1 + ω/2 − κ − ς, we also have d = [n/log(n)], which is used in our numerical analysis and is commonly accepted in screening procedures (Fan and Lv, 2008; He et al., 2013).
4 QPCOR and selection
Applying quantile partial correlation, we first introduce three screening algorithms. Based on the set of candidate models obtained via the screening procedure, we subsequently use the extended Bayesian information criterion to select the best model.
4.1 Screening algorithms
In this subsection, we employ QPCOR to propose a quantile screening procedure (QPCS) for selecting variables. For the sake of comparison, we also generalize Cho and Fryzlewicz’s (2012) TCS2 algorithm and Wang’s (2009) FR algorithm from classical mean regression model to quantile regression model. We name them QTCS and QFR, respectively. In developing QPCS, we need to remove the confounding effect from the target variable that is induced by its correlated variables in each step. To this end, we consider a sequential test to identify a confounding subset for each Xj (j = 1, ⋯, p). Let ρjk be the sample correlation coefficient of Xj and Xk. Then, define
and name it the confounding set. A careful choice of mj is important in the high-dimensional setting. For example, if mj is too large, then any vector in Rn may be well approximated by some Xk with k ∈ 𝒮j. We next consider a sequential testing procedure based on the partial correlations along the path to select mj. This allows us to find the smallest subset so that all covariates not in this subset will have a zero partial correlation with Xj. Let 𝕏 = (X1, …, Xn)T be the design matrix and denote , where 𝕏𝒮 is any submatrix of 𝕏. For mj ≥ 1, define the partial correlation as , where . As for mj = 0, is an empty set and . Furthermore, let
which is the Fisher’s Z-transformation considered in Kalisch and Bühlmann (2007) for identifying nodes connected to the variable Xj conditional on a set of other nodes in a Gaussian graph. Then, sequentially select the smallest size, , that satisfies , where z1−α/2 is the threshold of z values with a pre-specified significance level α and . The resulting can help us to determine the size of the selected confounding set, denoted by m̂j. It is worth noting that based on our theoretical condition given in Theorem 1, we have rn = o(n1/2). Thus, m̂j ≤ rn = o(n1/2). We then let m̂j be bounded by c{n/ log(n)}1/2 for some constant c > 0. Afterwards, let if ; m̂j = c{n/ log(n)}1/2, otherwise. In practice, the constant c should not be too large so that the resulting size m̂j is under control. In our numerical analysis, we choose c = 1. Denote the selected confounding set as . The above procedure allows us to find the confounding subset of the j-th variable, and we will use it in the screening algorithm given below.
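A sketch of the confounding-set construction just described is given below. It is our reading of the procedure, with hypothetical helper names: candidates are ordered by their marginal correlation with Xj, the Fisher Z-transformed partial correlation is tested at level α in the spirit of Kalisch and Bühlmann (2007), and the set size is capped by c{n/log(n)}1/2.

```python
import numpy as np
from scipy.stats import norm

def confounding_set(X, j, alpha=0.05, c=1.0, rho_min=0.0):
    """Sequentially build the confounding set of variable j (our reading of the text).

    Candidates are ordered by |corr(X_j, X_k)|; a candidate is added as long as its
    partial correlation with X_j, given the variables already added, is significant
    according to the Fisher Z-transformation at level alpha.  The set size is capped
    at c*(n/log n)^(1/2), and rho_min is an optional hard threshold on |corr|.
    """
    n, p = X.shape
    max_size = int(c * np.sqrt(n / np.log(n)))
    z_crit = norm.ppf(1 - alpha / 2)
    cors = np.corrcoef(X, rowvar=False)[j]
    order = [k for k in np.argsort(-np.abs(cors)) if k != j]
    selected = []
    for k in order:
        if len(selected) >= max_size or abs(cors[k]) < rho_min:
            break
        if selected:
            # Residuals of X_j and X_k after projecting on the current set.
            Z = np.column_stack([np.ones(n), X[:, selected]])
            H = Z @ np.linalg.pinv(Z)
            rj, rk = X[:, j] - H @ X[:, j], X[:, k] - H @ X[:, k]
        else:
            rj, rk = X[:, j] - X[:, j].mean(), X[:, k] - X[:, k].mean()
        rho = np.corrcoef(rj, rk)[0, 1]
        z = 0.5 * np.sqrt(n - len(selected) - 3) * np.log((1 + rho) / (1 - rho))
        if abs(z) <= z_crit:                           # no longer significant: stop
            break
        selected.append(int(k))
    return selected
```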
Algorithm 1 (QPCS)
Start with an empty active set 𝒜(0) = ∅.
Step 1. In the kth step, for given 𝒜(k−1), we update . Then, employ the maximal sample QPCOR to find the variable index j* that satisfies . Update 𝒜(k) = 𝒜(k−1) ∪ {j*}.
Step 2. Repeat step 1 until the cardinality of active set |𝒜(d*)| reaches a prespecified d*.
Step 3. Starting from the k = (d* + 1)th step, we set . In the kth step, find . Update 𝒜(k) = 𝒜(k−1) ∪ {j*}.
Step 4. Repeat step 3 until the cardinality of active set |𝒜(d)| reaches a prespecified value d < n.
In the above algorithm, the conditioning set 𝒮j contains the selected variables up to step d* and a subset of variables with non-negligible correlations identified by the sequential testing procedure.
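With a QPCOR estimator and a confounding-set routine in hand (for example, the hypothetical sample_qpcor and confounding_set sketched earlier), Algorithm 1 can be written as the following greedy loop; d* and d follow the choices d* = [C*{n/log(n)}1/2] with C* = 2 and d = [n/log(n)] used in our numerical work.

```python
import numpy as np

def qpcs_screen(X, y, tau, qpcor, conf_set, d_star=None, d=None):
    """Greedy QPCS screening path (a sketch of Algorithm 1).

    qpcor(y, xj, XS, tau) -> sample QPCOR of X_j given the columns XS;
    conf_set(X, j)        -> confounding set of variable j.
    Returns the ordered list of selected indices (the solution path).
    """
    n, p = X.shape
    if d_star is None:
        d_star = int(2 * np.sqrt(n / np.log(n)))   # d* = [C*{n/log n}^(1/2)], C* = 2
    if d is None:
        d = int(n / np.log(n))                     # d = [n/log n]
    C = [conf_set(X, j) for j in range(p)]         # confounding sets, computed once
    active = []
    for k in range(min(d, p)):
        best_j, best_val = None, -np.inf
        for j in range(p):
            if j in active:
                continue
            # Conditional set: selected variables (frozen after step d*) union C_j.
            base = active if k < d_star else active[:d_star]
            S = sorted((set(base) | set(C[j])) - {j})
            # With an empty set we condition on the intercept only, so the
            # criterion reduces to the marginal QCOR.
            XS = X[:, S] if S else np.ones((n, 1))
            val = abs(qpcor(y, X[:, j], XS, tau))
            if val > best_val:
                best_j, best_val = j, val
        active.append(best_j)
    return active
```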
In linear regression modeling, Cho and Fryzlewicz (2012) proposed the TCS2 algorithm and demonstrated that it usually performs well in comparison with LASSO, SCAD, MCP, FR, and iterative SIS (ISIS; see Fan and Lv, 2008). This inspires us to extend their TCS2 to quantile regression, and we name it QTCS.
Algorithm 2 (QTCS)
Start with an empty active set 𝒜(0) = ∅.
Step 1. In the kth step, for given 𝒜(k−1), let 𝒮j = 𝒜(k−1). Then, find the variable index that has the maximal sample QPCOR such that . If , let j* = j′ and go to step 3.
Step 2. If , then screen the sample QPCOR for all Xj in which . Let , and find .
Step 3. Update 𝒜(k) = 𝒜(k−1) ∪ {j*}.
Step 4. Repeat steps 1–3 until the cardinality of active set, |𝒜(d*)|, reaches a prespecified d* = [C*{n/ log(n)}1/2] for some constant C* > 0.
Step 5. Starting from the k = (d* + 1)th step, repeat steps 1–3 by letting 𝒮j = 𝒜(d*) and in steps 1 and 2, respectively. Repeat the procedure until the cardinality of active set, |𝒜(d)|, reaches a prespecified value d < n.
Based on extensive simulation studies in linear model settings, Wang (2009) indicated that FR is a promising method for variable screening by comparing with LASSO, SCAD, SIS, and ISIS. This motivates us to generalize his forward selection screening algorithm to quantile regression, and name it QFR.
Algorithm 3 (QFR)
Start with an empty active set 𝒜(0) = ∅.
Step 1. In the kth step, for given 𝒜(k−1), let 𝒮j = 𝒜(k−1) for k ≤ d* and 𝒮j = 𝒜(d*) for k > d*. Then, find the variable index that has the maximal sample QPCOR such that . Update 𝒜(k) = 𝒜(k−1) ∪ {j*}.
Step 2. Repeat step 1 until the cardinality of the active set, |𝒜(k)|, reaches a prespecified value d < n.
For the sake of comparison, Table 1 summarizes the three algorithms. Without the thresholding step, and with the sample QPCOR replaced by the tilted correlation defined in Cho and Fryzlewicz (2012) and the residual sum of squares given in Wang (2009), respectively, QTCS and QFR reduce to TCS2 and FR. To utilize the above three algorithms, we need to specify d* and d. It is worth noting that the thresholding size d* needs to satisfy d* ≤ rn = o(n1/2) (see Algorithm 1). Hence, we consider d* = [C*{n/ log(n)}1/2] for some C* > 0. In addition, the value of C* must be chosen so that d* does not exceed d, as required by the screening algorithms. Following Remark 2, we set d = [n/ log(n)]. To meet the requirement d* < d with n = 200 and n = 120 used in our simulation and empirical examples, respectively, we choose C* = 2, which yields good performance in our numerical studies. However, this does not exclude other possible choices that also satisfy this requirement.
Table 1.
Comparison of the QPCS, QTCS and QFR algorithms.
| | QPCS | QTCS | QFR |
|---|---|---|---|
| Initialization | A(0) = ∅ | A(0) = ∅ | A(0) = ∅ |
| Action | one variable is selected | one variable is selected | one variable is selected |
| Conditional set Sj for k ≤ d* | A(k−1) together with the confounding set of Xj | A(k−1), or A(k−1) together with the confounding set of Xj | A(k−1) |
| Conditional set Sj for k > d* | A(d*) together with the confounding set of Xj | A(d*), or A(d*) together with the confounding set of Xj | A(d*) |
Since , we have for some constant 0 < C̃ < ∞ and j = 1, ⋯, p. This provides an upper bound for the conditional set of each variable, which is not very large. Otherwise, any vector in Rn can be well-approximated by the variables in this set. A similar consideration can be found in Cho and Fryzlewicz (2012) when they discussed their “conditioning set” 𝒞j. Note that 𝒞j is our confounding set , which is different from the conditional set 𝒮j. In their Assumption 3 of page 598, they consider a bound for the size of the conditioning set 𝒞j such that |𝒞j| ≤ Cnξ with ξ ∈ [0, 2(γ − δ)) for δ ∈ [0, 1/2) and γ ∈ (δ, 1/2). Thus, |𝒞j| = o(n1−2δ) for δ ∈ [0, 1/2). It is of interest to note that, based on the condition given in Theorem 1, our conditional set |𝒮j| ≤ rn = o(n1−2κ) for 0 < κ < 1/2.
Remark 3
From Table 1 and the above discussion, we find that both QPCS and QTCS guard against overfitting better than QFR. This is because they account for the confounding effect of explanatory variables, while QFR does not take it into account. Although |𝒮j| in QPCS and |𝒞j| (i.e., ) in QTCS have the same order, the confounding set in QPCS is always included in the conditional set 𝒮j in every screening step, whereas 𝒞j in QTCS may not always be included in 𝒮j. Accordingly, QPCS is likely to reduce overfitting more than QTCS.
Based on the quantile partial correlation, we have introduced three screening algorithms. Although the quantile correlation is not the focus of our paper, one can employ it to propose the sure independent screening procedure for quantile regression. Specifically, we adopt the SIS method of Fan and Lv (2008) by replacing their Pearson correlation with the quantile correlation QCOR. The resulting selected model is
He et al. (2013) also applied the SIS method for model selection in nonparametric quantile regression. In classical mean regression, Fan and Lv (2008) further introduced ISIS for selecting variables. As mentioned above, Wang (2009) and Cho and Fryzlewicz (2012) demonstrated that their FR and TCS2 algorithms, respectively, perform well in comparison with ISIS. Thus, we focus our numerical comparison on the newly proposed procedure and the corresponding procedures proposed in Wang (2009) and Cho and Fryzlewicz (2012).
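For readers who wish to experiment with this marginal alternative, the QCOR-based ranking can be sketched in a few lines; qcor stands for a sample QCOR routine, such as the hypothetical sample_qcor given in the sketch of Section 2.

```python
import numpy as np

def qcor_sis(X, y, tau, qcor, d=None):
    """SIS-type marginal screening: keep the d variables with the largest |sample QCOR|."""
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))                     # d = [n/log n]
    scores = np.array([abs(qcor(y, X[:, j], tau)) for j in range(p)])
    return list(np.argsort(-scores)[:d])
```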
4.2 Best subset selection
In the previous subsection, the proposed QPCS algorithm generates a solution path 𝔸 = {𝒜(k), 1 ≤ k ≤ d}, which includes the d selected models 𝒜(1) ⊂ 𝒜(2) ⊂ ⋯ ⊂ 𝒜(d). To find the best model among them, we consider an extended Bayesian information criterion (EBIC) for best subset selection. This criterion has been used for classical mean regression model in high dimensional data analysis (e.g., see Chen and Chen, 2008 and Wang, 2009). As for our quantile regression model, we follow the approach of Lee et al. (2014) and adopt the criterion:
| (11) |
where Cn is a positive constant that diverges along with the sample size n, and
Let k̂ = arg min1≤k≤dEBIC(𝒜(k)), and denote the resulting best model ℳ̂EBIC = 𝒜(k̂). We make the following condition, which corresponds to Condition (A2)(ii) in Lee et al. (2014) and is needed for establishing the consistency of EBIC.
(C4) There exist positive finite constants m′ and M′ such that
uniformly for any subset 𝒮 ⊂ {1, …, p} satisfying |𝒮| ≤ C*nς+κ−ω/2.
We next establish the screening consistency of the best model selected by EBIC.
Theorem 3
Under the conditions given in Proposition 1 and Condition (C4), and assuming that ς < 1/2 + ω/2 − κ, , Cn log(n)n(ς+κ−ω/2)−1 = o(1), and E|ε| < ∞, we have P(ℳ* ⊂ ℳ̂EBIC) → 1 as n → ∞.
When Cn = 1, EBIC reduces to the classical BIC (Schwarz (1978)). Recently, Wang and Leng (2009) and Lee et al. (2014) used Cn = log(log(d)) and Cn = log(d), respectively, in their simulation studies when the number of predictors diverged along with the sample size. We use both approaches in our numerical studies. In addition, EBIC can be applied not only to QPCS, but also to QTCS and QFR for best subset selection. It is worth noting that our proposed QPCS algorithm yields a family of nested candidate models, 𝒜(1) ⊂ 𝒜(2) ⊂ ⋯ ⊂ 𝒜(d). Thus, we propose using the model selection criterion EBIC to select the best model. On the other hand, the screening procedure of SIS only produces a single final model, followed by the SCAD or other penalized methods for variable selections.
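A sketch of the best subset step follows. The penalty used below, |𝒜|Cn log(n)/(2n) added to the logarithm of the average check loss, is only our assumed reading of a Lee et al. (2014)-type criterion and should be replaced by the exact form in (11); the nested path is the prefix sequence produced by the screening algorithm.

```python
import numpy as np
import statsmodels.api as sm

def check_loss(u, tau):
    """Quantile (check) loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0).astype(float))

def ebic_best_model(X, y, tau, path, Cn):
    """Select the best model along a nested screening path by an EBIC-type criterion.

    path : list of nested index lists A(1) c A(2) c ... c A(d) from the screening step.
    NOTE: the penalty |A| * Cn * log(n) / (2n) added to log(average check loss) is our
    assumed reading of criterion (11), not a verbatim transcription.
    """
    n = len(y)
    best_A, best_crit = None, np.inf
    for A in path:
        XA = sm.add_constant(X[:, A])
        fit = sm.QuantReg(y, XA).fit(q=tau)
        loss = check_loss(y - XA @ np.asarray(fit.params), tau).mean()
        crit = np.log(loss) + len(A) * Cn * np.log(n) / (2 * n)
        if crit < best_crit:
            best_A, best_crit = A, crit
    return best_A
```

With the screening path stored as prefixes, e.g. path = [active[:k+1] for k in range(len(active))], EBIC1 and EBIC2 in our numerical studies correspond to Cn = np.log(np.log(d)) and Cn = np.log(d), respectively.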
In addition to classical model selection, another popular variable selection approach is penalization. In other words, by using the selected model 𝒜(d) with d < n from the screening procedure, we can obtain the estimated parameters by minimizing
with respect to parameters β𝒜(d) = (β1,𝒜(d), ⋯, β𝒜(d),𝒜(d))T, where pλ(·) is a penalty function with a regularization parameter λ. In our numerical studies, we consider the LASSO penalty for demonstration, but other penalties such as SCAD and MCP can also be applied. It is worth noting that the penalization method only employs the largest selected model 𝒜(d), while EBIC uses the entire solution path obtained from the screening procedure.
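For the penalization alternative, the following sketch fits an L1-penalized quantile regression on the largest screened model 𝒜(d) using scikit-learn’s pinball-loss regressor; the regularization grid and the BIC-type score used to pick the penalty level are illustrative choices, not the exact procedure of the paper.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

def lasso_quantile_fit(X, y, tau, screened, alphas=np.logspace(-3, 0, 20)):
    """L1-penalized quantile regression restricted to the screened set A(d).

    Returns the coefficients (keyed by screened indices) for the penalty level that
    minimizes an illustrative BIC-type score; this is a sketch, not the exact tuning
    procedure used in the paper.
    """
    n = len(y)
    XA = X[:, screened]
    best = None
    for a in alphas:
        model = QuantileRegressor(quantile=tau, alpha=a, solver="highs").fit(XA, y)
        resid = y - model.predict(XA)
        loss = np.mean(resid * (tau - (resid < 0)))      # average check loss
        df = int(np.sum(np.abs(model.coef_) > 1e-8))     # number of nonzero coefficients
        score = np.log(loss) + df * np.log(n) / (2 * n)  # illustrative BIC-type score
        if best is None or score < best[0]:
            best = (score, model.coef_)
    return dict(zip(screened, best[1]))
```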
5 Simulation studies
In this section, we conduct simulation studies to compare the finite sample performance of the four screening procedures QPCS, QTCS, QFR, and SIS. We further illustrate the extended BIC approach for best subset selection by using Cn = log(log(d)) and Cn = log(d), respectively, and denote the corresponding methods by EBIC1 and EBIC2. We also compare the EBIC method with the LASSO penalization method after the screening, where the tuning parameter for LASSO is selected by the BIC method. Moreover, we compare our method with the l1 penalization of Belloni and Chernozhukov (2011) and the ISIS-SCAD method of Fan and Lv (2008) in the last example. The tuning parameter for the l1 penalization method is selected by the approach described in Section 2.3 of Belloni and Chernozhukov (2011), and the tuning parameter for the ISIS-SCAD method is selected by the extended BIC as given in the R package ‘SIS’.
To demonstrate the performance of the QPCS, QTCS, QFR, and SIS screening procedures, we present two examples. We consider three quantiles τ = 0.2, 0.5, and 0.8, and all simulation results are based on 200 realizations with n = 200 and p = 1, 000. Moreover, seven measures are used to assess the screening and selection performance: the ranks of the selected variables and the minimum model size (see Liu et al., 2014), the numbers of true positive and false positive selections (see Liu et al., 2014), and the proportions of correct-fitting, over-fitting, and under-fitting selections (see Wang et al., 2007). We next describe these measures in detail.
Rj: the average rank of Xj;
M: the average minimum size of the selected model that contains all the relevant (i.e., true) predictors;
TP: the average number of true positives (i.e., the average number of relevant predictors being correctly selected);
FP: the average number of false positives (i.e., the average number of irrelevant predictors being incorrectly selected);
C: the proportion in which exactly the relevant predictors are selected;
O: the proportion in which all relevant predictors and some irrelevant predictors are selected;
I: the proportion in which some relevant predictors are not selected.
Note that the average or proportion used in the above measures is calculated from 200 realizations.
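For one realization, these measures can be computed from the selected and true index sets as follows; the helper below is a small hypothetical illustration, and the averages and proportions over the 200 realizations are taken outside it.

```python
def selection_measures(selected, true_set):
    """TP, FP and correct-/over-/under-fitting indicators for a single realization."""
    selected, true_set = set(selected), set(true_set)
    return {
        "TP": len(selected & true_set),            # relevant predictors correctly selected
        "FP": len(selected - true_set),            # irrelevant predictors selected
        "C": selected == true_set,                 # exactly the relevant predictors
        "O": true_set < selected,                  # all relevant plus some irrelevant ones
        "I": not true_set.issubset(selected),      # some relevant predictors are missed
    }
```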
Example 1
We generate the response from Model D considered by Cho and Fryzlewicz (2012) and originally taken from Fan and Lv (2008):
where the predictors X are simulated from N(0, Σ), with Σ = {σij} being a p × p covariance matrix satisfying σii = 1 and σij = ρ for j ≠ i, except that . Thus, X4 is marginally uncorrelated with Y at the population level. To take quantiles into account in the regression coefficients, we let β = 2.5(1 + |τ − 0.5|) rather than β = 2.5 as given in Cho and Fryzlewicz (2012). The random error ε is generated according to the standard normal distribution and the Laplace distribution. We also let ρ = 0.5 and ρ = 0.95 represent a moderate correlation and a high correlation, respectively. To save space, we report the results for ρ = 0.5 in Tables S1–S3 of the Supplemental Materials.
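The following sketch generates one data set of this design. Because the regression function itself is not reproduced above, the form used here, Y = β(X1 + X2 + X3) − 3β√ρ X4 + ε with σ4k = σk4 = √ρ, is our reading of Model D in Cho and Fryzlewicz (2012) and is stated only as an assumption; it does make X4 marginally uncorrelated with Y.

```python
import numpy as np

def generate_example1(n=200, p=1000, rho=0.95, tau=0.5, error="normal", seed=0):
    """Simulate one data set from our reading of Model D in Example 1 (assumed design).

    ASSUMPTIONS: Sigma_ii = 1, Sigma_ij = rho (i != j) except Sigma_4k = Sigma_k4 = sqrt(rho),
    and Y = beta*(X1 + X2 + X3) - 3*beta*sqrt(rho)*X4 + eps with beta = 2.5*(1 + |tau - 0.5|),
    which makes X4 marginally uncorrelated with Y.
    """
    rng = np.random.default_rng(seed)
    Sigma = np.full((p, p), rho)
    np.fill_diagonal(Sigma, 1.0)
    Sigma[3, :] = Sigma[:, 3] = np.sqrt(rho)       # X4 (index 3) row and column
    Sigma[3, 3] = 1.0
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta = 2.5 * (1 + abs(tau - 0.5))
    eps = rng.standard_normal(n) if error == "normal" else rng.laplace(size=n)
    y = beta * (X[:, 0] + X[:, 1] + X[:, 2]) - 3 * beta * np.sqrt(rho) * X[:, 3] + eps
    return X, y
```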
Table 2 reports Rj (j = 1, …, 4) and M for p = 1, 000 and ρ = 0.95. When the predictors are highly correlated (ρ = 0.95), SIS cannot successfully identify all four relevant predictors. In Table S1 of the Supplemental Materials, we find that even under moderate correlation (ρ = 0.5), the SIS approach fails to identify the fourth predictor, which is marginally uncorrelated with Y (see the large values of R4 in Table S1).
Table 2.
The average rank of the relevant predictors Rj and the average number of the minimum size of the selected model M with p = 1, 000 and ρ = 0.95 in Example 1.
| Standard Normal | Laplace Distribution | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| τ | Method | R1 | R2 | R3 | R4 | M | R1 | R2 | R3 | R4 | M |
| QPCS | 2.205 | 1.945 | 2.060 | 3.825 | 4.015 | 4.060 | 2.520 | 4.885 | 8.845 | 13.500 | |
| 0.2 | QTCS | 4.050 | 3.570 | 4.120 | 7.180 | 8.540 | 6.970 | 14.290 | 8.500 | 19.595 | 33.530 |
| QFR | 5.215 | 4.150 | 4.560 | 275.955 | 276.490 | 9.600 | 8.980 | 12.505 | 480.455 | 483.040 | |
| SIS | 343.580 | 327.575 | 330.285 | 499.975 | 682.580 | 304.795 | 302.405 | 307.890 | 499.880 | 668.845 | |
| QPCS | 2.125 | 2.140 | 2.090 | 3.795 | 4.095 | 2.275 | 3.375 | 3.565 | 4.040 | 6.725 | |
| 0.5 | QTCS | 4.270 | 3.875 | 3.825 | 17.550 | 18.615 | 4.340 | 5.405 | 4.730 | 36.810 | 39.565 |
| QFR | 4.470 | 4.335 | 3.995 | 345.760 | 345.875 | 5.000 | 5.455 | 8.110 | 487.020 | 487.545 | |
| SIS | 345.635 | 345.250 | 337.305 | 500.985 | 691.675 | 320.695 | 310.385 | 319.025 | 510.840 | 685.620 | |
| QPCS | 1.935 | 2.085 | 2.145 | 3.855 | 4.010 | 5.015 | 3.835 | 3.180 | 11.195 | 14.520 | |
| 0.8 | QTCS | 3.775 | 3.905 | 4.115 | 6.540 | 7.615 | 15.445 | 7.470 | 8.720 | 32.495 | 46.225 |
| QFR | 4.410 | 4.375 | 4.150 | 252.240 | 252.315 | 9.425 | 7.945 | 12.175 | 459.700 | 459.715 | |
| SIS | 338.835 | 335.465 | 346.660 | 501.060 | 686.38 | 311.050 | 308.080 | 320.905 | 492.905 | 662.565 | |
The aim of the QFR method is to remove the effects of the predictors identified in previous steps. It performs reasonably well under moderate correlation (see the small values of Rj and M in Table S1). When ρ = 0.95, however, QFR is not capable of identifying the fourth predictor. This finding is not surprising since FR is not designed to remove high collinearity (i.e., confounding) effects. As for the QPCS and QTCS screening procedures, Table 2 indicates that both are able to control the effect of collinearity and identify relevant variables. However, QPCS is uniformly superior to QTCS in all measures. This is because QPCS can prevent more overfitting than QTCS by removing the confounding effect in every sequential step.
After examining screening performance, we next evaluate best subset selection. Since SIS does not show strong performance, we only consider variable selection via the other three screening procedures. Table 3 reports TP and FP calculated under three selection methods, EBIC1, EBIC2, and LASSO, for p = 1, 000 and ρ = 0.95. Furthermore, Table 4 correspondingly presents the proportions of correct-fitting (C), over-fitting (O), and incorrect-fitting (I) for p = 1, 000 and ρ = 0.95.
Table 3.
Variable selection results of TP and FP for the extended BIC and LASSO with p = 1, 000 and ρ = 0.95 in Example 1.
| QPCS | QTCS | QFR | |
|---|---|---|---|---|---|---|---|---|---|---|
| τ | EBIC1 | EBIC2 | LASSO | EBIC1 | EBIC2 | LASSO | EBIC1 | EBIC2 | LASSO | |
| Standard Normal | ||||||||||
| 0.2 | TP | 4.000 | 4.000 | 2.035 | 3.945 | 3.830 | 2.035 | 3.385 | 3.210 | 2.020 |
| FP | 10.405 | 0.430 | 0.345 | 10.465 | 2.600 | 1.375 | 11.780 | 4.680 | 1.895 | |
| 0.5 | TP | 4.000 | 3.980 | 2.490 | 3.790 | 3.200 | 2.500 | 3.265 | 2.470 | 2.330 |
| FP | 9.605 | 0.145 | 2.300 | 10.365 | 1.895 | 3.555 | 11.105 | 2.680 | 4.750 | |
| 0.8 | TP | 4.000 | 4.000 | 2.995 | 3.955 | 3.785 | 2.685 | 3.350 | 3.085 | 2.410 |
| FP | 10.580 | 0.305 | 0.350 | 10.585 | 2.415 | 2.030 | 11.985 | 4.585 | 2.875 | |
| Laplace Distribution | ||||||||||
| 0.2 | TP | 3.960 | 3.740 | 1.930 | 3.605 | 2.680 | 1.970 | 2.905 | 1.970 | 1.955 |
| FP | 9.740 | 0.765 | 1.125 | 10.785 | 2.100 | 2.145 | 11.660 | 3.155 | 2.750 | |
| 0.5 | TP | 3.960 | 3.830 | 2.460 | 3.580 | 2.515 | 2.335 | 3.010 | 2.455 | 2.430 |
| FP | 6.625 | 0.150 | 2.650 | 8.685 | 1.360 | 2.865 | 10.215 | 1.675 | 3.250 | |
| 0.8 | TP | 3.920 | 3.625 | 2.875 | 3.525 | 2.690 | 2.560 | 2.910 | 2.105 | 1.880 |
| FP | 10.205 | 0.760 | 1.205 | 10.825 | 2.320 | 2.895 | 11.985 | 3.030 | 3.610 | |
Table 4.
Variable selection results of C, O, and I for the extended BIC and LASSO with p = 1, 000 and ρ = 0.95 in Example 1.
| QPCS | QTCS | QFR | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| τ | EBIC1 | EBIC2 | LASSO | EBIC1 | EBIC2 | LASSO | EBIC1 | EBIC2 | LASSO | |
| Standard Normal | ||||||||||
| C | 0.000 | 0.695 | 0.000 | 0.000 | 0.125 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 0.2 | O | 1.000 | 0.305 | 0.000 | 0.950 | 0.760 | 0.000 | 0.400 | 0.390 | 0.000 |
| I | 0.000 | 0.000 | 1.000 | 0.050 | 0.115 | 1.000 | 0.600 | 0.610 | 1.000 | |
| C | 0.000 | 0.845 | 0.000 | 0.000 | 0.155 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 0.5 | O | 1.000 | 0.135 | 0.000 | 0.795 | 0.485 | 0.000 | 0.300 | 0.250 | 0.000 |
| I | 0.000 | 0.020 | 1.000 | 0.205 | 0.360 | 1.000 | 0.700 | 0.750 | 1.000 | |
| C | 0.000 | 0.765 | 0.000 | 0.000 | 0.130 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 0.8 | O | 1.000 | 0.235 | 0.000 | 0.965 | 0.760 | 0.000 | 0.350 | 0.335 | 0.000 |
| I | 0.000 | 0.000 | 1.000 | 0.035 | 0.110 | 1.000 | 0.650 | 0.665 | 1.000 | |
| Laplace Distribution | ||||||||||
| C | 0.000 | 0.530 | 0.000 | 0.000 | 0.110 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 0.2 | O | 0.960 | 0.290 | 0.000 | 0.715 | 0.340 | 0.000 | 0.050 | 0.045 | 0.000 |
| I | 0.040 | 0.180 | 1.000 | 0.285 | 0.550 | 1.000 | 0.950 | 0.955 | 1.000 | |
| C | 0.040 | 0.805 | 0.000 | 0.010 | 0.160 | 0.000 | 0.000 | 0.005 | 0.000 | |
| 0.5 | O | 0.930 | 0.085 | 0.000 | 0.605 | 0.195 | 0.000 | 0.080 | 0.020 | 0.000 |
| I | 0.070 | 0.110 | 1.000 | 0.385 | 0.645 | 1.000 | 0.920 | 0.795 | 1.000 | |
| C | 0.000 | 0.470 | 0.000 | 0.000 | 0.045 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 0.8 | O | 0.935 | 0.340 | 0.005 | 0.650 | 0.370 | 0.000 | 0.080 | 0.080 | 0.000 |
| I | 0.065 | 0.190 | 0.095 | 0.350 | 0.585 | 1.000 | 0.920 | 0.920 | 1.000 | |
We observe that the TP values for the LASSO method are much smaller than 4, the number of true predictors in the model. In addition, the FP values are not large since LASSO tends to select a small number of variables among highly correlated covariates. Moreover, the proportion of incorrect fitting is most often 1 for the LASSO method. Hence, LASSO is not an effective method for variable selection in this context. On the contrary, EBIC1 tends to exhibit overfitting, as evidenced by FP values more than twice the number of true predictors. In addition, the proportion of overfitting is often very large as shown in Table 4, except for QFR. This is because QFR tends to fit incorrectly under this scenario, which is consistent with the findings in Table 3. In comparison with EBIC1, EBIC2 yields much smaller FP values at the cost of slightly smaller TP values. In addition, EBIC2 is uniformly superior to LASSO in both the TP and FP measures. Consequently, EBIC2 is a favorable choice. Moreover, Tables 3 and 4 indicate that QPCS-EBIC2 outperforms its competitors in best subset selection. Finally, it is not surprising that the performance of all screening and selection procedures deteriorates when ρ becomes large or ε has a heavy-tailed distribution. To save space, we report additional simulation results for p = 2, 000 in the Supplementary Materials (see Tables S4–S6). These simulations yield the same conclusion as for p = 1, 000.
Example 2
We generate the response from the model:
where β, ρ, and X are defined as in Example 1 except that σ5j = σj5 = 0, so that X5 is uncorrelated with Xj (j ≠ 5). In addition, X5 has a small contribution to Y. This model is also considered by Cho and Fryzlewicz (2012) and Fan and Lv (2008). To save space, we report the results for ρ = 0.5 in Tables S7–S9 of the Supplemental Materials.
Tables 5 and S7 report Rj (j = 1, …, 5) and M for ρ = 0.95 and 0.5, respectively. From Table S7 in the Supplementary Materials, we observe that SIS gives large values of R4, R5, and M for ρ = 0.5. Hence, SIS is not able to identify variables X4 and X5 in this case. When ρ = 0.95, SIS is able to identify X5 owing to its lack of correlation with the other variables, whereas it fails to identify variables X1 to X4 since they are highly correlated with the others. As a result, QFR, QPCS, and QTCS outperform SIS. For these three procedures, we reach the same conclusion as in Example 1: both QPCS and QTCS are superior to QFR, and QPCS performs the best. Tables 6 and 7 summarize the results of subset selection by presenting TP, FP, and the proportions of correct-fitting (C), over-fitting (O), and incorrect-fitting (I) calculated via EBIC1, EBIC2, and LASSO. Both tables show that QPCS-EBIC2 performs the best. The same findings emerge from Tables S8 and S9 of the Supplementary Materials when ρ = 0.5.
Table 5.
The average rank of the relevant predictors Rj and the average minimum size of the selected model M with p = 1, 000 and ρ = 0.95 in Example 2.
| Standard normal | Laplace Distribution | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| τ | Method | R1 | R2 | R3 | R4 | R5 | M | R1 | R2 | R3 | R4 | R5 | M |
| ρ = 0.95 | |||||||||||||
| QPCS | 2.720 | 2.560 | 2.685 | 4.560 | 2.655 | 5.060 | 6.380 | 4.625 | 5.990 | 13.455 | 3.155 | 21.485 | |
| 0.2 | QTCS | 5.020 | 4.870 | 4.870 | 8.450 | 1.010 | 9.450 | 7.445 | 15.180 | 18.815 | 20.085 | 1.120 | 42.720 |
| QFR | 5.775 | 5.175 | 5.340 | 235.085 | 1.010 | 235.185 | 10.605 | 8.685 | 14.225 | 444.770 | 1.145 | 446.630 | |
| SIS | 388.100 | 373.320 | 377.390 | 491.820 | 1.015 | 700.930 | 330.035 | 330.145 | 340.550 | 509.575 | 1.830 | 689.145 | |
| QPCS | 2.765 | 2.805 | 2.825 | 4.510 | 2.495 | 5.19 | 2.950 | 4.540 | 2.970 | 5.140 | 2.580 | 7.420 | |
| 0.5 | QTCS | 4.580 | 5.655 | 5.235 | 20.290 | 1.045 | 21.415 | 6.765 | 12.340 | 6.870 | 26.250 | 1.010 | 35.080 |
| QFR | 5.650 | 5.315 | 6.165 | 358.660 | 1.045 | 358.77 | 9.715 | 6.085 | 9.905 | 474.955 | 1.010 | 475.385 | |
| SIS | 342.660 | 326.660 | 330.650 | 505.585 | 1.015 | 686.76 | 359.165 | 349.170 | 350.990 | 499.660 | 5.920 | 698.655 | |
| QPCS | 2.775 | 2.785 | 2.790 | 4.645 | 2.475 | 5.200 | 6.510 | 3.325 | 3.980 | 16.295 | 5.235 | 23.035 | |
| 0.8 | QTCS | 4.810 | 5.040 | 4.625 | 7.055 | 1.015 | 8.145 | 6.945 | 18.015 | 9.310 | 19.055 | 1.065 | 36.94 |
| QFR | 5.755 | 5.740 | 5.290 | 214.880 | 1.015 | 214.965 | 15.905 | 15.115 | 10.305 | 461.625 | 1.095 | 468.02 | |
| SIS | 383.340 | 385.060 | 383.290 | 499.870 | 5.855 | 713.315 | 347.490 | 344.295 | 375.930 | 518.075 | 1.045 | 715.87 | |
Table 6.
Variable selection results of TP and FP for the extended BIC and LASSO with p = 1, 000 and ρ = 0.95 in Example 2.
| QPCS | QTCS | QFR | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| τ | EBIC1 | EBIC2 | LASSO | EBIC1 | EBIC2 | LASSO | EBIC1 | EBIC2 | LASSO | |
| Standard Normal | ||||||||||
| 0.2 | TP | 5.000 | 5.000 | 2.995 | 4.920 | 4.785 | 2.995 | 4.340 | 4.135 | 2.995 |
| FP | 9.420 | 0.450 | 0.580 | 10.085 | 2.765 | 2.040 | 10.905 | 4.680 | 2.825 | |
| 0.5 | TP | 5.000 | 4.965 | 3.495 | 4.800 | 4.075 | 3.500 | 4.220 | 3.255 | 3.220 |
| FP | 8.550 | 0.250 | 2.260 | 9.655 | 1.755 | 2.385 | 10.496 | 2.555 | 2.605 | |
| 0.8 | TP | 5.000 | 4.995 | 3.970 | 4.975 | 4.810 | 3.995 | 4.370 | 4.120 | 3.985 |
| FP | 9.380 | 0.525 | 0.610 | 9.735 | 2.535 | 2.425 | 10.995 | 4.445 | 3.310 | |
| Laplace Distribution | ||||||||||
| 0.2 | TP | 4.910 | 4.495 | 2.890 | 4.635 | 3.685 | 2.855 | 3.885 | 3.030 | 2.860 |
| FP | 9.550 | 0.885 | 1.480 | 9.935 | 1.865 | 2.645 | 11.080 | 2.995 | 3.500 | |
| 0.5 | TP | 4.950 | 4.605 | 3.475 | 4.640 | 3.445 | 3.060 | 3.985 | 2.800 | 2.405 |
| FP | 6.580 | 0.205 | 2.625 | 8.480 | 1.135 | 2.860 | 9.400 | 1.325 | 3.145 | |
| 0.8 | TP | 4.905 | 4.570 | 3.845 | 4.600 | 3.850 | 3.825 | 3.875 | 3.205 | 2.825 |
| FP | 9.570 | 1.025 | 3.975 | 9.770 | 2.045 | 3.975 | 11.145 | 2.945 | 3.915 | |
Table 7.
Variable selection results of C, O, and I for the extended BIC and LASSO with p = 1, 000 and ρ = 0.95 in Example 2.
| QPCS | QTCS | QFR | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| τ | EBIC1 | EBIC2 | LASSO | EBIC1 | EBIC2 | LASSO | EBIC1 | EBIC2 | LASSO | |
| Standard Normal | ||||||||||
| C | 0.000 | 0.675 | 0.005 | 0.000 | 0.150 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 0.2 | O | 1.000 | 0.325 | 0.000 | 0.925 | 0.720 | 0.000 | 0.355 | 0.335 | 0.000 |
| I | 0.000 | 0.000 | 0.995 | 0.075 | 0.130 | 1.000 | 0.645 | 0.665 | 1.000 | |
| C | 0.000 | 0.775 | 0.025 | 0.000 | 0.175 | 0.000 | 0.000 | 0.010 | 0.000 | |
| 0.5 | O | 1.000 | 0.190 | 0.165 | 0.815 | 0.455 | 0.000 | 0.265 | 0.210 | 0.000 |
| I | 0.000 | 0.035 | 0.810 | 0.185 | 0.370 | 1.000 | 0.735 | 0.780 | 1.000 | |
| C | 0.000 | 0.615 | 0.290 | 0.000 | 0.155 | 0.000 | 0.000 | 0.005 | 0.000 | |
| 0.8 | O | 1.000 | 0.380 | 0.060 | 0.975 | 0.745 | 0.000 | 0.390 | 0.365 | 0.000 |
| I | 0.000 | 0.005 | 0.650 | 0.025 | 0.100 | 1.000 | 0.610 | 0.630 | 1.000 | |
| Laplace Distribution | ||||||||||
| C | 0.000 | 0.420 | 0.000 | 0.000 | 0.125 | 0.000 | 0.000 | 0.005 | 0.000 | |
| 0.2 | O | 0.920 | 0.345 | 0.010 | 0.730 | 0.345 | 0.000 | 0.065 | 0.055 | 0.000 |
| I | 0.080 | 0.235 | 0.990 | 0.270 | 0.530 | 1.000 | 0.935 | 0.940 | 1.000 | |
| C | 0.000 | 0.735 | 0.000 | 0.000 | 0.165 | 0.000 | 0.000 | 0.020 | 0.000 | |
| 0.5 | O | 0.960 | 0.115 | 0.145 | 0.700 | 0.175 | 0.000 | 0.100 | 0.025 | 0.000 |
| I | 0.040 | 0.150 | 0.855 | 0.300 | 0.660 | 1.000 | 0.900 | 0.955 | 1.000 | |
| C | 0.000 | 0.350 | 0.110 | 0.000 | 0.115 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 0.8 | O | 0.925 | 0.440 | 0.045 | 0.700 | 0.375 | 0.000 | 0.050 | 0.040 | 0.000 |
| I | 0.075 | 0.210 | 0.845 | 0.300 | 0.510 | 1.000 | 0.950 | 0.960 | 1.000 | |
Example 3
In the first two examples, we demonstrated the performance of the proposed variable screening procedures. In this example, we compare our recommended method, QPCS-EBIC2, with two other methods, the l1 penalization and ISIS-SCAD. We use the “SIS” R package to implement ISIS-SCAD, which first performs iterative sure independence screening and then fits the final regression model using the SCAD penalty. In short, we denote these three methods as QPCS, l1, and ISIS, respectively. We simulate data from the same data generating process given in Example 1 with p = 1, 000. Since ISIS is designed for mean regression models, we only consider τ = 0.5 for a fair comparison.
Table 8 reports TP and FP, and Table 9 presents the proportions of correct-fitting (C), over-fitting (O), and incorrect-fitting (I). For the l1 penalization method, we observe that both the true positive and false positive values are very small when ρ = 0.95, since it selects only one variable, or none, from a group of highly correlated covariates. For ρ = 0.5 and 0.05, however, it has very large false positives. It is also worth noting that l1 has very large proportions of incorrect fitting even at the moderate correlation of ρ = 0.5. This is because it often misses the fourth variable, X4, which is highly correlated with the other three variables. As a result, l1 can be seriously affected by the correlation, and its performance deteriorates as the correlation becomes larger. As for ISIS, its performance at ρ = 0.95 is worse than that at ρ = 0.5 and 0.05. In addition, it has larger false positive values, and its correct-fitting rates are close to zero when ρ = 0.95. In contrast to l1 and ISIS, Tables 8 and 9 indicate that QPCS has the best performance in all cases. Specifically, its correct-fitting proportions exceed 80% even at ρ = 0.95, its numbers of true positives are close to 4, and its numbers of false positives are small. It is of interest that QPCS performs slightly better for the Laplace error distribution than for the normal error distribution when ρ = 0.05 and 0.5. This finding may be related to the fact that at τ = 0.5 the parameter estimate from quantile regression is the MLE under the Laplace distribution.
Table 8.
Variable selection results of TP and FP for the QPCS, l1, and ISIS methods with p = 1, 000 and τ = 0.5 in Example 3.
| ρ = 0.95 | ρ = 0.50 | ρ = 0.05 | |||||||
|---|---|---|---|---|---|---|---|---|---|
| QPCS | l1 | ISIS | QPCS | l1 | ISIS | QPCS | l1 | ISIS | |
| Standard Normal | |||||||||
| TP | 3.980 | 0.425 | 3.825 | 4.000 | 3.065 | 4.000 | 4.000 | 4.000 | 4.000 |
| FP | 0.145 | 0.350 | 4.950 | 0.145 | 45.365 | 1.035 | 0.125 | 16.815 | 1.150 |
| Laplace Distribution | |||||||||
| TP | 3.830 | 0.435 | 3.710 | 4.000 | 3.005 | 3.985 | 4.000 | 4.000 | 4.000 |
| FP | 0.150 | 0.280 | 4.905 | 0.015 | 39.110 | 1.860 | 0.015 | 16.585 | 1.765 |
Table 9.
Variable selection results of C, O, and I for the QPCS, l1, and ISIS methods with p = 1, 000 and τ = 0.5 in Example 3.
| ρ = 0.95 | ρ = 0.50 | ρ = 0.05 | |||||||
|---|---|---|---|---|---|---|---|---|---|
| QPCS | l1 | ISIS | QPCS | l1 | ISIS | QPCS | l1 | ISIS | |
| Standard Normal | |||||||||
| C | 0.845 | 0.000 | 0.070 | 0.875 | 0.000 | 0.385 | 0.890 | 0.000 | 0.355 |
| O | 0.135 | 0.000 | 0.855 | 0.125 | 0.065 | 0.615 | 0.110 | 1.000 | 0.645 |
| I | 0.020 | 1.000 | 0.075 | 0.000 | 0.935 | 0.000 | 0.000 | 0.000 | 0.000 |
| Laplace Distribution | |||||||||
| C | 0.805 | 0.000 | 0.055 | 0.985 | 0.000 | 0.195 | 0.965 | 0.000 | 0.180 |
| O | 0.085 | 0.000 | 0.870 | 0.015 | 0.005 | 0.800 | 0.035 | 1.000 | 0.820 |
| I | 0.110 | 1.000 | 0.075 | 0.000 | 0.995 | 0.005 | 0.000 | 0.000 | 0.000 |
Following an anonymous referee’s suggestion, we further compare QPCS with the SCAD penalization (Wang et al., 2012) and l1 when the error follows the t-distribution with three degrees of freedom. The tuning parameter for the SCAD method is selected by PBIC as given in the R package ‘rqPen’. The results demonstrate that QPCS outperforms l1 and SCAD. To save space, we report the simulation results in Case 1 of Example S3 in the Supplemental Materials. Inspired by an anonymous referee’s comment, we have also conducted a simulation experiment using a block diagonal covariance matrix. The detailed descriptions of the simulation settings and the results are given in Case 2 of Example S3 in the Supplemental Materials. These numerical results also demonstrate the superiority of QPCS in comparison with l1 and ISIS.
Remark 4
In our simulation studies, we assume that the covariance matrix of the covariates is exchangeable (i.e., compound symmetric), except for Case 2 of Example 3. Hence, for a given correlation coefficient ρ and subset Sj, the largest and smallest eigenvalues of the exchangeable covariance matrix ΣSj are (1 − ρ) + ρ|Sj| and 1 − ρ, respectively. In our proposed algorithm, we require that |Sj| = o(n1/2). As a result, the maximal eigenvalue of ΣSj does not satisfy the boundedness condition given in (C2)(ii) when |Sj| is allowed to diverge with the sample size n. Although Condition (C2)(ii) is not satisfied in this scenario, our proposed method shows good performance because |Sj| is often small (much smaller than n) in practice. Relaxing Condition (C2)(ii) so that the proposed method applies to a wider variety of covariance structures is an interesting subject for future research.
6 Application
In this section, we apply the proposed methods to gene expression data that was used by Scheetz et al. (2006) for investigating gene regulation in the mammalian eye and identifying genetic variations relevant to human eye disease. The dataset contains gene expression values of 31,042 probe sets on 120 rats. The expression levels of genes are analyzed on a log scale with base 2. The response variable is the expression of gene TRIM32 (probe 1389163 at), which is associated with human hereditary diseases of the retina. The purpose of this study is to analyze how the response variable depends on the expression of other genes. Before applying the screening method, we adopt the preprocessing procedure of Scheetz et al. (2006) to first remove each probe for which the maximum expression among the 120 rats is less than the 25th percentile of the entire sample of expression values, and then remove any probe for which the range of the expression among 120 rats is less than 2. As a result, there are 18,958 probes left in our analysis. Following the approach of Wang et al. (2012) and Lee et al. (2014), we subsequently select 3,000 genes with the largest variance in expression value, and then select the top 300 gene expression values in a ranking of their (absolute value) correlation with the response variable. For further illustration, we also consider the top 400, 500, and 800 gene expression values. Afterwards, we apply our proposed method to identify relevant genes for the response variable at quantiles τ = 0.3, 0.5, and 0.7 as in Wang et al. (2012). Note that Lee et al. (2014) considered τ = 0.25, 0.5, and 0.75.
To assess the finite sample performance, we consider 50 random partitions. For each partition, we randomly divide the data into a training dataset with 80 observations and a testing dataset with 40 observations. From the training dataset, we conduct screening and subset selection, and then fit the quantile regression model with the selected predictors. Subsequently, we employ the resulting quantile regression estimators and the testing data to compute the prediction error . A smaller value of the prediction error indicates better performance. In the simulation studies, we found that SIS does not perform satisfactorily and that EBIC1 and LASSO are inferior to EBIC2. As a result, we only employ the three proposed screening procedures QPCS, QTCS, and QFR to screen predictors, together with one selection criterion, EBIC2, for best subset selection. The resulting three methods are denoted QPCS-EBIC2, QTCS-EBIC2, and QFR-EBIC2, respectively.
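This evaluation scheme can be sketched as follows. The prediction error is taken to be the average check loss on the testing data, which is our assumed form of the PE criterion above, and screen_and_select stands for any of the three screening-plus-EBIC2 procedures.

```python
import numpy as np
import statsmodels.api as sm

def prediction_error(beta_hat, X_test, y_test, tau):
    """Average check loss on the testing data (our assumed form of the PE criterion)."""
    u = y_test - sm.add_constant(X_test) @ np.asarray(beta_hat)
    return np.mean(u * (tau - (u < 0)))

def evaluate_partition(X, y, tau, screen_and_select, n_train=80, seed=0):
    """One random 80/40 partition: screen and select on training data, PE on testing data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    tr, te = idx[:n_train], idx[n_train:]
    selected = screen_and_select(X[tr], y[tr], tau)      # e.g., QPCS followed by EBIC2
    fit = sm.QuantReg(y[tr], sm.add_constant(X[tr][:, selected])).fit(q=tau)
    pe = prediction_error(fit.params, X[te][:, selected], y[te], tau)
    return len(selected), pe
```

Averaging the returned size and PE over the 50 random partitions gives the quantities reported in Table 10.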
For p = 300, Wang et al.’s (2012) methods yielded an average number of relevant predictors ranging from 9.08 to 21.66 and average prediction errors between 1.30 and 1.82 across the three quantiles. Recently, Lee et al. (2014) found that their method not only obtains relevant predictor sizes of 2.24, 2.16, and 1.16 when τ = 0.25, τ = 0.5, and τ = 0.75, respectively, but also yields comparable prediction errors of 1.42, 1.64, and 1.30, accordingly. By applying our proposed methods, Table 10 shows that the size selected via QPCS-EBIC2, QTCS-EBIC2, and QFR-EBIC2 ranges from 1.68 to 2.50 across the three quantiles. In addition, the resulting PEs are between 0.502 and 1.016, and they are smaller than the values obtained via the approaches of Wang et al. (2012) and Lee et al. (2014). Among our three proposed methods, QPCS-EBIC2 is superior to QTCS-EBIC2 and QFR-EBIC2 in terms of the average prediction error, although its sizes are slightly larger than those of QTCS-EBIC2 when τ = 0.5 and 0.7. As p increases to 400, 500, and 800, however, QPCS-EBIC2 has the smallest values in both size and PE across all three quantiles. This finding is consistent with the simulation results. In sum, the proposed quantile partial correlation screening algorithm can be considered for quantile regression selection with high dimensional data.
Table 10.
The average number of selected variables (Size) and the average prediction error (PE), with standard errors in parentheses, for the three screening methods with EBIC2 at τ = 0.3, 0.5, 0.7 and p = 300, 400, 500, 800 over the 50 random partitions.
| | | τ = 0.3 | | τ = 0.5 | | τ = 0.7 | |
|---|---|---|---|---|---|---|---|
| | | Size | PE | Size | PE | Size | PE |
| p = 300 | QPCS-EBIC2 | 1.86 (0.130) | 0.502 (0.053) | 1.96 (0.201) | 0.966 (0.095) | 1.80 (0.178) | 0.845 (0.091) |
| | QTCS-EBIC2 | 2.38 (0.156) | 0.545 (0.077) | 1.92 (0.156) | 1.016 (0.109) | 1.68 (0.135) | 0.891 (0.080) |
| | QFR-EBIC2 | 2.50 (0.207) | 0.507 (0.053) | 2.24 (0.136) | 0.985 (0.113) | 2.02 (0.175) | 0.877 (0.108) |
| p = 400 | QPCS-EBIC2 | 1.64 (0.120) | 0.552 (0.051) | 1.98 (0.177) | 0.839 (0.068) | 1.70 (0.137) | 0.715 (0.067) |
| | QTCS-EBIC2 | 2.66 (0.182) | 0.571 (0.038) | 2.06 (0.122) | 0.877 (0.069) | 1.78 (0.122) | 0.957 (0.094) |
| | QFR-EBIC2 | 2.94 (0.190) | 0.584 (0.058) | 2.52 (0.146) | 0.985 (0.060) | 2.52 (0.135) | 0.799 (0.075) |
| p = 500 | QPCS-EBIC2 | 1.94 (0.174) | 0.558 (0.051) | 1.68 (0.157) | 0.684 (0.055) | 1.70 (0.167) | 0.811 (0.114) |
| | QTCS-EBIC2 | 2.38 (0.140) | 0.574 (0.047) | 2.26 (0.158) | 1.072 (0.087) | 1.94 (0.174) | 1.128 (0.090) |
| | QFR-EBIC2 | 3.48 (0.237) | 0.600 (0.061) | 2.26 (0.123) | 0.887 (0.069) | 2.08 (0.121) | 1.028 (0.094) |
| p = 800 | QPCS-EBIC2 | 1.92 (0.219) | 0.665 (0.081) | 1.64 (0.145) | 0.647 (0.053) | 1.70 (0.194) | 0.834 (0.089) |
| | QTCS-EBIC2 | 2.80 (0.202) | 0.670 (0.081) | 2.16 (0.172) | 1.020 (0.070) | 2.40 (0.194) | 1.099 (0.118) |
| | QFR-EBIC2 | 3.62 (0.269) | 0.693 (0.082) | 2.80 (0.162) | 0.783 (0.050) | 2.54 (0.132) | 0.991 (0.018) |
7 Discussion
In sparse ultra-high dimensional quantile regression, we introduce three algorithms, QPCS, QTCS, and QFR, that use quantile correlation and quantile partial correlation to screen explanatory variables, and we then employ an extended BIC for model selection. The simulation results for the QPCS algorithm support our theoretical findings. In addition, we find that QPCS performs well in the following settings: (1) highly correlated covariates; (2) ultra-high dimension; (3) covariates that are marginally uncorrelated or only weakly correlated with the response; and (4) heavy-tailed errors. Moreover, our simulation results show that it is superior to LASSO, SCAD, SIS, and ISIS-SCAD.
To broaden the usefulness of QPCS, we discuss some extensions for future research in variable screening. There are three possible avenues. The first one is to extend quantile correlation and quantile partial correlation to various quantile regression models such as single-index quantile regression. We have conducted simulation studies by changing the conditional quantile function in Example 1 to its exponential form. The results, not presented here, show that QPCS still performs well under this setting.
The second avenue is considering an alternative correlation measure used in the QPCS algorithm. In simple linear regression, it is known that the square of the coefficient of correlation is the same as the coefficient of determination. In multiple linear regression, Kutner et al. (2005) indicated that the square of the partial correlation is the same as the coefficient of partial determination. In addition, Nagelkerke (1991) proposed a general definition of the coefficient of determination via the log-likelihood function of the response variable. Accordingly, as long as the likelihood function (or its related version, such as partial likelihood or quasi-likelihood) is available for any specific regression model and the maximum likelihood estimators of the regression parameters are also attainable, one can replace the quantile correlation and quantile partial correlation used in the QPCS algorithm by their corresponding coefficient of determination and coefficient of partial determination. This approach can be used for various regression models, e.g., generalized linear models, extreme value regression models, and parametric survival models.
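To make this idea concrete, one possible likelihood-based analogue of the (partial) correlation measures is sketched below in our own notation; it only illustrates the construction suggested in the text and is not a formal proposal.

```latex
% Likelihood-based coefficients of determination and partial determination
% (sketch; notation is ours).  L(.) denotes the maximized likelihood of the
% working regression model and S a set of conditioning covariates.
\[
  R^{2} \;=\; 1-\Bigl\{\frac{L_{0}}{L(\hat\beta)}\Bigr\}^{2/n},
  \qquad
  R^{2}_{\,j\mid S} \;=\; 1-\Bigl\{\frac{L(\hat\beta_{S})}{L(\hat\beta_{S\cup\{j\}})}\Bigr\}^{2/n},
\]
% where L_0 is the maximized likelihood of the intercept-only model.  In the
% screening step, R^2_{j|S} would play the role of the squared quantile
% partial correlation of X_j given the covariates in S.
```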
Instead of a determination measure, the third avenue for future research considers the residual sum of squares. This is particularly useful for regression models that have no log-likelihood function. In linear regression, it is easily shown that maximizing the correlation is equivalent to minimizing the residual sum of squares; analogously, maximizing the partial correlation is equivalent to minimizing the partial residual sum of squares. The residual sum of squares is the objective function of regression estimators based on the L2-norm distance, so the partial residual sum of squares is the difference between the two nested objective functions, which we call the partial objective function. In general, the objective function can be a distance metric such as the Lp-norm distance or another robust function. This motivates us to replace the quantile correlation and quantile partial correlation used in the QPCS algorithm by the objective function and the partial objective function, respectively (a sketch for the quantile case is given below). This approach can be used for many regression models such as generalized additive models, semiparametric models, and robust regression models. In sum, the above three avenues shed light on areas of future research that warrant thorough investigation.
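As an illustration for the quantile case, the gain in the check-loss objective from adding one candidate variable to the current conditioning set could serve as the partial objective function; a sketch is given below (Python, using statsmodels' QuantReg; the function name is ours).

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

def partial_objective_gain(y, X, j, active, tau):
    """Decrease in the average quantile check loss when column j of X is
    added to the model that already contains the columns listed in `active`.

    This quantity plays the role of the squared quantile partial
    correlation in the screening step."""
    def fitted_loss(cols):
        # Intercept-only fit when no conditioning covariates are given.
        Z = sm.add_constant(X[:, cols]) if cols else np.ones((len(y), 1))
        fit = QuantReg(y, Z).fit(q=tau)
        resid = y - fit.predict(Z)
        return np.mean(resid * (tau - (resid < 0)))  # average check loss

    return fitted_loss(list(active)) - fitted_loss(list(active) + [j])
```

Screening would then rank the candidate variables by partial_objective_gain(y, X, j, active, tau) over all j not yet in the active set; replacing the check loss by another objective (an Lp-norm distance or a robust loss) gives the corresponding screener for other models.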
Supplementary Material
Acknowledgments
Ma’s research was supported by NSF grant DMS 1306972 and Hellman Fellowship. Li’s research was supported by NSF grant DMS 1512422 and NIDA, NIH grants P50 DA036107 and P50 DA039838. The content is solely the responsibility of the authors and does not necessarily represent the official views of NSF, NIDA or NIH. The authors are grateful to the Editor, the Associate Editor, and three anonymous reviewers for their constructive comments that helped us improve the article substantially.
Appendix
Before proving the three theorems and one proposition, we present the following five lemmas. Lemma A.1 shows the relationship between and for j = 1, ⋯, p. Lemmas A.2 and A.3 are used in the proofs of Lemmas A.4 and A.5, and Lemmas A.4 and A.5 are needed in the proof of Theorem 1. The proofs of these five lemmas are given in the Supplemental Materials. For the sake of convenience, for any positive sequences an and bn, we write an ≍ bn if limn→∞ an/bn = c for some c > 0, and an ~ bn if limn→∞ an/bn = 1. In addition, for any s × t matrix A = (Aij), denote |A| = max1≤i≤s,1≤j≤t |Aij|. Moreover, in the following lemmas, we assume that 0 < κ < 1/2 and rn = Cnω for some 0 ≤ ω < min(1 − 2κ, 2κ) and a positive constant C, as stated in Theorem 1.
Lemma A.1
Assume that is the unique minimizer of E[ρτ(Y − β0τ − β1τX1 − ⋯ − βpτ Xp)], and, for given 1 ≤ j ≤ p, and are unique minimizers of and E[ρτ (Y* − β0τ − Xjβjτ)], respectively, where . Then we have if and only if .
Lemma A.2
Under Condition (C2) and the assumption n−1δn = O(1), for every 1 ≤ j ≤ pn and any c1 > 0, there exist some positive constants c2 and c3 such that
when n is sufficiently large.
Lemma A.3
Under Conditions (C1) and (C2), for every 1 ≤ j ≤ pn and for any given constant c4 > 0, there exists some positive constant c5 such that
when n is sufficiently large.
Lemma A.4
Under Conditions (C1) and (C2), for every 1 ≤ j ≤ pn and for any given constant c6 > 0, there exist some positive constants c7 and c8 such that
when n is sufficiently large.
Lemma A.5
Under Condition (C2) and the assumption of rn in Theorem 1, for every 1 ≤ j ≤ pn and for any c9 > 0, there exist some positive constants c10 and c11 such that
| (A.1) |
when n is sufficiently large. Note that and have been defined after equations (6) and (8), respectively. Moreover, for a ∈ (0, 1),
| (A.2) |
when n is sufficiently large.
Proof of Theorem 1
Denote and . Then
| (A.3) |
After algebraic simplification, we have that, for any a ∈ (0, 1), implies , where a* = (1 − a)−1 − 1. Hence, by (A.2) in Lemma A.5 and , we obtain
| (A.4) |
This, in conjunction with Lemma A.4, implies that for any c6 > 0,
| (A.5) |
for some positive constants .
It is worth noting that |ϕj| ≤ M1. Then, employing (A.1) in Lemma A.5 and (A.4), we have that for any c9 > 0,
| (A.6) |
By (A.3), (A.5), and (A.6), we have that, for any C1 > 0, there exist some positive constants C2 and C3 such that
| (A.7) |
for some positive constants and . The last inequality follows from and with and . This completes the proof.
Proof of Theorem 2
On the event
we apply Condition (C3) and obtain . Hence, by the choice of with C4 ≤ C0/2, we have ℳ* ⊂ ℳ̂νn on the event An. This, together with (A.7) and the union bound of probability, yields that
which completes the proof.
Proof of Proposition 1
Employing equation (10) by letting 2c = C4, we have that, on the event Ωn, |ℳ̂νn| ≤ C*nς+κ−ω/2, where 0 < C* < ∞. This, in conjunction with Theorem 1, leads to
Accordingly, Proposition 1 follows.
Proof of Theorem 3
Define kmin = min1≤k≤d{k : ℳ* ⊂ 𝒜(k)}. By the assumption that ς + κ − ω/2 < 1/2 and the result in Proposition 1, kmin is well defined and satisfies kmin ≤ C′nς+ κ−ω/2 = o(n1/2) for some constant 0 < C′ < ∞. For any 1 ≤ k < kmin, 𝒜(k) are underfitted models such that ℳ* ⊄ 𝒜(k) and 𝒜(k) are nested. By (A.18) in the supplementary materials of Lee et al. (2014), with probability approaching 1, we can choose a sequence of constants {Ln} such that Ln → ∞, Ln/Cn → 0, and
for some constant 0 < C″ < ∞, where and . Under the assumption that E|ε| < ∞, we obtain that and c′ ≤ Eρτ (ε) ≤ c″ for some constants 0 < c′, c″ < ∞. In addition, by assuming that (nς+κ−ω/2n−1 log n)Cn = o(1), we have n−1 Lnnς+κ−ω/2 log(n) = o(1). The above results imply that, with probability approaching 1,
| (A.8) |
Moreover, by employing the same techniques as those used in the proof of (A.20) from the supplementary materials of Lee et al. (2014), we have, with probability approaching 1,
| (A.9) |
for any 1 ≤ k < kmin, for some constant 0 < c‴ < ∞. Then, we have, with probability approaching 1, as n → ∞,
where the first inequality follows from the fact that log(1 + x) ≥ min{x/2, log 2} for any x > 0, (A.9), and kmin ≤ C′nς+κ−ω/2, and the second inequality follows from (A.8) and the assumption that (nς+κ−ω/2n−1 log n)Cn = o(1). The above result implies that P(k̂ ≥ kmin) → 1 as n → ∞, which completes the proof.
Contributor Information
Shujie Ma, Assistant Professor, Department of Statistics, University of California-Riverside, Riverside, CA 92521.
Runze Li, Verne M. Willaman Professor, Department of Statistics, the Pennsylvania State University, University Park, PA 16802.
Chih-Ling Tsai, Distinguished Professor and Robert W. Glock Endowed Chair in Management, Graduate School of Management, University of California at Davis, Davis, CA 95616.
References
- Angrist J, Chernozhukov V, Fernández-Val I. Quantile regression under misspecification, with an application to the U.S. wage structure. Econometrica. 2006;74:539–563.
- Belloni A, Chernozhukov V. ℓ1-penalized quantile regression in high-dimensional sparse models. Annals of Statistics. 2011;39:82–130.
- Bühlmann P, Kalisch M, Maathuis M. Variable selection for high-dimensional models: partially faithful distributions and the PC-simple algorithm. Biometrika. 2009;97:1–19.
- Chen J, Chen Z. Extended Bayesian information criteria for model selection with large model spaces. Biometrika. 2008;95:759–771.
- Cho H, Fryzlewicz P. High dimensional variable selection via tilting. Journal of the Royal Statistical Society: Series B. 2012;74:593–622.
- Fan J, Feng Y, Song R. Nonparametric independence screening in sparse ultra-high dimensional additive models. Journal of the American Statistical Association. 2011;106:544–557. doi: 10.1198/jasa.2011.tm09779.
- Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association. 2001;96:1348–1360.
- Fan J, Lv J. Sure independence screening for ultra-high dimensional feature space (with discussion). Journal of the Royal Statistical Society: Series B. 2008;70:849–911. doi: 10.1111/j.1467-9868.2008.00674.x.
- Fan J, Lv J. A selective overview of variable selection in high dimensional feature space. Statistica Sinica. 2010;20:101–148.
- Fan J, Song R. Sure independence screening in generalized linear models with NP-dimensionality. The Annals of Statistics. 2010;38:3567–3604.
- He X, Wang L, Hong HG. Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Annals of Statistics. 2013;41:342–369.
- Hjort NL, Pollard D. Asymptotics for minimisers of convex processes. 1993. Unpublished manuscript, arXiv:1107.3806.
- Huang J, Ma SG, Zhang CH. Adaptive lasso for sparse high-dimensional regression models. Statistica Sinica. 2008a;18:1603–1618.
- Huang J, Horowitz JL, Ma S. Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Annals of Statistics. 2008b;36:587–613.
- Kalisch M, Bühlmann P. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. Journal of Machine Learning Research. 2007;8:613–636.
- Koenker R. Quantile Regression. New York: Cambridge University Press; 2005.
- Kutner MH, Nachtsheim CJ, Neter J, Li W. Applied Linear Statistical Models. 5th ed. New York: McGraw-Hill/Irwin; 2005.
- Lee ER, Noh H, Park BU. Model selection via Bayesian information criterion for quantile regression models. Journal of the American Statistical Association. 2014;109:216–229.
- Li G, Li Y, Tsai CL. Quantile correlations and quantile autoregressive modeling. Journal of the American Statistical Association. 2015;110:246–261.
- Li R, Zhong W, Zhu L. Feature screening via distance correlation learning. Journal of the American Statistical Association. 2012;107:1129–1139. doi: 10.1080/01621459.2012.695654.
- Liu J, Li R, Wu R. Feature selection for varying coefficient models with ultrahigh dimensional covariates. Journal of the American Statistical Association. 2014;109:266–274. doi: 10.1080/01621459.2013.850086.
- Nagelkerke NJD. A note on a general definition of the coefficient of determination. Biometrika. 1991;78:691–692.
- Scheetz TE, Kim K-YA, Swiderski RE, Philp AR, Braun TA, Knudtson KL, Dorrance AM, DiBona GF, Huang J, Casavant TL, Sheffield VC, Stone EM. Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proceedings of the National Academy of Sciences of the United States of America. 2006;103:14429–14434. doi: 10.1073/pnas.0602562103.
- Schwarz G. Estimating the dimension of a model. The Annals of Statistics. 1978;6:461–464.
- Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B. 1996;58:267–288.
- Wang H, Li R, Tsai C-L. Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika. 2007;94:553–568. doi: 10.1093/biomet/asm053.
- Wang H. Forward regression for ultra-high dimensional variable screening. Journal of the American Statistical Association. 2009;104:1512–1524.
- Wang H, Li B, Leng C. Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society, Series B. 2009;71:671–683.
- Wang L, Wu Y, Li R. Quantile regression for analyzing heterogeneity in ultra-high dimension. Journal of the American Statistical Association. 2012;107:214–222. doi: 10.1080/01621459.2012.656014.
- Wu Y, Yin G. Conditional quantile screening in ultrahigh-dimensional heterogeneous data. Biometrika. 2015;102:65–76.
- Zhang CH. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics. 2010;38:894–942.
- Zhu L-P, Li L, Li R, Zhu L-X. Model-free feature screening for ultrahigh-dimensional data. Journal of the American Statistical Association. 2011;106:1464–1475. doi: 10.1198/jasa.2011.tm10563.
- Zou H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association. 2006;101:1418–1429.
- Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B. 2005;67:301–320.