Abstract
In quantile linear regression with ultra-high dimensional data, we propose an algorithm for screening all candidate variables and subsequently selecting relevant predictors. Specifically, we first employ quantile partial correlation for screening, and then we apply the extended Bayesian information criterion (EBIC) for best subset selection. Our proposed method can successfully select predictors when the variables are highly correlated, and it can also identify variables that make a contribution to the conditional quantiles but are marginally uncorrelated or weakly correlated with the response. Theoretical results show that the proposed algorithm can yield the sure screening set. By controlling the false selection rate, model selection consistency can be achieved theoretically. In practice, we propose using EBIC for best subset selection so that the resulting model is screening consistent. Simulation studies demonstrate that the proposed algorithm performs well, and an empirical example is presented.
Keywords: Quantile correlation, Quantile partial correlation, Screening, Variable selection
1 Introduction
Advances in modern technology have enabled people to collect massive data with a large number of variables, many of which may be irrelevant to the response variable. Examples can be found in gene expression microarray data, single nucleotide polymorphism (SNP) data, imaging data, high-frequency financial data, and others. Hence, extracting useful variables for the prediction of the response in high-dimensional data has become a focal research area in the past two decades. Clearly, traditional variable selection methods such as best subset selection and backward elimination become computationally infeasible when the number of predictors is large. As a result, a variety of penalization methods have been developed. These methods include, but are not limited to, Lasso and Adaptive Lasso (Tibshirani, 1996; Zou, 2006; Huang et al., 2008a), bridge regression (Huang et al., 2008b), SCAD (Fan and Li, 2001), elastic net (Zou and Hastie, 2005), and MCP (Zhang, 2010).
When the dimension is much larger than the sample size, penalized estimation can perform poorly or even become infeasible (Fan and Lv, 2010). Variable screening then becomes a natural approach in this context; it assumes that the relevant features lie in a low-dimensional manifold, so that the ultrahigh-dimensional problem can be greatly simplified into a low-dimensional one. Recently, Fan and Lv (2008) introduced the marginal screening method (sure independence screening, SIS) to select relevant variables based on the marginal correlation of each variable with the response. Its good numerical performance and novel theoretical properties have made SIS popular in ultrahigh dimensional analysis. As a result, SIS and its extensions have been applied to many important settings including generalized linear models (Fan and Song, 2010), multi-index semi-parametric models (Zhu et al., 2011), nonparametric regression (Fan et al., 2011; Liu et al., 2014), quantile regression (He et al., 2013; Wu and Yin, 2015), and so forth.
The marginal screening method employs the marginal correlation to measure the strength of association between predictors and the response. Hence, it can miss relevant variables that are associated with the response conditionally but not marginally. Furthermore, the marginal correlation can be misleading when there exist non-negligible correlations among the predictors. As a result, an irrelevant variable can be selected ahead of relevant variables. Moreover, the issue of collinearity can yield spurious phenomena in high-dimensional data, as demonstrated by Fan and Lv (2008).
To handle the problem of high correlations between predictors, several methods have been developed in the literature. Bühlmann et al. (2009) proposed the PC-Simple algorithm, which uses partial correlation as a criterion to measure the association of each predictor with the response. Wang (2009) applied the forward selection method in the ultrahigh-dimensional setting and developed a forward regression (FR) algorithm to select the most relevant variable in each step sequentially by removing the confounding effects of the selected variables from the previous steps. It can be shown that Wang’s algorithm is also based on the partial correlation measure. Moreover, Cho and Fryzlewicz (2012) proposed a ‘tilted’ correlation to measure the contribution of each predictor to the response. Based on simulation studies, they found that their proposed tilted correlation screening (TCS2) algorithm performs well. The above studies demonstrate that the partial correlation plays an important role in the screening process.
In the area of quantile linear regression with low dimensional data, theoretical properties and practical applications have been well developed; see Koenker (2005). For high-dimensional data, however, the development is far from complete. Recently, Wang et al. (2012) and Lee et al. (2014), respectively, extended the penalized approach and the Bayesian selection method from the classical mean regression model to the quantile regression model. Their generalizations motivate us to propose a screening method for the high-dimensional quantile regression model. Specifically, we adopt Li et al.’s (2015) quantile partial correlation (QPCOR) as a criterion to measure the association of each predictor with the response at each quantile, and then introduce a new screening procedure based on the sample QPCOR. Our goal is to identify a sparse set of ultra-high dimensional variables X = (X1, …, Xp)T that are relevant for modeling the conditional quantile of the response Y.
To employ QPCOR, we transform each predictor Xj by projecting it onto a set of variables, denoted by 𝒮j, which is either the union of its related variables and the previously selected variables or only the previously selected variables. We then introduce an adaptive approach to choose the subset of related variables by adopting a sequential testing method based on the partial correlations of related variables. It is worth noting that the size of 𝒮j cannot be too large since it would distort the association between Xj and Y. To this end, we suggest a hard threshold for determining which variables are related to Xj, and then obtain the upper bound of the maximal cardinality of the subsets 𝒮j. In addition, we derive a uniform bound of the difference between the sample QPCOR and the population QPCOR, and subsequently establish the sure screening property of the proposed procedure as needed in screening methods (Fan and Lv, 2008; Fan et al., 2011; He et al., 2013). Moreover, we generalize Wang’s (2009) FR algorithm and Cho and Fryzlewicz’s (2012) TCS2 algorithm to the quantile regression model. After the screening procedure, we apply the extended Bayesian information criterion (EBIC) (Chen and Chen, 2008; Wang and Leng, 2009; Lee et al., 2014) for best subset selection. Consequently, our proposed approach not only selects relevant variables when the variables are highly correlated, but also identifies the variables that are marginally uncorrelated or weakly correlated with the response.
The paper is organized as follows. Section 2 introduces quantile partial correlation. Section 3 provides the theoretical properties of the quantile screening procedure including the sure screening property. Section 4 presents three algorithms, which consist of our proposed algorithm and the quantile version of the forward regression and tilted correlation screening algorithms. We also introduce the extended BIC criterion for best subset selection. Section 5 conducts simulation studies, while Section 6 illustrates the usefulness of the proposed method through the analysis of gene expression data. A discussion is given in Section 7. All technical proofs are relegated to the Appendix and Supplemental Materials, and additional simulation results are presented in the Supplemental Materials.
2 Quantile partial correlation (QPCOR)
Before we present the quantile partial correlation (QPCOR), we review the quantile correlation (QCOR) and its connection to regression coefficients in the linear quantile regression model.
Quantile correlation
For mean regression models with ultra-high dimensional covariates, Fan and Lv (2008) proposed the SIS procedure to select variables according to the magnitudes of their marginal Pearson correlations with the response. Analogously, in the quantile regression context, we introduce Li et al.’s (2015) quantile correlation of Y and Xj:
qcorτ{Y, Xj} = qcovτ{Y, Xj}/[var{ψτ(Y − Qτ(Y))} var(Xj)]^{1/2},  where qcovτ{Y, Xj} = cov{ψτ(Y − Qτ(Y)), Xj} = E[ψτ{Y − Qτ(Y)}{Xj − E(Xj)}],  (1)
for 1 ≤ j ≤ p, where Qτ(Y) is the τth unconditional quantile of Y such that P(Y < Qτ(Y)) = τ and ψτ(w) = τ − I(w < 0). As a result, −1 ≤qcorτ{Y, Xj} ≤ 1. As shown by Li et al. (2015), there is a nice relationship between the quantile correlation (1) and the slope of the τ-th quantile linear regression line with Y and Xj being the response and predictor, respectively. Consider the following minimizers:
(ajτ, bjτ) = argmin(a,b) E[ρτ(Y − a − bXj)],  (2)
where ρτ(w) = wτ − wI(w < 0) is the quantile loss function (see Koenker, 2005). Then qcorτ{Y, Xj} = ϱ(bjτ), where ϱ(bjτ) is a continuous and increasing function, and ϱ(bjτ) = 0 if and only if bjτ = 0. Accordingly, we can adopt the SIS procedure of Fan and Lv (2008) to rank the significance of predictors on the quantile of Y via the marginal quantile correlation qcorτ{Y, Xj}. However, this marginal approach ignores possible effects from other variables and may yield misleading results when the predictors are correlated. To illustrate this phenomenon, we first introduce the quantile multiple regression model and its associated estimators given below.
Let Y and X = (X1, …, Xp)T be the response and predictors, respectively. Consider a linear quantile model:
Y = β0 + XTβ + ε,  (3)
where β = (β1, …, βp)T and the error term satisfies P(ε < 0|X) = τ. Then, the τth conditional quantile of Y given X is Qτ(Y|X) = β0 + XTβ. Without loss of generality, we assume that E(Xj) = 0 and var(Xj) = 1 for all j = 1, …, p. Furthermore, let fε(u|x) and fY(y|x) denote the conditional densities of ε and Y given X = x, respectively.
Assuming that the conditional density fY (y|x) exists, we can follow the same procedure as given in Theorem 2 of Angrist et al. (2006) and obtain the coefficient that is the minimizer of the weighted least squares:
where , and for j = 1, ⋯, p. As a result,
where
The term djτ can be viewed as the “bias” of the quantile estimator. It can be considerably large when the components E(w̃τ(X)XjXk) are non-negligible. Thus, QCOR may lead to inaccurate screening results. This motivates us to propose a screening procedure, based on the quantile partial correlation, to reduce the confounding effects from other predictors that are highly related to Xj.
Quantile partial correlation
To reduce the confounding effects, consider X−j = (1, {Xk, k ≠ j}T)T. Then, let and so that . Adopting Li et al.’s (2015) approach, we define the quantile partial correlation (QPCOR) as follows:
| (4) |
where . Based on the result after equation (2.2) on page 247 of Li et al. (2015), we have that , where is a continuous and increasing function of , and if and only if , where
| (5) |
In the general situation, the coefficients and in models (5) and (3) are not equal to each other. However, Lemma A.1 in the Appendix shows that if and only if . Hence, if and only if . Therefore, we use the QPCOR to select relevant variables in our screening procedure.
In general, the estimates of and cannot be obtained when the dimension of X−j is high. To this end, we remove the confounding effects from Xj that are induced by a subset of {k : k ≠ j}, and we denote the resulting set by 𝒮j and name it the conditional set. Then, we propose a screening method via a sequential procedure. In each sequential step, let 𝒮j contain either the union of the previously selected variables and the variables related to Xj or only the previously selected variables, which will be discussed in Section 4. For any arbitrary subset 𝒮 ⊂ {1, …, p}, we denote by X𝒮 the subvector of X associated with 𝒮. Accordingly, X𝒮j = (X0, {Xk, k ∈ 𝒮j}T)T with X0 = 1. For the sake of screening, we modify QPCOR given in (4) as
| (6) |
where , and |𝒮| denotes the cardinality of a set 𝒮.
In practice, QCOR and QPCOR are unknown, and we employ the sample estimates of QCOR and QPCOR to study the screening process. These sample estimates are defined as follows. Let {(Yi, XiT)T : i = 1, …, n} be a data set of n random samples from the distribution of (Y, XT)T, where Xi = (Xi1, …, Xip)T. In this paper, we focus on the scenario in which p ≫ n, and we sometimes denote p by pn since it can be a function of n. In addition, let Xi,𝒮 be the subvector of Xi for any subset 𝒮. The sample estimate of QCOR in (1) is defined as
| (7) |
where Q̂τ(Y) = inf{y : Fn(y) ≥ τ} is the sample τth quantile of Y1, …, Yn. Additionally, is the empirical distribution function, , and . The sample estimate of QPCOR in (6) is given as
| (8) |
where , and . We next study the asymptotic property of the sample estimate of QPCOR and the screening property of the selected variables via this estimate.
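Before turning to the theory, the sample estimators in (7) and (8) can be illustrated with a short sketch. The code below is ours, not the authors’; it assumes the Li et al. (2015) forms in which the quantile (partial) covariance is the sample average of ψτ of a quantile-regression residual times a centered (or least-squares) residual of Xj, standardized by {τ(1 − τ)}1/2 and the standard deviation of that residual. The quantile regression fit uses the statsmodels package.

```python
import numpy as np
import statsmodels.api as sm

def psi(w, tau):
    """Quantile score function psi_tau(w) = tau - I(w < 0)."""
    return tau - (np.asarray(w) < 0).astype(float)

def sample_qcor(y, xj, tau):
    """Sample quantile correlation of (Y, X_j); assumed Li et al. (2015) form."""
    q_tau = np.quantile(y, tau)                       # sample tau-th quantile of Y
    qcov = np.mean(psi(y - q_tau, tau) * (xj - xj.mean()))
    return qcov / np.sqrt(tau * (1 - tau) * np.var(xj))

def sample_qpcor(y, xj, x_s, tau):
    """Sample quantile partial correlation of (Y, X_j) given the conditional set X_S.

    x_s : n x |S| array of conditioning covariates (the set S_j in the text).
    """
    xs1 = sm.add_constant(x_s)                        # include an intercept
    # Least-squares residual of X_j after projecting on X_S.
    theta = np.linalg.lstsq(xs1, xj, rcond=None)[0]
    xj_res = xj - xs1 @ theta
    # Quantile-regression residual of Y on X_S at level tau.
    beta = np.asarray(sm.QuantReg(y, xs1).fit(q=tau).params)
    y_res = y - xs1 @ beta
    qpcov = np.mean(psi(y_res, tau) * xj_res)
    return qpcov / np.sqrt(tau * (1 - tau) * np.var(xj_res))
```

For instance, sample_qpcor(Y, X[:, j], X[:, list(S_j)], 0.5) is the quantity used in Section 4 to rank candidate variables at the median.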
3 Theoretical properties
To use the sample estimate of QPCOR given in (8) as a criterion to identify important variables sequentially, we need to establish the uniform convergence of the sample QPCOR to its population counterpart qpcorτ{Y, Xj |X𝒮j}. Let rn = max1≤j≤p |𝒮j| be the maximal cardinality of the subsets 𝒮j (j = 1, ⋯, p) given in the screening procedure, and allow rn to increase with the sample size n. Note that 𝒮j represents every possible conditional set. In addition, let λmax(A) and λmin(A) be the largest and smallest eigenvalues of the symmetric matrix A, respectively, and let ‖a‖ denote the L2 norm of any vector a = (a1, …, ap)T. We then make the following assumptions to facilitate the technical proofs, although they may not be the weakest possible.
(C1) The conditional density fY|X=x (y) of y given X = x satisfies the Lipschitz condition of order 1 and fY|X=x (y) > 0 for any y in a neighborhood of , for 1 ≤ j ≤ p.
(C2) The predictors satisfy: (i) supi,j |Xij| ≤ M1, , and for some positive finite constants M1, M2, M3, and M4;
(ii) For 1 ≤ j ≤ p, there exist positive finite constants m and M such that m ≤ λmin{E(X𝒮jX𝒮jT)} ≤ λmax{E(X𝒮jX𝒮jT)} ≤ M for every conditional set 𝒮j with |𝒮j| ≤ rn.
Condition (C1) is a standard condition in the literature on quantile regression. Condition (i) in (C2) assumes that the absolute values of the predictors are bounded, which is commonly assumed in high-dimensional analysis, see Wang et al. (2012) and Lee et al. (2014). This assumption can be relaxed to the moment condition given in Li et al. (2012) and Zhu et al. (2011) that there exists a positive constant t0 such that max1≤j≤p E{exp(tXj)} < ∞ for 0 < t ≤ t0. In this case, our theoretical results still hold with some modification to the proofs. To mitigate notational complexity and facilitate mathematical derivations, we assume that covariates are bounded. We also assume that, for each subset 𝒮j used as a conditional set for Xj, the L2 norm of the correlation vector is bounded. Condition (ii) in (C2) is the sparse Riesz condition (Chen and Chen, 2008; Lee et al., 2014), which is used for dealing with a large number of regressors. We next demonstrate the uniform convergence of to its population counterpart, qpcorτ{Y, Xj|X𝒮j}.
Theorem 1
Under Conditions (C1) and (C2), for any C1 > 0, there exist some positive constants C2, C3, and such that, for 0 < κ < 1/2 and rn = Cnω for some 0 ≤ ω < min((1 − 2κ), 2κ) and a positive constant C, we have
| (9) |
when n is sufficiently large.
Remark 1
To handle ultra-high dimensional data, Theorem 1 indicates that we need to have . Accordingly, pn grows with the sample size n at an exponential rate.
To study the screening property via the quantile partial correlation , we consider , which is the set of indices associated with the nonzero coefficients in the true sparse model (3) with nonsparsity size sn = |ℳ*|. Furthermore, we assume that the population QPCORs with nonzero coefficients in ℳ* satisfy the following condition:
(C3) minj∈ℳ* |qpcorτ{Y, Xj|X𝒮j}| ≥ C0 n^{−κ} for some 0 < κ < 1/2 and C0 > 0.
In our proposed algorithm, we select variables sequentially by finding the variable with the maximal sample QPCOR and then adding it to the selected active set in each step. Let the resulting active set via the screening procedure be ℳ̂νn such that the sample QPCORs of the selected variables in ℳ̂νn are greater than a threshold. That is,
where νn is a threshold value. The theorem below presents the sure screening property.
Theorem 2
Under the conditions in Theorem 1 and Condition (C3), taking C2, C3, , and κ as given in Theorem 1 and letting νn = C4 n^{−κ} with C4 ≤ C0/2, we have that
when n is sufficiently large.
It is worth noting that Theorem 2 indicates that the probability bound for the sure screening property depends on the number of nonzero coefficients sn, but not on the number of covariates pn. It also depends on rn.
In addition to ensuring that relevant variables are selected, controlling the false selection rate is also critical. Ideally, we could assume that and then employ Theorem 1, to find that, with probability tending to one, for any constant C1 > 0. Accordingly, by the choice of νn given in Theorem 2, we obtain model selection consistency,
However, this ideal assumption may not be met in general. Hence, we consider a more practical assumption, for some ς > 0. Under this assumption, for any c > 0, the cardinality of is no greater than and 0 ≤ ω < min((1−2κ), 2κ). Furthermore, on the set
we have
| (10) |
for some constant 0 < C* < ∞. As a result, we obtain the following property which is used to control the size of the selected model.
Proposition 1
Under the conditions in Theorem 1 and Condition (C3), taking C2, C3, , and κ as given in Theorem 1, letting νn = C4 n^{−κ} with C4 ≤ C0/2, and assuming for some ς > 0, we have that
for some constant 0 < C* < ∞, when n is sufficiently large.
Remark 2
Proposition 1 indicates that the proposed screening procedure via the quantile partial correlation can reduce the ultra-high dimensionality of the original model to a selected model size of polynomial order in n. This proposition and Theorem 2 imply that if we choose the first d variables sequentially based on the sample QPCOR with d = [n^{𝜘+ς+κ−ω/2}/log(n)] for some 𝜘 > 0, then all relevant variables will be selected with high probability. Note that [b] stands for the integer part of b. By assuming ς < 1 + ω/2 − κ and letting 𝜘 = 1 + ω/2 − κ − ς, we also have d = [n/log(n)], which is used in our numerical analysis and is commonly accepted in screening procedures (Fan and Lv, 2008; He et al., 2013).
4 QPCOR and selection
Applying quantile partial correlation, we first introduce three screening algorithms. Based on the set of candidate models obtained via the screening procedure, we subsequently use the extended Bayesian information criterion to select the best model.
4.1 Screening algorithms
In this subsection, we employ QPCOR to propose a quantile screening procedure (QPCS) for selecting variables. For the sake of comparison, we also generalize Cho and Fryzlewicz’s (2012) TCS2 algorithm and Wang’s (2009) FR algorithm from classical mean regression model to quantile regression model. We name them QTCS and QFR, respectively. In developing QPCS, we need to remove the confounding effect from the target variable that is induced by its correlated variables in each step. To this end, we consider a sequential test to identify a confounding subset for each Xj (j = 1, ⋯, p). Let ρjk be the sample correlation coefficient of Xj and Xk. Then, define
and name it the confounding set. A careful choice of mj is important in the high-dimensional setting. For example, if mj is too large, then any vector in Rn may be well approximated by some Xk with k ∈ 𝒮j. We next consider a sequential testing procedure based on the partial correlations along the path to select mj. This allows us to find the smallest subset so that all covariates not in this subset will have a zero partial correlation with Xj. Let 𝕏 = (X1, …, Xn)T be the design matrix and denote , where 𝕏𝒮 is any submatrix of 𝕏. For mj ≥ 1, define the partial correlation as , where . As for mj = 0, is an empty set and . Furthermore, let
which is the Fisher’s Z-transformation considered in Kalisch and Bühlmann (2007) for identifying nodes connected to the variable Xj conditional on a set of other nodes in a Gaussian graph. Then, sequentially select the smallest size, , that satisfies , where z1−α/2 is the threshold of z values with a pre-specified significance level α and . The resulting can help us to determine the size of the selected confounding set, denoted by m̂j. It is worth noting that based on our theoretical condition given in Theorem 1, we have rn = o(n1/2). Thus, m̂j ≤ rn = o(n1/2). We then let m̂j be bounded by c{n/ log(n)}1/2 for some constant c > 0. Afterwards, let if ; m̂j = c{n/ log(n)}1/2, otherwise. In practice, the constant c should not be too large so that the resulting size m̂j is under control. In our numerical analysis, we choose c = 1. Denote the selected confounding set as . The above procedure allows us to find the confounding subset of the j-th variable, and we will use it in the screening algorithm given below.
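A sketch of the confounding-set construction just described is given below. It is our reading of the procedure, with hypothetical helper names: candidates are ordered by their marginal correlation with Xj, the Fisher Z-transformed partial correlation is tested at level α in the spirit of Kalisch and Bühlmann (2007), and the set size is capped by c{n/log(n)}1/2.

```python
import numpy as np
from scipy.stats import norm

def confounding_set(X, j, alpha=0.05, c=1.0, rho_min=0.0):
    """Sequentially build the confounding set of variable j (our reading of the text).

    Candidates are ordered by |corr(X_j, X_k)|; a candidate is added as long as its
    partial correlation with X_j, given the variables already added, is significant
    according to the Fisher Z-transformation at level alpha.  The set size is capped
    at c*(n/log n)^(1/2), and rho_min is an optional hard threshold on |corr|.
    """
    n, p = X.shape
    max_size = int(c * np.sqrt(n / np.log(n)))
    z_crit = norm.ppf(1 - alpha / 2)
    cors = np.corrcoef(X, rowvar=False)[j]
    order = [k for k in np.argsort(-np.abs(cors)) if k != j]
    selected = []
    for k in order:
        if len(selected) >= max_size or abs(cors[k]) < rho_min:
            break
        if selected:
            # Residuals of X_j and X_k after projecting on the current set.
            Z = np.column_stack([np.ones(n), X[:, selected]])
            H = Z @ np.linalg.pinv(Z)
            rj, rk = X[:, j] - H @ X[:, j], X[:, k] - H @ X[:, k]
        else:
            rj, rk = X[:, j] - X[:, j].mean(), X[:, k] - X[:, k].mean()
        rho = np.corrcoef(rj, rk)[0, 1]
        z = 0.5 * np.sqrt(n - len(selected) - 3) * np.log((1 + rho) / (1 - rho))
        if abs(z) <= z_crit:                           # no longer significant: stop
            break
        selected.append(int(k))
    return selected
```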
Algorithm 1 (QPCS)
Start with an empty active set 𝒜(0) = ∅.
Step 1. In the kth step, for given 𝒜(k−1), we update . Then, employ the maximal sample QPCOR to find the variable index j* that satisfies . Update 𝒜(k) = 𝒜(k−1) ∪ {j*}.
Step 2. Repeat step 1 until the cardinality of active set |𝒜(d*)| reaches a prespecified d*.
Step 3. Starting from the k = (d* + 1)th step, we set . In the kth step, find . Update 𝒜(k) = 𝒜(k−1) ∪ {j*}.
Step 4. Repeat step 3 until the cardinality of active set |𝒜(d)| reaches a prespecified value d < n.
In the above algorithm, the conditioning set 𝒮j contains the selected variables up to step d* and a subset of variables with non-negligible correlations identified by the sequential testing procedure.
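With a QPCOR estimator and a confounding-set routine in hand (for example, the hypothetical sample_qpcor and confounding_set sketched earlier), Algorithm 1 can be written as the following greedy loop; d* and d follow the choices d* = [C*{n/log(n)}1/2] with C* = 2 and d = [n/log(n)] used in our numerical work.

```python
import numpy as np

def qpcs_screen(X, y, tau, qpcor, conf_set, d_star=None, d=None):
    """Greedy QPCS screening path (a sketch of Algorithm 1).

    qpcor(y, xj, XS, tau) -> sample QPCOR of X_j given the columns XS;
    conf_set(X, j)        -> confounding set of variable j.
    Returns the ordered list of selected indices (the solution path).
    """
    n, p = X.shape
    if d_star is None:
        d_star = int(2 * np.sqrt(n / np.log(n)))   # d* = [C*{n/log n}^(1/2)], C* = 2
    if d is None:
        d = int(n / np.log(n))                     # d = [n/log n]
    C = [conf_set(X, j) for j in range(p)]         # confounding sets, computed once
    active = []
    for k in range(min(d, p)):
        best_j, best_val = None, -np.inf
        for j in range(p):
            if j in active:
                continue
            # Conditional set: selected variables (frozen after step d*) union C_j.
            base = active if k < d_star else active[:d_star]
            S = sorted((set(base) | set(C[j])) - {j})
            # With an empty set we condition on the intercept only, so the
            # criterion reduces to the marginal QCOR.
            XS = X[:, S] if S else np.ones((n, 1))
            val = abs(qpcor(y, X[:, j], XS, tau))
            if val > best_val:
                best_j, best_val = j, val
        active.append(best_j)
    return active
```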
In linear regression modeling, Cho and Fryzlewicz (2012) proposed the TCS2 algorithm and demonstrated that it usually performs well in comparison with LASSO, SCAD, MCP, FR, and iterative SIS (ISIS; see Fan and Lv, 2008). This inspires us to extend their TCS2 to quantile regression, and we name it QTCS.
Algorithm 2 (QTCS)
Start with an empty active set 𝒜(0) = ∅.
Step 1. In the kth step, for given 𝒜(k−1), let 𝒮j = 𝒜(k−1). Then, find the variable index that has the maximal sample QPCOR such that . If , let j* = j′ and go to step 3.
Step 2. If , then screen the sample QPCOR for all Xj in which . Let , and find .
Step 3. Update 𝒜(k) = 𝒜(k−1) ∪ {j*}.
Step 4. Repeat steps 1–3 until the cardinality of active set, |𝒜(d*)|, reaches a prespecified d* = [C*{n/ log(n)}1/2] for some constant C* > 0.
Step 5. Starting from the k = (d* + 1)th step, repeat steps 1–3 by letting 𝒮j = 𝒜(d*) and in steps 1 and 2, respectively. Repeat the procedure until the cardinality of active set, |𝒜(d)|, reaches a prespecified value d < n.
Based on extensive simulation studies in linear model settings, Wang (2009) indicated that FR is a promising method for variable screening by comparing with LASSO, SCAD, SIS, and ISIS. This motivates us to generalize his forward selection screening algorithm to quantile regression, and name it QFR.
Algorithm 3 (QFR)
Start with an empty active set 𝒜(0) = ∅.
Step 1. In the kth step, for given 𝒜(k−1), let 𝒮j = 𝒜(k−1) for k ≤ d* and 𝒮j = 𝒜(d*) for k > d*. Then, find the variable index that has the maximal sample QPCOR such that . Update 𝒜(k) = 𝒜(k−1) ∪ {j*}.
Step 2. Repeat step 1 until the cardinality of the active set, |𝒜(k)|, reaches a prespecified value d < n.
For the sake of comparison, Table 1 summarizes the three algorithms. Without the thresholding step, and with the sample QPCOR replaced by the tilted correlation defined in Cho and Fryzlewicz (2012) and the residual sum of squares given in Wang (2009), respectively, QTCS and QFR reduce to TCS2 and FR. To utilize the above three algorithms, we need to specify d* and d. It is worth noting that the thresholding size d* needs to satisfy d* ≤ rn = o(n1/2) (see Algorithm 1). Hence, we consider d* = [C*{n/ log(n)}1/2] for some C* > 0. In addition, the value of C* must be chosen so that d* does not exceed d, as required by the screening algorithms. Following Remark 2, we set d = [n/ log(n)]. To meet the requirement d* < d with n = 200 and n = 120 used in our simulation and empirical examples, respectively, we choose C* = 2, which yields good performance in our numerical studies. However, this does not exclude other possible choices that also satisfy this requirement.
Table 1.
Comparison of the QPCS, QTCS and QFR algorithms.
| | QPCS | QTCS | QFR |
|---|---|---|---|
| Initialization | A(0) = ∅ | A(0) = ∅ | A(0) = ∅ |
| Action | one variable is selected | one variable is selected | one variable is selected |
| Conditional set Sj for k ≤ d* | A(k−1) together with the confounding set of Xj | A(k−1), or A(k−1) together with the confounding set of Xj | A(k−1) |
| Conditional set Sj for k > d* | A(d*) together with the confounding set of Xj | A(d*), or A(d*) together with the confounding set of Xj | A(d*) |
Since , we have for some constant 0 < C̃ < ∞ and j = 1, ⋯, p. This provides an upper bound for the conditional set of each variable, which is not very large. Otherwise, any vector in Rn can be well-approximated by the variables in this set. A similar consideration can be found in Cho and Fryzlewicz (2012) when they discussed their “conditioning set” 𝒞j. Note that 𝒞j is our confounding set , which is different from the conditional set 𝒮j. In their Assumption 3 of page 598, they consider a bound for the size of the conditioning set 𝒞j such that |𝒞j| ≤ Cnξ with ξ ∈ [0, 2(γ − δ)) for δ ∈ [0, 1/2) and γ ∈ (δ, 1/2). Thus, |𝒞j| = o(n1−2δ) for δ ∈ [0, 1/2). It is of interest to note that, based on the condition given in Theorem 1, our conditional set |𝒮j| ≤ rn = o(n1−2κ) for 0 < κ < 1/2.
Remark 3
From Table 1 and the above discussion, we find that both QPCS and QTCS guard against overfitting better than QFR. This is because they account for the confounding effect of explanatory variables, while QFR does not take it into account. Although |𝒮j| in QPCS and |𝒞j| (i.e., ) in QTCS have the same order, the confounding set in QPCS is always included in the conditional set 𝒮j in every screening step, whereas 𝒞j in QTCS may not always be included in 𝒮j. Accordingly, QPCS is likely to reduce overfitting more than QTCS.
Based on the quantile partial correlation, we have introduced three screening algorithms. Although the quantile correlation is not the focus of our paper, one can employ it to propose the sure independent screening procedure for quantile regression. Specifically, we adopt the SIS method of Fan and Lv (2008) by replacing their Pearson correlation with the quantile correlation QCOR. The resulting selected model is
He et al. (2013) also applied the SIS method for model selection in nonparametric quantile regression. In classical mean regression, Fan and Lv (2008) further introduced ISIS for selecting variables. As mentioned above, Wang (2009) and Cho and Fryzlewicz (2012) demonstrated that their FR and TCS2 algorithms, respectively, perform well in comparison with ISIS. Thus, we focus our numerical comparison on the newly proposed procedure and the corresponding procedures proposed in Wang (2009) and Cho and Fryzlewicz (2012).
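For readers who wish to experiment with this marginal alternative, the QCOR-based ranking can be sketched in a few lines; qcor stands for a sample QCOR routine, such as the hypothetical sample_qcor given in the sketch of Section 2.

```python
import numpy as np

def qcor_sis(X, y, tau, qcor, d=None):
    """SIS-type marginal screening: keep the d variables with the largest |sample QCOR|."""
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))                     # d = [n/log n]
    scores = np.array([abs(qcor(y, X[:, j], tau)) for j in range(p)])
    return list(np.argsort(-scores)[:d])
```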
4.2 Best subset selection
In the previous subsection, the proposed QPCS algorithm generates a solution path 𝔸 = {𝒜(k), 1 ≤ k ≤ d}, which includes the d selected models 𝒜(1) ⊂ 𝒜(2) ⊂ ⋯ ⊂ 𝒜(d). To find the best model among them, we consider an extended Bayesian information criterion (EBIC) for best subset selection. This criterion has been used for classical mean regression model in high dimensional data analysis (e.g., see Chen and Chen, 2008 and Wang, 2009). As for our quantile regression model, we follow the approach of Lee et al. (2014) and adopt the criterion:
| (11) |
where Cn is a positive constant that diverges along with the sample size n, and
Let k̂ = arg min1≤k≤dEBIC(𝒜(k)), and denote the resulting best model ℳ̂EBIC = 𝒜(k̂). We make the following condition, which corresponds to Condition (A2)(ii) in Lee et al. (2014) and is needed for establishing the consistency of EBIC.
(C4) There exist positive finite constants m′ and M′ such that
uniformly for any subset 𝒮 ⊂ {1, …, p} satisfying |𝒮| ≤ C*nς+κ−ω/2.
We next establish the screening consistency of the best model selected by EBIC.
Theorem 3
Under the conditions given in Proposition 1 and Condition (C4), and assuming that ς < 1/2 + ω/2 − κ, , Cn log(n)n(ς+κ−ω/2)−1 = o(1), and E|ε| < ∞, we have P(ℳ* ⊂ ℳ̂EBIC) → 1 as n → ∞.
When Cn = 1, EBIC reduces to the classical BIC (Schwarz (1978)). Recently, Wang and Leng (2009) and Lee et al. (2014) used Cn = log(log(d)) and Cn = log(d), respectively, in their simulation studies when the number of predictors diverged along with the sample size. We use both approaches in our numerical studies. In addition, EBIC can be applied not only to QPCS, but also to QTCS and QFR for best subset selection. It is worth noting that our proposed QPCS algorithm yields a family of nested candidate models, 𝒜(1) ⊂ 𝒜(2) ⊂ ⋯ ⊂ 𝒜(d). Thus, we propose using the model selection criterion EBIC to select the best model. On the other hand, the screening procedure of SIS only produces a single final model, followed by the SCAD or other penalized methods for variable selections.
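A sketch of the best subset step follows. The penalty used below, |𝒜|Cn log(n)/(2n) added to the logarithm of the average check loss, is only our assumed reading of a Lee et al. (2014)-type criterion and should be replaced by the exact form in (11); the nested path is the prefix sequence produced by the screening algorithm.

```python
import numpy as np
import statsmodels.api as sm

def check_loss(u, tau):
    """Quantile (check) loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0).astype(float))

def ebic_best_model(X, y, tau, path, Cn):
    """Select the best model along a nested screening path by an EBIC-type criterion.

    path : list of nested index lists A(1) c A(2) c ... c A(d) from the screening step.
    NOTE: the penalty |A| * Cn * log(n) / (2n) added to log(average check loss) is our
    assumed reading of criterion (11), not a verbatim transcription.
    """
    n = len(y)
    best_A, best_crit = None, np.inf
    for A in path:
        XA = sm.add_constant(X[:, A])
        fit = sm.QuantReg(y, XA).fit(q=tau)
        loss = check_loss(y - XA @ np.asarray(fit.params), tau).mean()
        crit = np.log(loss) + len(A) * Cn * np.log(n) / (2 * n)
        if crit < best_crit:
            best_A, best_crit = A, crit
    return best_A
```

With the screening path stored as prefixes, e.g. path = [active[:k+1] for k in range(len(active))], EBIC1 and EBIC2 in our numerical studies correspond to Cn = np.log(np.log(d)) and Cn = np.log(d), respectively.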
In addition to classical model selection, another popular variable selection approach is penalization. In other words, by using the selected model 𝒜(d) with d < n from the screening procedure, we can obtain the estimated parameters by minimizing
with respect to parameters β𝒜(d) = (β1,𝒜(d), ⋯, β𝒜(d),𝒜(d))T, where pλ(·) is a penalty function with a regularization parameter λ. In our numerical studies, we consider the LASSO penalty for demonstration, but other penalties such as SCAD and MCP can also be applied. It is worth noting that the penalization method only employs the largest selected model 𝒜(d), while EBIC uses the entire solution path obtained from the screening procedure.
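For the penalization alternative, the following sketch fits an L1-penalized quantile regression on the largest screened model 𝒜(d) using scikit-learn’s pinball-loss regressor; the regularization grid and the BIC-type score used to pick the penalty level are illustrative choices, not the exact procedure of the paper.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

def lasso_quantile_fit(X, y, tau, screened, alphas=np.logspace(-3, 0, 20)):
    """L1-penalized quantile regression restricted to the screened set A(d).

    Returns the coefficients (keyed by screened indices) for the penalty level that
    minimizes an illustrative BIC-type score; this is a sketch, not the exact tuning
    procedure used in the paper.
    """
    n = len(y)
    XA = X[:, screened]
    best = None
    for a in alphas:
        model = QuantileRegressor(quantile=tau, alpha=a, solver="highs").fit(XA, y)
        resid = y - model.predict(XA)
        loss = np.mean(resid * (tau - (resid < 0)))      # average check loss
        df = int(np.sum(np.abs(model.coef_) > 1e-8))     # number of nonzero coefficients
        score = np.log(loss) + df * np.log(n) / (2 * n)  # illustrative BIC-type score
        if best is None or score < best[0]:
            best = (score, model.coef_)
    return dict(zip(screened, best[1]))
```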
5 Simulation studies
In this section, we conduct simulation studies to compare the finite sample performance of the four screening procedures QPCS, QTCS, QFR, and SIS. We further illustrate the extended BIC approach for best subset selection by using Cn = log(log(d)) and Cn = log(d), respectively, and denote the corresponding methods by EBIC1 and EBIC2. We also compare the EBIC method with the LASSO penalization method after the screening, where the tuning parameter for LASSO is selected by the BIC method. Moreover, we compare our method with the l1 penalization of Belloni and Chernozhukov (2011) and the ISIS-SCAD method of Fan and Lv (2008) in the last example. The tuning parameter for the l1 penalization method is selected by the approach described in Section 2.3 of Belloni and Chernozhukov (2011), and the tuning parameter for the ISIS-SCAD method is selected by the extended BIC as given in the R package ‘SIS’.
To demonstrate the performance of the QPCS, QTCS, QFR, and SIS screening procedures, we present two examples. We consider three quantiles τ = 0.2, 0.5, and 0.8, and all simulation results are based on 200 realizations with n = 200 and p = 1, 000. Moreover, seven measures are used to assess the screening and selection performance: the ranks of the selected variables and the minimum model size (see Liu et al., 2014), the numbers of true positive and false positive selections (see Liu et al., 2014), and the proportions of correct-fitting, over-fitting, and under-fitting selections (see Wang et al., 2007). We next describe these measures in detail.
Rj: the average rank of Xj;
M: the average minimum size of the selected model that contains all the relevant (i.e., true) predictors;
TP: the average number of true positives (i.e., the average number of relevant predictors being correctly selected);
FP: the average number of false positives (i.e., the average number of irrelevant predictors being incorrectly selected);
C: the proportion in which exactly the relevant predictors are selected;
O: the proportion in which all relevant predictors and some irrelevant predictors are selected;
I: the proportion in which some relevant predictors are not selected.
Note that the average or proportion used in the above measures is calculated from 200 realizations.
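For one realization, these measures can be computed from the selected and true index sets as follows; the helper below is a small hypothetical illustration, and the averages and proportions over the 200 realizations are taken outside it.

```python
def selection_measures(selected, true_set):
    """TP, FP and correct-/over-/under-fitting indicators for a single realization."""
    selected, true_set = set(selected), set(true_set)
    return {
        "TP": len(selected & true_set),            # relevant predictors correctly selected
        "FP": len(selected - true_set),            # irrelevant predictors selected
        "C": selected == true_set,                 # exactly the relevant predictors
        "O": true_set < selected,                  # all relevant plus some irrelevant ones
        "I": not true_set.issubset(selected),      # some relevant predictors are missed
    }
```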
Example 1
We generate the response from Model D considered by Cho and Fryzlewicz (2012) and originally taken from Fan and Lv (2008):
where the predictors X are simulated from N(0, Σ), with Σ = {σij} being a p × p covariance matrix satisfying σii = 1 and σij = ρ for j ≠ i, except that . Thus, X4 is marginally uncorrelated with Y at the population level. To take quantiles into account in the regression coefficients, we let β = 2.5(1 + |τ − 0.5|) rather than β = 2.5 as given in Cho and Fryzlewicz (2012). The random error ε is generated according to the standard normal distribution and the Laplace distribution. We also let ρ = 0.5 and ρ = 0.95 represent a moderate correlation and a high correlation, respectively. To save space, we report the results for ρ = 0.5 in Tables S1–S3 of the Supplemental Materials.
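The following sketch generates one data set of this design. Because the regression function itself is not reproduced above, the form used here, Y = β(X1 + X2 + X3) − 3β√ρ X4 + ε with σ4k = σk4 = √ρ, is our reading of Model D in Cho and Fryzlewicz (2012) and is stated only as an assumption; it does make X4 marginally uncorrelated with Y.

```python
import numpy as np

def generate_example1(n=200, p=1000, rho=0.95, tau=0.5, error="normal", seed=0):
    """Simulate one data set from our reading of Model D in Example 1 (assumed design).

    ASSUMPTIONS: Sigma_ii = 1, Sigma_ij = rho (i != j) except Sigma_4k = Sigma_k4 = sqrt(rho),
    and Y = beta*(X1 + X2 + X3) - 3*beta*sqrt(rho)*X4 + eps with beta = 2.5*(1 + |tau - 0.5|),
    which makes X4 marginally uncorrelated with Y.
    """
    rng = np.random.default_rng(seed)
    Sigma = np.full((p, p), rho)
    np.fill_diagonal(Sigma, 1.0)
    Sigma[3, :] = Sigma[:, 3] = np.sqrt(rho)       # X4 (index 3) row and column
    Sigma[3, 3] = 1.0
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta = 2.5 * (1 + abs(tau - 0.5))
    eps = rng.standard_normal(n) if error == "normal" else rng.laplace(size=n)
    y = beta * (X[:, 0] + X[:, 1] + X[:, 2]) - 3 * beta * np.sqrt(rho) * X[:, 3] + eps
    return X, y
```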
Table 2 reports Rj (j = 1, …, 4) and M for p = 1, 000 and ρ = 0.95. When the predictors are highly correlated (ρ = 0.95), SIS cannot successfully identify all four relevant predictors. In Table S1 of the Supplemental Materials, we find that even under moderate correlation (ρ = 0.5), the SIS approach fails to identify the fourth predictor, which is marginally uncorrelated with Y (see the large values of R4 in Table S1).
Table 2.
The average rank of the relevant predictors Rj and the average number of the minimum size of the selected model M with p = 1, 000 and ρ = 0.95 in Example 1.
| Standard Normal | Laplace Distribution | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| τ | Method | R1 | R2 | R3 | R4 | M | R1 | R2 | R3 | R4 | M |
| QPCS | 2.205 | 1.945 | 2.060 | 3.825 | 4.015 | 4.060 | 2.520 | 4.885 | 8.845 | 13.500 | |
| 0.2 | QTCS | 4.050 | 3.570 | 4.120 | 7.180 | 8.540 | 6.970 | 14.290 | 8.500 | 19.595 | 33.530 |
| QFR | 5.215 | 4.150 | 4.560 | 275.955 | 276.490 | 9.600 | 8.980 | 12.505 | 480.455 | 483.040 | |
| SIS | 343.580 | 327.575 | 330.285 | 499.975 | 682.580 | 304.795 | 302.405 | 307.890 | 499.880 | 668.845 | |
| QPCS | 2.125 | 2.140 | 2.090 | 3.795 | 4.095 | 2.275 | 3.375 | 3.565 | 4.040 | 6.725 | |
| 0.5 | QTCS | 4.270 | 3.875 | 3.825 | 17.550 | 18.615 | 4.340 | 5.405 | 4.730 | 36.810 | 39.565 |
| QFR | 4.470 | 4.335 | 3.995 | 345.760 | 345.875 | 5.000 | 5.455 | 8.110 | 487.020 | 487.545 | |
| SIS | 345.635 | 345.250 | 337.305 | 500.985 | 691.675 | 320.695 | 310.385 | 319.025 | 510.840 | 685.620 | |
| QPCS | 1.935 | 2.085 | 2.145 | 3.855 | 4.010 | 5.015 | 3.835 | 3.180 | 11.195 | 14.520 | |
| 0.8 | QTCS | 3.775 | 3.905 | 4.115 | 6.540 | 7.615 | 15.445 | 7.470 | 8.720 | 32.495 | 46.225 |
| QFR | 4.410 | 4.375 | 4.150 | 252.240 | 252.315 | 9.425 | 7.945 | 12.175 | 459.700 | 459.715 | |
| SIS | 338.835 | 335.465 | 346.660 | 501.060 | 686.38 | 311.050 | 308.080 | 320.905 | 492.905 | 662.565 | |
The aim of the QFR method is to remove the effects of the predictors identified in previous steps. It performs reasonably well under moderate correlation (see the small values of Rj and M in Table S1). When ρ = 0.95, however, QFR is not capable of identifying the fourth predictor. This finding is not surprising since FR is not designed to remove high collinearity (i.e., confounding) effects. As for the QPCS and QTCS screening procedures, Table 2 indicates that both are able to control the effect of collinearity and identify relevant variables. However, QPCS is uniformly superior to QTCS in all measures. This is because QPCS can prevent more overfitting than QTCS by removing the confounding effect in every sequential step.
After examining screening performance, we next evaluate best subset selection. Since SIS does not show strong performance, we only consider variable selection via the other three screening procedures. Table 3 reports TP and FP calculated under three selection methods, EBIC1, EBIC2, and LASSO, for p = 1, 000 and ρ = 0.95. Furthermore, Table 4 correspondingly presents the proportions of correct-fitting (C), over-fitting (O), and incorrect-fitting (I) for p = 1, 000 and ρ = 0.95.
Table 3.
Variable selection results of TP and FP for the extended BIC and LASSO with p = 1, 000 and ρ = 0.95 in Example 1.
| QPCS | QTCS | QFR | |
|---|---|---|---|---|---|---|---|---|---|---|
| τ | EBIC1 | EBIC2 | LASSO | EBIC1 | EBIC2 | LASSO | EBIC1 | EBIC2 | LASSO | |
| Standard Normal | ||||||||||
| 0.2 | TP | 4.000 | 4.000 | 2.035 | 3.945 | 3.830 | 2.035 | 3.385 | 3.210 | 2.020 |
| FP | 10.405 | 0.430 | 0.345 | 10.465 | 2.600 | 1.375 | 11.780 | 4.680 | 1.895 | |
| 0.5 | TP | 4.000 | 3.980 | 2.490 | 3.790 | 3.200 | 2.500 | 3.265 | 2.470 | 2.330 |
| FP | 9.605 | 0.145 | 2.300 | 10.365 | 1.895 | 3.555 | 11.105 | 2.680 | 4.750 | |
| 0.8 | TP | 4.000 | 4.000 | 2.995 | 3.955 | 3.785 | 2.685 | 3.350 | 3.085 | 2.410 |
| FP | 10.580 | 0.305 | 0.350 | 10.585 | 2.415 | 2.030 | 11.985 | 4.585 | 2.875 | |
| Laplace Distribution | ||||||||||
| 0.2 | TP | 3.960 | 3.740 | 1.930 | 3.605 | 2.680 | 1.970 | 2.905 | 1.970 | 1.955 |
| FP | 9.740 | 0.765 | 1.125 | 10.785 | 2.100 | 2.145 | 11.660 | 3.155 | 2.750 | |
| 0.5 | TP | 3.960 | 3.830 | 2.460 | 3.580 | 2.515 | 2.335 | 3.010 | 2.455 | 2.430 |
| FP | 6.625 | 0.150 | 2.650 | 8.685 | 1.360 | 2.865 | 10.215 | 1.675 | 3.250 | |
| 0.8 | TP | 3.920 | 3.625 | 2.875 | 3.525 | 2.690 | 2.560 | 2.910 | 2.105 | 1.880 |
| FP | 10.205 | 0.760 | 1.205 | 10.825 | 2.320 | 2.895 | 11.985 | 3.030 | 3.610 | |
Table 4.
Variable selection results of C, O, and I for the extended BIC and LASSO with p = 1, 000 and ρ = 0.95 in Example 1.
| QPCS | QTCS | QFR | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| τ | EBIC1 | EBIC2 | LASSO | EBIC1 | EBIC2 | LASSO | EBIC1 | EBIC2 | LASSO | |
| Standard Normal | ||||||||||
| C | 0.000 | 0.695 | 0.000 | 0.000 | 0.125 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 0.2 | O | 1.000 | 0.305 | 0.000 | 0.950 | 0.760 | 0.000 | 0.400 | 0.390 | 0.000 |
| I | 0.000 | 0.000 | 1.000 | 0.050 | 0.115 | 1.000 | 0.600 | 0.610 | 1.000 | |
| C | 0.000 | 0.845 | 0.000 | 0.000 | 0.155 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 0.5 | O | 1.000 | 0.135 | 0.000 | 0.795 | 0.485 | 0.000 | 0.300 | 0.250 | 0.000 |
| I | 0.000 | 0.020 | 1.000 | 0.205 | 0.360 | 1.000 | 0.700 | 0.750 | 1.000 | |
| C | 0.000 | 0.765 | 0.000 | 0.000 | 0.130 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 0.8 | O | 1.000 | 0.235 | 0.000 | 0.965 | 0.760 | 0.000 | 0.350 | 0.335 | 0.000 |
| I | 0.000 | 0.000 | 1.000 | 0.035 | 0.110 | 1.000 | 0.650 | 0.665 | 1.000 | |
| Laplace Distribution | ||||||||||
| C | 0.000 | 0.530 | 0.000 | 0.000 | 0.110 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 0.2 | O | 0.960 | 0.290 | 0.000 | 0.715 | 0.340 | 0.000 | 0.050 | 0.045 | 0.000 |
| I | 0.040 | 0.180 | 1.000 | 0.285 | 0.550 | 1.000 | 0.950 | 0.955 | 1.000 | |
| C | 0.040 | 0.805 | 0.000 | 0.010 | 0.160 | 0.000 | 0.000 | 0.005 | 0.000 | |
| 0.5 | O | 0.930 | 0.085 | 0.000 | 0.605 | 0.195 | 0.000 | 0.080 | 0.020 | 0.000 |
| I | 0.070 | 0.110 | 1.000 | 0.385 | 0.645 | 1.000 | 0.920 | 0.795 | 1.000 | |
| C | 0.000 | 0.470 | 0.000 | 0.000 | 0.045 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 0.8 | O | 0.935 | 0.340 | 0.005 | 0.650 | 0.370 | 0.000 | 0.080 | 0.080 | 0.000 |
| I | 0.065 | 0.190 | 0.095 | 0.350 | 0.585 | 1.000 | 0.920 | 0.920 | 1.000 | |
We observe that the TP values for the LASSO method are much smaller than 4, the number of true predictors in the model. In addition, the FP values are not large since LASSO tends to select a small number of variables among highly correlated covariates. Moreover, the proportion of incorrect fitting is most often 1 for the LASSO method. Hence, LASSO is not an effective method for variable selection in this context. On the contrary, EBIC1 tends to exhibit overfitting, as evidenced by FP values more than twice the number of true predictors. In addition, the proportion of overfitting is often very large as shown in Table 4, except for QFR. This is because QFR tends to fit incorrectly under this scenario, which is consistent with the findings in Table 3. In comparison with EBIC1, EBIC2 yields much smaller FP values at the cost of slightly smaller TP values. In addition, EBIC2 is uniformly superior to LASSO in both the TP and FP measures. Consequently, EBIC2 is a favorable choice. Moreover, Tables 3 and 4 indicate that QPCS-EBIC2 outperforms its competitors in best subset selection. Finally, it is not surprising that the performance of all screening and selection procedures deteriorates when ρ becomes large or ε has a heavy-tailed distribution. To save space, we report additional simulation results for p = 2, 000 in the Supplementary Materials (see Tables S4–S6). These simulations yield the same conclusion as for p = 1, 000.
Example 2
We generate the response from the model:
where β, ρ, and X are defined as in Example 1 except that σ5j = σj5 = 0, so that X5 is uncorrelated with Xj (j ≠ 5). In addition, X5 has a small contribution to Y. This model is also considered by Cho and Fryzlewicz (2012) and Fan and Lv (2008). To save space, we report the results for ρ = 0.5 in Tables S7–S9 of the Supplemental Materials.
Tables 5 and S7 report Rj (j = 1, …, 5) and M for ρ = 0.95 and 0.5, respectively. From Table S7 in the Supplementary Materials, we observe that SIS gives large values of R4, R5, and M for ρ = 0.5. Hence, SIS is not able to identify variables X4 and X5 in this case. When ρ = 0.95, SIS is able to identify X5 owing to its lack of correlation with the other variables, whereas it fails to identify variables X1 to X4 since they are highly correlated with the others. As a result, QFR, QPCS, and QTCS outperform SIS. For these three procedures, we reach the same conclusion as in Example 1: both QPCS and QTCS are superior to QFR, and QPCS performs the best. Tables 6 and 7 summarize the results of subset selection by presenting TP, FP, and the proportions of correct-fitting (C), over-fitting (O), and incorrect-fitting (I) calculated via EBIC1, EBIC2, and LASSO. Both tables show that QPCS-EBIC2 performs the best. The same findings emerge from Tables S8 and S9 of the Supplementary Materials when ρ = 0.5.
Table 5.
The average rank of the relevant predictors Rj and the average minimum size of the selected model M with p = 1, 000 and ρ = 0.95 in Example 2.
| Standard normal | Laplace Distribution | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| τ | Method | R1 | R2 | R3 | R4 | R5 | M | R1 | R2 | R3 | R4 | R5 | M |
| ρ = 0.95 | |||||||||||||
| QPCS | 2.720 | 2.560 | 2.685 | 4.560 | 2.655 | 5.060 | 6.380 | 4.625 | 5.990 | 13.455 | 3.155 | 21.485 | |
| 0.2 | QTCS | 5.020 | 4.870 | 4.870 | 8.450 | 1.010 | 9.450 | 7.445 | 15.180 | 18.815 | 20.085 | 1.120 | 42.720 |
| QFR | 5.775 | 5.175 | 5.340 | 235.085 | 1.010 | 235.185 | 10.605 | 8.685 | 14.225 | 444.770 | 1.145 | 446.630 | |
| SIS | 388.100 | 373.320 | 377.390 | 491.820 | 1.015 | 700.930 | 330.035 | 330.145 | 340.550 | 509.575 | 1.830 | 689.145 | |
| QPCS | 2.765 | 2.805 | 2.825 | 4.510 | 2.495 | 5.19 | 2.950 | 4.540 | 2.970 | 5.140 | 2.580 | 7.420 | |
| 0.5 | QTCS | 4.580 | 5.655 | 5.235 | 20.290 | 1.045 | 21.415 | 6.765 | 12.340 | 6.870 | 26.250 | 1.010 | 35.080 |
| QFR | 5.650 | 5.315 | 6.165 | 358.660 | 1.045 | 358.77 | 9.715 | 6.085 | 9.905 | 474.955 | 1.010 | 475.385 | |
| SIS | 342.660 | 326.660 | 330.650 | 505.585 | 1.015 | 686.76 | 359.165 | 349.170 | 350.990 | 499.660 | 5.920 | 698.655 | |
| QPCS | 2.775 | 2.785 | 2.790 | 4.645 | 2.475 | 5.200 | 6.510 | 3.325 | 3.980 | 16.295 | 5.235 | 23.035 | |
| 0.8 | QTCS | 4.810 | 5.040 | 4.625 | 7.055 | 1.015 | 8.145 | 6.945 | 18.015 | 9.310 | 19.055 | 1.065 | 36.94 |
| QFR | 5.755 | 5.740 | 5.290 | 214.880 | 1.015 | 214.965 | 15.905 | 15.115 | 10.305 | 461.625 | 1.095 | 468.02 | |
| SIS | 383.340 | 385.060 | 383.290 | 499.870 | 5.855 | 713.315 | 347.490 | 344.295 | 375.930 | 518.075 | 1.045 | 715.87 | |
Table 6.
Variable selection results of TP and FP for the extended BIC and LASSO with p = 1, 000 and ρ = 0.95 in Example 2.
| QPCS | QTCS | QFR | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| τ | EBIC1 | EBIC2 | LASSO | EBIC1 | EBIC2 | LASSO | EBIC1 | EBIC2 | LASSO | |
| Standard Normal | ||||||||||
| 0.2 | TP | 5.000 | 5.000 | 2.995 | 4.920 | 4.785 | 2.995 | 4.340 | 4.135 | 2.995 |
| FP | 9.420 | 0.450 | 0.580 | 10.085 | 2.765 | 2.040 | 10.905 | 4.680 | 2.825 | |
| 0.5 | TP | 5.000 | 4.965 | 3.495 | 4.800 | 4.075 | 3.500 | 4.220 | 3.255 | 3.220 |
| FP | 8.550 | 0.250 | 2.260 | 9.655 | 1.755 | 2.385 | 10.496 | 2.555 | 2.605 | |
| 0.8 | TP | 5.000 | 4.995 | 3.970 | 4.975 | 4.810 | 3.995 | 4.370 | 4.120 | 3.985 |
| FP | 9.380 | 0.525 | 0.610 | 9.735 | 2.535 | 2.425 | 10.995 | 4.445 | 3.310 | |
| Laplace Distribution | ||||||||||
| 0.2 | TP | 4.910 | 4.495 | 2.890 | 4.635 | 3.685 | 2.855 | 3.885 | 3.030 | 2.860 |
| FP | 9.550 | 0.885 | 1.480 | 9.935 | 1.865 | 2.645 | 11.080 | 2.995 | 3.500 | |
| 0.5 | TP | 4.950 | 4.605 | 3.475 | 4.640 | 3.445 | 3.060 | 3.985 | 2.800 | 2.405 |
| FP | 6.580 | 0.205 | 2.625 | 8.480 | 1.135 | 2.860 | 9.400 | 1.325 | 3.145 | |
| 0.8 | TP | 4.905 | 4.570 | 3.845 | 4.600 | 3.850 | 3.825 | 3.875 | 3.205 | 2.825 |
| FP | 9.570 | 1.025 | 3.975 | 9.770 | 2.045 | 3.975 | 11.145 | 2.945 | 3.915 | |
Table 7.
Variable selection results of C, O, and I for the extended BIC and LASSO with p = 1, 000 and ρ = 0.95 in Example 2.
| QPCS | QTCS | QFR | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| τ | EBIC1 | EBIC2 | LASSO | EBIC1 | EBIC2 | LASSO | EBIC1 | EBIC2 | LASSO | |
| Standard Normal | ||||||||||
| C | 0.000 | 0.675 | 0.005 | 0.000 | 0.150 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 0.2 | O | 1.000 | 0.325 | 0.000 | 0.925 | 0.720 | 0.000 | 0.355 | 0.335 | 0.000 |
| I | 0.000 | 0.000 | 0.995 | 0.075 | 0.130 | 1.000 | 0.645 | 0.665 | 1.000 | |
| C | 0.000 | 0.775 | 0.025 | 0.000 | 0.175 | 0.000 | 0.000 | 0.010 | 0.000 | |
| 0.5 | O | 1.000 | 0.190 | 0.165 | 0.815 | 0.455 | 0.000 | 0.265 | 0.210 | 0.000 |
| I | 0.000 | 0.035 | 0.810 | 0.185 | 0.370 | 1.000 | 0.735 | 0.780 | 1.000 | |
| C | 0.000 | 0.615 | 0.290 | 0.000 | 0.155 | 0.000 | 0.000 | 0.005 | 0.000 | |
| 0.8 | O | 1.000 | 0.380 | 0.060 | 0.975 | 0.745 | 0.000 | 0.390 | 0.365 | 0.000 |
| I | 0.000 | 0.005 | 0.650 | 0.025 | 0.100 | 1.000 | 0.610 | 0.630 | 1.000 | |
| Laplace Distribution | ||||||||||
| C | 0.000 | 0.420 | 0.000 | 0.000 | 0.125 | 0.000 | 0.000 | 0.005 | 0.000 | |
| 0.2 | O | 0.920 | 0.345 | 0.010 | 0.730 | 0.345 | 0.000 | 0.065 | 0.055 | 0.000 |
| I | 0.080 | 0.235 | 0.990 | 0.270 | 0.530 | 1.000 | 0.935 | 0.940 | 1.000 | |
| C | 0.000 | 0.735 | 0.000 | 0.000 | 0.165 | 0.000 | 0.000 | 0.020 | 0.000 | |
| 0.5 | O | 0.960 | 0.115 | 0.145 | 0.700 | 0.175 | 0.000 | 0.100 | 0.025 | 0.000 |
| I | 0.040 | 0.150 | 0.855 | 0.300 | 0.660 | 1.000 | 0.900 | 0.955 | 1.000 | |
| C | 0.000 | 0.350 | 0.110 | 0.000 | 0.115 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 0.8 | O | 0.925 | 0.440 | 0.045 | 0.700 | 0.375 | 0.000 | 0.050 | 0.040 | 0.000 |
| I | 0.075 | 0.210 | 0.845 | 0.300 | 0.510 | 1.000 | 0.950 | 0.960 | 1.000 | |
Example 3
In the first two examples, we demonstrated the performance of the proposed variable screening procedures. In this example, we compare our recommended method, QPCS-EBIC2, with two other methods, the l1 penalization and ISIS-SCAD. We use the “SIS” R package to implement ISIS-SCAD, which first performs iterative sure independence screening and then fits the final regression model using the SCAD penalty. In short, we denote these three methods as QPCS, l1, and ISIS, respectively. We simulate data from the same data generating process given in Example 1 with p = 1, 000. Since ISIS is designed for mean regression models, we only consider τ = 0.5 for a fair comparison.
Table 8 reports TP and FP, and Table 9 presents the proportions of correct-fitting (C), over-fitting (O), and incorrect-fitting (I). For the l1 penalization method, we observe that both the true positive and false positive values are very small when ρ = 0.95, since it selects only one variable, or none, from a group of highly correlated covariates. For ρ = 0.5 and 0.05, however, it has very large false positives. It is also worth noting that l1 has very large proportions of incorrect fitting even at the moderate correlation of ρ = 0.5. This is because it often misses the fourth variable, X4, which is highly correlated with the other three variables. As a result, l1 can be seriously affected by the correlation, and its performance deteriorates as the correlation becomes larger. As for ISIS, its performance at ρ = 0.95 is worse than that at ρ = 0.5 and 0.05. In addition, it has larger false positive values, and its correct-fitting rates are close to zero when ρ = 0.95. In contrast to l1 and ISIS, Tables 8 and 9 indicate that QPCS has the best performance in all cases. Specifically, its correct-fitting proportions exceed 80% even at ρ = 0.95, its numbers of true positives are close to 4, and its numbers of false positives are small. It is of interest that QPCS performs slightly better for the Laplace error distribution than for the normal error distribution when ρ = 0.05 and 0.5. This finding may be related to the fact that at τ = 0.5 the parameter estimate from quantile regression is the MLE under the Laplace distribution.
Table 8.
Variable selection results of TP and FP for the QPCS, l1, and ISIS methods with p = 1, 000 and τ = 0.5 in Example 3.
| ρ = 0.95 | ρ = 0.50 | ρ = 0.05 | |||||||
|---|---|---|---|---|---|---|---|---|---|
| QPCS | l1 | ISIS | QPCS | l1 | ISIS | QPCS | l1 | ISIS | |
| Standard Normal | |||||||||
| TP | 3.980 | 0.425 | 3.825 | 4.000 | 3.065 | 4.000 | 4.000 | 4.000 | 4.000 |
| FP | 0.145 | 0.350 | 4.950 | 0.145 | 45.365 | 1.035 | 0.125 | 16.815 | 1.150 |
| Laplace Distribution | |||||||||
| TP | 3.830 | 0.435 | 3.710 | 4.000 | 3.005 | 3.985 | 4.000 | 4.000 | 4.000 |
| FP | 0.150 | 0.280 | 4.905 | 0.015 | 39.110 | 1.860 | 0.015 | 16.585 | 1.765 |
Table 9.
Variable selection results of C, O, and I for the QPCS, l1, and ISIS methods with p = 1, 000 and τ = 0.5 in Example 3.
| ρ = 0.95 | ρ = 0.50 | ρ = 0.05 | |||||||
|---|---|---|---|---|---|---|---|---|---|
| QPCS | l1 | ISIS | QPCS | l1 | ISIS | QPCS | l1 | ISIS | |
| Standard Normal | |||||||||
| C | 0.845 | 0.000 | 0.070 | 0.875 | 0.000 | 0.385 | 0.890 | 0.000 | 0.355 |
| O | 0.135 | 0.000 | 0.855 | 0.125 | 0.065 | 0.615 | 0.110 | 1.000 | 0.645 |
| I | 0.020 | 1.000 | 0.075 | 0.000 | 0.935 | 0.000 | 0.000 | 0.000 | 0.000 |
| Laplace Distribution | |||||||||
| C | 0.805 | 0.000 | 0.055 | 0.985 | 0.000 | 0.195 | 0.965 | 0.000 | 0.180 |
| O | 0.085 | 0.000 | 0.870 | 0.015 | 0.005 | 0.800 | 0.035 | 1.000 | 0.820 |
| I | 0.110 | 1.000 | 0.075 | 0.000 | 0.995 | 0.005 | 0.000 | 0.000 | 0.000 |
Following an anonymous referee’s suggestion, we further compare QPCS with the SCAD penalization (Wang et al., 2012) and l1 when the error follows the t-distribution with three degrees of freedom. The tuning parameter for the SCAD method is selected by PBIC as given in the R package ‘rqPen’. The results demonstrate that QPCS outperforms l1 and SCAD. To save space, we report the simulation results in Case 1 of Example S3 in the Supplemental Materials. Inspired by an anonymous referee’s comment, we have also conducted a simulation experiment using a block diagonal covariance matrix. The detailed descriptions of the simulation settings and the results are given in Case 2 of Example S3 in the Supplemental Materials. These numerical results also demonstrate the superiority of QPCS in comparison with l1 and ISIS.
Remark 4
In our simulation studies, we assume that the covariance matrix of the covariates is exchangeable (i.e., compound symmetric), except for Case 2 of Example 3. Hence, for a given correlation coefficient ρ and subset Sj, the largest and smallest eigenvalues of the exchangeable covariance matrix ΣSj are (1 − ρ) + ρ|Sj| and 1 − ρ, respectively. In our proposed algorithm, we require that |Sj| = o(n1/2). As a result, the maximal eigenvalue of ΣSj does not satisfy the boundedness condition given in (C2)(ii) when |Sj| is allowed to diverge with the sample size n. Although Condition (C2)(ii) is not satisfied in this scenario, our proposed method shows good performance because |Sj| is often small (much smaller than n) in practice. Relaxing Condition (C2)(ii) so that the proposed method applies to a wider variety of covariance structures is an interesting subject for future research.
6 Application
In this section, we apply the proposed methods to gene expression data that was used by Scheetz et al. (2006) for investigating gene regulation in the mammalian eye and identifying genetic variations relevant to human eye disease. The dataset contains gene expression values of 31,042 probe sets on 120 rats. The expression levels of genes are analyzed on a log scale with base 2. The response variable is the expression of gene TRIM32 (probe 1389163 at), which is associated with human hereditary diseases of the retina. The purpose of this study is to analyze how the response variable depends on the expression of other genes. Before applying the screening method, we adopt the preprocessing procedure of Scheetz et al. (2006) to first remove each probe for which the maximum expression among the 120 rats is less than the 25th percentile of the entire sample of expression values, and then remove any probe for which the range of the expression among 120 rats is less than 2. As a result, there are 18,958 probes left in our analysis. Following the approach of Wang et al. (2012) and Lee et al. (2014), we subsequently select 3,000 genes with the largest variance in expression value, and then select the top 300 gene expression values in a ranking of their (absolute value) correlation with the response variable. For further illustration, we also consider the top 400, 500, and 800 gene expression values. Afterwards, we apply our proposed method to identify relevant genes for the response variable at quantiles τ = 0.3, 0.5, and 0.7 as in Wang et al. (2012). Note that Lee et al. (2014) considered τ = 0.25, 0.5, and 0.75.
To assess the finite sample performance, we consider 50 random partitions. For each partition, we randomly divide the data into a training dataset with 80 observations and a testing dataset with 40 observations. From the training dataset, we conduct screening and subset selection, and then fit the quantile regression model with the selected predictors. Subsequently, we employ the resulting quantile regression estimators and the testing data to compute the prediction error . A smaller value of the prediction error indicates better performance. In the simulation studies, we found that SIS does not perform satisfactorily and that EBIC1 and LASSO are inferior to EBIC2. As a result, we only employ the three proposed screening procedures QPCS, QTCS, and QFR to screen predictors, together with one selection criterion, EBIC2, for best subset selection. The resulting three methods are denoted QPCS-EBIC2, QTCS-EBIC2, and QFR-EBIC2, respectively.
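This evaluation scheme can be sketched as follows. The prediction error is taken to be the average check loss on the testing data, which is our assumed form of the PE criterion above, and screen_and_select stands for any of the three screening-plus-EBIC2 procedures.

```python
import numpy as np
import statsmodels.api as sm

def prediction_error(beta_hat, X_test, y_test, tau):
    """Average check loss on the testing data (our assumed form of the PE criterion)."""
    u = y_test - sm.add_constant(X_test) @ np.asarray(beta_hat)
    return np.mean(u * (tau - (u < 0)))

def evaluate_partition(X, y, tau, screen_and_select, n_train=80, seed=0):
    """One random 80/40 partition: screen and select on training data, PE on testing data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    tr, te = idx[:n_train], idx[n_train:]
    selected = screen_and_select(X[tr], y[tr], tau)      # e.g., QPCS followed by EBIC2
    fit = sm.QuantReg(y[tr], sm.add_constant(X[tr][:, selected])).fit(q=tau)
    pe = prediction_error(fit.params, X[te][:, selected], y[te], tau)
    return len(selected), pe
```

Averaging the returned size and PE over the 50 random partitions gives the quantities reported in Table 10.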
For p = 300, Wang et al.’s (2012) methods yielded an average number of relevant predictors ranging from 9.08 to 21.66 and average prediction errors between 1.30 and 1.82 across the three quantiles. Recently, Lee et al. (2014) found that their method not only obtains relevant predictor sizes of 2.24, 2.16, and 1.16 when τ = 0.25, τ = 0.5, and τ = 0.75, respectively, but also yields comparable prediction errors of 1.42, 1.64, and 1.30, accordingly. By applying our proposed methods, Table 10 shows that the size selected via QPCS-EBIC2, QTCS-EBIC2, and QFR-EBIC2 ranges from 1.68 to 2.50 across the three quantiles. In addition, the resulting PEs are between 0.502 and 1.016, and they are smaller than the values obtained via the approaches of Wang et al. (2012) and Lee et al. (2014). Among our three proposed methods, QPCS-EBIC2 is superior to QTCS-EBIC2 and QFR-EBIC2 in terms of the average prediction error, although its sizes are slightly larger than those of QTCS-EBIC2 when τ = 0.5 and 0.7. As p increases to 400, 500, and 800, however, QPCS-EBIC2 has the smallest values in both size and PE across all three quantiles. This finding is consistent with the simulation results. In sum, the proposed quantile partial correlation screening algorithm can be considered for quantile regression selection with high dimensional data.
Table 10.
The average number of selected variables (Size) and the average prediction error (PE), with standard errors in parentheses, for the three screening methods with EBIC2 at τ = 0.3, 0.5, 0.7 and p = 300, 400, 500, 800 over the 50 random partitions.
| | | τ = 0.3 | | τ = 0.5 | | τ = 0.7 | |
|---|---|---|---|---|---|---|---|
| | | Size | PE | Size | PE | Size | PE |
| p = 300 | QPCS-EBIC2 | 1.86 (0.130) | 0.502 (0.053) | 1.96 (0.201) | 0.966 (0.095) | 1.80 (0.178) | 0.845 (0.091) |
| | QTCS-EBIC2 | 2.38 (0.156) | 0.545 (0.077) | 1.92 (0.156) | 1.016 (0.109) | 1.68 (0.135) | 0.891 (0.080) |
| | QFR-EBIC2 | 2.50 (0.207) | 0.507 (0.053) | 2.24 (0.136) | 0.985 (0.113) | 2.02 (0.175) | 0.877 (0.108) |
| p = 400 | QPCS-EBIC2 | 1.64 (0.120) | 0.552 (0.051) | 1.98 (0.177) | 0.839 (0.068) | 1.70 (0.137) | 0.715 (0.067) |
| | QTCS-EBIC2 | 2.66 (0.182) | 0.571 (0.038) | 2.06 (0.122) | 0.877 (0.069) | 1.78 (0.122) | 0.957 (0.094) |
| | QFR-EBIC2 | 2.94 (0.190) | 0.584 (0.058) | 2.52 (0.146) | 0.985 (0.060) | 2.52 (0.135) | 0.799 (0.075) |
| p = 500 | QPCS-EBIC2 | 1.94 (0.174) | 0.558 (0.051) | 1.68 (0.157) | 0.684 (0.055) | 1.70 (0.167) | 0.811 (0.114) |
| | QTCS-EBIC2 | 2.38 (0.140) | 0.574 (0.047) | 2.26 (0.158) | 1.072 (0.087) | 1.94 (0.174) | 1.128 (0.090) |
| | QFR-EBIC2 | 3.48 (0.237) | 0.600 (0.061) | 2.26 (0.123) | 0.887 (0.069) | 2.08 (0.121) | 1.028 (0.094) |
| p = 800 | QPCS-EBIC2 | 1.92 (0.219) | 0.665 (0.081) | 1.64 (0.145) | 0.647 (0.053) | 1.70 (0.194) | 0.834 (0.089) |
| | QTCS-EBIC2 | 2.80 (0.202) | 0.670 (0.081) | 2.16 (0.172) | 1.020 (0.070) | 2.40 (0.194) | 1.099 (0.118) |
| | QFR-EBIC2 | 3.62 (0.269) | 0.693 (0.082) | 2.80 (0.162) | 0.783 (0.050) | 2.54 (0.132) | 0.991 (0.018) |
7 Discussion
In sparse ultra-high dimensional quantile regression, we introduce three algorithms, QPCS, QTCS, and QFR, that use quantile correlation and quantile partial correlation to screen explanatory variables, and we then employ an extended BIC for model selection. The simulation results for the QPCS algorithm support our theoretical findings. In addition, we find that QPCS performs well in the following settings: (1) highly correlated covariates; (2) ultra-high dimension; (3) covariates that are marginally uncorrelated or only weakly correlated with the response; and (4) heavy-tailed errors. Moreover, our simulation results show that it is superior to LASSO, SCAD, SIS, and ISIS-SCAD.
To broaden the usefulness of QPCS, we discuss some extensions for future research in variable screening. There are three possible avenues. The first one is to extend quantile correlation and quantile partial correlation to various quantile regression models such as single-index quantile regression. We have conducted simulation studies by changing the conditional quantile function in Example 1 to its exponential form. The results, not presented here, show that QPCS still performs well under this setting.
The second avenue is considering an alternative correlation measure used in the QPCS algorithm. In simple linear regression, it is known that the square of the coefficient of correlation is the same as the coefficient of determination. In multiple linear regression, Kutner et al. (2005) indicated that the square of the partial correlation is the same as the coefficient of partial determination. In addition, Nagelkerke (1991) proposed a general definition of the coefficient of determination via the log-likelihood function of the response variable. Accordingly, as long as the likelihood function (or its related version, such as partial likelihood or quasi-likelihood) is available for any specific regression model and the maximum likelihood estimators of the regression parameters are also attainable, one can replace the quantile correlation and quantile partial correlation used in the QPCS algorithm by their corresponding coefficient of determination and coefficient of partial determination. This approach can be used for various regression models, e.g., generalized linear models, extreme value regression models, and parametric survival models.
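To make this idea concrete, one possible likelihood-based analogue of the (partial) correlation measures is sketched below in our own notation; it only illustrates the construction suggested in the text and is not a formal proposal.

```latex
% Likelihood-based coefficients of determination and partial determination
% (sketch; notation is ours).  L(.) denotes the maximized likelihood of the
% working regression model and S a set of conditioning covariates.
\[
  R^{2} \;=\; 1-\Bigl\{\frac{L_{0}}{L(\hat\beta)}\Bigr\}^{2/n},
  \qquad
  R^{2}_{\,j\mid S} \;=\; 1-\Bigl\{\frac{L(\hat\beta_{S})}{L(\hat\beta_{S\cup\{j\}})}\Bigr\}^{2/n},
\]
% where L_0 is the maximized likelihood of the intercept-only model.  In the
% screening step, R^2_{j|S} would play the role of the squared quantile
% partial correlation of X_j given the covariates in S.
```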
Instead of a determination measure, the third avenue for future research considers the residual sum of squares. This is particularly useful for regression models that have no log-likelihood function. In linear regression, it is easily shown that maximizing the correlation is equivalent to minimizing the residual sum of squares; analogously, maximizing the partial correlation is equivalent to minimizing the partial residual sum of squares. The residual sum of squares is the objective function of regression estimators based on the L2-norm distance, so the partial residual sum of squares is the difference between the two nested objective functions, which we call the partial objective function. In general, the objective function can be a distance metric such as the Lp-norm distance or another robust function. This motivates us to replace the quantile correlation and quantile partial correlation used in the QPCS algorithm by the objective function and the partial objective function, respectively (a sketch for the quantile case is given below). This approach can be used for many regression models such as generalized additive models, semiparametric models, and robust regression models. In sum, the above three avenues shed light on areas of future research that warrant thorough investigation.
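As an illustration for the quantile case, the gain in the check-loss objective from adding one candidate variable to the current conditioning set could serve as the partial objective function; a sketch is given below (Python, using statsmodels' QuantReg; the function name is ours).

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

def partial_objective_gain(y, X, j, active, tau):
    """Decrease in the average quantile check loss when column j of X is
    added to the model that already contains the columns listed in `active`.

    This quantity plays the role of the squared quantile partial
    correlation in the screening step."""
    def fitted_loss(cols):
        # Intercept-only fit when no conditioning covariates are given.
        Z = sm.add_constant(X[:, cols]) if cols else np.ones((len(y), 1))
        fit = QuantReg(y, Z).fit(q=tau)
        resid = y - fit.predict(Z)
        return np.mean(resid * (tau - (resid < 0)))  # average check loss

    return fitted_loss(list(active)) - fitted_loss(list(active) + [j])
```

Screening would then rank the candidate variables by partial_objective_gain(y, X, j, active, tau) over all j not yet in the active set; replacing the check loss by another objective (an Lp-norm distance or a robust loss) gives the corresponding screener for other models.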
Supplementary Material
Acknowledgments
Ma’s research was supported by NSF grant DMS 1306972 and Hellman Fellowship. Li’s research was supported by NSF grant DMS 1512422 and NIDA, NIH grants P50 DA036107 and P50 DA039838. The content is solely the responsibility of the authors and does not necessarily represent the official views of NSF, NIDA or NIH. The authors are grateful to the Editor, the Associate Editor, and three anonymous reviewers for their constructive comments that helped us improve the article substantially.
Appendix
Before proving the three theorems and one proposition, we present the following five lemmas. Lemma A.1 shows the relationship between and for j = 1, ⋯, p. Lemmas A.2 and A.3 are used in the proofs of Lemmas A.4 and A.5, and Lemmas A.4 and A.5 are needed in the proof of Theorem 1. The proofs of these five lemmas are given in the Supplemental Materials. For the sake of convenience, for any positive sequences an and bn, we write an ≍ bn if limn→∞ an/bn = c for some c > 0, and an ~ bn if limn→∞ an/bn = 1. In addition, for any s × t matrix A = (Aij), denote |A| = max1≤i≤s,1≤j≤t |Aij|. Moreover, in the following lemmas, we assume that 0 < κ < 1/2 and rn = Cnω for some 0 ≤ ω < min(1 − 2κ, 2κ) and a positive constant C, as stated in Theorem 1.
Lemma A.1
Assume that is the unique minimizer of E[ρτ(Y − β0τ − β1τX1 − ⋯ − βpτ Xp)], and, for given 1 ≤ j ≤ p, and are unique minimizers of and E[ρτ (Y* − β0τ − Xjβjτ)], respectively, where . Then we have if and only if .
Lemma A.2
Under Condition (C2) and the assumption n−1δn = O(1), for every 1 ≤ j ≤ pn and any c1 > 0, there exist some positive constants c2 and c3 such that
when n is sufficiently large.
Lemma A.3
Under Conditions (C1) and (C2), for every 1 ≤ j ≤ pn and for any given constant c4 > 0, there exists some positive constant c5 such that
when n is sufficiently large.
Lemma A.4
Under Conditions (C1) and (C2), for every 1 ≤ j ≤ pn and for any given constant c6 > 0, there exist some positive constants c7 and c8 such that
when n is sufficiently large.
Lemma A.5
Under Condition (C2) and the assumption of rn in Theorem 1, for every 1 ≤ j ≤ pn and for any c9 > 0, there exist some positive constants c10 and c11 such that
| (A.1) |
when n is sufficiently large. Note that and have been defined after equations (6) and (8), respectively. Moreover, for a ∈ (0, 1),
| (A.2) |
when n is sufficiently large.
Proof of Theorem 1
Denote and . Then
| (A.3) |
After algebraic simplification, we have that, for any a ∈ (0, 1), implies , where a* = (1 − a)−1 − 1. Hence, by (A.2) in Lemma A.5 and , we obtain
| (A.4) |
This, in conjunction with Lemma A.4, implies that for any c6 > 0,
| (A.5) |
for some positive constants .
It is worth noting that |ϕj| ≤ M1. Then, employing (A.1) in Lemma A.5 and (A.4), we have that for any c9 > 0,
| (A.6) |
By (A.3), (A.5), and (A.6), we have that, for any C1 > 0, there exist some positive constants C2 and C3 such that
| (A.7) |
for some positive constants and . The last inequality follows from and with and . This completes the proof.
Proof of Theorem 2
On the event
we apply Condition (C3) and obtain . Hence, by the choice of with C4 ≤ C0/2, we have ℳ* ⊂ ℳ̂νn on the event An. This, together with (A.7) and the union bound of probability, yields that
which completes the proof.
Proof of Proposition 1
Employing equation (10) by letting 2c = C4, we have that, on the event Ωn, |ℳ̂νn| ≤ C*nς+κ−ω/2, where 0 < C* < ∞. This, in conjunction with Theorem 1, leads to
Accordingly, Proposition 1 follows.
Proof of Theorem 3
Define kmin = min1≤k≤d{k : ℳ* ⊂ 𝒜(k)}. By the assumption that ς + κ − ω/2 < 1/2 and the result in Proposition 1, kmin is well defined and satisfies kmin ≤ C′nς+ κ−ω/2 = o(n1/2) for some constant 0 < C′ < ∞. For any 1 ≤ k < kmin, 𝒜(k) are underfitted models such that ℳ* ⊄ 𝒜(k) and 𝒜(k) are nested. By (A.18) in the supplementary materials of Lee et al. (2014), with probability approaching 1, we can choose a sequence of constants {Ln} such that Ln → ∞, Ln/Cn → 0, and
for some constant 0 < C″ < ∞, where and . Under the assumption that E|ε| < ∞, we obtain that and c′ ≤ Eρτ (ε) ≤ c″ for some constants 0 < c′, c″ < ∞. In addition, by assuming that (nς+κ−ω/2n−1 log n)Cn = o(1), we have n−1 Lnnς+κ−ω/2 log(n) = o(1). The above results imply that, with probability approaching 1,
| (A.8) |
Moreover, by employing the same techniques as those used in the proof of (A.20) from the supplementary materials of Lee et al. (2014), we have, with probability approaching 1,
| (A.9) |
for any 1 ≤ k < kmin, for some constant 0 < c‴ < ∞. Then, we have, with probability approaching 1, as n → ∞,
where the first inequality follows from the fact that log(1 + x) ≥ min{x/2, log 2} for any x > 0, (A.9), and kmin ≤ C′nς+κ−ω/2, and the second inequality follows from (A.8) and the assumption that (nς+κ−ω/2n−1 log n)Cn = o(1). The above result implies that P(k̂ ≥ kmin) → 1 as n → ∞, which completes the proof.
Contributor Information
Shujie Ma, Assistant Professor, Department of Statistics, University of California-Riverside, Riverside, CA 92521.
Runze Li, Verne M. Willaman Professor, Department of Statistics, the Pennsylvania State University, University Park, PA 16802.
Chih-Ling Tsai, Distinguished Professor and Robert W. Glock Endowed Chair in Management, Graduate School of Management, University of California at Davis, Davis, CA 95616.
References
- Angrist J, Chernozhukov V, Fernández-Val I. Quantile regression under misspecification, with an application to the U.S. wage structure. Econometrica. 2006;74:539–563.
- Belloni A, Chernozhukov V. ℓ1-penalized quantile regression in high-dimensional sparse models. Annals of Statistics. 2011;39:82–130.
- Bühlmann P, Kalisch M, Maathuis M. Variable selection for high-dimensional models: partially faithful distributions and the PC-simple algorithm. Biometrika. 2009;97:1–19.
- Chen J, Chen Z. Extended Bayesian information criteria for model selection with large model spaces. Biometrika. 2008;95:759–771.
- Cho H, Fryzlewicz P. High dimensional variable selection via tilting. Journal of the Royal Statistical Society: Series B. 2012;74:593–622.
- Fan J, Feng Y, Song R. Nonparametric independence screening in sparse ultra-high dimensional additive models. Journal of the American Statistical Association. 2011;106:544–557. doi: 10.1198/jasa.2011.tm09779.
- Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association. 2001;96:1348–1360.
- Fan J, Lv J. Sure independence screening for ultra-high dimensional feature space (with discussion). Journal of the Royal Statistical Society: Series B. 2008;70:849–911. doi: 10.1111/j.1467-9868.2008.00674.x.
- Fan J, Lv J. A selective overview of variable selection in high dimensional feature space. Statistica Sinica. 2010;20:101–148.
- Fan J, Song R. Sure independence screening in generalized linear models with NP-dimensionality. The Annals of Statistics. 2010;38:3567–3604.
- He X, Wang L, Hong HG. Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Annals of Statistics. 2013;41:342–369.
- Hjort NL, Pollard D. Asymptotics for minimisers of convex processes. 1993. Unpublished manuscript, arXiv:1107.3806.
- Huang J, Ma SG, Zhang CH. Adaptive lasso for sparse high-dimensional regression models. Statistica Sinica. 2008a;18:1603–1618.
- Huang J, Horowitz JL, Ma S. Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Annals of Statistics. 2008b;36:587–613.
- Kalisch M, Bühlmann P. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. Journal of Machine Learning Research. 2007;8:613–636.
- Koenker R. Quantile Regression. New York: Cambridge University Press; 2005.
- Kutner MH, Nachtsheim CJ, Neter J, Li W. Applied Linear Statistical Models. 5th ed. New York: McGraw-Hill/Irwin; 2005.
- Lee ER, Noh H, Park BU. Model selection via Bayesian information criterion for quantile regression models. Journal of the American Statistical Association. 2014;109:216–229.
- Li G, Li Y, Tsai CL. Quantile correlations and quantile autoregressive modeling. Journal of the American Statistical Association. 2015;110:246–261.
- Li R, Zhong W, Zhu L. Feature screening via distance correlation learning. Journal of the American Statistical Association. 2012;107:1129–1139. doi: 10.1080/01621459.2012.695654.
- Liu J, Li R, Wu R. Feature selection for varying coefficient models with ultrahigh dimensional covariates. Journal of the American Statistical Association. 2014;109:266–274. doi: 10.1080/01621459.2013.850086.
- Nagelkerke NJD. A note on a general definition of the coefficient of determination. Biometrika. 1991;78:691–692.
- Scheetz TE, Kim K-YA, Swiderski RE, Philp AR, Braun TA, Knudtson KL, Dorrance AM, DiBona GF, Huang J, Casavant TL, Sheffield VC, Stone EM. Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proceedings of the National Academy of Sciences of the United States of America. 2006;103:14429–14434. doi: 10.1073/pnas.0602562103.
- Schwarz G. Estimating the dimension of a model. The Annals of Statistics. 1978;6:461–464.
- Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B. 1996;58:267–288.
- Wang H, Li R, Tsai C-L. Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika. 2007;94:553–568. doi: 10.1093/biomet/asm053.
- Wang H. Forward regression for ultra-high dimensional variable screening. Journal of the American Statistical Association. 2009;104:1512–1524.
- Wang H, Li B, Leng C. Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society, Series B. 2009;71:671–683.
- Wang L, Wu Y, Li R. Quantile regression for analyzing heterogeneity in ultra-high dimension. Journal of the American Statistical Association. 2012;107:214–222. doi: 10.1080/01621459.2012.656014.
- Wu Y, Yin G. Conditional quantile screening in ultrahigh-dimensional heterogeneous data. Biometrika. 2015;102:65–76.
- Zhang CH. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics. 2010;38:894–942.
- Zhu L-P, Li L, Li R, Zhu L-X. Model-free feature screening for ultrahigh-dimensional data. Journal of the American Statistical Association. 2011;106:1464–1475. doi: 10.1198/jasa.2011.tm10563.
- Zou H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association. 2006;101:1418–1429.
- Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B. 2005;67:301–320.