Author manuscript; available in PMC: 2024 Nov 1.
Published in final edited form as: Comput Stat Data Anal. 2023 Jun 23;187:107808. doi: 10.1016/j.csda.2023.107808

The Bayesian Regularized Quantile Varying Coefficient Model

Fei Zhou 1, Jie Ren 2, Shuangge Ma 3, Cen Wu 1
PMCID: PMC11090482  NIHMSID: NIHMS1911420  PMID: 38746689

Abstract

The quantile varying coefficient (VC) model can flexibly capture dynamical patterns of regression coefficients. In addition, due to the quantile check loss function, it is robust against outliers and heavy-tailed distributions of the response variable, and can provide a more comprehensive picture of modeling via exploring the conditional quantiles of the response variable. Although extensive studies have been conducted to examine variable selection for the high-dimensional quantile varying coefficient models, the Bayesian analysis has been rarely developed. The Bayesian regularized quantile varying coefficient model has been proposed to incorporate robustness against data heterogeneity while accommodating the non-linear interactions between the effect modifier and predictors. Selecting important varying coefficients can be achieved through Bayesian variable selection. Incorporating the multivariate spike-and-slab priors further improves performance by inducing exact sparsity. The Gibbs sampler has been derived to conduct efficient posterior inference of the sparse Bayesian quantile VC model through Markov chain Monte Carlo (MCMC). The merit of the proposed model in selection and estimation accuracy over the alternatives has been systematically investigated in simulation under specific quantile levels and multiple heavy-tailed model errors. In the case study, the proposed model leads to identification of biologically sensible markers in a non-linear gene-environment interaction study using the NHS data.

Keywords: Bayesian variable selection, Quantile regression, Markov Chain Monte Carlo, Robustness, Varying coefficient model

1. Introduction

The quantile varying coefficient model (Kim (2007)) has two defining characteristics. First, it can safeguard against heavy-tailed distributions and outliers due to the robustness of the check loss function in quantile regression. Compared to modeling based on conditional means, the check loss also makes a more comprehensive modeling of the data feasible. Second, the quantile varying coefficient model can account for the dynamic effects of predictors on the response variable. As in the varying coefficient model (Hastie and Tibshirani (1993)) from which it is inherited, its regression coefficients are nonparametric functions of other variables, or effect modifiers, so the dynamic influences of the predictors can be properly captured through the varying coefficients. Therefore, the quantile varying coefficient model enjoys wide popularity and application in a broad spectrum of scientific research areas due to its robustness, superior flexibility and interpretability. For example, in the gene-environment interaction analysis (Zhou et al. (2021)) of the Nurse’s Health Data conducted in Section 5 of this paper, we aim to address the scientific question of how the genetic factors, which are single nucleotide polymorphisms (SNPs), are influenced by age to affect the change in body mass index (BMI). The exploratory data analysis in Figure 1 clearly shows the skewness in the response variable BMI, and nonlinear interactions between SNP rs13001304 and age (the effect modifier), which justifies the use of the quantile VC model.

Figure 1:

Distribution of the BMI (left) and non-linear interaction effect of SNP rs13001304 (right) from the NHS data. The blue dashed lines denote the 95% credible interval.

With a large number of genetic factors, identification of important gene-environment interactions naturally leads to a sparse high-dimensional problem. Regularized variable selection has been extensively studied for quantile varying coefficient models. For example, Noh et al. (2012) have developed a regularization procedure based on second order cone programming. The selection of important varying coefficients amounts to group level selection of the spline coefficients with the group SCAD penalty. In longitudinal studies, Tang et al. (2013) have developed an adaptive LASSO based variable selection method for quantile varying coefficient models, where the group level spline coefficients are penalized via the shrinkage of the $L_\nu$ norm ($\nu\ge 1$). Tang et al. (2012) have further examined structural identification of varying coefficients by separating the varying, nonzero constant and zero effects in quantile regression. All these studies have established the asymptotic properties of the corresponding regularized estimators in terms of (1) consistency in variable selection, that is, the proposed methods can identify nonzero quantile varying coefficient functions with probability approaching 1, and (2) the rate of convergence of the nonzero quantile varying coefficient functions. However, they have not developed the asymptotic distributions of the regularized estimators. On the other hand, Dai and Kolar (2021) have established asymptotic normality and estimation consistency for a sparse kernel estimator that approximates quantile VC functions, but consistency in variable selection has not been established for that estimator.

From the Bayesian perspective, variable selection for quantile varying coefficient models has not been well developed yet. One advantage of fully Bayesian methods is that exact posterior inference can be conducted through MCMC algorithms, even under small sample sizes. Therefore, the Bayesian analysis can provide additional insight over existing frequentist approaches, including statistical inference based on credible intervals of the quantile varying coefficient functions. As the general framework for penalized (robust) variable selection can be formulated as “(robust) loss function + penalty function” (Wu and Ma (2015); Wu et al. (2019)), choosing the appropriate likelihood function and sparsity inducing priors, which correspond to the (robust) loss function and penalty terms respectively, has been shown to be an effective way to develop Bayesian hierarchical models (Casella et al. (2010); Park and Casella (2008)). Yu and Moyeed (2001) have proposed using the asymmetric Laplace distribution (ALD) as the likelihood function to formulate Bayesian quantile regression. Li et al. (2010) have further developed Bayesian regularized quantile regression by adopting the univariate and multivariate conditional Laplace priors as sparse priors. A major limitation of the conditional Laplace prior is that it does not shrink coefficients to exactly 0, which has motivated Ren et al. (2022) to incorporate spike-and-slab priors in bi-level selection for the Bayesian least absolute deviation (LAD) regression, a special case of Bayesian penalized quantile regression at the 50% quantile level. These methods are of a parametric nature, and cannot be adopted for analyzing quantile varying coefficient models.

In the literature, nonparametric Bayesian variable selection has been examined in varying coefficient models. Li et al. (2015) have developed the Bayesian group LASSO for varying coefficient models in longitudinal studies. In gene-environment interaction studies, Ren et al. (2019) have examined sparse structure identification for Bayesian partially linear varying coefficient models. Both works have developed Gibbs samplers for posterior sampling and inference. As their likelihood functions are based on the normal distribution, neither is robust to long-tailed distributions and outliers in the response variable.

To the best of our knowledge, Bayesian regularized variable selection in quantile regression models with varying coefficients has not been well studied. As the quantile VC model can be further extended to a large family of non-/semi-parametric models (Lv and Li (2020); Ma and Song (2015); Wang et al. (2009)), it is not feasible to investigate these models within the Bayesian framework if the cornerstone model in this family has not been fully understood from the Bayesian perspective. Therefore, to fill this gap, we have developed a novel regularized Bayesian quantile varying coefficient model. The proposed model shares the two aforementioned defining characteristics of the quantile varying coefficient model within the Bayesian framework by accommodating heavy-tailed errors and outlying observations in the response while flexibly modeling the nonlinear interactions between the predictors and the effect modifying variable. Selection of important varying coefficients can be efficiently conducted through group level Bayesian variable selection. Incorporation of the multivariate spike-and-slab priors in our model promotes identification of important effects with exact sparsity, thus further improving the performance in identification and estimation. The Bayesian hierarchical model leads to a Gibbs sampler which facilitates fast posterior inference based on MCMC algorithms. We have implemented the proposed and alternative methods in the R package pqrBayes on the corresponding author’s GitHub page (https://github.com/cenwu/pqrBayes). The core modules of the R package have been developed in C++. The package will be available on CRAN shortly.

2. Statistical Methods

2.1. The Quantile Varying Coefficient Model

Let $(Y_i, X_i, V_i, E_i)$, $i=1,\dots,n$, be independent and identically distributed random vectors, where $Y_i$ is the response, $V_i$ is the univariate index variable, $X_i=(X_{i0},X_{i1},\dots,X_{ip})^\top$ denotes the $(1+p)$-dimensional design vector with the first element $X_{i0}$ being 1, and $E_i=(E_{i1},\dots,E_{iq})^\top$ is the $q$-dimensional design vector. In particular, $X_i$ is of high dimensionality (e.g., denoting gene expressions), and $E_i$ represents low dimensional clinical factors. At a given quantile level $0<\tau<1$, we consider the following quantile varying coefficient model:

$$Y_i=\sum_{k=1}^{q}E_{ik}\beta_{k,\tau}+\sum_{j=0}^{p}\gamma_{j,\tau}(V_i)X_{ij}+\epsilon_{i,\tau},\quad i=1,\dots,n, \tag{1}$$

where $E_{ik}$ is the $k$th component of $E_i$, $X_{ij}$ is the $j$th component of $X_i$, and the $\gamma_{j,\tau}(\cdot)$ are unknown smooth varying-coefficient functions. The $\tau$th quantile of the random error $\epsilon_{i,\tau}$ equals 0. The quantile varying coefficient model is flexible in that the high dimensional predictors $X=(X_1,\dots,X_n)^\top$ are linearly associated with the response, but the corresponding regression coefficients $\gamma_{j,\tau}(\cdot)$ vary with the univariate index variable $V=(V_1,\dots,V_n)^\top$. It frequently arises in applications that only a subset of the predictors in $X$ are relevant to the response variable in model (1), motivating variable selection for quantile varying coefficient models. Here, $E$ stands for the low dimensional clinical and environmental factors that are pre-determined as important covariates and not subject to selection. Without loss of generality, we assume that the index variable $V_i\in[0,1]$. Besides, we omit the subscript "$\tau$" hereafter for simplicity of notation.

2.2. The Bayesian formulation of the Quantile Varying Coefficient Model

To formulate the Bayesian quantile varying coefficient model, we begin with approximating the varying coefficient function $\gamma_j(\cdot)$ in model (1) through basis expansion using polynomial splines. Denote $N_n$ as the number of uniform interior knots, and $O$ as the degree of the polynomial. Then $O=1$ and $O=2$ correspond to the linear and quadratic splines respectively, and so on. Let $\pi_j(\cdot)=(\pi_{j1}(\cdot),\dots,\pi_{jd}(\cdot))^\top$ be a set of normalized B-spline basis functions with $d=N_n+O+1$ (Schumaker (2007)). Then for $j=0,\dots,p$, we have the following approximations

$$\gamma_j(\cdot)\approx\sum_{s=1}^{d}\pi_{js}(\cdot)\alpha_{js}=\alpha_j^\top\pi_j(\cdot),$$

where $\alpha_j=(\alpha_{j1},\dots,\alpha_{jd})^\top$ is the spline coefficient vector. Subsequently, model (1) becomes

$$Y_i=\sum_{k=1}^{q}E_{ik}\beta_k+\sum_{j=0}^{p}\alpha_j^\top Z_{ij}+\epsilon_i, \tag{2}$$

where $Z_{ij}=\pi_j(V_i)X_{ij}=(\pi_{j1}(V_i)X_{ij},\dots,\pi_{jd}(V_i)X_{ij})^\top$.
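As an illustration of the basis expansion, the sketch below evaluates a clamped B-spline basis with uniform interior knots on $[0,1]$ through the Cox-de Boor recursion and forms one block $Z_{ij}$. It is a minimal standard-library sketch, not the pqrBayes implementation; the function names `bspline_basis` and `design_block` and the clamped-knot convention are assumptions made for the example.

```python
def bspline_basis(v, n_interior, degree):
    """Evaluate the d = n_interior + degree + 1 B-spline basis functions at v in [0, 1]."""
    # Uniform interior knots; boundary knots repeated degree + 1 times (assumed convention).
    step = 1.0 / (n_interior + 1)
    interior = [step * k for k in range(1, n_interior + 1)]
    knots = [0.0] * (degree + 1) + interior + [1.0] * (degree + 1)

    # Degree-0 functions are indicators of the knot spans; the last non-empty
    # span is closed on the right so that v = 1 is covered.
    basis = []
    for s in range(len(knots) - 1):
        left, right = knots[s], knots[s + 1]
        inside = left <= v < right or (v == 1.0 and left < right == 1.0)
        basis.append(1.0 if inside else 0.0)

    # Cox-de Boor recursion, raising the degree one step at a time.
    for deg in range(1, degree + 1):
        raised = []
        for s in range(len(knots) - deg - 1):
            term = 0.0
            width_left = knots[s + deg] - knots[s]
            if width_left > 0:
                term += (v - knots[s]) / width_left * basis[s]
            width_right = knots[s + deg + 1] - knots[s + 1]
            if width_right > 0:
                term += (knots[s + deg + 1] - v) / width_right * basis[s + 1]
            raised.append(term)
        basis = raised
    return basis


def design_block(v_i, x_ij, n_interior, degree):
    """Z_ij = pi_j(V_i) * X_ij, a vector of length d."""
    return [b * x_ij for b in bspline_basis(v_i, n_interior, degree)]


# Two interior knots and quadratic splines give d = 2 + 2 + 1 = 5 basis functions.
z = design_block(0.37, 2.0, n_interior=2, degree=2)
```

Because the basis functions sum to one at every point of $[0,1]$, the entries of `z` sum to the predictor value itself, which is a convenient sanity check.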

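Given the above basis expansion, the regression coefficients $\beta=(\beta_1,\dots,\beta_q)^\top$ and $\alpha=(\alpha_0^\top,\dots,\alpha_p^\top)^\top$ can be estimated by solving the following minimization problem:

$$\arg\min_{\beta,\alpha}\sum_{i=1}^{n}\rho_\tau\Big(Y_i-\sum_{k=1}^{q}E_{ik}\beta_k-\sum_{j=0}^{p}\alpha_j^\top Z_{ij}\Big), \tag{3}$$

where $\rho_\tau(\epsilon_i)=\epsilon_i\{\tau-I(\epsilon_i<0)\}$ is the check loss function for quantile regression.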
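The check loss is simple to state in code. The following minimal sketch evaluates $\rho_\tau$ and the objective in (3) for given fitted values; the toy data and the helper names are made up for illustration.

```python
def check_loss(residual, tau):
    """Quantile check loss rho_tau(e) = e * (tau - I(e < 0))."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def total_check_loss(y, fitted, tau):
    """Objective (3) evaluated at given fitted values of the regression part."""
    return sum(check_loss(yi - fi, tau) for yi, fi in zip(y, fitted))


# Toy data with one outlier (hypothetical values): fit a single constant c.
y = [1.0, 2.0, 3.0, 4.0, 100.0]
candidates = [1.0, 2.0, 3.0, 4.0, 100.0]
best = min(candidates, key=lambda c: total_check_loss(y, [c] * len(y), 0.5))
```

At $\tau=0.5$ the best constant is the sample median, which the outlier at 100 does not move; this is the robustness that the check loss brings.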

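Given a quantile level $\tau$, we assume that the random errors $\epsilon_i$ from model (2) follow an i.i.d. skewed (or asymmetric) Laplace distribution with density shown below (Yu and Moyeed (2001); Yu and Zhang (2005)):

$$f(\epsilon\mid\theta)=\tau(1-\tau)\theta\exp[-\theta\rho_\tau(\epsilon)]=\tau(1-\tau)\theta\begin{cases}e^{-\theta\tau\epsilon}, & \text{if }\epsilon\ge 0\\ e^{\theta(1-\tau)\epsilon}, & \text{if }\epsilon<0,\end{cases}$$

where $\theta^{-1}$ is a scale parameter determining the skewness of the distribution. Then the joint distribution of $Y$ given $E$ and $Z$ can be expressed as:

$$f(Y\mid E,Z,\beta,\alpha,\theta)=\tau^{n}(1-\tau)^{n}\theta^{n}\exp\Big(-\theta\sum_{i=1}^{n}\rho_\tau\Big(Y_i-\sum_{k=1}^{q}E_{ik}\beta_k-\sum_{j=0}^{p}\alpha_j^\top Z_{ij}\Big)\Big).$$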

It is worth pointing out that the asymmetric Laplace likelihood is essentially a working likelihood. It has been adopted merely to ensure that the minimization problem specified in (3) is equivalent to maximizing the above likelihood (Yang et al. (2016)), which allows us to work with the usual likelihood function. Because of its connection to the check loss function in quantile regression, the asymmetric Laplace distribution has been widely adopted to specify the likelihood function for Bayesian quantile regression, which offers additional insight beyond the frequentist approaches to quantile regression.
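The equivalence can be seen by taking logarithms of the joint density; the line below is a worked step added for illustration and uses only the symbols already defined.

```latex
\log f(Y \mid E, Z, \beta, \alpha, \theta)
  = n \log\{\tau(1-\tau)\theta\}
  - \theta \sum_{i=1}^{n} \rho_\tau\Big(Y_i - \sum_{k=1}^{q} E_{ik}\beta_k - \sum_{j=0}^{p} \alpha_j^\top Z_{ij}\Big)
```

For any fixed $\theta>0$ the first term does not involve $(\beta,\alpha)$, so maximizing the working likelihood over $(\beta,\alpha)$ is the same as minimizing the check loss objective in (3).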

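Kozumi and Kobayashi (2011) have shown that the skewed Laplace distribution can be equivalently represented as a mixture of an exponential distribution and a scaled normal distribution. To be more specific, let the random variables $u$ and $W$ follow the standard exponential distribution, $\text{Exp}(1)$, and the standard normal distribution, $N(0,1)$, respectively. Define $\kappa_1=\frac{1-2\tau}{\tau(1-\tau)}$ and $\kappa_2=\sqrt{\frac{2}{\tau(1-\tau)}}$ for $0<\tau<1$. Then we have the following representation based on a location–scale mixture of normals:

$$\epsilon=\theta^{-1}\kappa_1 u+\theta^{-1}\kappa_2\sqrt{u}\,W,$$

where $\epsilon$ follows a skewed Laplace distribution with scale parameter $\theta^{-1}$. Consequently, model (2) becomes

$$Y_i=\sum_{k=1}^{q}E_{ik}\beta_k+\sum_{j=0}^{p}\alpha_j^\top Z_{ij}+\theta^{-1}\kappa_1 u_i+\theta^{-1}\kappa_2\sqrt{u_i}\,W_i,$$

where $u_i\sim\text{Exp}(1)$ and $W_i\sim N(0,1)$. Let $\tilde u_i=\theta^{-1}u_i\sim\text{Exp}(\theta^{-1})$ and $\tilde u=(\tilde u_1,\dots,\tilde u_n)^\top$. Therefore, we have the following hierarchical model:

$$Y_i=\sum_{k=1}^{q}E_{ik}\beta_k+\sum_{j=0}^{p}\alpha_j^\top Z_{ij}+\kappa_1\tilde u_i+\theta^{-\frac{1}{2}}\kappa_2\sqrt{\tilde u_i}\,W_i,$$
$$\tilde u_1,\dots,\tilde u_n\sim\prod_{i=1}^{n}\theta\exp(-\theta\tilde u_i),$$
$$W_1,\dots,W_n\sim\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{1}{2}W_i^2\Big).$$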
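The mixture representation can be checked by simulation. The sketch below draws $\epsilon$ from the exponential-normal mixture and records the share of draws below zero, which should be close to $\tau$ because the $\tau$th quantile of the skewed Laplace error is 0. The function name `ald_draw`, the seed, and the sample size are choices made for this example only.

```python
import math
import random


def ald_draw(tau, theta, rng):
    """One draw of epsilon = kappa1*u/theta + kappa2*sqrt(u)*W/theta."""
    kappa1 = (1.0 - 2.0 * tau) / (tau * (1.0 - tau))
    kappa2 = math.sqrt(2.0 / (tau * (1.0 - tau)))
    u = rng.expovariate(1.0)   # u ~ Exp(1)
    w = rng.gauss(0.0, 1.0)    # W ~ N(0, 1)
    return kappa1 * u / theta + kappa2 * math.sqrt(u) * w / theta


rng = random.Random(2023)
tau = 0.3
draws = [ald_draw(tau, theta=2.0, rng=rng) for _ in range(20000)]
share_below_zero = sum(e < 0 for e in draws) / len(draws)
```

The scale $\theta^{-1}$ only stretches the draws, so the share below zero stays near $\tau$ for any $\theta>0$.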

2.3. The Bayesian Regularized Quantile Varying Coefficient Model

In the literature, penalized variable selection for quantile varying coefficient models has been examined with different group level penalty functions. For example, Noh et al. (2012) have developed a group SCAD penalty to select important groups of spline coefficients after basis expansion. Tang et al. (2013) have proposed the adaptive group LASSO for quantile varying coefficient models, where the group level shrinkage on spline coefficients has been imposed through the $L_\nu$ norm with $\nu\ge 1$. From the Bayesian perspective, the group LASSO estimator can be viewed as the posterior mode estimate when independent and identical multivariate Laplace priors are assumed for groups of regression coefficients. Such a connection has motivated us to consider the following regularized quantile varying coefficient model with group LASSO penalty:

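$$\min_{\beta,\alpha}\sum_{i=1}^{n}\rho_\tau(Y_i-E_i^\top\beta-Z_i^\top\alpha)+\lambda\sum_{j=1}^{p}\|\alpha_j\|_2, \tag{4}$$

where $\|\alpha_j\|_2=(\alpha_j^\top\alpha_j)^{1/2}$, and $\lambda>0$ is the tuning parameter. We first set the independent and identical multivariate Laplace prior on $\alpha_j$ as $\pi(\alpha_j\mid\lambda,\theta)\propto(\lambda\theta)^{d}\exp\{-\lambda\theta\|\alpha_j\|_2\}$, where $d$ is the group size (i.e., the length of $\alpha_j$). The resulting posterior distribution of $\alpha$ is

$$f(\alpha\mid Y,E,Z,\beta,\lambda,\theta)\propto\exp\Big\{-\theta\sum_{i=1}^{n}\rho_\tau\Big(Y_i-\sum_{k=1}^{q}E_{ik}\beta_k-\sum_{j=0}^{p}\alpha_j^\top Z_{ij}\Big)-\lambda\theta\sum_{j=1}^{p}\|\alpha_j\|_2\Big\}.$$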

With the reparametrization $\eta=\lambda\theta$, the multivariate Laplace prior can be rewritten as a scale mixture of multivariate normal distributions using a Gamma mixing density, that is,

$$\text{M-Laplace}(\alpha_j\mid\eta)\propto(\eta^2)^{d/2}\exp\{-\eta\|\alpha_j\|_2\}\propto\int_0^{+\infty}N_d(\alpha_j\mid 0,g_jI_d)\,\text{Gamma}\Big(g_j\,\Big|\,\frac{d+1}{2},\frac{\eta^2}{2}\Big)\,dg_j, \tag{5}$$

where the multivariate normal (MVN) distribution has zero mean vector and a $d$-by-$d$ diagonal matrix $\text{diag}(g_j,\dots,g_j)$ as the covariance matrix, and the Gamma distribution is defined with the shape parameter $\frac{d+1}{2}$ and the rate parameter $\frac{\eta^2}{2}$. By integrating out $g_j$, the conditional prior on $\alpha_j$ has the multivariate Laplace distribution defined in (5). Therefore, the prior can be expressed as a gamma mixture of normal distributions in a Bayesian hierarchical model:

$$\alpha_j\mid g_j\overset{ind}{\sim}N_d(0,g_jI_d),\qquad g_j\mid\eta^2\overset{ind}{\sim}\text{Gamma}\Big(\frac{d+1}{2},\frac{\eta^2}{2}\Big). \tag{6}$$

A major limitation of the above Laplacian shrinkage based formulation of hierarchical model is that the posterior estimates for regression coefficients α cannot be shrunk to 0 exactly. In general, a 95% credible interval needs to be constructed to determine the sparsity, which suffers from inaccuracy as shown in many published studies. Here, we consider incorporating multivariate spike-and-slab priors to achieve direct identification of sparsity, i.e.,

$$\alpha_j\mid g_j,\psi_j\overset{ind}{\sim}(1-\psi_j)N_d(0,g_jI_d)+\psi_j\delta_0(\alpha_j),\qquad\psi_j\mid\pi_0\overset{ind}{\sim}\text{Bernoulli}(\pi_0),\qquad g_j\mid\eta^2\overset{ind}{\sim}\text{Gamma}\Big(\frac{d+1}{2},\frac{\eta^2}{2}\Big), \tag{7}$$

where the spike is defined as $\delta_0(\alpha_j)$, a point mass at $0_{d\times 1}$, and the slab component is $N_d(0,g_jI_d)$. The parameter $\pi_0\in[0,1]$. For $j=1,\dots,p$, we introduce a latent binary indicator variable $\psi_j$ corresponding to each group to conduct the selection of spline coefficients on the group level. When $\psi_j=1$, the spline coefficient vector $\alpha_j$ has a point mass density at zero, suggesting that $\alpha_j$ is estimated as a zero vector and the varying coefficient corresponding to the $j$th predictor in $X$ is 0, i.e., the $j$th predictor is not associated with the response. Besides, if $\psi_j=0$, the slab part, or the normal distribution, is in action, and the spike-and-slab prior reduces to the hierarchical priors in (6), leading to a Bayesian quantile group LASSO. Therefore, $\|\alpha_j\|_2\neq 0$ and the $j$th group of spline coefficients is selected in the final model. By integrating out $\psi_j$ and $g_j$ in (7), we have the marginal prior on $\alpha_j$ as a mixture of a multivariate Laplace distribution and a point mass at $0_{d\times 1}$:

$$\pi(\alpha_j\mid\eta^2)\sim(1-\pi_0)\,\text{M-Laplace}(0,\eta^2)+\pi_0\delta_0(\alpha_j), \tag{8}$$

which borrows strength from both the Laplacian shrinkage and spike-and-slab priors. The multivariate Laplacian in the slab component plays the role of a diffuse density to model the large effects, and $\delta_0(\cdot)$ is a point mass at zero to achieve variable selection by shrinking negligible groups of spline coefficients to 0. Note that (8) reduces to (5) when $\pi_0=0$. We assign a conjugate beta prior $\pi_0\sim\text{Beta}(e,f)$ with fixed parameters $e$ and $f$, which accounts for the uncertainty in choosing $\pi_0$.
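To make the hierarchy in (7) concrete, the sketch below draws one group $\alpha_j$ from the spike-and-slab prior: an exact zero vector with probability $\pi_0$, and otherwise a normal draw whose variance $g_j$ comes from the Gamma mixing density. The function name `draw_alpha_prior` and the parameter values are hypothetical; note that Python's `random.gammavariate` takes a scale, so the rate $\eta^2/2$ is passed as scale $2/\eta^2$.

```python
import math
import random


def draw_alpha_prior(d, eta2, pi0, rng):
    """One draw of alpha_j from the spike-and-slab prior in (7)."""
    if rng.random() < pi0:
        # psi_j = 1: the spike, an exact zero vector of length d.
        return [0.0] * d
    # psi_j = 0: the slab. g_j ~ Gamma(shape=(d+1)/2, rate=eta2/2),
    # written with scale = 2/eta2 for random.gammavariate.
    g = rng.gammavariate((d + 1) / 2.0, 2.0 / eta2)
    return [rng.gauss(0.0, math.sqrt(g)) for _ in range(d)]


rng = random.Random(42)
# With d = 1 and pi0 = 0 the draws are univariate Laplace with rate eta,
# so the average of |alpha| should be close to 1/eta = 0.5 when eta2 = 4.
laplace_draws = [draw_alpha_prior(1, 4.0, 0.0, rng)[0] for _ in range(40000)]
mean_abs = sum(abs(a) for a in laplace_draws) / len(laplace_draws)
```

The univariate check is a special case of the scale mixture in (5); for $d>1$ the same two-step draw gives the multivariate Laplace slab.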

Besides, for computational convenience, we assign conjugate Gamma priors to $\eta^2$ and $\theta$ as follows:

$$\eta^2\sim\text{Gamma}(c,m),$$
$$\theta\sim\text{Gamma}(a,b),$$

where $a$, $b$, $c$ and $m$ are constants. The multivariate normal prior has been placed on the $q$-dimensional coefficient vector $\beta=(\beta_1,\dots,\beta_q)^\top$ as:

$$\beta\sim N_q(0,\Sigma_\beta),$$

where $\Sigma_\beta$ denotes the covariance matrix. Similarly, for the coefficients $\alpha_0$ corresponding to the varying intercept, we assign the following prior:

$$\alpha_0\sim N_d(0,\Sigma_{\alpha_0}).$$

3. The Gibbs Sampler

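The joint posterior distribution of the unknown parameters conditional on the data is given, up to a normalizing constant, by

$$\begin{aligned}
p(\alpha,\beta,\tilde u_i,g_j,\pi_0,\theta,\eta^2\mid Y)\propto{}&\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\theta^{-1}\kappa_2^2\tilde u_i}}\exp\Big\{-\frac{(Y_i-E_i^\top\beta-\sum_{j=0}^{p}\alpha_j^\top Z_{ij}-\kappa_1\tilde u_i)^2}{2\theta^{-1}\kappa_2^2\tilde u_i}\Big\}\\
&\times\prod_{j=1}^{p}\Big((1-\pi_0)(2\pi g_j)^{-\frac{d}{2}}\exp\Big(-\frac{1}{2g_j}\alpha_j^\top\alpha_j\Big)I(\alpha_j\neq 0)+\pi_0\delta_0(\alpha_j)\Big)\\
&\times\pi_0^{e-1}(1-\pi_0)^{f-1}\\
&\times\prod_{j=1}^{p}\Big(\frac{\eta^2}{2}\Big)^{\frac{d+1}{2}}g_j^{\frac{d-1}{2}}\exp\Big(-\frac{\eta^2}{2}g_j\Big)\\
&\times\prod_{i=1}^{n}\theta\exp(-\theta\tilde u_i)\\
&\times\theta^{a-1}\exp(-b\theta)\\
&\times(\eta^2)^{c-1}\exp(-m\eta^2)\\
&\times(2\pi)^{-\frac{q}{2}}|\Sigma_\beta|^{-\frac{1}{2}}\exp\Big(-\frac{1}{2}\beta^\top\Sigma_\beta^{-1}\beta\Big)\\
&\times(2\pi)^{-\frac{d}{2}}|\Sigma_{\alpha_0}|^{-\frac{1}{2}}\exp\Big(-\frac{1}{2}\alpha_0^\top\Sigma_{\alpha_0}^{-1}\alpha_0\Big).
\end{aligned}$$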

The full conditional distributions can be derived as follows. We provide all the details in the Appendix.

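  • The full conditional distribution of $\tilde u_i$ is:
    $$\tilde u_i^{-1}\mid\text{rest}\sim\text{Inverse-Gaussian}\left(\sqrt{\frac{\kappa_1^2+2\kappa_2^2}{(Y_i-E_i^\top\beta-Z_i^\top\alpha)^2}},\ \frac{\theta\kappa_1^2}{\kappa_2^2}+2\theta\right),$$
    where “rest” denotes the data and all the other model parameters sampled in the MCMC.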
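  • Let $l_j=p(\alpha_j=0\mid\text{rest})$, then the conditional posterior distribution of $\alpha_j$ ($j=1,\dots,p$) is a multivariate spike-and-slab distribution given as:
    $$\alpha_j\mid\text{rest}\sim(1-l_j)N_d(\mu_j,\Sigma_j)+l_j\delta_0(\alpha_j),$$
    where
    $$\Sigma_j=\Big(\frac{\theta}{\kappa_2^2}\sum_{i=1}^{n}\frac{1}{\tilde u_i}Z_{ij}Z_{ij}^\top+g_j^{-1}I_d\Big)^{-1},$$
    $$\mu_j=\Sigma_j\frac{\theta}{\kappa_2^2}\sum_{i=1}^{n}\frac{Z_{ij}}{\tilde u_i}\big(Y_i-Z_{i,-j}^\top\alpha_{-j}-E_i^\top\beta-\kappa_1\tilde u_i\big),$$
    and
    $$l_j=\frac{\pi_0}{\pi_0+(1-\pi_0)\,|g_jI_d|^{-\frac{1}{2}}\,|\Sigma_j|^{\frac{1}{2}}\exp\big(\frac{1}{2}\mu_j^\top\Sigma_j^{-1}\mu_j\big)}.$$
    Therefore, the posterior distribution of $\alpha_j$ is a mixture of a multivariate normal distribution and a point mass at 0. At each iteration of MCMC, $\alpha_j$ is drawn from $N_d(\mu_j,\Sigma_j)$ with probability $(1-l_j)$ and is set to 0 with probability $l_j$.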
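  • The full conditional distribution of $\theta$ is
    $$\theta\mid\text{rest}\sim\text{Gamma}\Big(\frac{3}{2}n+a,\ \frac{1}{2}\sum_{i=1}^{n}\frac{(Y_i-E_i^\top\beta-\sum_{j=0}^{p}\alpha_j^\top Z_{ij}-\kappa_1\tilde u_i)^2}{\kappa_2^2\tilde u_i}+\sum_{i=1}^{n}\tilde u_i+b\Big).$$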
  • The full conditional distribution of $\eta^2$ is
    $$\eta^2\mid\text{rest}\sim\text{Gamma}\Big(\frac{(d+1)(p+1)}{2}+c,\ \frac{1}{2}\sum_{j=1}^{p}g_j+m\Big).$$
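  • The full conditional distribution of $g_j$, $j=1,\dots,p$, is
    $$g_j^{-1}\mid\text{rest}\sim\begin{cases}\text{Inverse-Gamma}\Big(\frac{d+1}{2},\frac{\eta^2}{2}\Big) & \text{if }\alpha_j=0\\[4pt] \text{Inverse-Gaussian}\Big(\sqrt{\frac{\eta^2}{\alpha_j^\top\alpha_j}},\eta^2\Big) & \text{if }\alpha_j\neq 0.\end{cases}$$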
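  • The full conditional distribution of $\pi_0$ is
    $$\pi_0\mid\text{rest}\sim\text{Beta}\Big(1+p-\sum_{j=1}^{p}Q_j+e,\ \sum_{j=1}^{p}Q_j+f\Big),$$
    where
    $$Q_j=\begin{cases}0 & \text{if }\alpha_j=0\\ 1 & \text{if }\alpha_j\neq 0.\end{cases}$$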
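  • The full conditional distribution of $\beta$ is multivariate normal:
    $$\beta\mid\text{rest}\sim N_q(\mu_\beta^{*},\Sigma_\beta^{*}),$$
    with covariance
    $$\Sigma_\beta^{*}=\Big(\sum_{i=1}^{n}\frac{\theta E_iE_i^\top}{\kappa_2^2\tilde u_i}+\Sigma_\beta^{-1}\Big)^{-1},$$
    and mean
    $$\mu_\beta^{*}=\Sigma_\beta^{*}\Big(\sum_{i=1}^{n}\frac{\theta}{\kappa_2^2\tilde u_i}\Big(Y_i-\sum_{j=0}^{p}\alpha_j^\top Z_{ij}-\kappa_1\tilde u_i\Big)E_i\Big),$$
    where the superscript $*$ marks the posterior quantities and $\Sigma_\beta$ is the prior covariance matrix of $\beta$.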
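  • Similarly, the full conditional distribution of $\alpha_0$ can be obtained as
    $$\alpha_0\mid\text{rest}\sim N_d(\mu_0,\Sigma_0),$$
    where
    $$\Sigma_0=\Big(\sum_{i=1}^{n}\frac{\theta Z_{i0}Z_{i0}^\top}{\kappa_2^2\tilde u_i}+\Sigma_{\alpha_0}^{-1}\Big)^{-1}$$
    and
    $$\mu_0=\Sigma_0\Big(\sum_{i=1}^{n}\frac{\theta}{\kappa_2^2\tilde u_i}\Big(Y_i-E_i^\top\beta-\sum_{j=1}^{p}\alpha_j^\top Z_{ij}-\kappa_1\tilde u_i\Big)Z_{i0}\Big).$$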
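One step of the sampler above, the update of the latent $\tilde u_i$, can be sketched with the standard library alone. The inverse-Gaussian draw below uses the transformation method of Michael, Schucany and Haas (1976), which is a substitute chosen for this example and not necessarily what pqrBayes does; the names `rinvgauss` and `update_u_tilde` and the small guard `eps` against zero residuals are likewise assumptions.

```python
import math
import random


def rinvgauss(mean, shape, rng):
    """One Inverse-Gaussian(mean, shape) draw via Michael, Schucany and Haas (1976)."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = (mean + mean * mean * y / (2.0 * shape)
         - mean / (2.0 * shape) * math.sqrt(4.0 * mean * shape * y + (mean * y) ** 2))
    if rng.random() <= mean / (mean + x):
        return x
    return mean * mean / x


def update_u_tilde(residuals, theta, tau, rng, eps=1e-8):
    """Draw each u_tilde_i given r_i = Y_i - E_i'beta - Z_i'alpha.

    The full conditional is for 1/u_tilde_i, so the draw is inverted at the end.
    eps is an assumed guard that keeps the mean finite when r_i is numerically zero.
    """
    kappa1 = (1.0 - 2.0 * tau) / (tau * (1.0 - tau))
    kappa2_sq = 2.0 / (tau * (1.0 - tau))
    shape = theta * kappa1 ** 2 / kappa2_sq + 2.0 * theta
    draws = []
    for r in residuals:
        mean = math.sqrt((kappa1 ** 2 + 2.0 * kappa2_sq) / max(r * r, eps))
        draws.append(1.0 / rinvgauss(mean, shape, rng))
    return draws


rng = random.Random(1)
# Hypothetical residuals, used only to exercise the update.
u_new = update_u_tilde([0.5, -1.2, 2.0], theta=1.0, tau=0.3, rng=rng)
```

The same `rinvgauss` helper would serve the $g_j^{-1}$ update when $\alpha_j\neq 0$, with mean $\sqrt{\eta^2/\alpha_j^\top\alpha_j}$ and shape $\eta^2$.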

4. Simulation

We conduct a comprehensive evaluation to compare the performance of the proposed method, the Bayesian regularized quantile varying coefficient model with spike-and-slab priors (BQRVCSS), with three alternative Bayesian methods: BQRVC, BVCSS and BVC. BQRVC differs from BQRVCSS only in that the spike-and-slab prior is not incorporated. BVCSS and BVC are the non-robust counterparts of BQRVCSS and BQRVC, respectively. Details of the hierarchical model formulation and derivation of the corresponding Gibbs samplers are provided in Appendix C. Besides, two frequentist methods, the regularized varying coefficient model with adaptive group LASSO under the quantile check loss (QRVC-adp) and under the least squares loss (VC-adp) from Tang et al. (2013), are also included.

The response variable is generated according to model (1) with sample size $n=200$ and the dimensionality of $X$ being 100. Without loss of generality, the low dimensional clinical covariates, denoted as $E$ in model (1), are omitted, which facilitates a fair comparison as such a component is not included in QRVC-adp and VC-adp (Tang et al. (2013)). The total dimension of the regression coefficients after basis expansion is larger than the sample size. For instance, if the number of basis functions is set to 5, the actual dimension is 505, including the varying intercept. The varying coefficients are set as $\gamma_0(v)=2+2\sin(2\pi v)$, $\gamma_1(v)=2\exp(2v-1)$, $\gamma_2(v)=6v(1-v)$, $\gamma_3(v)=4v^3$. The rest of the coefficients are 0. We simulate two types of predictors $X$ separately. First, the predictors are simulated from a multivariate normal distribution with mean 0 and an AR-1 covariance matrix with correlation coefficient 0.5, which represents continuous gene expression data. Second, we generate the predictors as categorical single nucleotide polymorphism (SNP) data by discretizing the aforementioned gene expression values of each predictor at the 1st and 3rd quartiles, leading to the 3-level categories (0,1,2) for genotypes (aa, Aa, AA).
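The two predictor designs can be sketched as follows. This is an illustrative standard-library sketch rather than the simulation code used for the paper: unit marginal variance for the AR-1 design and the order-statistic definition of the quartiles are assumptions, as are the function names and the seed.

```python
import math
import random


def ar1_row(p, rho, rng):
    """One observation of p predictors with corr(X_j, X_k) = rho ** |j - k|.

    Unit marginal variance is assumed here; the text only fixes mean 0 and rho.
    """
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(p - 1):
        x.append(rho * x[-1] + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0))
    return x


def to_genotypes(column):
    """Code one predictor as 0/1/2 by cutting at its 1st and 3rd sample quartiles."""
    ordered = sorted(column)
    n = len(ordered)
    q1, q3 = ordered[n // 4], ordered[(3 * n) // 4]  # simple order-statistic quartiles
    return [0 if v < q1 else (2 if v >= q3 else 1) for v in column]


rng = random.Random(7)
n, p = 200, 100
X = [ar1_row(p, 0.5, rng) for _ in range(n)]                 # gene expression design
snp_columns = [to_genotypes([row[j] for row in X]) for j in range(p)]  # SNP design
```

With $n=200$ each SNP column then has 50, 100 and 50 observations in categories 0, 1 and 2.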

We consider five error distributions for the $\epsilon_i$ in model (1): $N(\mu,1)$ (Error 1), $80\%N(\mu,1)+20\%N(\mu,3)$ (Error 2), Laplace$(\mu,b)$ with the scale parameter $b=1$ (Error 3), LogNormal$(\mu,1)$ (Error 4), and $t(2)$ with mean $\mu$ (Error 5). Errors 2–5 are heavy-tailed distributions. For each error, $\mu$ is chosen so that the $\tau$th quantile is 0. We also consider the case of non-i.i.d. random errors by using the following data generating model:

$$Y_i=\sum_{k=1}^{q}E_{ik}\beta_k+\sum_{j=0}^{p}\gamma_j(V_i)X_{ij}+(1+X_{i2})\epsilon_i,$$

where $V_i\sim\text{Uniform}(0,1)$, the i.i.d. errors $\epsilon_i$ in model (1) are replaced by $(1+X_{i2})\epsilon_i$, and the regression coefficients are the same as in the model under i.i.d. random errors.

The proportions of correct fitting (C), over-fitting (O), and under-fitting (U) are used to evaluate identification performance. In addition, the integrated mean squared error (IMSE) is adopted to assess estimation accuracy of the varying coefficients. Let $\hat\gamma_j(v)$ denote the posterior median estimate for $\gamma_j(v)$, and let $(v_1,\dots,v_{200})$ be a grid of points equally spaced on $[0,1]$, so that $\hat\gamma_j(v)$ can be evaluated on the grid points $\{v_t\}_{t=1}^{200}$. Then the IMSE of $\hat\gamma_j(v)$ is given as $\text{IMSE}(\hat\gamma_j(v))=\frac{1}{200}\sum_{t=1}^{200}(\hat\gamma_j(v_t)-\gamma_j(v_t))^2$, where $\gamma_j(v_t)$ reduces to 0 if $j>3$. The total integrated mean squared error (TIMSE), that is, the sum of the IMSEs over all the estimated varying coefficients, measures the overall estimation accuracy.
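The two accuracy measures translate directly into code. In the minimal sketch below, estimated and true coefficient functions are passed as Python callables; whether the 200-point grid includes both endpoints of $[0,1]$ is not stated in the text, so the grid used here is an assumption, as are the function names.

```python
import math


def imse(gamma_hat, gamma_true, n_grid=200):
    """Average squared error of one coefficient function over an equally spaced grid."""
    grid = [t / (n_grid - 1) for t in range(n_grid)]  # assumed to include 0 and 1
    return sum((gamma_hat(v) - gamma_true(v)) ** 2 for v in grid) / n_grid


def timse(estimates, truths, n_grid=200):
    """Sum of the IMSEs over all varying coefficients."""
    return sum(imse(g_hat, g, n_grid) for g_hat, g in zip(estimates, truths))


def gamma0(v):
    """True varying intercept from the simulation design."""
    return 2.0 + 2.0 * math.sin(2.0 * math.pi * v)


def zero(v):
    """True coefficient of an unimportant predictor (j > 3)."""
    return 0.0


# Hypothetical estimates: the intercept is off by a constant 0.5, and the
# noise predictor is correctly estimated as identically zero.
total = timse([lambda v: gamma0(v) + 0.5, zero], [gamma0, zero])
```

A constant offset of 0.5 contributes an IMSE of 0.25, and a predictor correctly shrunk to zero contributes nothing, which is how exact sparsity helps the TIMSE.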

We have drawn the posterior samples from the Gibbs sampler. For Bayesian methods that are based on the spike-and-slab priors, the median probability model (MPM) is adopted to identify important predictors. Define the indicator $\phi_j$ for the $j$th predictor. At the $m$th iteration, $\phi_j^{(m)}=1$ if the $j$th predictor is included in the regression model, i.e., the $j$th varying coefficient is nonzero. Then, based on $M$ posterior samples drawn from the MCMC after excluding burn-ins, the posterior probability of including the $j$th predictor in the final model can be calculated as

$$p_j=\hat\pi(\phi_j=1\mid y)=\frac{1}{M}\sum_{m=1}^{M}\phi_j^{(m)},\quad j=1,\dots,p.$$

A larger posterior inclusion probability suggests stronger evidence for the importance of the corresponding varying coefficient. The MPM consists of predictors with posterior inclusion probability no less than 0.5. It has been recommended due to its optimal prediction performance when selecting a single model is of interest (Barbieri and Berger (2004)). For methods without spike-and-slab priors, we use the 95% credible interval (95% CI) to conduct identification. In simulation, the Gibbs sampler runs 10,000 MCMC iterations, of which the first 5,000 are burn-ins.
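The median probability model amounts to averaging the inclusion indicators over the kept iterations and thresholding at 0.5. A minimal sketch follows; the indicator draws are toy values and the function names are hypothetical.

```python
def inclusion_probabilities(phi_draws):
    """phi_draws[m][j] is 1 if predictor j is in the model at kept iteration m."""
    n_draws = len(phi_draws)
    n_pred = len(phi_draws[0])
    return [sum(draw[j] for draw in phi_draws) / n_draws for j in range(n_pred)]


def median_probability_model(phi_draws, cutoff=0.5):
    """Indices (zero-based) of predictors with inclusion probability >= cutoff."""
    probs = inclusion_probabilities(phi_draws)
    return [j for j, pj in enumerate(probs) if pj >= cutoff]


# Toy indicator draws (made up): 4 kept iterations, 3 predictors.
phi = [[1, 0, 1],
       [1, 0, 0],
       [1, 1, 0],
       [1, 0, 1]]
probs = inclusion_probabilities(phi)
selected = median_probability_model(phi)
```

In a sampler, $\phi_j^{(m)}$ would simply record whether the drawn $\alpha_j$ at iteration $m$ is a nonzero vector.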

For the 4 data generating scenarios, i.e., (1) gene expression with i.i.d. errors, (2) gene expression with non-i.i.d. errors, (3) SNPs with i.i.d. errors and (4) SNPs with non-i.i.d. errors, all 6 methods have been compared across 5 error distributions and 3 quantile levels (0.3, 0.5 and 0.7). The identification results for the first scenario are shown in Figure 2. We can observe that under the standard normal error, BQRVCSS and BVCSS, the two Bayesian methods with the spike-and-slab priors, as well as the two frequentist methods (QRVC-adp and VC-adp), have comparable performance in correctly identifying the true model. When the random errors are heavy-tailed, Figure 2 clearly shows the advantage of BQRVCSS over the non-robust alternatives. On the other hand, BQRVCSS is apparently superior to BQRVC and BVC, yielding a much larger percentage of correctly fitted models. In fact, the two Bayesian approaches without spike-and-slab priors consistently lead to the two lowest proportions of identifying the true model. A comparison between BQRVCSS and QRVC-adp indicates that the two are comparable in general, and the proposed one appears slightly better. Among the 12 sub-panels in Figure 2, robust methods tend to perform the worst at quantile level 0.7 under the lognormal error (Error 4), since the lognormal distribution is right skewed. Such a phenomenon has not been observed under the other 4 symmetric errors.

Figure 2:

Identification results for simulated gene expression data with i.i.d. errors based on 100 replicates. C: correct-fitting proportion; O: overfitting proportion; U: underfitting proportion.

Figure 3 shows the identification results under the 2nd setting, where the response variable is generated based on gene expression data with non-i.i.d. errors. The advantage of BQRVCSS is again evident. Furthermore, the estimation results in terms of total integrated mean square error (TIMSE) for scenarios 1 and 2 are provided in Table 1 and Table 2, respectively. Under the heavy-tailed errors, BQRVCSS leads to the smallest estimation error in most cases. For example, in Table 1, at quantile 0.5 with the t(2) error distribution, BQRVCSS has a TIMSE of 0.33 (sd 0.23), less than that of BQRVC (4.35 (sd 0.78)) and QRVC-adp (0.76 (sd 0.99)), as well as the non-robust alternatives. The advantage of the proposed method over the rest is due to its robustness and incorporation of the spike-and-slab prior. We also observe similar patterns in the 3rd and 4th settings from Figure 5, Figure 6, Table 3 and Table 4 in the Appendix.

Figure 3:

Identification results for simulated gene expression data with heterogeneous errors based on 100 replicates. C: correct-fitting proportion; O: overfitting proportion; U: underfitting proportion.

Table 1:

Estimation results in terms of total integrated mean square error (TIMSE) for simulated gene expression data with i.i.d. errors based on 100 replicates.

| τ | Error | BQRVCSS | BQRVC | BVCSS | BVC | QRVC-adp | VC-adp |
|---|---|---|---|---|---|---|---|
| 0.3 | Normal | 0.23(0.10) | 2.28(0.35) | 0.45(0.09) | 1.56(0.16) | 0.25(0.10) | 0.70(0.09) |
| 0.3 | NormalMix | 0.34(0.19) | 3.90(0.62) | 0.76(0.27) | 3.04(0.43) | 0.45(0.23) | 0.92(0.16) |
| 0.3 | Laplace | 0.27(0.13) | 2.97(0.45) | 0.47(0.15) | 2.12(0.31) | 0.26(0.11) | 0.71(0.11) |
| 0.3 | Lognormal | 0.11(0.05) | 3.38(0.55) | 1.14(0.85) | 5.84(1.92) | 0.18(0.41) | 1.22(2.45) |
| 0.3 | t(2) | 0.44(0.24) | 5.01(1.16) | 2.63(5.24) | 8.35(9.76) | 0.84(0.98) | 2.58(3.22) |
| 0.5 | Normal | 0.21(0.06) | 2.42(0.36) | 0.40(0.06) | 1.57(0.16) | 0.21(0.07) | 0.62(0.11) |
| 0.5 | NormalMix | 0.31(0.17) | 3.75(0.60) | 0.74(0.24) | 2.71(0.49) | 0.35(0.16) | 0.92(0.11) |
| 0.5 | Laplace | 0.22(0.06) | 3.07(0.48) | 0.46(0.08) | 1.83(0.28) | 0.22(0.09) | 0.70(0.08) |
| 0.5 | Lognormal | 0.25(0.19) | 4.59(0.94) | 1.18(1.69) | 5.09(2.28) | 0.40(0.56) | 1.26(0.68) |
| 0.5 | t(2) | 0.33(0.23) | 4.35(0.78) | 2.04(1.48) | 6.82(6.51) | 0.76(0.99) | 2.05(4.32) |
| 0.7 | Normal | 0.21(0.08) | 2.53(0.41) | 0.41(0.08) | 1.58(0.18) | 0.23(0.10) | 0.71(0.10) |
| 0.7 | NormalMix | 0.33(0.14) | 3.84(0.58) | 0.78(0.30) | 3.03(0.53) | 0.45(0.26) | 0.92(0.18) |
| 0.7 | Laplace | 0.29(0.11) | 3.22(0.49) | 0.49(0.16) | 2.18(0.34) | 0.30(0.17) | 0.73(0.12) |
| 0.7 | Lognormal | 0.71(0.45) | 5.44(1.52) | 0.99(0.90) | 4.19(2.07) | 0.96(0.95) | 1.35(3.65) |
| 0.7 | t(2) | 0.42(0.35) | 5.07(1.21) | 2.65(3.35) | 9.10(11.24) | 0.97(1.42) | 2.02(1.75) |

Table 2:

Estimation results in terms of total integrated mean square error (TIMSE) for simulated gene expression data with heterogeneous errors based on 100 replicates.

| τ | Error | BQRVCSS | BQRVC | BVCSS | BVC | QRVC-adp | VC-adp |
|---|---|---|---|---|---|---|---|
| 0.3 | Normal | 0.35(0.15) | 3.44(0.54) | 0.94(0.30) | 2.82(0.37) | 0.37(0.20) | 0.95(0.17) |
| 0.3 | NormalMix | 0.50(0.24) | 5.05(0.99) | 1.04(1.20) | 5.79(1.70) | 0.45(0.23) | 1.62(0.61) |
| 0.3 | Laplace | 0.35(0.15) | 4.04(0.79) | 1.03(0.67) | 3.57(0.90) | 0.41(0.21) | 0.94(0.27) |
| 0.3 | Lognormal | 0.20(0.09) | 4.18(0.93) | 2.55(2.57) | 9.84(4.87) | 0.37(0.54) | 3.59(2.03) |
| 0.3 | t(2) | 0.64(0.39) | 5.87(1.29) | 2.99(2.83) | 10.94(6.72) | 1.37(1.59) | 3.27(1.27) |
| 0.5 | Normal | 0.27(0.21) | 3.38(0.53) | 0.93(0.17) | 2.21(0.36) | 0.28(0.16) | 0.96(0.16) |
| 0.5 | NormalMix | 0.29(0.12) | 4.61(0.82) | 1.12(0.94) | 5.20(1.48) | 0.35(0.16) | 1.62(0.61) |
| 0.5 | Laplace | 0.21(0.10) | 3.84(0.67) | 0.98(0.41) | 3.18(0.72) | 0.21(0.12) | 1.06(0.33) |
| 0.5 | Lognormal | 0.29(0.16) | 4.36(0.95) | 2.09(2.13) | 8.26(3.61) | 0.40(0.48) | 2.45(2.17) |
| 0.5 | t(2) | 0.38(0.22) | 5.31(1.12) | 3.33(3.15) | 11.94(15.06) | 1.16(2.20) | 3.92(5.56) |
| 0.7 | Normal | 0.33(0.11) | 3.65(0.59) | 0.85(0.25) | 2.71(0.47) | 0.38(0.16) | 1.06(0.27) |
| 0.7 | NormalMix | 0.51(0.22) | 5.32(0.89) | 1.22(1.04) | 5.91(1.57) | 0.78(0.56) | 1.65(0.61) |
| 0.7 | Laplace | 0.42(0.22) | 4.25(0.73) | 0.93(0.42) | 3.37(0.72) | 0.42(0.24) | 1.10(0.39) |
| 0.7 | Lognormal | 0.80(0.58) | 6.85(1.71) | 2.47(8.41) | 7.98(6.94) | 2.72(6.07) | 2.54(3.39) |
| 0.7 | t(2) | 0.62(0.29) | 6.44(1.31) | 5.37(4.67) | 13.41(12.08) | 1.27(1.13) | 3.32(3.06) |

Figure 7 shows the varying coefficients estimated by the proposed method (BQRVCSS) for the gene expression data with i.i.d. errors at the 50% quantile level in the first setting. The figure is generated as follows. At each replicate, a new dataset is simulated from the aforementioned data generating model, and the posterior median estimates and 95% credible intervals are obtained by fitting the proposed method to that dataset. The median estimates, as well as the lower and upper bounds of the credible intervals, are then averaged separately across the 100 replicates to yield the estimated varying coefficients and the corresponding 95% credible intervals shown in Figure 7. In addition, we have evaluated the empirical 95% coverage probabilities of the four Bayesian methods using their pointwise 95% credible intervals over the 200 grid points. Table 5 in the Appendix shows the 95% coverage probabilities for the four varying coefficient functions under simulated gene expression data with i.i.d. t(2) errors. Overall, the proposed BQRVCSS outperforms all the alternatives. Specifically, BQRVC and BVC, the two methods that do not incorporate the spike-and-slab priors, can barely cover γ1(v) and γ3(v). The nonrobust counterpart BVCSS is inferior, particularly at quantile levels 0.3 and 0.7. The results also suggest that the performance may depend on the form of the varying coefficients under estimation. The quadratic function γ2(v) generally has better coverage probabilities, and none of the methods completely misses it, in contrast to the non-polynomial functions (γ0(v) and γ1(v)) and the higher-order polynomial function (γ3(v)). Yang et al. (2016) have proposed a posterior variance adjustment procedure to improve the validity of credible intervals from Bayesian quantile regression with the asymmetric Laplace likelihood. Their method has been developed in a low-dimensional parametric regression setting. How to adjust the posterior variance to improve coverage probabilities in the high-dimensional nonparametric setting, when more complicated sparsity priors (i.e. the spike-and-slab prior) are involved, is worth further exploration beyond our study.
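The coverage calculation described above can be sketched as follows. This is only a minimal illustration, assuming that the coverage for one coefficient function is the fraction of grid points at which the pointwise interval contains the true value, averaged across replicates; the exact definition behind Table 5 may differ. The function names and the toy inputs are hypothetical.

```python
def pointwise_coverage(truth, lower, upper):
    """Fraction of grid points whose interval [lower, upper] contains the truth."""
    hits = sum(1 for t, lo, hi in zip(truth, lower, upper) if lo <= t <= hi)
    return hits / len(truth)


def empirical_coverage(truth, intervals):
    """Average pointwise coverage over replicates.

    intervals is a list of (lower, upper) pairs, one pair per replicate,
    each evaluated on the same grid as truth.
    """
    rates = [pointwise_coverage(truth, lo, hi) for lo, hi in intervals]
    return sum(rates) / len(rates)


# Toy example (assumed values): a linear curve on a 5-point grid.
grid = [k / 4 for k in range(5)]
truth = [2 * v for v in grid]
rep1 = ([t - 0.1 for t in truth], [t + 0.1 for t in truth])   # covers every point
rep2 = ([t + 0.05 for t in truth], [t + 0.2 for t in truth])  # misses every point
coverage = empirical_coverage(truth, [rep1, rep2])
print(coverage)
```

In the toy example one replicate covers the whole curve and the other covers none of it, so the averaged coverage is one half.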

So far, the asymptotic distribution of the spline-based regularized quantile varying coefficient models has not been developed (Noh et al. (2012); Tang et al. (2013, 2012)). Without the asymptotic variance, it is not feasible to construct the corresponding pointwise asymptotic confidence intervals for the varying coefficients. Therefore, the counterpart of Figure 7 for frequentist spline-based quantile VC models is not available. In the literature, Dai and Kolar (2021) have developed a kernel-based inference procedure for estimators that approximate the quantile VC model in the high-dimensional setting. They did not show any plots of pointwise confidence intervals for nonparametric functions, and without the relevant specifics it is not immediately evident to us whether or how their methods can be used to generate confidence intervals for varying coefficient functions. Therefore, we have not pursued a direct comparison with the frequentist coverage of confidence intervals using their methods. For the frequentist methods VC-adp and QRVC-adp, we have selected the tuning parameters through the Schwarz-type Information Criterion (SIC), which has been widely adopted in the published literature for choosing tuning parameters in regularized (quantile) varying coefficient models (Noh et al. (2012); Tang et al. (2013, 2012); Wang and Xia (2009)). Please refer to the Appendix for more details.

The convergence of the MCMC chains is examined using the potential scale reduction factor (PSRF) (Gelman and Rubin (1992); Brooks and Gelman (1998)). Convergence is achieved if the PSRF values are close to 1. Following Gelman et al. (2013), we use 1.1 as the cutoff (i.e. PSRF ≤ 1.1) to determine convergence. The PSRF has been computed for each parameter, indicating convergence of all chains after burn-ins. Figure 8 shows the PSRF of the estimated spline coefficients of each varying coefficient function in Figure 7. The convergence is satisfactorily achieved.
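For readers who wish to reproduce this check, the PSRF for a single scalar parameter can be sketched as below. This is a minimal illustration of the basic Gelman and Rubin (1992) statistic, assuming equal-length chains after burn-in and omitting the degrees-of-freedom correction of Brooks and Gelman (1998); it is not necessarily the exact computation behind Figure 8. The function name and toy chains are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def psrf(chains):
    """Basic potential scale reduction factor for one scalar parameter.

    chains: list of equal-length lists of post-burn-in draws, one per chain.
    Assumes the within-chain variance is not zero.
    """
    n = len(chains[0])                                  # draws per chain
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])        # W: mean within-chain variance
    between = n * variance(chain_means)                 # B: between-chain variance
    pooled = (n - 1) / n * within + between / n         # pooled variance estimate
    return sqrt(pooled / within)


# Toy chains (assumed values): two chains stuck in different regions.
separated = psrf([[0, 1, 2, 3], [10, 11, 12, 13]])
converged = separated <= 1.1                            # cutoff used in the text
print(separated, converged)
```

Chains that explore different regions produce a large between-chain variance and hence a PSRF well above the 1.1 cutoff.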

We demonstrate the sensitivity of the proposed method BQRVCSS for variable selection to the choice of the hyperparameters for π0 and η² in the Appendix and tabulate the results in Tables 6 to 9. These results suggest that the MPM model is insensitive to different choices of the hyperparameters. We also conduct a sensitivity analysis on whether the smoothness specification of the B spline impacts variable selection. The results are shown in Tables 12 to 15 in the Appendix. The proposed method is insensitive to the number of spline basis functions d, which equals 1+O+Nn, where O is the degree of the B spline basis and Nn is the number of interior knots. We provide a heuristic justification as follows. In the nonparametric literature, n^{1/(2O+3)} has been established as the optimal order of the number of interior spline knots under certain regularity conditions (Xue and Yang (2006)). Other orders, such as n^{1/(2O+1)}, have also been commonly assumed (Wang and Yang (2009)). Therefore, if the number of interior knots is chosen within the range {max([0.5n^{1/(2O+3)}],1), [1.5n^{1/(2O+3)}]}, where [a] denotes the integer part of a, the optimal order can be achieved. In practice, to avoid overfitting, cubic splines and splines of a lower degree have been extensively used. With quadratic and cubic splines, where O equals 2 and 3 respectively, the aforementioned range results in 1 to 3 interior knots under the sample size of 200 adopted in simulation. Therefore, the proposed method is insensitive to the above specifications of O and Nn, i.e. to the number of spline basis functions. Nevertheless, a rigorous justification of the optimal order of the number of interior knots in high-dimensional quantile varying coefficient models remains an open question. Based on this finding, we set the degree O=2 and the number of interior knots Nn=2 for the B spline basis, which leads to d=5 basis functions.
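The knot range quoted above can be checked numerically. The snippet below is a small sketch that evaluates the stated range for the sample size n=200 and also the relation d=1+O+Nn; the function names are hypothetical and introduced only for illustration.

```python
import math


def interior_knot_range(n, degree):
    """Range {max([0.5 n^(1/(2O+3))], 1), [1.5 n^(1/(2O+3))]} for spline degree O."""
    rate = n ** (1.0 / (2 * degree + 3))
    low = max(math.floor(0.5 * rate), 1)   # [a] is the integer part of a
    high = math.floor(1.5 * rate)
    return low, high


def num_basis(degree, n_knots):
    """Number of B spline basis functions d = 1 + O + Nn."""
    return 1 + degree + n_knots


quadratic = interior_knot_range(200, 2)    # (1, 3)
cubic = interior_knot_range(200, 3)        # (1, 2)
d = num_basis(2, 2)                        # 5
print(quadratic, cubic, d)
```

Taken together, the quadratic and cubic cases give 1 to 3 interior knots, and the adopted choice O=2, Nn=2 gives d=5.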

The varying coefficient functions in the simulation study have been widely adopted in the published nonparametric literature (Noh et al. (2012); Tang et al. (2013, 2012); Wang and Yang (2009); Xue and Yang (2006)). Functions with more complex structures may not lead to the same satisfactory performance as shown here. For example, a sine function with more oscillations in [0,1] is not a polynomial function in nature, and thus cannot be well approximated by the spline-based methods with the established optimal order of the number of interior knots. We run additional simulations under setting 1, where gene expression data are generated with i.i.d. errors, changing only γ0(v) to a more complicated sine function γ0(v)=2+2sin(6πv). Table 11 in the Appendix shows that the estimation accuracy decreases substantially for all the methods, compared to the estimation results in Table 1. In the Appendix, we have also provided the estimation plots of the more complicated varying coefficient functions using BQRVCSS and its frequentist counterpart QRVC-adp. Figure 9 and Figure 10 show that γ0(v) cannot be well modeled by either method, which has also been observed with all the other methods under comparison (BQRVC, BVCSS, BVC and VC-adp).

In the simulation, the figures of estimated curves are obtained from averages over multiple replicates. To further explore the estimation performance when the proposed method is applied to single datasets, we have also examined cases beyond this “average case” scenario. Specifically, at each simulation run, we compute the IMSE of the posterior median estimates of the curves. The curves at the 25th, 50th and 75th percentiles of the IMSEs across all the replicates are then overlaid with the true curves in Figure 11 in the Appendix. All of them are close to the true curves, although the curves at the 75th percentile of the IMSEs are slightly worse than those at the other two percentiles.
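The selection of representative replicates can be sketched as follows. This is a minimal illustration, assuming that the IMSE is approximated by the average squared error over the grid points and that a nearest-rank rule is used to pick the replicate at a given percentile; neither detail is specified in this section, and the function names are hypothetical.

```python
import math


def imse(estimate, truth):
    """Grid approximation of the integrated mean squared error of one curve."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)


def replicate_at_percentile(imses, pct):
    """Index of the replicate whose IMSE sits at the given percentile (nearest rank)."""
    order = sorted(range(len(imses)), key=lambda r: imses[r])
    rank = max(1, math.ceil(pct / 100 * len(imses)))
    return order[rank - 1]


# Toy IMSE values for four replicates (assumed numbers).
imses = [0.4, 0.1, 0.3, 0.2]
picked = [replicate_at_percentile(imses, pct) for pct in (25, 50, 75)]
print(picked)
```

The curves of the picked replicates would then be overlaid with the true curve, as done for Figure 11.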

5. Real Data Analysis

We analyze the Nurses’ Health Study (NHS) data from the Gene Environment Association Studies (GENEVA) consortium (Cornelis et al. (2010)). The NHS aims at assessing a series of hypotheses on disease susceptibility in women based on genetic factors, i.e. single nucleotide polymorphisms (SNPs), and environmental/clinical factors in gene-environment interaction studies. The body mass index (BMI), which quantifies the obesity level, is set as the response. We focus on SNPs on chromosome 2. We consider age as the environmental factor since it has been shown to be associated with variations in obesity level. In addition, three clinical covariates are included: total physical activity, trans fat intake and cereal fiber intake. The healthy subjects in the NHS are selected for the case study. We clean the data by keeping subjects with matched phenotypes and genotypes, removing SNPs with a minor allele frequency (MAF) less than 0.05 or with deviation from Hardy-Weinberg equilibrium, and imputing the missing values. The final working dataset contains 1716 subjects with 53,408 SNPs.

A common practice in variable selection for ultra-high dimensional omics data is to first conduct marginal screening and reduce the number of features to a reasonable scale so that (Bayesian) regularized variable selection can be applied (Li et al. (2015); Wu et al. (2014, 2018)). Here, we screen the SNPs using the testing procedure for non-linear gene-environment interaction studies proposed by Ma et al. (2011) and Wu and Cui (2013). In particular, three statistical tests have been performed to assess the effect of a genetic factor under the environmental influences and to dissect whether the interaction effects are nonlinear, linear, constant, or zero. We keep the SNPs with p-values less than a cutoff of 0.005 from any of the tests with BMI as the response. A total of 300 SNPs pass the screening.

We analyze the screened data using the proposed method BQRVCSS at the median and the alternative BVCSS. Other methods, such as BQRVC and BVC, are not considered since they have inferior performance in the simulation studies. The eleven SNPs identified by BQRVCSS and the corresponding estimated varying coefficients are displayed in Figure 4. BVCSS identifies nine SNPs, which are rs17533992, rs16864365, rs6719951, rs7585571, rs752833, rs4894108, rs16867269, rs2675102 and rs13418054. Six SNPs are selected by both methods. In addition, the proposed method uniquely identifies five SNPs that are located within genes reported to be associated with body weight change. For example, BQRVCSS identifies the SNP rs17783776, which is located in the gene ALK. ALK (anaplastic lymphoma kinase) has been identified as a thinness gene, which suggests it could be a target gene for obesity treatment (Orthofer et al. (2020)). In comparison, the alternative method BVCSS misses this important gene. The proposed method also identifies rs41349646, a SNP that is mapped to the gene NPAS2. NPAS2 has been found to play an essential role in the regulation of the peripheral circadian response and hepatic metabolism, and therefore affects weight change (O’Neil et al. (2013)). The SNP rs10933420, located in the gene NGEF, is also uniquely identified by the proposed method. Kim et al. (2015) have found NGEF to be associated with intra-abdominal fat accumulation. The proposed method BQRVCSS identifies rs4854071 as well. The SNP rs4854071 is located within the gene NDUFA10 (NADH:Ubiquinone Oxidoreductase Subunit A10), which has been found to be involved in the NAFLD pathway regulating weight loss together with ten other genes (Mirhashemi et al. (2021)).

Figure 4:


Real data analysis using the proposed method (BQRVCSS). Black line: median estimates of varying coefficients for BQRVCSS. Blue dashed lines: 95% credible intervals for the estimated varying coefficients.

It is difficult to objectively evaluate the selection accuracy with real data. We assess the prediction performance as it may provide additional information on the performance of different methods. We refit the models selected by BQRVCSS and BVCSS using the Bayesian quantile LASSO and the Bayesian LASSO, respectively, following the refitting procedure in Li et al. (2015). The prediction mean squared errors (PMSEs) and prediction mean absolute deviations (PMADs) are computed based on the posterior median estimates. The proposed method BQRVCSS has a PMSE of 13.13 and a PMAD of 1.34, while the PMSE and PMAD for BVCSS are 15.04 and 3.05, both larger than those of BQRVCSS.

6. Discussion

Within a broader scope, the regularized quantile varying coefficient model can be regarded as a robust variable selection problem in the form of “robust loss function + penalty function” (Wu and Ma (2015)), which consists of a quantile check loss and a group level penalty function. Although other robust loss functions, including the rank based loss (Wu et al. (2015)), can also be considered for robust high-dimensional varying coefficient models, the regularized quantile VC model naturally leads to a Bayesian formulation if the likelihood function of the Bayesian hierarchical model is specified based on the asymmetric Laplace distribution (ALD) (Yu and Moyeed (2001)). The modeling of the spline basis in the proposed study is connected to the development of semiparametric Bayesian regressions for the “large n, small p” settings (Huang et al. (2015)). As the high-dimensional Bayesian quantile VC model is underdeveloped, examining the Bayesian counterpart complements and further advances the existing studies on the quantile VC model in the frequentist framework.

Nevertheless, our limited literature search shows that high-dimensional Bayesian quantile varying coefficient models have not been well examined so far. In this article, we have developed a Bayesian regularized quantile varying coefficient model. The robust asymmetric Laplace likelihood and the sparsity inducing priors lead to full conditional distributions of the model parameters. Therefore, posterior inference can be efficiently conducted through Gibbs sampling. The varying coefficient model is a special case of the varying index coefficient model (VICM) when the effect modifying variable is univariate with a loading weight of 1 (Ma and Song (2015)). Ma and Song (2015) have further shown that the new class of VICM gives rise to a broad spectrum of semi- and non-parametric models. Our study has laid a solid foundation for initiating Bayesian analyses of these models in the high-dimensional setting. Investigations of these extensions within the Bayesian framework are left for future work.

Acknowledgements

We thank the editor, associate editor and reviewers for their careful review and insightful comments, which led to a significant improvement of this article. We also thank Yuwen Liu for help with conducting the additional simulation studies during the revision. This work was partially supported by an Innovative Research Award from the Johnson Cancer Research Center at Kansas State University and the National Institutes of Health (NIH) grant R01 CA204120.

Appendix

A. Additional Simulation Results

A.1. Additional Identification Results

Figure 5:


Identification results for simulated SNP data with i.i.d. errors based on 100 replicates. C: correct-fitting proportion; O: overfitting proportion; U: underfitting proportion.

Figure 6:


Identification results for simulated SNP data with heterogeneous errors based on 100 replicates. C: correct-fitting proportion; O: overfitting proportion; U: underfitting proportion.

A.2. Additional Estimation Results

Table 3:

Estimation results in terms of total integrated mean square error (TIMSE) for simulated SNPs with i.i.d. errors based on 100 replicates.

τ BQRVCSS BQRVC BVCSS BVC QRVC-adp VC-adp
τ=0.3 Normal 0.23(0.10) 2.32(0.40) 0.45(0.12) 1.51(0.19) 0.28(0.11) 0.79(0.14)
NormalMix 0.34(0.17) 3.47(0.59) 0.76(0.23) 2.92(0.46) 0.53(0.35) 0.98(0.27)
Laplace 0.26(0.10) 2.91(0.53) 0.45(0.12) 2.06(0.32) 0.34(0.15) 0.80(0.11)
Lognormal 0.11(0.07) 3.23(0.61) 1.76(0.64) 4.7(1.38) 0.28(0.51) 1.45(0.76)
t(2) 0.38(0.17) 4.70(1.07) 1.99(1.66) 7.91(9.55) 1.30(1.30) 1.54(1.52)
τ=0.5 Normal 0.19(0.07) 2.14(0.38) 0.41(0.09) 1.21(0.14) 0.28(0.12) 0.76(0.10)
NormalMix 0.27(0.12) 3.67(0.58) 0.73(0.16) 2.65(0.43) 0.49(0.37) 1.03(0.32)
Laplace 0.16(0.05) 2.88(0.43) 0.45(0.09) 1.87(0.35) 0.28(0.19) 0.78(0.23)
Lognormal 0.23(0.13) 4.16(0.83) 1.55(1.14) 5.3(2.43) 0.44(0.45) 1.43(0.66)
t(2) 0.31(0.18) 4.17(0.83) 1.94(1.63) 7.49(7.61) 1.25(1.23) 2.14(1.90)
τ=0.7 Normal 0.19(0.07) 2.37(0.46) 0.41(0.10) 1.50(0.18) 0.30(0.16) 0.78(0.12)
NormalMix 0.35(0.15) 3.49(0.53) 0.7(0.19) 2.94(0.45) 0.52(0.30) 1.11(0.39)
Laplace 0.25(0.13) 2.76(0.46) 0.46(0.13) 1.99(0.27) 0.36(0.16) 0.86(0.19)
Lognormal 0.78(0.79) 5.24(1.38) 1.06(1.07) 4.21(1.91) 1.05(0.88) 0.49(0.77)
t(2) 0.46(0.38) 4.83(1.35) 1.9(1.67) 7.59(7.54) 1.13(1.01) 1.77(1.00)
Table 4:

Estimation results in terms of total integrated mean square error (TIMSE) for simulated SNPs with heterogeneous errors based on 100 replicates.

τ BQRVCSS BQRVC BVCSS BVC QRVC-adp VC-adp
τ=0.3 Normal 0.26(0.11) 3.17(0.57) 0.83(0.24) 2.71(0.44) 0.35(0.24) 1.13(0.30)
NormalMix 0.40(0.20) 4.59(0.79) 1.72(0.86) 5.66(1.35) 0.63(0.54) 1.63(0.57)
Laplace 0.30(0.14) 3.72(0.68) 0.90(0.36) 3.74(0.80) 0.42(0.45) 1.17(0.49)
Lognormal 0.17(0.08) 3.54(0.67) 3.70(2.01) 8.86(3.66) 0.72(0.98) 4.32(3.00)
t(2) 0.66(0.65) 5.92(1.54) 4.64(5.71) 16.16(23.82) 2.09(4.68) 3.78(4.41)
τ=0.5 Normal 0.17(0.08) 3.11(0.48) 0.82(0.21) 2.10(0.29) 0.25(0.18) 1.09(0.31)
NormalMix 0.25(0.12) 4.2(0.71) 1.66(0.63) 4.56(1.02) 0.68(0.73) 1.74(0.71)
Laplace 0.18(0.12) 3.78(0.63) 0.46(0.26) 3.18(0.55) 0.23(0.16) 0.85(0.45)
Lognormal 0.17(0.08) 4.20(0.74) 2.88(4.66) 9.86(9.94) 0.7(1.14) 2.79(3.26)
t(2) 0.3(0.16) 4.75(0.66) 3.19(4.68) 12.78(12.71) 1.55(1.46) 3.78(3.64)
τ=0.7 Normal 0.25(0.11) 3.40(0.59) 0.80(0.22) 2.63(0.39) 0.30(0.13) 1.12(0.29)
NormalMix 0.39(0.17) 4.77(0.76) 1.35(0.49) 4.76(0.97) 0.94(1.08) 1.85(0.77)
Laplace 0.25(0.11) 4.14(0.70) 0.88(0.25) 3.57(0.64) 0.43(0.53) 1.30(0.50)
Lognormal 0.58(0.23) 6.55(1.35) 5.32(22.78) 9.11(11.68) 1.26(1.18) 2.15(2.68)
t(2) 0.49(0.25) 6.08(0.99) 5.98(9.18) 18.73(22.33) 3.2(3.84) 4.94(5.77)
Table 5:

Empirical 95% coverage probabilities under simulated gene expression data with i.i.d. t(2) error based on 200 replicates.

t(2) error BQRVCSS BQRVC BVCSS BVC
τ=0.3 γ0(v) 0.800 0.875 0.570 0.630
γ1(v) 0.865 0.020 0.815 0.055
γ2(v) 0.950 0.780 0.745 0.825
γ3(v) 0.860 0.000 0.760 0.055
τ=0.5 γ0(v) 0.875 0.935 0.885 0.865
γ1(v) 0.930 0.020 0.850 0.050
γ2(v) 0.960 0.845 0.810 0.835
γ3(v) 0.905 0.015 0.790 0.050
τ=0.7 γ0(v) 0.820 0.905 0.665 0.710
γ1(v) 0.930 0.045 0.870 0.080
γ2(v) 0.940 0.830 0.735 0.850
γ3(v) 0.910 0.020 0.815 0.070

A.3. The estimated quantile varying coefficient functions

Figure 7:


Estimation of non-zero varying coefficients under the normal mixture error (Error 2) for the proposed method (BQRVCSS) at 50% quantile level. Red line: true varying coefficients. Black line: posterior median estimates of varying coefficients from BQRVCSS. Blue lines: 95% credible intervals for the estimated varying coefficients.

A.4. Evaluation on the convergence of MCMC chains

Figure 8:


Potential scale reduction factor (PSRF) versus iterations for the varying coefficient functions in Figure 7. Black line: PSRF. Red line: the threshold of 1.1. $\hat{\alpha}_{j1}$ to $\hat{\alpha}_{j5}$ ($j=0,\dots,3$) denote the five estimated spline coefficients for the varying coefficient function $\gamma_j$.

A.5. Hyper-parameters sensitivity analysis

Table 6:

Sensitivity analysis on the choice of the hyperparameter for π0 using different Beta priors for the Laplace error distribution for the 30% quantile. TIMSE: total integrated mean square error.

C O U TIMSE

Beta(0.5,0.5) 0.90 0.10 0.00 0.27(0.12)
Beta(1,1) 0.90 0.10 0.00 0.28(0.12)
Beta(2,2) 0.90 0.10 0.00 0.28(0.11)
Beta(1,5) 0.90 0.10 0.00 0.27(0.11)
Beta(5,1) 0.90 0.10 0.00 0.27(0.11)
Table 7:

Sensitivity analysis on the choice of the hyperparameter for η2 using different Gamma priors for the Laplace error distribution for the 30% quantile. TIMSE: total integrated mean square error.

C O U TIMSE

Gamma(0.1,1) 0.90 0.10 0.00 0.29(0.17)
Gamma(1,1) 0.90 0.10 0.00 0.29(0.16)
Gamma(1,5) 0.90 0.10 0.00 0.30(0.16)
Gamma(2,5) 0.88 0.12 0.00 0.30(0.16)
Gamma(5,1) 0.90 0.10 0.00 0.29(0.16)
Table 8:

Sensitivity analysis on the choice of the hyperparameter for π0 using different Beta priors for the Laplace error distribution for the 50% quantile. TIMSE: total integrated mean square error.

C O U TIMSE

Beta(0.5,0.5) 0.92 0.08 0.00 0.22(0.05)
Beta(1,1) 0.94 0.06 0.00 0.22(0.06)
Beta(2,2) 0.94 0.06 0.00 0.22(0.06)
Beta(1,5) 0.94 0.06 0.00 0.22(0.06)
Beta(5,1) 0.92 0.08 0.00 0.22(0.06)
Table 9:

Sensitivity analysis on the choice of the hyperparameter for η2 using different Gamma priors for the Laplace error distribution for the 50% quantile. TIMSE: total integrated mean square error.

C O U TIMSE

Gamma(0.1,1) 0.96 0.04 0.00 0.22(0.05)
Gamma(1,1) 0.94 0.06 0.00 0.22(0.05)
Gamma(1,5) 0.94 0.06 0.00 0.23(0.05)
Gamma(2,5) 0.94 0.06 0.00 0.22(0.06)
Gamma(5,1) 0.94 0.06 0.00 0.22(0.05)

A.6. Selection of tuning parameters for frequentist methods

We have chosen the tuning parameters for VC-adp and QRVC-adp in terms of the Schwarz-type Information Criterion (SIC):

$$\mathrm{SIC}(\lambda)=\log\sum_{i=1}^{n}L\left(Y_i-E_i^\top\hat{\beta}-Z_i^\top\hat{\alpha}\right)+\frac{\log n}{2n}\,\mathrm{edf},$$

where edf is the effective degrees of freedom. For QRVC-adp, L(·) is the quantile check loss function, and edf is the number of zero residuals, which has been extensively used as a measure of the effective dimension of fitted quantile regression models. Such a criterion has been commonly adopted in published work on regularized quantile varying coefficient models (Noh et al. (2012); Tang et al. (2013, 2012)). For VC-adp, L(·) is the least squares loss function, and edf is the total number of nonzero varying coefficients (Tang et al. (2012); Wang and Xia (2009)). The R code for VC-adp and QRVC-adp can be obtained through minor modifications to the R code for the methods proposed in Tang et al. (2012), available at Dr. Huixia Wang’s website (https://blogs.gwu.edu/judywang/software/).
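The criterion can be evaluated with a few lines of code once the residuals of a fitted model are available. The following is a minimal sketch under that assumption, not the authors' R implementation; the function names, the zero-residual tolerance and the toy residuals are hypothetical.

```python
import math


def check_loss(r, tau):
    """Quantile check loss rho_tau(r) = r * (tau - I(r < 0))."""
    return r * (tau - (1.0 if r < 0 else 0.0))


def count_zero_residuals(residuals, tol=1e-6):
    """edf for QRVC-adp: number of (numerically) zero residuals; tol is an assumption."""
    return sum(1 for r in residuals if abs(r) < tol)


def sic(residuals, edf, loss):
    """SIC = log(sum of losses) + log(n) / (2n) * edf."""
    n = len(residuals)
    return math.log(sum(loss(r) for r in residuals)) + math.log(n) / (2 * n) * edf


# Toy residuals at tau = 0.5 (assumed values); the two zeros give edf = 2.
residuals = [1.0, -1.0, 2.0, -2.0, 0.0, 0.0]
edf = count_zero_residuals(residuals)
value = sic(residuals, edf, lambda r: check_loss(r, 0.5))
print(edf, value)
```

In practice the criterion would be computed for each candidate tuning parameter, and the value with the smallest SIC would be retained.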

We have also examined the estimation performance of the two frequentist methods when the tuning parameters are selected using validation. Specifically, after the regularized estimates have been obtained using the training data, the prediction error, in terms of the check loss for QRVC-adp and the least squares loss for VC-adp, is assessed on independently generated testing data. For each tuning parameter in the sequence, the prediction performance is assessed on the same testing data, and the optimal tuning parameter corresponds to the smallest testing error. Choosing the tuning parameters in this way is feasible in simulation because the data generating model is available, and it is computationally less intensive than cross-validation. For illustration purposes, we have conducted the simulation under the first setting, where gene expression data have been generated with i.i.d. errors. The estimation results in Table 10 below are very close to those in Table 1 of the main text.

Table 10:

Selecting tuning parameters based on validation: estimation results in terms of total integrated mean square error (TIMSE) for simulated gene expression data with i.i.d. errors based on 100 replicates.

τ=0.3 τ=0.5 τ=0.7
QRVC-adp VC-adp QRVC-adp VC-adp QRVC-adp VC-adp
Normal 0.29(0.09) 0.84(0.26) 0.28(0.13) 0.94(0.22) 0.31(0.09) 1.02(0.33)
NormalMix 0.63(0.52) 1.31(0.52) 0.45(0.24) 1.05(0.28) 0.51(0.23) 1.49(0.26)
Laplace 0.37(0.21) 0.96(0.17) 0.30(0.13) 1.00(0.31) 0.35(0.16) 1.23(0.25)
Lognormal 0.28(0.48) 2.63(0.77) 0.51(0.65) 2.13(0.90) 0.98(0.57) 1.77(0.71)
t(2) 1.19(0.91) 2.61(1.32) 0.82(0.68) 2.23(1.45) 1.18(1.13) 2.56(1.27)

A.7. Additional simulation under more complicated varying coefficient functions

Table 11:

Additional simulation under more complicated varying coefficient functions (γ0(v)=2+2sin(6πv)): estimation results in terms of total integrated mean square error (TIMSE) for simulated gene expression data with i.i.d. errors based on 100 replicates.

τ BQRVCSS BQRVC BVCSS BVC QRVC-adp VC-adp
τ=0.3 Normal 2.22(0.28) 4.91(0.60) 2.14(0.29) 4.20(0.47) 2.52(0.43) 2.58(0.33)
NormalMix 2.49(0.52) 5.71(0.92) 2.49(0.47) 5.25(0.72) 2.89(0.71) 2.80(0.44)
Laplace 2.42(0.56) 5.41(0.76) 2.18(0.38) 4.47(0.58) 2.90(0.70) 2.57(0.37)
Lognormal 2.17(0.41) 5.34(0.87) 3.79(1.17) 7.45(2.38) 2.85(0.79) 3.80(0.77)
t(2) 2.74(0.61) 6.67(1.69) 4.41(3.97) 9.76(5.29) 4.73(3.27) 4.32(2.60)
τ=0.5 Normal 2.02(0.34) 4.99(0.65) 1.81(0.21) 3.83(0.39) 2.31(0.37) 2.44(0.85)
NormalMix 2.26(0.48) 5.76(0.83) 2.11(0.36) 4.87(0.61) 2.81(0.70) 2.76(1.08)
Laplace 2.21(0.50) 5.36(0.59) 1.96(0.43) 4.40(0.60) 2.68(0.72) 2.55(0.88)
Lognormal 2.34(0.48) 6.08(0.77) 3.23(1.21) 7.32(3.39) 3.07(0.87) 3.40(1.14)
t(2) 2.53(0.74) 6.16(0.89) 5.12(9.20) 9.88(7.15) 4.03(2.94) 4.82(2.91)
τ=0.7 Normal 2.20(0.32) 5.03(0.58) 2.28(0.22) 4.05(0.41) 2.67(0.57) 3.39(1.60)
NormalMix 2.44(0.52) 5.75(0.74) 2.57(0.50) 5.34(0.82) 2.93(0.77) 4.51(2.00)
Laplace 2.42(0.39) 5.43(0.70) 2.14(0.40) 4.44(0.61) 2.85(0.52) 3.86(1.90)
Lognormal 3.07(0.92) 7.30(1.45) 3.02(1.66) 6.73(3.14) 4.22(1.56) 3.58(1.72)
t(2) 2.73(0.73) 6.68(1.11) 4.92(4.32) 9.14(3.79) 4.09(2.67) 6.16(3.66)
Figure 9:


Estimation of more complicated non-zero varying coefficients (γ0(v)=2+2sin(6πv)) under the normal mixture error (Error 2) for the proposed method (BQRVCSS) at 50% quantile level. Red line: true varying coefficients. Black line: posterior median estimates of varying coefficients from BQRVCSS. Blue lines: 95% credible intervals for the estimated varying coefficients.

Figure 10:


Estimation of more complicated non-zero varying coefficients (γ0(v)=2+2sin(6πv)) under the normal mixture error (Error 2) for the QRVC-adp at 50% quantile level. Red line: true varying coefficients. Black line: estimated varying coefficients from QRVC-adp. The confidence intervals are not available for frequentist regularized quantile varying coefficients.

Figure 11:


Estimation of non-zero varying coefficients under the normal mixture error (Error 2) for the proposed method (BQRVCSS) at 50% quantile level. Red line: true varying coefficients. Black, Blue and Green lines: posterior median estimates of varying coefficients from BQRVCSS under 25%, 50% and 75% IMSE respectively.

B. Sensitivity analysis on smoothness specification

Let O denote the degree of B spline basis and Nn denote the number of interior knots. For quadratic and cubic splines corresponding to O=2 and O=3 respectively, we conduct a sensitivity analysis for the proposed model.

Table 12:

Sensitivity analysis on smoothness specification for the Laplace error distribution for the 30% quantile. TIMSE: total integrated mean square error.

O=2 Nn 1 2 3 4 5

Laplace C 0.88 0.90 0.92 0.89 0.91
O 0.12 0.10 0.08 0.11 0.09
U 0.00 0.00 0.00 0.00 0.00
TIMSE 0.33(0.19) 0.28(0.12) 0.31(0.14) 0.24(0.12) 0.25(0.15)

O=3 Nn 1 2 3 4 5

Laplace C 0.89 0.90 0.92 0.86 0.88
O 0.11 0.10 0.08 0.14 0.12
U 0.00 0.00 0.00 0.00 0.00
TIMSE 0.25(0.11) 0.28(0.12) 0.28(0.15) 0.26(0.19) 0.25(0.16)

Table 13:

Sensitivity analysis on smoothness specification for the Normal error distribution for the 30% quantile. TIMSE: total integrated mean square error.

O=2 Nn 1 2 3 4 5

Normal C 0.97 0.96 0.98 0.95 0.94
O 0.03 0.04 0.04 0.05 0.06
U 0.00 0.00 0.00 0.00 0.00
TIMSE 0.26(0.12) 0.22(0.09) 0.29(0.16) 0.23(0.12) 0.22(0.18)

O=3 Nn 1 2 3 4 5

Normal C 0.96 0.94 0.97 0.94 0.95
O 0.04 0.06 0.03 0.06 0.05
U 0.00 0.00 0.00 0.00 0.00
TIMSE 0.24(0.09) 0.26(0.14) 0.21(0.10) 0.25(0.19) 0.24(0.12)

Table 14:

Sensitivity analysis on smoothness specification for the Laplace error distribution for the 50% quantile. TIMSE: total integrated mean square error.

O=2 Nn 1 2 3 4 5

Laplace C 0.96 0.94 0.92 0.95 0.96
O 0.04 0.06 0.08 0.05 0.04
U 0.00 0.00 0.00 0.00 0.00
TIMSE 0.25(0.11) 0.21(0.09) 0.29(0.16) 0.28(0.11) 0.25(0.19)

O=3 Nn 1 2 3 4 5

Laplace C 0.95 0.93 0.94 0.96 0.93
O 0.05 0.07 0.06 0.04 0.07
U 0.00 0.00 0.00 0.00 0.00
TIMSE 0.24(0.07) 0.31(0.14) 0.26(0.12) 0.22(0.16) 0.26(0.13)

Table 15:

Sensitivity analysis on smoothness specification for the Normal error distribution for the 50% quantile. TIMSE: total integrated mean square error.

O=2 Nn 1 2 3 4 5

Normal C 0.97 0.98 0.96 0.99 0.98
O 0.03 0.02 0.04 0.01 0.02
U 0.00 0.00 0.00 0.00 0.00
TIMSE 0.21(0.06) 0.23(0.13) 0.22(0.07) 0.24(0.14) 0.22(0.09)

O=3 Nn 1 2 3 4 5

Normal C 0.98 0.96 0.98 0.98 0.97
O 0.02 0.04 0.02 0.02 0.03
U 0.00 0.00 0.00 0.00 0.00
TIMSE 0.19(0.07) 0.29(0.11) 0.25(0.07) 0.24(0.14) 0.23(0.08)

C. Posterior inference

C.1. Posterior inference for BQRVCSS

C.1.1. Bayesian hierarchical model
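$$
\begin{aligned}
Y_i &= \sum_{k=1}^{q} E_{ik}\beta_k + \sum_{j=0}^{p} \alpha_j^\top Z_{ij} + \kappa_1\tilde{u}_i + \theta^{-1/2}\kappa_2\sqrt{\tilde{u}_i}\,W_i, \quad i=1,\dots,n,\\
\tilde{u}_1,\dots,\tilde{u}_n &\sim \prod_{i=1}^{n}\theta\exp(-\theta\tilde{u}_i),\\
W_1,\dots,W_n &\sim \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}W_i^2\right),\\
\alpha_j \mid g_j &\overset{ind}{\sim} (1-\pi_0)\,N_d(0, g_j I_d) + \pi_0\,\delta_0(\alpha_j), \quad j=1,\dots,p,\\
g_j \mid \eta^2 &\overset{ind}{\sim} \text{Gamma}\left(\frac{d+1}{2}, \frac{\eta^2}{2}\right), \quad j=1,\dots,p,\\
\pi_0 &\sim \text{Beta}(e, f),\\
\theta &\sim \text{Gamma}(a, b),\\
\eta^2 &\sim \text{Gamma}(c, m),\\
\beta &\sim N_q(0, \Sigma_\beta),\\
\alpha_0 &\sim N_d(0, \Sigma_{\alpha 0}).
\end{aligned}
$$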
C.1.2. Gibbs Sampler
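  • The full conditional distribution of $\tilde{u}_i$, $i=1,\dots,n$, is
    $$
    \begin{aligned}
    p(\tilde{u}_i \mid \text{rest}) &\propto \frac{1}{\sqrt{2\pi\theta^{-1}\kappa_2^2\tilde{u}_i}}\exp\left(-\frac{1}{2}\frac{\left(Y_i - E_i^\top\beta - \sum_{j=0}^{p}\alpha_j^\top Z_{ij} - \kappa_1\tilde{u}_i\right)^2}{\kappa_2^2\theta^{-1}\tilde{u}_i}\right)\theta\exp(-\theta\tilde{u}_i)\\
    &\propto (\tilde{u}_i)^{-\frac{1}{2}}\exp\left(-\frac{1}{2}\frac{\left(Y_i - E_i^\top\beta - \sum_{j=0}^{p}\alpha_j^\top Z_{ij}\right)^2}{\kappa_2^2\theta^{-1}\tilde{u}_i} - \frac{1}{2}\frac{\kappa_1^2\tilde{u}_i}{\theta^{-1}\kappa_2^2} - \theta\tilde{u}_i\right)\\
    &\propto (\tilde{u}_i)^{-\frac{1}{2}}\exp\left(-\frac{1}{2}\left(\left(\frac{\theta\kappa_1^2}{\kappa_2^2} + 2\theta\right)\tilde{u}_i + \frac{\theta\left(Y_i - E_i^\top\beta - \sum_{j=0}^{p}\alpha_j^\top Z_{ij}\right)^2}{\kappa_2^2}\frac{1}{\tilde{u}_i}\right)\right).
    \end{aligned}
    $$
    Hence, it follows that
    $$\tilde{u}_i^{-1} \mid \text{rest} \sim \text{Inverse-Gaussian}\left(\sqrt{\frac{\kappa_1^2 + 2\kappa_2^2}{\left(Y_i - E_i^\top\beta - Z_i^\top\alpha\right)^2}},\ \frac{\theta\kappa_1^2}{\kappa_2^2} + 2\theta\right).$$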
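  • The full conditional distribution of $\alpha_j$, $j=1,\dots,p$, is
    $$
    \begin{aligned}
    p(\alpha_j \mid \text{rest}) \propto{} & \prod_{i=1}^{n}\exp\left(-\frac{\theta}{2\kappa_2^2\tilde{u}_i}\left(Y_i - Z_{i,-j}^\top\alpha_{-j} - Z_{ij}^\top\alpha_j - E_i^\top\beta - \kappa_1\tilde{u}_i\right)^2\right)\\
    & \times\left((1-\pi_0)(2\pi g_j)^{-\frac{d}{2}}\exp\left(-\frac{1}{2}\alpha_j^\top(g_j I_d)^{-1}\alpha_j\right)I(\alpha_j \neq 0) + \pi_0\delta_0(\alpha_j)\right).
    \end{aligned}
    $$
    Let $l_j = p(\alpha_j = 0 \mid \text{rest})$. Then the full conditional posterior distribution of $\alpha_j$ ($j=1,\dots,p$) is given as
    $$\alpha_j \mid \text{rest} \sim (1-l_j)\,N_d(\mu_j, \Sigma_j) + l_j\,\delta_0(\alpha_j),$$
    where
    $$\mu_j = \Sigma_j\frac{\theta}{\kappa_2^2}\sum_{i=1}^{n}\frac{Z_{ij}}{\tilde{u}_i}\left(Y_i - Z_{i,-j}^\top\alpha_{-j} - E_i^\top\beta - \kappa_1\tilde{u}_i\right),$$
    $$\Sigma_j = \left(\frac{\theta}{\kappa_2^2}\sum_{i=1}^{n}\frac{1}{\tilde{u}_i}Z_{ij}Z_{ij}^\top + g_j^{-1}I_d\right)^{-1},$$
    and
    $$l_j = \frac{\pi_0}{\pi_0 + (1-\pi_0)\,|g_j I_d|^{-\frac{1}{2}}\,|\Sigma_j|^{\frac{1}{2}}\exp\left(\frac{1}{2}\mu_j^\top\Sigma_j^{-1}\mu_j\right)}.$$

    Hence, the posterior distribution of $\alpha_j$ is a mixture of a multivariate normal distribution and a point mass at 0. At each iteration of MCMC, $\alpha_j$ is drawn from $N_d(\mu_j, \Sigma_j)$ with probability $(1-l_j)$ and is set to 0 with probability $l_j$.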

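  • The full conditional distribution of $\theta$ is
    $$
    \begin{aligned}
    p(\theta \mid \text{rest}) &\propto \prod_{i=1}^{n}\sqrt{\theta}\exp\left(-\frac{\theta\left(Y_i - \sum_{j=0}^{p}\alpha_j^\top Z_{ij} - E_i^\top\beta - \kappa_1\tilde{u}_i\right)^2}{2\kappa_2^2\tilde{u}_i}\right)\times\prod_{i=1}^{n}\left[\theta\exp(-\theta\tilde{u}_i)\right]\times\theta^{a-1}\exp(-b\theta)\\
    &\propto \theta^{\frac{3}{2}n + a - 1}\exp\left(-\left(\frac{1}{2}\sum_{i=1}^{n}\frac{\left(Y_i - \sum_{j=0}^{p}\alpha_j^\top Z_{ij} - E_i^\top\beta - \kappa_1\tilde{u}_i\right)^2}{\kappa_2^2\tilde{u}_i} + \sum_{i=1}^{n}\tilde{u}_i + b\right)\theta\right).
    \end{aligned}
    $$
    Therefore,
    $$\theta \mid \text{rest} \sim \text{Gamma}\left(\frac{3}{2}n + a,\ \frac{1}{2}\sum_{i=1}^{n}\frac{\left(Y_i - \sum_{j=0}^{p}\alpha_j^\top Z_{ij} - E_i^\top\beta - \kappa_1\tilde{u}_i\right)^2}{\kappa_2^2\tilde{u}_i} + \sum_{i=1}^{n}\tilde{u}_i + b\right).$$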
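  • The full conditional distribution of $\eta^2$ is
    $$
    \begin{aligned}
    p(\eta^2 \mid \text{rest}) &\propto \prod_{j=1}^{p}\left[\left(\frac{\eta^2}{2}\right)^{\frac{d+1}{2}}\exp\left(-\frac{\eta^2}{2}g_j\right)\right]\times(\eta^2)^{c-1}\exp(-m\eta^2)\\
    &\propto (\eta^2)^{\frac{(d+1)p}{2} + c - 1}\exp\left(-\left(\frac{1}{2}\sum_{j=1}^{p}g_j + m\right)\eta^2\right).
    \end{aligned}
    $$
    It follows that
    $$\eta^2 \mid \text{rest} \sim \text{Gamma}\left(\frac{(d+1)p}{2} + c,\ \frac{1}{2}\sum_{j=1}^{p}g_j + m\right).$$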
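  • The full conditional distribution of $g_j$ ($j=1,\dots,p$) is
    $$p(g_j \mid \text{rest}) \propto \left((1-\pi_0)(2\pi g_j)^{-\frac{d}{2}}\exp\left(-\frac{1}{2}\alpha_j^\top(g_j I_d)^{-1}\alpha_j\right)I(\alpha_j \neq 0) + \pi_0\delta_0(\alpha_j)\right)\times g_j^{\frac{d-1}{2}}\exp\left(-\frac{\eta^2}{2}g_j\right).$$
    Then,
    $$
    g_j^{-1} \mid \text{rest} \sim
    \begin{cases}
    \text{Inverse-Gamma}\left(\frac{d+1}{2}, \frac{\eta^2}{2}\right) & \text{if } \alpha_j = 0,\\
    \text{Inverse-Gaussian}\left(\sqrt{\frac{\eta^2}{\alpha_j^\top\alpha_j}},\ \eta^2\right) & \text{if } \alpha_j \neq 0.
    \end{cases}
    $$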
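  • The full conditional distribution of $\pi_0$ is
    $$p(\pi_0 \mid \text{rest}) \propto \prod_{j=1}^{p}\left((1-\pi_0)(2\pi g_j)^{-\frac{d}{2}}\exp\left(-\frac{1}{2}\alpha_j^\top(g_j I_d)^{-1}\alpha_j\right)I(\alpha_j \neq 0) + \pi_0\delta_0(\alpha_j)\right)\times\pi_0^{e-1}(1-\pi_0)^{f-1}.$$
    Let
    $$
    Q_j =
    \begin{cases}
    0 & \text{if } \alpha_j = 0,\\
    1 & \text{if } \alpha_j \neq 0,
    \end{cases}
    $$
    so that the product contributes $\pi_0^{\,p-\sum_{j=1}^{p}Q_j}(1-\pi_0)^{\sum_{j=1}^{p}Q_j}$. Consequently,
    $$\pi_0 \mid \text{rest} \sim \text{Beta}\left(p - \sum_{j=1}^{p}Q_j + e,\ \sum_{j=1}^{p}Q_j + f\right).$$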
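  • The full conditional distribution of $\beta$ is
    $$
    \begin{aligned}
    p(\beta \mid \text{rest}) &\propto \prod_{i=1}^{n}\exp\left(-\frac{\theta}{2\kappa_2^2\tilde{u}_i}\left(Y_i - \sum_{j=0}^{p}\alpha_j^\top Z_{ij} - E_i^\top\beta - \kappa_1\tilde{u}_i\right)^2\right)\exp\left(-\frac{1}{2}\beta^\top\Sigma_\beta^{-1}\beta\right)\\
    &\propto \exp\left(-\frac{1}{2}\left(\beta^\top\left(\sum_{i=1}^{n}\frac{\theta E_i E_i^\top}{\kappa_2^2\tilde{u}_i} + \Sigma_\beta^{-1}\right)\beta - 2\sum_{i=1}^{n}\frac{\theta}{\kappa_2^2\tilde{u}_i}\left(Y_i - \sum_{j=0}^{p}\alpha_j^\top Z_{ij} - \kappa_1\tilde{u}_i\right)E_i^\top\beta\right)\right),
    \end{aligned}
    $$
    therefore, we have
    $$\beta \mid \text{rest} \sim N_q(\mu_\beta, \Sigma_\beta),$$
    with
    $$\Sigma_\beta = \left(\sum_{i=1}^{n}\frac{\theta E_i E_i^\top}{\kappa_2^2\tilde{u}_i} + \Sigma_\beta^{-1}\right)^{-1},$$
    and
    $$\mu_\beta = \Sigma_\beta\left(\sum_{i=1}^{n}\frac{\theta}{\kappa_2^2\tilde{u}_i}\left(Y_i - \sum_{j=0}^{p}\alpha_j^\top Z_{ij} - \kappa_1\tilde{u}_i\right)E_i\right).$$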
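  • Similarly, the full conditional distribution of $\alpha_0$ is derived as
    $$\alpha_0 \mid \text{rest} \sim N_d(\mu_0, \Sigma_0),$$
    with
    $$\Sigma_0 = \left(\sum_{i=1}^{n}\frac{\theta Z_{i0}Z_{i0}^\top}{\kappa_2^2\tilde{u}_i} + \Sigma_{\alpha 0}^{-1}\right)^{-1}$$
    and
    $$\mu_0 = \Sigma_0\left(\sum_{i=1}^{n}\frac{\theta}{\kappa_2^2\tilde{u}_i}\left(Y_i - E_i^\top\beta - \sum_{j=1}^{p}\alpha_j^\top Z_{ij} - \kappa_1\tilde{u}_i\right)Z_{i0}\right).$$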
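Several of the scalar updates in this sampler can be sketched in a few lines. The code below is a minimal illustration and not the authors' implementation. It assumes that the Gamma distributions are in the shape-rate parameterization implied by the derivations above, that `resid` denotes the fitted residual of observation i without the κ1 term, and that the inverse Gaussian draw uses the transformation method of Michael, Schucany and Haas (1976). The zero probability of the spike-and-slab update is evaluated on the log scale, taking the log determinant of the slab covariance and the quadratic form of its mean as precomputed inputs. All function and argument names are hypothetical.

```python
import math
import random


def rinvgauss(mean, shape, rng):
    """One inverse-Gaussian(mean, shape) draw (Michael, Schucany and Haas, 1976)."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = (mean + mean * mean * y / (2 * shape)
         - mean / (2 * shape) * math.sqrt(4 * mean * shape * y + (mean * y) ** 2))
    return x if rng.random() <= mean / (mean + x) else mean * mean / x


def draw_u(resid, theta, kappa1, kappa2_sq, rng):
    """Draw u_i: its reciprocal is inverse Gaussian. Assumes resid is not zero."""
    mean = math.sqrt((kappa1 ** 2 + 2 * kappa2_sq) / resid ** 2)
    shape = theta * kappa1 ** 2 / kappa2_sq + 2 * theta
    return 1.0 / rinvgauss(mean, shape, rng)


def draw_theta(resids, u, kappa1, kappa2_sq, a, b, rng):
    """Draw theta from Gamma(3n/2 + a, rate); gammavariate takes a scale, so use 1/rate."""
    quad = sum((r - kappa1 * ui) ** 2 / (kappa2_sq * ui) for r, ui in zip(resids, u))
    rate = 0.5 * quad + sum(u) + b
    return rng.gammavariate(1.5 * len(u) + a, 1.0 / rate)


def draw_pi0(nonzero_flags, e, f, rng):
    """Draw pi_0 from Beta(number of zero groups + e, number of nonzero groups + f)."""
    k = sum(nonzero_flags)
    return rng.betavariate(len(nonzero_flags) - k + e, k + f)


def zero_probability(pi0, g, d, logdet_sigma, quad):
    """l_j computed on the log scale; quad is the quadratic form of mu_j in Sigma_j^{-1}."""
    log_ratio = (math.log1p(-pi0) - math.log(pi0) - 0.5 * d * math.log(g)
                 + 0.5 * logdet_sigma + 0.5 * quad)
    if log_ratio > 0:                      # avoid overflow in exp for a strong signal
        t = math.exp(-log_ratio)
        return t / (1.0 + t)
    return 1.0 / (1.0 + math.exp(log_ratio))


rng = random.Random(1)
resids = [0.7, -1.2, 0.4]                  # toy residuals (assumed values)
u = [draw_u(r, 1.0, 0.0, 8.0, rng) for r in resids]
theta = draw_theta(resids, u, 0.0, 8.0, 1.0, 1.0, rng)
pi0 = draw_pi0([1, 0, 0, 1], 1.0, 1.0, rng)
```

The toy call uses κ1=0 and κ2²=8, which are the values usually associated with the median in the normal-exponential mixture representation of the asymmetric Laplace distribution; this correspondence is stated here as an assumption, since the definitions of κ1 and κ2 appear in the main text rather than in this appendix.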

C.2. Posterior inference for BQRVC

C.2.1. Bayesian hierarchical model
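$$
\begin{aligned}
Y_i &= E_i^\top\beta + \sum_{j=0}^{p}\alpha_j^\top Z_{ij} + \kappa_1\tilde{u}_i + \kappa_2\theta^{-1/2}\sqrt{\tilde{u}_i}\,W_i, \quad i=1,\dots,n,\\
\tilde{u}_1,\dots,\tilde{u}_n &\sim \prod_{i=1}^{n}\theta\exp(-\theta\tilde{u}_i),\\
W_1,\dots,W_n &\sim \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}W_i^2\right),\\
\alpha_j \mid g_j &\overset{ind}{\sim} N_d(0, g_j I_d), \quad j=1,\dots,p,\\
g_j \mid \eta^2 &\overset{ind}{\sim} \text{Gamma}\left(\frac{d+1}{2}, \frac{\eta^2}{2}\right), \quad j=1,\dots,p,\\
\pi_0 &\sim \text{Beta}(e, f),\\
\theta &\sim \text{Gamma}(a, b),\\
\eta^2 &\sim \text{Gamma}(c, m),\\
\beta &\sim N_q(0, \Sigma_\beta),\\
\alpha_0 &\sim N_d(0, \Sigma_{\alpha 0}).
\end{aligned}
$$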
C.2.2. Gibbs Sampler
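  • The full conditional distribution of $\tilde{u}_i$, $i=1,\dots,n$, is
    $$
    \begin{aligned}
    \pi(\tilde{u}_i \mid \text{rest}) &\propto \frac{1}{\sqrt{2\pi\theta^{-1}\kappa_2^2\tilde{u}_i}}\exp\left(-\frac{1}{2}\frac{\left(Y_i - E_i^\top\beta - Z_i^\top\alpha - \kappa_1\tilde{u}_i\right)^2}{\kappa_2^2\theta^{-1}\tilde{u}_i}\right)\theta\exp(-\theta\tilde{u}_i)\\
    &\propto (\tilde{u}_i)^{-\frac{1}{2}}\exp\left(-\frac{1}{2}\frac{\left(Y_i - E_i^\top\beta - Z_i^\top\alpha\right)^2}{\kappa_2^2\theta^{-1}\tilde{u}_i} - \frac{1}{2}\frac{\kappa_1^2\tilde{u}_i}{\theta^{-1}\kappa_2^2} - \theta\tilde{u}_i\right)\\
    &\propto (\tilde{u}_i)^{-\frac{1}{2}}\exp\left(-\frac{1}{2}\left(\left(\frac{\theta\kappa_1^2}{\kappa_2^2} + 2\theta\right)\tilde{u}_i + \frac{\theta\left(Y_i - E_i^\top\beta - Z_i^\top\alpha\right)^2}{\kappa_2^2}\frac{1}{\tilde{u}_i}\right)\right).
    \end{aligned}
    $$
    Then, the full conditional distribution of $\tilde{u}_i^{-1}$ is
    $$\tilde{u}_i^{-1} \mid \text{rest} \sim \text{Inverse-Gaussian}\left(\sqrt{\frac{\kappa_1^2 + 2\kappa_2^2}{\left(Y_i - E_i^\top\beta - Z_i^\top\alpha\right)^2}},\ \frac{\theta\kappa_1^2}{\kappa_2^2} + 2\theta\right).$$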
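  • The full conditional distribution of $g_j$ ($j=1,\dots,p$) is
    $$
    \begin{aligned}
    \pi(g_j \mid \text{rest}) &\propto (2\pi g_j)^{-\frac{d}{2}}\exp\left(-\frac{1}{2}\alpha_j^\top(g_j I_d)^{-1}\alpha_j\right)\times g_j^{\frac{d-1}{2}}\exp\left(-\frac{\eta^2}{2}g_j\right)\\
    &\propto g_j^{-\frac{1}{2}}\exp\left(-\frac{1}{2}\left(\eta^2 g_j + \alpha_j^\top\alpha_j\frac{1}{g_j}\right)\right).
    \end{aligned}
    $$
    It follows that
    $$g_j^{-1} \mid \text{rest} \sim \text{Inverse-Gaussian}\left(\sqrt{\frac{\eta^2}{\alpha_j^\top\alpha_j}},\ \eta^2\right).$$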
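  • The full conditional distribution of $\alpha_j$, $j=1,\dots,p$, is
    $$
    \begin{aligned}
    p(\alpha_j \mid \text{rest}) \propto{} & \prod_{i=1}^{n}\exp\left(-\frac{\theta}{2\kappa_2^2\tilde{u}_i}\left(Y_i - Z_{i,-j}^\top\alpha_{-j} - Z_{ij}^\top\alpha_j - E_i^\top\beta - \kappa_1\tilde{u}_i\right)^2\right)\times(2\pi g_j)^{-\frac{d}{2}}\exp\left(-\frac{1}{2}\alpha_j^\top(g_j I_d)^{-1}\alpha_j\right)\\
    \propto{} & \exp\left(-\frac{1}{2}\frac{\theta}{\kappa_2^2}\sum_{i=1}^{n}\frac{1}{\tilde{u}_i}\left(Y_i - Z_{i,-j}^\top\alpha_{-j} - E_i^\top\beta - \kappa_1\tilde{u}_i\right)^2\right)\\
    & \times\exp\left(-\frac{1}{2}\left(\alpha_j^\top\left(\frac{\theta}{\kappa_2^2}\sum_{i=1}^{n}\frac{Z_{ij}Z_{ij}^\top}{\tilde{u}_i} + g_j^{-1}I_d\right)\alpha_j - \frac{2\theta}{\kappa_2^2}\sum_{i=1}^{n}\frac{1}{\tilde{u}_i}\left(Y_i - Z_{i,-j}^\top\alpha_{-j} - E_i^\top\beta - \kappa_1\tilde{u}_i\right)Z_{ij}^\top\alpha_j\right)\right).
    \end{aligned}
    $$
    Denote the covariance
    $$\Sigma_j = \left(\frac{\theta}{\kappa_2^2}\sum_{i=1}^{n}\frac{1}{\tilde{u}_i}Z_{ij}Z_{ij}^\top + g_j^{-1}I_d\right)^{-1}$$
    and the mean
    $$\mu_j = \Sigma_j\frac{\theta}{\kappa_2^2}\sum_{i=1}^{n}\frac{Z_{ij}}{\tilde{u}_i}\left(Y_i - Z_{i,-j}^\top\alpha_{-j} - E_i^\top\beta - \kappa_1\tilde{u}_i\right),$$
    then we have
    $$\alpha_j \mid \text{rest} \sim N_d(\mu_j, \Sigma_j).$$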
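  • The full conditional distribution of $\theta$ is
    $$
    \begin{aligned}
    \pi(\theta \mid \text{rest}) &\propto \prod_{i=1}^{n}\sqrt{\theta}\exp\left(-\frac{\theta\left(Y_i - Z_i^\top\alpha - E_i^\top\beta - \kappa_1\tilde{u}_i\right)^2}{2\kappa_2^2\tilde{u}_i}\right)\times\prod_{i=1}^{n}\left[\theta\exp(-\theta\tilde{u}_i)\right]\times\theta^{a-1}\exp(-b\theta)\\
    &\propto \theta^{\frac{3}{2}n + a - 1}\exp\left(-\left(\frac{1}{2}\sum_{i=1}^{n}\frac{\left(Y_i - Z_i^\top\alpha - E_i^\top\beta - \kappa_1\tilde{u}_i\right)^2}{\kappa_2^2\tilde{u}_i} + \sum_{i=1}^{n}\tilde{u}_i + b\right)\theta\right).
    \end{aligned}
    $$
    Therefore,
    $$\theta \mid \text{rest} \sim \text{Gamma}\left(\frac{3}{2}n + a,\ \frac{1}{2}\sum_{i=1}^{n}\frac{\left(Y_i - Z_i^\top\alpha - E_i^\top\beta - \kappa_1\tilde{u}_i\right)^2}{\kappa_2^2\tilde{u}_i} + \sum_{i=1}^{n}\tilde{u}_i + b\right).$$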
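  • The full conditional distribution of $\eta^2$ is
    $$
    \begin{aligned}
    \pi(\eta^2 \mid \text{rest}) &\propto \prod_{j=1}^{p}\left(\frac{\eta^2}{2}\right)^{\frac{d+1}{2}}\exp\left(-\frac{\eta^2}{2}g_j\right)\times(\eta^2)^{c-1}\exp(-m\eta^2)\\
    &\propto (\eta^2)^{\frac{(d+1)p}{2} + c - 1}\exp\left(-\left(\frac{1}{2}\sum_{j=1}^{p}g_j + m\right)\eta^2\right).
    \end{aligned}
    $$
    It follows that
    $$\eta^2 \mid \text{rest} \sim \text{Gamma}\left(\frac{(d+1)p}{2} + c,\ \frac{1}{2}\sum_{j=1}^{p}g_j + m\right).$$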
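  • The full conditional distribution of $\beta$ is
    $$
    \begin{aligned}
    \pi(\beta \mid \text{rest}) &\propto \prod_{i=1}^{n}\exp\left(-\frac{\theta}{2\kappa_2^2\tilde{u}_i}\left(Y_i - \sum_{j=0}^{p}\alpha_j^\top Z_{ij} - E_i^\top\beta - \kappa_1\tilde{u}_i\right)^2\right)\exp\left(-\frac{1}{2}\beta^\top\Sigma_\beta^{-1}\beta\right)\\
    &\propto \exp\left(-\frac{1}{2}\left(\beta^\top\left(\sum_{i=1}^{n}\frac{\theta E_i E_i^\top}{\kappa_2^2\tilde{u}_i} + \Sigma_\beta^{-1}\right)\beta - 2\sum_{i=1}^{n}\frac{\theta}{\kappa_2^2\tilde{u}_i}\left(Y_i - \sum_{j=0}^{p}\alpha_j^\top Z_{ij} - \kappa_1\tilde{u}_i\right)E_i^\top\beta\right)\right),
    \end{aligned}
    $$
    therefore, we have
    $$\beta \mid \text{rest} \sim N_q(\mu_\beta, \Sigma_\beta),$$
    with mean
    $$\mu_\beta = \Sigma_\beta\left(\sum_{i=1}^{n}\frac{\theta}{\kappa_2^2\tilde{u}_i}\left(Y_i - \sum_{j=0}^{p}\alpha_j^\top Z_{ij} - \kappa_1\tilde{u}_i\right)E_i\right)$$
    and covariance
    $$\Sigma_\beta = \left(\sum_{i=1}^{n}\frac{\theta E_i E_i^\top}{\kappa_2^2\tilde{u}_i} + \Sigma_\beta^{-1}\right)^{-1}.$$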
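  • The full conditional distribution of $\alpha_0$ is derived as
    $$\alpha_0 \mid \text{rest} \sim N_d(\mu_0, \Sigma_0),$$
    where
    $$\Sigma_0 = \left(\sum_{i=1}^{n}\frac{\theta Z_{i0}Z_{i0}^\top}{\kappa_2^2\tilde{u}_i} + \Sigma_{\alpha 0}^{-1}\right)^{-1}$$
    and
    $$\mu_0 = \Sigma_0\left(\sum_{i=1}^{n}\frac{\theta}{\kappa_2^2\tilde{u}_i}\left(Y_i - E_i^\top\beta - \sum_{j=1}^{p}\alpha_j^\top Z_{ij} - \kappa_1\tilde{u}_i\right)Z_{i0}\right).$$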

C.3. Posterior inference for BVCSS

C.3.1. Bayesian hierarchical model
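$$Y \mid \beta, \alpha, \sigma^2 \sim \text{N}_n(E\beta + Z\alpha, \sigma^2 I_n),$$
$$\alpha_j \mid \zeta_j^2, \sigma^2 \overset{\text{ind}}{\sim} (1 - \pi_0)\,\text{N}_d(0, \sigma^2\zeta_j^2 I_d) + \pi_0\,\delta_0(\alpha_j), \quad j = 1, \dots, p,$$
$$\zeta_j^2 \mid \lambda^2 \overset{\text{ind}}{\sim} \text{Gamma}\left(\frac{d+1}{2}, \frac{\lambda^2}{2}\right), \quad j = 1, \dots, p,$$
$$\pi_0 \sim \text{Beta}(a, b),$$
$$\sigma^2 \sim \text{Inverse-Gamma}(s, h),$$
$$\lambda^2 \sim \text{Gamma}(t, \psi),$$
$$\beta \sim \text{N}_q(0, \Sigma_\beta),$$
$$\alpha_0 \sim \text{N}_d(0, \Sigma_{\alpha_0}).$$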
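
To make the hierarchy above concrete, the following sketch draws the coefficient blocks $\alpha_1, \dots, \alpha_p$ from the spike-and-slab prior. It is an illustration only: the function name `draw_bvcss_prior` is hypothetical, Gamma distributions are assumed to be in shape-rate form and the Inverse-Gamma in shape-scale form, and $\beta$ and $\alpha_0$ are left out because their prior covariances are not specified here.

```python
import numpy as np

def draw_bvcss_prior(p, d, a, b, s, h, t, psi, rng):
    """Draw (alpha, is_zero, pi0) from the spike-and-slab prior hierarchy."""
    pi0 = rng.beta(a, b)                          # prior probability of the spike
    sigma2 = 1.0 / rng.gamma(s, 1.0 / h)          # Inverse-Gamma(s, h)
    lam2 = rng.gamma(t, 1.0 / psi)                # Gamma(t, rate psi)
    # zeta_j^2 ~ Gamma((d + 1) / 2, rate lam2 / 2), i.e. scale 2 / lam2
    zeta2 = rng.gamma(0.5 * (d + 1), 2.0 / lam2, size=p)
    is_zero = rng.random(p) < pi0                 # block j sits at the point mass
    alpha = rng.normal(size=(p, d)) * np.sqrt(sigma2 * zeta2)[:, None]
    alpha[is_zero] = 0.0                          # exact zeros: whole blocks at once
    return alpha, is_zero, pi0

alpha, is_zero, pi0 = draw_bvcss_prior(p=10, d=3, a=1.0, b=1.0, s=2.0, h=1.0,
                                       t=1.0, psi=1.0, rng=np.random.default_rng(0))
```

Each block is either entirely zero or entirely drawn from the slab, which is what lets the prior select or drop a whole varying coefficient at once.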
C.3.2. Gibbs Sampler
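  • The full conditional distribution of $\alpha_j$, $j = 1, \dots, p$, is
    $$\pi(\alpha_j \mid \text{rest}) \propto \exp\left(-\frac{1}{2\sigma^2}\left\|Y - Z_j\alpha_j - Z_{-j}\alpha_{-j} - E\beta\right\|^2\right) \times \left((1 - \pi_0)(2\pi\sigma^2\zeta_j^2)^{-\frac{d}{2}} \exp\left(-\frac{1}{2}\alpha_j^\top(\sigma^2\zeta_j^2 I_d)^{-1}\alpha_j\right) I(\alpha_j \neq 0) + \pi_0\,\delta_0(\alpha_j)\right).$$
    Let $l_j = p(\alpha_j = 0 \mid \text{rest})$; then the conditional posterior distribution of $\alpha_j$ ($j = 1, \dots, p$) is a multivariate spike-and-slab distribution given as
    $$\alpha_j \mid \text{rest} \sim (1 - l_j)\,\text{N}_d(\mu_j, \sigma^2\Sigma_j) + l_j\,\delta_0(\alpha_j),$$
    where $\Sigma_j = (Z_j^\top Z_j + \zeta_j^{-2} I_d)^{-1}$, $\mu_j = \Sigma_j Z_j^\top(Y - E\beta - Z_{-j}\alpha_{-j})$, and
    $$l_j = \frac{\pi_0}{\pi_0 + (1 - \pi_0)(\zeta_j^2)^{-\frac{d}{2}} |\Sigma_j|^{\frac{1}{2}} \exp\left(\frac{1}{2}\mu_j^\top(\sigma^2\Sigma_j)^{-1}\mu_j\right)}.$$
    Hence, the posterior distribution of $\alpha_j$ is a mixture of a multivariate normal distribution and a point mass at 0.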

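  • The full conditional distribution of $\sigma^2$ is
    $$\pi(\sigma^2 \mid \text{rest}) \propto (\sigma^2)^{-\frac{n}{2}} \exp\left(-\frac{1}{2\sigma^2}\|Y - Z\alpha - E\beta\|^2\right) \times \left(\frac{1}{\sigma^2}\right)^{s+1} \exp\left(-\frac{h}{\sigma^2}\right) \times \prod_{j=1}^p \left((1 - \pi_0)(2\pi\sigma^2\zeta_j^2)^{-\frac{d}{2}} \exp\left(-\frac{1}{2}\alpha_j^\top(\sigma^2\zeta_j^2 I_d)^{-1}\alpha_j\right) I(\alpha_j \neq 0) + \pi_0\,\delta_0(\alpha_j)\right).$$
    Let
    $$Q_j = \begin{cases} 0 & \text{if } \alpha_j = 0, \\ 1 & \text{if } \alpha_j \neq 0, \end{cases}$$
    then the posterior distribution of $\sigma^2$ becomes
    $$\begin{aligned}
    \pi(\sigma^2 \mid \text{rest}) &\propto (\sigma^2)^{-\frac{n}{2}} \exp\left(-\frac{1}{2\sigma^2}\|Y - Z\alpha - E\beta\|^2\right) \times \left(\frac{1}{\sigma^2}\right)^{s+1} \exp\left(-\frac{h}{\sigma^2}\right) \\
    &\quad \times \prod_{j=1}^p (1 - \pi_0)^{Q_j} (\sigma^2)^{-\frac{d}{2}\sum_{j=1}^p Q_j} \prod_{j=1}^p \pi_0^{1 - Q_j} \exp\left(-\frac{1}{\sigma^2}\frac{1}{2}\sum_{j=1}^p (\zeta_j^2)^{-1}\alpha_j^\top\alpha_j\right) \\
    &\propto (\sigma^2)^{-\frac{n}{2} - \frac{d}{2}\sum_{j=1}^p Q_j - s - 1} \exp\left(-\frac{1}{\sigma^2}\left(\frac{1}{2}\|Y - Z\alpha - E\beta\|^2 + \frac{1}{2}\sum_{j=1}^p (\zeta_j^2)^{-1}\alpha_j^\top\alpha_j + h\right)\right).
    \end{aligned}$$
    Therefore,
    $$\sigma^2 \mid \text{rest} \sim \text{Inverse-Gamma}\left(\frac{n}{2} + \frac{d}{2}\sum_{j=1}^p Q_j + s,\ \frac{1}{2}\|Y - Z\alpha - E\beta\|^2 + \frac{1}{2}\sum_{j=1}^p (\zeta_j^2)^{-1}\alpha_j^\top\alpha_j + h\right).$$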
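  • The full conditional distribution of $\zeta_j^2$, $j = 1, \dots, p$, is
    $$\pi(\zeta_j^2 \mid \text{rest}) \propto \left((1 - \pi_0)(2\pi\sigma^2\zeta_j^2)^{-\frac{d}{2}} \exp\left(-\frac{1}{2}\alpha_j^\top(\sigma^2\zeta_j^2 I_d)^{-1}\alpha_j\right) I(\alpha_j \neq 0) + \pi_0\,\delta_0(\alpha_j)\right) \times (\zeta_j^2)^{\frac{d-1}{2}} \exp\left(-\frac{\lambda^2}{2}\zeta_j^2\right).$$
    Then we have
    $$(\zeta_j^2)^{-1} \mid \text{rest} \sim \begin{cases} \text{Inverse-Gamma}\left(\frac{d+1}{2}, \frac{\lambda^2}{2}\right) & \text{if } \alpha_j = 0, \\ \text{Inverse-Gaussian}\left(\sqrt{\frac{\sigma^2\lambda^2}{\alpha_j^\top\alpha_j}},\ \lambda^2\right) & \text{if } \alpha_j \neq 0. \end{cases}$$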
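  • The full conditional distribution of $\lambda^2$ is
    $$\begin{aligned}
    \pi(\lambda^2 \mid \text{rest}) &\propto \prod_{j=1}^p \left(\left(\frac{\lambda^2}{2}\right)^{\frac{d+1}{2}} \exp\left(-\frac{\lambda^2}{2}\zeta_j^2\right)\right) \times (\lambda^2)^{t-1}\exp(-\psi\lambda^2) \\
    &\propto (\lambda^2)^{\frac{1}{2}(d+1)p + t - 1} \exp\left(-\left(\frac{1}{2}\sum_{j=1}^p \zeta_j^2 + \psi\right)\lambda^2\right),
    \end{aligned}$$
    and we have
    $$\lambda^2 \mid \text{rest} \sim \text{Gamma}\left(\frac{1}{2}(d+1)p + t,\ \frac{1}{2}\sum_{j=1}^p \zeta_j^2 + \psi\right).$$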
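  • The full conditional distribution of $\pi_0$ is
    $$\begin{aligned}
    \pi(\pi_0 \mid \text{rest}) &\propto \prod_{j=1}^p \left((1 - \pi_0)(2\pi\sigma^2\zeta_j^2)^{-\frac{d}{2}} \exp\left(-\frac{1}{2}\alpha_j^\top(\sigma^2\zeta_j^2 I_d)^{-1}\alpha_j\right) I(\alpha_j \neq 0) + \pi_0\,\delta_0(\alpha_j)\right) \times \pi_0^{a-1}(1 - \pi_0)^{b-1} \\
    &\propto \pi_0^{a + p - \sum_{j=1}^p Q_j - 1}(1 - \pi_0)^{b + \sum_{j=1}^p Q_j - 1},
    \end{aligned}$$
    hence
    $$\pi_0 \mid \text{rest} \sim \text{Beta}\left(p + a - \sum_{j=1}^p Q_j,\ b + \sum_{j=1}^p Q_j\right).$$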
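  • The full conditional distribution of $\beta$ is
    $$\begin{aligned}
    \pi(\beta \mid \text{rest}) &\propto \exp\left(-\frac{1}{2\sigma^2}\|Y - E\beta - Z\alpha\|^2\right) \times \exp\left(-\frac{1}{2}\beta^\top\Sigma_\beta^{-1}\beta\right) \\
    &\propto \exp\left(-\frac{1}{2}\left(\beta^\top\left(\frac{E^\top E}{\sigma^2} + \Sigma_\beta^{-1}\right)\beta - \frac{2}{\sigma^2}(Y - Z\alpha)^\top E\beta\right)\right),
    \end{aligned}$$
    and
    $$\beta \mid \text{rest} \sim \text{N}_q(\tilde{\mu}_\beta, \tilde{\Sigma}_\beta),$$
    where $\tilde{\Sigma}_\beta = \left(\frac{E^\top E}{\sigma^2} + \Sigma_\beta^{-1}\right)^{-1}$ and $\tilde{\mu}_\beta = \tilde{\Sigma}_\beta\left(\frac{1}{\sigma^2}(Y - Z\alpha)^\top E\right)^\top$, with $\Sigma_\beta$ the prior covariance of $\beta$.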
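  • The full conditional distribution of $\alpha_0$ is
    $$\alpha_0 \mid \text{rest} \sim \text{N}_d(\mu_0, \Sigma_0),$$
    with $\Sigma_0 = \left(\frac{Z_0^\top Z_0}{\sigma^2} + \Sigma_{\alpha_0}^{-1}\right)^{-1}$ and $\mu_0 = \Sigma_0\left(\frac{1}{\sigma^2}(Y - E\beta - Z_{-0}\alpha_{-0})^\top Z_0\right)^\top$.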
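
The spike-and-slab step is the part of this sampler most prone to numerical trouble, because the exponential term in $l_j$ overflows easily for strong signals. The sketch below shows one way to evaluate $l_j$ on the log scale, together with the $\pi_0$ and $\sigma^2$ updates that depend on the indicators $Q_j$. The function names and argument layout are hypothetical, `r_j` stands for the partial residual $Y - E\beta - Z_{-j}\alpha_{-j}$, and the Inverse-Gamma is assumed to be in shape-scale form; this is not the authors' code.

```python
import numpy as np

def update_alpha_j(r_j, Z_j, zeta2_j, sigma2, pi0, rng):
    """Spike-and-slab draw for one block; returns (alpha_j, l_j)."""
    d = Z_j.shape[1]
    prec = Z_j.T @ Z_j + np.eye(d) / zeta2_j          # Sigma_j^{-1}
    Sigma_j = np.linalg.inv(prec)
    mu_j = Sigma_j @ (Z_j.T @ r_j)
    # log of (1 - pi0) (zeta_j^2)^{-d/2} |Sigma_j|^{1/2} exp(mu' (sigma2 Sigma_j)^{-1} mu / 2)
    log_slab = (np.log1p(-pi0) - 0.5 * d * np.log(zeta2_j)
                + 0.5 * np.linalg.slogdet(Sigma_j)[1]
                + 0.5 * (mu_j @ prec @ mu_j) / sigma2)
    # l_j = pi0 / (pi0 + slab) = 1 / (1 + exp(log_slab - log pi0)), computed stably
    l_j = np.exp(-np.logaddexp(0.0, log_slab - np.log(pi0)))
    if rng.random() < l_j:
        return np.zeros(d), l_j                       # point mass at zero
    cov = sigma2 * (Sigma_j + Sigma_j.T) / 2          # symmetrize against round-off
    return rng.multivariate_normal(mu_j, cov), l_j

def update_pi0(alpha, a, b, rng):
    """pi0 | rest ~ Beta(p + a - sum Q_j, b + sum Q_j); alpha has shape (p, d)."""
    p = alpha.shape[0]
    q_sum = int(np.any(alpha != 0, axis=1).sum())
    return rng.beta(a + p - q_sum, b + q_sum)

def update_sigma2(resid, alpha, zeta2, s, h, rng):
    """sigma2 | rest ~ Inverse-Gamma; resid = Y - Z alpha - E beta."""
    n, d = resid.size, alpha.shape[1]
    q_sum = int(np.any(alpha != 0, axis=1).sum())
    shape = 0.5 * n + 0.5 * d * q_sum + s
    rate = 0.5 * resid @ resid + 0.5 * np.sum(np.sum(alpha**2, axis=1) / zeta2) + h
    return 1.0 / rng.gamma(shape, 1.0 / rate)         # reciprocal of a Gamma draw
```

Working with `np.logaddexp` keeps $l_j$ finite even when the quadratic form $\mu_j^\top(\sigma^2\Sigma_j)^{-1}\mu_j$ is in the thousands, where a direct evaluation of the exponential would overflow.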

C.4. Posterior inference for BVC

C.4.1. Bayesian hierarchical model
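$$Y \mid \beta, \alpha, \sigma^2, \zeta_j^2 \sim \text{N}_n(E\beta + Z\alpha, \sigma^2 I_n),$$
$$\alpha_j \mid \zeta_j^2, \sigma^2 \sim \text{N}_d(0, \sigma^2\zeta_j^2 I_d), \quad j = 1, \dots, p,$$
$$\zeta_j^2 \mid \lambda^2 \sim \text{Gamma}\left(\frac{d+1}{2}, \frac{\lambda^2}{2}\right), \quad j = 1, \dots, p,$$
$$\sigma^2 \sim \text{Inverse-Gamma}(s, h),$$
$$\lambda^2 \sim \text{Gamma}(t, \psi),$$
$$\beta \sim \text{N}_q(0, \Sigma_\beta),$$
$$\alpha_0 \sim \text{N}_d(0, \Sigma_{\alpha_0}).$$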
C.4.2. Gibbs Sampler
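  • The full conditional distribution of $\alpha_j$, $j = 1, \dots, p$, is
    $$\begin{aligned}
    p(\alpha_j \mid \text{rest}) &\propto \exp\left(-\frac{1}{2\sigma^2}\|Y - Z\alpha - E\beta\|^2\right) \exp\left(-\frac{1}{2}\alpha_j^\top(\sigma^2\zeta_j^2 I_d)^{-1}\alpha_j\right) \\
    &\propto \exp\left(-\frac{1}{2\sigma^2}\left(\alpha_j^\top Z_j^\top Z_j\alpha_j - 2\alpha_j^\top Z_j^\top(Y - E\beta - Z_{-j}\alpha_{-j})\right)\right) \exp\left(-\frac{1}{2}\alpha_j^\top(\sigma^2\zeta_j^2 I_d)^{-1}\alpha_j\right) \\
    &\propto \exp\left(-\frac{1}{2\sigma^2}\left(\alpha_j^\top(Z_j^\top Z_j + \zeta_j^{-2} I_d)\alpha_j - 2\alpha_j^\top Z_j^\top(Y - E\beta - Z_{-j}\alpha_{-j})\right)\right).
    \end{aligned}$$
    Denote $\Sigma_j = (Z_j^\top Z_j + \zeta_j^{-2} I_d)^{-1}$ and $\mu_j = \Sigma_j Z_j^\top(Y - E\beta - Z_{-j}\alpha_{-j})$, then the posterior distribution of $\alpha_j$ is
    $$\alpha_j \mid \text{rest} \sim \text{N}_d(\mu_j, \sigma^2\Sigma_j), \quad j = 1, \dots, p.$$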
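  • The full conditional distribution of $\beta$ is
    $$\begin{aligned}
    p(\beta \mid \text{rest}) &\propto \exp\left(-\frac{1}{2\sigma^2}\|Y - E\beta - Z\alpha\|^2\right) \times \exp\left(-\frac{1}{2}\beta^\top\Sigma_\beta^{-1}\beta\right) \\
    &\propto \exp\left(-\frac{1}{2}\left(\beta^\top\left(\frac{E^\top E}{\sigma^2} + \Sigma_\beta^{-1}\right)\beta - \frac{2}{\sigma^2}(Y - Z\alpha)^\top E\beta\right)\right),
    \end{aligned}$$
    and we have
    $$\beta \mid \text{rest} \sim \text{N}_q(\tilde{\mu}_\beta, \tilde{\Sigma}_\beta),$$
    which is a multivariate normal distribution, with mean
    $$\tilde{\mu}_\beta = \left(\frac{E^\top E}{\sigma^2} + \Sigma_\beta^{-1}\right)^{-1}\left(\frac{1}{\sigma^2}(Y - Z\alpha)^\top E\right)^\top$$
    and covariance
    $$\tilde{\Sigma}_\beta = \left(\frac{E^\top E}{\sigma^2} + \Sigma_\beta^{-1}\right)^{-1},$$
    where $\Sigma_\beta$ on the right-hand side is the prior covariance of $\beta$.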
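  • The full conditional distribution of $\zeta_j^2$, $j = 1, \dots, p$, is
    $$\begin{aligned}
    p(\zeta_j^2 \mid \text{rest}) &\propto (2\pi\sigma^2\zeta_j^2)^{-\frac{d}{2}} \exp\left(-\frac{1}{2}\alpha_j^\top(\sigma^2\zeta_j^2 I_d)^{-1}\alpha_j\right) \times (\zeta_j^2)^{\frac{d-1}{2}} \exp\left(-\frac{1}{2}\lambda^2\zeta_j^2\right) \\
    &\propto (\zeta_j^2)^{-\frac{1}{2}} \exp\left(-\frac{1}{2}\left(\frac{\alpha_j^\top\alpha_j}{\sigma^2}\frac{1}{\zeta_j^2} + \lambda^2\zeta_j^2\right)\right),
    \end{aligned}$$
    therefore $(\zeta_j^2)^{-1} \mid \text{rest} \sim \text{Inverse-Gaussian}\left(\sqrt{\frac{\sigma^2\lambda^2}{\alpha_j^\top\alpha_j}},\ \lambda^2\right)$.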
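  • The full conditional distribution of $\lambda^2$ is
    $$\begin{aligned}
    p(\lambda^2 \mid \text{rest}) &\propto \prod_{j=1}^p \left(\left(\frac{\lambda^2}{2}\right)^{\frac{d+1}{2}} \exp\left(-\frac{\lambda^2}{2}\zeta_j^2\right)\right) \times (\lambda^2)^{t-1}\exp(-\psi\lambda^2) \\
    &\propto (\lambda^2)^{\frac{1}{2}(d+1)p + t - 1} \exp\left(-\left(\frac{1}{2}\sum_{j=1}^p \zeta_j^2 + \psi\right)\lambda^2\right),
    \end{aligned}$$
    then,
    $$\lambda^2 \mid \text{rest} \sim \text{Gamma}\left(\frac{1}{2}(d+1)p + t,\ \frac{1}{2}\sum_{j=1}^p \zeta_j^2 + \psi\right).$$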
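  • The full conditional distribution of $\sigma^2$ is
    $$\begin{aligned}
    p(\sigma^2 \mid \text{rest}) &\propto (\sigma^2)^{-\frac{n}{2}} \exp\left(-\frac{1}{2\sigma^2}\|Y - Z\alpha - E\beta\|^2\right) \times \left(\frac{1}{\sigma^2}\right)^{s+1} \exp\left(-\frac{h}{\sigma^2}\right) \times \prod_{j=1}^p (2\pi\sigma^2\zeta_j^2)^{-\frac{d}{2}} \exp\left(-\frac{1}{2}\alpha_j^\top(\sigma^2\zeta_j^2 I_d)^{-1}\alpha_j\right) \\
    &\propto (\sigma^2)^{-\frac{n}{2} - \frac{dp}{2} - s - 1} \exp\left(-\frac{1}{\sigma^2}\left(\frac{1}{2}\|Y - Z\alpha - E\beta\|^2 + \frac{1}{2}\sum_{j=1}^p (\zeta_j^2)^{-1}\alpha_j^\top\alpha_j + h\right)\right).
    \end{aligned}$$
    Therefore, the posterior distribution of $\sigma^2$ is
    $$\sigma^2 \mid \text{rest} \sim \text{Inverse-Gamma}\left(\frac{n + dp}{2} + s,\ \frac{1}{2}\|Y - Z\alpha - E\beta\|^2 + \frac{1}{2}\sum_{j=1}^p (\zeta_j^2)^{-1}\alpha_j^\top\alpha_j + h\right).$$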
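  • The full conditional distribution of $\alpha_0$ is derived as
    $$\alpha_0 \mid \text{rest} \sim \text{N}_d(\mu_0, \Sigma_0),$$
    where $\Sigma_0 = \left(\frac{Z_0^\top Z_0}{\sigma^2} + \Sigma_{\alpha_0}^{-1}\right)^{-1}$ and $\mu_0 = \Sigma_0\left(\frac{1}{\sigma^2}(Y - E\beta - Z_{-0}\alpha_{-0})^\top Z_0\right)^\top$.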
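
Putting the conditionals of this subsection together, a compact sampler might look like the sketch below. It is illustrative rather than the authors' implementation: the name `bvc_gibbs`, the `(n, p, d)` layout of the basis matrices $Z_j$, the identity prior covariance for $\beta$, the unit default hyperparameters, and the omission of the $\alpha_0$ block are all assumptions made to keep the example short.

```python
import numpy as np

def bvc_gibbs(Y, E, Z, n_iter, s=1.0, h=1.0, t=1.0, psi=1.0, seed=0):
    """Gibbs sampler sketch; returns draws of alpha with shape (n_iter, p, d)."""
    rng = np.random.default_rng(seed)
    n, p, d = Z.shape
    q = E.shape[1]
    prior_prec_beta = np.eye(q)                   # assumed Sigma_beta^{-1} = I_q
    beta, alpha = np.zeros(q), np.zeros((p, d))
    zeta2, sigma2, lam2 = np.ones(p), 1.0, 1.0
    draws = np.empty((n_iter, p, d))
    for it in range(n_iter):
        # alpha_j | rest ~ N_d(mu_j, sigma2 * Sigma_j)
        for j in range(p):
            Zj = Z[:, j, :]
            others = E @ beta + np.einsum("ipd,pd->i", Z, alpha) - Zj @ alpha[j]
            Sigma_j = np.linalg.inv(Zj.T @ Zj + np.eye(d) / zeta2[j])
            Sigma_j = (Sigma_j + Sigma_j.T) / 2   # remove round-off asymmetry
            mu_j = Sigma_j @ (Zj.T @ (Y - others))
            alpha[j] = rng.multivariate_normal(mu_j, sigma2 * Sigma_j)
        # beta | rest ~ N_q(mean, cov)
        Za = np.einsum("ipd,pd->i", Z, alpha)
        cov_b = np.linalg.inv(E.T @ E / sigma2 + prior_prec_beta)
        cov_b = (cov_b + cov_b.T) / 2
        beta = rng.multivariate_normal(cov_b @ (E.T @ (Y - Za)) / sigma2, cov_b)
        # 1 / zeta_j^2 | rest ~ Inverse-Gaussian(sqrt(sigma2 lam2 / alpha_j' alpha_j), lam2)
        aa = np.maximum(np.sum(alpha**2, axis=1), 1e-8)   # numerical guard
        zeta2 = 1.0 / rng.wald(np.sqrt(sigma2 * lam2 / aa), lam2)
        # lam2 | rest ~ Gamma((d + 1) p / 2 + t, rate); NumPy takes scale = 1 / rate
        lam2 = rng.gamma(0.5 * (d + 1) * p + t, 1.0 / (0.5 * zeta2.sum() + psi))
        # sigma2 | rest ~ Inverse-Gamma((n + d p) / 2 + s, rate)
        resid = Y - E @ beta - Za
        rate = 0.5 * resid @ resid + 0.5 * np.sum(aa / zeta2) + h
        sigma2 = 1.0 / rng.gamma(0.5 * (n + d * p) + s, 1.0 / rate)
        draws[it] = alpha
    return draws
```

A typical use would discard an initial burn-in portion of `draws` and average the remainder to estimate each $\alpha_j$. Because this model has no point mass at zero, the draws are never exactly zero, so selection would need a separate rule such as credible intervals.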
