Statistics and Computing. 2025 Aug 26;35(6):175. doi: 10.1007/s11222-025-10711-w

Bayesian additive tree ensembles for composite quantile regressions

Yaeji Lim 1, Ruijin Lu 2, Madeleine St Ville 3, Zhen Chen 3
PMCID: PMC12380950  PMID: 40880753

Abstract

In this paper, we introduce a novel approach that integrates Bayesian additive regression trees (BART) with the composite quantile regression (CQR) framework, creating a robust method for modeling complex relationships between predictors and outcomes under various error distributions. Unlike traditional quantile regression, which focuses on specific quantile levels, our proposed method, composite quantile BART, offers greater flexibility in capturing the entire conditional distribution of the response variable. By leveraging the strengths of BART and CQR, the proposed method provides enhanced predictive performance, especially in the presence of heavy-tailed errors and non-linear covariate effects. Numerical studies confirm that the proposed composite quantile BART method generally outperforms classical BART, quantile BART, and composite quantile linear regression models in terms of RMSE, especially under heavy-tailed or contaminated error distributions. Notably, under contaminated normal errors, it reduces RMSE by approximately 17% compared to composite quantile regression, and by 27% compared to classical BART.

Keywords: Bayesian additive regression trees, Composite quantile regression, Heavy-tailed errors, Non-linear covariate effects

Introduction

Quantile methods are popular approaches for handling non-Gaussian data in the regression modeling framework (Koenker and Hallock 2001; Hao and Naiman 2007). Unlike ordinary least squares (OLS) regression, which focuses on the mean outcome given predictor variables, quantile regression (QR) examines the effects of covariates on the entire distribution of the response variable, offering a more comprehensive characterization of the data. This method provides robust results in the presence of heavy-tailed errors or outliers (Koenker 2005), making it a valuable tool for many applied settings. QR models have been successfully applied in epidemiology (Lee and Neocleous 2010; Wei et al. 2019), climatology (Haugen et al. 2018; Reich et al. 2011), and economics (Fitzenberger et al. 2013; Marrocu et al. 2015).

Except in situations where a particular quantile level is of interest (e.g., growth percentiles (Wei et al. 2006; Chen and Müller 2012)), the choice of appropriate quantile levels in quantile analysis affects the relative efficiency of the estimators, presenting a challenge in practical applications (Koenker and Bassett Jr 1978; Zhao and Xiao 2014). Zheng et al. (2015) also highlighted challenges in traditional quantile regression approaches that focus on estimating covariate effects at a single or a few prespecified quantile levels, noting the lack of a clear scientific basis for choosing one quantile over a nearby alternative. Additionally, it has been shown that QR can have arbitrarily low relative efficiency compared to OLS. To overcome the drawbacks of traditional QR, Zou and Yuan (2008) proposed a composite quantile regression (CQR) method to address multiple quantile regression models concurrently. Since then, significant efforts have been made to extend CQR. Jiang et al. (2012) pointed out that applying the same weight across different quantile levels is typically suboptimal and introduced weighted CQR (WCQR), which was later enhanced by Zhao and Lian (2016) to improve its efficiency. Unlike traditional QR, CQR does not require selecting specific quantiles and retains the robustness and other desirable properties of the quantile method (Zhao and Xiao 2014). In addition, when the error variance is finite, CQR still enjoys great advantages in terms of estimation efficiency. This approach enhances the flexibility and efficiency of the QR framework, allowing for a more holistic analysis of the response variable's distribution. Huang and Chen (2015), Xu et al. (2017), and Yuan et al. (2023) are a few examples where CQR has been shown to outperform regular QR.

Both frequentist and Bayesian approaches to QR and CQR are abundant. Under the frequentist framework, Taylor (2000) proposed a quantile regression neural network (QRNN) to estimate the conditional probability distribution of multiperiod financial returns, and Jiang et al. (2013) extended a CQR estimation procedure for single index models. Galvao and Kato (2016) and Powell (2022) developed QR models for panel data. Bayesian approaches to QR make use of the equivalence between the minimization of the loss with the quantile check function of Koenker (2005) and maximization of likelihood function with an asymmetric Laplace distribution (ALD) error term (Yu and Moyeed 2001; Sriram et al. 2013). Using the mixture representation proposed by Kozumi and Kobayashi (2011), a Gibbs sampler for Bayesian QR can be implemented. This approach facilitates the estimation of complex models and the incorporation of prior information, enhancing the overall inferential process. For example, Li et al. (2010) explored regularization in Bayesian QR, while Yang et al. (2016) established the asymptotic validity of posterior inference for pseudo-Bayesian QR methods using an asymmetric Laplace likelihood. Additionally, Benoit and Van den Poel (2017) developed an R package for estimating QR parameters through a Bayesian approach based on the asymmetric Laplace distribution. These developments illustrate the potential of Bayesian methods to address various challenges in QR, including model complexity and computational efficiency. For the composite quantile model, Huang and Chen (2015) proposed a WCQR model where the weight of each component can be treated as an unknown parameter and estimated via Markov chain Monte Carlo (MCMC) sampling in a Bayesian hierarchical framework. This approach allows for greater flexibility in modeling and can lead to improved predictive performance by optimally combining information from multiple quantiles.

In many applications, the linearity assumption between covariates and conditional quantiles might not hold. In these situations, semi- or nonparametric approaches are attractive alternatives. For example, Koenker et al. (1994) considered regression spline approaches for estimating the conditional QR and Yu and Jones (1998) proposed local linear polynomial QR. Similarly, Kai et al. (2010) developed the local polynomial CQR estimators and proved their efficiency for non-normal error distributions. For Bayesian approaches, Thompson et al. (2010) proposed a nonparametric Bayesian QR method using natural cubic splines, offering a flexible alternative to parametric Bayesian QR models, particularly when the linearity assumption fails. More recently, Xu and Reich (2023) proposed a nonlinear simultaneous QR model by specifying a Bayesian nonparametric model for the conditional distribution.

Recently, Bayesian regression trees and their ensembles have demonstrated enhanced predictive performance in least squares regression, binary classification, and multiclass classification contexts. These methods have garnered significant attention due to their flexibility and ability to model complex relationships between variables. Notably, Bayesian additive regression trees (BART) (Chipman et al. 2012) estimate the conditional mean of a response given a set of predictors using a sum of regression trees model, showing remarkable predictive performance across various applications (Sparapani et al. 2016; Zhang et al. 2020). This approach leverages the power of multiple regression trees to capture non-linear interactions and intricate patterns in data, making it a robust tool for various statistical modeling tasks. Furthermore, Linero (2018) demonstrated the utility of the Dirichlet splitting probability prior within the BART framework for both prediction and variable selection problems. Additionally, Linero and Yang (2018) introduced soft decision trees and sparsity-inducing priors in BART, illustrating their promising performance. Several extensions of BART have been proposed to handle different types of outcomes. Notably, Kindo (2016) developed BART-based methods for multinomial, ordinal, and quantile regression as part of his dissertation. Subsequent works further advanced these directions, including multinomial probit BART (Xu et al. 2024) and BART for ordinal outcomes (Lee and Hwang 2024). Basak et al. (2022) developed BART for censored survival data, and Um et al. (2023) extended BART to multivariate skewed responses. These extensions have broadened the applicability of BART, enabling it to address a wider range of statistical challenges beyond traditional regression. Although BART has primarily been used for mean regression, several recent studies have extended it to quantile or tail estimation. For example, Clark et al. (2023) used BART-based vector autoregressions to perform real-time tail forecasting of GDP growth, inflation, and unemployment. In addition, Clark et al. (2024) employed a nonparametric quantile panel regression model that uses BART to capture nonlinear effects, and Baumeister et al. (2024) proposed a mixture BART model with stochastic volatility for forecasting the tails of the conditional distribution. In contrast to these applications, Kindo et al. (2016) developed a quantile version of BART (QBART) by incorporating the asymmetric Laplace distribution into the model, allowing for direct estimation of conditional quantiles. They demonstrated its superiority over linear quantile regression and quantile random forests.

In this paper, we consider the integration of BART into the composite quantile method, creating a BART for composite quantile regression (BART-CQR) to improve prediction accuracy under a wide range of error distributions. Rather than targeting a specific conditional quantile, BART-CQR aims to produce robust and efficient estimates of the conditional mean by aggregating information across multiple quantile levels. This aligns with the original motivation of CQR (Zou and Yuan 2008), which seeks robustness against heavy-tailed and non-Gaussian errors while maintaining estimation efficiency. Related work by Cao et al. (2024) proposed an adaptive trimmed regression approach based on BART, which enhances robustness by incorporating data-driven tuning parameters. However, their method focuses on effectively identifying suspected outliers and removing them from the analysis. In contrast, our BART-CQR method takes a different approach to robustness: rather than detecting and trimming outliers, it leverages the composite quantile framework, which inherently provides robustness to distributional misspecification without excluding any observations. This design enables BART-CQR to utilize the full dataset while maintaining efficiency and robustness in estimating the conditional mean.

Our approach extends the work of QBART by Kindo et al. (2016), which focused on single quantile regression, by generalizing it to composite quantile regression. Therefore, our focus is on accurate and robust prediction of the conditional mean, not on modeling specific quantiles. Our method also builds upon existing Bayesian composite quantile methods (Huang and Chen 2015; Alhamzawi 2016), which are based on linear models, by introducing a flexible additive regression tree framework to better capture nonlinear relationships between outcomes and predictors. This model is a fully Bayesian framework for constructing composite quantile regression trees and their ensembles. Through numerical studies, we verify that the BART-CQR outperforms the classical BART, QBART and CQR models under heavy-tailed distributions and in the presence of non-linearity between the input and output variables. This indicates that the proposed method combines the benefits of ensemble trees and the composite quantile approach, offering a powerful tool for handling complex and heavy-tailed data distributions.

The rest of the article is organized as follows. Section 2 introduces the BART-CQR method in detail, explaining its underlying principles and key features. Section 3 provides details on the posterior sampling of BART-CQR. Section 4 presents simulation results, and Section 5 illustrates a real data application. Conclusion and discussion are presented in Section 6. The R codes for implementing the numerical experiments in this study are available at https://github.com/yaeji-lim/BARTCQR.

BART for composite quantile regressions

Bayesian QR and CQR

Given $(y_i, x_i)$ for $i=1,\dots,n$, where $y_i \in \mathbb{R}$ and $x_i \in \mathbb{R}^p$, consider the following regression model:

$$y_i = \beta_0 + x_i^T \beta + \epsilon_i, \qquad (1)$$

where $\beta_0$ is an intercept, $\beta = (\beta_1,\dots,\beta_p)^T$ is the vector of unknown coefficients, and $\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathrm{ALD}(\tau,\theta)$. The density function of $\mathrm{ALD}(\tau,\theta)$ is $f_\theta(x\mid\tau) = \theta(1-\theta)\tau\, e^{-\tau\rho_\theta(x)}$, where $\theta \in (0,1)$ is an asymmetry parameter, $\tau$ is a precision parameter, and $\rho_\theta(t) = t(\theta - I(t<0))$ is the check loss function. Then, the conditional $\theta$th quantile of $y_i \mid x_i$ is

$$\beta_0 + x_i^T\beta + q_{\epsilon_i} := b_\theta + x_i^T\beta,$$

where $q_{\epsilon_i}$ is the $\theta$th quantile of $\epsilon_i$, and QR estimates the coefficients by solving the following minimization:

$$(\hat b_\theta, \hat\beta) = \operatorname*{arg\,min}_{b_\theta,\,\beta} \sum_{i=1}^n \rho_\theta\!\left(y_i - b_\theta - x_i^T\beta\right). \qquad (2)$$

The minimization is exactly equivalent to the maximization of a likelihood function,

$$\prod_{i=1}^n \left\{\theta(1-\theta)\tau \exp\!\left[-\tau\rho_\theta\!\left(y_i - b_\theta - x_i^T\beta\right)\right]\right\}. \qquad (3)$$
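To make this equivalence concrete, the following R sketch (our own illustration, not the authors' code; the function names are ours) evaluates the check loss and the corresponding ALD log-likelihood and shows that, for fixed $\tau$ and $\theta$, the log-likelihood is a constant minus $\tau$ times the total check loss.

```r
# Check (pinball) loss rho_theta(t) = t * (theta - 1{t < 0})
rho <- function(t, theta) t * (theta - (t < 0))

# ALD(tau, theta) log-density evaluated at a residual t:
# log f = log(theta) + log(1 - theta) + log(tau) - tau * rho_theta(t)
ald_loglik <- function(t, tau, theta) {
  log(theta) + log(1 - theta) + log(tau) - tau * rho(t, theta)
}

# For fixed tau and theta, maximizing (3) is the same as minimizing (2):
set.seed(1)
resid <- rnorm(10)
sum(ald_loglik(resid, tau = 1, theta = 0.5))
10 * (log(0.5) + log(0.5) + log(1)) - sum(rho(resid, 0.5))  # identical value
```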

The minimization criterion (2) can be extended to the weighted CQR with multiple quantile levels $\theta_1,\dots,\theta_K$ as follows:

$$\operatorname*{arg\,min}_{(b_{\theta_1},\dots,b_{\theta_K}),\,\beta} \sum_{i=1}^n \sum_{k=1}^K \omega_k\, \rho_{\theta_k}\!\left(y_i - b_{\theta_k} - x_i^T\beta\right),$$

where $0 \le \omega_k \le 1$ is the weight for the $k$th component with $\sum_k \omega_k = 1$. We extend the likelihood (3) to the composite model as:

$$\prod_{i=1}^n \left\{\sum_{k=1}^K \omega_k\, \theta_k(1-\theta_k)\tau \exp\!\left[-\tau\rho_{\theta_k}\!\left(y_i - b_{\theta_k} - x_i^T\beta\right)\right]\right\}.$$

Because this mixture likelihood is difficult to handle directly, it is common to introduce a cluster matrix $C$, whose $(i,k)$th element, the latent variable $C_{ik}$, equals 1 if the $i$th subject belongs to the $k$th cluster and 0 otherwise. We assume that each observation belongs to exactly one cluster, i.e., $\sum_{k=1}^K C_{ik} = 1$ for each $i$. The complete likelihood is then:

$$\prod_{i=1}^n \left[P(y_i\mid C_i)\times P(C_i)\right] = \prod_{i=1}^n \prod_{k=1}^K \left\{\omega_k\, \theta_k(1-\theta_k) \exp\!\left[-\tau\rho_{\theta_k}\!\left(y_i - b_{\theta_k} - x_i^T\beta\right)\right]\right\}^{C_{ik}}, \qquad (4)$$

where $C_i = (C_{i1},\dots,C_{iK})$.

We place a Laplace prior on β for regularization:

$$\pi(\beta\mid\tau,\lambda) = \left(\frac{\tau\lambda}{2}\right)^{p} \exp\!\left(-\tau\lambda\sum_{j=1}^p |\beta_j|\right),$$

and the prior can be further represented as

$$\pi(\beta\mid\eta^2) = \prod_{j=1}^p \int_0^\infty \frac{1}{\sqrt{2\pi s_j}} \exp\!\left(-\frac{\beta_j^2}{2 s_j}\right) \frac{\eta^2}{2} \exp\!\left(-\frac{\eta^2}{2} s_j\right) ds_j,$$

where $\eta := \tau\lambda$. For $\pi(\omega)$, we assume $\pi(\omega) = \mathrm{Dirichlet}(\alpha_1,\dots,\alpha_K)$ with $\omega = (\omega_1,\dots,\omega_K)^T$. The priors for $\tau$ and $\eta^2$ are assumed to follow gamma distributions.

Then the posterior distribution is given by:

$$\prod_{i=1}^n \prod_{k=1}^K \left\{\omega_k\, \theta_k(1-\theta_k) \exp\!\left[-\tau\rho_{\theta_k}\!\left(y_i - b_{\theta_k} - x_i^T\beta\right)\right]\right\}^{C_{ik}} \times \pi(\beta\mid\eta^2)\,\pi(\tau,\eta^2)\,\pi(\omega).$$

To obtain closed-form conditional distributions, we use the representation of ALD as a mixture of an exponential and a scaled normal distribution (Kozumi and Kobayashi 2011). The regression model (1) can then be expressed as:

$$y_i = \beta_0 + x_i^T\beta + \tau^{-1}\xi_1\nu_i + \tau^{-1}\xi_2\sqrt{\nu_i}\, z_i = \beta_0 + x_i^T\beta + \xi_1\tilde\nu_i + \tau^{-1/2}\xi_2\sqrt{\tilde\nu_i}\, z_i,$$

where $\xi_1 = \frac{1-2\theta}{\theta(1-\theta)}$ and $\xi_2^2 = \frac{2}{\theta(1-\theta)}$. Here, $\nu_i \sim \mathrm{Exp}(1)$, $\tilde\nu_i := \tau^{-1}\nu_i$ (exponential with mean $\tau^{-1}$), and $z_i \sim N(0,1)$ are independent. The hierarchical model for MCMC sampling is then:

$$
\begin{aligned}
y_i &= b_{\theta_k} + x_i^T\beta + \xi_{1k}\tilde\nu_i + \tau^{-1/2}\xi_{2k}\sqrt{\tilde\nu_i}\, z_i, \quad \text{for all } i \text{ such that } C_{ik}=1,\\
\tilde\nu \mid \tau &\sim \prod_{i=1}^n \tau\exp(-\tau\tilde\nu_i) \quad \text{for } \tilde\nu=(\tilde\nu_1,\dots,\tilde\nu_n)^T,\\
z &\sim \prod_{i=1}^n \frac{1}{\sqrt{2\pi}}\exp\!\left(-\tfrac12 z_i^2\right) \quad \text{for } z=(z_1,\dots,z_n)^T,\\
\beta, s \mid \eta^2 &\sim \prod_{j=1}^p \frac{1}{\sqrt{2\pi s_j}}\exp\!\left(-\frac{\beta_j^2}{2 s_j}\right) \prod_{j=1}^p \frac{\eta^2}{2}\exp\!\left(-\frac{\eta^2}{2} s_j\right) \quad \text{for } s=(s_1,\dots,s_p)^T,\\
(\tau,\eta^2) &\sim \tau^{a_\tau-1}\exp(-b_\tau\tau)\,(\eta^2)^{a_\eta-1}\exp(-b_\eta\eta^2),\\
\omega &\sim \mathrm{Dirichlet}(\alpha_1,\dots,\alpha_K),
\end{aligned}
$$

where $a_\tau, b_\tau, a_\eta, b_\eta$ are hyperparameters.
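The exponential-normal mixture representation can be verified by simulation; the R sketch below (our own illustration, not the authors' code) draws ALD errors both through the mixture $\xi_1\tilde\nu + \tau^{-1/2}\xi_2\sqrt{\tilde\nu}\,z$ and directly via the inverse CDF implied by the density $\theta(1-\theta)\tau e^{-\tau\rho_\theta(x)}$, and compares the empirical quantiles of the two samples.

```r
# Draw from ALD(tau, theta) via the exponential-normal mixture:
#   eps = xi1 * nu_t + sqrt(nu_t / tau) * xi2 * z,
# with nu_t ~ Exp(rate = tau) and z ~ N(0, 1).
rald_mix <- function(n, tau, theta) {
  xi1 <- (1 - 2 * theta) / (theta * (1 - theta))
  xi2 <- sqrt(2 / (theta * (1 - theta)))
  nu_t <- rexp(n, rate = tau)
  z <- rnorm(n)
  xi1 * nu_t + sqrt(nu_t / tau) * xi2 * z
}

# Direct draw via the inverse CDF of the ALD(tau, theta) density
# theta * (1 - theta) * tau * exp(-tau * rho_theta(x)).
rald_inv <- function(n, tau, theta) {
  u <- runif(n)
  ifelse(u < theta,
         log(u / theta) / (tau * (1 - theta)),
         -log((1 - u) / (1 - theta)) / (tau * theta))
}

set.seed(1)
qs <- seq(0.05, 0.95, by = 0.1)
round(rbind(mixture = quantile(rald_mix(1e5, 1, 0.9), qs),
            direct  = quantile(rald_inv(1e5, 1, 0.9), qs)), 2)
```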

Denote $y=(y_1,\dots,y_n)^T$, $X=(x_1,\dots,x_n)^T$, and $b=(b_{\theta_1},\dots,b_{\theta_K})^T$. The complete likelihood based on the ALD form is:

$$f(y\mid X,\beta,s,\eta^2,\tau,\tilde\nu,b,\omega,C) = \prod_{i=1}^n\prod_{k=1}^K \left(\frac{1}{\sqrt{2\pi\,\tau^{-1}\xi_{2k}^2\tilde\nu_i}}\right)^{C_{ik}} \exp\!\left[-\frac12\sum_{i=1}^n\sum_{k=1}^K C_{ik}\,\frac{\left(y_i - b_{\theta_k} - x_i^T\beta - \xi_{1k}\tilde\nu_i\right)^2}{\tau^{-1}\xi_{2k}^2\tilde\nu_i}\right].$$

BART-CQR

To allow a semiparametric relationship between outcome and predictors, consider the following regression model:

$$y_i = h(x_i) + \epsilon_i, \quad i=1,\dots,n, \qquad (5)$$

where $h$ is an unknown function and $\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathrm{ALD}(\tau,\theta)$. The idea of applying BART to CQR is to model (5) as

$$y_i = b_{\theta_k} + \sum_{j} g(x_i; T_j, M_j) + \epsilon_i, \quad \text{for all } i \text{ such that } C_{ik}=1,$$

where $(T,M) = \{(T_j, M_j);\, j=1,\dots,n_T\}$, with $T_j$ and $M_j$ being the parameters of the $j$th tree in the BART model.

We assume that the priors on any two distinct trees in the sum are independent and that the prior on $\tau$ is independent of the tree priors. We further assume that, given a tree, the priors on its terminal node parameters are independent. Therefore,

$$
\begin{aligned}
p(T,M,\tilde\nu,\tau) &= \left[\prod_{j=1}^{n_T} p(T_j, M_j)\right] p(\tilde\nu\mid\tau)\, p(\tau)
= \left[\prod_{j=1}^{n_T} p(T_j)\, p(M_j\mid T_j)\right] p(\tilde\nu\mid\tau)\, p(\tau)\\
&= \left[\prod_{j=1}^{n_T} p(T_j)\prod_{k=1}^{m_j} p(\mu_{jk}\mid T_j)\right] p(\tilde\nu\mid\tau)\, p(\tau),
\end{aligned}
$$

where $M_j = (\mu_{j1},\dots,\mu_{jm_j})$, $m_j$ is the number of terminal nodes of tree $T_j$, and $n_T$ is the number of trees in the sum.

For the prior $p(T_j)$, we follow the tree-generating stochastic process of Chipman et al. (2012):

$$P(\text{split at depth } d) = \begin{cases} 1, & d = 0,\\ \alpha_T (1+d)^{-\beta_T}, & d > 0, \end{cases}$$

where $\alpha_T \in (0,1)$ and $\beta_T \in [0,\infty)$. In addition, we use uniform distributions both for the splitting variable and for the splitting rule.

Given a tree $T_j$, the prior on the terminal node parameters is a Gaussian distribution, $p(\mu_{jk}\mid T_j) \sim N(\mu_\mu, \sigma_\mu^2)$, for $k=1,\dots,m_j$. The hyperparameters $\mu_\mu$ and $\sigma_\mu^2$ are selected so that the overall effect induced by the prior distributions is in the interval $(y_{\min}, y_{\max})$ with high probability. As in Kindo et al. (2016), we use the transformation $\tilde y = \frac{y - y_{\min}}{y_{\max} - y_{\min}} - 0.5$, ensuring that the transformed response lies in the $(-0.5, 0.5)$ interval. Consequently, $p(\mu_{jk}\mid T_j) \sim N\!\left(0, \frac{1}{4\kappa^2 n_T}\right)$, for $k=1,\dots,m_j$. We set $\kappa = 2$, which has been found to yield good results, though it can be optimized through cross-validation.
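A small R sketch (our own illustration) of this response scaling and the induced prior standard deviation for the terminal node parameters, under the default $\kappa = 2$ and $n_T = 200$:

```r
# Rescale the response to (-0.5, 0.5), as in Kindo et al. (2016).
scale_y <- function(y) (y - min(y)) / (max(y) - min(y)) - 0.5

# Prior for each terminal-node parameter: mu_jk ~ N(0, 1 / (4 kappa^2 nT)),
# so the sum of nT independent node contributions has sd 1 / (2 kappa) and
# stays within (-0.5, 0.5) with high probability.
kappa <- 2
nT <- 200
sigma_mu <- sqrt(1 / (4 * kappa^2 * nT))
sigma_mu            # prior sd of a single terminal-node parameter
sqrt(nT) * sigma_mu # sd of the sum of nT node contributions = 1 / (2 * kappa)
```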

Following the standard Bayesian approach to CQR, we use the expression of the ALD as a mixture of an exponential and a scaled normal distribution and obtain the following hierarchical model for BART-CQR:

$$
\begin{aligned}
y_i &= b_{\theta_k} + \sum_{j} g(x_i; T_j, M_j) + \xi_{1k}\tilde\nu_i + \tau^{-1/2}\xi_{2k}\sqrt{\tilde\nu_i}\, z_i, \quad \text{for all } i \text{ such that } C_{ik}=1,\\
\tilde\nu \mid \tau &\sim \prod_{i=1}^n \tau\exp(-\tau\tilde\nu_i), \qquad \tau \sim \tau^{a_\tau-1}\exp(-b_\tau\tau),\\
z &\sim \prod_{i=1}^n \frac{1}{\sqrt{2\pi}}\exp\!\left(-\tfrac12 z_i^2\right), \qquad \omega \sim \mathrm{Dirichlet}(\alpha_1,\dots,\alpha_K),
\end{aligned} \qquad (6)
$$

where $\xi_{1k} = \frac{1-2\theta_k}{\theta_k(1-\theta_k)}$ and $\xi_{2k}^2 = \frac{2}{\theta_k(1-\theta_k)}$.

Posterior Inference

The posterior computation is carried out using a Gibbs sampling algorithm. Based on the hierarchical model in (6), the posterior updating scheme proceeds through the following six steps:

$$
\begin{aligned}
&f(\tilde\nu_i\mid T,M,Y,X,\tau,\tilde\nu_{(-i)},b,C,\omega) \quad \text{for } i=1,\dots,n,\\
&f((T_j,M_j)\mid T_{(-j)},M_{(-j)},Y,X,\tau,\tilde\nu,b,C,\omega) \quad \text{for } j=1,\dots,n_T,\\
&f(\tau\mid T,M,Y,X,\tilde\nu,b,C,\omega),\\
&f(C_i\mid T,M,Y,X,\tau,\tilde\nu,b,C_{-i},\omega) \quad \text{for } i=1,\dots,n,\\
&f(\omega\mid T,M,Y,X,\tau,\tilde\nu,b,C),\\
&f(b_{\theta_k}\mid T,M,Y,X,\tau,\tilde\nu,b_{-\theta_k},C,\omega) \quad \text{for } k=1,\dots,K.
\end{aligned}
$$

The full conditional distribution of ν~i follows a Generalized Inverse Gaussian distribution. The distribution of the precision parameter τ is Gamma, and the latent class assignments Ci are drawn from a Multinomial distribution. For the mixture weights ω, we use a Dirichlet distribution, and the intercept bθk has a normal full conditional distribution.
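For illustration, the R sketch below (our own, not the authors' implementation) shows how these non-tree full conditionals could be drawn. It assumes the GIGrvg package for the generalized inverse Gaussian draw; the parameters $(\delta_{1i}, \delta_{2i})$ and the Gamma shape/rate come from the full conditionals derived in Appendix A, and the Dirichlet draw is implemented via normalized Gamma variates.

```r
# Non-tree Gibbs updates (sketch). Requires GIGrvg for rgig().
library(GIGrvg)

# (1) latent scales: nu_tilde_i ~ GIG(lambda = 1/2, chi = delta1_i, psi = delta2_i)
draw_nu <- function(delta1, delta2) {
  mapply(function(c1, c2) rgig(1, lambda = 0.5, chi = c1, psi = c2),
         delta1, delta2)
}

# (3) precision: tau ~ Gamma(shape, rate), shape/rate from its full conditional
draw_tau <- function(shape, rate) rgamma(1, shape = shape, rate = rate)

# (4) cluster indicator: C_i ~ Multinomial(1, p_hat) over the K quantile levels
draw_C <- function(p_hat) which(rmultinom(1, size = 1, prob = p_hat) == 1)

# (5) weights: omega ~ Dirichlet(alpha + n_k), via normalized Gamma draws
draw_omega <- function(alpha_post) {
  g <- rgamma(length(alpha_post), shape = alpha_post, rate = 1)
  g / sum(g)
}
```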

The most challenging part is drawing the regression trees, (Tj,Mj). Each pair is updated using a Bayesian backfitting strategy based on residuals from the other trees, similar to standard BART procedures. First define

$$r_{ij} := y_i - b_{\theta_k} - \sum_{l\neq j} g(x_i; T_l, M_l) - \xi_{1k}\tilde\nu_i = g(x_i; T_j, M_j) + \tau^{-1/2}\xi_{2k}\sqrt{\tilde\nu_i}\, z_i, \quad \text{for all } i \text{ such that } C_{ik}=1. \qquad (7)$$

Then, drawing $(T_j, M_j)$ is equivalent to drawing from a single regression tree $r_{ij} = g(x_i; T_j, M_j) + \tau^{-1/2}\xi_{2k}\sqrt{\tilde\nu_i}\, z_i$ for $i=1,\dots,n$. In the tree $T_j$, assume that there are $m_j$ terminal nodes and that $n_l$ observations fall into terminal node $l$ for $l=1,\dots,m_j$ (so $n_1+\dots+n_{m_j}=n$). Consider the set of observations that fall into terminal node $l$. Since for each $y_i$ there is exactly one $k$ with $C_{ik}=1$, we can associate the pair $(i,k)$ with each observation $i$. Denote the set of $(i,k)$ pairs that fall into terminal node $l$ by $G_l$. Further, define $r_j = (r_{j,1},\dots,r_{j,m_j})$ with $r_{j,l} = (r_{1,j,l},\dots,r_{n_l,j,l})$, where $r_{i,j,l}$ is an observation that falls into the $l$th terminal node of tree $T_j$ for $i \in G_l$, and we have $r_{i,j,l} \sim N(\mu_{jl}, \tau^{-1}\xi_{2k}^2\tilde\nu_i)$ for $(i,k)\in G_l$. Then the likelihood of the single tree in (7) is

$$f(r_j\mid X, T_j, M_j, \tau, \tilde\nu, b, C, \omega) = \prod_{l=1}^{m_j} f(r_{j,l}\mid X_l, T_j, M_j, \tau, \tilde\nu_l, b, C, \omega),$$

where

$$f(r_{j,l}\mid X_l, T_j, M_j, \tau, \tilde\nu_l, b, C, \omega) = \prod_{(i,k)\in G_l} \frac{1}{\sqrt{2\pi\,\tau^{-1}\xi_{2k}^2\tilde\nu_i}} \exp\!\left[-\frac12\frac{(r_{i,j,l}-\mu_{jl})^2}{\tau^{-1}\xi_{2k}^2\tilde\nu_i}\right].$$

Now, the draw from $f((T_j,M_j)\mid T_{(-j)}, M_{(-j)}, Y, X, \tau, \tilde\nu, b, C, \omega)$ can be done in two successive steps:

$$f(T_j\mid r_j, \tau, \tilde\nu, b, C, \omega), \qquad (8)$$
$$f(M_j\mid T_j, r_j, \tau, \tilde\nu, b, C, \omega). \qquad (9)$$

For (8), we use a Metropolis-Hastings algorithm. The density $f(T_j\mid r_j, \tau, \tilde\nu, b, C, \omega)$ can be derived as

$$
\begin{aligned}
f(T_j\mid r_j,\tau,\tilde\nu,b,C,\omega)
&\propto p(T_j)\int p(r_j\mid M_j,T_j,\tau,\tilde\nu,b,C,\omega)\, p(M_j\mid T_j,\tau,\tilde\nu,b,C,\omega)\, dM_j\\
&= p(T_j)\prod_{l}\left\{\left(\frac{1}{\sqrt{2\pi\tau^{-1}}}\right)^{n_l}\left(\prod_{(i,k)\in G_l}\xi_{2k}^{-1}\tilde\nu_i^{-1/2}\right)\exp\!\left(-\frac12\sum_{(i,k)\in G_l}\frac{r_{i,j,l}^2}{\tau^{-1}\xi_{2k}^2\tilde\nu_i}\right)\right.\\
&\qquad\left.\times\sqrt{\frac{\tau^{-1}}{\sigma_\mu^2\sum_{(i,k)\in G_l}\tilde\nu_i^{-1}\xi_{2k}^{-2}+\tau^{-1}}}\;\exp\!\left[\frac{\sigma_\mu^2\left(\sum_{(i,k)\in G_l}r_{i,j,l}\,\tilde\nu_i^{-1}\xi_{2k}^{-2}\right)^2}{2\tau^{-1}\left(\tau^{-1}+\sigma_\mu^2\sum_{(i,k)\in G_l}\tilde\nu_i^{-1}\xi_{2k}^{-2}\right)}\right]\right\}. \qquad (10)
\end{aligned}
$$

Given an updated tree, the $l$th terminal node parameter $\mu_{jl}$ of tree $T_j$ in (9) can be drawn as:

$$f(\mu_{jl}\mid T_j, r_j, \tau, \tilde\nu, b, C, \omega) \propto \exp\!\left[-\left(\sum_{(i,k)\in G_l}\frac{1}{2\tau^{-1}\xi_{2k}^2\tilde\nu_i}+\frac{1}{2\sigma_\mu^2}\right)\mu_{jl}^2 + 2\sum_{(i,k)\in G_l}\frac{r_{i,j,l}}{2\tau^{-1}\xi_{2k}^2\tilde\nu_i}\,\mu_{jl}\right],$$

which is a Gaussian distribution.

Complete derivations and the step-by-step sampling scheme are provided in Appendix A.

After running the algorithm sufficiently long past the burn-in period, we obtain a sequence of posterior draws of

$$f(x) = \sum_{j} g(x; T_j, M_j),$$

denoted $f^{(1)},\dots,f^{(S)}$, where $S$ is the number of retained draws. The final estimate of $f(x)$ at a given $x$ is then taken as the average of $f^{(1)}(x),\dots,f^{(S)}(x)$, as in Chipman et al. (2012).

Simulations

We consider Friedman's five-dimensional test function (Friedman 1991) to illustrate various features of the proposed method on simulated data. We construct data by simulating values of $x = (x_1,\dots,x_p)$, where $x_1,\dots,x_p \overset{\text{i.i.d.}}{\sim} \mathrm{Uniform}(0,1)$, and

$$y = f(x) + \epsilon = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10 x_4 + 5 x_5 + \epsilon.$$

We consider various error distributions to demonstrate the superiority of the proposed method (an R sketch of this data-generating process follows the list):

  • Normal distribution: $\epsilon \sim N(0,1)$.

  • Contaminated Normal distribution: $\epsilon = \epsilon_1 + \zeta_1\epsilon_2 + \zeta_2\epsilon_3$, where $\zeta_j \sim \mathrm{Bern}(0.15)$, $j=1,2$, are independent of $Y, \epsilon_1, \epsilon_2$, and $\epsilon_3$, with $\epsilon_1 \sim N(0,1)$, $\epsilon_2 \sim N(0,9^2)$, and $\epsilon_3 \sim \mathrm{ALD}(\tau=1,\theta=0.9)$.

  • t-distribution: $\epsilon \sim t(2)$.

  • Asymmetric Laplace distribution: $\epsilon \sim \mathrm{ALD}(\tau=1,\theta)$ with $\theta = 0.1, 0.9$.
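A minimal R sketch of this data-generating process (our own illustration, not the authors' code; shown for the contaminated normal errors, with the ALD draws generated via the mixture representation of Section 2):

```r
# Friedman's five-dimensional test function with p = 30 covariates and
# contaminated normal errors (sketch of the simulation setup).
gen_friedman <- function(n, p = 30) {
  X <- matrix(runif(n * p), n, p)
  f <- 10 * sin(pi * X[, 1] * X[, 2]) + 20 * (X[, 3] - 0.5)^2 +
       10 * X[, 4] + 5 * X[, 5]
  # ALD(tau = 1, theta = 0.9) via the exponential-normal mixture
  rald <- function(n, tau, theta) {
    xi1 <- (1 - 2 * theta) / (theta * (1 - theta))
    xi2 <- sqrt(2 / (theta * (1 - theta)))
    nu  <- rexp(n, rate = tau)
    xi1 * nu + sqrt(nu / tau) * xi2 * rnorm(n)
  }
  eps <- rnorm(n) +                            # epsilon_1 ~ N(0, 1)
    rbinom(n, 1, 0.15) * rnorm(n, sd = 9) +    # zeta_1 * epsilon_2
    rbinom(n, 1, 0.15) * rald(n, 1, 0.9)       # zeta_2 * epsilon_3
  list(X = X, f = f, y = f + eps)
}

set.seed(2025)
dat <- gen_friedman(200)
train <- 1:100; test <- 101:200               # 100 training, 100 test
rmse <- function(f_true, f_hat) sqrt(mean((f_true - f_hat)^2))
```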

The model evaluation metric used is the root mean squared error (RMSE), given by

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^n \left(f(x_i) - \hat y_i\right)^2},$$

where xi=(xi1,,xip) is the ith covariate, f(xi) is the true regression function used to generate the data and y^i is the model’s prediction.

We compare the proposed composite quantile BART model (BART-CQR) with quantile BART (QBART) at quantile levels 25%, 50%, and 75% (Kindo et al. 2016), BART (Chipman et al. 2012), and composite quantile regression (CQ regression) (Huang and Chen 2015). For the tree-based methods, we set the number of trees to $n_T=200$, and for the composite quantile methods, we use $K=9$ quantile levels, $\theta_k = k/(K+1)$ for $k=1,\dots,K$. For all methods, we set 3000 burn-in steps and use 8000 iterations.

We set n=200 and p=30 in each simulation and use 100 observations as the training set, with the remaining observations designated for the test set. We run 200 independent replications, and the test dataset performance is presented in Table 1.

Table 1. RMSE (standard error in parentheses) over 200 replications for Friedman's five-dimensional test function data with different error distributions. Bold face indicates the best performance.

| Error distribution | BART-CQR | QBART (25%) | QBART (50%) | QBART (75%) | BART | CQ regression |
|---|---|---|---|---|---|---|
| Normal | 1.800 (0.187) | 2.641 (0.347) | 2.736 (0.324) | 2.644 (0.289) | 1.739 (0.175) | 2.591 (0.240) |
| Contaminated Normal | 2.712 (0.427) | 8.285 (1.630) | 8.411 (1.660) | 7.276 (1.487) | 3.729 (0.629) | 3.245 (0.440) |
| t(2) | 1.970 (0.206) | 3.826 (1.541) | 4.076 (1.320) | 3.624 (0.826) | 2.325 (0.811) | 2.724 (0.251) |
| ALD (θ=0.1) | 9.317 (1.347) | 14.228 (2.105) | 16.33 (2.427) | 16.774 (2.501) | 10.098 (1.223) | 9.689 (1.181) |
| ALD (θ=0.9) | 8.886 (1.299) | 16.343 (2.534) | 15.976 (2.67) | 13.922 (2.196) | 9.862 (1.236) | 9.372 (1.014) |

The simulation results demonstrate that the BART-CQR model consistently outperforms other methods, particularly under heavy-tailed or skewed error distributions. As expected, classical BART performs best under normally distributed errors, where it is well suited; however, its performance deteriorates significantly under non-Gaussian settings. Notably, BART-CQR performs comparably well under normal errors, indicating its effectiveness even in well-behaved scenarios. Under the contaminated normal distribution and t(2) distribution, BART-CQR also yields the best test RMSE, highlighting its robustness. Finally, with the ALD at θ=0.1 and θ=0.9, BART-CQR maintains the lowest test RMSE, confirming its ability to handle complex and asymmetric error structures effectively. Interestingly, CQ regression outperforms QBART even though the underlying regression function is nonlinear. This may be attributed to the composite nature of the composite quantile loss, which aggregates information across multiple quantiles, thereby stabilizing estimation and reducing variance. In contrast, QBART focuses on individual quantiles, which may be more sensitive to noise and outliers, particularly under heavy-tailed or asymmetric error distributions.

To examine the sensitivity of BART-CQR to the prior specification on the error precision parameter τ, we conducted a prior sensitivity analysis using three Gamma prior settings. The results, presented in Appendix B, indicate that the performance of BART-CQR remains robust across a wide range of prior choices.

We conduct additional simulations under the contaminated normal setting with increased training sample sizes to further evaluate the scalability of BART-CQR. The results, reported in Appendix C, show that BART-CQR’s performance improves as the sample size increases, consistently achieving the lowest RMSE compared to competitors.

We further investigate the effect of heteroscedastic errors and correlated input variables under the following model with n=200 and p=30:

$$y = 10\sin(\pi x_1 x_2) + 20(x_3-0.5)^2 + 10 x_4 + 5 x_5 + \sigma(x)\,\epsilon,$$

where $x := (x_1,\dots,x_p)^T \sim N_p(0,\Sigma)$, $\Sigma = (\sigma_{ij})_{p\times p}$ with $\sigma_{ij} = 0.5^{|i-j|}$, σ(x)={(1+2xT)/3}1, and $\epsilon \sim N(0,1)$. For each of the 200 replications, we randomly split the data into a training set of size 100 and a test set of size 100. The predictive performance was evaluated using the test set only. The results in Table 2 show that BART-CQR consistently achieves the lowest RMSE, demonstrating superior predictive performance under both heteroscedasticity and correlated covariates.
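A sketch of this design in R (our own illustration; because the exact form of σ(x) is garbled in the extracted text above, a placeholder scale function is used and should be replaced with the authors' specification). The MASS package is assumed for the multivariate normal draw.

```r
library(MASS)  # for mvrnorm()

# AR(1)-type covariance: Sigma_ij = 0.5^|i - j|
p <- 30
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))

gen_hetero <- function(n, scale_fun) {
  X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
  f <- 10 * sin(pi * X[, 1] * X[, 2]) + 20 * (X[, 3] - 0.5)^2 +
       10 * X[, 4] + 5 * X[, 5]
  sig <- apply(X, 1, scale_fun)        # heteroscedastic scale sigma(x)
  list(X = X, f = f, y = f + sig * rnorm(n))
}

# Placeholder for sigma(x); the published form could not be recovered exactly.
scale_fun <- function(x) max((1 + 2 * sum(x)) / 3, 1)

set.seed(1)
dat <- gen_hetero(200, scale_fun)
```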

Table 2. RMSE (standard error in parentheses) over 200 replications for the heteroscedastic error model. Bold face indicates the best performance.

| BART-CQR | QBART (25%) | QBART (50%) | QBART (75%) | BART | CQ regression |
|---|---|---|---|---|---|
| 16.623 (3.676) | 21.347 (3.629) | 24.568 (4.399) | 23.345 (5.398) | 17.419 (4.282) | 31.297 (4.623) |

Real Data Examples

For the real data analysis, we consider three benchmark/public datasets:

Ozone Data: This dataset records ozone levels (in parts per billion) in New York from May to September 1973. The predictors include solar radiation level, wind speed, maximum daily temperature, month, and day of measurement. After removing observations with missing values, we have n=153 observations.

Auto Insurance Data: This dataset consists of n=2812 auto insurance policyholders with 56 predictors along with an aggregate paid claim amount. Examples of predictors include the driver's age, driver's income, vehicle use (commercial or non-commercial), vehicle type (one of six categories), and the driver's gender. The response variable is the aggregate claim amount, which is skewed, with a significant number of policyholders having zero claims and, among those with non-zero claims, reported amounts that tend to be large.

Boston Housing Data: This dataset includes n=506 samples. We examine the relationship between the log-transformed corrected median value of owner-occupied housing (in $1000), denoted as mdev, and 13 explanatory variables: crim (per capita crime rate by town), zn (proportion of residential land zoned for lots over 25,000 sq.ft), indus (proportion of non-retail business acres per town), chas (Charles River dummy variable), nox (nitrogen oxides concentration), rm (average number of rooms per dwelling), age (proportion of owner-occupied units built prior to 1940), dis (weighted mean of distances to five Boston employment centers), rad (index of accessibility to radial highways), tax (full-value property-tax rate per $10,000), ptratio (pupil-teacher ratio by town), lstat (percentage of lower status of the population).

These three datasets are available in the R packages datasets (R Core Team 2024), HDtweedie (Qian et al. 2022), and MASS (Venables and Ripley 2002), respectively. To evaluate predictive performance, we apply 5-fold cross-validation to each dataset. Specifically, the data are randomly partitioned into five nearly equal-sized folds. In each iteration, four of the five folds are used for training, and the remaining fold is used for testing. Predictive RMSE is computed on each test fold, and the average RMSE across all five folds is reported as the final performance measure. As in the simulation study, we set the number of trees to nT=200, and for the composite quantile methods, we use K=9 quantile levels.
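The 5-fold cross-validation loop described above can be sketched in R as follows (our own illustration; `fit_model()` and `predict_model()` are hypothetical placeholders standing in for BART-CQR or any of the compared methods):

```r
# Generic 5-fold cross-validated test RMSE (sketch).
cv_rmse <- function(X, y, fit_model, predict_model, K = 5, seed = 1) {
  set.seed(seed)
  folds <- sample(rep(1:K, length.out = length(y)))  # random fold assignment
  fold_rmse <- numeric(K)
  for (k in 1:K) {
    train <- folds != k
    fit <- fit_model(X[train, , drop = FALSE], y[train])
    pred <- predict_model(fit, X[!train, , drop = FALSE])
    fold_rmse[k] <- sqrt(mean((y[!train] - pred)^2))
  }
  mean(fold_rmse)  # average RMSE across the five test folds
}
```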

Table 3 summarizes the results. For all data, the BART-CQR provides the smallest RMSE, implying that the proposed method works well with complex real data sets. While the composite quantile regression model may work well if the variables are linearly related, it may collapse under complicated structures. On the other hand, BART models perform well with various structured data, but we need to determine the proper quantile level for the QBART. The proposed composite quantile BART strikes a balance and provides the best performance in these datasets.

Table 3. Real data: test data average RMSE based on 5-fold cross-validation.

| Dataset | BART-CQR | QBART (25%) | QBART (50%) | QBART (75%) | CQ regression |
|---|---|---|---|---|---|
| Ozone Data | 16.920 | 20.587 | 23.769 | 21.327 | 26.149 |
| Auto Insurance Data | 8.457 | 9.352 | 8.682 | 9.819 | 8.584 |
| Boston Housing Data | 0.156 | 0.192 | 0.190 | 0.189 | 0.186 |

For interpretation, we consider the effect of predictors on the outcome. However, tree models do not directly provide a summary of the effect of a single predictor, or a subset of predictors, on the outcome. We first examine how many times each variable appears in the collection of trees, which provides a summary similar to the variable importance plot used in boosting and random forests. For conciseness, we report results only for the Boston housing data among the three real datasets considered. Figure 1 shows the barplot of these counts in BART-CQR for the Boston housing data. We observe that lstat appears most frequently in the trees, highlighting the significant impact of socio-economic status on housing prices.

Fig. 1 Variable usage counts in BART-CQR for the Boston housing data

Furthermore, as suggested by Chipman et al. (2014), we use Friedman's partial dependence function (Friedman 2001) to summarize the marginal effect due to a subset of the predictors, $x_S$, by aggregating over the predictors in the complement set, $x_C$, i.e., $x = [x_S, x_C]$. The marginal dependence function is defined by fixing $x_S$ while aggregating over the observed settings of the complement predictors in the data set: $f(x_S) = \frac{1}{n}\sum_{i=1}^n f(x_S, x_{iC})$.
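The partial dependence function can be computed directly from any fitted prediction function; a minimal R sketch (our own, with a hypothetical `predict_fun`) for a single predictor:

```r
# Partial dependence of the s-th predictor: f(x_S) = mean_i f(x_S, x_{iC}).
partial_dependence <- function(predict_fun, X, s, grid = NULL) {
  if (is.null(grid)) grid <- quantile(X[, s], probs = seq(0.05, 0.95, 0.05))
  sapply(grid, function(v) {
    Xv <- X
    Xv[, s] <- v              # fix x_S at the grid value
    mean(predict_fun(Xv))     # average over the observed complement predictors
  })
}

# Example (hypothetical fitted model `fit`):
# pd <- partial_dependence(function(Xnew) predict(fit, Xnew), X,
#                          s = which(colnames(X) == "lstat"))
```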

Figure 2 summarizes the marginal effect of lstat on mdev while aggregating over the other predictors with Friedman's partial dependence function. We observe a negative effect on mdev, shown by the black solid line, which implies that less affluent neighborhoods have lower home values. Compared to the quadratic regression fit, shown by the red dashed line, BART-CQR provides a more robust fitted line and also captures well the complex non-linear relationship between the predictor and the outcome.

Fig. 2 The Boston housing data: the marginal effect of lstat on mdev while aggregating over the other covariates with Friedman's partial dependence function. The marginal estimate from BART-CQR is shown by the black solid line, and the red dashed line comes from the linear regression model where a quadratic effect of lstat with respect to the logarithm of mdev is assumed

Conclusion and Discussion

In this paper, we proposed a novel Bayesian framework, BART-CQR, which integrates BART with the CQR approach. This method is designed to handle complex, nonlinear relationships and heavy-tailed error distributions, extending the flexibility and robustness of existing quantile-based models. We developed a fully Bayesian hierarchical formulation for BART-CQR and derived an efficient Gibbs sampler for posterior inference. Through comprehensive simulation studies and real data applications, we demonstrated that BART-CQR consistently outperforms classical BART, quantile BART and standard linear composite quantile regression in terms of predictive accuracy. Compared to quantile BART, BART-CQR eliminates the challenge of quantile level selection. Additionally, it offers greater modeling flexibility than standard CQR by capturing nonlinear effects and interactions via tree ensembles. Thus, the proposed BART-CQR method inherits the strengths of both the BART framework and the composite quantile approach, offering a powerful alternative for analyzing complex, non-Gaussian data in moderately sized regression problems with up to approximately 3000 observations.

The proposed approach has several limitations. Although there is room to improve computational efficiency through careful optimization, BART-CQR requires more time than classical BART and quantile BART due to the complexity of the tree ensemble model and MCMC sampling, which may limit its scalability to very large datasets. Moreover, the current method assumes independent and identically distributed errors; its performance under dependent or longitudinal data settings remains to be investigated. In addition, this work does not explore model selection or variable selection properties of BART-CQR, which could be important in high-dimensional settings. Investigating these aspects both theoretically and empirically constitutes an important direction for future research.

Several promising directions exist for future research. Extensions could include incorporating recent advances in BART, such as softBART (Linero and Yang 2018) and BART models with random effects for hierarchical data (Tan and Roy 2019). Furthermore, model selection criteria, theoretical properties of the BART-CQR estimator, and applications to specific domains such as finance, epidemiology, and environmental science warrant further investigation.

Acknowledgements

Research of M. St. Ville and Z. Chen was supported by the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), and Lim's research was supported by the National Research Foundation of Korea (NRF) funded by the Korea government (2022R1F1A1074134).

Appendix A Posterior updating scheme

The complete likelihood is

$$f(y\mid X,T,M,\tau,\tilde\nu,b,C,\omega) = \prod_{i=1}^n\prod_{k=1}^K \left(\frac{1}{\sqrt{2\pi\,\tau^{-1}\xi_{2k}^2\tilde\nu_i}}\right)^{C_{ik}} \exp\!\left[-\frac12\sum_{i=1}^n\sum_{k=1}^K C_{ik}\,\frac{\left(y_i - b_{\theta_k} - \sum_j g(x_i;T_j,M_j) - \xi_{1k}\tilde\nu_i\right)^2}{\tau^{-1}\xi_{2k}^2\tilde\nu_i}\right],$$

and the posterior updating scheme cycles through the following six conditional draws:

$$f(\tilde\nu_i\mid T,M,Y,X,\tau,\tilde\nu_{(-i)},b,C,\omega) \qquad (\mathrm{A}1)$$
$$f((T_j,M_j)\mid T_{(-j)},M_{(-j)},Y,X,\tau,\tilde\nu,b,C,\omega) \qquad (\mathrm{A}2)$$
$$f(\tau\mid T,M,Y,X,\tilde\nu,b,C,\omega) \qquad (\mathrm{A}3)$$
$$f(C_i\mid T,M,Y,X,\tau,\tilde\nu,b,C_{-i},\omega) \qquad (\mathrm{A}4)$$
$$f(\omega\mid T,M,Y,X,\tau,\tilde\nu,b,C) \qquad (\mathrm{A}5)$$
$$f(b_{\theta_k}\mid T,M,Y,X,\tau,\tilde\nu,b_{-\theta_k},C,\omega). \qquad (\mathrm{A}6)$$

For (A1), we have $f(\tilde\nu_i\mid T,M,Y,X,\tau,\tilde\nu_{(-i)},b,C,\omega) \propto \tilde\nu_i^{-1/2}\exp\!\left\{-\tfrac12\left(\delta_{1i}\tilde\nu_i^{-1}+\delta_{2i}\tilde\nu_i\right)\right\}$, where $\delta_{1i} := \sum_k C_{ik}\,\frac{\left(y_i-b_{\theta_k}-\sum_j g(x_i;T_j,M_j)\right)^2}{\tau^{-1}\xi_{2k}^2}$ and $\delta_{2i} = \sum_k C_{ik}\,\frac{\xi_{1k}^2}{\xi_{2k}^2\tau^{-1}}+2\tau$. Therefore, we sequentially sample $\tilde\nu_i$ from a generalized inverse Gaussian distribution.

For (A3), we derive that

$$f(\tau\mid T,M,Y,X,\tilde\nu,b,C,\omega) \propto \tau^{\,n/2+n+a_\tau-1}\exp\!\left\{-\tau\left[\sum_{i=1}^n\sum_{k=1}^K C_{ik}\,\frac{\left(y_i-b_{\theta_k}-\sum_j g(x_i;T_j,M_j)-\xi_{1k}\tilde\nu_i\right)^2}{2\xi_{2k}^2\tilde\nu_i}+\sum_{i=1}^n\tilde\nu_i+b_\tau\right]\right\},$$

which is a Gamma distribution, and for $f(C_i\mid T,M,Y,X,\tau,\tilde\nu,b,C_{-i},\omega)$ in (A4),

$$
\begin{aligned}
f(C_i\mid T,M,Y,X,\tau,\tilde\nu,b,C_{-i},\omega)
&\propto \prod_{k=1}^K\left\{\left(\frac{1}{\sqrt{2\pi\tau^{-1}\xi_{2k}^2\tilde\nu_i}}\right)^{C_{ik}}\exp\!\left[-\frac{C_{ik}}{2}\frac{\left(y_i-b_{\theta_k}-\sum_j g(x_i;T_j,M_j)-\xi_{1k}\tilde\nu_i\right)^2}{\tau^{-1}\xi_{2k}^2\tilde\nu_i}\right]\right\}\prod_{k=1}^K\omega_k^{C_{ik}}\\
&\propto \prod_{k=1}^K\left\{\frac{\omega_k}{\xi_{2k}}\exp\!\left[-\frac12\frac{\left(y_i-b_{\theta_k}-\sum_j g(x_i;T_j,M_j)-\xi_{1k}\tilde\nu_i\right)^2}{\tau^{-1}\xi_{2k}^2\tilde\nu_i}\right]\right\}^{C_{ik}},
\end{aligned}
$$

which is a $\mathrm{Multinomial}(1, \hat p_1,\dots,\hat p_K)$, where

$$\hat p_k = \frac{\dfrac{\omega_k}{\xi_{2k}}\exp\!\left[-\dfrac12\dfrac{\left(y_i-b_{\theta_k}-\sum_j g(x_i;T_j,M_j)-\xi_{1k}\tilde\nu_i\right)^2}{\tau^{-1}\xi_{2k}^2\tilde\nu_i}\right]}{\sum_{k'=1}^{K}\dfrac{\omega_{k'}}{\xi_{2k'}}\exp\!\left[-\dfrac12\dfrac{\left(y_i-b_{\theta_{k'}}-\sum_j g(x_i;T_j,M_j)-\xi_{1k'}\tilde\nu_i\right)^2}{\tau^{-1}\xi_{2k'}^2\tilde\nu_i}\right]}.$$
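In practice, these probabilities are best evaluated on the log scale; a short R sketch (our own illustration) using the log-sum-exp trick:

```r
# Cluster-assignment probabilities p_hat_k for one observation, computed on
# the log scale for numerical stability (log-sum-exp trick).
cluster_probs <- function(resid_k, omega, xi2, tau, nu_i) {
  # resid_k[k] = y_i - b_{theta_k} - sum_j g(x_i; T_j, M_j) - xi1_k * nu_i
  logw <- log(omega) - log(xi2) - 0.5 * resid_k^2 / (xi2^2 * nu_i / tau)
  w <- exp(logw - max(logw))
  w / sum(w)
}
```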

The draw in (A5) is $\mathrm{Dirichlet}(n_1+\alpha_1,\dots,n_K+\alpha_K)$:

$$f(\omega\mid T,M,Y,X,\tau,\tilde\nu,b,C) \propto \prod_{k=1}^K\omega_k^{\alpha_k+n_k-1}, \quad \text{where } n_k := \sum_i C_{ik}.$$

For the intercept (A6),

$$f(b_{\theta_k}\mid T,M,Y,X,\tau,\tilde\nu,b_{-\theta_k},C,\omega) \propto \exp\!\left(-\frac12\sum_i \frac{C_{ik}}{\tau^{-1}\xi_{2k}^2\tilde\nu_i}\, b_{\theta_k}^2 + \sum_i \frac{C_{ik}\,\tilde r_{ik}}{\tau^{-1}\xi_{2k}^2\tilde\nu_i}\, b_{\theta_k}\right),$$

where $\tilde r_{ik} := y_i - \sum_j g(x_i;T_j,M_j) - \xi_{1k}\tilde\nu_i$; this is a normal distribution.

Now, for the draw (A2), first define

$$r_{ij} := y_i - b_{\theta_k} - \sum_{l\neq j} g(x_i; T_l, M_l) - \xi_{1k}\tilde\nu_i = g(x_i; T_j, M_j) + \tau^{-1/2}\xi_{2k}\sqrt{\tilde\nu_i}\, z_i, \quad \text{for } y_i \text{ in cluster } k. \qquad (\mathrm{A}7)$$

Then, drawing from (A2) is equivalent to drawing from a single regression tree $r_{ij} = g(x_i; T_j, M_j) + \tau^{-1/2}\xi_{2k}\sqrt{\tilde\nu_i}\, z_i$ for $i=1,\dots,n$. In the tree $T_j$, assume that there are $m_j$ terminal nodes and that $n_l$ observations fall into terminal node $l$ for $l=1,\dots,m_j$ (so $n_1+\dots+n_{m_j}=n$). Consider the set of observations that fall into terminal node $l$. Since for each $y_i$ there is exactly one $k$ with $C_{ik}=1$, we can associate the pair $(i,k)$ with each observation $i$. Denote the set of $(i,k)$ pairs that fall into terminal node $l$ by $G_l$. Further, define $r_j = (r_{j,1},\dots,r_{j,m_j})$ with $r_{j,l} = (r_{1,j,l},\dots,r_{n_l,j,l})$, where $r_{i,j,l}$ is an observation that falls into the $l$th terminal node of tree $T_j$ for $i \in G_l$, and we have $r_{i,j,l} \sim N(\mu_{jl}, \tau^{-1}\xi_{2k}^2\tilde\nu_i)$ for $(i,k)\in G_l$. Then the likelihood of the single tree in (A7) is

$$f(r_j\mid X, T_j, M_j, \tau, \tilde\nu, b, C, \omega) = \prod_{l=1}^{m_j} f(r_{j,l}\mid X_l, T_j, M_j, \tau, \tilde\nu_l, b, C, \omega),$$

where

$$f(r_{j,l}\mid X_l, T_j, M_j, \tau, \tilde\nu_l, b, C, \omega) = f(r_{j,l}\mid \mu_{jl}, \tau, \tilde\nu_l, b, C, \omega) = \prod_{(i,k)\in G_l} \frac{1}{\sqrt{2\pi\,\tau^{-1}\xi_{2k}^2\tilde\nu_i}} \exp\!\left[-\frac12\frac{(r_{i,j,l}-\mu_{jl})^2}{\tau^{-1}\xi_{2k}^2\tilde\nu_i}\right].$$

Now, the draw from $f((T_j,M_j)\mid T_{(-j)}, M_{(-j)}, Y, X, \tau, \tilde\nu, b, C, \omega)$ can be done in two successive steps:

$$f(T_j\mid r_j, \tau, \tilde\nu, b, C, \omega), \qquad (\mathrm{A}8)$$
$$f(M_j\mid T_j, r_j, \tau, \tilde\nu, b, C, \omega). \qquad (\mathrm{A}9)$$

For (A8), we use a Metropolis-Hastings algorithm. We first derive the formula for $f(T_j\mid r_j, \tau, \tilde\nu, b, C, \omega)$:

$$
\begin{aligned}
f(T_j\mid r_j,\tau,\tilde\nu,b,C,\omega)
&\propto p(T_j)\int p(r_j\mid M_j,T_j,\tau,\tilde\nu,b,C,\omega)\, p(M_j\mid T_j,\tau,\tilde\nu,b,C,\omega)\, dM_j\\
&= p(T_j)\prod_{l}\left[\int\prod_{(i,k)\in G_l} p(r_{i,j,l}\mid M_j,T_j,\tau,\tilde\nu,b,C,\omega)\times p(\mu_{jl}\mid T_j,\tau,\tilde\nu,b,C,\omega)\, d\mu_{jl}\right]\\
&= p(T_j)\prod_{l}\left[\int\left\{\prod_{(i,k)\in G_l}\frac{1}{\sqrt{2\pi\tau^{-1}\xi_{2k}^2\tilde\nu_i}}\exp\!\left[-\frac12\frac{(r_{i,j,l}-\mu_{jl})^2}{\tau^{-1}\xi_{2k}^2\tilde\nu_i}\right]\right\}\frac{1}{\sqrt{2\pi\sigma_\mu^2}}\exp\!\left(-\frac{\mu_{jl}^2}{2\sigma_\mu^2}\right) d\mu_{jl}\right]\\
&= p(T_j)\prod_{l}\left\{\left(\frac{1}{\sqrt{2\pi\tau^{-1}}}\right)^{n_l}\left(\prod_{(i,k)\in G_l}\xi_{2k}^{-1}\tilde\nu_i^{-1/2}\right)\exp\!\left(-\frac12\sum_{(i,k)\in G_l}\frac{r_{i,j,l}^2}{\tau^{-1}\xi_{2k}^2\tilde\nu_i}\right)\right.\\
&\qquad\left.\times\sqrt{\frac{\tau^{-1}}{\sigma_\mu^2\sum_{(i,k)\in G_l}\tilde\nu_i^{-1}\xi_{2k}^{-2}+\tau^{-1}}}\;\exp\!\left[\frac{\sigma_\mu^2\left(\sum_{(i,k)\in G_l}r_{i,j,l}\,\tilde\nu_i^{-1}\xi_{2k}^{-2}\right)^2}{2\tau^{-1}\left(\tau^{-1}+\sigma_\mu^2\sum_{(i,k)\in G_l}\tilde\nu_i^{-1}\xi_{2k}^{-2}\right)}\right]\right\}. \qquad (\mathrm{A}10)
\end{aligned}
$$

Then, to draw from $f(T_j\mid r_j,\tau,\tilde\nu,b,C,\omega)$, we first generate a candidate tree $T_j^{*}$ from the transition kernel $q(\cdot\mid T_j)$ and accept it with probability

$$
\begin{aligned}
\tilde\alpha &= \min\!\left\{1,\ \frac{f(T_j^{*}\mid r_j,\tau,\tilde\nu,b,C,\omega)\, q(T_j\mid T_j^{*})}{f(T_j\mid r_j,\tau,\tilde\nu,b,C,\omega)\, q(T_j^{*}\mid T_j)}\right\}\\
&= \min\!\left\{1,\ \frac{p(T_j^{*})\int p(r_j\mid M_j,T_j^{*},\tau,\tilde\nu,b,C,\omega)\, p(M_j\mid T_j^{*},\tau,\tilde\nu,b,C,\omega)\, dM_j\; q(T_j\mid T_j^{*})}{p(T_j)\int p(r_j\mid M_j,T_j,\tau,\tilde\nu,b,C,\omega)\, p(M_j\mid T_j,\tau,\tilde\nu,b,C,\omega)\, dM_j\; q(T_j^{*}\mid T_j)}\right\}\\
&= \min\!\left\{1,\ \frac{p(r_j\mid T_j^{*},\tau,\tilde\nu,b,C,\omega)\, p(T_j^{*})\, q(T_j\mid T_j^{*})}{p(r_j\mid T_j,\tau,\tilde\nu,b,C,\omega)\, p(T_j)\, q(T_j^{*}\mid T_j)}\right\}.
\end{aligned}
$$

The transition kernel $q(\cdot)$ assigns probabilities of 0.25, 0.25, 0.40, and 0.10 to the moves GROW, PRUNE, SWAP, and CHANGE, respectively. We use the same transition kernel and tree priors as quantile BART (Kindo et al. 2016); the only difference lies in the likelihood function.

For example, in the GROW case, the likelihood ratio can be computed as

$$
\begin{aligned}
\frac{p(r_j\mid T_j^{*},\tau,\tilde\nu,b,C,\omega)}{p(r_j\mid T_j,\tau,\tilde\nu,b,C,\omega)}
&= \frac{\prod_{l\in\{\text{two children}\}}\left[\int\prod_{(i,k)\in G_l} p(r_{i,j,l}\mid M_j,T_j^{*},\tau,\tilde\nu,b,C,\omega)\, p(\mu_{jl}\mid T_j^{*},\tau,\tilde\nu,b,C,\omega)\, d\mu_{jl}\right]}{\prod_{l\in\{\text{parent}\}}\left[\int\prod_{(i,k)\in G_l} p(r_{i,j,l}\mid M_j,T_j,\tau,\tilde\nu,b,C,\omega)\, p(\mu_{jl}\mid T_j,\tau,\tilde\nu,b,C,\omega)\, d\mu_{jl}\right]}\\
&= \frac{(\mathrm{A}10)\text{ factor of left child node}\times(\mathrm{A}10)\text{ factor of right child node}}{(\mathrm{A}10)\text{ factor of parent node}}\\
&= \sqrt{\frac{\tau^{-1}\left(\tau^{-1}+\sigma_\mu^2 B_P\right)}{\left(\tau^{-1}+\sigma_\mu^2 B_L\right)\left(\tau^{-1}+\sigma_\mu^2 B_R\right)}}
\times\exp\!\left\{\frac{\sigma_\mu^2}{2\tau^{-1}}\left[\frac{A_L^2}{\tau^{-1}+\sigma_\mu^2 B_L}+\frac{A_R^2}{\tau^{-1}+\sigma_\mu^2 B_R}-\frac{A_P^2}{\tau^{-1}+\sigma_\mu^2 B_P}\right]\right\},
\end{aligned}
$$

where the subscripts $P$, $L$, and $R$ denote the parent, left, and right nodes, $B_P := \sum_{(i,k)\in G_P}\tilde\nu_i^{-1}\xi_{2k}^{-2}$ and $A_P := \sum_{(i,k)\in G_P} r_{i,j,l}\,\tilde\nu_i^{-1}\xi_{2k}^{-2}$, and $G_P$ is the set of $(i,k)$ pairs that fall into the parent node. $A_L$, $B_L$, $A_R$, and $B_R$ are defined analogously.
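For concreteness, this GROW-move likelihood ratio can be evaluated directly from the node sufficient statistics $A$ and $B$; an R sketch (our own illustration, on the log scale):

```r
# Log marginal-likelihood ratio for a GROW move, given sufficient statistics
#   B = sum over node of 1 / (nu_tilde_i * xi2k^2),
#   A = sum over node of r_ijl / (nu_tilde_i * xi2k^2),
# for the parent (P) and proposed left (L) / right (R) children.
grow_loglik_ratio <- function(AP, BP, AL, BL, AR, BR, tau, sigma_mu2) {
  ti <- 1 / tau  # tau^{-1}
  0.5 * (log(ti) + log(ti + sigma_mu2 * BP) -
         log(ti + sigma_mu2 * BL) - log(ti + sigma_mu2 * BR)) +
    (sigma_mu2 / (2 * ti)) * (AL^2 / (ti + sigma_mu2 * BL) +
                              AR^2 / (ti + sigma_mu2 * BR) -
                              AP^2 / (ti + sigma_mu2 * BP))
}
# exp(grow_loglik_ratio(...)) enters the Metropolis-Hastings acceptance ratio
# together with the tree prior and proposal terms.
```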

Given an updated tree, the $l$th terminal node parameter $\mu_{jl}$ of tree $T_j$ in (A9) can be drawn as:

$$f(\mu_{jl}\mid T_j, r_j, \tau, \tilde\nu, b, C, \omega) \propto \exp\!\left[-\sum_{(i,k)\in G_l}\frac{(r_{i,j,l}-\mu_{jl})^2}{2\tau^{-1}\xi_{2k}^2\tilde\nu_i}\right]\exp\!\left(-\frac{\mu_{jl}^2}{2\sigma_\mu^2}\right) \propto \exp\!\left[-\left(\sum_{(i,k)\in G_l}\frac{1}{2\tau^{-1}\xi_{2k}^2\tilde\nu_i}+\frac{1}{2\sigma_\mu^2}\right)\mu_{jl}^2 + 2\sum_{(i,k)\in G_l}\frac{r_{i,j,l}}{2\tau^{-1}\xi_{2k}^2\tilde\nu_i}\,\mu_{jl}\right],$$

which is a Gaussian distribution.

Appendix B Prior Sensitivity Analysis for the Error Precision Parameter τ

As discussed in the main text, BART-CQR assumes an ALD for the errors with precision parameter $\tau$, on which we place a Gamma prior:

$$\pi(\tau) \propto \tau^{a_\tau-1}\exp(-b_\tau\tau).$$

To assess sensitivity to the hyperparameters $(a_\tau, b_\tau)$, we considered three prior configurations reflecting conservative, default, and aggressive prior beliefs (Figure 3; an R sketch of these prior densities follows the list):

  • Conservative: (aτ=1,bτ=2)

  • Default: (aτ=2,bτ=1)

  • Aggressive: (aτ=10,bτ=1)
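The three prior densities can be compared directly in R (our own sketch; the shape/rate parameterization matches the $\tau^{a_\tau-1}e^{-b_\tau\tau}$ kernel above):

```r
# Prior densities of tau under the three Gamma(a_tau, b_tau) settings
# (shape = a_tau, rate = b_tau).
tau_grid <- seq(0.01, 20, length.out = 500)
priors <- list(conservative = c(1, 2), default = c(2, 1), aggressive = c(10, 1))
dens <- sapply(priors, function(ab) dgamma(tau_grid, shape = ab[1], rate = ab[2]))
matplot(tau_grid, dens, type = "l", lty = 1,
        xlab = expression(tau), ylab = "prior density")
legend("topright", legend = names(priors), col = 1:3, lty = 1)
```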

These choices span a wide range of assumptions regarding the concentration and spread of the ALD errors. Our empirical results suggest that the posterior estimates and predictive performance of BART-CQR remain stable across these prior settings, indicating that the method is not highly sensitive to the calibration of τ (see Table 4). Therefore, in contrast to standard BART, our model does not require fine-tuned calibration of the prior on τ, although it remains flexible for users who wish to encode informative prior beliefs.

Fig. 3 (Left) Density functions of the ALD with θ=0.5 under three different values of the precision parameter τ. (Right) Prior densities of the precision parameter τ under three different Gamma priors: conservative (aτ=1, bτ=2), default (aτ=2, bτ=1), and aggressive (aτ=10, bτ=1). These priors reflect increasing levels of certainty about the error precision.

Table 4. RMSE of BART-CQR under different Gamma priors on τ, based on simulations using Friedman's five-dimensional test function with various error distributions.

| Error distribution | (aτ=1, bτ=2) | (aτ=2, bτ=1) | (aτ=10, bτ=1) |
|---|---|---|---|
| Normal | 1.809 (0.174) | 1.780 (0.162) | 1.776 (0.165) |
| Contaminated Normal | 2.196 (0.208) | 2.217 (0.201) | 2.197 (0.194) |
| t(2) | 1.947 (0.196) | 1.938 (0.209) | 1.915 (0.213) |
| ALD (θ=0.1) | 8.611 (1.004) | 8.385 (0.987) | 8.461 (0.949) |
| ALD (θ=0.9) | 8.169 (1.065) | 7.924 (0.910) | 7.922 (0.94) |

Appendix C Additional Simulation: Effect of Increasing Sample Size under Contaminated Normal Errors

We conducted simulations under the contaminated normal setting using larger training samples. Specifically, we used training sizes of 100 (the baseline of Section 4), 400, and 800, with corresponding test sizes of 100, 100, and 200. As shown in Table 5, the performance of BART-CQR improves as the sample size increases, with notably lower RMSE at n=1000 compared to n=200. This trend confirms that the strength of our approach becomes more evident with larger sample sizes. Although performance improves across all methods as the sample size increases, BART-CQR consistently achieves the lowest RMSE across all settings, demonstrating its robustness and efficiency even in larger sample scenarios.

Table 5. RMSE (standard error in parentheses) over 100 replications under the contaminated normal setting for increasing training sample sizes (100, 400, and 800). Corresponding test sizes were 100, 100, and 200, respectively.

| n | BART-CQR | QBART (25%) | QBART (50%) | QBART (75%) | BART | CQ regression |
|---|---|---|---|---|---|---|
| n=200 | 2.712 (0.427) | 8.285 (1.630) | 8.411 (1.660) | 7.276 (1.487) | 3.729 (0.629) | 3.245 (0.440) |
| n=500 | 1.722 (0.185) | 8.580 (1.172) | 7.680 (0.943) | 6.309 (0.755) | 3.333 (0.569) | 3.050 (0.310) |
| n=1000 | 1.618 (0.245) | 7.549 (0.692) | 6.151 (0.616) | 4.868 (0.413) | 3.206 (0.363) | 2.864 (0.274) |

Author Contributions

Y.L: first author, methodology, computation and writing. R.L and M.V: methodology, conceptualization and writing. C. Z: corresponding author, methodology, conceptualization and writing.

Funding

Open access funding provided by the National Institutes of Health.

Data Availability

No datasets were generated or analysed during the current study.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Alhamzawi, R.: Bayesian analysis of composite quantile regression. Stat. Biosci. 8(2), 358–373 (2016)
  2. Basak, P., Linero, A., Sinha, D., Lipsitz, S.: Semiparametric analysis of clustered interval-censored survival data using soft Bayesian additive regression trees (SBART). Biometrics 78(3), 880–893 (2022)
  3. Baumeister, C., Huber, F., Marcellino, M.: Risky Oil: It's All in the Tails. National Bureau of Economic Research (2024)
  4. Benoit, D.F., Van den Poel, D.: bayesQR: a Bayesian approach to quantile regression. J. Stat. Softw. 76, 1–32 (2017)
  5. Cao, T., Wu, J., Wang, Y.G.: An adaptive trimming approach to Bayesian additive regression trees. Complex Intell. Syst. 10(5), 6805–6823 (2024)
  6. Chen, K., Müller, H.G.: Conditional quantile analysis when covariates are functions, with application to growth data. J. R. Stat. Soc. Ser. B Stat. Methodol. 74(1), 67–89 (2012)
  7. Chipman, H., George, E., Hahn, R., McCulloch, R., Pratola, M., Sparapani, R.: Bayesian additive regression trees, computational approaches. Wiley StatsRef: Statistics Reference Online, pp. 1–23 (2014)
  8. Chipman, H.A., George, E.I., McCulloch, R.E.: BART: Bayesian additive regression trees. Ann. Appl. Stat. 6(1), 266–298 (2012)
  9. Clark, T.E., Huber, F., Koop, G., Marcellino, M., Pfarrhofer, M.: Tail forecasting with multivariate Bayesian additive regression trees. Int. Econ. Rev. 64(3), 979–1022 (2023)
  10. Clark, T.E., Huber, F., Koop, G., Marcellino, M., Pfarrhofer, M.: Investigating growth-at-risk using a multicountry nonparametric quantile factor model. J. Bus. Econ. Stat. 42(4), 1302–1317 (2024)
  11. Fitzenberger, B., Koenker, R., Machado, J.A.: Economic Applications of Quantile Regression. Springer Science & Business Media (2013)
  12. Friedman, J.H.: Multivariate adaptive regression splines. Ann. Stat. 19(1), 1–67 (1991)
  13. Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29(5), 1189–1232 (2001)
  14. Galvao, A.F., Kato, K.: Smoothed quantile regression for panel data. J. Econom. 193(1), 92–112 (2016)
  15. Hao, L., Naiman, D.Q.: Quantile Regression. No. 149. Sage (2007)
  16. Haugen, M.A., Stein, M.L., Moyer, E.J., Sriver, R.L.: Estimating changes in temperature distributions in a large ensemble of climate simulations using quantile regression. J. Clim. 31(20), 8573–8588 (2018)
  17. Huang, H., Chen, Z.: Bayesian composite quantile regression. J. Stat. Comput. Simul. 85(18), 3744–3754 (2015)
  18. Jiang, R., Zhou, Z.G., Qian, W.M., Chen, Y.: Two step composite quantile regression for single-index models. Comput. Stat. Data Anal. 64, 180–191 (2013)
  19. Jiang, X., Jiang, J., Song, X.: Oracle model selection for nonlinear models based on weighted composite quantile regression. Stat. Sin., 1479–1506 (2012)
  20. Kai, B., Li, R., Zou, H.: Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 72(1), 49–69 (2010)
  21. Kindo, B.P.: Bayesian Ensemble of Regression Trees for Multinomial Probit and Quantile Regression. PhD thesis, University of South Carolina (2016)
  22. Kindo, B.P., Wang, H., Hanson, T., Peña, E.A.: Bayesian quantile additive regression trees (2016). arXiv:1607.02676 [stat.ML]. https://arxiv.org/abs/1607.02676
  23. Koenker, R.: Quantile Regression, vol. 38. Cambridge University Press, Cambridge (2005)
  24. Koenker, R., Bassett Jr., G.: Regression quantiles. Econometrica 46(1), 33–50 (1978)
  25. Koenker, R., Hallock, K.F.: Quantile regression. J. Econ. Perspect. 15(4), 143–156 (2001)
  26. Koenker, R., Ng, P., Portnoy, S.: Quantile smoothing splines. Biometrika 81(4), 673–680 (1994)
  27. Kozumi, H., Kobayashi, G.: Gibbs sampling methods for Bayesian quantile regression. J. Stat. Comput. Simul. 81(11), 1565–1578 (2011)
  28. Lee, D., Neocleous, T.: Bayesian quantile regression for count data with application to environmental epidemiology. J. R. Stat. Soc. Ser. C Appl. Stat. 59(5), 905–920 (2010)
  29. Lee, J., Hwang, B.S.: Ordered probit Bayesian additive regression trees for ordinal data. Stat 13(1), e643 (2024)
  30. Li, Q., Xi, R., Lin, N.: Bayesian regularized quantile regression. Bayesian Anal. 5(3), 533–556 (2010)
  31. Linero, A.R.: Bayesian regression trees for high-dimensional prediction and variable selection. J. Am. Stat. Assoc. 113(522), 626–636 (2018)
  32. Linero, A.R., Yang, Y.: Bayesian regression tree ensembles that adapt to smoothness and sparsity. J. R. Stat. Soc. Ser. B Stat. Methodol. 80(5), 1087–1110 (2018)
  33. Marrocu, E., Paci, R., Zara, A.: Micro-economic determinants of tourist expenditure: a quantile regression approach. Tour. Manage. 50, 13–30 (2015)
  34. Powell, D.: Quantile regression with nonadditive fixed effects. Empir. Econ. 63(5), 2675–2691 (2022)
  35. Qian, W., Yang, Y., Zou, H.: HDtweedie: The Lasso for Tweedie's Compound Poisson Model Using an IRLS-BMD Algorithm. R package version 1.2 (2022). https://CRAN.R-project.org/package=HDtweedie
  36. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2024). https://www.R-project.org/
  37. Reich, B.J., Fuentes, M., Dunson, D.B.: Bayesian spatial quantile regression. J. Am. Stat. Assoc. 106(493), 6–20 (2011)
  38. Sparapani, R.A., Logan, B.R., McCulloch, R.E., Laud, P.W.: Nonparametric survival analysis using Bayesian additive regression trees (BART). Stat. Med. 35(16), 2741–2753 (2016)
  39. Sriram, K., Ramamoorthi, R., Ghosh, P.: Posterior consistency of Bayesian quantile regression based on the misspecified asymmetric Laplace density. Bayesian Anal. 8(2), 479–504 (2013)
  40. Tan, Y.V., Roy, J.: Bayesian additive regression trees and the General BART model. Stat. Med. 38(25), 5048–5069 (2019)
  41. Taylor, J.W.: A quantile regression neural network approach to estimating the conditional density of multiperiod returns. J. Forecast. 19(4), 299–311 (2000)
  42. Thompson, P., Cai, Y., Moyeed, R., Reeve, D., Stander, J.: Bayesian nonparametric quantile regression using splines. Comput. Stat. Data Anal. 54(4), 1138–1150 (2010)
  43. Um, S., Linero, A.R., Sinha, D., Bandyopadhyay, D.: Bayesian additive regression trees for multivariate skewed responses. Stat. Med. 42(3), 246–263 (2023)
  44. Venables, W.N., Ripley, B.D.: Modern Applied Statistics with S, 4th edn. Springer, New York (2002). https://www.stats.ox.ac.uk/pub/MASS4/. ISBN 0-387-95457-0
  45. Wei, Y., Kehm, R.D., Goldberg, M., Terry, M.B.: Applications for quantile regression in epidemiology. Curr. Epidemiol. Rep. 6, 191–199 (2019)
  46. Wei, Y., Pere, A., Koenker, R., He, X.: Quantile regression methods for reference growth charts. Stat. Med. 25(8), 1369–1382 (2006)
  47. Xu, Q., Deng, K., Jiang, C., Sun, F., Huang, X.: Composite quantile regression neural network with applications. Expert Syst. Appl. 76, 129–139 (2017)
  48. Xu, S.G., Reich, B.J.: Bayesian nonparametric quantile process regression and estimation of marginal quantile effects. Biometrics 79(1), 151–164 (2023)
  49. Xu, Y., Hogan, J., Daniels, M., Kantor, R., Mwangi, A.: Augmentation samplers for multinomial probit Bayesian additive regression trees. J. Comput. Graph. Stat. 34(2), 498–508 (2024)
  50. Yang, Y., Wang, H.J., He, X.: Posterior inference in Bayesian quantile regression with asymmetric Laplace likelihood. Int. Stat. Rev. 84(3), 327–344 (2016)
  51. Yu, K., Jones, M.: Local linear quantile regression. J. Am. Stat. Assoc. 93(441), 228–237 (1998)
  52. Yu, K., Moyeed, R.A.: Bayesian quantile regression. Stat. Probab. Lett. 54(4), 437–447 (2001)
  53. Yuan, X., Xiang, X., Zhang, X.: Bayesian composite quantile regression for the single-index model. PLoS ONE 18(5), e0285277 (2023)
  54. Zhang, T., Geng, G., Liu, Y., Chang, H.H.: Application of Bayesian additive regression trees for estimating daily concentrations of PM2.5 components. Atmosphere 11(11), 1233 (2020)
  55. Zhao, K., Lian, H.: A note on the efficiency of composite quantile regression. J. Stat. Comput. Simul. 86(7), 1334–1341 (2016)
  56. Zhao, Z., Xiao, Z.: Efficient regressions via optimally combining quantile information. Econom. Theory 30(6), 1272–1314 (2014)
  57. Zheng, Q., Peng, L., He, X.: Globally adaptive quantile regression with ultra-high dimensional data. Ann. Stat. 43(5), 2225 (2015)
  58. Zou, H., Yuan, M.: Composite quantile regression and the oracle model selection theory. Ann. Stat. 36(3), 1108–1126 (2008)


