Author manuscript; available in PMC: 2024 Feb 10.
Published in final edited form as: Stat Med. 2022 Nov 25;42(3):246–263. doi: 10.1002/sim.9613

Bayesian Additive Regression Trees for Multivariate Skewed Responses

Seungha Um 1,*, Antonio R Linero 2, Debajyoti Sinha 1, Dipankar Bandyopadhyay 3
PMCID: PMC9851978  NIHMSID: NIHMS1850406  PMID: 36433639

Summary

This paper introduces a nonparametric regression approach for univariate and multivariate skewed responses using Bayesian additive regression trees (BART). Existing BART methods use ensembles of decision trees to model a mean function, and have become popular recently due to their high prediction accuracy and ease of use. The usual assumption of a univariate Gaussian error distribution, however, is restrictive in many biomedical applications. Motivated by an oral health study, we provide a useful extension of BART, the skewBART model, to address this problem. We then extend skewBART to allow for multivariate responses, with information shared across the decision trees associated with different responses within the same subject. The methodology accommodates within-subject association, and allows varying skewness parameters for the varying multivariate responses. We illustrate the benefits of our multivariate skewBART proposal over existing alternatives via simulation studies and application to the oral health dataset with bivariate highly skewed responses. Our methodology is implementable via the R package skewBART, available on GitHub.

Keywords: Bayesian nonparametrics, ensembling methods, nonlinear regression, skew-normal

1 |. INTRODUCTION

Highly skewed multivariate responses are commonly observed in many biomedical and clinical research problems. For example, in a preliminary analysis (see Figure 1) of periodontal disease (PD) status1 among a population of Type-2 diabetic (T2D) Gullah-speaking African-Americans (henceforth, the GAAD study), two popular PD endpoints, the average periodontal pocket depth (PPD) and the average clinical attachment level (CAL), are highly skewed. The data analytic goal here is to assess the cross-sectional associations of important covariates, say, age and diabetes level, with the bivariate responses (average PPD and average CAL over all measured tooth-sites) of any patient from a highly diabetic population.

FIGURE 1. GAAD data: Scatter plot of PPD and CAL responses, averaged for each subject, with marginal density estimates.

To model the relationship between a k-dimensional multivariate observed response yi and a p-dimensional vector of covariates xi without imposing any pre-determined functional form, we consider the nonparametric regression of yi on the corresponding xi, given by

$$y_i = f(x_i) + \epsilon_i, \tag{1}$$

where the errors ϵi are iid with common multivariate density hϵ(·) for i = 1, …, n. The unknown function f returns the k-dimensional vector corresponding to xi. For the univariate case with k = 1, a popular strategy for modeling f is to use an ensemble of decision trees, such as random forests2 or boosted decision trees3.

In this paper, we focus on the Bayesian additive regression trees (BART) model introduced by Chipman et al.4. Unlike the parametric regression functions utilized to model skewed multivariate responses5, decision tree ensembles efficiently capture both nonlinear and interaction effects of covariates in f(x), automatically. Additionally, compared to other machine learning algorithms, BART models provide a number of appealing advantages. First, they are typically robust to the choice of tuning parameters, and yield high prediction accuracy. Second, unlike many competing machine learning algorithms, BART allows for uncertainty quantification by providing a full posterior distribution for the predictions. Recent work has also established attractive theoretical properties for BART6,7. BART models have successfully been applied to many statistical problems, including classification8, survival analysis9,10, density estimation11, and high dimensional sparse regression12.

Existing BART models come with various limitations, however. First, in studies with highly skewed responses, taking the error density hϵ(·) to be a multivariate Gaussian density may not be appropriate. Bhingare et al.5 showed that the mean PPD and mean CAL are highly skewed, and that methods which adapt to skewed responses tend to outperform methods based on Gaussian errors in the precision of both the estimated parameters and the predictions of future outcomes. Second, as illustrated by Linero and Yang7, the estimate of f(·) obtained from the usual BART models lacks smoothness, leading to suboptimal performance when the true f(x) varies smoothly with x. To improve performance of the estimate when the true f(·) is believed to be smooth, Linero and Yang7 introduced the soft BART (SBART) model.

In this paper, we introduce the skewBART model, which takes the error density hϵ(·) in equation (1) to be a multivariate skewed normal density to accommodate non-Gaussian responses. We also extend the SBART model for univariate responses to obtain smooth estimates of f(·) under skewed errors. We develop an efficient Gibbs sampler based on the Bayesian backfitting algorithm4 to perform practical nonparametric Bayesian inference. In addition to accommodating skewness of the error, we introduce a skewed multivariate response model, with multivariate decision trees used for the regression functions and a multivariate skewed distribution for the error vector of each subject. This is motivated by the fact that most epidemiological studies/trials on PD model CAL, thereby quantifying the (past) disease history and progression, in contrast to the current disease status measured by PPD13. Hence, a thorough assessment of PD may call for modeling PPD and CAL jointly within a multivariate framework. Multivariate decision trees have been used both to handle mixed-type responses such as zero-inflated log-normal/gamma distributions14 and to implement targeted smoothing over space or time15. To the best of our knowledge, this is the first attempt to bring multivariate decision tree methods to bear in the BART framework to model multivariate skewed response data. To increase the precision of the inference, our multi-skewBART model takes into account the association of the multivariate responses within each subject, borrowing information across the different responses by using the same decision trees for the mean vector. We further illustrate the advantages of our skewBART and multi-skewBART proposals in practice on the GAAD study.

The rest of the paper is organized as follows. After briefly reviewing both the BART model and the important properties of the skew normal distribution, Section 2 introduces the skewBART and multi-skewBART models, respectively for univariate and multivariate responses. Details on the Bayesian inference, including prior specifications and the related Markov chain Monte Carlo (MCMC) algorithm for posterior computation are outlined in Section 3. In Section 4, we compare the finite sample performances (including out-of-sample predictions) of our proposals to existing alternatives using synthetic data. In Section 5, we demonstrate the application to the motivating GAAD oral health dataset. Section 6 concludes with a discussion. Details on the proposed MCMC algorithm appear in the Appendix.

2 |. THE SKEWBART AND MULTI-SKEWBART MODELS

2.1 |. Review of the BART and SBART models

For the scalar response, the BART framework introduced by Chipman et al.4 models f(xi) in (1) as a sum of m trees

$$f(x_i) = \sum_{t=1}^{m} g(x_i; \mathcal{T}_t, \mathcal{M}_t), \tag{2}$$

where $\mathcal{T}_t$ is a binary tree structure with $n_t$ leaf nodes and $\mathcal{M}_t = \{\mu_{t1}, \ldots, \mu_{t n_t}\}$ is the collection of parameters for the $n_t$ leaves of the $t$th tree $\mathcal{T}_t$, with $h_\epsilon$ being a Gaussian density. The function $g(x_i; \mathcal{T}_t, \mathcal{M}_t)$ returns $\sum_{\ell=1}^{n_t} \mu_{t\ell}\, \eta(x_i; \mathcal{T}_t, \ell)$, where $\eta(x_i; \mathcal{T}_t, \ell)$ is the indicator that $x_i$ is associated with leaf node $\ell$ in $\mathcal{T}_t$; that is, $g(x_i; \mathcal{T}_t, \mathcal{M}_t) = \mu_{t\ell}$ if and only if $x_i$ is associated with leaf $\ell$ of tree $\mathcal{T}_t$. For example, if $x = (0.62, 0.45)$ in Figure 2 (bottom-left), then $g(x; \mathcal{T}_t, \mathcal{M}_t) = \mu_{t1}$. The tree $\mathcal{T}_t$ consists of interior splitting rules of the form $[x_j \le C_b]$ for each branch node $b$, which induce a partition of the covariate space. At each interior node, the splitting rule uses predictor $j$ with prior probability $s_j$, where $s = (s_1, \ldots, s_p)$ is a probability vector. Throughout this manuscript, we place a $\text{Dirichlet}(a/p, \ldots, a/p)$ hyperprior on $s$ to allow the model to filter out irrelevant predictors12. The cutpoint $C_b$ of the splitting rule is assigned a uniform prior over its possible values.

FIGURE 2. Top: an example of a binary tree $\mathcal{T}_t$ where the terminal nodes are labeled with the corresponding parameters $\mu_{t\ell}$. Bottom: the corresponding partition of the predictor space $\mathcal{X} = [0, 1]^2$ from BART (left panel) and SBART (right panel) when $\kappa = 0.1$.

To avoid overfitting, each tree is given a regularization prior which shrinks the individual leaf parameters towards 0 so that each tree contributes only a small portion of the overall fit. The regularization acts through the depth of each tree and the prior distribution of the leaf parameters. The probability that a node at depth $d$ is non-terminal is given by $\gamma (1 + d)^{-\beta}$ for hyperparameters $\gamma \in (0, 1)$ and $\beta \in [0, \infty)$. The default prior specification4 sets $\gamma = 0.95$ and $\beta = 2$, which encourages the trees to be shallow (rarely exceeding depth 2–3). The leaf parameters $\mu_{t\ell}$ are given iid $\mathcal{N}(0, \sigma_\mu^2/m)$ priors; since $f(x)$ is a sum of $m$ independent leaf values, its prior variance is $m \cdot \sigma_\mu^2/m = \sigma_\mu^2$, which is constant as a function of $m$.
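As a small numerical illustration of the depth prior described above, the following Python sketch evaluates the non-terminal probability $\gamma(1 + d)^{-\beta}$ at the first few depths under the default values $\gamma = 0.95$ and $\beta = 2$. The function name `split_probability` is ours and is not part of any BART software.

```python
def split_probability(depth, gamma=0.95, beta=2.0):
    """Prior probability that a node at the given depth is non-terminal.

    Implements gamma * (1 + depth) ** (-beta); the defaults are the
    values quoted in the text (gamma = 0.95, beta = 2).
    """
    return gamma * (1.0 + depth) ** (-beta)


if __name__ == "__main__":
    # The probability of splitting again drops quickly with depth,
    # which is why trees drawn from this prior tend to be shallow.
    for d in range(4):
        print(f"depth {d}: P(non-terminal) = {split_probability(d):.4f}")
```

Under these defaults the root splits with probability 0.95, a depth-1 node with probability 0.2375, and a depth-2 node with probability of roughly 0.106, consistent with trees that rarely exceed depth 2–3.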

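To define a BART model (2) for the nonparametric regression in (1) with univariate response $y_i$, we must specify a likelihood for the observed data $D = \{(y_i, x_i) : i = 1, \ldots, n\}$, a prior on the unknown model parameters related to the tree structures, and a prior on the parameter $\sigma^2$ of the unknown univariate error distribution $h_\epsilon(\cdot \mid \sigma^2)$. Let $\boldsymbol{\mathcal{T}} = \{\mathcal{T}_1, \ldots, \mathcal{T}_m\}$ denote the set of all trees and let $\boldsymbol{\mathcal{M}} = \{\mathcal{M}_1, \ldots, \mathcal{M}_m\}$ denote the set of leaf parameters for all trees. Bayesian computation then proceeds by using MCMC based samples from the joint posterior $p(\boldsymbol{\mathcal{T}}, \boldsymbol{\mathcal{M}} \mid D)$, which is proportional to the likelihood times the joint prior $p(\theta)$ of all the unknown parameters $\theta$. When the error distribution is $\mathcal{N}(0, \sigma^2)$, the prior consists of a conjugate inverse gamma prior $p(\sigma^2)$ for $\sigma^2$, independent priors $p(\mathcal{T}_t)$ for the tree structures $\mathcal{T}_t$, and conditionally-independent priors for the terminal node parameters. For the leaf node set $\mathcal{M}_t = \{\mu_{t\ell} : 1 \le \ell \le n_t\}$ conditional on the tree structure $\mathcal{T}_t$, a conjugate normal distribution $\mathcal{N}(0, \sigma_\mu^2/m)$ is widely used as the common independent prior $p(\mu_{t\ell})$. The resulting joint prior distribution factors as

$$
p(\boldsymbol{\mathcal{T}}, \boldsymbol{\mathcal{M}}, \sigma^2)
= \left[\prod_{t=1}^{m} p(\mathcal{T}_t, \mathcal{M}_t)\right] p(\sigma^2)
= \left[\prod_{t=1}^{m} p(\mathcal{M}_t \mid \mathcal{T}_t)\, p(\mathcal{T}_t)\right] p(\sigma^2)
= \left[\prod_{t=1}^{m} \left\{\prod_{\ell=1}^{n_t} p(\mu_{t\ell} \mid \mathcal{T}_t)\right\} p(\mathcal{T}_t)\right] p(\sigma^2). \tag{3}
$$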

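Denoting the $(n \times 1)$ vector of partial residuals excluding the $t$th tree by $R_t = (R_{t1}, \ldots, R_{tn})$, where $R_{ti} = y_i - \sum_{k \ne t} g(x_i; \mathcal{T}_k, \mathcal{M}_k)$, the integrated likelihood function of $\mathcal{T}_t$ is

$$
p(\mathcal{T}_t \mid R_t, \sigma) \propto p(\mathcal{T}_t) \int p(R_t \mid \mathcal{M}_t, \mathcal{T}_t, \sigma)\, p(\mathcal{M}_t \mid \mathcal{T}_t, \sigma)\, d\mathcal{M}_t \tag{4}
$$

given $\mathcal{T}_{-t} = \{\mathcal{T}_k : k \ne t\}$ and $\mathcal{M}_{-t} = \{\mathcal{M}_k : k \ne t\}$. For brevity, in equation (4) and all following expressions of posteriors and conditional posteriors, we suppress dependence on the covariate vector $x$. Because we have a conjugate prior for $\mathcal{M}_t$, the integral in (4) is available in closed form. This allows the use of the Bayesian backfitting algorithm16 to update $(\mathcal{T}_t, \mathcal{M}_t)$ sequentially for $t = 1, \ldots, m$ by using the Metropolis-Hastings algorithm4 to sample $\mathcal{T}_t$ using (4) and then sampling $\mathcal{M}_t$ from its full conditional.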
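To make the two ingredients of a backfitting sweep concrete, the following Python sketch computes the partial residuals $R_t$ and the conjugate Gaussian full conditional of a single leaf parameter. It is only an illustration under stated assumptions: the tree structure update via Metropolis-Hastings is omitted, the function names are ours, and the leaf update is the standard normal-normal conjugate result for a Gaussian error with variance $\sigma^2$ and a $\mathcal{N}(0, \sigma_\mu^2/m)$ prior rather than code taken from any BART package.

```python
def partial_residuals(y, tree_fits, t):
    """Return R_t with R_t[i] = y[i] - sum over k != t of tree_fits[k][i].

    tree_fits[k][i] holds the current fitted value g(x_i; T_k, M_k).
    """
    return [
        y[i] - sum(fit[i] for k, fit in enumerate(tree_fits) if k != t)
        for i in range(len(y))
    ]


def leaf_posterior(residuals_in_leaf, sigma2, prior_var):
    """Mean and variance of a leaf parameter's Gaussian full conditional.

    Assumes residuals ~ N(mu, sigma2) within the leaf and mu ~ N(0, prior_var),
    where prior_var plays the role of sigma_mu^2 / m.
    """
    precision = len(residuals_in_leaf) / sigma2 + 1.0 / prior_var
    mean = (sum(residuals_in_leaf) / sigma2) / precision
    return mean, 1.0 / precision


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0]
    tree_fits = [[0.2, 0.4, 0.6], [0.1, 0.1, 0.1], [0.3, 0.5, 0.9]]
    print(partial_residuals(y, tree_fits, t=1))
    print(leaf_posterior([1.0, 1.0], sigma2=1.0, prior_var=1.0))
```

In a full sweep these two steps would be repeated for each $t = 1, \ldots, m$, with the tree structure $\mathcal{T}_t$ updated between them.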

Despite the recent popularity of BART, its estimates have some drawbacks; in particular, they lack smoothness because of the hard partitions illustrated in Figure 2 (bottom-left). To overcome this lack of smoothness, Linero and Yang7 proposed SBART, which forms predictions by averaging over many random paths down the tree. Under the SBART splitting rule at branch $b$, $x$ goes left with probability $\psi(x_{j_b}; C_b, \kappa_b) = \psi\!\left(\frac{x_{j_b} - C_b}{\kappa_b}\right)$, where $j_b$ is the predictor used at branch $b$ and $\kappa_b > 0$ is a bandwidth parameter controlling the sharpness of the decision. By averaging over all possible paths down the tree, each leaf from SBART has a global impact on $f$, which allows the model to share information across different covariate regions. Figure 2 (bottom-right) illustrates the smoothed partition of the predictor space in SBART with the logistic $\psi(u) = (1 + e^{u})^{-1}$. The splitting rules of BART induce a hard partition of the predictor space $\mathcal{X} = [0, 1]^2$, whereas SBART smooths the BART partition. Such smoothness is useful when the underlying $f(x)$ is believed to be smooth. We note that as the bandwidth parameter $\kappa$ approaches 0, the splitting rule of SBART converges to the splitting rule of the original BART model.
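The soft splitting rule can be illustrated with a single-branch tree. The Python sketch below takes the gate as $\psi(u) = (1 + e^{u})^{-1}$, as written in the text, and returns the path-averaged prediction of a two-leaf tree; the names `psi` and `soft_stump` are ours, and the example is a toy illustration rather than the SBART implementation.

```python
import math


def psi(u):
    """Logistic gate psi(u) = 1 / (1 + exp(u)), arranged to avoid overflow."""
    if u >= 0:
        e = math.exp(-u)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(u))


def soft_stump(x, cut, kappa, mu_left, mu_right):
    """Prediction of a one-split soft tree, averaged over the two paths.

    The point goes left with probability psi((x - cut) / kappa), so the
    prediction is a weighted average of the two leaf parameters.
    """
    p_left = psi((x - cut) / kappa)
    return p_left * mu_left + (1.0 - p_left) * mu_right


if __name__ == "__main__":
    for kappa in (0.1, 0.01, 0.0001):
        values = [soft_stump(x, 0.5, kappa, -1.0, 1.0) for x in (0.3, 0.5, 0.7)]
        print(kappa, [round(v, 4) for v in values])
```

As the bandwidth shrinks, the predictions on either side of the cutpoint approach the two leaf values, which mirrors the statement that SBART recovers the hard BART rule in the limit $\kappa \to 0$.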

Throughout, we write f ~ BARTm(α, β, σμ) and f ~ BARTm if the scalar valued multivariable function f follows the BART prior respectively with the specified hyperparameter values of (α, β, m, σμ) and with the default values of Chipman et al.4. Similarly, we write f ~ SBARTm(α, β, σμ) and f ~ SBARTm to denote SBART priors respectively with pre-specified hyperparameters and with the default values recommended by Linero and Yang7.

2.2 |. Review of the skew-normal distribution

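We use $\mathcal{SN}(\xi, \sigma^2, \alpha)$ to denote a univariate skew-normal (SN) distribution17 with probability density function $\frac{2}{\sigma}\, \phi\!\left(\frac{y - \xi}{\sigma}\right) \Phi\!\left(\alpha\, \frac{y - \xi}{\sigma}\right)$, where $\phi$ and $\Phi$ are the standard normal density and cumulative distribution functions, respectively. The parameters $\xi$ and $\sigma$ are location and scale parameters, while $\alpha$ is a shape parameter which allows the density to exhibit skewness. The stochastic representation of the univariate skew-normal distribution, such that $E \sim \mathcal{SN}(\xi, \sigma^2, \alpha)$, is

$$E \overset{d}{=} W + \lambda Z \tag{5}$$

where $W$ and $Z$ are independent $\mathcal{N}(\xi, \tau^2)$ and $\mathcal{N}^{+}(0, 1)$ random variables, respectively, with $\lambda = \alpha\sigma/\sqrt{1 + \alpha^2}$ and $\tau = \sigma/\sqrt{1 + \alpha^2}$; here, $\mathcal{N}^{+}(\mu, \sigma^2)$ denotes a $\mathcal{N}(\mu, \sigma^2)$ distribution truncated to the support $(0, \infty)$.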
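The stochastic representation (5) gives a direct way to simulate skew-normal draws. The Python sketch below maps $(\sigma, \alpha)$ to $(\lambda, \tau)$ and draws $W + \lambda Z$ using only the standard library; the function names are ours, and the comparison with the mean $\xi + \lambda\sqrt{2/\pi}$ relies on the standard fact that a half-normal variable has mean $\sqrt{2/\pi}$.

```python
import math
import random


def sn_to_additive(sigma, alpha):
    """Map (sigma, alpha) to (lambda, tau) of the representation E = W + lambda * Z."""
    root = math.sqrt(1.0 + alpha ** 2)
    return alpha * sigma / root, sigma / root


def sample_skew_normal(xi, sigma, alpha, rng):
    """Draw one SN(xi, sigma^2, alpha) variate as W + lambda * Z."""
    lam, tau = sn_to_additive(sigma, alpha)
    w = rng.gauss(xi, tau)            # W ~ N(xi, tau^2)
    z = abs(rng.gauss(0.0, 1.0))      # |N(0, 1)| is N(0, 1) truncated to (0, inf)
    return w + lam * z


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [sample_skew_normal(0.0, 1.0, 3.0, rng) for _ in range(20000)]
    lam, tau = sn_to_additive(1.0, 3.0)
    print("sample mean:", sum(draws) / len(draws))
    print("theoretical mean:", lam * math.sqrt(2.0 / math.pi))
```

Note that $\lambda^2 + \tau^2 = \sigma^2$, so the reparameterization splits the squared scale into a skewing part and a symmetric part.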

Different extensions18,19,20 of the univariate SN distribution to the multivariate setting have been proposed. For extending skew-BART to multivariate responses, we consider the generalized multivariate skew-normal (MSN) distribution19,21, which is both practical and flexible. A (k × 1) random vector E follows an MSN distribution, denoted by E ~ 𝒮𝒩k(μ, Σ, Λ), if its probability density function is given by

$$2^k\, \phi_k(e \mid \mu, \Omega)\, \Phi_k\!\left(\Lambda \Omega^{-1}(e - \mu) \mid 0, \Delta^{-1}\right) \tag{6}$$

where $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_k)$ is the $(k \times k)$ skewness matrix, $\mu \in \mathbb{R}^k$ is the location vector, $\Sigma$ is a $(k \times k)$ positive definite scale matrix, $\Omega = \Sigma + \Lambda\Lambda$ and $\Delta = I_k + \Lambda \Sigma^{-1} \Lambda$. Also, $\phi_k(y \mid \mu, \Sigma)$ and $\Phi_k(y \mid \mu, \Sigma)$ are the density and cumulative distribution functions of a multivariate normal distribution, denoted by $\mathcal{N}_k(\mu, \Sigma)$, with mean $\mu \in \mathbb{R}^k$ and $(k \times k)$ positive definite covariance $\Sigma$. Note that (6) belongs to the class of fundamental skew distributions considered by Arellano-Valle and Genton19. From Proposition 1 of Arellano-Valle et al.21, $E$ can be represented stochastically as

$$E \overset{d}{=} W + \Lambda Z, \tag{7}$$

where $W \sim \mathcal{N}_k(\mu, \Sigma)$ and $Z \sim \mathcal{N}_k^{+}(0, I_k)$ are independent $k$-variate normal and truncated normal random vectors. The family of MSN densities has the desirable and practically useful property of being closed under marginalization. Additionally, each of the individual components of the MSN distribution of (6) has a univariate skew-normal marginal of the form (5). Sahu et al.20 and Bhingare et al.5 present physical interpretations of the univariate skew-normal representation (5) and the multivariate skew-normal representation (7) in terms of the convolution of a skewing shock $Z$ with a symmetric distribution.

2.3 |. Univariate skewBART

We first introduce the univariate skewBART model to extend BART to accommodate skewed responses when k = 1. We model the scalar-valued nonparametric regression function f(·) in (1) using a sum-of-trees model with an SN distributed error as

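$$
\begin{cases}
y_i = f(x_i) + \epsilon_i \quad \text{with} \quad \epsilon_i \overset{\text{iid}}{\sim} \mathcal{SN}(0, \sigma^2, \alpha), \\
f(x_i) = \sum_{t=1}^{m} g(x_i; \mathcal{T}_t, \mathcal{M}_t).
\end{cases} \tag{8}
$$

Using the stochastic representation of (5), we can rewrite (8) as

$$y_i = \sum_{t=1}^{m} g(x_i; \mathcal{T}_t, \mathcal{M}_t) + \lambda Z_i + W_i \tag{9}$$

where $W_i \sim \mathcal{N}(0, \tau^2)$, $Z_i \sim \mathcal{N}^{+}(0, 1)$, $\lambda = \alpha\sigma/\sqrt{1 + \alpha^2}$ and $\tau = \sigma/\sqrt{1 + \alpha^2}$. For the scalar $y_i$, we use the SBART of Linero and Yang7 with $f \sim \text{SBART}_m$ to incorporate smoothness in $f(\cdot)$. The prior specifications described in Section 3.1, along with the reparameterization in (9), allow us to construct a simple data augmentation step to fit the skewBART model of (8).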

2.4 |. Multivariate skewBART

We extend the sum-of-trees model (8) using the MSN distribution (7) to accommodate multivariate outcomes. This extension improves prediction accuracy by incorporating the correlation between responses, as well as the dependency in error estimation. Our proposed multivariate skewBART model, called the multi-skewBART model, for the multivariate response $Y = (y_1, \ldots, y_n)^\top \in \mathbb{R}^{n \times k}$ models the nonparametric multivariate function $f(\cdot)$ in (1) as

$$
\begin{cases}
y_i = f(x_i) + \epsilon_i \quad \text{with} \quad \epsilon_i \overset{\text{iid}}{\sim} \mathcal{SN}_k(0, \Sigma, \Lambda), \\
f(x_i) = \sum_{t=1}^{m} g(x_i; \mathcal{T}_t, \mathcal{M}_t),
\end{cases} \tag{10}
$$

where $g(x_i; \mathcal{T}_t, \mathcal{M}_t)$ returns a $k$-dimensional $\mu_{t\ell}$ if $x_i$ is associated with the leaf node $\ell$ in $\mathcal{T}_t$ for $\ell = 1, \ldots, n_t$. For the nonparametric $f(\cdot)$ in (10), we say that $f$ has a multivariate BART prior, denoted by $f \sim \text{MultiBART}_m(\alpha, \beta, \Sigma_M)$, where $(\alpha, \beta, m)$ are the usual BART hyperparameters for the trees and the $(k \times k)$ matrix $\Sigma_M$ is the prior covariance of $\mu_{t\ell} \sim \mathcal{N}_k(0, \Sigma_M)$. Further details of this prior specification are presented in Section 3. The tree $\mathcal{T}_t$ in (10) has a binary tree structure as in univariate skewBART, except that $\mathcal{M}_t = \{\mu_{t1}, \ldots, \mu_{t n_t}\}$ is now a set of $n_t$ vector-valued leaf parameters, each of dimension $k$, for the $t$-th tree $\mathcal{T}_t$. Using the stochastic representation of the MSN distribution in (7), the multi-skewBART model of (10) is now expressed as

$$y_i = \sum_{t=1}^{m} g(x_i; \mathcal{T}_t, \mathcal{M}_t) + \Lambda Z_i + W_i, \tag{11}$$

where $Z_i = (Z_{i1}, \ldots, Z_{ik}) \overset{\text{iid}}{\sim} \mathcal{N}_k^{+}(0, I_k)$, $W_i = (W_{i1}, \ldots, W_{ik}) \overset{\text{iid}}{\sim} \mathcal{N}_k(0, \Sigma)$, and the scale matrix $\Sigma$ and the skewness matrix $\Lambda$ are defined as in (6). For the multivariate model, we use the BART framework to consider the association among the $k$ response components without adapting to the smoothness of $f$.

3 |. PRIOR SPECIFICATION AND POSTERIOR COMPUTATION

Previous works have repeatedly observed that the default priors proposed by Chipman et al.4 work remarkably well in practice; for example, both Chipman et al.4 (for BART) and Linero and Yang7 (for SBART) observe that, across a large number of benchmark datasets, the default priors are often competitive with more computationally expensive methods based on tuning the hyperparameters via cross-validation, with the default priors even occasionally outperforming the cross-validation methods. For the tree structures, we use the default prior for the BART model and provide priors for the rest of the parameters of the error distribution in (1). We then derive Gibbs samplers for fitting the models.

3.1 |. skewBART: Prior choices and MCMC computation

As a preprocessing step, we apply a quantile normalization to each of the $p$ components of the covariate vectors so that each covariate is approximately uniform on $(0, 1)$. Additionally, following Chipman et al.4, the dependent variable $y_i$ is also standardized. The leaf parameters $\mu_{t\ell}$ of the standardized $y_i$ are assumed independent and identically distributed $\mathcal{N}(0, \sigma_\mu^2/m)$, thus regularizing each individual tree so that it contributes only a small part of the overall fit.
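One simple way to carry out such a quantile normalization is a rank transform, sketched below in Python. The text does not state the exact transform used, so the choice of $(\text{rank} - 0.5)/n$, the handling of ties (broken by order of appearance), and the function name `quantile_normalize` are all assumptions made for illustration.

```python
def quantile_normalize(values):
    """Map one covariate column to (0, 1) through its ranks.

    Each observation is replaced by (rank - 0.5) / n, so the transformed
    values are evenly spread over (0, 1). Ties are broken by order of
    appearance, which is a simplification.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = (rank - 0.5) / n
    return out


if __name__ == "__main__":
    ages = [55, 26, 87, 41]  # toy values, not from the GAAD data
    print(quantile_normalize(ages))
```

The transform would be applied separately to each of the $p$ covariate columns before fitting.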

For choosing the bandwidth κb, Linero and Yang7 recommend using tree specific κt’s shared across branches in a fixed tree, with κt ~ Exp(0.1). We specify a half-Cauchy prior τ ~ Cauchy+(0, τ0), where τ0 is chosen empirically. We obtain τ0 by fitting the lasso using the glmnet package in R. We use a conjugate univariate normal 𝒩(0, δ) prior for λ to allow both positive and negative skewness in (8), where a large value of δ corresponds to a non-informative prior opinion about the amount of skewness. In summary, for our univariate response model in (9), the priors are given by f ~ SBARTm and

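$$\lambda \sim \mathcal{N}(0, \delta), \quad Z_i \sim \mathcal{N}^{+}(0, 1), \quad W_i \sim \mathcal{N}(0, \tau^2), \quad \tau \sim \text{Cauchy}^{+}(0, \tau_0).$$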

Below we describe our MCMC algorithm with a data augmentation step for fitting skewBART. Our strategy is to augment the latent variables Z = (Z1, …, Zn) within the Bayesian backfitting algorithm16 in order to sample from the posterior distribution

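$$p(\boldsymbol{\mathcal{T}}, \boldsymbol{\mathcal{M}}, \lambda, \tau, Z \mid Y)$$

using MCMC, where $Y = (y_1, \ldots, y_n)$ for the univariate case. Our Gibbs sampling algorithm uses a Metropolis-within-Gibbs strategy to iteratively update $(\mathcal{T}_t, \mathcal{M}_t)$ for $t = 1, \ldots, m$ in a fashion that leaves the full conditional

$$p(\mathcal{T}_t, \mathcal{M}_t \mid \mathcal{T}_{-t}, \mathcal{M}_{-t}, Z, \lambda, \tau, Y), \tag{12}$$

invariant. Then, samples of $\tau$, $\lambda$ and $Z$ are drawn from the conditional posterior distributions

$$p(\tau \mid \boldsymbol{\mathcal{T}}, \boldsymbol{\mathcal{M}}, Z, \lambda, Y), \quad p(\lambda \mid \boldsymbol{\mathcal{T}}, \boldsymbol{\mathcal{M}}, Z, \tau, Y), \quad p(Z \mid \boldsymbol{\mathcal{T}}, \boldsymbol{\mathcal{M}}, \lambda, \tau, Y). \tag{13}$$

Sampling $(\mathcal{T}_t, \mathcal{M}_t)$ from (12) is equivalent to sampling from

$$p(\mathcal{T}_t, \mathcal{M}_t \mid R_t, \tau) \tag{14}$$

where $R_t = (R_{t1}, \ldots, R_{tn})$ with $R_{ti} = y_i - \sum_{k \ne t} g(x_i; \mathcal{T}_k, \mathcal{M}_k) - \lambda Z_i$. This allows us to use the existing Bayesian backfitting algorithm to update the $(\mathcal{T}_t, \mathcal{M}_t)$.4,22 The conditional posterior $p(\lambda \mid \boldsymbol{\mathcal{T}}, \boldsymbol{\mathcal{M}}, Z, \tau, Y)$ of $\lambda$ given the rest of the parameters is also Gaussian because it has a form similar to the posterior of the regression parameter in a linear regression of $Y$ on $Z$ under a Gaussian prior for $\lambda$. The full conditionals of the components of $Z$ can be derived in closed form as a collection of independent truncated Gaussian random variables. The Bayesian backfitting algorithm for skewBART is described in Algorithm 1 below, with the exact conditional posterior distributions corresponding to (13) relegated to the Appendix.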
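The data augmentation updates for $Z$ and $\lambda$ can be sketched directly from the representation $y_i - f(x_i) = \lambda Z_i + W_i$. The Python code below is our own derivation by completing the square, not a transcription of the Appendix: with residual $r_i = y_i - f(x_i)$, it uses $Z_i \mid \cdot \sim \mathcal{N}^{+}\!\left(\lambda r_i/(\lambda^2 + \tau^2),\ \tau^2/(\lambda^2 + \tau^2)\right)$ and a Gaussian conditional for $\lambda$, treating $\delta$ as the prior variance of $\lambda$ (an assumption). The function names are hypothetical, and the inverse-CDF sampler is a simple choice that can lose accuracy far in the tails.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for cdf / inv_cdf


def sample_truncated_normal(mean, sd, rng):
    """Draw from N(mean, sd^2) truncated to (0, inf) by inverting the CDF."""
    lower = _STD.cdf(-mean / sd)                 # probability mass below zero
    p = lower + rng.random() * (1.0 - lower)     # uniform on the allowed range
    p = min(max(p, 1e-12), 1.0 - 1e-12)          # keep inv_cdf away from 0 and 1
    return max(mean + sd * _STD.inv_cdf(p), 0.0)  # guard against rounding below 0


def update_z(resid, lam, tau, rng):
    """Draw each Z_i from its truncated Gaussian full conditional."""
    denom = lam ** 2 + tau ** 2
    sd = math.sqrt(tau ** 2 / denom)
    return [sample_truncated_normal(lam * r / denom, sd, rng) for r in resid]


def update_lambda(resid, z, tau, delta, rng):
    """Draw lambda from its Gaussian full conditional.

    This is the regression of the residuals on Z with known noise variance
    tau^2 and a N(0, delta) prior, delta being the prior variance.
    """
    precision = sum(zi * zi for zi in z) / tau ** 2 + 1.0 / delta
    mean = (sum(zi * ri for zi, ri in zip(z, resid)) / tau ** 2) / precision
    return rng.gauss(mean, math.sqrt(1.0 / precision))


if __name__ == "__main__":
    rng = random.Random(0)
    resid = [0.5, -1.0, 2.0]
    z = update_z(resid, lam=1.0, tau=0.5, rng=rng)
    print(z, update_lambda(resid, z, tau=0.5, delta=100.0, rng=rng))
```

Within a full sampler these two draws would follow the tree updates in each iteration, with the residuals recomputed from the current sum-of-trees fit; the update of $\tau$ under the half-Cauchy prior is not shown.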


3.2 |. multi-skewBART: Prior choices and MCMC computation

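To extend the univariate skewBART model to a multivariate $k$-dimensional response $y_i$, we need to replace the univariate $Z_i$, $\mu_{t\ell}$, $\lambda$ and $W_i$ with the corresponding vectors $Z_i = (Z_{i1}, \ldots, Z_{ik})$, $\mu_{t\ell} = (\mu_{1t\ell}, \ldots, \mu_{kt\ell})$, $(\lambda_1, \ldots, \lambda_k)$ and $W_i = (W_{i1}, \ldots, W_{ik})$. We also need to specify the appropriate priors and hyperpriors for these vectors. For the prior $p(\mu_{t\ell} \mid \mathcal{T}_t)$ we use the conjugate multivariate normal distribution, $\mathcal{N}_k(\mu_M, \Sigma_M)$, which allows the integrated likelihood (4) to have a closed form expression as in the univariate BART model. The values of $\mu_M$ and $\Sigma_M$ are chosen to assign high probability to the range of observed values of each of the $k$ responses. Similar to the univariate case with hyperparameter $\tau$ of $W_i$, we use an empirical estimate of $\Sigma$ to choose the center $S_0^{-1}$ and the scale $\nu_0$ of the inverse-Wishart hyper-prior of the $(k \times k)$ matrix $\Sigma$. For the skewness parameters $(\lambda_1, \ldots, \lambda_k)$, we use a conjugate multivariate normal prior $\mathcal{N}_k(0, \Sigma_\lambda)$. In summary, for our proposed multivariate response model in (11), the priors are given by $f \sim \text{MultiBART}(0.95, 2, \Sigma_M)$ and

$$(\lambda_1, \ldots, \lambda_k) \sim \mathcal{N}_k(0, \Sigma_\lambda), \quad Z_i \sim \mathcal{N}_k^{+}(0, I_k), \quad W_i \sim \mathcal{N}_k(0, \Sigma), \quad \Sigma \sim \text{inverse-Wishart}(\nu_0, S_0^{-1}).$$

We take $\Sigma_M$ to be a diagonal matrix with entries given by the default parameters from the respective univariate BART model of Chipman et al.4 The Gibbs sampler draws $(\mathcal{T}_t, \mathcal{M}_t)$ given $(\mathcal{T}_{-t}, \mathcal{M}_{-t}, Z, \Lambda, \Sigma)$ from

$$p(\mathcal{T}_t, \mathcal{M}_t \mid R_t, \Sigma),$$

where $R_t = (R_{t1}, \ldots, R_{tn})$ with $R_{ti} = y_i - \sum_{k \ne t} g(x_i; \mathcal{T}_k, \mathcal{M}_k) - \Lambda Z_i$. Then, we sample $(\lambda_1, \ldots, \lambda_k)$, $\Sigma$ and $Z$ from the full conditional distributions

$$
\begin{aligned}
&p(\lambda_1, \ldots, \lambda_k \mid \boldsymbol{\mathcal{T}}, \boldsymbol{\mathcal{M}}, Z, \Sigma, Y), \\
&p(\Sigma \mid \boldsymbol{\mathcal{T}}, \boldsymbol{\mathcal{M}}, Z, \lambda_1, \ldots, \lambda_k, Y), \\
&p(Z \mid \boldsymbol{\mathcal{T}}, \boldsymbol{\mathcal{M}}, \lambda_1, \ldots, \lambda_k, \Sigma, Y),
\end{aligned}
$$

where $\boldsymbol{\mathcal{M}} = \{\mathcal{M}_1, \ldots, \mathcal{M}_m\}$ and each $\mathcal{M}_t$ is a collection of $k$-dimensional leaf vectors associated with tree $\mathcal{T}_t$. The MCMC algorithm for this model is similar to the original MCMC scheme for the univariate skewBART model, but now extended to the multivariate setting.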

4 |. SIMULATION STUDY

In this section, we conduct simulation studies to compare skewBART and multi-skewBART to existing methods. The simulations are designed to evaluate the estimation accuracy of the regression function and out-of-sample predictive performance.

4.1 |. Univariate responses

We first examine the inferences obtained by skewBART compared to BART and SBART when the true error distribution takes on four different skewness levels. For each replicated dataset we simulate n = 250 observations using the nonparametric regression model of (1) with the known population mean function

$$y_i = 10 \sin(\pi x_{i1} x_{i2}) + 20 (x_{i3} - 0.5)^2 + 10 x_{i4} + 5 x_{i5} + \epsilon_i \tag{15}$$

introduced by Friedman23, where $x_i \sim \text{Uniform}([0, 1]^5)$ and $\epsilon_i \overset{\text{iid}}{\sim} \mathcal{SN}(0, 1, \alpha)$. For each experiment we use 200 trees, 5000 MCMC samples, and a burn-in of 2500 draws. Figure 3 presents density estimates of the distribution of the residuals from fitting the three competing methods overlaid with the true error density function, corresponding to the various choices of $\alpha$, the skewness parameter. The plots on the left (upper and lower) panels correspond to moderate skewness, i.e., $\alpha = -1$ and $\alpha = +1$, while the two on the right (upper and lower) panels represent a high magnitude of skewness, i.e., $\alpha = -10$ and $\alpha = +10$. Under both moderate and heavy skewness, the empirical distribution of the residuals from our proposed skewBART model appears closest to the true error distribution.
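For readers who wish to reproduce the data generating process in (15), a minimal Python sketch follows. It draws the covariates uniformly on $[0, 1]^5$ and the skew-normal errors through the representation (5) with $\xi = 0$; the function names and the seed are ours, and the sketch covers only data simulation, not the model fitting.

```python
import math
import random


def friedman_mean(x):
    """Mean function of equation (15) for a 5-dimensional covariate vector."""
    return (10.0 * math.sin(math.pi * x[0] * x[1])
            + 20.0 * (x[2] - 0.5) ** 2
            + 10.0 * x[3]
            + 5.0 * x[4])


def simulate(n, alpha, rng, sigma=1.0):
    """Simulate n observations from (15) with SN(0, sigma^2, alpha) errors."""
    root = math.sqrt(1.0 + alpha ** 2)
    lam, tau = alpha * sigma / root, sigma / root
    xs, ys = [], []
    for _ in range(n):
        x = [rng.random() for _ in range(5)]
        # Skew-normal error via W + lambda * Z, with Z half-normal.
        eps = rng.gauss(0.0, tau) + lam * abs(rng.gauss(0.0, 1.0))
        xs.append(x)
        ys.append(friedman_mean(x) + eps)
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate(250, alpha=10.0, rng=random.Random(2022))
    print(len(xs), min(ys), max(ys))
```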

FIGURE 3. Simulation study (univariate): Error densities corresponding to four different skewness parameters, α = (1, −1, 10, −10), overlaid with the density estimates from the skewBART, SBART and BART fits.

Next, we compare skewBART to BART when the data generating process varies with skewness levels. We simulate $n = 250$ observations, with $\sigma^2 = 1$, and use $m = 200$ trees. The skewness values are equally spaced points in the range $(-10, 10)$, with an increment of 1. We set hyperparameters for the priors of trees in BART and SBART following recommendations by Chipman et al.4 and Linero and Yang7. To compare model performance, we use the Conditional Predictive Ordinates (CPO)24,25, where $\text{CPO}_i = p(y_i \mid Y_{-i})$ is the predictive density of the $i$-th observation $y_i$ given the cross-validated data $Y_{-i} = (y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n)$. For notational convenience, we suppress the dependence of these response distributions on covariates $X$. A natural summary statistic of the $\text{CPO}_i$'s is the log pseudo marginal likelihood (LPML), given by $\text{LPML} = \sum_{i=1}^{n} \log(\text{CPO}_i)$. Two competing models $\mathcal{M}_1$ and $\mathcal{M}_2$ can be compared using the pseudo-Bayes factor $\prod_{i=1}^{n} \text{CPO}_i^{(1)} / \prod_{i=1}^{n} \text{CPO}_i^{(2)} = \exp(\text{LPML}_1 - \text{LPML}_2)$, where the superscript indicates the model under which these quantities are calculated24. Pseudo-Bayes factors are related to the more well-known Bayes factor24,26, which may not be appropriate under noninformative priors27. The evidence in the literature supporting the use of pseudo-Bayes factors with BART models is limited; however, LPML is still a very practical tool because (i) unlike Bayes factors, LPML is based on leave-one-out cross-validation (making it more robust than Bayes factors under nonparametric and/or noninformative priors), and (ii) it can be conveniently and reliably computed using MCMC samples from the full posterior.
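To illustrate how LPML can be obtained from posterior draws, the Python sketch below uses the classical harmonic-mean identity $\text{CPO}_i^{-1} \approx S^{-1} \sum_{s=1}^{S} 1/p(y_i \mid \theta^{(s)})$, evaluated on the log scale for numerical stability. This is a simpler estimator than the one implemented in the loo package and is shown only to make the definition concrete; the function names and the input layout `log_lik[i][s]` are assumptions.

```python
import math


def log_cpo(log_lik_draws):
    """Harmonic-mean estimate of log CPO_i.

    log_lik_draws[s] = log p(y_i | theta_s) for posterior draw s.
    Uses a log-sum-exp so that very small likelihoods do not overflow.
    """
    s = len(log_lik_draws)
    neg = [-v for v in log_lik_draws]
    m = max(neg)
    log_mean_inverse = m + math.log(sum(math.exp(v - m) for v in neg)) - math.log(s)
    return -log_mean_inverse


def lpml(log_lik):
    """Sum of log CPO_i over observations; log_lik[i][s] as in log_cpo."""
    return sum(log_cpo(row) for row in log_lik)


def pseudo_bayes_factor(lpml_1, lpml_2):
    """exp(LPML_1 - LPML_2), comparing model 1 against model 2."""
    return math.exp(lpml_1 - lpml_2)


if __name__ == "__main__":
    toy = [[math.log(0.5), math.log(0.5)],
           [math.log(0.25), math.log(0.5)]]
    print(lpml(toy))
```

The harmonic-mean estimator can be unstable when a few draws have very small likelihood values, which is one reason importance-sampling refinements are often preferred in practice.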

Using the loo package in R, the LPML can be conveniently computed after obtaining the MCMC samples from the joint posterior. Details on the computation of LPML appear in the Appendix. Results corresponding to each skewness level are presented in Figure 4 with 10 replications per simulation setting. Larger values of LPML indicate a better fit of the model.

FIGURE 4. Simulation study (univariate): The average values of LPML over 10 replications of the simulation experiment with skewness α ∈ (−10, 10). Each dot represents the averaged LPML corresponding to each skewness level and smoothing splines are added for each method to ease visualization.

Among the methods considered, skewBART performs the best, demonstrating improvement over SBART as skewness increases. For sufficiently large |α|, we observe a substantial increase in the LPML for skewBART, since skewBART can capture the excess skewness through the non-Gaussian error assumption. Also, the performances of the skewBART and SBART models are indistinguishable for low to negligible skewness, implying that skewBART does not suffer in terms of performance when the responses are not skewed.

We note that, counter-intuitively, the LPMLs for all models increase as skewness increases; this occurs because the error variance is proportional to $1/(1 + \alpha^2)$, so that increasing the skewness decreases the overall noise. Generally, skewBART performs much better than the alternatives as skewness increases.

4.2 |. Bivariate responses

In this section, we examine the benefits of adapting to skewness for bivariate responses using synthetic data. We generate $y_i$ from the bivariate version of model (1) with $k = 2$ and $p = 5$, where $x_i \sim \text{Uniform}([0, 1]^5)$ and $\epsilon_i \sim \mathcal{SN}_2(0, \Sigma, \Lambda)$, with scale matrix $\Sigma$ and skewness matrix $\Lambda$. The function $f$ returns a 2-dimensional vector for each 5-dimensional $x_i$. We again use Friedman's example for $f$, with $n = 250$ and $\sigma_1 = \sigma_2 = 1$. Following Arellano-Valle et al.28, we consider 9 different settings varying with $\lambda_1, \lambda_2$, the skewness parameters, and $\rho$, the correlation parameter in the bivariate specification, as displayed in Figure 5. We compare multi-skewBART to multi-BART, the multivariate version of the standard BART model, via LPML. For both models we use 200 trees and 5,000 MCMC draws. Results of this simulation are presented in Table 1. Multi-skewBART outperforms multi-BART, attaining a larger LPML under all nine settings. Both skewness and correlation appear to have an impact on performance. Overall, higher correlation tends to lead to higher LPML (the one exception in Table 1 is multi-skewBART under $(\lambda_1, \lambda_2) = (2, 3)$, where the LPML at $\rho = 0.9$ is slightly below that at $\rho = 0.5$), and for each fixed correlation the highest LPML is attained under $(\lambda_1, \lambda_2) = (0, 3)$ for both models. Thus, we conclude that the multi-skewBART fit is better, and its predictive performance remains highly competitive, regardless of the magnitude of skewness and correlation.

FIGURE 5. Simulation study (bivariate): The contour plots of the bivariate skew-normal distribution for μ = (0, 0) and σ1 = σ2 = 1 with different values of (λ1, λ2) and ρ.

TABLE 1. Simulation study (bivariate): LPML corresponding to the fits of multi-skewBART and multi-BART, for various settings of λ1, λ2 and ρ.

(λ1, λ2)    ρ      multi-skewBART    multi-BART
(0, 3)      0      −356.677          −360.145
(0, 3)      0.5    −330.328          −346.016
(0, 3)      0.9    −287.791          −308.521
(2, 3)      0      −396.685          −427.258
(2, 3)      0.5    −382.333          −412.068
(2, 3)      0.9    −386.478          −400.468
(−2, 2)     0      −369.528          −396.559
(−2, 2)     0.5    −312.813          −384.575
(−2, 2)     0.9    −311.486          −325.217

Next, we examined how the LPMLs for skewBART and multi-skewBART are impacted by the number of predictors, varying from $p = 5$ to 200. Using Friedman's example, we simulate $n = 250$ observations with $\sigma^2 = 1$ and $\alpha = 3$ for skewBART, and with $\sigma_1^2 = \sigma_2^2 = 1$, $(\lambda_1, \lambda_2) = (-2, 2)$ and $\rho = 0.5$ for multi-skewBART. With $m = 200$ trees, results are given in the top panel of Figure 6, with 10 replications per simulation setting. We see that, as the number of predictors increases, multi-skewBART is less sensitive to irrelevant predictors than skewBART. This is because the multi-skewBART model shares information about the relevant predictors across the different outcomes via the sharing of the decision trees14. However, both skewBART and multi-skewBART appear typically insensitive to the number of irrelevant predictors. We note that this is in large part due to our use of the Dirichlet prior proposed by Linero12, and the robustness to the inclusion of irrelevant predictors we observe is consistent with other simulations done on this problem.

FIGURE 6. Simulation study: The average values of LPML over 10 replications for skewBART and multi-skewBART, with p ∈ (5, 200) and fixed m = 200 trees (top), and with m ∈ (10, 200) and fixed p = 5 (bottom). Smoothing splines (with corresponding 95% confidence bands) are added for each method to ease visualization.

The number of trees is an important tuning parameter in the BART model. To examine how the number of trees impacts the LPML for skewBART and multi-skewBART, we fix p = 5 and vary the number of trees from 10 to 200. Results are given in the bottom panel of Figure 6, with 10 replications per simulation setting. We see a slight increase in LPML as the number of trees increases to the optimal choice. This result is consistent with what has been observed in other studies4,14: predictive performance of BART models is typically robust to the number of trees provided we include sufficiently many. We find that this behavior holds for both the skewBART and multi-skewBART models.

5 |. APPLICATION: THE GAAD STUDY

PD primarily results from the inflammation of the gums and bone that surround and support the teeth29. The motivating GAAD study used two correlated biomarkers of PD status, the PPD and the CAL. The CAL is defined as the distance (in mm) down a tooth’s root that is no longer attached to the surrounding bone by the periodontal ligament30. The PPD is defined as the distance (also in mm) from the gingival margin to the bottom of the gingival pocket. The CAL and PPD are measured at six pre-specified tooth sites of each available tooth, excluding the third molars/wisdom teeth31. When no teeth are missing, CAL (PPD) measurements are available at 168 sites per subject. The Joint EU/USA Periodontal Epidemiology Working Group guidelines recommend the average CAL and the average PPD as standard measurements for studying the prevalence and severity of PD, and for overall tooth level periodontal status in epidemiological studies32. Thus, we consider the bivariate response (PPDi, CALi) for subject i to be the average of the PPD values and the average of the CAL values across all available sites (maximum of 168 sites). For the sake of brevity, we will omit the word average hereafter and instead just use, say, CAL to mean the average CAL of the subject. The study also includes various subject-level covariates, such as age (in years), body mass index (BMI, in kg/m²), glycated haemoglobin level (HbA1c, in %), gender (1 for female, 0 for male), and smoking status (1 for past or present smoker, 0 for never). Our analysis uses 288 subjects with complete covariate information, who are mostly female (about 76%), with a mean age of about 55 years (range 26–87 years), 31% smokers, and about 68% obese subjects (defined as BMI ≥ 30). The predominance of female subjects in our data is not spurious, and resonates with proportions of Gullah females recruited in other studies33.
Furthermore, the adverse effects of T2D on PD have been extensively documented in oral health research34, and the current study is no exception. The HbA1c is considered a standard of care for testing and monitoring T2D. The American Diabetes Association recommends a target HbA1c level of < 7% (well-controlled), ideally between 4% and 6%, for people with T2D35,36. In our data, 60% of subjects have poorly controlled T2D (i.e., HbA1c ≥ 7%), while the remaining 40% are well-controlled (though not T2D free). Unlike previous approaches based on parametric regression functions30 and semiparametric formulations with skewed errors5, our approach incorporates an unknown tree-based nonparametric regression function to capture both nonlinear and interaction effects of these covariates. Code for implementing these models is available on GitHub at https://github.com/Seungha-Um/skewBART.

We first present a univariate analysis of the GAAD study by fitting three competing methods (skewBART, BART, and SBART) to the PPD and CAL responses separately. We also consider fitting the models after applying a log transformation to the outcomes, as this is a popular way to ameliorate skewness and improve the fit of normal error models. All methods use 5,000 MCMC draws with 200 trees, and we compare their fits via LPML. Results of the model comparison are summarized in Table 2. For both responses, the estimated LPMLs reveal that skewBART outperforms the competing approaches, with or without the log transform. Performing a log transform does result in a better LPML; however, the skewness of the error distribution is still important for model fit even after the log transform. This suggests that a marginal skew-normal error, rather than a Gaussian error, is appropriate for the analysis of the GAAD study. The performance of SBART and BART is similar here; the smooth regression function enabled by SBART provides only a slight boost compared to BART.

TABLE 2.

GAAD data analysis: Model fit summaries (LPML values) obtained from fitting skewBART, SBART and BART models, separately to the PPD and CAL. The “Transformation” column indicates whether bivariate responses were log-transformed.

Response  Transformation  skewBART  SBART  BART
CAL       None            −380      −440   −446
CAL       Log             −358      −360   −364
PPD       None            −284      −340   −346
PPD       Log             −272      −277   −279

Now, we apply multi-skewBART to jointly model the bivariate skewed responses, CAL and PPD. We compare the fit of multi-skewBART with multi-BART via LPML, using 5,000 MCMC draws. As expected, the multi-skewBART model, which allows different skewness levels for CAL and PPD as well as dependence between PPD and CAL within the same subject, outperforms multi-BART for the bivariate response (LPML = −614 vs −642). The same conclusion holds for log-transformed responses, although by a much smaller margin (LPML = −527 vs −529). We conclude that the multivariate skew-normal assumption improves the modeling of the GAAD study.

Next, we examine how the multivariate model affects prediction accuracy, given that both skewBART and multi-skewBART assume SN errors. Unlike skewBART, multi-skewBART shares information across decision trees and captures within-subject association. We compare the two models using the root mean squared error (RMSE) of each response, defined by $\mathrm{RMSE}^2 = n^{-1}\sum_{i=1}^{n}(y_i - \hat{\zeta}_i)^2$. Here $\{(y_i, x_i) : i = 1, \ldots, n\}$ denotes a collection of held-out observations of a particular response, say PPD, and $\hat{\zeta}_i$ is the posterior predicted value of the $i$th held-out response based on the rest of the data. To make a direct comparison, the responses are standardized and the results are averaged over 20 splits into training and testing sets. Results are presented in Table 3. Although multi-skewBART does not aim for smooth regression functions, it gives the best performance in terms of RMSE. This result implies that the association between CAL and PPD within each subject plays a more essential role than the smoothness of the regression function for the GAAD study. Also, multi-skewBART shares information across the different responses by using the same decision trees to generate predictions for both responses. The central message is that modeling the PPD and CAL responses separately, ignoring their bi-directional dependence, fails to share information across the two responses and compromises prediction accuracy.
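The held-out RMSE comparison described above can be sketched in a few lines. This is a minimal illustration, not the code used for the paper: the callback `fit_predict`, the placeholder `mean_predictor`, and the 20% test fraction are assumptions introduced here, and any fitted model (for example, one from the skewBART package) would take the place of the placeholder.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(n^{-1} * sum_i (y_i - zeta_hat_i)^2)."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def standardize(values):
    """Center the values and scale them to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def repeated_holdout_rmse(x, y, fit_predict, n_splits=20, test_frac=0.2, seed=0):
    """Average held-out RMSE over random train/test splits.

    fit_predict(x_train, y_train, x_test) is a hypothetical callback that
    returns one prediction per row of x_test. test_frac is an assumption;
    the text does not state the split proportion.
    """
    rng = random.Random(seed)
    y = standardize(y)
    n = len(y)
    n_test = max(1, int(round(test_frac * n)))
    scores = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        preds = fit_predict([x[i] for i in train],
                            [y[i] for i in train],
                            [x[i] for i in test])
        scores.append(rmse([y[i] for i in test], preds))
    return sum(scores) / len(scores)


def mean_predictor(x_train, y_train, x_test):
    """Placeholder model: predict the training mean for every test point."""
    mu = sum(y_train) / len(y_train)
    return [mu] * len(x_test)
```

Because the responses are standardized before splitting, a predictor that ignores the covariates scores an RMSE close to 1, which gives a rough baseline against which the values in Table 3 can be read.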

TABLE 3.

GAAD data analysis: Root mean squared error (RMSE) computed over 20 replications for the responses (PPD and CAL) from the skewBART and multi-skewBART fits.

Response  skewBART  multi-skewBART
CAL       1.345     0.838
PPD       0.626     0.423

As there exists strong evidence of a bi-directional association between T2D and PD36,37, we now focus on evaluating the association of varying HbA1c levels with PPD and CAL. The marginal effect of HbA1c levels is evaluated by Friedman’s partial dependence function38, which summarizes the effect of the covariates of interest by averaging over the other covariates. If $x_i$ can be partitioned as $x_i = (x_{iI}, x_{iC})$, where $x_{iI}$ is the set of covariates of interest (HbA1c, gender, smoking status) and $x_{iC}$ is the complement set, then the partial dependence of the response $y_i$ at a value $x_I^*$ of the covariates of interest is defined as

$$\mu_I(x_I^*) = E[y_i \mid x_{iI} = x_I^*] \approx n^{-1} \sum_{i=1}^{n} \mu(x_I^*, x_{iC}),$$

where $E(\cdot)$ denotes the expectation operator and $\mu(x) = f(x) + \sqrt{2/\pi}\,(\lambda_1, \ldots, \lambda_k)$ is the conditional mean of $y_i$ given the covariate vector $(x_I^*, x_{iC})$ (instead of the observed $x_i$) under skew-normal errors. We then compare the posterior mean predictions obtained from fitting skewBART and multi-skewBART models within the four subgroups defined by the combination of gender and smoking status. The results are displayed in Figure 7 with both skewBART and multi-skewBART fits. We observe that with increasing HbA1c levels both responses exhibit an overall increasing trend regardless of the subgroup or the model used. This reconfirms the overall (positive) association between HbA1c and PD. We can also see the effect of smoothing, with the multi-skewBART fit being much more rugged than the univariate skewBART fits. For both the skewBART and multi-skewBART fits for PPD, once gender is fixed, the fits corresponding to the smokers and non-smokers appear close; males are predicted to have a much higher PPD than females, regardless of their smoking status or HbA1c levels. We also observe that the effect of smoking is homogeneous with respect to the level of HbA1c. This implies that males are more likely to be prone to active/current PD (PPD representing current/active disease status) than females29, irrespective of smoking status. However, this is not the story from the univariate fit of CAL, where the differences between the smokers and non-smokers appear prominent within genders. The prediction curves are very close between male non-smokers and female smokers for higher levels (say, ≥ 11) of HbA1c for both models.
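The empirical average in the partial dependence definition can be sketched as follows. This is an illustration under assumptions: `mu` stands for any function returning the conditional mean at a covariate vector (for a fitted model, one posterior draw of the mean function), and the function and argument names are hypothetical rather than part of the skewBART package.

```python
def partial_dependence(mu, x_rows, interest_idx, interest_values):
    """Approximate mu_I(x_I*) by n^{-1} * sum_i mu(x_I*, x_iC).

    mu              -- callable mapping a covariate list to a predicted mean
    x_rows          -- observed covariate vectors x_1, ..., x_n
    interest_idx    -- positions of the covariates of interest
    interest_values -- the value x_I* at which those covariates are fixed
    """
    total = 0.0
    for row in x_rows:
        modified = list(row)  # copy so the observed data are left untouched
        for j, v in zip(interest_idx, interest_values):
            modified[j] = v   # overwrite x_iI with x_I*, keep x_iC as observed
        total += mu(modified)
    return total / len(x_rows)


# Toy check with a known mean function (not a fitted model).
def toy_mu(x):
    return 2 * x[0] + x[1] * x[2]


toy_rows = [[1.0, 1.0, 0.0], [5.0, 3.0, 1.0]]
toy_value = partial_dependence(toy_mu, toy_rows, [0], [7.0])  # (14 + 17) / 2 = 15.5
```

To obtain a posterior mean curve like those in Figure 7, one would presumably evaluate this average for each posterior draw of the mean function over a grid of HbA1c values and then average across draws.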

FIGURE 7.

GAAD data analysis: Plots of the posterior means of the partial dependence functions for CAL and PPD responses as a function of HbA1c corresponding to the 4 subgroups (varying with gender and smoking status) from fitting the skewBART (upper panel), and multi-skewBART (bottom panel) models; age and BMI are averaged over.

In terms of differences between the models, we note that, while skewBART is capable of leveraging smoothness, it produces higher RMSE on test data (see Table 3). This implies that considering the association between responses for the GAAD data is more important in generating good predictions than trying to leverage smoothness in the regression function.

6 |. CONCLUSION

In this paper, we proposed the skewBART model, and extended it to the multi-skewBART model which handles multivariate outcomes. The main idea is to use either a univariate SN density or an MSN density as the error density within the BART framework. We showed that, when the error distribution is skewed, skewBART and multi-skewBART provide better model fits than the original BART model; also, multi-skewBART enjoys additional benefits for multivariate responses due to its ability to account for within-subject dependence of the outcomes and because it uses multivariate decision trees to share information across regression functions. We showed that multi-skewBART produces better model fits on the GAAD data than fitting the two outcomes separately using skewBART.

As pointed out by a reviewer, George et al.39 also considered very flexible extensions to BART by assuming the error distribution to be a Dirichlet process mixture of normals (DPM). This model is capable of addressing skewness, as well as other non-Gaussian features, such as heavy-tailedness of the error. For our purposes, the use of a skew-normal distribution is convenient as it (i) is more parsimonious, allowing for simpler comparisons to non-skewed alternatives; (ii) is less flexible, and hence potentially less prone to over-fitting; and (iii) is easier to extend to multivariate outcomes.

The current work considers two continuous responses, PPD and CAL. However, in practice, data responses (elements in the multivariate response vector) can be of mixed types, such as binary ‘bleeding on probing’ outcomes in PD modeling. In such situations, considering a latent variable, or factor modeling, framework with BART specifications may be worthwhile. Also, the number of available (non-missing) site-level responses (within each subject) can be correlated with the PPD, or CAL responses, leading to the informative cluster size (ICS) scenario40, and exploration of BART under ICS is non-existent. Computationally, we have relied on the Gibbs sampler and Metropolis-Hastings for tree updates. In big data settings (such as in observational PD databases), scalable Bayesian methods will likely be required. Finally, multi-skewBART could also use SBART-style smoothing to enforce smoothness. Although we have shown that accounting for the association between responses is more important for our conclusions than trying to leverage smoothness, it may be possible to improve inferences by extending multi-skewBART to incorporate the SBART framework. All of these are important avenues for future work, and will be considered elsewhere.

Supplementary Material

supinfo2
supinfo1
supinfo3
supinfo4
supinfo5

ACKNOWLEDGEMENTS

The authors thank the anonymous Associate Editor and two reviewers, whose constructive comments led to a significantly improved version of the manuscript. They remain thankful to the Center for Oral Health Research at the Medical University of South Carolina for providing the motivating dataset, and the context of this work. Bandyopadhyay acknowledges partial support from grants R01DE031134 and R21DE031879, awarded by the United States National Institutes of Health. This material is also based upon work supported by the National Science Foundation under Grant No. DMS-214493, the Pfeifer Foundation of Cancer Research, and the Hobbs Foundation.

APPENDIX

A. MCMC FOR SKEWBART

We provide details on the Bayesian backfitting algorithm for the skewBART model. Recall that the model is given by

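$$y_i = \sum_{t=1}^{m} g(x_i; \mathcal{T}_t, \mathcal{M}_t) + \lambda Z_i + W_i, \qquad W_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \tau^2),$$
$$\tau \sim \text{Cauchy}_+(0, \tau_0),$$
$$\lambda \sim \mathcal{N}(0, \delta).$$

Throughout, we write $[V \mid \bullet]$ to denote the full-conditional distribution of $V$. Since $Z_i \sim \mathcal{N}_+(0, 1)$, we update $Z_i$ as

$$[Z_i \mid \bullet] \sim \mathcal{N}_+\!\left(\frac{\lambda R_i^*}{\lambda^2 + \tau^2}, \frac{\tau^2}{\lambda^2 + \tau^2}\right),$$

where $R_i^* = y_i - f(x_i)$.

Define $R^* = (R_1^*, \ldots, R_n^*)$ and $Z = (Z_1, \ldots, Z_n)$. By standard results for Bayesian linear regression, the update of $\lambda$ is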

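$$[\lambda \mid \bullet] \sim \mathcal{N}\!\left(\frac{Z^\top R^*/\tau^2}{Z^\top Z/\tau^2 + 1/\delta}, \left(Z^\top Z/\tau^2 + 1/\delta\right)^{-1}\right).$$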

The full conditional distribution for τ is proportional to

$$(\tau^2)^{-n/2} \exp\!\left[-\frac{1}{2\tau^2}(R^* - \lambda Z)^\top (R^* - \lambda Z)\right] \frac{1}{\pi(\tau_0 + \tau^2/\tau_0)},$$

which can be sampled using (for example) slice sampling41.
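As a rough illustration of the $Z_i$ and $\lambda$ updates in this appendix, the sketch below draws from the two full conditionals using only the Python standard library. It is not the implementation in the skewBART package. The function names are hypothetical, the positive-normal draw uses simple CDF inversion (an assumption made for brevity, and inaccurate far in the tails), the $\lambda$ draw uses the standard conjugate normal-regression form with prior variance $\delta$, and the slice-sampling step for $\tau$ is omitted.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def sample_positive_normal(mean, sd, rng):
    """Draw from N(mean, sd^2) truncated to [0, inf) by inverting the CDF."""
    lower = _STD_NORMAL.cdf(-mean / sd)        # probability mass below zero
    p = lower + rng.random() * (1.0 - lower)   # uniform on [lower, 1)
    p = min(max(p, 1e-12), 1.0 - 1e-12)        # keep inv_cdf away from 0 and 1
    # The max() guards against tiny negative values caused by rounding.
    return max(mean + sd * _STD_NORMAL.inv_cdf(p), 0.0)


def update_z(residuals, lam, tau, rng):
    """[Z_i | .] ~ N+(lam * R_i / (lam^2 + tau^2), tau^2 / (lam^2 + tau^2))."""
    denom = lam ** 2 + tau ** 2
    sd = math.sqrt(tau ** 2 / denom)
    return [sample_positive_normal(lam * r / denom, sd, rng) for r in residuals]


def update_lambda(residuals, z, tau, delta, rng):
    """Conjugate normal draw for lam under the prior lam ~ N(0, delta)."""
    precision = sum(zi * zi for zi in z) / tau ** 2 + 1.0 / delta
    mean = sum(zi * ri for zi, ri in zip(z, residuals)) / tau ** 2 / precision
    return rng.gauss(mean, math.sqrt(1.0 / precision))
```

Within a full sampler these two functions would be called once per iteration, after the tree updates have produced the current residuals $R_i^* = y_i - f(x_i)$.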

B. MCMC FOR MULTI-SKEWBART

The hierarchical model for multi-skewBART is

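$$y_i = \sum_{t=1}^{m} g(x_i, \mathcal{T}_t, \mathcal{M}_t) + \Lambda Z_i + W_i, \qquad W_i \overset{\text{iid}}{\sim} \mathcal{N}_k(0, \Sigma),$$
$$(\lambda_1, \ldots, \lambda_k) \sim \mathcal{N}_k(0, \Sigma_\lambda),$$
$$\Sigma \sim \text{inverse-Wishart}(\nu_0, S_0^{-1}),$$

where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$. Since $Z_i \sim \mathcal{N}_k^+(0, I_k)$ by a property of the skew-normal distribution, we update $Z_i$ as

$$[Z_i \mid \bullet] \sim \mathcal{N}_k^+\!\left((\Lambda \Sigma^{-1} \Lambda + I_k)^{-1} \Lambda \Sigma^{-1} R_i^*, \; (\Lambda \Sigma^{-1} \Lambda + I_k)^{-1}\right),$$

where $R_i^* = y_i - \sum_{t} g(x_i, \mathcal{T}_t, \mathcal{M}_t)$. Here $\mathcal{N}_k^+(\mu, \Sigma)$ is a $k$-dimensional normal distribution with location vector $\mu$ and covariance matrix $\Sigma$, truncated to the positive orthant $\mathbb{R}_+^k$. The conditional posterior to update $(\lambda_1, \ldots, \lambda_k)$ is

$$[(\lambda_1, \ldots, \lambda_k) \mid \bullet] \sim \mathcal{N}_k\!\left(\Big(\sum_{i=1}^{n} M_i^\top \Sigma^{-1} M_i + \Sigma_\lambda^{-1}\Big)^{-1} \sum_{i=1}^{n} M_i^\top \Sigma^{-1} R_i^*, \; \Big(\sum_{i=1}^{n} M_i^\top \Sigma^{-1} M_i + \Sigma_\lambda^{-1}\Big)^{-1}\right),$$

where $M_i = \mathrm{diag}(Z_{i1}, \ldots, Z_{ik})$. The conditional posterior to update $\Sigma$ is

$$[\Sigma \mid \bullet] \sim \text{inverse-Wishart}\!\left(\nu_0 + n, \; (B^\top B + S_0)^{-1}\right),$$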

where $B$ is the $n \times k$ matrix whose $i$th row is $(R_i^* - \Lambda Z_i)^\top$. The Bayesian backfitting algorithm for fitting multi-skewBART is now essentially the same as the algorithm for skewBART but with the updates of $(\Sigma, \{Z_i\}, \Lambda)$ replacing the updates for $(\tau, \{Z_i\}, \lambda)$. The only remaining consideration is computing the integrated likelihood, which we do in the following section.
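The forms of the $Z_i$ and $\lambda$ updates in this appendix follow from completing the square. The short derivation below is a sketch under the model as stated, writing $\lambda = (\lambda_1, \ldots, \lambda_k)^\top$ as a column vector (a notational assumption made here) and using the fact that $\Lambda$ and $M_i$ are diagonal.

```latex
\begin{aligned}
p(Z_i \mid \bullet)
  &\propto \exp\Big\{-\tfrac{1}{2}(R_i^* - \Lambda Z_i)^\top \Sigma^{-1} (R_i^* - \Lambda Z_i)
           - \tfrac{1}{2} Z_i^\top Z_i\Big\}\,
           \mathbb{1}(Z_i \in \mathbb{R}_+^k) \\
  &\propto \exp\Big\{-\tfrac{1}{2} Z_i^\top (\Lambda \Sigma^{-1} \Lambda + I_k) Z_i
           + Z_i^\top \Lambda \Sigma^{-1} R_i^*\Big\}\,
           \mathbb{1}(Z_i \in \mathbb{R}_+^k), \\[4pt]
\Lambda Z_i &= M_i \lambda
  \quad\Longrightarrow\quad
  R_i^* = M_i \lambda + W_i, \qquad W_i \sim \mathcal{N}_k(0, \Sigma).
\end{aligned}
```

The first two lines identify a truncated normal with precision matrix $\Lambda \Sigma^{-1} \Lambda + I_k$. The last line shows that, given $\{Z_i\}$, the skewness vector $\lambda$ enters as the coefficient of a multivariate linear regression with design matrices $M_i$, so its full conditional is the usual conjugate normal.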

C. THE INTEGRATED LIKELIHOOD FOR MULTI-SKEWBART

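To update $\mathcal{T}_t$, we require the conditional posterior of $\mathcal{T}_t$ given the rest of the parameters. The derivation for multi-skewBART generalizes the corresponding derivation for univariate skewBART, so we provide only a sketch. Let $\mathcal{L}_t$ denote the collection of leaf nodes of tree $t$ and let $[x \rightsquigarrow (t, \ell)]$ mean that the covariate value $x$ is associated with leaf node $\ell$ of tree $t$. The conditional posterior is

$$p(\mathcal{T}_t \mid R_t, \Sigma) \propto p_{\mathcal{T}}(\mathcal{T}_t) \int \prod_{i=1}^{n} \phi_k\big(Y_i \mid f(X_i) + \Lambda Z_i, \Sigma\big) \Big[\prod_{\ell \in \mathcal{L}_t} \phi_k(\mu_{t\ell} \mid 0, D)\Big] \, d\mu_t = p_{\mathcal{T}}(\mathcal{T}_t) \prod_{\ell \in \mathcal{L}_t} \int \prod_{i : X_i \rightsquigarrow (t, \ell)} \big\{\phi_k(R_{ti} \mid \mu_{t\ell}, \Sigma)\big\} \, \phi_k(\mu_{t\ell} \mid 0, D) \, d\mu_{t\ell},$$

where $\phi_k(U \mid \mu, \Sigma)$ is the multivariate $\mathcal{N}_k(\mu, \Sigma)$ density, $\mu_{t\ell}$ has a $\mathcal{N}_k(0, D)$ prior, and $R_{ti} = Y_i - \sum_{s \neq t} g(X_i, \mathcal{T}_s, \mathcal{M}_s) - \Lambda Z_i$ is the partial residual that excludes tree $t$. This integrated likelihood can be computed in closed form by using standard properties of the multivariate Gaussian distribution. Additionally, by conjugacy of the multivariate normal distribution, we have the full conditionals

$$[\mu_{t\ell} \mid \bullet] \sim \mathcal{N}_k\!\left((D^{-1} + N_\ell \Sigma^{-1})^{-1} \Sigma^{-1} \sum_{i : X_i \rightsquigarrow (t, \ell)} R_{ti}, \; (D^{-1} + N_\ell \Sigma^{-1})^{-1}\right), \qquad \text{(C1)}$$

where $N_\ell$ is the number of observations associated with leaf $\ell$ of tree $t$. We adopt the same Metropolis-Hastings steps for modifying the tree structure as with skewBART.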
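For reference, the closed form of a single leaf's contribution to the integrated likelihood can be written out explicitly. The expression below is a sketch obtained from the standard Gaussian integral. The shorthand $A_\ell$, $b_\ell$ and $N_\ell$ (the number of observations in leaf $\ell$ of tree $t$) is introduced here for illustration, and the sums and products run over the observations $i$ associated with that leaf.

```latex
\begin{aligned}
A_\ell &= D^{-1} + N_\ell \Sigma^{-1}, \qquad
b_\ell = \Sigma^{-1} \sum_{i} R_{ti}, \\
\int \prod_{i} \phi_k(R_{ti} \mid \mu, \Sigma)\, \phi_k(\mu \mid 0, D)\, d\mu
 &= (2\pi)^{-N_\ell k/2}\, |\Sigma|^{-N_\ell/2}\, |D|^{-1/2}\, |A_\ell|^{-1/2}
    \exp\Big\{-\tfrac{1}{2} \sum_{i} R_{ti}^\top \Sigma^{-1} R_{ti}
              + \tfrac{1}{2}\, b_\ell^\top A_\ell^{-1} b_\ell\Big\}.
\end{aligned}
```

The same quantities give the full conditional (C1), whose mean is $A_\ell^{-1} b_\ell$ and whose covariance is $A_\ell^{-1}$, so one factorization of $A_\ell$ per leaf serves both the Metropolis-Hastings ratio and the leaf update.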

D. COMPUTATION OF LPML

The logarithm of the pseudo-marginal likelihood (LPML)24,42 is given by $\mathrm{LPML} = \sum_{i=1}^{n} \log(\mathrm{CPO}_i)$, where $\mathrm{CPO}_i = p(y_i \mid Y_{-i})$ is the predictive density of the observed $y_i$ given the rest of the observed data $Y_{-i}$ (suppressing dependence on $X$). Due to a simplifying result24, a Monte Carlo approximation of $\mathrm{CPO}_i$ can be obtained using samples $\theta_1, \ldots, \theta_S$ from the joint posterior $p(\theta \mid Y)$ given the full observed data $Y$ as

$$p(y_i \mid Y_{-i}) = \left[\int \frac{p(Y \mid \theta)\, p(\theta)}{p(Y)} \, \frac{1}{p(y_i \mid \theta)} \, d\theta\right]^{-1} \approx \frac{1}{\frac{1}{S} \sum_{s=1}^{S} \{p(y_i \mid \theta_s)\}^{-1}},$$

where $\theta$ denotes the collection of all model parameters. The above approximation of $\mathrm{CPO}_i$ is a type of harmonic mean estimator, a class of estimators generally known to be unstable (possibly having infinite variance). This approximation can be improved by Pareto smoothed importance sampling (PSIS)43. PSIS smooths the importance weights by replacing the largest importance ratios with the expected order statistics of a fitted generalized Pareto distribution. The reliability of the approximations can be assessed by the estimated shape parameter $\hat{k}$ of the generalized Pareto distribution. The loo package suggests that approximations of our $\mathrm{CPO}_i$ with $\hat{k} < 0.7$ are reliable.
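The basic (unsmoothed) harmonic mean approximation of $\mathrm{CPO}_i$ and the resulting LPML can be sketched as below, working in log space for numerical stability. This is an illustration only. The function names and the input layout (one list of per-draw log-likelihood values for each observation) are assumptions, and the PSIS smoothing step is not implemented.

```python
import math


def log_sum_exp(values):
    """Stable evaluation of log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_cpo(log_lik_draws):
    """log CPO_i from S posterior draws of log p(y_i | theta_s).

    Implements CPO_i ~= [S^{-1} * sum_s 1 / p(y_i | theta_s)]^{-1} in log space.
    """
    s = len(log_lik_draws)
    return math.log(s) - log_sum_exp([-v for v in log_lik_draws])


def lpml(log_lik_matrix):
    """LPML = sum_i log CPO_i, with log_lik_matrix[i][s] = log p(y_i | theta_s)."""
    return sum(log_cpo(row) for row in log_lik_matrix)
```

For example, if an observation has likelihood values 0.2 and 0.5 under two posterior draws, the harmonic mean is 2 / (1/0.2 + 1/0.5) = 2/7, so `log_cpo([math.log(0.2), math.log(0.5)])` equals `math.log(2 / 7)` up to rounding. A single draw with a very small likelihood dominates the sum, which is the instability that PSIS is meant to reduce.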

References

  • 1. Fernandes JK, Wiegand RE, Salinas CF, et al. Periodontal disease status in Gullah African Americans with Type 2 diabetes living in South Carolina. Journal of Periodontology 2009; 80(7): 1062–1068.
  • 2. Breiman L. Random forests. Machine Learning 2001; 45: 5–32.
  • 3. Freund Y, Schapire R. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence 1999; 14(771–780): 1612.
  • 4. Chipman HA, George EI, McCulloch RE. BART: Bayesian additive regression trees. The Annals of Applied Statistics 2010; 4(1): 266–298.
  • 5. Bhingare A, Sinha D, Pati D, Bandyopadhyay D, Lipsitz SR. Semiparametric Bayesian latent variable regression for skewed multivariate data. Biometrics 2019; 75(2): 528–538.
  • 6. Ročková V, Van Der Pas S. Posterior concentration for Bayesian regression trees and forests. The Annals of Statistics 2020; 48(4): 2108–2131.
  • 7. Linero A, Yang Y. Bayesian regression tree ensembles that adapt to smoothness and sparsity. Journal of the Royal Statistical Society, Series B 2018; 80: 1087–1110.
  • 8. Murray JS. Log-Linear Bayesian additive regression trees for multinomial logistic and count regression models. Journal of the American Statistical Association 2021; 116(534): 756–769.
  • 9. Sparapani RA, Logan BR, McCulloch RE, Laud PW. Nonparametric survival analysis using Bayesian Additive Regression Trees (BART). Statistics in Medicine 2016; 35(16): 2741–2753.
  • 10. Basak P, Linero A, Sinha D, Lipsitz S. Semiparametric analysis of clustered interval-censored survival data using soft Bayesian additive regression trees (SBART). Biometrics 2022; 78(3): 880–893.
  • 11. Li Y, Linero AR, Murray J. Adaptive Conditional Distribution Estimation with Bayesian Decision Tree Ensembles. Journal of the American Statistical Association 2022; 0(0): 1–14.
  • 12. Linero AR. Bayesian regression trees for high-dimensional prediction and variable selection. Journal of the American Statistical Association 2018; 113(522): 626–636.
  • 13. Page RC, Eke PI. Case definitions for use in population-based surveillance of periodontitis. Journal of Periodontology 2007; 78: 1387–1399.
  • 14. Linero AR, Sinha D, Lipsitz SR. Semiparametric mixed-scale models using shared Bayesian forests. Biometrics 2020; 76(1): 131–144.
  • 15. Starling JE, Murray JS, Carvalho CM, Bukowski RK, Scott JG. BART with targeted smoothing: An analysis of patient-specific stillbirth risk. The Annals of Applied Statistics 2020; 14(1): 28–50.
  • 16. Hastie T, Tibshirani R. Bayesian backfitting. Statistical Science 2000; 15(3): 196–213.
  • 17. Azzalini A. The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics 2005; 32(2): 159–188.
  • 18. Azzalini A, Capitanio A. Statistical applications of the multivariate skew normal distribution. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 1999; 61(3): 579–602.
  • 19. Arellano-Valle RB, Genton MG. On fundamental skew distributions. Journal of Multivariate Analysis 2005; 96(1): 93–116.
  • 20. Sahu SK, Dey DK, Branco MD. A new class of multivariate skew distributions with applications to Bayesian regression models. Canadian Journal of Statistics 2003; 31(2): 129–150.
  • 21. Arellano-Valle R, Bolfarine H, Lachos V. Bayesian inference for skew-normal linear mixed models. Journal of Applied Statistics 2007; 34(6): 663–682.
  • 22. Tan YV, Roy J. Bayesian additive regression trees and the General BART model. Statistics in Medicine 2019; 38(25): 5048–5069.
  • 23. Friedman JH. Multivariate adaptive regression splines. The Annals of Statistics 1991; 19(1): 1–67.
  • 24. Gelfand AE, Dey DK. Bayesian model choice: asymptotics and exact calculations. Journal of the Royal Statistical Society. Series B (Methodological) 1994; 56(3): 501–514.
  • 25. Gelfand AE. Model determination using sampling-based methods. In: Gilks WR, Richardson S, Spiegelhalter DJ, eds. Markov Chain Monte Carlo in Practice. London, UK: Chapman & Hall. 1996. (pp. 145–161).
  • 26. Berger JO, Pericchi LR. The Intrinsic Bayes Factor for Model Selection and Prediction. Journal of the American Statistical Association 1996; 91(433): 109–122.
  • 27. Kass RE, Raftery AE. Bayes Factors. Journal of the American Statistical Association 1995; 90(430): 773–795.
  • 28. Arellano-Valle R, Bolfarine H, Lachos V. Bayesian inference for skew-normal linear mixed models. Journal of Applied Statistics 2007; 34(6): 663–682.
  • 29. Eke PI, Dye B, Wei L, Thornton-Evans G, Genco R. Prevalence of periodontitis in adults in the United States: 2009 and 2010. Journal of Dental Research 2012; 91(10): 914–920.
  • 30. Reich BJ, Bandyopadhyay D. A latent factor model for spatial data with informative missingness. The Annals of Applied Statistics 2010; 4(1): 439–459.
  • 31. Kingman A, Susin C, Albandar JM. Effect of partial recording protocols on severity estimates of periodontal disease. Journal of Clinical Periodontology 2008; 35(8): 659–667.
  • 32. Holtfreter B, Albandar JM, Dietrich T, et al. Standards for reporting chronic periodontitis prevalence and severity in epidemiologic studies: Proposed standards from the Joint EU/USA Periodontal Epidemiology Working Group. Journal of Clinical Periodontology 2015; 42(5): 407–412.
  • 33. Johnson-Spruill I, Hammond P, Davis B, McGee Z, Louden D. Health of Gullah families in South Carolina with type 2 diabetes. The Diabetes Educator 2009; 35(1): 117–123.
  • 34. Taylor GW, Borgnakke WS. Periodontal disease: Associations with diabetes, glycemic control and complications. Oral Diseases 2008; 14(3): 191–203.
  • 35. Reichard P, Nilsson BY, Rosenqvist U. The effect of long-term intensified insulin treatment on the development of microvascular complications of diabetes mellitus. New England Journal of Medicine 1993; 329(5): 304–309.
  • 36. Mealey BL, Oates TW. Diabetes mellitus and periodontal diseases. Journal of Periodontology 2006; 77(8): 1289–1303.
  • 37. Herring M, Shah S. Periodontal disease and control of diabetes mellitus. The Journal of the American Osteopathic Association 2006; 106: 416–421.
  • 38. Friedman JH. Greedy function approximation: A gradient boosting machine. The Annals of Statistics 2001; 29(5): 1189–1232.
  • 39. George E, Laud P, Logan B, McCulloch R, Sparapani R. Fully Nonparametric Bayesian Additive Regression Trees. In: Jeliazkov I, Tobias JL, eds. Topics in Identification, Limited Dependent Variables, Partial Observability, Experimentation, and Flexible Modeling: Part B. 40B of Advances in Econometrics. Emerald Publishing Limited. 2019. (pp. 89–110).
  • 40. Li X, Bandyopadhyay D, Lipsitz S, Sinha D. Likelihood methods for binary responses of present components in a cluster. Biometrics 2011; 67(2): 629–635.
  • 41. Neal RM. Slice sampling. The Annals of Statistics 2003; 31(3): 705–767.
  • 42. Pettit LI. The conditional predictive ordinate for the Normal distribution. Journal of the Royal Statistical Society. Series B (Methodological) 1990; 52(1): 175–184.
  • 43. Vehtari A, Gelman A, Gabry J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing 2017; 27(5): 1413–1432.
  • 44. Linero AR. A review of tree-based Bayesian methods. Communications for Statistical Applications and Methods 2017; 24(6): 543–559.
  • 45. Hill JL. Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics 2011; 20(1): 217–240.
  • 46. Ibrahim JG, Chen MH, Sinha D. Criterion-based methods for Bayesian model assessment. Statistica Sinica 2001; 11(2): 419–443.
  • 47. Gelfand AE, Smith AF. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 1990; 85(410): 398–409.
  • 48. Reich BJ, Bandyopadhyay D, Bondell HD. A nonparametric spatial model for periodontal data with non-random missingness. Journal of the American Statistical Association 2013; 108(503): 820–831.
  • 49. Azzalini A. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 1985; 12(2): 171–178.
  • 50. Caruana R. Multitask learning. Machine Learning 1997; 28: 41–75.
  • 51. Darby ML, Walsh M. Dental Hygiene: Theory and Practice. Elsevier Health Sciences. 2014; 4 edn.
  • 52. Bandyopadhyay D, Lachos VH, Abanto-Valle CA, Ghosh P. Linear mixed models for skew-normal/independent bivariate responses with an application to periodontal disease. Statistics in Medicine 2010; 29(25): 2643–2655.
  • 53. Geisser S, Eddy WF. A predictive approach to model selection. Journal of the American Statistical Association 1979; 74(365): 153–160.
  • 54. Johnson-Spruill I, Hammond P, Davis B, McGee Z, Louden D. Health of Gullah families in South Carolina with Type 2 diabetes. The Diabetes Educator 2009; 35(1): 117–123.
  • 55. Susin C, Kingman A, Albandar JM. Effect of partial recording protocols on estimates of prevalence of periodontal disease. Journal of Periodontology 2005; 76(2): 262–267.
