Bayesian Additive Regression Trees for Multivariate Skewed Responses

Seungha Um; Antonio R Linero; Debajyoti Sinha; Dipankar Bandyopadhyay

doi:10.1002/sim.9613

Author manuscript; available in PMC: 2024 Feb 10.

Published in final edited form as: Stat Med. 2022 Nov 25;42(3):246–263. doi: 10.1002/sim.9613


PMCID: PMC9851978 NIHMSID: NIHMS1850406 PMID: 36433639

Summary

This paper introduces a nonparametric regression approach for univariate and multivariate skewed responses using Bayesian additive regression trees (BART). Existing BART methods use ensembles of decision trees to model a mean function, and have become popular recently due to their high prediction accuracy and ease of use. The usual assumption of a univariate Gaussian error distribution, however, is restrictive in many biomedical applications. Motivated by an oral health study, we provide a useful extension of BART, the skewBART model, to address this problem. We then extend skewBART to allow for multivariate responses, with information shared across the decision trees associated with different responses within the same subject. The methodology accommodates within-subject association, and allows varying skewness parameters for the varying multivariate responses. We illustrate the benefits of our multivariate skewBART proposal over existing alternatives via simulation studies and application to the oral health dataset with bivariate highly skewed responses. Our methodology is implementable via the R package skewBART, available on GitHub.

Keywords: Bayesian nonparametrics, ensembling methods, nonlinear regression, skew-normal

1 |. INTRODUCTION

Highly skewed multivariate responses are commonly observed in many biomedical and clinical research problems. For example, in a preliminary analysis (see Figure 1) of periodontal disease (PD) status¹ among a population of Type-2 diabetic (T2D) Gullah-speaking African-Americans (henceforth, the GAAD study), two popular PD endpoints, the average periodontal pocket depth (PPD) and the average clinical attachment level (CAL), are highly skewed. The data analytic goal here is to assess the cross-sectional associations of important covariates, say, age and diabetes level, with the bivariate responses (average PPD and average CAL of all measured tooth-sites) of any patient from a highly diabetic population.

FIGURE 1. GAAD data: Scatter plot of PPD and CAL responses, averaged for each subject, with marginal density estimates.

To model the relationship between a k-dimensional multivariate observed response y_i and a p-dimensional vector of covariates x_i without imposing any pre-determined functional form, we consider the nonparametric regression of y_i on the corresponding x_i, given by

y_{i} = f (x_{i}) + ϵ_{i},

(1)

where the errors ϵ_i are iid with common multivariate density h_ϵ(·) for i = 1, …, n. The unknown function f returns k-dimensional vectors corresponding to x_i. For the univariate case with k = 1, a popular strategy for modeling f is to use an ensemble of decision trees, such as random forests² or boosted decision trees³.

In this paper, we focus on the Bayesian additive regression trees (BART) model introduced by Chipman et al.⁴. Unlike the parametric regression functions utilized to model skewed multivariate responses⁵, decision tree ensembles efficiently capture both nonlinear and interaction effects of covariates in f(x), automatically. Additionally, compared to other machine learning algorithms, BART models provide a number of appealing advantages. First, they are typically robust to the choice of tuning parameters, and yield high prediction accuracy. Second, unlike many competing machine learning algorithms, BART allows for uncertainty quantification by providing a full posterior distribution for the predictions. Recent work has also established attractive theoretical properties for BART^6,7. BART models have successfully been applied to many statistical problems, including classification⁸, survival analysis^9,10, density estimation¹¹, and high dimensional sparse regression¹².

Existing BART models come with various limitations, however. First, in studies with highly skewed responses, taking the error density h_ϵ(·) to be a multivariate Gaussian density may not be appropriate. Bhingare et al.⁵ showed that the mean PPD and mean CAL are highly skewed, and that methods which adapt to skewed responses tend to outperform methods based on Gaussian errors regarding precision of the estimated parameters and predictions of future outcomes. Second, as illustrated by Linero and Yang⁷, the estimate of f(·) obtained from the usual BART models lacks smoothness, leading to suboptimal performance when the true f(x) varies smoothly with x. To improve performance of the estimate when the true f(·) is believed to be smooth, Linero and Yang⁷ introduced the soft BART (SBART) model.

In this paper, we introduce the skewBART model, which takes the error density h_ϵ(·) in equation (1) to be a multivariate skewed normal density to accommodate non-Gaussian responses. We also extend the SBART model for a univariate response to obtain smooth estimates of f(·) under skewed errors. We develop an efficient Gibbs sampler based on the Bayesian backfitting algorithm⁴ to perform practical nonparametric Bayesian inference. In addition to accommodating skewness of the error, we introduce a skewed multivariate response model, with multivariate decision trees used for regression functions and a multivariate skewed distribution for the error vector for each subject. This is motivated by the fact that most epidemiological studies/trials on PD model CAL, thereby quantifying the (past) disease history and progression, in contrast to the current disease status measured by PPD¹³. Hence, a thorough assessment of PD may suggest modeling PPD and CAL jointly within a multivariate framework. Multivariate decision trees have been used both to handle mixed-type responses such as zero-inflated log-normal/gamma distributions¹⁴ and to implement targeted smoothing over space or time¹⁵. To the best of our knowledge, this is the first attempt to bring multivariate decision tree methods to bear in the BART framework to model multivariate skewed response data. To increase the precision of the inference, our multi-skewBART model takes into account the association of the multivariate responses within each subject, borrowing information across the different responses by using the same decision trees for the mean vector. We further illustrate the advantages of our skewBART and multi-skewBART proposals in practice on the GAAD study.

The rest of the paper is organized as follows. After briefly reviewing both the BART model and the important properties of the skew normal distribution, Section 2 introduces the skewBART and multi-skewBART models, respectively for univariate and multivariate responses. Details on the Bayesian inference, including prior specifications and the related Markov chain Monte Carlo (MCMC) algorithm for posterior computation are outlined in Section 3. In Section 4, we compare the finite sample performances (including out-of-sample predictions) of our proposals to existing alternatives using synthetic data. In Section 5, we demonstrate the application to the motivating GAAD oral health dataset. Section 6 concludes with a discussion. Details on the proposed MCMC algorithm appear in the Appendix.

2 |. THE SKEWBART AND MULTI-SKEWBART MODELS

2.1 |. Review of the BART and SBART models

For the scalar response, the BART framework introduced by Chipman et al.⁴ models f(x_i) in (1) as a sum of m trees

f (x_{i}) = \sum_{t = 1}^{m} g (x_{i}; 𝒯_{t}, ℳ_{t}),

(2)

where 𝒯_t is a binary tree structure with n_t leaf nodes and $ℳ_{t} = \{μ_{t 1}, \dots, μ_{t n_{t}}\}$ is the collection of parameters for the n_t leaves of the tth tree 𝒯_t, with h_ϵ taken to be a Gaussian density. The function g(x_i; 𝒯_t, ℳ_t) returns $\sum_{ℓ = 1}^{n_{t}} μ_{t ℓ} η (x_{i}; 𝒯_{t}, ℓ)$, where η(x_i; 𝒯_t, ℓ) is the indicator that x_i is associated with leaf node ℓ in 𝒯_t, i.e., g(x_i; 𝒯_t, ℳ_t) = μ_tℓ if and only if x_i is associated with leaf ℓ of tree 𝒯_t; for example, if x = (0.62, 0.45) in Figure 2 (bottom-left), then g(x; 𝒯_t, ℳ_t) = μ_t1. The tree 𝒯_t consists of interior splitting rules of the form [x_j ≤ C_b] for each branch node b, which induce a partition of the covariate space. At each interior node, the splitting rule uses predictor j with prior probability s_j, where s = (s₁, …, s_p) is a probability vector. Throughout this manuscript, we place a Dirichlet(a∕p, …, a∕p) hyperprior on s to allow the model to filter out irrelevant predictors¹². The cutpoint C_b of the splitting rule is assigned a uniform prior over its possible values.
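
The sum-of-trees evaluation in (2) can be sketched in a few lines. This is a minimal illustration, not the skewBART package's implementation: the `Node` class, the example trees and their leaf values are hypothetical, and the convention that an observation goes to the left child when the rule [x_j ≤ C_b] holds is an assumption made here for concreteness.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Hypothetical minimal node type: a branch holds a rule, a leaf holds mu."""
    var: Optional[int] = None       # index j of the predictor in the rule [x_j <= C_b]
    cut: Optional[float] = None     # cutpoint C_b
    left: Optional["Node"] = None   # child taken when the rule holds (assumed convention)
    right: Optional["Node"] = None  # child taken otherwise
    mu: Optional[float] = None      # leaf parameter; None marks a branch node


def g(x: Sequence[float], tree: Node) -> float:
    """Return the parameter of the single leaf that x is associated with."""
    node = tree
    while node.mu is None:
        node = node.left if x[node.var] <= node.cut else node.right
    return node.mu


def f(x: Sequence[float], trees: Sequence[Node]) -> float:
    """Sum-of-trees prediction, as in equation (2)."""
    return sum(g(x, t) for t in trees)


# Two illustrative trees on [0, 1]^2 (values are made up).
tree_1 = Node(var=0, cut=0.5, left=Node(mu=-1.0),
              right=Node(var=1, cut=0.3, left=Node(mu=0.5), right=Node(mu=2.0)))
tree_2 = Node(mu=0.25)  # a stump: a single leaf and no splits
x = (0.62, 0.45)
print(g(x, tree_1), f(x, [tree_1, tree_2]))
```

Each tree contributes exactly one leaf value per observation, so the indicator η(x; 𝒯_t, ℓ) is implicit in the path followed down the tree.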

FIGURE 2. Top: an example of a binary tree 𝒯_t where the terminal nodes are labeled with the corresponding parameters μ_tℓ. Bottom: the corresponding partition of the predictor space 𝒳 = [0, 1]² from BART (left panel) and SBART (right panel) when κ = 0.1.

To avoid overfitting, each tree is given a regularization prior which shrinks the individual leaf parameters to 0 so that each tree contributes only a small portion of the overall fit. It is regularized by the depth of each tree and the prior distribution of leaf parameters. The probability that each node at depth d is non-terminal is given by γ(1 + d)^−β for hyperparameters γ ∈ (0, 1) and β ∈ [0, ∞). The default prior specification⁴ sets γ = 0.95 and β = 2, which encourages the trees to be shallow (rarely exceeding depth 2–3). The leaf parameters μ_tℓ are given iid $𝒩 (0, σ_{μ}^{2} / m)$ priors, ensuring that the prior variance of f(x) is constant as a function of m.
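
To see why the default values γ = 0.95 and β = 2 favor shallow trees, one can simulate the branching process implied by the split probability γ(1 + d)^−β. The sketch below is illustrative only; the function names and the simulation size are choices made here, not part of the paper.

```python
import random


def p_split(depth, gamma=0.95, beta=2.0):
    """Prior probability that a node at the given depth is non-terminal."""
    return gamma * (1.0 + depth) ** (-beta)


def sample_depth(rng, depth=0, gamma=0.95, beta=2.0):
    """Depth of the deepest leaf of one tree drawn from the branching-process prior."""
    if rng.random() < p_split(depth, gamma, beta):
        left = sample_depth(rng, depth + 1, gamma, beta)
        right = sample_depth(rng, depth + 1, gamma, beta)
        return max(left, right)
    return depth


rng = random.Random(1)
depths = [sample_depth(rng) for _ in range(5000)]
share_shallow = sum(d <= 3 for d in depths) / len(depths)
print([round(p_split(d), 4) for d in range(4)], round(share_shallow, 3))
```

The split probability drops from 0.95 at the root to 0.2375 at depth 1, so most simulated trees stop after one or two levels of splits.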

To define a BART model (2) for the nonparametric regression in (1) with a univariate response y_i, we must specify a likelihood for the observed data D = {(y_i, x_i) : i = 1, ⋯ , n}, a prior on the unknown model parameters related to the tree structures, and a prior on the parameter σ² of the unknown univariate error distribution h_ϵ(· | σ²). Let 𝓣 = {𝒯₁, …, 𝒯_m} denote the set of all trees and let 𝓜 = {ℳ₁, …, ℳ_m} denote the set of leaf parameters for all trees. Bayesian computation then proceeds by using MCMC-based samples from the joint posterior p(𝓣, 𝓜, σ² | D), which is proportional to the likelihood times the joint prior p(θ) of all the unknown parameters θ = (𝓣, 𝓜, σ²). When the error distribution is 𝒩(0, σ²), the prior consists of a conjugate inverse gamma prior p(σ²) for σ², independent priors p(𝒯_t) for the tree structures 𝒯_t, and conditionally independent priors for the terminal node parameters. For the leaf node set ℳ_t = {μ_tℓ : 1 ≤ ℓ ≤ n_t} conditional on the tree structure 𝒯_t, a conjugate normal distribution $𝒩 (0, σ_{μ}^{2} / m)$ is widely used as the common independent prior p(μ_tℓ | 𝒯_t). The resulting joint prior distribution factors as

p (𝓣, 𝓜, σ^{2}) = \left[ \prod_{t = 1}^{m} p (𝒯_{t}, ℳ_{t}) \right] p (σ^{2}) = \left[ \prod_{t = 1}^{m} p (ℳ_{t} ∣ 𝒯_{t}) p (𝒯_{t}) \right] p (σ^{2}) = \left[ \prod_{t = 1}^{m} \left\{ \prod_{ℓ = 1}^{n_{t}} p (μ_{t ℓ} ∣ 𝒯_{t}) \right\} p (𝒯_{t}) \right] p (σ^{2}) .

(3)

Denoting the (n × 1) vector of partial residuals excluding the tth tree by R_t = (R_t1, …, R_tn)^⊤, where $R_{t i} = y_{i} - \sum_{k \neq t} g (x_{i}; 𝒯_{k}, ℳ_{k})$, the conditional posterior of 𝒯_t, with the leaf parameters ℳ_t integrated out, is

p (𝒯_{t} ∣ R_{t}, σ) \propto p (𝒯_{t}) \int p (R_{t} ∣ ℳ_{t}, 𝒯_{t}, σ) p (ℳ_{t} ∣ 𝒯_{t}, σ) d ℳ_{t}

(4)

given 𝒯_−t = {𝒯_k : k ≠ t} and ℳ_−t = {ℳ_k : k ≠ t}. For brevity, in equation (4) and in all subsequent expressions for posteriors and conditional posteriors, we suppress dependence on the covariate vector x. Because we have a conjugate prior for ℳ_t, the integral on the right-hand side of (4) is available in closed form. This allows the use of the Bayesian backfitting algorithm¹⁶ to update (𝒯_t, ℳ_t) sequentially for t = 1, …, m by using the Metropolis-Hastings algorithm⁴ to sample 𝒯_t using (4) and then sampling ℳ_t from its full conditional.
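
The step "sampling ℳ_t from its full conditional" reduces, under Gaussian errors and the 𝒩(0, σ_μ²∕m) leaf prior, to a standard normal-normal update applied to the partial residuals that fall in each leaf. The sketch below assumes that setting; the function names and the toy numbers are illustrative and are not taken from the paper or its software.

```python
import math
import random


def leaf_posterior(residuals, sigma2, sigma_mu2, m):
    """Posterior mean and variance of one leaf parameter.

    residuals : partial residuals R_ti of the observations in this leaf
    sigma2    : error variance sigma^2
    sigma_mu2 : sigma_mu^2, so that the prior is N(0, sigma_mu2 / m)
    m         : number of trees in the ensemble
    """
    prior_var = sigma_mu2 / m
    precision = len(residuals) / sigma2 + 1.0 / prior_var
    mean = (sum(residuals) / sigma2) / precision
    return mean, 1.0 / precision


def draw_leaf(rng, residuals, sigma2, sigma_mu2, m):
    """One Gibbs draw of the leaf parameter from its Gaussian full conditional."""
    mean, var = leaf_posterior(residuals, sigma2, sigma_mu2, m)
    return rng.gauss(mean, math.sqrt(var))


rng = random.Random(0)
post_mean, post_var = leaf_posterior([1.0, 2.0, 3.0], sigma2=1.0, sigma_mu2=1.0, m=1)
print(post_mean, post_var, draw_leaf(rng, [1.0, 2.0, 3.0], 1.0, 1.0, 1))
```

With many trees the prior variance σ_μ²∕m is small, so each leaf is shrunk strongly toward 0, which is the regularization described above.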

Despite the recent popularity of BART, the estimates obtained from BART have some drawbacks; in particular, BART lacks smoothness due to the distinct partitions as illustrated in Figure 2 (bottom-left). To overcome this lack of smoothness, Linero and Yang⁷ proposed SBART, which forms predictions by averaging over many random paths down the tree. In the SBART splitting rule at branch b, x goes left with probability $ψ (x_{j_{b}}; C_{b}, κ_{b}) = ψ (\frac{x_{j_{b}} - C_{b}}{κ_{b}})$, where j_b is the predictor used at branch b and κ_b > 0 is a bandwidth parameter controlling the sharpness of the decision. By averaging over all possible paths down the tree, each leaf from SBART has a global impact on f, which allows the model to share information across different covariate regions. Figure 2 (bottom-right) illustrates the smoothed partition of the predictor space in SBART with the logistic ψ(u) = (1 + e^−u)⁻¹. The splitting rules of BART induce a distinct partition of the predictor space 𝒳 = [0, 1]². On the other hand, SBART smooths the BART partition. Such smoothness is useful when the underlying f(x) is believed to be smooth. We note that if the bandwidth parameter κ approaches 0, the splitting rule of SBART converges to the splitting rule of the original BART model.
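
The averaging over random paths can be written as a short recursion in which every branch mixes its two subtrees with weights ψ and 1 − ψ. The following is a sketch under simplifying assumptions: trees are encoded as nested tuples (a hypothetical encoding), a single bandwidth κ is shared by all branches, and the gate follows the statement in the text that ψ is the probability of the left child.

```python
import math


def psi(u):
    """Logistic gate psi(u) = 1 / (1 + exp(-u)), arranged to avoid overflow."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def soft_g(x, tree, kappa):
    """Soft-tree output: leaf values averaged over all paths down the tree.

    A tree is either a float (a leaf value) or a tuple
    (var, cut, left_subtree, right_subtree) describing a branch.
    """
    if not isinstance(tree, tuple):
        return float(tree)
    var, cut, left, right = tree
    p_left = psi((x[var] - cut) / kappa)
    return p_left * soft_g(x, left, kappa) + (1.0 - p_left) * soft_g(x, right, kappa)


tree = (0, 0.5, 1.0, 3.0)  # one split on x_0 at 0.5 with leaf values 1 and 3
for kappa in (0.5, 0.1, 0.001):
    print(kappa, [round(soft_g((v,), tree, kappa), 3) for v in (0.3, 0.5, 0.7)])
```

As κ shrinks, the output at points away from the cutpoint approaches a single leaf value, which is the hard-partition limit mentioned above; at the cutpoint itself the two leaves are always weighted equally.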

Throughout, we write f ~ BART_m(γ, β, σ_μ) if the scalar-valued multivariable function f follows the BART prior with the specified hyperparameter values (γ, β, m, σ_μ), and f ~ BART_m if it follows the BART prior with the default values of Chipman et al.⁴. Similarly, we write f ~ SBART_m(γ, β, σ_μ) and f ~ SBART_m to denote SBART priors with pre-specified hyperparameters and with the default values recommended by Linero and Yang⁷, respectively.

2.2 |. Review of the skew-normal distribution

We use 𝒮𝒩(ξ, σ², α) to denote a univariate skew-normal (SN) distribution¹⁷ with probability density function $\frac{2}{σ} ϕ (\frac{y - ξ}{σ}) Φ (α \frac{y - ξ}{σ})$ , where ϕ and Φ are the standard normal density and cumulative distribution functions, respectively. The parameters ξ and σ are location and scale parameters, while α is a shape parameter which allows for the density to exhibit skewness. The stochastic representation of the univariate skew-normal distribution, such that E ~ 𝒮𝒩(ξ, σ², α), is

E \overset{d}{=} W + λ Z

(5)

where W and Z are respectively independent 𝒩(ξ, τ²) and 𝒩₊(0, 1) random variables, with $λ = (α σ) / \sqrt{1 + α^{2}}$ and $τ = σ / \sqrt{1 + α^{2}}$ ; here, 𝒩₊(μ, σ²) denotes a 𝒩(μ, σ²) truncated with the support of (0, ∞).
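
The stochastic representation (5) gives a direct way to simulate skew-normal draws. The sketch below uses only the Python standard library; the function names, seed and sample size are illustrative choices. It relies on two standard facts: |N(0, 1)| has the 𝒩₊(0, 1) distribution, and E(Z) = √(2∕π) for Z ~ 𝒩₊(0, 1), so that E(E) = ξ + λ√(2∕π).

```python
import math
import random


def sn_components(sigma, alpha):
    """Map (sigma, alpha) to (lambda, tau) as defined after equation (5)."""
    root = math.sqrt(1.0 + alpha ** 2)
    return alpha * sigma / root, sigma / root


def sample_sn(rng, xi, sigma, alpha):
    """One draw from SN(xi, sigma^2, alpha) via E = W + lambda * Z."""
    lam, tau = sn_components(sigma, alpha)
    w = rng.gauss(xi, tau)          # W ~ N(xi, tau^2)
    z = abs(rng.gauss(0.0, 1.0))    # Z ~ N_+(0, 1)
    return w + lam * z


rng = random.Random(42)
lam, tau = sn_components(sigma=1.0, alpha=10.0)
draws = [sample_sn(rng, xi=0.0, sigma=1.0, alpha=10.0) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
print(round(lam ** 2 + tau ** 2, 6), round(sample_mean, 3), round(lam * math.sqrt(2 / math.pi), 3))
```

Note that λ² + τ² = σ², so the pair (λ, τ) is simply another parameterization of (σ, α), with α = λ∕τ.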

Different extensions^18,19,20 of the univariate SN distribution to the multivariate setting have been proposed. For extending skewBART to multivariate responses, we consider the generalized multivariate skew-normal (MSN) distribution^19,21, which is both practical and flexible. A (k × 1) random vector E follows an MSN distribution, denoted by E ~ 𝒮𝒩_k(μ, Σ, Λ), if its probability density function is given by

2^{k} ϕ_{k} (e ∣ μ, Ω) Φ_{k} (Λ^{⊤} Ω^{- 1} (e - μ) ∣ 0, Δ^{- 1})

(6)

where Λ = diag(λ₁, …, λ_k) is the (k × k) skewness matrix, $μ \in ℝ^{k}$ is the location vector, Σ is a (k × k) positive definite scale matrix, Ω = Σ + ΛΛ^⊤ and Δ = I_k + Λ^⊤Σ⁻¹Λ. Also, ϕ_k(y|μ, Σ) and Φ_k(y|μ, Σ) are the density and cumulative distribution functions of a multivariate normal distribution, denoted by 𝒩_k(μ, Σ), with mean $μ \in ℝ^{k}$ and (k×k) positive definite covariance Σ. Note that (6) belongs to the class of fundamental skew distributions considered by Arellano-Valle and Genton¹⁹. From Proposition 1 of Arellano-Valle et al.²¹, E can be represented stochastically as

E \overset{d}{=} W + Λ Z,

(7)

where W ~ 𝒩_k(μ, Σ) and Z ~ 𝒩_k+(0, I_k) are independent k-variate normal and truncated normal random vectors. The family of MSN densities has the desirable and practically useful property of being closed under marginalization. Additionally, each individual component of the MSN distribution in (6) has a univariate skew-normal marginal with stochastic representation (5). Sahu et al.²⁰ and Bhingare et al.⁵ present physical interpretations of the univariate skew-normal representation (5) and the multivariate skew-normal representation (7) in terms of convolving a skewing shock Z with a symmetric distribution.
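
For k = 2, representation (7) can be simulated with a hand-written Cholesky factor of Σ. This is a sketch under assumptions made here: Σ is parameterized by standard deviations s1, s2 and a correlation rho, and the function name and numerical values are illustrative. Because Λ is diagonal, component j equals W_j + λ_j Z_j, so its mean is μ_j + λ_j√(2∕π).

```python
import math
import random


def sample_msn2(rng, mu, s1, s2, rho, lam):
    """One draw from the bivariate MSN via E = W + Lambda * Z with diagonal Lambda."""
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    # W ~ N_2(mu, Sigma), built from the Cholesky factor of Sigma
    w1 = mu[0] + s1 * e1
    w2 = mu[1] + s2 * (rho * e1 + math.sqrt(1.0 - rho ** 2) * e2)
    # Z has independent N_+(0, 1) components
    z1, z2 = abs(rng.gauss(0.0, 1.0)), abs(rng.gauss(0.0, 1.0))
    return w1 + lam[0] * z1, w2 + lam[1] * z2


rng = random.Random(7)
lam = (-2.0, 2.0)
draws = [sample_msn2(rng, (0.0, 0.0), 1.0, 1.0, 0.5, lam) for _ in range(20000)]
mean_1 = sum(d[0] for d in draws) / len(draws)
mean_2 = sum(d[1] for d in draws) / len(draws)
print(round(mean_1, 3), round(mean_2, 3), round(2.0 * math.sqrt(2 / math.pi), 3))
```

The two components are skewed in opposite directions here, while the dependence between them enters only through the Gaussian part W.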

2.3 |. Univariate `skewBART`

We first introduce the univariate skewBART model to extend BART to accommodate skewed responses when k = 1. We model the scalar-valued nonparametric regression function f(·) in (1) using a sum-of-trees model with an SN distributed error as

\begin{array}{l} y_{i} = f (x_{i}) + ϵ_{i} \quad \text{with} \quad ϵ_{i} \overset{iid}{\sim} 𝒮𝒩 (0, σ^{2}, α), \\ f (x_{i}) = \sum_{t = 1}^{m} g (x_{i}; 𝒯_{t}, ℳ_{t}) . \end{array}

(8)

Using the stochastic representation of (5), we can rewrite (8) as

y_{i} = \sum_{t = 1}^{m} g (x_{i}; 𝒯_{t}, ℳ_{t}) + λ Z_{i} + W_{i}

(9)

where W_i ~ 𝒩(0, τ²), Z_i ~ 𝒩₊(0, 1), $λ = α σ / \sqrt{1 + α^{2}}$ and $τ = σ / \sqrt{1 + α^{2}}$. For the scalar y_i, we use the SBART of Linero and Yang⁷ with f ~ SBART_m to incorporate smoothness in f(·). The prior specifications described in Section 3.1, along with the reparameterization in (9), allow us to construct a simple data augmentation step to fit the skewBART model of (8).

2.4 |. Multivariate `skewBART`

We extend the sum-of-trees model (8) using the MSN distribution (7) to accommodate multivariate outcomes. This extension improves prediction accuracy by incorporating the correlation between responses, as well as the dependency in error estimation. Our proposed multivariate skewBART model, called the multi-skewBART model, for the multivariate response $Y = {(y_{1}, \dots, y_{n})}^{⊤} \in ℝ^{n \times k}$ models the nonparametric multi-variable f(·) in (1) as

\begin{array}{l} y_{i} = f (x_{i}) + ϵ_{i} \quad \text{with} \quad ϵ_{i} \overset{iid}{\sim} 𝒮𝒩_{k} (0, Σ, Λ), \\ f (x_{i}) = \sum_{t = 1}^{m} g (x_{i}; 𝒯_{t}, ℳ_{t}), \end{array}

(10)

where g(x_i; 𝒯_t, ℳ_t) returns a k-dimensional μ_tℓ if x_i is associated with the leaf node ℓ in 𝒯_t for ℓ = 1, …, n_t. For nonparametric f(·) in (10), we say that f has a multivariate BART prior, denoted by f ~ MultiBART_m(γ, β, Σ_M), where (γ, β, m) are the usual BART hyperparameters for the trees and the (k × k) matrix Σ_M is the prior covariance of μ_tℓ ~ 𝒩_k(0, Σ_M). Further details of this prior specification are presented in Section 3. The tree 𝒯_t in (10) has a binary tree structure as in univariate skewBART, except that $ℳ_{t} = \{μ_{t 1}, \dots, μ_{t n_{t}}\}$ is now a set of n_t leaf parameter vectors, each of dimension k, for the t-th tree 𝒯_t. Using the stochastic representation of the MSN distribution in (7), the multi-skewBART model of (10) can be expressed as

y_{i} = \sum_{t = 1}^{m} g (x_{i}; 𝒯_{t}, ℳ_{t}) + Λ Z_{i} + W_{i},

(11)

where $Z_{i} = {(Z_{i 1}, \dots, Z_{i k})}^{⊤} \overset{iid}{\sim} 𝒩_{k +} (0, I_{k})$, $W_{i} = {(W_{i 1}, \dots, W_{i k})}^{⊤} \overset{iid}{\sim} 𝒩_{k} (0, Σ)$, and the scale matrix Σ and the skewness matrix Λ are defined as in (6). For the multivariate model, we use the BART framework to account for the association among the k response components without adapting to the smoothness of f.

3 |. PRIOR SPECIFICATION AND POSTERIOR COMPUTATION

Previous works have repeatedly observed that the default priors proposed by Chipman et al.⁴ work remarkably well in practice; for example, both Chipman et al.⁴ (for BART) and Linero and Yang⁷ (for SBART) observe that, across a large number of benchmark datasets, the default priors are often competitive with more computationally expensive methods based on tuning the hyperparameters via cross-validation, with the default priors even occasionally outperforming the cross-validation methods. For the tree structures, we use the default prior for the BART model and provide priors for the rest of the parameters of the error distribution in (1). We then derive Gibbs samplers for fitting the models.

3.1 |. `skewBART`: Prior choices and MCMC computation

As a preprocessing step, we apply a quantile normalization to each of the p components of the covariate vectors so that each covariate is approximately uniform on (0, 1). Additionally, following Chipman et al.⁴, the dependent variable y_i is also standardized. The leaf parameters μ_tℓ of the standardized y_i are assumed independent and identically distributed $𝒩 (0, σ_{μ}^{2} / m)$, thus regularizing the effect of the individual tree components so that each contributes only a small part of the overall fit.
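
One simple way to carry out a quantile normalization of a single covariate is to replace each value by its scaled rank. The sketch below is an assumption about how such a step could look, not the transformation used in the skewBART package: the (rank − 0.5)∕n scaling and the average-rank treatment of ties are choices made here.

```python
def quantile_normalize(values):
    """Map a covariate to (0, 1) through its ranks; tied values share an average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j to the end of the current group of tied values
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the group
        for k in range(i, j + 1):
            out[order[k]] = (avg_rank - 0.5) / n
        i = j + 1
    return out


print(quantile_normalize([10.0, 30.0, 20.0]))
print(quantile_normalize([5.0, 5.0, 7.0]))
```

After this step the transformed covariate depends on the data only through the ordering of its values, so monotone rescalings of a predictor leave the fit unchanged.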

For choosing the bandwidth κ_b, Linero and Yang⁷ recommend using tree specific κ_t’s shared across branches in a fixed tree, with κ_t ~ Exp(0.1). We specify a half-Cauchy prior τ ~ Cauchy₊(0, τ₀), where τ₀ is chosen empirically. We obtain τ₀ by fitting the lasso using the glmnet package in R. We use a conjugate univariate normal 𝒩(0, δ) prior for λ to allow both positive and negative skewness in (8), where a large value of δ corresponds to a non-informative prior opinion about the amount of skewness. In summary, for our univariate response model in (9), the priors are given by f ~ SBART_m and

\begin{array}{ll} λ \sim 𝒩 (0, δ), & Z_{i} \sim 𝒩_{+} (0, 1), \\ W_{i} \sim 𝒩 (0, τ^{2}), & τ \sim \text{Cauchy}_{+} (0, τ_{0}) . \end{array}

Below we describe our MCMC algorithm with a data augmentation step for fitting skewBART. Our strategy is to augment the latent variables Z = (Z₁, …, Z_n) within the Bayesian backfitting algorithm¹⁶ in order to sample from the posterior distribution

p (𝓣, 𝓜, λ, τ, Z ∣ Y)

using MCMC, where Y = (y₁, …, y_n)^⊤ for the univariate case. Our Gibbs sampling algorithm uses a Metropolis-within-Gibbs strategy to iteratively update (𝒯_t, ℳ_t) for t = 1, …, m in a fashion that leaves the full conditional

p (𝒯_{t}, ℳ_{t} ∣ 𝒯_{- t}, ℳ_{- t}, Z, λ, τ, Y),

(12)

invariant. Then, samples of τ, λ and Z are drawn from the conditional posterior distributions

p (τ ∣ 𝓣, 𝓜, Z, λ, Y), p (λ ∣ 𝓣, 𝓜, Z, τ, Y), p (Z ∣ 𝓣, 𝓜, λ, τ, Y) .

(13)

Sampling (𝒯_t, ℳ_t) from (12) is equivalent to sampling from

p (𝒯_{t}, ℳ_{t} ∣ R_{t}, τ)

(14)

where R_t = (R_t1, ⋯ , R_tn)^⊤ with $R_{t i} = y_{i} - \sum_{k \neq t} g (x_{i}; 𝒯_{k}, ℳ_{k}) - λ Z_{i}$. This allows us to use the existing Bayesian backfitting algorithm to update (𝒯_t, ℳ_t)^4,22. The conditional posterior p(λ | 𝓣, 𝓜, Z, τ, Y) of λ given the rest of the parameters is also Gaussian, because it has the same form as the posterior of the slope in a linear regression of the residuals y_i − f(x_i) on Z_i under a Gaussian prior for λ. The full conditionals of the components of Z are available in closed form as independent truncated Gaussian distributions. The Bayesian backfitting algorithm for skewBART is described in Algorithm 1 below, with the exact conditional posterior distributions corresponding to (13) relegated to the Appendix.
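
The λ and Z updates follow from (9) by standard conjugacy. The sketch below is derived here from the model as stated and is not copied from the paper's Appendix; it assumes that δ in the 𝒩(0, δ) prior is a variance, and the τ update is omitted because the half-Cauchy prior is not conjugate. Writing r_i = y_i − f(x_i), the full conditional of Z_i is 𝒩₊(λ r_i∕(λ² + τ²), τ²∕(λ² + τ²)), and that of λ is Gaussian with precision Σ Z_i²∕τ² + 1∕δ.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def sample_truncnorm_pos(rng, mean, sd):
    """Draw from N(mean, sd^2) truncated to (0, inf) by inverting the CDF.

    Adequate for a sketch; it loses accuracy when mean / sd is very negative.
    """
    lo = _STD.cdf(-mean / sd)                 # probability mass below zero
    u = lo + (1.0 - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)       # keep inv_cdf away from 0 and 1
    return max(mean + sd * _STD.inv_cdf(u), 0.0)


def update_z(rng, resid, lam, tau):
    """Draw every Z_i from its truncated Gaussian full conditional."""
    denom = lam ** 2 + tau ** 2
    sd = math.sqrt(tau ** 2 / denom)
    return [sample_truncnorm_pos(rng, lam * r / denom, sd) for r in resid]


def lambda_posterior(resid, z, tau, delta):
    """Mean and variance of the Gaussian full conditional of lambda."""
    precision = sum(zi * zi for zi in z) / tau ** 2 + 1.0 / delta
    mean = (sum(zi * ri for zi, ri in zip(z, resid)) / tau ** 2) / precision
    return mean, 1.0 / precision


def update_lambda(rng, resid, z, tau, delta):
    """Draw lambda from its Gaussian full conditional."""
    mean, var = lambda_posterior(resid, z, tau, delta)
    return rng.gauss(mean, math.sqrt(var))


rng = random.Random(0)
resid = [0.4, -0.2, 1.3, 0.8]                 # toy values of y_i - f(x_i)
z = update_z(rng, resid, lam=0.9, tau=0.5)
print([round(zi, 3) for zi in z], round(update_lambda(rng, resid, z, tau=0.5, delta=100.0), 3))
```

In a full sampler these two draws would alternate with the tree updates, which see the data only through the adjusted residuals y_i − λ Z_i.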

ALGORITHM 1. Bayesian backfitting algorithm for skewBART.

3.2 |. `multi-skewBART`: Prior choices and MCMC computation

To extend the univariate skewBART model to a multivariate k-dimensional response y_i, we need to replace the univariate Z_i, μ_tℓ, λ and W_i with the corresponding vectors Z_i = (Z_i1, ⋯ , Z_ik)^⊤, μ_tℓ = (μ_1tℓ, ⋯ , μ_ktℓ)^⊤, (λ₁, ⋯ , λ_k) and W_i = (W_i1, ⋯ , W_ik)^⊤. We also need to specify the appropriate priors and hyperpriors for these vectors. For the prior p(μ_tℓ | 𝒯_t) we use the conjugate multivariate normal distribution 𝒩_k(μ_M, Σ_M), which allows (4) to have a closed form expression as in the univariate BART model. The values of μ_M and Σ_M are chosen to assign high probability to the range of observed values of each of the k responses. Similar to the univariate case with hyperparameter τ of W_i, we use an empirical estimate of Σ to choose the center $S_{0}^{- 1}$ and the scale ν₀ of the inverse-Wishart hyperprior of the (k × k) matrix Σ. For the skewness parameters (λ₁, …, λ_k), we use a conjugate multivariate normal prior 𝒩_k(0, Σ_λ). In summary, for our proposed multivariate response model in (11), the priors are given by f ~ MultiBART(0.95, 2, Σ_M) and

\begin{array}{ll} (λ_{1}, \dots, λ_{k}) \sim 𝒩_{k} (0, Σ_{λ}), & Z_{i} \sim 𝒩_{k +} (0, I_{k}), \\ W_{i} \sim 𝒩_{k} (0, Σ), & Σ \sim \text{inverse-Wishart} (ν_{0}, S_{0}^{- 1}) . \end{array}

We take Σ_M to be a diagonal matrix with entries given by the default parameters from the respective univariate BART model of Chipman et al.⁴. The Gibbs sampler draws (𝒯_t, ℳ_t) given (𝒯_−t, ℳ_−t, Z, Λ, Σ) from

p (𝒯_{t}, ℳ_{t} ∣ R_{t}, Σ),

where R_t = (R_t1, …, R_tn)^⊤ with $R_{t i} = y_{i} - \sum_{k \neq t} g (x_{i}; 𝒯_{k}, ℳ_{k}) - Λ Z_{i}$ . Then, we sample (λ₁, …, λ_k), Σ and Z from each full conditional distribution given as

p (λ_{1}, \dots, λ_{k} ∣ 𝓣, 𝓜, Z, Σ, Y),

p (Σ ∣ 𝓣, 𝓜, Z, λ_{1}, \dots, λ_{k}, Y),

p (Z ∣ 𝓣, 𝓜, λ_{1}, \dots, λ_{k}, Σ, Y),

where 𝓜 = {ℳ₁, …, ℳ_m} and each ℳ_t is a collection of k-dimensional leaf vectors associated with tree 𝒯_t. The MCMC algorithm for this model is similar to the original MCMC scheme for the univariate skewBART model, but now extended to the multivariate setting.

4 |. SIMULATION STUDY

In this section, we conduct simulation studies to compare skewBART and multi-skewBART to existing methods. The simulations are designed to evaluate the estimation accuracy of the regression function and out-of-sample predictive performance.

4.1 |. Univariate responses

We first examine the inferences obtained by skewBART compared to BART and SBART when the true error distribution takes on four different skewness levels. For each replicated dataset we simulate n = 250 observations using the nonparametric regression model of (1) with the known population mean function

y_{i} = 10 sin (π x_{i 1} x_{i 2}) + 20 {(x_{i 3} - 0.5)}^{2} + 10 x_{i 4} + 5 x_{i 5} + ϵ_{i}

(15)

introduced by Friedman²³, where x_i ~ Uniform([0, 1]⁵) and $ϵ_{i} \overset{iid}{\sim} 𝒮𝒩 (0, 1, α)$. For each experiment we use 200 trees, 5000 MCMC samples, and a burn-in of 2500 draws. Figure 3 presents density estimates of the distribution of the residuals from fitting the three competing methods overlaid with the true error density function, corresponding to the various choices of α, the skewness parameter. The plots on the left (upper and lower) panels correspond to moderate skewness, i.e., α = −1 and α = +1, while the two on the right (upper and lower) panels represent a high magnitude of skewness, i.e., α = −10 and α = +10. Under both moderate and heavy skewness, the empirical distribution of the residuals from our proposed skewBART model appears closest to the true error distribution.
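
The data-generating process of this experiment can be sketched as follows. The mean function is equation (15) and the errors use the representation (5) with ξ = 0 and σ = 1; the function names and the seed are illustrative, and this is not the authors' simulation code.

```python
import math
import random


def friedman(x):
    """Friedman's mean function from equation (15); x has five components."""
    return (10.0 * math.sin(math.pi * x[0] * x[1]) + 20.0 * (x[2] - 0.5) ** 2
            + 10.0 * x[3] + 5.0 * x[4])


def simulate(rng, n=250, alpha=10.0, sigma=1.0):
    """Return n pairs (x_i, y_i) with x_i ~ Uniform([0, 1]^5) and SN(0, sigma^2, alpha) errors."""
    root = math.sqrt(1.0 + alpha ** 2)
    lam, tau = alpha * sigma / root, sigma / root
    data = []
    for _ in range(n):
        x = [rng.random() for _ in range(5)]
        eps = rng.gauss(0.0, tau) + lam * abs(rng.gauss(0.0, 1.0))
        data.append((x, friedman(x) + eps))
    return data


rng = random.Random(2022)
data = simulate(rng, n=250, alpha=10.0)
print(len(data), round(min(y for _, y in data), 2), round(max(y for _, y in data), 2))
```

Calling `simulate` with α = −10, −1, 1 and 10 reproduces the four skewness levels considered in this comparison.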

FIGURE 3. Simulation study (univariate): Error densities corresponding to four different skewness parameters, α = (1, −1, 10, −10), overlaid with the density estimates from the `skewBART`, SBART and BART fits.

Next, we compare skewBART to BART when the data generating process varies with skewness levels. We simulate n = 250 observations with σ² = 1 and use m = 200 trees. The skewness values are equally spaced from −10 to 10 in increments of 1. We set hyperparameters for the priors of trees in BART and SBART following the recommendations of Chipman et al.⁴ and Linero and Yang⁷. To compare model performance, we use the conditional predictive ordinate (CPO)^24,25, where CPO_i = p(y_i | Y_−i) is the predictive density of the i-th observation y_i given the remaining data Y_−i = (y₁, …, y_i−1, y_i+1, …, y_n). For notational convenience, we suppress the dependence of these response distributions on covariates X. A natural summary statistic of the CPO_i’s is the log pseudo marginal likelihood (LPML), given by $\text{LPML} = \sum_{i = 1}^{n} \log ({\text{CPO}}_{i})$. Two competing models ℳ₁ and ℳ₂ can be compared using the pseudo-Bayes factor $\prod_{i = 1}^{n} {\text{CPO}}_{i}^{ℳ_{1}} / \prod_{i = 1}^{n} {\text{CPO}}_{i}^{ℳ_{2}} = \exp ({\text{LPML}}^{ℳ_{1}} - {\text{LPML}}^{ℳ_{2}})$, where the superscript indicates the model under which these quantities are calculated²⁴. Pseudo-Bayes factors are related to the more well-known Bayes factor^24,26, which may not be appropriate under noninformative priors²⁷. The evidence in the literature supporting the use of pseudo-Bayes factors with BART models is limited; however, LPML is still a very practical tool because (i) unlike Bayes factors, LPML is based on leave-one-out cross-validation (making it more robust than Bayes factors under nonparametric and/or noninformative priors), and (ii) it can be conveniently and reliably computed using MCMC samples from the full posterior.

Using the loo package in R, the LPML can be conveniently computed after obtaining the MCMC samples from the joint posterior. Details on the computation of LPML appear in the Appendix. Results corresponding to each skewness level are presented in Figure 4 with 10 replications per simulation setting. Larger values of LPML indicate a better fit of the model.
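
For readers who want to see the quantity itself, the following sketch estimates LPML from a matrix of pointwise log-likelihood values using the classical harmonic-mean identity CPO_i⁻¹ = E{1∕p(y_i | θ)}, with the expectation taken over posterior draws. This is an assumption about one possible estimator; the loo package instead stabilizes the importance weights with Pareto smoothing, so its output can differ. The function names and the toy input are illustrative.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def lpml(loglik):
    """LPML from loglik[s][i] = log p(y_i | theta^(s)) over S posterior draws."""
    n = len(loglik[0])
    total = 0.0
    for i in range(n):
        # log CPO_i = -log( average over draws of 1 / p(y_i | theta^(s)) )
        total -= log_mean_exp([-row[i] for row in loglik])
    return total


def log_pseudo_bayes_factor(lpml_1, lpml_2):
    """Log pseudo-Bayes factor of model 1 against model 2."""
    return lpml_1 - lpml_2


toy = [[-1.0, -2.0], [-1.0, -2.0]]  # two draws, two observations
print(lpml(toy), lpml([[0.0], [math.log(0.5)]]))
```

When the log-likelihood does not vary across draws, the estimate reduces to the sum of the pointwise log-likelihoods, which gives a quick sanity check.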

FIGURE 4. Simulation study (univariate): The average values of LPML over 10 replications of the simulation experiment with skewness α ∈ (−10, 10). Each dot represents the averaged LPML corresponding to each skewness level and smoothing splines are added for each method to ease visualization.

Among the methods considered, skewBART performs the best, demonstrating improvement over SBART as skewness increases. For sufficiently large |α|, we observe a substantial increase in the LPML for skewBART, since skewBART can capture the excess skewness through the non-Gaussian error assumption. Also, the performances of the skewBART and SBART models are indistinguishable for low to negligible skewness, implying that skewBART does not suffer in terms of performance when the responses are not skewed.

We note that, counter-intuitively, the LPMLs for all models increase as skewness increases; this occurs because, with σ² held fixed, the error variance decreases as |α| grows (the Gaussian component W_i in (9) has variance τ² = σ²∕(1+α²)), so that increasing the skewness decreases the overall noise. Generally, skewBART performs much better than the alternatives as skewness increases.
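
The dependence of the noise level on α can be made explicit from the representation (9). The short derivation below uses only the definitions of λ and τ given earlier and the fact that a 𝒩₊(0, 1) variable has mean √(2∕π) and second moment 1; it is a worked step added for clarity rather than a formula quoted from the paper.

```latex
\begin{aligned}
\operatorname{Var}(Z_i) &= E(Z_i^2) - \{E(Z_i)\}^2 = 1 - \frac{2}{\pi},
  \qquad Z_i \sim \mathcal{N}_{+}(0, 1), \\
\operatorname{Var}(\epsilon_i) &= \operatorname{Var}(W_i) + \lambda^2 \operatorname{Var}(Z_i)
  = \frac{\sigma^2}{1 + \alpha^2}
    + \frac{\alpha^2 \sigma^2}{1 + \alpha^2}\left(1 - \frac{2}{\pi}\right)
  = \sigma^2 \left\{ 1 - \frac{2 \alpha^2}{\pi (1 + \alpha^2)} \right\}.
\end{aligned}
```

With σ² = 1 the error variance is 1 at α = 0, about 0.68 at |α| = 1 and about 0.37 at |α| = 10, approaching 1 − 2∕π as |α| grows. Only the Gaussian component scales exactly as 1∕(1 + α²).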

4.2 |. Bivariate responses

In this section, we examine the benefits of adapting to skewness for bivariate responses using synthetic data. We generate y_i from the bivariate version of model (1) with k = 2 and p = 5, where x_i ~ Uniform([0, 1]⁵) and ϵ_i ~ 𝒮𝒩₂(0, Σ, Λ), with scale matrix Σ and skewness matrix Λ. The function f returns a 2-dimensional vector for each 5-dimensional x_i. We again use Friedman’s example for f, with n = 250 and σ₁ = σ₂ = 1. Following Arellano-Valle et al.²⁸, we consider 9 different settings of the skewness parameters λ₁, λ₂ and the correlation parameter ρ in the bivariate specification, as displayed in Figure 5. We compare multi-skewBART to multi-BART, the multivariate version of the standard BART model, via LPML. For both models we use 200 trees and 5,000 MCMC draws. Results of this simulation are presented in Table 1. The results show that multi-skewBART attains a larger LPML than multi-BART under all nine settings. Both skewness and correlation appear to affect performance. In most cases, higher correlation leads to higher LPML (an exception is multi-skewBART under (λ₁, λ₂) = (2, 3), where the LPML is slightly lower at ρ = 0.9 than at ρ = 0.5), and for a fixed correlation the highest LPML is typically attained under (λ₁, λ₂) = (0, 3). Thus, we conclude that the multi-skewBART fit is better, and its predictive performance remains highly competitive, regardless of the magnitude of skewness and correlation.

FIGURE 5. Simulation study (bivariate): The contour plots of the bivariate skew-normal distribution for μ = (0, 0) and σ₁ = σ₂ = 1 with different values of (λ₁, λ₂) and ρ.

TABLE 1.

Simulation study (bivariate): LPML corresponding to the fits of the multi-skewBART and Multi-BART, for various settings of λ₁, λ₂ and ρ.

(λ₁, λ₂)	ρ	multi-skewBART	Multi-BART
(0, 3)	0	−356.677	−360.145
(0, 3)	0.5	−330.328	−346.016
(0, 3)	0.9	−287.791	−308.521
(2, 3)	0	−396.685	−427.258
(2, 3)	0.5	−382.333	−412.068
(2, 3)	0.9	−386.478	−400.468
(−2, 2)	0	−369.528	−396.559
(−2, 2)	0.5	−312.813	−384.575
(−2, 2)	0.9	−311.486	−325.217
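
Since the log pseudo-Bayes factor is the difference of two LPML values, the entries of Table 1 can be turned into pairwise comparisons directly. The snippet below copies the table values and computes LPML(multi-skewBART) − LPML(Multi-BART) for each setting; the dictionary layout and variable names are illustrative.

```python
# LPML values from Table 1: (lambda_1, lambda_2, rho) -> (multi-skewBART, Multi-BART)
table_1 = {
    (0, 3, 0.0): (-356.677, -360.145),
    (0, 3, 0.5): (-330.328, -346.016),
    (0, 3, 0.9): (-287.791, -308.521),
    (2, 3, 0.0): (-396.685, -427.258),
    (2, 3, 0.5): (-382.333, -412.068),
    (2, 3, 0.9): (-386.478, -400.468),
    (-2, 2, 0.0): (-369.528, -396.559),
    (-2, 2, 0.5): (-312.813, -384.575),
    (-2, 2, 0.9): (-311.486, -325.217),
}

# Log pseudo-Bayes factor in favor of multi-skewBART for every setting
log_pbf = {key: skew - gauss for key, (skew, gauss) in table_1.items()}
smallest = min(log_pbf, key=log_pbf.get)
largest = max(log_pbf, key=log_pbf.get)

for key in sorted(log_pbf, key=log_pbf.get):
    print(key, round(log_pbf[key], 3))
```

Every difference is positive. The smallest gap, about 3.5, occurs when only one component is skewed and ρ = 0, and the largest, about 71.8, occurs under (λ₁, λ₂) = (−2, 2) with ρ = 0.5.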


Next, we examine how the LPMLs for skewBART and multi-skewBART are impacted by the number of predictors, varying from p = 5 to 200. Using Friedman’s example, we simulate n = 250 observations with σ² = 1 and α = 3 for skewBART and with $σ_{1}^{2} = σ_{2}^{2} = 1$, (λ₁, λ₂) = (−2, 2) and ρ = 0.5 for multi-skewBART. With m = 200 trees, results are given in the top panel of Figure 6, with 10 replications per simulation setting. We see that, as the number of predictors increases, multi-skewBART is less sensitive to irrelevant predictors than skewBART. This is because the multi-skewBART model shares information about the relevant predictors across the different outcomes via the sharing of the decision trees¹⁴. However, both skewBART and multi-skewBART appear largely insensitive to the number of irrelevant predictors. We note that this is in large part due to our use of the Dirichlet prior proposed by Linero¹², and the robustness to the inclusion of irrelevant predictors we observe is consistent with other simulations done on this problem.

Simulation study: The average values of LPML over 10 replications with p ∈ (5, 200) and fixed m = 200 trees (top) for `skewBART` and `multi-skewBART`. The average values of LPML over 10 replications with m ∈ (10, 200) and fixed p = 5 (bottom). Smoothing splines (with corresponding 95% confidence bands) are added for each method to ease visualization.

The number of trees is an important tuning parameter in the BART model. To examine how the number of trees impacts the LPML for skewBART and multi-skewBART, we fix p = 5 and vary the number of trees from 10 to 200. Results are given in the bottom panel of Figure 6, with 10 replications per simulation setting. We see a slight increase in LPML as the number of trees increases to the optimal choice. This result is consistent with what has been observed in other studies⁴,¹⁴: predictive performance of BART models is typically robust to the number of trees provided we include sufficiently many. We find that this behavior holds for both the skewBART and multi-skewBART models.

5 |. APPLICATION: THE GAAD STUDY

PD primarily results from the inflammation of the gums and bone that surround and support the teeth²⁹. The motivating GAAD study used two correlated biomarkers, the PPD and CAL, of PD status. The CAL is defined as the distance (in mm) down a tooth’s root that is no longer attached to the surrounding bone by the periodontal ligament³⁰. The PPD is defined as the distance (also in mm) from the gingival margin to the bottom of the gingival pocket. The CAL and PPD are measured at six pre-specified tooth sites of each available tooth, excluding the third molars/wisdom teeth³¹. When no teeth are missing, CAL (PPD) measurements are available from 168 sites of the subject. The Joint EU/USA Periodontal Epidemiology Working Group guidelines recommend the average CAL and the average PPD as standard measurements for studying the prevalence and severity of PD, and for overall tooth-level periodontal status in epidemiological studies³². Thus, we consider the bivariate response (PPD_i, CAL_i) for subject i to be the average of the PPD and the average of the CAL values across all available sites (maximum of 168 sites). For the sake of brevity, we omit the word average hereafter and instead just use, say, CAL to mean the average CAL of the subject. The study also includes various subject-level covariates, such as age (in years), body mass index (BMI, in kg∕m²), glycated haemoglobin level (HbA1c, in %), gender (1 for female, 0 for male), and smoking status (1 for past or present smoker, 0 for never). Our analysis uses 288 subjects with complete covariate information, mostly females (about 76%), with a mean age of about 55 years (range 26–87 years), 31% smokers, and about 68% obese subjects (defined as BMI ≥ 30). The predominance of female subjects in our data is not spurious, and resonates with proportions of Gullah females recruited in other studies³³.
Furthermore, the substantial evidence of adverse effects of T2D on PD has been extensively explored in oral health³⁴, and the current study is no exception. The HbA1c is considered a standard of care for testing and monitoring T2D. The American Diabetes Association recommends a target HbA1c level of < 7% (well-controlled), ideally between 4% and 6%, for people with T2D³⁵,³⁶. In our data, 60% of subjects have poorly controlled T2D (i.e., HbA1c ≥ 7), while the remaining 40% are well-controlled (not T2D free). Unlike previous approaches based on parametric regression functions³⁰ and semiparametric formulations with skewed errors⁵, our approach incorporates a tree-based unknown nonparametric regression function to capture both nonlinear and interaction effects of these covariates. Code for implementing these models is available on GitHub at https://github.com/Seungha-Um/skewBART.

We first present a univariate analysis of the GAAD study by fitting three competing methods (skewBART, BART, and SBART) to the PPD and CAL responses separately. We also consider fitting the models after applying a log transformation to the outcomes, as this is a popular transformation to ameliorate the skewness and improve the fit of the normal error models. All methods use 5,000 MCMC draws with 200 trees, and we compare their fits via LPML. Results of the model comparison are summarized in Table 2. For both responses, the estimated LPMLs reveal that skewBART outperforms the competing approaches, with or without the log transform. Performing a log transform does result in better LPML; however, the skewness of the error distribution is still important for model fit even after the log transform. This suggests that assuming a marginal skew-normal error instead of a Gaussian error is appropriate for the analysis of the GAAD study. The performance of SBART and BART is similar here; SBART, which enables a smooth regression function, provides only a slight boost compared to BART.

TABLE 2.

GAAD data analysis: Model fit summaries (LPML values) obtained from fitting the skewBART, SBART and BART models separately to the PPD and CAL. The “Transformation” column indicates whether the response was log-transformed.

	Transformation	skewBART	SBART	BART
CAL	None	−380	−440	−446
CAL	Log	−358	−360	−364
PPD	None	−284	−340	−346
PPD	Log	−272	−277	−279


Now, we apply multi-skewBART to jointly model the bivariate skewed responses, CAL and PPD. We compare the fit of multi-skewBART with multi-BART via LPML, using 5,000 MCMC draws. As expected, the multi-skewBART model, which allows different skewness levels for CAL and PPD as well as dependence between PPD and CAL within the same subject, outperforms multi-BART for the bivariate response (LPML = −614 vs −642). The same conclusion holds even for log-transformed responses (LPML = −527 vs −529). We conclude that the multivariate skew-normal assumption improves the modeling of the GAAD study.

Next, we examine how the multivariate model multi-skewBART impacts the prediction accuracy even though both skewBART and multi-skewBART assume SN errors. The multi-skewBART also allows sharing of information across decision trees and captures within-subject association. We compare the skewBART and multi-skewBART models using the root mean squared error (RMSE), ${RMSE}^{2} = {(n^{⋆})}^{- 1} \sum_{i = 1}^{n^{⋆}} {(y_{i}^{⋆} - \hat{ζ}_{i}^{⋆})}^{2}$, of each response. Here $\{(y_{i}^{⋆}, x_{i}^{⋆}) : i = 1, \dots, n^{⋆}\}$ denotes a collection of held-out observations of a particular response, say, PPD, and $\hat{ζ}_{i}^{⋆}$ is the posterior predicted value of the ith of these n^⋆ PPD responses based on the rest of the data. To make a direct comparison, the responses are standardized and the results are averaged over 20 splits into training and testing sets. Results are presented in Table 3. Although multi-skewBART does not aim for smooth regression functions, it gives the best performance in terms of RMSE. This result implies that the association between CAL and PPD within each subject plays a more essential role than the smoothness of the regression function in the GAAD study. Also, multi-skewBART shares information across the different responses by using the same decision trees to generate predictions for both responses. The central message conveyed here is that separately modeling the PPD and CAL responses, ignoring their bi-directional dependence, fails to share information across the two responses and compromises the prediction accuracy.
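The held-out RMSE criterion and its averaging over train/test splits can be sketched as below. This is only an illustration of the formula: the function names are hypothetical, and the predictions passed in stand for posterior predicted values obtained from any fitted model.

```python
import math


def rmse(y_holdout, y_pred):
    """Root mean squared error over the held-out observations of one response."""
    if len(y_holdout) != len(y_pred) or not y_holdout:
        raise ValueError("inputs must be non-empty and of equal length")
    sq = sum((y - p) ** 2 for y, p in zip(y_holdout, y_pred))
    return math.sqrt(sq / len(y_holdout))


def average_rmse(splits):
    """Average the RMSE over a list of (y_holdout, y_pred) pairs, one per split."""
    return sum(rmse(y, p) for y, p in splits) / len(splits)
```

Because the responses are standardized before splitting, values from the two responses are on a comparable scale.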

TABLE 3.

GAAD data analysis: Root mean squared error (RMSE) computed over 20 replications for the responses (PPD and CAL) from the skewBART and multi-skewBART fits.

	skewBART	multi-skewBART
CAL	1.345	0.838
PPD	0.626	0.423


As there exists strong evidence of a bi-directional association between T2D and PD³⁶,³⁷, we now focus on evaluating the association of varying HbA1c levels with PPD and CAL. The marginal effect of HbA1c levels is evaluated by Friedman’s partial dependence function³⁸, which provides a summary of the effect due to the covariates of interest by averaging over the other covariates. If x_i can be partitioned as x_i = (x_iI, x_iC), where x_iI is the set of covariates of interest (HbA1c, gender, smoking status) and x_iC is the complement set, then the partial dependence of the response y_i at a value $x_{I}^{*}$ of the covariates of interest is defined as

μ_{I} (x_{I}^{*}) = E [y_{i} ∣ x_{i I} = x_{I}^{*}] \approx n^{- 1} \sum_{i = 1}^{n} μ (x_{I}^{*}, x_{i C})

where $E (\cdot)$ denotes the expectation operator and $μ (x) = f (x) + \sqrt{2 / π} {(λ_{1}, \dots, λ_{k})}^{⊤}$ is the conditional mean of y_i given covariate vector ( $x_{I}^{*}, x_{i C}$ ) (instead of observed x_i) under skew-normal error. We then compare the posterior mean predictions obtained from fitting skewBART and multi-skewBART models within the four subgroups defined by the combination of gender and smoking status. The results are displayed in Figure 7 with both skewBART and multi-skewBART fits. We observe that with increasing HbA1c levels both responses exhibit an overall increasing trend regardless of the subgroups or the model used. This reconfirms the overall (positive) association between HbA1c and PD. We can also see the effect of smoothing, with the multi-skewBART fit being much more rugged compared to the univariate skewBART fits. For both the skewBART and multi-skewBART fits for PPD, once gender is fixed, the fits corresponding to the smokers and non-smokers appear close; males are predicted to have a much higher PPD than females, regardless of their smoking status or HbA1c levels. We also observe that the effect of smoking is homogeneous with respect to the level of HbA1c. This implies that males are more likely to be prone to active/current PD (PPD representing current/active disease status) than females²⁹, irrespective of smoking status. However, this is not the story from the univariate fit of CAL, where the differences between the smokers and non-smokers appear prominent within genders. The prediction curves are very close between male non-smokers and female smokers for higher levels (say, ≥ 11) of HbA1c for both models.
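The averaging in the partial dependence function can be sketched as follows. Here `mu` is a placeholder for any callable returning the conditional mean at a covariate vector (for instance, one posterior draw of f(x) plus the skew-normal mean shift); the function and argument names are hypothetical.

```python
def partial_dependence(mu, rows, interest_idx, interest_values):
    """Average mu over the observed complement covariates, with the
    covariates of interest fixed at the supplied values."""
    total = 0.0
    for row in rows:
        x = list(row)  # copy so the observed data are not modified
        for j, value in zip(interest_idx, interest_values):
            x[j] = value
        total += mu(x)
    return total / len(rows)
```

Evaluating this on a grid of HbA1c values for each gender and smoking combination, and averaging over posterior draws, would give curves of the kind summarized in Figure 7.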

GAAD data analysis: Plots of the posterior means of the partial dependence functions for CAL and PPD responses as a function of HbA1c corresponding to the 4 subgroups (varying with gender and smoking status) from fitting the `skewBART` (upper panel), and `multi-skewBART` (bottom panel) models; age and BMI are averaged over.

In terms of differences between the models, we note that, while skewBART is capable of leveraging smoothness, it produces higher RMSE on test data (see Table 3). This implies that considering the association between responses for the GAAD data is more important in generating good predictions than trying to leverage smoothness in the regression function.

6 |. CONCLUSION

In this paper, we proposed the skewBART model, and extended it to the multi-skewBART model which handles multivariate outcomes. The main idea is to use either a univariate SN density or an MSN density as the error density within the BART framework. We showed that, when the error distribution is skewed, skewBART and multi-skewBART provide better model fits than the original BART model; also, multi-skewBART enjoys additional benefits for multivariate responses due to its ability to account for within-subject dependence of the outcomes and because it uses multivariate decision trees to share information across regression functions. We showed that multi-skewBART produces better model fits on the GAAD data than fitting the two outcomes separately using skewBART.

As pointed out by a reviewer, George et al.³⁹ also considered very flexible extensions to BART by assuming the error distribution to be a Dirichlet process mixture of normals (DPM). This model is capable of addressing skewness, as well as other non-Gaussian features, such as heavy-tailedness of the error. For our purposes, the use of a skew-normal distribution is convenient as it (i) is more parsimonious, allowing for simpler comparisons to non-skewed alternatives; (ii) is less flexible, and hence potentially less prone to over-fitting; and (iii) is easier to extend to multivariate outcomes.

The current work considers two continuous responses, PPD and CAL. However, in practice, data responses (elements in the multivariate response vector) can be of mixed types, such as binary ‘bleeding on probing’ outcomes in PD modeling. In such situations, considering a latent variable, or factor modeling, framework with BART specifications may be worthwhile. Also, the number of available (non-missing) site-level responses (within each subject) can be correlated with the PPD, or CAL responses, leading to the informative cluster size (ICS) scenario⁴⁰, and exploration of BART under ICS is non-existent. Computationally, we have relied on the Gibbs sampler and Metropolis-Hastings for tree updates. In big data settings (such as in observational PD databases), scalable Bayesian methods will likely be required. Finally, multi-skewBART could also use SBART-style smoothing to enforce smoothness. Although we have shown that accounting for the association between responses is more important for our conclusions than trying to leverage smoothness, it may be possible to improve inferences by extending multi-skewBART to incorporate the SBART framework. All of these are important avenues for future work, and will be considered elsewhere.

Supplementary Material

supinfo2

NIHMS1850406-supplement-supinfo2.tex^{(1.3KB, tex)}

supinfo1

NIHMS1850406-supplement-supinfo1.sty^{(44.7KB, sty)}

supinfo3

NIHMS1850406-supplement-supinfo3.bbl^{(6.6KB, bbl)}

supinfo4

NIHMS1850406-supplement-supinfo4.bst^{(18.6KB, bst)}

supinfo5

NIHMS1850406-supplement-supinfo5.cls^{(107.1KB, cls)}

ACKNOWLEDGEMENTS

The authors thank the anonymous Associate Editor and two reviewers, whose constructive comments led to a significantly improved version of the manuscript. They remain thankful to the Center for Oral Health Research at the Medical University of South Carolina for providing the motivating dataset, and the context of this work. Bandyopadhyay acknowledges partial support from grants R01DE031134 and R21DE031879, awarded by the United States National Institutes of Health. This material is also based upon work supported by the National Science Foundation under Grant No. DMS-214493, the Pfeifer Foundation of Cancer Research, and the Hobbs Foundation.

APPENDIX

A. MCMC FOR `SKEWBART`

We provide details on the Bayesian backfitting algorithm for the skewBART model. Recall that the model is given by

y_{i} = \sum_{t = 1}^{m} g (x_{i}; 𝒯_{t}, ℳ_{t}) + λ Z_{i} + W_{i}, W_{i} \overset{i i d}{~} 𝒩 (0, τ^{2}),

τ ~ {Cauchy}_{+} (0, τ_{0}),

λ ~ 𝒩 (0, δ) .

Throughout, we write [V | •] to denote the full-conditional distribution of V. Since Z_i ~ 𝒩₊(0, 1), we update Z_i as

[Z_{i} ∣ •] ~ 𝒩_{+} (\frac{λ R_{i}^{*}}{λ^{2} + τ^{2}}, \frac{τ^{2}}{λ^{2} + τ^{2}})

where $R_{i}^{*} = y_{i} - f (x_{i})$ .

Define $R^{*} = (R_{1}^{*}, \dots, R_{n}^{*})$ and Z = (Z₁, …, Z_n). By standard results for Bayesian linear regression, the update of λ is

[λ ∣ •] ~ 𝒩 (\frac{Z^{⊤} R^{*} / τ^{2}}{Z^{⊤} Z / τ^{2} + 1 / δ}, {(Z^{⊤} Z / τ^{2} + 1 / δ)}^{- 1})

The full conditional distribution for τ is proportional to

{(τ^{2})}^{- n / 2} exp [- \frac{1}{2 τ^{2}} {(R^{*} - λ Z)}^{⊤} (R^{*} - λ Z)] \frac{1}{π (τ_{0} + τ^{2} / τ_{0})},

which can be sampled using (for example) slice sampling⁴¹.
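The updates for Z_i and λ can be sketched as below, taking the residuals R* as given. This is a minimal illustration, not the package implementation: the function names are hypothetical, truncated normals are drawn by inverting the CDF (which is not robust far in the tails), and the mean of λ is computed as (ZᵀR*/τ²)/(ZᵀZ/τ² + 1/δ), the form implied by the posterior precision ZᵀZ/τ² + 1/δ. The slice-sampling step for τ is omitted.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def sample_positive_normal(mean, sd, rng):
    """Draw from N(mean, sd^2) truncated to [0, inf) by inverting the CDF."""
    lo = _STD.cdf(-mean / sd)  # probability mass below zero
    u = lo + (1.0 - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf strictly inside (0, 1)
    return max(mean + sd * _STD.inv_cdf(u), 0.0)


def update_z(resid, lam, tau, rng):
    """Draw each Z_i from its truncated-normal full conditional."""
    denom = lam ** 2 + tau ** 2
    sd = math.sqrt(tau ** 2 / denom)
    return [sample_positive_normal(lam * r / denom, sd, rng) for r in resid]


def update_lambda(resid, z, tau, delta, rng):
    """Draw lambda from its normal full conditional (prior variance delta)."""
    prec = sum(zi * zi for zi in z) / tau ** 2 + 1.0 / delta
    mean = (sum(zi * ri for zi, ri in zip(z, resid)) / tau ** 2) / prec
    return rng.gauss(mean, math.sqrt(1.0 / prec))
```

In a full sampler these two draws would alternate with the tree updates and the update of τ within each iteration.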

B. MCMC FOR `MULTI-SKEWBART`

The hierarchical model for multi-skewBART is

y_{i} = \sum_{t = 1}^{m} g (x_{i}; 𝒯_{t}, ℳ_{t}) + Λ Z_{i} + W_{i}, W_{i} \overset{i i d}{~} 𝒩_{k} (0, Σ)

{(λ_{1}, \dots, λ_{k})}^{⊤} ~ 𝒩_{k} (0, Σ_{λ})

Σ ~ inverse-Wishart (v_{0}, S_{0}^{- 1})

where Λ = diag(λ₁, ⋯ , λ_k). Since Z_i ~ 𝒩_k+(0, I_k) by the property of the skew-normal distribution, we update Z_i as

[Z_{i} ∣ •] ~ 𝒩_{k +} ({(Λ Σ^{- 1} Λ + I_{k})}^{- 1} Λ Σ^{- 1} R_{i}^{*}, {(Λ Σ^{- 1} Λ + I_{k})}^{- 1}),

where $R_{i}^{*} = y_{i} - \sum_{t} g (x_{i}, 𝒯_{t}, ℳ_{t})$. Here 𝒩_k+(μ, Σ) is a k-dimensional normal distribution with location vector μ and covariance matrix Σ, truncated to the positive k-dimensional orthant $ℝ^{k +}$. The conditional posterior to update (λ₁, ⋯ , λ_k) is

[(λ_{1}, \dots, λ_{k}) ∣ •] ~ 𝒩_{k} ({(\sum_{i = 1}^{n} M_{i} Σ^{- 1} M_{i} + Σ_{λ}^{- 1})}^{- 1} \sum_{i = 1}^{n} M_{i} Σ^{- 1} R_{i}^{*}, {(\sum_{i = 1}^{n} M_{i} Σ^{- 1} M_{i} + Σ_{λ}^{- 1})}^{- 1})

where M_i = diag(Z_i1, ⋯ , Z_ik). The conditional posterior to update Σ is

[Σ ∣ •] ~ inverse-Wishart (v_{0} + n, {(B^{⊤} B + S_{0})}^{- 1})

where B is an n × k matrix whose ith row is ${(R_{i}^{*} - Λ Z_{i})}^{⊤}$. The Bayesian backfitting algorithm for fitting multi-skewBART is now essentially the same as the algorithm for skewBART but with the updates of (Σ, {Z_i}, Λ) replacing the updates for (τ, {Z_i}, λ). The only remaining consideration is computing the integrated likelihood, which we do in the following section.
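As a short worked step, the full conditional of Z_i follows from completing the square in the Gaussian likelihood and the truncated standard normal prior. The derivation below is a sketch in the notation of this appendix and uses only that Λ is diagonal (hence symmetric).

```latex
\begin{align*}
p(Z_i \mid \bullet)
  &\propto \exp\Big\{-\tfrac{1}{2}\,(R_i^{*} - \Lambda Z_i)^{\top} \Sigma^{-1} (R_i^{*} - \Lambda Z_i)
           - \tfrac{1}{2}\, Z_i^{\top} Z_i\Big\}\, \mathbb{1}(Z_i \in \mathbb{R}^{k+}) \\
  &\propto \exp\Big\{-\tfrac{1}{2}\, Z_i^{\top} (\Lambda \Sigma^{-1} \Lambda + I_k)\, Z_i
           + Z_i^{\top} \Lambda \Sigma^{-1} R_i^{*}\Big\}\, \mathbb{1}(Z_i \in \mathbb{R}^{k+}).
\end{align*}
```

Matching this to a normal kernel gives precision matrix ΛΣ⁻¹Λ + I_k and mean (ΛΣ⁻¹Λ + I_k)⁻¹ΛΣ⁻¹R_i*, truncated to the positive orthant. The update for (λ₁, …, λ_k) follows in the same way after writing ΛZ_i = M_i (λ₁, …, λ_k)ᵀ with M_i = diag(Z_i1, …, Z_ik).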

C. THE INTEGRATED LIKELIHOOD FOR MULTI-SKEWBART

To update 𝓣, we require the conditional posterior of 𝒯_t given the rest of the parameters. The derivation of this conditional posterior for multi-skewBART is actually a generalization of the derivation of the conditional posterior of 𝒯_t for univariate skewBART, and we provide only a sketch of the derivation of this conditional posterior for multi-skewBART. Let ℒ_t denote the collection of leaf nodes of tree t and let [x ⇝ ℓ] mean that the covariate value x is associated to leaf node ℓ of tree t. The conditional posterior is

p (𝒯_{t} ∣ R_{t}, Σ) ∝ p_{𝒯} (𝒯_{t}) \int \prod_{i = 1}^{n} ϕ_{k} (Y_{i} ∣ f (X_{i}) + Λ Z_{i}, Σ) [\prod_{ℓ \in ℒ_{t}} ϕ_{k} (μ_{t ℓ} ∣ 0, D)] \prod_{ℓ \in ℒ_{t}} d μ_{t ℓ} = p_{𝒯} (𝒯_{t}) \prod_{ℓ \in ℒ_{t}} \int \prod_{i : X_{i} ⇝ ℓ} {ϕ_{k} (R_{t i} ∣ μ_{t ℓ}, Σ)} ϕ_{k} (μ_{t ℓ} ∣ 0, D) d μ_{t ℓ},

where ϕ_k(U | μ, Σ) is the multivariate 𝒩_k(μ, Σ) density, μ_tℓ has a 𝒩_k(0, D) prior, and $R_{t i} = Y_{i} - Λ Z_{i} - \sum_{s \neq t} g (X_{i}; 𝒯_{s}, ℳ_{s})$ is the partial residual that removes the skewness term and the fits of all trees other than tree t. This integrated likelihood can be computed in closed form by using standard properties of the multivariate Gaussian distribution. Additionally, by conjugacy of the multivariate normal distribution, we have the full conditionals

[μ_{t ℓ} ∣ •] ~ 𝒩_{k} ({(D^{- 1} + N_{ℓ} Σ^{- 1})}^{- 1} Σ^{- 1} \sum_{i : X_{i} ⇝ ℓ} R_{t i}, {(D^{- 1} + N_{ℓ} Σ^{- 1})}^{- 1}), (C1)

where N_ℓ is the number of observations associated to leaf ℓ of tree t. We adopt the same Metropolis-Hastings steps for modifying the tree structure as with skewBART.
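For reference, the closed form of the per-leaf integral can be written out by completing the square in μ_tℓ. The expression below is a sketch derived from the Gaussian densities above; the shorthand s_ℓ and A_ℓ is introduced here for illustration only.

```latex
\begin{align*}
s_{\ell} &= \sum_{i : X_i \rightsquigarrow \ell} R_{ti}, \qquad
A_{\ell} = D^{-1} + N_{\ell}\, \Sigma^{-1}, \\
\int \prod_{i : X_i \rightsquigarrow \ell} \phi_k(R_{ti} \mid \mu_{t\ell}, \Sigma)\,
     \phi_k(\mu_{t\ell} \mid 0, D)\, d\mu_{t\ell}
 &= (2\pi)^{-N_{\ell} k / 2}\, |\Sigma|^{-N_{\ell}/2}\, |D|^{-1/2}\, |A_{\ell}|^{-1/2} \\
 &\quad \times \exp\Big\{-\tfrac{1}{2} \sum_{i : X_i \rightsquigarrow \ell} R_{ti}^{\top} \Sigma^{-1} R_{ti}
   + \tfrac{1}{2}\, s_{\ell}^{\top} \Sigma^{-1} A_{\ell}^{-1} \Sigma^{-1} s_{\ell}\Big\}.
\end{align*}
```

The same quantities give the full conditional (C1), whose mean is A_ℓ⁻¹Σ⁻¹s_ℓ and whose covariance is A_ℓ⁻¹. In a Metropolis-Hastings acceptance ratio, factors common to the current and proposed trees cancel.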

D. COMPUTATION OF LPML

The logarithm of the pseudo-marginal likelihood (LPML)²⁴,⁴² is given by $LPML = \sum_{i = 1}^{n} log ({CPO}_{i})$, where CPO_i = p(y_i | Y_−i) is the predictive density of the observed y_i given the rest of the observed data Y_−i (suppressing dependence on X). Due to a simplifying result²⁴, a Monte Carlo approximation of CPO_i can be obtained using samples θ¹, …, θ^S from the joint posterior p(θ | Y) given the full observed data Y as

p (y_{i} ∣ Y_{- i}) = {[\int \frac{p (Y ∣ θ) p (θ)}{p (Y) p (y_{i} ∣ θ)} d θ]}^{- 1} \approx \frac{1}{\frac{1}{S} \sum_{s = 1}^{S} {p (y_{i} ∣ θ^{s})}^{- 1}},

where θ denotes the collection of all model parameters. The above approximation of CPO_i is a type of harmonic mean estimator; such estimators are generally known to be unstable (possibly having infinite variance). This approximation of CPO_i can be improved by Pareto smoothed importance sampling (PSIS)⁴³. PSIS applies a smoothing procedure to the importance weights by replacing the largest importance sample ratios with the expected order statistics of the fitted generalized Pareto distribution. The reliability of the approximations can be assessed by the estimated shape parameter $\hat{k}$ of the generalized Pareto distribution. The loo package suggests that approximations of our CPO_i with $\hat{k} < 0.7$ are reliable.
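The plain harmonic-mean approximation of CPO_i, and the resulting LPML, can be sketched as below from a table of pointwise log-likelihoods. The function names and the layout `loglik[s][i] = log p(y_i | θ^s)` are assumptions made for illustration; the computation is done on the log scale to limit overflow, and it does not include the PSIS correction described above.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def lpml(loglik):
    """LPML from loglik[s][i] = log p(y_i | theta^s) over S posterior draws."""
    n = len(loglik[0])
    total = 0.0
    for i in range(n):
        # log CPO_i = -log( (1/S) * sum_s 1 / p(y_i | theta^s) )
        total -= log_mean_exp([-draw[i] for draw in loglik])
    return total
```

For example, with two draws giving p(y_1 | θ) = 0.5 and 0.25, the estimate is CPO_1 = 1 / ((2 + 4) / 2) = 1/3.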

References

1. Fernandes JK, Wiegand RE, Salinas CF, et al. Periodontal disease status in Gullah African Americans with Type 2 diabetes living in South Carolina. Journal of Periodontology 2009; 80(7): 1062–1068.
2. Breiman L. Random forests. Machine Learning 2001; 45: 5–32.
3. Freund Y, Schapire R. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence 1999; 14(5): 771–780.
4. Chipman HA, George EI, McCulloch RE. BART: Bayesian additive regression trees. The Annals of Applied Statistics 2010; 4(1): 266–298.
5. Bhingare A, Sinha D, Pati D, Bandyopadhyay D, Lipsitz SR. Semiparametric Bayesian latent variable regression for skewed multivariate data. Biometrics 2019; 75(2): 528–538.
6. Ročková V, Van Der Pas S. Posterior concentration for Bayesian regression trees and forests. The Annals of Statistics 2020; 48(4): 2108–2131.
7. Linero A, Yang Y. Bayesian regression tree ensembles that adapt to smoothness and sparsity. Journal of the Royal Statistical Society, Series B 2018; 80: 1087–1110.
8. Murray JS. Log-Linear Bayesian additive regression trees for multinomial logistic and count regression models. Journal of the American Statistical Association 2021; 116(534): 756–769.
9. Sparapani RA, Logan BR, McCulloch RE, Laud PW. Nonparametric survival analysis using Bayesian Additive Regression Trees (BART). Statistics in Medicine 2016; 35(16): 2741–2753.
10. Basak P, Linero A, Sinha D, Lipsitz S. Semiparametric analysis of clustered interval-censored survival data using soft Bayesian additive regression trees (SBART). Biometrics 2022; 78(3): 880–893.
11. Li Y, Linero AR, Murray J. Adaptive Conditional Distribution Estimation with Bayesian Decision Tree Ensembles. Journal of the American Statistical Association 2022; 0(0): 1–14.
12. Linero AR. Bayesian regression trees for high-dimensional prediction and variable selection. Journal of the American Statistical Association 2018; 113(522): 626–636.
13. Page RC, Eke PI. Case definitions for use in population-based surveillance of periodontitis. Journal of Periodontology 2007; 78: 1387–1399.
14. Linero AR, Sinha D, Lipsitz SR. Semiparametric mixed-scale models using shared Bayesian forests. Biometrics 2020; 76(1): 131–144.
15. Starling JE, Murray JS, Carvalho CM, Bukowski RK, Scott JG. BART with targeted smoothing: An analysis of patient-specific stillbirth risk. The Annals of Applied Statistics 2020; 14(1): 28–50.
16. Hastie T, Tibshirani R. Bayesian backfitting. Statistical Science 2000; 15(3): 196–213.
17. Azzalini A. The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics 2005; 32(2): 159–188.
18. Azzalini A, Capitanio A. Statistical applications of the multivariate skew normal distribution. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 1999; 61(3): 579–602.
19. Arellano-Valle RB, Genton MG. On fundamental skew distributions. Journal of Multivariate Analysis 2005; 96(1): 93–116.
20. Sahu SK, Dey DK, Branco MD. A new class of multivariate skew distributions with applications to Bayesian regression models. Canadian Journal of Statistics 2003; 31(2): 129–150.
21. Arellano-Valle R, Bolfarine H, Lachos V. Bayesian inference for skew-normal linear mixed models. Journal of Applied Statistics 2007; 34(6): 663–682.
22. Tan YV, Roy J. Bayesian additive regression trees and the General BART model. Statistics in Medicine 2019; 38(25): 5048–5069.
23. Friedman JH. Multivariate adaptive regression splines. The Annals of Statistics 1991; 19(1): 1–67.
24. Gelfand AE, Dey DK. Bayesian model choice: asymptotics and exact calculations. Journal of the Royal Statistical Society. Series B (Methodological) 1994; 56(3): 501–514.
25. Gelfand AE. Model determination using sampling-based methods. In: Gilks WR, Richardson S, Spiegelhalter DJ, eds. Markov Chain Monte Carlo in Practice. London, UK: Chapman & Hall. 1996. (pp. 145–161).
26. Berger JO, Pericchi LR. The Intrinsic Bayes Factor for Model Selection and Prediction. Journal of the American Statistical Association 1996; 91(433): 109–122.
27. Kass RE, Raftery AE. Bayes Factors. Journal of the American Statistical Association 1995; 90(430): 773–795.
28. Arellano-Valle R, Bolfarine H, Lachos V. Bayesian inference for skew-normal linear mixed models. Journal of Applied Statistics 2007; 34(6): 663–682.
29. Eke PI, Dye B, Wei L, Thornton-Evans G, Genco R. Prevalence of periodontitis in adults in the United States: 2009 and 2010. Journal of Dental Research 2012; 91(10): 914–920.
30. Reich BJ, Bandyopadhyay D. A latent factor model for spatial data with informative missingness. The Annals of Applied Statistics 2010; 4(1): 439–459.
31. Kingman A, Susin C, Albandar JM. Effect of partial recording protocols on severity estimates of periodontal disease. Journal of Clinical Periodontology 2008; 35(8): 659–667.
32. Holtfreter B, Albandar JM, Dietrich T, et al. Standards for reporting chronic periodontitis prevalence and severity in epidemiologic studies: Proposed standards from the Joint EU/USA Periodontal Epidemiology Working Group. Journal of Clinical Periodontology 2015; 42(5): 407–412.
33. Johnson-Spruill I, Hammond P, Davis B, McGee Z, Louden D. Health of Gullah families in South Carolina with type 2 diabetes. The Diabetes Educator 2009; 35(1): 117–123.
34. Taylor GW, Borgnakke WS. Periodontal disease: Associations with diabetes, glycemic control and complications. Oral Diseases 2008; 14(3): 191–203.
35. Reichard P, Nilsson BY, Rosenqvist U. The effect of long-term intensified insulin treatment on the development of microvascular complications of diabetes mellitus. New England Journal of Medicine 1993; 329(5): 304–309.
36. Mealey BL, Oates TW. Diabetes mellitus and periodontal diseases. Journal of Periodontology 2006; 77(8): 1289–1303.
37. Herring M, Shah S. Periodontal disease and control of diabetes mellitus. The Journal of the American Osteopathic Association 2006; 106: 416–421.
38. Friedman JH. Greedy function approximation: A gradient boosting machine. The Annals of Statistics 2001; 29(5): 1189–1232.
39. George E, Laud P, Logan B, McCulloch R, Sparapani R. Fully Nonparametric Bayesian Additive Regression Trees. In: Jeliazkov I, Tobias JL, eds. Topics in Identification, Limited Dependent Variables, Partial Observability, Experimentation, and Flexible Modeling: Part B. 40B of Advances in Econometrics. Emerald Publishing Limited. 2019. (pp. 89–110).
40. Li X, Bandyopadhyay D, Lipsitz S, Sinha D. Likelihood methods for binary responses of present components in a cluster. Biometrics 2011; 67(2): 629–635.
41. Neal RM. Slice sampling. The Annals of Statistics 2003; 31(3): 705–767.
42. Pettit LI. The conditional predictive ordinate for the Normal distribution. Journal of the Royal Statistical Society. Series B (Methodological) 1990; 52(1): 175–184.
43. Vehtari A, Gelman A, Gabry J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing 2017; 27(5): 1413–1432.
44. Linero AR. A review of tree-based Bayesian methods. Communications for Statistical Applications and Methods 2017; 24(6): 543–559.
45. Hill JL. Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics 2011; 20(1): 217–240.
46. Ibrahim JG, Chen MH, Sinha D. Criterion-based methods for Bayesian model assessment. Statistica Sinica 2001; 11(2): 419–443.
47. Gelfand AE, Smith AF. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 1990; 85(410): 398–409.
48. Reich BJ, Bandyopadhyay D, Bondell HD. A nonparametric spatial model for periodontal data with non-random missingness. Journal of the American Statistical Association 2013; 108(503): 820–831.
49. Azzalini A. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 1985; 12(2): 171–178.
50. Caruana R. Multitask learning. Machine Learning 1997; 28: 41–75.
51. Darby ML, Walsh M. Dental Hygiene: Theory and Practice. Elsevier Health Sciences. 2014; 4 edn.
52. Bandyopadhyay D, Lachos VH, Abanto-Valle CA, Ghosh P. Linear mixed models for skew-normal/independent bivariate responses with an application to periodontal disease. Statistics in Medicine 2010; 29(25): 2643–2655.
53. Geisser S, Eddy WF. A predictive approach to model selection. Journal of the American Statistical Association 1979; 74(365): 153–160.
54. Johnson-Spruill I, Hammond P, Davis B, McGee Z, Louden D. Health of Gullah families in South Carolina with Type 2 diabetes. The Diabetes Educator 2009; 35(1): 117–123.
55. Susin C, Kingman A, Albandar JM. Effect of partial recording protocols on estimates of prevalence of periodontal disease. Journal of Periodontology 2005; 76(2): 262–267.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

supinfo2

NIHMS1850406-supplement-supinfo2.tex^{(1.3KB, tex)}

supinfo1

NIHMS1850406-supplement-supinfo1.sty^{(44.7KB, sty)}

supinfo3

NIHMS1850406-supplement-supinfo3.bbl^{(6.6KB, bbl)}

supinfo4

NIHMS1850406-supplement-supinfo4.bst^{(18.6KB, bst)}

supinfo5

NIHMS1850406-supplement-supinfo5.cls^{(107.1KB, cls)}

[R1] 1.Fernandes JK, Wiegand RE, Salinas CF, et al. Periodontal disease status in Gullah African Americans with Type 2 diabetesliving in South Carolina. Journal of Periodontology 2009; 80(7): 1062–1068. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] 2.Breiman L. Random forests. Machine Learning 2001; 45: 5–32. [Google Scholar]

[R3] 3.Freund Y, Schapire R. A short introduction to boosting. Journal Japanese Society For Artificial Intelligence For Artificial Intelligence 1999; 14(771–780): 1612. [Google Scholar]

4. Chipman HA, George EI, McCulloch RE. BART: Bayesian additive regression trees. The Annals of Applied Statistics 2010; 4(1): 266–298.

5. Bhingare A, Sinha D, Pati D, Bandyopadhyay D, Lipsitz SR. Semiparametric Bayesian latent variable regression for skewed multivariate data. Biometrics 2019; 75(2): 528–538.

6. Ročková V, Van Der Pas S. Posterior concentration for Bayesian regression trees and forests. The Annals of Statistics 2020; 48(4): 2108–2131.

7. Linero A, Yang Y. Bayesian regression tree ensembles that adapt to smoothness and sparsity. Journal of the Royal Statistical Society, Series B 2018; 80: 1087–1110.

8. Murray JS. Log-Linear Bayesian additive regression trees for multinomial logistic and count regression models. Journal of the American Statistical Association 2021; 116(534): 756–769.

9. Sparapani RA, Logan BR, McCulloch RE, Laud PW. Nonparametric survival analysis using Bayesian Additive Regression Trees (BART). Statistics in Medicine 2016; 35(16): 2741–2753.

10. Basak P, Linero A, Sinha D, Lipsitz S. Semiparametric analysis of clustered interval-censored survival data using soft Bayesian additive regression trees (SBART). Biometrics 2022; 78(3): 880–893.

11. Li Y, Linero AR, Murray J. Adaptive Conditional Distribution Estimation with Bayesian Decision Tree Ensembles. Journal of the American Statistical Association 2022; 0(0): 1–14.

12. Linero AR. Bayesian regression trees for high-dimensional prediction and variable selection. Journal of the American Statistical Association 2018; 113(522): 626–636.

13. Page RC, Eke PI. Case definitions for use in population-based surveillance of periodontitis. Journal of Periodontology 2007; 78: 1387–1399.

14. Linero AR, Sinha D, Lipsitz SR. Semiparametric mixed-scale models using shared Bayesian forests. Biometrics 2020; 76(1): 131–144.

15. Starling JE, Murray JS, Carvalho CM, Bukowski RK, Scott JG. BART with targeted smoothing: An analysis of patient-specific stillbirth risk. The Annals of Applied Statistics 2020; 14(1): 28–50.

16. Hastie T, Tibshirani R. Bayesian backfitting. Statistical Science 2000; 15(3): 196–213.

17. Azzalini A. The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics 2005; 32(2): 159–188.

18. Azzalini A, Capitanio A. Statistical applications of the multivariate skew normal distribution. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 1999; 61(3): 579–602.

19. Arellano-Valle RB, Genton MG. On fundamental skew distributions. Journal of Multivariate Analysis 2005; 96(1): 93–116.

20. Sahu SK, Dey DK, Branco MD. A new class of multivariate skew distributions with applications to Bayesian regression models. Canadian Journal of Statistics 2003; 31(2): 129–150.

21. Arellano-Valle R, Bolfarine H, Lachos V. Bayesian inference for skew-normal linear mixed models. Journal of Applied Statistics 2007; 34(6): 663–682.

22. Tan YV, Roy J. Bayesian additive regression trees and the General BART model. Statistics in Medicine 2019; 38(25): 5048–5069.

23. Friedman JH. Multivariate adaptive regression splines. The Annals of Statistics 1991; 19(1): 1–67.

24. Gelfand AE, Dey DK. Bayesian model choice: asymptotics and exact calculations. Journal of the Royal Statistical Society. Series B (Methodological) 1994; 56(3): 501–514.

25. Gelfand AE. Model determination using sampling-based methods. In: Gilks WR, Richardson S, Spiegelhalter DJ, eds. Markov Chain Monte Carlo in Practice. London, UK: Chapman & Hall. 1996. (pp. 145–161).

26. Berger JO, Pericchi LR. The Intrinsic Bayes Factor for Model Selection and Prediction. Journal of the American Statistical Association 1996; 91(433): 109–122.

27. Kass RE, Raftery AE. Bayes Factors. Journal of the American Statistical Association 1995; 90(430): 773–795.

28. Arellano-Valle R, Bolfarine H, Lachos V. Bayesian inference for skew-normal linear mixed models. Journal of Applied Statistics 2007; 34(6): 663–682.

29. Eke PI, Dye B, Wei L, Thornton-Evans G, Genco R. Prevalence of periodontitis in adults in the United States: 2009 and 2010. Journal of Dental Research 2012; 91(10): 914–920.

30. Reich BJ, Bandyopadhyay D. A latent factor model for spatial data with informative missingness. The Annals of Applied Statistics 2010; 4(1): 439–459.

31. Kingman A, Susin C, Albandar JM. Effect of partial recording protocols on severity estimates of periodontal disease. Journal of Clinical Periodontology 2008; 35(8): 659–667.

32. Holtfreter B, Albandar JM, Dietrich T, et al. Standards for reporting chronic periodontitis prevalence and severity in epidemiologic studies: Proposed standards from the Joint EU/USA Periodontal Epidemiology Working Group. Journal of Clinical Periodontology 2015; 42(5): 407–412.

33. Johnson-Spruill I, Hammond P, Davis B, McGee Z, Louden D. Health of Gullah families in South Carolina with type 2 diabetes. The Diabetes Educator 2009; 35(1): 117–123.

34. Taylor GW, Borgnakke WS. Periodontal disease: Associations with diabetes, glycemic control and complications. Oral Diseases 2008; 14(3): 191–203.

35. Reichard P, Nilsson BY, Rosenqvist U. The effect of long-term intensified insulin treatment on the development of microvascular complications of diabetes mellitus. New England Journal of Medicine 1993; 329(5): 304–309.

36. Mealey BL, Oates TW. Diabetes mellitus and periodontal diseases. Journal of Periodontology 2006; 77(8): 1289–1303.

37. Herring M, Shah S. Periodontal disease and control of diabetes mellitus. The Journal of the American Osteopathic Association 2006; 106: 416–421.

38. Friedman JH. Greedy function approximation: A gradient boosting machine. The Annals of Statistics 2001; 29(5): 1189–1232.

39. George E, Laud P, Logan B, McCulloch R, Sparapani R. Fully Nonparametric Bayesian Additive Regression Trees. In: Jeliazkov I, Tobias JL, eds. Topics in Identification, Limited Dependent Variables, Partial Observability, Experimentation, and Flexible Modeling: Part B. 40B of Advances in Econometrics. Emerald Publishing Limited. 2019. (pp. 89–110).

40. Li X, Bandyopadhyay D, Lipsitz S, Sinha D. Likelihood methods for binary responses of present components in a cluster. Biometrics 2011; 67(2): 629–635.

41. Neal RM. Slice sampling. The Annals of Statistics 2003; 31(3): 705–767.

42. Pettit LI. The conditional predictive ordinate for the Normal distribution. Journal of the Royal Statistical Society. Series B (Methodological) 1990; 52(1): 175–184.

43. Vehtari A, Gelman A, Gabry J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing 2017; 27(5): 1413–1432.

44. Linero AR. A review of tree-based Bayesian methods. Communications for Statistical Applications and Methods 2017; 24(6): 543–559.

45. Hill JL. Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics 2011; 20(1): 217–240.

46. Ibrahim JG, Chen MH, Sinha D. Criterion-based methods for Bayesian model assessment. Statistica Sinica 2001; 11(2): 419–443.

47. Gelfand AE, Smith AF. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 1990; 85(410): 398–409.

48. Reich BJ, Bandyopadhyay D, Bondell HD. A nonparametric spatial model for periodontal data with non-random missingness. Journal of the American Statistical Association 2013; 108(503): 820–831.

49. Azzalini A. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 1985; 12(2): 171–178.

50. Caruana R. Multitask learning. Machine Learning 1997; 28: 41–75.

51. Darby ML, Walsh M. Dental Hygiene: Theory and Practice. 4th ed. Elsevier Health Sciences. 2014.

52. Bandyopadhyay D, Lachos VH, Abanto-Valle CA, Ghosh P. Linear mixed models for skew-normal/independent bivariate responses with an application to periodontal disease. Statistics in Medicine 2010; 29(25): 2643–2655.

53. Geisser S, Eddy WF. A predictive approach to model selection. Journal of the American Statistical Association 1979; 74(365): 153–160.

54. Johnson-Spruill I, Hammond P, Davis B, McGee Z, Louden D. Health of Gullah families in South Carolina with Type 2 diabetes. The Diabetes Educator 2009; 35(1): 117–123.

55. Susin C, Kingman A, Albandar JM. Effect of partial recording protocols on estimates of prevalence of periodontal disease. Journal of Periodontology 2005; 76(2): 262–267.
