Joint Models for the Association of Longitudinal Binary and Continuous Processes With Application to a Smoking Cessation Trial

Xuefeng Liu; Michael J Daniels; Bess Marcus

doi:10.1198/016214508000000904

. Author manuscript; available in PMC: 2010 Jun 1.

Published in final edited form as: J Am Stat Assoc. 2009 Jun 1;104(486):429–438. doi: 10.1198/016214508000000904

Joint Models for the Association of Longitudinal Binary and Continuous Processes With Application to a Smoking Cessation Trial

Xuefeng Liu ¹, Michael J Daniels ², Bess Marcus ³

PMCID: PMC2746699 NIHMSID: NIHMS143238 PMID: 20161053

Abstract

Joint models for the association of a longitudinal binary and a longitudinal continuous process are proposed for situations in which their association is of direct interest. The models are parameterized such that the dependence between the two processes is characterized by unconstrained regression coefficients. Bayesian variable selection techniques are used to parsimoniously model these coefficients. A Markov chain Monte Carlo (MCMC) sampling algorithm is developed for sampling from the posterior distribution, using data augmentation steps to handle missing data. Several technical issues are addressed to implement the MCMC algorithm efficiently. The models are motivated by, and are used for, the analysis of a smoking cessation clinical trial in which an important question of interest was the effect of the (exercise) treatment on the relationship between smoking cessation and weight gain.

Keywords: Calibrated posterior predictive p-value, Data augmentation, Dependence, Joint models, Markov chain Monte Carlo, Parameter expansion, Stochastic search variable selection

1. INTRODUCTION

In some longitudinal studies, although one time-varying outcome may be of primary interest, several related processes are measured. Examples in smoking cessation studies include smoking status and weight change and smoking status and alcohol use. In such studies, the association between the processes can reveal a great deal about the mechanism of behavior change. For example, a motivation for using exercise as an adjunct therapy for smoking cessation is to reduce the dependence between weight gain and relapse back to smoking and between the fear of weight gain and the inability to make a successful quit attempt. In this study we built models in the setting of two processes: a longitudinal binary process (e.g., smoking cessation) and a longitudinal continuous process (e.g., weight change). Our primary interest was to model the association between the two processes. We apply the models to a recent smoking cessation trial (Marcus et al. 2003), where the investigators were interested in the relation between smoking status and weight change.

Our approach builds on and extends recent research on joint modeling of mixed outcomes and Bayesian variable selection. For joint modeling, a well-known technique of joint modeling of mixed outcomes is based on introducing a partly observed random variable following a bivariate normal distribution, where one component defines the continuous outcome and the second, latent component defines the binary outcome through the common probit transformation. Authors taking this type of approach include Catalano and Ryan (1992), Cox and Wermuth (1992), Dunson (2000), Fitzmaurice and Laird (1995), Gueorguieva and Agresti (2001), Regan and Catalano (1999), Roy and Lin (2000), and Sammel, Ryan, and Legler (1997). In the present work, we extend this approach to longitudinal data with T individual measurements by considering a partly observed random variable following a 2T-variate normal distribution where the first T components define the binary outcomes by applying probit transformations and the last T components are continuous outcomes. Moreover, the main question of interest in previous studies was the effect of some treatment or therapy on the mean of the response vector. The effect of the treatment on the association between two outcomes was not of main interest. We propose similar models in which the association between two processes is of concern. To do this, we use a Bartlett decomposition of the covariance matrix (Bartlett 1933).

The association matrix induced by the Bartlett decomposition is high-dimensional and expected to be sparse, so we borrow ideas from the Bayesian variable selection literature to reduce the number of parameters (George and McCulloch 1993, 1997; Smith and Kohn 2002). Other related work includes that of Carlin and Chib (1995), Chipman (1996), Hoeting, Raftery, and Madigan (1996), and Wakefield and Bennett (1996). George and McCulloch developed stochastic search variable selection (SSVS) to select promising subsets of a set of covariates X₁, . . ., X_p for further consideration in regression models. Smith and Kohn (2002) proposed similar techniques for modeling a covariance matrix with high dimension for longitudinal data. In this study we construct a hierarchical prior to parsimoniously model the association between two longitudinal processes by extending the ideas of Smith and Kohn (2002). The hierarchical specification has the advantage that potential 0’s in the association matrix can be identified and estimates of the parameters can be calculated to account for the model uncertainty associated with determining which elements are 0’s.

We develop a Markov chain Monte Carlo (MCMC) algorithm to estimate the posterior distribution of the parameters in the models. To implement the MCMC algorithm more efficiently, we address several technical issues, including efficiently sampling from truncated multivariate normal (TMVN) distributions and efficiently sampling a correlation matrix from its full conditional distribution. Geweke (1991) and Robert (1995) proposed a Gibbs sampling algorithm to sample from a TMVN distribution. This algorithm is somewhat inefficient in that it requires repeatedly evaluating conditional means and variances from univariate conditional normals and can result in high autocorrelations in the chain. Barnard, McCulloch, and Meng (2000) and Chib and Greenberg (1998) suggested using the Griddy Gibbs (GG) sampler and the random-walk Metropolis-Hastings (RW-MH) algorithm to sample a correlation matrix. Although the GG sampler is simple to implement, it is not computationally efficient. The RW-MH algorithm has the problem of potentially slow mixing. We propose better ways to handle these issues.

The article is organized as follows. We introduce a smoking cessation clinical trial that motivates this research in Section 2. In Section 3 we propose joint models and hierarchical priors used to parsimoniously model the association between two processes. In Section 4 we describe MCMC sampling techniques to estimate the posterior distribution of parameters and address several technical issues regarding efficient implementation of the MCMC algorithm. We discuss the deviance information criterion (DIC) for model comparison and the calibrated posterior predictive p value for goodness of fit in Section 5. Finally, we present the results and conclusions from the analysis of the clinical trial in Section 6.

2. APPLICATION: SMOKING CESSATION TRIAL

Commit to Quit II (CTQ II) (Marcus et al. 2003, 2005) was a 4-year randomized controlled clinical trial designed to test the efficacy of moderate-intensity physical activity as an aid for smoking cessation among women. This study was a logical progression of previous work (CTQ I) (Marcus, King, Albrecht, Parisi, and Abrams 1997; Marcus et al. 1999) on the efficacy of vigorous-intensity exercise to aid smoking cessation and weight regulation in women smokers, because moderate-intensity exercise is less arduous and can be performed by healthy individuals without medical supervision. In the CTQ II trial, 217 healthy women aged 18-65 who had regularly smoked 5 or more cigarettes per day for at least 1 year and who had routinely participated in moderate or vigorous intensity physical activity for 90 minutes or less each week were recruited and randomized to one of the two conditions (treatments): a moderate-intensity exercise condition or a contact condition. These two treatments are designated exercise and wellness. All recruited women participated in an 8-week cognitive-behavioral group-based smoking cessation program, followed by a 12-month follow-up. Participants in the exercise group were required to attend one supervised exercise session per week, on their smoking cessation treatment night. They were also given written instructions for home exercises. The duration and intensity of the exercise were gradually increased to 165 minutes per week, which could be performed onsite or at home. Participants in the contact group received lectures, films, and handouts on a variety of health and lifestyle issues. All participants were encouraged to attend makeup sessions if they failed to attend any session during the 8 weeks of treatments. Smoking status was determined through self-report and carbon monoxide testing at each session. In addition, participants were weighed on a weekly basis during the 8 weeks of treatment. This design allowed for a comparison of the effect of moderate-intensity physical activity plus standard smoking cessation with the effect of contact plus standard smoking cessation.

The primary outcome was quit status (a longitudinal binary outcome), but another outcome—weight change (a longitudinal continuous outcome)—also was measured. The investigators were interested in two questions:

Does moderate-intensity exercise have significant effects on smoking cessation? In the study of the CTQ I trial, the investigators found significant differences between vigorous activity and contact control group through 12 months of follow-up.
Does exercise effect smoking cessation by weakening the association between smoking status and weight gain?

Question 2 motivates our development of joint models for the association of longitudinal binary and continuous processes to examine the differential patterns of association across treatments in Section 3. But our approach also allows us to answer question 1.

3. JOINT MODELS AND PRIORS

3.1 Joint Models

Several authors have developed joint models for the analysis of multivariate longitudinal data using latent normal variables (Daniels and Normand 2006; Dunson 2003; Gueorguieva and Sanacora 2006). In this section we propose similar joint models for the association of a longitudinal binary and a longitudinal continuous process. In our setting the time-dependent covariance matrix is modeled as a function of predictors, however, and is of primary interest.

Denote the binary outcome (in our example, smoking status) for subject j in treatment i at week t(i = 1, . . . , m;j = 1, . . . , n_i; t = 1, . . . , T) by Q_ij,t, and denote the continuous outcome (in our example, weight change) by W_ij,t. Define the vectors of responses for binary and continuous outcomes as Q_ij = (Q_ij_,1, . . . , Q_ij,T)’ and W_ij = (W_ij,1, . . . , W_ij,T)’. Also define a vector of latent variables underlying the binary vector Q_ij as Y_ij = (Y_ij,1, . . . , Y_ij,T)’. Suppose that V_ij is a vector of joint processes such that $V_{i j} = (Y_{i j}^{'}, W_{i j}^{'})$ . Then the joint distribution of binary and continuous variables over time can be modeled using the multivariate normal specification ${(Y_{i j}^{'}, W_{i j}^{'})}^{'} \sim N (X_{i j} β, Σ_{i})$ ,

V_{i j} = (\begin{matrix} Y_{i j} \\ W_{i j} \end{matrix}) = X_{i j} β + ∊_{i},

(1)

where X_ij is the design matrix, β is the vector of regression coefficients, and

∊_{i} ~ N (0, Σ_{i}) with Σ_{i} = (\begin{matrix} Σ_{i, 11} & Σ_{i, 21} \\ Σ_{i, 21} & Σ_{i, 22} \end{matrix}) .

Using the probit formulation for the binary process, we have Q_ij,t = I{Y_ij,t > 0}. To estimate the association between Q_ij and W_ij, we need models for Σ_i,12 as a function of treatment (and/or other subject specific covariates that might affect this relationship). But both Σ_i,12 and the entire covariance matrix Σ_i are difficult to model, because of positive definiteness constraints (Daniels and Kass 2001; Pourahmadi and Daniels 2002) and because it is high-dimensional for each subject. To address this problem, we factor the joint distribution of Y_ij and W_ij into two components: a marginal model for Y_ij and a correlated regression model for W_ij given Y_ij, by extending the ideas of Fitzmaurice and Laird (1995) and Gueorguieva and Agresti (2001). Let

X_{i j} = (\begin{matrix} X_{1 i j} & 0 \\ 0 & X_{2 i j} \end{matrix}) and β = (\begin{matrix} β_{1} \\ β_{2} \end{matrix});

then the new models can be expressed as

Y_{i j} = X_{1 i j} β_{1} + ∊_{1 i}

(2)

and

W_{i j} = X_{2 i j} β_{2} + B_{i} (Y_{i j} - X_{1 i j} β_{1}) + ∊_{2 i} .

(3)

where B_i = Σ_i,21 Σ^-1_i,11 is the matrix that reflects association between Y_ij and W_ij, ε_1i ~ N(0, Σ_i,11), and $∊_{2 i} \sim N (0, Σ_{i, 22}^{*})$ with $Σ_{i, 22}^{*} = Σ_{i, 22} - Σ_{i, 21} Σ_{i, 11}^{- 1} Σ_{i, 12}$ . The reparameterization of Σ_i to $(Σ_{i, 11}, B_{i}, Σ_{i, 22}^{*})$ is known in the literature as the Bartlett decomposition of a covariance matrix (Bartlett 1933).

It is easy to see that (2) is a correlated probit model and (3) is a standard correlated regression model, conditional on the latent variable Y_ij. For identifiability, it is common to restrict Σ_i,11 to be a correlation matrix, R_i,11 (Chib and Greenberg 1998); in the rest of this article, we use R_i,11 instead of Σ_i,11 as notation for the covariance matrix in (2). The advantage of this factorization is that the components of B_i in (3) are directly related to the variance and correlation terms in Σ_i,12. In addition, this factorization provides a convenient parameterization for examining the association between Y_ij (Q_ij) and W_ij, because the components of the B_i matrix are unconstrained.

Beside being unconstrained, the association matrix B_i in model (3) can be easily interpreted. The tth row of B_i reflects the association of the continuous process at week t with the binary process at all weeks (t = 1, . . . , T). In particular, it corresponds to the regression model $w_{i j, t} ∣ Y_{i j} = x_{2 i j, t}^{'} β_{2} + b_{i, t}^{'} (Y_{i j} - X_{1 i j} β_{1}) + ∊_{2 i, t}$ where b_i,t = (b_i,t1, . . . , b_i,tT)’ is the tth row of B_i. Because the covariates associated with b_i,t, Y_ij - X_1ijβ₁, are centered with variance 1 (recall that the marginal covariance matrix of Y_ij is a correlation matrix), the components of B_i are standardized regression coefficients. This property of the components of B_i will facilitate between-component comparisons and motivate ideas for modeling it parsimoniously.

3.2 Priors for Parameters in Joint Models

For Bayesian inference, we need to specify priors for parameters in the models described in Section 3.1. Let b_i denote the column vector obtained by stringing the rows of B_i (i = 1, . . . , m); that is, b_i = (B_i,11, . . . , B_i,1T, . . . , B_i,TT)’. Let R₁ = (R’_1,11, . . . , R’_m,11)’, $b = {(b_{1}^{'}, \dots, b_{m}^{'})}^{'}$ , and $Σ_{2}^{*} = {(Σ_{1, 22}^{*^{'}}, \dots, Σ_{m, 22}^{*^{'}})}^{'}$ . We write the joint prior in our models as

π (β, R_{1}, b, Σ_{2}^{*}) = \prod_{i = 1}^{m} π (β) π (R_{i, 11}) π (b_{i} ∣ β, Σ_{i, 22}^{*}) π (Σ_{i, 22}^{*}) .

(4)

This specification implies that β, b_i, and $Σ_{i, 22}^{*}$ are a priori jointly independent of R_i,11 and, marginally, β and $Σ_{i, 22}^{*}$ are a priori independent. Because we have little prior information for β, $Σ_{i, 22}^{*}$ and R_i,11, we specify flat priors for β and $Σ_{i, 22}^{*}$ and a joint uniform prior (derived in Barnard et al. 2000) for R_i,11,

π (β) \propto 1,

(5)

\begin{matrix} π & (Σ_{i, 22}^{*}) \propto I {Σ_{i, 22}^{*} \in {(- \infty, \infty)}^{T (T + 1) ∕ 2} \\ and Σ_{i, 22}^{*} is positive definite}, \end{matrix}

(6)

and

\begin{matrix} π & (R_{i, 11}) \propto I {r_{i, j k} : r_{i, j k} = 1 (j = k), ∣ r_{i, j k} ∣ < 1 (j \neq k) \\ and R_{i, 11} is positive definite}, \end{matrix}

(7)

where r_i,jk (j ≠ k, j, k = 1, . . . ,T) is the off-diagonal element of the jth row and kth column in R_i,11.

3.2.1 Prior for the Elements of the Association Matrix

We now provide details on the prior $π (b_{i} ∣ β, Σ_{i, 22}^{*})$ . Because the components of b_i are the regression coefficients of W_ij on Y_ij, we expect many of the components of b_i to be 0’s based on conditional independence (Markov-type) arguments for longitudinal data. Consider, for example, the regression for w_ij,t. Once we condition on {Y_ij,k : t - 1 ≤ k ≤ t + 1} (current value and lag 1 value forward and backward), we might expect w_ij,t to be independent of {Y_ij,k : k > t + 1, k < t - 1} (values more than lag 1 away). To incorporate these features into the model, we specify a hierarchical prior distribution that essentially allows components of b_i to be 0’s, borrowing ideas from Smith and Kohn (2002). This also will facilitate reducing the number of dependence parameters, which can be quite large.

A key feature of the hierarchical prior is that each component of b_i (recall that each component is on the same scale because they are standardized regression coefficients) can be exactly 0 with positive probability. To attain this, we introduce the latent indicator vector δ_i associated with b_i such that

b_{i, t l} = {\begin{matrix} 0 & if δ_{i, t l} = 0 \\ 1 & if δ_{i, t l} = 1, \end{matrix}

where b_i,tl is the component of the tth row and lth column in B_i and δ_i,tl is the corresponding binary indicator of δ_i that is associated with b_i,tl. The nonzero components of the vector b_i (i.e., the components for which δ_i,tl = 1) are given a normal prior (conditional on δ_i,tl = 1) with mean

\begin{matrix} E [b_{i} ∣ Y, β, δ, Σ_{22}^{*}] = & {(\sum_{j = 1}^{n_{i}} C_{i j δ}^{'} {(n_{i} Σ_{i, 22}^{*})}^{- 1} C_{i j δ})}^{- 1} \\ \times \sum_{j = 1}^{n_{i}} C_{i j δ}^{'} {(n_{i} Σ_{i, 22}^{*})}^{- 1} z_{i j} \end{matrix}

and covariance matrix

var [b_{i} ∣ Y, β, δ, Σ_{22}^{*}] = {(\sum_{j = 1}^{n_{i}} {C^{'}}_{i j δ} {(n_{i} \sum_{i, 22}^{*})}^{- 1} C_{i j δ})}^{- 1},

where Z_ij = W_ij - X_2ijβ₂, C_ijδ is obtained by removing from C_ij the columns corresponding to zero elements of b_i, and C_ij is T × T² matrix with elements that are functions of Y_ij and the tth row, $C_{i j, t} = (0_{i j, 1}^{'}, \dots, 0_{i j, (t - 1)}^{'}, {(Y_{i j} - X_{1 i j} β_{1})}_{t}^{'}, 0_{i j, (t + 1)}^{'}, \dots, 0_{i j, T}^{'})$ .The vector δ_i is then given the prior

π (δ_{i} ∣ p_{i}) = \prod_{t = 1}^{T} \prod_{l = 1}^{T} π (δ_{i, t l ∣ p_{i}})

(8)

\begin{matrix} = & \prod_{t = 1}^{T} \prod_{l = 1}^{T} Bernoulli (p_{i}^{{∣ t - 1 ∣}^{1 ∕ a_{0 + 1}}}), \\ π (p_{i}) = Beta (r_{i}, λ_{i}), \end{matrix}

(9)

where δ_i = (δ_i,11, . . . , δ_i,1T, . . . , δ_i,T1, . . . , δ_i,TT)’; δ_i,tl (i = 1, . . . , m; l, t = 1, . . . , T) is the element of δ_i associated with b_i,tl, which is the regression coefficient of w_ij,t on y_ij,l, a₀ is a tuning parameter; and (r_i, λ_i) are the corresponding hyperparameters.

The prior for b_i, given δ_i, is derived based on

π (b_{i} ∣ δ_{i}, β, Σ_{i, 22}^{*}) \propto f^{1 ∕ n_{i}} (W_{i} ∣ Y_{i}, β, b_{i}, δ_{i}, Σ_{i, 22}^{*}) .

(10)

The rationale for (10) is that the prior provides only 1/nth of the weight provided by the likelihood.

Based on this prior construction, the quantity $p_{i}^{{∣ t - l ∣}^{1 ∕ a_{0 + 1}}}$ in prior (8) can be considered the prior probability that b_i,tl will require a nonzero estimate, and |t - l|^1/a₀ implies that b_i,tl is more likely to be 0 as |t - l|^1/a₀ grows larger. Here a₀ is a tuning parameter that controls the rate of decay of the probability of a nonzero component as a function of lag. Expression (8) implies that the components of B_i become smaller a priori as they move away from the main diagonal (in the longitudinal setting, become smaller as they move farther away in time). This exponent also can be adjusted if we expect, a priori, a lagged relationship.

3.2.2 Using the Prior for the Association Matrix in Practice

A complication with the prior for b_i is that it is a function of $(Y, β, Σ_{i, 22}^{*})$ , which complicates the form of the full conditional distributions for the MCMC algorithm described in the Web appendix, Section I (www.amstat.org/publications/jasa/supplemental_materials). To address this issue, we replace the mean and covariance for the prior with μ^_{b_iδ} and covariance Σ̂_{b_iδ}, defined later. These quantities are computed from an MCMC run with b_i = 0 as $\sum_{k} E (b_{i} ∣ Y^{(k)}, β^{(k)}, δ, Σ_{22}^{(k) *})$ and ${\hat{Σ}}_{b_{i δ}} = \sum_{k} var (b_{i} ∣ Y^{(k)}, β^{(k)}, δ, Σ_{22}^{(k) *})$ , where k indexes the posterior sample. An alternative would be to just use a mean and variance with the posterior means of $(Y, β, Σ_{22}^{*})$ plugged in.

3.2.3 Propriety of the Joint Posterior Distribution

Given that we have little prior information on β, $Σ_{i, 22}^{*}$ and R_i,11, we propose using flat priors for β, $Σ_{i, 22}^{*}$ , and R_i,11. Because the priors for β and $Σ_{i, 22}^{*}$ are improper, we need to check that the joint posterior distribution is integrable. In the Web appendix, Section II (www.amstat.org/publications/jasa/supplemental_materials), we state (and prove) a theorem that guarantees integrability of the posterior when four easily checked conditions are met. The proof of this theorem extends some results of Chen and Shao (1999) for the propriety of posterior distributions for multivariate categorical response models and of Daniels (2006) for the propriety of the posterior for linear regression with cor related and/or heterogeneous errors.

4. POSTERIOR SAMPLING

4.1 The MCMC Sampling Algorithm

We develop an MCMC algorithm to sample from the posterior distribution of the parameter $θ = (β, R_{1}, b, δ, p, Σ_{2}^{*})$ , where $δ = {(δ_{1}^{'}, \dots, δ_{m}^{'})}^{'}$ , p = (p₁, . . . , p_m)’, and R₁, b, and $Σ_{2}^{*}$ are defined as in (4). To simplify implementation of the algorithm, we include a data augmentation (DA) step (Tanner and Wong 1987) that we use to impute the latent data and the missing values. Let Q_obs be the vector of observed binary outcomes, Q_mis be the vector of missing binary outcomes, W_obs be the vector of observed continuous outcomes, and W_mis be the vector of missing continuous outcomes. Let Y_obs denote the latent vector associated with Q_obs and let Y_mis denote the latent vector related to Q_mis. Define Y to be ${(Y_{obs}^{'}, Y_{mis}^{'})}^{'}$ Q to be ${(Q_{obs}^{'}, Q_{mis}^{'})}^{'}$ and W to be ${(W_{obs}^{'}, W_{mis}^{'})}^{'}$ . We use the generic notation f for the distribution of responses, π for the prior and posterior distributions of related parameters, and L for the likelihood function. Our algorithm comprises a DA imputation (DAI) step and a posterior sampling step as follows:

DAI step. Sample latent data Y and missing values W_mis from f (Y, W_mis|θ,Q_obs, W_obs). To do this, factor this distribution as
$\begin{matrix} f (Y, W_{m i s} ∣ θ, Q_{o b s}, W_{o b s}) \\ = f (W_{m i s} ∣ θ, Y_{o b s}, Y_{m i s}, W_{o b s}) f (Y_{m i s} ∣ θ, Y_{o b s}, W_{o b s}) \\ \times f (Y_{o b s} ∣ θ, Q_{o b s}, W_{o b s}) . \end{matrix}$ (11)
PS step (posterior sampling step). Generate θ from f(θ|Y, W) using Gibbs sampling. The DAI step involves sampling in the order from [Y_obs|θ, Q_obs, W_obs], [Y_mis|θ, Y_obs, W_obs], and [W_mis|θ, Y_obs, Y_mis, W_obs]. Once we obtain latent data and missing values, we need to sample from full conditional distributions of the components of θ. This can be completed using the Gibbs sampler. All of the full conditional distributions for the Gibbs sampler are derived in the Web appendix, Section I (www.amstat.org/publications/jasa/supplemental_materials).

4.2 Technical Issues

To implement the MCMC algorithm efficiently, two technical issues must be addressed: imputing the latent data and sampling the correlation matrix.

4.2.1 Imputing the Latent Data

One of the challenges with the multivariate probit models is the simulation of latent variables from the TMVN distribution of Y_obs given (θ, Q_obs, W_obs). We propose an algorithm to sample from this distribution efficiently.

For simplicity, assume that Y is a T × 1 vector and Y ~ N(μ, Σ)I_U1, where U₁ ∈ C^T is a truncation region. If we partition Y, μ, and Σ as

Y = (\begin{matrix} y_{t} \\ Y_{(- t)} \end{matrix}), μ = (\begin{matrix} μ_{t} \\ μ_{(- t)} \end{matrix}),

and

Σ = (\begin{matrix} σ_{t t} & Σ_{2} \\ Σ_{2}^{'} & Σ_{22} \end{matrix}),

then we have $y_{t} ∣ Y_{(- t)} = y_{(- t)} ~ N (μ_{t}^{*}, σ_{t t}^{*}) I_{U_{1 t}}$ where

\begin{matrix} μ_{t}^{*} = μ_{t} + Σ_{2} Σ_{22}^{- 1} (y_{(- t)} - μ_{(- t)}) \\ σ_{t t}^{*} = σ_{t t} - Σ_{2} Σ_{22}^{- 1} Σ_{2}^{'} . \end{matrix}

(12)

Here U_1t = {y_t ∈ C : (y_t, y_(-t)) ∈ U₁}. Geweke (1991) and Robert (1995) proposed a Gibbs sampling algorithm for sampling Y. The kernel of the Markov chain $y^{(k)} = (y_{1}^{(k)}, \dots, y_{T}^{(k)})$ in this algorithm was obtained by successively generating the components of y from their full conditional distributions $π (y_{t}^{(k)} ∣ y_{1}^{(k)}, \dots, y_{t - 1}^{(k)}, \dots, y_{t + 1}^{(k - 1)}, \dots, y_{T}^{(k - 1)})$ . A disadvantage of this algorithm is that it requires repeatedly computing the T means and variances given in (12) and often results in high autocorrelations. The following proposition provides a simple way to sample from the TMVN distribution without the need to compute (12) each time; we expect it to provide lower autocorrelations as well.

Proposition 1

Suppose that Y~ TN(μ, Σ)I_U₁, where U₁ is the truncation region of Y. Decompose Σ as PP’, where P is a lower triangular matrix. If U₁ is a convex set, then simulating Y from TN(μ, Σ)I_U₁ is equivalent to first sampling Z from TN(ν,I_{T × T})I_U₂ and then translating back to Y through Y = PZ. Here, ν = P^-1μ, and U₂ is the transformed version of U₁ (see the Web appendix, Section II for the proof).

This proposition is motivated by the integral calculation method mentioned by Chib and Greenberg (1998, p. 354). The idea behind this proposition is to obtain a more efficient implementation of the Gibbs sampler based on the new set of T conditional distributions of components of Z. These distributions are simple in the sense that we do not need to compute means and variances in (12), and the univariate truncation intervals U_2t can be easily derived. For example, in our model, Y~ TN(μ, R)I_U₁ with U₁ = {y ∈ C^T: S₁y ≤ 0}, where S₁ is an diagonal matrix with S₁ (t, t) = 1 if Q_t = 0 and -1 if Q_t = 1 (t = 1, . . . , T). By the transformation Z = P^-1Y, we have Z~ TN(ν, I_{T × T})I_U₂, where U₂ = {z ∈ C^T:S₂z ≤ 0} and S₂ = S₁P.

Thus, at iteration k, $z_{t}^{(k)} ∣ z_{1}^{(k)}, \dots, z_{t - 1}^{(k)}, z_{t + 1}^{(k - 1)}, \dots, z_{t}^{(k - 1)} ~ N (ν_{t}, 1) I_{U_{2 t}}$ , where ν_t is the tth element of ν and U_2t = {z_t∈C:S₂z ≤ 0}. Let S_2(-t) be the matrix {s₁, . . . , s_t-1, s_t+1, . . . , s_T}’, and let z_(-t) denote the vector {z₁, . . . , z_t-1, z_t+1, . . . , z_T}’. Then U_2t is given by

U_{2 t} = {z_{t} \in C : s_{t} z_{t} \leq - S_{2 (- t)} z_{(- t)}} .

(13)

The Markov chain $y^{(k)} = (y_{1}^{(k)}, \dots, y_{T}^{(k)})$ in our implementation of the Gibbs sampler can be obtained by first generating all components of z one by one from $π (z_{t}^{(k)} ∣ z_{t}^{(k)}, \dots, z_{t - 1}^{(k)}, z_{t + 1}^{(k - 1)}, \dots, z_{T}^{(k - 1)})$ (t = 1, . . . , T), and then translating back to y^(k) by y^(k) = Pz^(k).

4.2.2 Sampling the Correlation Matrix

Sampling correlation matrices in MCMC algorithms can be problematic. In addition to the positive definite constraint of covariance matrices, they have diagonal elements fixed at 1. The ideas for data augmentation and parameter expansion, introduced by Liu, Rubin, and Wu (1998), Liu and Wu (1999), and van Dyk and Meng (2001) to speed up convergence of algorithms (EM, DA, or others), provide a useful tool for addressing this problem. Liu (2007) and Liu and Daniels (2006) developed two-stage parameter expanded reparameterization and Metropolis-Hastings (PX-RPMH) algorithms for sampling a correlation matrix R by extending this idea. In these algorithms, the difficulty of simulating R can be overcome by creating an “expanded” model in which R can be transformed to a less constrained covariance matrix Ψ by borrowing the scale parameters from an expansion parameter matrix. In what follows, we derive an PX-RPMH algorithm for sampling the correlation matrix R_i,11 in the joint models in Section 3. For notational convenience, we denote R_i,11 as R_i in this section.

Define θ_{(-R_i)} to be the parameter vector not including R_i, and define D_i to be the expansion parameter, which is a diagonal matrix that we introduce to transform R_i into a less constrained covariance matrix, Ψ_i = D_iR_iD_i. Consider the following one-to-one mapping from {Y_ij, R_i,B_i to ${Y_{i j}^{*}, Ψ_{i}, B_{i}^{*}}$ :

{\begin{matrix} Y_{i j} = X_{1 i j} β + D_{i}^{- 1} Y_{i j}^{*} \\ R_{i} = D_{i}^{- 1} Ψ_{i} D_{i}^{- 1} (i = 1, \dots, m; j = 1, \dots, n_{i}), \\ B_{i} = B_{i}^{*} D_{i} \end{matrix}

(14)

where $Σ_{j = 1}^{n_{i}} Y_{i j, k}^{*^{2}} = 1$ for any t = 1, . . . , T. Given β, the step that draws Y_ij implicitly draws $Y_{i j}^{*}$ and D_i, because $Σ_{j = 1}^{n_{i}} {(Y_{i j, t} - x_{1 i j, t}^{'} β)}^{2} = D_{i, t t}^{- 1} Σ_{j = 1}^{n_{i}} Y_{i j, t}^{*^{2}} = D_{i, t t}^{- 1}$ , where D_i,tt is the tth element of D_i and $x_{1 i j, t}^{'}$ is the tth row of X_1ij. The space for $(Y_{i j}^{*}, ψ_{i}, B_{i}^{*})$ is higher-dimensional than that for (Y_ij, R_i, B_i), because R_i has fewer parameters than ψ_i. The constraints $Σ_{j = 1}^{n_{i}} Y_{i j, t}^{*^{2}} = 1$ for any t = 1, . . . , T, are needed to make the candidate transformation a one-to-one mapping. By specifying the candidate prior for R_i, given by

\begin{matrix} π & (R_{i}) \propto {∣ R_{i} ∣}^{- a_{i} ∕ 2} I {r_{i, j k} : r_{i, j k} = 1 (j = k), ∣ r_{i}, j k ∣ < 1 (j \neq k) \\ and R_{i, 11} is positive definite}, \end{matrix}

(15)

where a_i is a constant to be determined, we can derive a (parameter-expanded) candidate density (PXCD) for Ψ_i based on the following proposition. Note that the candidate prior is introduced solely to derive a candidate density for the Metropolis-Hastings algorithm. It is not used for inference.

Proposition 2

If we choose priors as specified in Section 3.2, then, from the likelihood function for the complete data in (2), transformation (14) and candidate prior (15), we obtain

\begin{matrix} π & (Ψ_{i} ∣ Y_{i}^{*}, B_{i}^{*}, β) \propto {∣ Ψ_{i} ∣}^{- (ν_{i} + T + 1) ∕ 2} \\ \times \exp {- \frac{1}{2} tr (S_{i} Ψ_{i}^{- 1})}, \end{matrix}

(16)

where ν_i = n_i - T, $S_{i} = Σ_{j = 1}^{n_{i}} Y_{i j}^{*} Y_{i j}^{*^{'}}$ , $Y_{i}^{*} = (Y_{i 1}^{*}, \dots, Y_{i n_{i}}^{*})$ , and $Y_{i j}^{*} = D_{i} (Y_{i j} - X_{1 i j} β)$ . That is, $Ψ_{i} ∣ Y_{i}^{*}, B_{i}, β$ , B_i, β has an inverse-Wishart distribution with degrees of freedom ν_i and scale parameter S_i (see the Web appendix, Section II for the proof).

Proposition 2 gives the PXCD of Ψ_i to use as the proposal density in the Metropolis-Hastings stage. In this stage, we first simulate Ψ_i from (16), and then obtain the correlation matrix R_i through the reduction function $P (Ψ_{i}) = D_{i}^{- 1} Ψ_{i} D_{i}^{- 1}$ . Second, we keep the candidate R_i with probability α_i (the acceptance rate in the Metropolis-Hastings algorithm). Sampling R_i based on this algorithm is given in the following theorem.

Theorem 1

Assume that θ_{(-R_i)} and R_i are a priori independent, that is, π(θ(_{-R_i})), R_i) = π(θ(_{-R_i})π(R_i). If we choose priors as specified in Section 3.2 for θ(_{-R_i}) and R_i, then, under transformation (14) and candidate prior (15), simulating R_i is equivalent to simulating Ψ_i first from the inverse-Wishart distribution (16) and then translating it back to R_i through $R_{i} = D_{i}^{- 1} Ψ_{i} D_{i}^{- 1}$ in (14) and accepting the candidate R_i using a Metropolis-Hastings step with some acceptance rate α_i, where $α_{i} = \min {1, \exp (\frac{a_{i}}{2} (\log ∣ R_{i} ∣ - \log ∣ R_{i}^{(k)} ∣))}$ at iteration k + 1 (see the Web appendix, Section II for the proof).

Theorem 1 provides a simple way to simulate the correlation matrix in the models proposed in this study.

4.2.3 Efficiency of the New Algorithms

Previous work (Liu 2007; Liu and Daniels 2006) has shown that the PX-RPMH algorithm is more efficient than other methods, such as the GG sampler (Ritter and Tanner 1992) and RW-MH algorithms (Chib and Greenberg 1998), for sampling a correlation matrix. Those authors compared the performance of the algorithms in detail, and the algorithms will generalize to our setting, the joint models specified in Section 3.

For sampling from the TMVN as described in Section 5.2.1, we conducted several simulations to compare our method with the Gibbs sampling technique of Robert (1995). In particular, we evaluated the mixing of Markov chain output from the two algorithms by calculating the lag-n (n = 1, 2, 3,. . .) autocorrelation of each component of Y. (The faster the decay of autocorrelation, the faster the mixing.) We denote our algorithm by LD-A and that of Robert (1995) by R-A. The decay of the autocorrelation was much faster for the LD-A algorithm than for the R-A algorithm; see the Web appendix, Section III (www.amstat.org/publications/jasa/supplemental_materials) for additional details. We next provide some summary remarks.

Remark 1

One computational inefficiency of the R-A for sampling from the TMVN distribution is that we need to repeatedly compute conditional means and variances in (12). The LD-A does not require this updating. The computational gains from this aspect are minimal, however.

Remark 2

The mixing of the chain from the R-A grows slower as the truncation region U₁ increases; however, the LD-A has the advantage of fast mixing, regardless of the volume of U₁. Specifically, when the truncation region approaches infinity (i.e., no truncation), the LD-A provides an iid sample.

Remark 3

The correlation between components of Y has little influence on the mixing of the chain from the LD-A, whereas it affects the performance of the R-A. We see this when comparing the two algorithms for various choices for Σ.

5. MODEL SELECTION AND GOODNESS OF FIT

5.1 Model Comparison

We now consider the problem of comparing alternative models. For the models in Section 3, competing models arise from restrictions on the prior for the association matrix B_i, correlation matrices R_i,11, and/or conditional covariance matrices $Σ_{i, 22}^{*}$ For example, we might assume that R_i,11 = R₁₁ and $Σ_{i, 22}^{*} = Σ_{22}^{*}$ when there are insufficient data to estimate all of the parameters. Although Bayes factors and marginal likelihoods are appealing, these approaches are very difficult to implement in a complex hierarchical model, such as the one proposed in this article (Liu 2006). As a result, we derive the deviance information criterion (DIC) (Spiegelhalter, Best, Carlin, and van der Linde 2002) to compare the alternative models. A main reason that we use the DIC is that its computation will be a ready byproduct of the MCMC simulations. But we note its drawbacks, which include lack of invariance to reparameterization and potential ambiguity in the choice of likelihood.

The DIC is defined as a classical estimate of fit plus a penalty,

DIC = D (\overset{‒}{θ}) + 2_{p_{D}} = \overset{‒}{D (θ)} + p_{D},

where $\overset{‒}{D (θ)}$ is the posterior mean of the deviance and $p_{D} = \overset{‒}{D (θ)} - D (\overset{‒}{θ})$ is the effective number of parameters in the model. Given that we have missing data from subjects dropping out, we use the observed data likelihood as the likelihood to construct the DIC (Celeux, Forbes, Robert, and Titterington 2006; Daniels and Hogan 2008). Thus $\overset{‒}{D (θ)}$ is defined as

\begin{matrix} \overset{‒}{D (θ)} = & - 2 E_{θ ∣ Q_{o b s}, W_{o b s}} (\log f (Q_{o b s}, W_{o b s} ∣ θ)) \\ + 2 \log f (Q_{o b s}, W_{o b s}) \end{matrix}

and p_D is given by

\begin{matrix} P_{D} = & - 2 E_{θ ∣ Q_{o b s}, W_{o b s}} (\log f (Q_{o b s}, W_{o b s} ∣ θ)) \\ + 2 \log f (Q_{o b s}, W_{o b s} ∣ \overset{‒}{θ}), \end{matrix}

where θ = (β, Σ), and θ‾ is the posterior mean. Details of the computation of the DIC for the joint models are given in the Web appendix, Section IV (www.amstat.org/publications/jasa/supplemental_materials).

5.2 Model Checking

The “best” model chosen from models under consideration according to the DIC still may not fit the data well. To check model fit, we use posterior predictive checks based on a discrepancy function (Gelman, Meng, and Stern 1996). The “significance” of these checks often is summarized with the posterior predictive p value (ppp) (Gelman et al. 1996); however, the ppp has the major problem that its distribution under the null tends to concentrate around .5 (Robins, Ventura, and van der Vaart 2000). As a result, we check the goodness of fit of the selected joint models based on the calibrated ppp (cppp) (Hjort, Dahl, and Steinbakk 2006), defined as

cppp (u^{o b s}) = P {ppp (u) \leq ppp (u^{o b s})},

(17)

where u = (Q’,W’)’, ppp(u^obs) is the ppp derived by Gelman, Mechelen, Verbeke, Heitjan, and Meulders (2004), and U has the distribution implied by the prior and the model. Note that u^obs corresponds to the observed vector of responses along with the data augmented responses sampled at each iteration (i.e., Q_mis and W_mis). We discuss the form of the discrepancy function in the example. Hjort et al. (2006) showed that the distribution of ppp(U) is a Uniform(0,1) and that the cppp in (17) is a proper p value. To calculate ppp(U) in (17), we need to first produce a sample of U. We can obtain U by first sampling θ from (4) and then drawing U from f (U|θ). The problem arises in sampling θ from π(θ), because in (4), we specified flat priors for β and $Σ_{i, 22}^{*}$ . We address this sampling issue next.

To approximately sample β and $Σ_{i, 22}^{*}$ from flat priors, we use the following diffuse priors in place of the improper priors for β and $Σ_{i, 22}^{*}$ As we did for the prior for b_i in Section 3.2.2, we run the independence model with b_i = 0 to obtain the posterior mean and covariance of β (denoted by μ_β and V_β) and the posterior mean of $Σ_{i, 22}^{*}$ (denoted by $μ_{Σ_{i, 22}^{*}}$ ). Then the approximate diffuse prior for β can be set as a normal with mean μ_β and covariance matrix V_β multiplied by the number of subjects (as with a unit information prior [Kass and Wasserman 1996]), and the diffuse prior for $Σ_{i, 22}^{*}$ can be set as an inverse-Wishart with degrees of freedom T and scale matrix $μ_{Σ_{i, 22}^{*}}$ . Note that to sample R_i,11 from the uniform prior, we use the algorithm of Joe (2006).

6. DATA ANALYSIS

We use the methodology described in Sections 3-5 to analyze the association of longitudinal quit status and weight change in the CTQ II clinical trial described in Section 2. We removed the observations at week 1 and 2 from the data, because the quit rates in these two weeks were very low (0 and 1.10%) due to the design of the study in which subjects were not supposed to try to quit smoking until week 3. Although participants were encouraged to make up missed sessions, there still existed intermittent missing values in quit status and/or weight gain. In addition, a large number of subjects dropped out before the end of the experiment. In what follows, we assume this missingness is ignorable.

For the mean of the two longitudinal responses as a function of covariates, we set β in (1) to be the vector of means at each time point across treatments; thus, the tth row of the design matrix is a vector of 0’s, with a 1 in the tth slot. Exploratory analysis suggested setting a₀ in (8) to be 4; this implies a slower decrease than the raw lag.

6.1 Comparing the Competing Models

We considered several models arising from restrictions on the association matrixes B_i, the prior for the association matrices B_i, the correlation matrices R_i,11 and conditional covariance matrices $Σ_{i, 22}^{*}$ . We fit a total of nine models to the CTQ II trial data. Denote the kth alternative model by M_k. Table 1 gives the details of all of the models considered.

Table 1.

Models under consideration

	Settings
Models	B_i	Prior for B_i	R_i11	$Σ_{i, 22}^{*}$
M₁	B_i = 0	-	Unconstrained	Unconstrained
M₂	B_i = 0	-	R_i,11 = R₁₁	$Σ_{i, 22}^{} = Σ_{22}^{}$
M₃	B_i = 0	-	R_i,11 = I_T×T	$Σ_{i, 22}^{} = diag (σ_{i 1}^{}, \dots, σ_{i T}^{2})$
M₄	B_i ≠ 0	Hierarchical prior	Unconstrained	Unconstrained
M₅	B_i ≠ 0	Hierarchical prior	R_i,11 = R₁₁	$Σ_{i, 22}^{} = Σ_{22}^{}$
M₆	B_i ≠ 0	Hierarchical prior	R_i,11 = I_T×T	$Σ_{i, 22}^{} = diag (σ_{i 1}^{}, \dots, σ_{i T}^{2})$
M₇	B_i ≠ 0	Normal prior^*	Unconstrained	Unconstrained
M₈	B_i ≠ 0	Normal prior^*	R_i,11 = R₁₁	$Σ_{i, 22}^{} = Σ_{22}^{}$
M₉	B_i ≠ 0	Normal prior^*	R_i,11 = I_T×T	$Σ_{i, 22}^{} = diag (σ_{i 1}^{}, \dots, σ_{i T}^{2})$

Open in a new tab

The normal prior is the fractional prior based on (8) with δ_i = 1.

For each model in Table 1, we computed the DIC using the methodology derived in Section 5. The DICs for all of the models are given in Table 2. From Table 2, we can see that the DIC is the smallest for model M₅ and the largest for model M₃. In general, the models using shrinkage priors for association matrices fit better.

Table 2.

Estimates of the DIC

	M₁	M₂	M₃	M₄	M₅	M₆	M₇	M₈	M₉
add:	2,841.5	2,843.0	3,775.8	2,835.9	2,744.6	3,591.3	2,834.5	2,749.5	3,594.1

Open in a new tab

NOTE: The best fitting model, M₅.

6.2 Checking the Goodness of Fit

For the CTQ II data, we defined the discrepancy function to be weekly quit rates or weekly average weight gains. Table 3 gives the cppp for quit rates and average weight gains at each week across treatments for model 5 (M₅). These p values were calculated based on comparison of 2,000 pairs of [ppp(u), ppp(u^obs)], with the ppp obtained using 5,000 iterations after burn-in. From this table, we can see that there are no extreme p values (<.05 or >.95). These checks suggest that the joint models (M₅) fit the mean structure of the CTQ II clinical trial data well.

Table 3.

Calibrated posterior predictive p value for quit rate (QR) and average weight gain (AWG) at each week across treatments for model 5 (M₅)

		Weeks
Treatments	Disc	1	2	3	4	5	6
Wellness	QR	.79	.69	.12	.08	.65	.27
	AWG	.23	.25	.29	.17	.72	.29
Exercise	QR	.14	.26	.29	.13	.18	.16
	AWG	.24	.16	.26	.68	.10	.18

Open in a new tab

6.3 Inference on the Quit Rates and Association

Given the results presented in Sections 6.1 and 6.2, we base inference on model M₅ (Table 1). We ran the MCMC algorithm described in Section 4.1 until convergence (determined by examining trace plots of multiple chains) and based inference on the last 10,000 iterations after burn-in.

Posterior means and 95% credible intervals (CIs) for quit rates across treatments are given in Table 4. The quit rates over time were slightly lower in the exercise treatment than in the wellness treatment. The 95% CI for the difference of quit rates between two treatments at the final week was (-.014, .150), marginally significant. This suggests that the exercise treatment does not have a positive effect on smoking cessation (more later).

Table 4.

Posterior means and 95% CIs for quit rates

	W						E
Items	1	2	3	4	5	6	1	2	3	4	5	6
QR	.40	.50	.58	.53	.53	.46	.45	.37	.44	.42	.46	.38
CIL	.34	.44	.53	.47	.47	.40	.40	.32	.38	.36	.40	.32
CIU	.45	.56	.66	.59	.58	.52	.51	.43	.49	.48	.52	.43

Open in a new tab

NOTE: W, wellness treatment; E, exercise treatment: QR, quit rate; CIL, lower bound; CIU, upper bound.

We now turn our attention to the association between smoking cessation and weight gain. The p_i (i = 1, 2) defined in (9) can be viewed as a summary measure of the overall magnitude of the association between quit status and weight gain for the two treatments. The estimates of p_i are p₁ = .26 (no exercise) and p₂= .18 (exercise); the 95% CI for their difference is (.025, .134). These results support the hypothesis that exercise weakens the association between quit status and weight gain.

This weakening also can be seen by examining the posterior means of association matrixes across treatments, as given in Table 5. We have removed from the table those elements of the B_i matrix with probabilities of the corresponding indicators being equal to 1 of <.1. The weakened associations between smoking cessation and weight gain is obvious by noting the presence of more 0’s under the exercise treatment and the larger magnitude of the (standardized) coefficients.

Table 5.

Posterior means of the association matrices with 95% CIs for each treatment

		Quit rate
	Weight	1	2	3	4	5	6
W	1	.39 (-.12, .89)	-.61 (-1.19, .02)	-	2.23 (1.40, 3.06)	-	-3.08 (-3.96, -2.20)
	2	-	-	.40 (-.25, 1.05)	-	-.75 (-1.48, -.01)	-
	3	-	-	-	-	-	-
	4	-	-	-	-.32 (-1.00, .37)	-	-
	5	-.86 (-1.43, -.31)	-	-	.95 (.14, 1.76)	-	-.50 (-1.23, .22)
	6	-	-	-	2.13 (1.04, 3.23)	-1.60 (-2.68, -.52)	-0.1 (-.80, .78)
E	1	-	-	-	-	-.67 (-1.36, .02)	-
	2	-	.75 (.18, 1.32)	-	-	-	-.90 (-1.75, -.07)
	3	-	-	-	-	-	-
	4	-	-	-	-	-	-
	5	-.88 (-1.50, -.33)	-	-	-.70 (1.33, -.07)	-	-
	6	-	-1.36 (-2.25, -.49)	.83 (-.07, 1.77)	-	-	-

Open in a new tab

NOTE: W, wellness treatment; E, exercise treatment “—” means that the corresponding element is small, defined as P (δ = 1| Q_obs, W_obs) < .1.

Table 6 shows the posterior means of pairwise correlations with 95% credible intervals; correlations whose 95% credible intervals covered 0 are excluded. We can see that smoking cessation and weight gain appear to have a lagged correlation structure, and that exercise weakens pairwise correlations. In particular, we point out the 2 × 2 blocks in the upper right corners of pairwise correlation matrixes under both treatments (in bold-type). For the wellness treatment, the four pairwise correlations between weight gain at the beginning of the study and quit status at weeks 5 and 6 are all negative. This means that people who gain weight early in the trial are more likely to be smoking at the end. The corresponding correlations are essentially 0’s (no longer significant) under the exercise arm. In addition, looking at the last row in Table 6 for the wellness arm, the correlations indicate those who quit early in the study are more likely to gain weight by the end (week 6); the corresponding relationship in the exercise arm is weaker.

Table 6.

Posterior means of pairwise correlations with 95% CIs for each treatment

		Quit rates
	Weight	1	2	3	4	5	6
W	1	-	-	-.14 (-.25, -.02)	-	-.24 (-.35, -.13)	-.30 (-.41, -.19)
	2	-	-	-	-.12 (-.24, 0)	-.16 (-.28, -.04)	-.17 (-.29, -0.5)
	3	-	-	-	-	-	-
	4	-	-	-	-	-	-
	5	-	-	-	-	-	-
	6	.13 (.01, .24)	.12 (.01, .23)	.15 (.03, .26)	.18 (.05, .30)	-	-
E	1	-	-	-	-	-	-
	2	-	-	-	-	-	-.12 (-.24, 0)
	3	-	-	-	-	-	-
	4	-	-	-	-	-	-
	5	-	-	-	-	-	-
	6	.11 (0, .22)	-	.13 (0.03, .27)	-	.12 (.01, .23)	.14 (.02, .25)

Open in a new tab

NOTE: W, wellness treatment; E, exercise treatment. “—” means that the corresponding element is not significant (95% CI covers 0).

7. CONCLUSIONS AND DISCUSSION

We have developed joint models for the association of longitudinal binary and continuous processes and applied them to the analysis of the CTQ II clinical trial to gain insight into the joint evolution of smoking status and weight gain. The results show that moderate-intensity exercise was not successful for smoking cessation, but that it did appear to weaken the association between smoking status and weight gain, supporting the hypothesis that exercise has an effect on smoking cessation by weakening the association between quitting smoking and gaining weight.

But we should be cautious in overinterpreting these results, because of the low compliance in the exercise arm. We might expect the low compliance to negatively bias the smoking cessation results (probability of quitting was too low) on the exercise arm; that is, the intention-to-treat effect (randomization to the exercise arm) reported here might be expected to be quite different from the causal effect (adherence to the exercise regimen). But the ability to still see the weakened association between weight gain and smoking on the exercise arm despite the low compliance supports this as the mechanism of action for exercise as a therapy for smoking cessation. In future work, we will extend these joint models to estimate causal effects and allow for nonignorable dropout.

An alternative way to factor model (1) is to decompose it into a marginal model for W_ij and a conditional probit regression model for Y_ij, conditioning on W_ij. This factorization could potentially greatly increase the computational burden in sampling the correlation matrix if the diagonal elements of Σ_i,11 were still fixed at 1 (due to not being identified); however, this computational problem could be avoided by fixing the diagonal elements of the conditional covariance matrix $Σ_{i, 11}^{*} = Σ_{i, 11} - Σ_{i, 12} Σ_{i, 22}^{- 1} Σ_{i, 21}$ to be l’s (i.e., make this matrix a correlation matrix). For interpretation of the mean parameters, β, this computationally simpler approach would require adjusting the β components corresponding to the longitudinal binary process with the diagonal elements of the marginal covariance matrix Σ_i,11, which in this case would have nonidentical diagonal elements.

The general methodology proposed here can be applied to analysis of other data sets where there are two processes and the question of interest is the association between the two processes. The methodology also can be directly extended to other longitudinal cases, such as modeling the association between longitudinal ordinal and continuous processes or between two continuous processes (Liu 2006).

Supplementary Material

web appendix

NIHMS143238-supplement-web_appendix.pdf^{(322.5KB, pdf)}

Contributor Information

Xuefeng Liu, Department of Biostatistics and Epidemiology, East Tennessee State University, Johnson City, TN 37614.

Michael J. Daniels, Department of Statistics, University of Florida, Gainesville, FL 32611.

Bess Marcus, Department of Psychiatry & Human Behavior, Brown University, Providence, RI 02912.

REFERENCES

Barnard J, McCulloch R, Meng XL. Modeling Covariance Matrices in Terms of Standard Deviations and Correlations With Application to Shrinkage. Statistica Sinica. 2000;10:1281–1311. [Google Scholar]
Bartlett MS. On the Theory of Statistical Regression. Proceedings of the Royal Society of Edinburgh. 1933;53:260–283. [Google Scholar]
Carlin BP, Chib S. Bayesian Model Choice via Markov Chain Monte Carlo Methods. Journal of the Royal Statistical Society, Ser. B. 1995;57:473–484. [Google Scholar]
Catalano PJ, Ryan LM. Bivariate Latent Variable Models for Clustered Discrete and Continuous Outcomes. Journal of the American Statistical Association. 1992;87:651–658. [Google Scholar]
Celeux G, Forbes F, Robert CP, Titterington DM. “Deviance Information Criteria for Missing Data Models” (with discussion) Bayesian Analysis. 2006;1:651–706. [Google Scholar]
Chen M-H, Shao Q-M. Properties of Prior and Posterior Distributions for Multivariate Categorical Response Data Models. Journal of Multivariate Analysis. 1999;71:277–296. [Google Scholar]
Chib S, Greenberg E. Bayesian Analysis of Multivariate Probit Models. Biometrika. 1998;85:347–361. [Google Scholar]
Chipman H. Bayesian Variable Selection With Related Predictors. Canadian Journal of Statistics. 1996;24:17–36. [Google Scholar]
Cox DR, Wermuth N. Response Models for Mixed Binary and Quantitative Variables. Biometrika. 1992;79:441–461. [Google Scholar]
Daniels MJ. Bayesian Modelling of Several Covariance Matrices and Some Results on Propriety of the Posterior for Linear Regression With Correlated and/or Heterogeneous Errors. Journal of Multivariate Analysis. 2006;97:1185–1207. [Google Scholar]
Daniels MJ, Hogan JW. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman & Hall/CRC Press; Boca Ration, FL: 2008. [Google Scholar]
Daniels MJ, Kass RE. Shrinkage Estimators for Covariance Matrices. Biometrics. 2001;57:1173–1184. doi: 10.1111/j.0006-341x.2001.01173.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Daniels MJ, Normand SL. Longitudinal Profiling of Health Care Units Based on Continuous and Discrete Patient Outcomes. Biostatistics. 2006;7:1–15. doi: 10.1093/biostatistics/kxi036. [DOI] [PMC free article] [PubMed] [Google Scholar]
Dunson DB. Bayesian Latent Variable Models for Clustered Mixed Outcomes. Journal of the Royal Statistical Society, Ser. B. 2000;62:355–366. [Google Scholar]
Dunson DB. Dynamic Latent Trait Models for Multidimensional Longitudinal Data. Journal of the American Statistical Association. 2003;98:555–563. [Google Scholar]
Fitzmaurice GM, Laird NM. Regression Models for a Bivariate Discrete and Continuous Outcome With Clustering. Journal of the American Statistical Association. 1995;90:845–852. [Google Scholar]
Gelman A, Mechelen IV, Verbeke G, Heitjan DF, Meulders M. Multiple Imputation for Model Checking: Completed-Data Plots With Missing and Latent Data. Biometrics. 2004;61:74–85. doi: 10.1111/j.0006-341X.2005.031010.x. [DOI] [PubMed] [Google Scholar]
Gelman A, Meng X-L, Stern H. “Posterior Predictive Assessment of Model Fitness via Realized Discrepancies” (with discussion) Statistica Sinica. 1996;6:733–807. [Google Scholar]
George EI, McCulloch RE. Variable Selection via Gibbs Sampling. Journal of the American Statistical Association. 1993;88:881–889. [Google Scholar]
George EI, McCulloch RE. Approaches for Bayesian Variable Selection. Statistica Sinica. 1997;7:339–373. [Google Scholar]
Geweke J. Efficient Simulation From the Multivariate Normal and Student t-Distributions Subject to Linear Constraints; Computer Sciences and Statistics, Proceedings of the 23rd Symposium Interface; 1991.pp. 571–578. [Google Scholar]
Gueorguieva RV, Agresti A. A Correlated Probit Model for Joint Modelling of Clustered Binary and Continuous Responses. Journal of the American Statistical Association. 2001;96:1102–1112. [Google Scholar]
Gueorguieva RV, Sanacora G. Joint Analysis of Repeatedly Observed Continuous and Ordinal Measures of Disease Severity. Statistics in Medicine. 2006;25:1307–1322. doi: 10.1002/sim.2270. [DOI] [PubMed] [Google Scholar]
Hjort NL, Dahl FA, Steinbakk GH. Post-Processing Posterior Predictive p Values. Journal of the American Statistical Association. 2006;101:1157–1174. [Google Scholar]
Hoeting J, Raftery AE, Madigan D. A Method for Simultaneous Variable Selection and Outlier Identification in Linear Regression. Computational Statistics and Data Analysis. 1996;22:251–270. [Google Scholar]
Joe H. Generating Random Correlation Matrices Based on Partial Correlations. Journal of Multivariate Analysis. 2006;97:2177–2189. [Google Scholar]
Kass RE, Wasserman L. The Selection of Prior Distributions by Formal Rules. Journal of the American Statistical Association. 1996;91:1343–1370. Corr. (1998), 93, 412. [Google Scholar]
Liu C, Rubin DB, Wu YN. Parameter Expansion to Accelerate EM: The PX-EM Algorithm. Biometrika. 1998;85:755–770. [Google Scholar]
Liu JS, Wu Y. Parameter Expansion for Data Augmentation. Journal of the American Statistical Association. 1999;94:1264–1274. [Google Scholar]
Liu XF. unpublished doctoral dissertation. University of Florida, Dept. of Statistics; 2006. Bayesian Methodology for Models With Multivariate (Longitudinal) Outcomes. [Google Scholar]
Liu XF. Parameter Expansion for Sampling a Correlation Matrix: An Efficient GPX-RPMH Algorithm. Journal of Statistical Computation and Simulation. 2008;78:1065–1076. [Google Scholar]
Liu XF, Daniels MJ. A New Algorithm for Simulating a Correlation Matrix Based on Parameter Expansion and Re-Parameterization. Journal of Computational and Graphical Statistics. 2006;16:897–914. [Google Scholar]
Marcus BH, Albrecht AE, King TK, Parisi AF, Pinto BM, Roberts M, Niaura RS, Abrams DB. “The Efficacy of Exercise as an Aid for Smoking Cessation in Women. Archives of Internal Medicine. 1999;36:479–492. doi: 10.1001/archinte.159.11.1229. [DOI] [PubMed] [Google Scholar]
Marcus BH, King TK, Albrecht AE, Parisi AF, Abrams D. Rationale, Design, and Baseline Data for Commit to Quit: An Exercise Efficacy Trial for Smoking Cessation Among Women. Preventive Medicine. 1997;26:586–597. doi: 10.1006/pmed.1997.0180. [DOI] [PubMed] [Google Scholar]
Marcus BH, Lewis BA, Hogan J, King TK, Albrecht AE, Bock B, Parisi AF, Niaura R, Abrams DB. The Efficacy of Moderate-Intensity Exercise as an Aid for Smoking Cessation in Women: A Randomized Controlled Trial. Nicotine & Tobacco Research. 2005;7:871–880. doi: 10.1080/14622200500266056. [DOI] [PubMed] [Google Scholar]
Marcus BH, Lewis BA, King TK, Albrecht AE, Hogan J, Bock B, Parisi AF, Abrams DB. Rationale, Design, and Baseline Data for Commit to Quit II: An Evaluation of the Efficacy of Moderate-Intensity Physical Activity as an Aid to Smoking Cessation in Women. Preventive Medicine. 2003;36:479–492. doi: 10.1016/s0091-7435(02)00051-8. [DOI] [PubMed] [Google Scholar]
Pourahmadi M, Daniels MJ. Dynamic Conditionally Linear Mixed Models for Longitudinal Data. Biometrics. 2002;58:225–231. doi: 10.1111/j.0006-341x.2002.00225.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Regan MM, Catalano PJ. Likelihood Models for Clustered Binary and Continuous Outcomes: Application to Developmental Toxicology. Biometrics. 1999;55:760–768. doi: 10.1111/j.0006-341x.1999.00760.x. [DOI] [PubMed] [Google Scholar]
Ritter C, Tanner MA. Facilitating the Gibbs Sampler: The Gibbs Stopper and the Griddy-Gibbs Sampler. Journal of the American Statistical Association. 1992;87:861–868. [Google Scholar]
Robert CP. Simulation of Truncated Normal Variables. Statistics and Computing. 1995;5:121–125. [Google Scholar]
Robins JM, Ventura V, van der Vaart A. Asymptotic Distribution of P Values in Composite Null Models. Journal of the American Statistical Association. 2000;95:1143–1156. [Google Scholar]
Roy J, Lin X. Latent Variable Models for Longitudinal Data With Multiple Continuous Outcomes. Biometrics. 2000;56:1047–1054. doi: 10.1111/j.0006-341x.2000.01047.x. [DOI] [PubMed] [Google Scholar]
Sammel MD, Ryan LM, Legler JM. Latent Variable Models for Mixed Discrete and Continuous Outcomes. Journal of the Royal Statistical Society, Ser. B. 1997;59:667–678. [Google Scholar]
Smith M, Kohn R. Parsimonious Covariance Matrix Estimation for Longitudinal Data. Journal of the American Statistical Association. 2002;97:1141–1153. [Google Scholar]
Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian Measures of Model Complexity and Fit” (with discussion) Journal of the Royal Statistical Society, Ser. B. 2002;64:583–639. [Google Scholar]
Tanner MA, Wong WH. “The Calculation of Posterior Distribution by Data Augmentation” (with discussion) Journal of the American Statistical Association. 1987;82:528–550. [Google Scholar]
van Dyk DA, Meng XL. The Art of Data Augmentation” (with discussion) Journal of Computational and Graphical Statistics. 2001;10:1–111. [Google Scholar]
Wakefield JC, Bennett JE. The Bayesian Modelling of Covariates for Population Pharmacokinetic Models. Journal of the American Statistical Association. 1996;91:917–927. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

web appendix

NIHMS143238-supplement-web_appendix.pdf^{(322.5KB, pdf)}

[R1] Barnard J, McCulloch R, Meng XL. Modeling Covariance Matrices in Terms of Standard Deviations and Correlations With Application to Shrinkage. Statistica Sinica. 2000;10:1281–1311. [Google Scholar]

[R2] Bartlett MS. On the Theory of Statistical Regression. Proceedings of the Royal Society of Edinburgh. 1933;53:260–283. [Google Scholar]

[R3] Carlin BP, Chib S. Bayesian Model Choice via Markov Chain Monte Carlo Methods. Journal of the Royal Statistical Society, Ser. B. 1995;57:473–484. [Google Scholar]

[R4] Catalano PJ, Ryan LM. Bivariate Latent Variable Models for Clustered Discrete and Continuous Outcomes. Journal of the American Statistical Association. 1992;87:651–658. [Google Scholar]

[R5] Celeux G, Forbes F, Robert CP, Titterington DM. “Deviance Information Criteria for Missing Data Models” (with discussion) Bayesian Analysis. 2006;1:651–706. [Google Scholar]

[R6] Chen M-H, Shao Q-M. Properties of Prior and Posterior Distributions for Multivariate Categorical Response Data Models. Journal of Multivariate Analysis. 1999;71:277–296. [Google Scholar]

[R7] Chib S, Greenberg E. Bayesian Analysis of Multivariate Probit Models. Biometrika. 1998;85:347–361. [Google Scholar]

[R8] Chipman H. Bayesian Variable Selection With Related Predictors. Canadian Journal of Statistics. 1996;24:17–36. [Google Scholar]

[R9] Cox DR, Wermuth N. Response Models for Mixed Binary and Quantitative Variables. Biometrika. 1992;79:441–461. [Google Scholar]

[R10] Daniels MJ. Bayesian Modelling of Several Covariance Matrices and Some Results on Propriety of the Posterior for Linear Regression With Correlated and/or Heterogeneous Errors. Journal of Multivariate Analysis. 2006;97:1185–1207. [Google Scholar]

[R11] Daniels MJ, Hogan JW. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman & Hall/CRC Press; Boca Ration, FL: 2008. [Google Scholar]

[R12] Daniels MJ, Kass RE. Shrinkage Estimators for Covariance Matrices. Biometrics. 2001;57:1173–1184. doi: 10.1111/j.0006-341x.2001.01173.x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] Daniels MJ, Normand SL. Longitudinal Profiling of Health Care Units Based on Continuous and Discrete Patient Outcomes. Biostatistics. 2006;7:1–15. doi: 10.1093/biostatistics/kxi036. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] Dunson DB. Bayesian Latent Variable Models for Clustered Mixed Outcomes. Journal of the Royal Statistical Society, Ser. B. 2000;62:355–366. [Google Scholar]

[R15] Dunson DB. Dynamic Latent Trait Models for Multidimensional Longitudinal Data. Journal of the American Statistical Association. 2003;98:555–563. [Google Scholar]

[R16] Fitzmaurice GM, Laird NM. Regression Models for a Bivariate Discrete and Continuous Outcome With Clustering. Journal of the American Statistical Association. 1995;90:845–852. [Google Scholar]

[R17] Gelman A, Mechelen IV, Verbeke G, Heitjan DF, Meulders M. Multiple Imputation for Model Checking: Completed-Data Plots With Missing and Latent Data. Biometrics. 2004;61:74–85. doi: 10.1111/j.0006-341X.2005.031010.x. [DOI] [PubMed] [Google Scholar]

[R18] Gelman A, Meng X-L, Stern H. “Posterior Predictive Assessment of Model Fitness via Realized Discrepancies” (with discussion) Statistica Sinica. 1996;6:733–807. [Google Scholar]

[R19] George EI, McCulloch RE. Variable Selection via Gibbs Sampling. Journal of the American Statistical Association. 1993;88:881–889. [Google Scholar]

[R20] George EI, McCulloch RE. Approaches for Bayesian Variable Selection. Statistica Sinica. 1997;7:339–373. [Google Scholar]

[R21] Geweke J. Efficient Simulation From the Multivariate Normal and Student t-Distributions Subject to Linear Constraints; Computer Sciences and Statistics, Proceedings of the 23rd Symposium Interface; 1991.pp. 571–578. [Google Scholar]

[R22] Gueorguieva RV, Agresti A. A Correlated Probit Model for Joint Modelling of Clustered Binary and Continuous Responses. Journal of the American Statistical Association. 2001;96:1102–1112. [Google Scholar]

[R23] Gueorguieva RV, Sanacora G. Joint Analysis of Repeatedly Observed Continuous and Ordinal Measures of Disease Severity. Statistics in Medicine. 2006;25:1307–1322. doi: 10.1002/sim.2270. [DOI] [PubMed] [Google Scholar]

[R24] Hjort NL, Dahl FA, Steinbakk GH. Post-Processing Posterior Predictive p Values. Journal of the American Statistical Association. 2006;101:1157–1174. [Google Scholar]

[R25] Hoeting J, Raftery AE, Madigan D. A Method for Simultaneous Variable Selection and Outlier Identification in Linear Regression. Computational Statistics and Data Analysis. 1996;22:251–270. [Google Scholar]

[R26] Joe H. Generating Random Correlation Matrices Based on Partial Correlations. Journal of Multivariate Analysis. 2006;97:2177–2189. [Google Scholar]

[R27] Kass RE, Wasserman L. The Selection of Prior Distributions by Formal Rules. Journal of the American Statistical Association. 1996;91:1343–1370. Corr. (1998), 93, 412. [Google Scholar]

[R28] Liu C, Rubin DB, Wu YN. Parameter Expansion to Accelerate EM: The PX-EM Algorithm. Biometrika. 1998;85:755–770. [Google Scholar]

[R29] Liu JS, Wu Y. Parameter Expansion for Data Augmentation. Journal of the American Statistical Association. 1999;94:1264–1274. [Google Scholar]

[R30] Liu XF. unpublished doctoral dissertation. University of Florida, Dept. of Statistics; 2006. Bayesian Methodology for Models With Multivariate (Longitudinal) Outcomes. [Google Scholar]

[R31] Liu XF. Parameter Expansion for Sampling a Correlation Matrix: An Efficient GPX-RPMH Algorithm. Journal of Statistical Computation and Simulation. 2008;78:1065–1076. [Google Scholar]

[R32] Liu XF, Daniels MJ. A New Algorithm for Simulating a Correlation Matrix Based on Parameter Expansion and Re-Parameterization. Journal of Computational and Graphical Statistics. 2006;16:897–914. [Google Scholar]

[R33] Marcus BH, Albrecht AE, King TK, Parisi AF, Pinto BM, Roberts M, Niaura RS, Abrams DB. “The Efficacy of Exercise as an Aid for Smoking Cessation in Women. Archives of Internal Medicine. 1999;36:479–492. doi: 10.1001/archinte.159.11.1229. [DOI] [PubMed] [Google Scholar]

[R34] Marcus BH, King TK, Albrecht AE, Parisi AF, Abrams D. Rationale, Design, and Baseline Data for Commit to Quit: An Exercise Efficacy Trial for Smoking Cessation Among Women. Preventive Medicine. 1997;26:586–597. doi: 10.1006/pmed.1997.0180. [DOI] [PubMed] [Google Scholar]

[R35] Marcus BH, Lewis BA, Hogan J, King TK, Albrecht AE, Bock B, Parisi AF, Niaura R, Abrams DB. The Efficacy of Moderate-Intensity Exercise as an Aid for Smoking Cessation in Women: A Randomized Controlled Trial. Nicotine & Tobacco Research. 2005;7:871–880. doi: 10.1080/14622200500266056. [DOI] [PubMed] [Google Scholar]

[R36] Marcus BH, Lewis BA, King TK, Albrecht AE, Hogan J, Bock B, Parisi AF, Abrams DB. Rationale, Design, and Baseline Data for Commit to Quit II: An Evaluation of the Efficacy of Moderate-Intensity Physical Activity as an Aid to Smoking Cessation in Women. Preventive Medicine. 2003;36:479–492. doi: 10.1016/s0091-7435(02)00051-8. [DOI] [PubMed] [Google Scholar]

[R37] Pourahmadi M, Daniels MJ. Dynamic Conditionally Linear Mixed Models for Longitudinal Data. Biometrics. 2002;58:225–231. doi: 10.1111/j.0006-341x.2002.00225.x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R38] Regan MM, Catalano PJ. Likelihood Models for Clustered Binary and Continuous Outcomes: Application to Developmental Toxicology. Biometrics. 1999;55:760–768. doi: 10.1111/j.0006-341x.1999.00760.x. [DOI] [PubMed] [Google Scholar]

[R39] Ritter C, Tanner MA. Facilitating the Gibbs Sampler: The Gibbs Stopper and the Griddy-Gibbs Sampler. Journal of the American Statistical Association. 1992;87:861–868. [Google Scholar]

[R40] Robert CP. Simulation of Truncated Normal Variables. Statistics and Computing. 1995;5:121–125. [Google Scholar]

[R41] Robins JM, Ventura V, van der Vaart A. Asymptotic Distribution of P Values in Composite Null Models. Journal of the American Statistical Association. 2000;95:1143–1156. [Google Scholar]

[R42] Roy J, Lin X. Latent Variable Models for Longitudinal Data With Multiple Continuous Outcomes. Biometrics. 2000;56:1047–1054. doi: 10.1111/j.0006-341x.2000.01047.x. [DOI] [PubMed] [Google Scholar]

[R43] Sammel MD, Ryan LM, Legler JM. Latent Variable Models for Mixed Discrete and Continuous Outcomes. Journal of the Royal Statistical Society, Ser. B. 1997;59:667–678. [Google Scholar]

[R44] Smith M, Kohn R. Parsimonious Covariance Matrix Estimation for Longitudinal Data. Journal of the American Statistical Association. 2002;97:1141–1153. [Google Scholar]

[R45] Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian Measures of Model Complexity and Fit” (with discussion) Journal of the Royal Statistical Society, Ser. B. 2002;64:583–639. [Google Scholar]

[R46] Tanner MA, Wong WH. “The Calculation of Posterior Distribution by Data Augmentation” (with discussion) Journal of the American Statistical Association. 1987;82:528–550. [Google Scholar]

[R47] van Dyk DA, Meng XL. The Art of Data Augmentation” (with discussion) Journal of Computational and Graphical Statistics. 2001;10:1–111. [Google Scholar]

[R48] Wakefield JC, Bennett JE. The Bayesian Modelling of Covariates for Population Pharmacokinetic Models. Journal of the American Statistical Association. 1996;91:917–927. [Google Scholar]

PERMALINK

Joint Models for the Association of Longitudinal Binary and Continuous Processes With Application to a Smoking Cessation Trial

Xuefeng Liu

Michael J Daniels

Bess Marcus

Roles

Abstract

1. INTRODUCTION

2. APPLICATION: SMOKING CESSATION TRIAL

3. JOINT MODELS AND PRIORS

3.1 Joint Models

3.2 Priors for Parameters in Joint Models

3.2.1 Prior for the Elements of the Association Matrix

3.2.2 Using the Prior for the Association Matrix in Practice

3.2.3 Propriety of the Joint Posterior Distribution

4. POSTERIOR SAMPLING

4.1 The MCMC Sampling Algorithm

4.2 Technical Issues

4.2.1 Imputing the Latent Data

Proposition 1

4.2.2 Sampling the Correlation Matrix

Proposition 2

Theorem 1

4.2.3 Efficiency of the New Algorithms

Remark 1

Remark 2

Remark 3

5. MODEL SELECTION AND GOODNESS OF FIT

5.1 Model Comparison

5.2 Model Checking

6. DATA ANALYSIS

6.1 Comparing the Competing Models

Table 1.

Table 2.

6.2 Checking the Goodness of Fit

Table 3.

6.3 Inference on the Quit Rates and Association

Table 4.

Table 5.

Table 6.

7. CONCLUSIONS AND DISCUSSION

Supplementary Material

Contributor Information

REFERENCES

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases