Joint modeling of longitudinal zero-inflated count and time-to-event data: A Bayesian perspective

Huirong Zhu; Stacia M DeSantis; Sheng Luo

doi:10.1177/0962280216659312

Author manuscript; available in PMC: 2018 Apr 1.

Published in final edited form as: Stat Methods Med Res. 2016 Jul 26;27(4):1258–1270. doi: 10.1177/0962280216659312

PMCID: PMC5269555 NIHMSID: NIHMS818171 PMID: 27460540

Abstract

Longitudinal zero-inflated count data are encountered frequently in substance-use research when assessing the effects of covariates and risk factors on outcomes. Often, both the time to a terminal event such as death or dropout and repeated measure count responses are collected for each subject. In this setting, the longitudinal counts are censored by the terminal event, and the time to the terminal event may depend on the longitudinal outcomes. In the study described herein, we expand the class of joint models for longitudinal and survival data to accommodate zero-inflated counts and time-to-event data by using a Cox proportional hazards model with piecewise constant baseline hazard. We use a Bayesian framework via Markov chain Monte Carlo simulations implemented in the BUGS programming language. Via an extensive simulation study, we apply the joint model and obtain estimates that are more accurate than those of the corresponding independence model. We apply the proposed method to an alpha-tocopherol, beta-carotene lung cancer prevention study.

Keywords: Joint model, Zero-inflation generalized Poisson, Count data, Markov chain Monte Carlo, Mixed model

1 Introduction

Longitudinal-count-outcome data with excess zeros are encountered frequently in clinical and public health studies. Examples include skin cancer counts,¹ blood product units transfused after surgery,² the number of complications experienced after a procedure,³ and alcohol or tobacco consumption.^4,5 Counts provide useful information when observing the natural history of the target disease and/or evaluating the long-term effects of interventions against the disease. Zero-inflated count models [e.g., the zero-inflated Poisson (ZIP),⁶ zero-inflated negative binomial (ZINB)⁷] are commonly used to analyze such data; these distinguish between sources of zeros by using a two-part mixture of a point mass at zero and a count distribution. Compared with the ZIP model, the ZINB model has an additional parameter that captures variability due to over-dispersion.⁸

Longitudinal studies, however, are quite prone to incomplete records owing to terminal events such as death and dropout during the study.⁹ Importantly, longitudinal measurements can indicate such terminal events, which then censor the sequence of repeated measures. For example, in an alcohol- or tobacco-consumption study, the dropout process may be significantly informative of the drinking or smoking process because heavy drinkers or smokers are more likely to drop out of the trial early. Thus, heavy substance consumption is expected to provide information about the dropout time. Censoring the longitudinal measures due to the dropout events also may be informative because dropouts may exhibit heavier consumption during the study. In the literature, a dependent terminal event is often referred to as “dependent censoring” or “informative censoring,” and ignoring it has been shown to cause biased estimation and even invalid inference.^10,11 However, a joint analysis of these two processes can aid simultaneous investigations of the effect of covariates on both the longitudinal and survival processes.¹² When informative censoring is present, incorporating traditional zero-inflated count models into the class of joint models with time-to-event data can provide accurate inference by considering the existing informative censoring feature contained in the data.

The statistical literature on joint longitudinal and survival models for many different types of longitudinal outcomes has expanded over the past decade.^13–15 Joint modeling is a flexible dynamic tool used to capture the relationship between longitudinal predictors and survival time. It makes estimation more efficient via simultaneous estimation of shared and unshared parameters that characterize both the longitudinal and survival processes. Tsiatis and Davidian¹³ and Wu et al.¹⁴ have published thorough reviews on the topic. Specifically, subject-specific random effects are typically used to link two submodels for the longitudinal and event-time data; such submodels usually include a mixed-effects model for longitudinal data and a semiparametric Cox or accelerated failure time model for event-time data.^16,17 Other authors have proposed extensions of the joint model, such as including non-Gaussian random effects,¹⁸ assuming a latent stochastic Gaussian process instead of a random effects model,¹⁹ and incorporating the multidimensional longitudinal and/or survival components into the joint model framework.^20,21

The vast majority of research in the joint modeling field assumes continuous or multivariate continuous outcomes in the longitudinal submodel. Recently, Rizopoulos proposed a two-part shared parameter framework to illustrate the association between a dichotomous variable and the time to renal graft failure.²² Hatfield et al.¹² developed a zero-augmented joint model for longitudinal patient-reported proportional outcomes with many zeros and survival events. However, our study is concerned with the impact of dependent terminal events on a longitudinal vector of counts subject to zero inflation, and how those characteristics influence data-inferential procedures.

As described herein, we propose a comprehensive joint model to simultaneously accommodate the association among zero-inflated longitudinal counts with dispersion and the presence of informative censoring. We construct the zero-inflated count submodel by using a mixture of a degenerate point mass at zero and a count distribution such as the Poisson or generalized Poisson distribution.²³ We use the Cox proportional hazards model with piecewise constant baseline hazard to model the corresponding time-to-event data and link the submodels via two shared random effects. In addition, we perform simulation and data analysis by using a Bayesian framework to allow for full and exact posterior inference for all parameters. We specify noninformative prior distributions for all parameters and assess sensitivity to these priors.

The paper is organized as follows: In Section 2 we present the motivation from the alpha-tocopherol, beta-carotene (ATBC) lung cancer prevention study. In Section 3, we describe the proposed joint model structure. In Section 4, we report a simulation study that assesses the performance of the proposed approach and compare its performance with that of the corresponding independence approach. In Section 5, we apply competing models to the ATBC study. Finally, in Section 6, we conclude with remarks and an outline of future work.

2 Motivating Data

The methodological development for this study is motivated by the longitudinal ATBC study, which was sponsored by the National Cancer Institute. We refer the reader to the original report from the ATBC Study Group (2003)²⁴ for more details. Briefly, 29 133 eligible smoking-dependent males aged 50 to 69 years were enrolled in the randomized primary prevention study to determine whether alpha-tocopherol or beta-carotene reduces the incidence of lung cancer. All individuals in the study smoked at least five cigarettes per day at baseline. Extensive medical histories and examination data for the participants were collected at baseline, and the participants were observed for five to eight years with scheduled follow-up visits every four months. At each follow-up visit, participants were queried about their health and smoking status since their last visit. Dropout events were also recorded during the study. In this paper, we define the response variable as the average daily cigarette consumption over the last four months; it has excess zeros compared with a Poisson distribution with a comparable mean. The terminal event under consideration is dropout, with time to dropout measured from study onset (the dropout rate in the ATBC study is 24%).

In the current study, we are interested in how risk factors, e.g., age, years of smoking, and baseline smoking consumption, influence the severity of smoking addiction and the probability of smoking abstinence accounting for all relevant correlation and subject-specific random effects.

In the preliminary analysis, we observe that the cigarette counts are associated with the dropout time. To demonstrate this association, we split the dataset at the observed mean of 15 cigarettes per day. The two Kaplan–Meier curves for the mean-split data in Figure 1 demonstrate a significant difference in time to dropout between the two consumption groups (log-rank test p-value < 0.001), whereby high cigarette consumption is associated with a higher likelihood of dropout. The association between cigarette counts and dropout events requires simultaneous consideration of these two correlated processes.

Figure 1. Kaplan–Meier curves of time to dropout for participants in the low and high consumption groups (≤ 15 cigarettes per day, solid line; > 15, dashed line); log-rank test p-value < 0.001.

3 Methods and estimation

3.1 Zero-inflated longitudinal submodel

3.1.1 Zero-inflated Poisson model

Assuming independence between subjects, let Y_ij denote the longitudinal response (average daily cigarette consumption) for subject i = 1, . . . , n at time j for j = 1, . . . , J_i. The distribution of Y_ij is

Y_{ij} ~ \begin{cases} 0 & \text{with probability } ϕ_{ij}, \\ \text{Poisson}(λ_{ij}) & \text{with probability } 1 - ϕ_{ij}, \end{cases}

where ϕ_ij denotes the probability of the observation arising from the degenerate distribution at zero and λ_ij represents the mean of the Poisson distribution. The probability distribution function is

\begin{aligned} P(Y_{ij} = 0 ∣ Z_{ij}, X_{ij}) &= ϕ_{ij} + (1 - ϕ_{ij}) e^{-λ_{ij}}, \\ P(Y_{ij} = y_{ij} ∣ Z_{ij}, X_{ij}) &= (1 - ϕ_{ij}) \frac{λ_{ij}^{y_{ij}} e^{-λ_{ij}}}{y_{ij}!}, \quad y_{ij} = 1, 2, 3, \dots, \end{aligned}

where 0 ≤ ϕ_ij ≤ 1 and 0 < λ_ij < ∞. This formulation incorporates more zeros than permitted under the Poisson assumption (i.e., where ϕ_ij = 0). The vector Z_ij is the set of covariates predicting the probability of abstaining from smoking and X_ij is the set of covariates predicting the “intensity” of smoking. The vectors X_ij and Z_ij may overlap.
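As a quick check of the ZIP formulation, the following minimal sketch evaluates the probability mass function above in plain Python. The function name `zip_pmf` and the values of ϕ and λ are illustrative assumptions, not quantities from the paper.

```python
import math

def zip_pmf(y, phi, lam):
    """P(Y = y) for a zero-inflated Poisson with zero-inflation
    probability phi and Poisson mean lam (illustrative helper)."""
    if y < 0:
        return 0.0
    # Poisson pmf computed on the log scale to avoid overflow in y!
    poisson = math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))
    if y == 0:
        return phi + (1.0 - phi) * poisson
    return (1.0 - phi) * poisson

# Illustrative values only (not estimates from the paper)
phi, lam = 0.3, 15.0
total = sum(zip_pmf(y, phi, lam) for y in range(200))  # should be close to 1
p0_zip = zip_pmf(0, phi, lam)        # zero probability with inflation
p0_poisson = zip_pmf(0, 0.0, lam)    # zero probability of a plain Poisson
```

Setting `phi = 0.0` recovers the ordinary Poisson distribution, so `p0_zip` exceeds `p0_poisson` whenever ϕ > 0.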

3.1.2 Zero-inflated generalized Poisson model

The flexible zero-inflated generalized Poisson (ZIGP) model is used to adjust for dispersion in an otherwise ZIP model with an additional dispersion parameter ω, which is positive for over-dispersed and negative for under-dispersed data. Let Y_ij denote the average daily cigarette consumption for subjects i = 1, . . . , n and at times j = 1, . . . , J_i. Then Y_ij has the following distribution:

Y_{ij} ~ \begin{cases} 0 & \text{with probability } ϕ_{ij}, \\ \text{generalized Poisson}(λ_{ij}, ω) & \text{with probability } 1 - ϕ_{ij}, \end{cases}

where λ_ij and the zero-inflation parameter ϕ_ij are as described in Section 3.1.1. Typically, the range of ω is $\max \{-1, -\frac{λ_{ij}}{N}\} \leq ω < 1$, corresponding to counts y_ij = 0, 1, 2, . . . , N, where ω = 0 reduces to the ZIP model. Note that counts with no upper bound (N = ∞) lead to ω ∈ [0, 1).²⁵

The probability distribution of the longitudinal ZIGP model is

\begin{aligned} P(Y_{ij} = 0 ∣ Z_{ij}, X_{ij}) &= ϕ_{ij} + (1 - ϕ_{ij}) e^{-(1 - ω) λ_{ij}}, \\ P(Y_{ij} = y_{ij} ∣ Z_{ij}, X_{ij}) &= (1 - ϕ_{ij}) (1 - ω) λ_{ij} \frac{[(1 - ω) λ_{ij} + ω y_{ij}]^{y_{ij} - 1}}{y_{ij}!} e^{-[(1 - ω) λ_{ij} + ω y_{ij}]}, \quad y_{ij} = 1, 2, 3, \dots, \end{aligned}

where 0 ≤ ϕ_ij ≤ 1 and λ_ij > 0.
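The ZIGP probabilities can be checked numerically in the same way. This is a minimal sketch that assumes 0 ≤ ω < 1 (the unbounded-count case); the function name `zigp_pmf` and the parameter values are hypothetical.

```python
import math

def zigp_pmf(y, phi, lam, omega):
    """P(Y = y) for the zero-inflated generalized Poisson above.
    Assumes 0 <= omega < 1 so that every term is well defined."""
    if y < 0:
        return 0.0
    if y == 0:
        return phi + (1.0 - phi) * math.exp(-(1.0 - omega) * lam)
    rate = (1.0 - omega) * lam + omega * y
    log_gp = (math.log((1.0 - omega) * lam) + (y - 1) * math.log(rate)
              - rate - math.lgamma(y + 1))
    return (1.0 - phi) * math.exp(log_gp)

# Illustrative values only
phi, lam, omega = 0.3, 5.0, 0.3
probs = [zigp_pmf(y, phi, lam, omega) for y in range(400)]
total = sum(probs)                                # close to 1
mean = sum(y * p for y, p in enumerate(probs))    # close to (1 - phi) * lam
```

With `omega = 0.0` the function returns the ZIP probabilities, matching the reduction noted in the text.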

For both the ZIP and ZIGP distributions, logit(ϕ_ij) and log(λ_ij) are the natural link functions for the Bernoulli probability of success (abstinence) and the mean of the Poisson part, respectively. To account for within-subject dependence in the counting process, we let

logit (ϕ_{i j}) = α Z_{i j} + u_{i},

(1)

log (λ_{i j}) = β X_{i j} + v_{i},

(2)

where α and β are coefficient row vectors, Z_ij and X_ij are covariate vectors, and u_i and v_i are subject-specific random effects accounting for within-subject correlation in each model part.

Owing to the flexibility and straightforward mathematical properties of the Gaussian distribution, we assume that the random effects are independent and identically distributed bivariate normal random vectors for all subjects i = 1, ..., n, i.e.,

\begin{pmatrix} u_{i} \\ v_{i} \end{pmatrix} ~ N (0, \Sigma_{u, v}),

where

\Sigma_{u, v} = \begin{bmatrix} σ_{u}^{2} & ρ σ_{u} σ_{v} \\ ρ σ_{u} σ_{v} & σ_{v}^{2} \end{bmatrix} .

$σ_{u}^{2}$ and $σ_{v}^{2}$ are variance components in the Bernoulli and Poisson or GP part of the model, respectively, and ρ captures the correlation between those two parts. Intuitively, a positive u_i increases the probability of smoking abstinence; a positive v_i increases average daily cigarette consumption, while ρ reflects the correlation between the probability of smoking abstinence and average daily cigarette consumption.
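To make the roles of the two random effects concrete, the sketch below inverts the link functions in equations (1) and (2) for a single observation. The coefficient and covariate values are illustrative assumptions, and `zip_parameters` is a hypothetical helper name.

```python
import math

def inv_logit(x):
    """Inverse of the logit link: maps a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def zip_parameters(alpha, z, beta, x, u, v):
    """Return (phi, lam) from equations (1) and (2) for one observation."""
    eta_zero = sum(a * zk for a, zk in zip(alpha, z)) + u
    eta_count = sum(b * xk for b, xk in zip(beta, x)) + v
    return inv_logit(eta_zero), math.exp(eta_count)

# Illustrative coefficients and covariates (intercept, treatment, time)
alpha, beta = [0.8, -0.3, 0.2], [0.5, 0.4, -0.2]
covariates = [1.0, 1.0, 2.0]
phi_avg, lam_avg = zip_parameters(alpha, covariates, beta, covariates, u=0.0, v=0.0)
phi_hi_u, _ = zip_parameters(alpha, covariates, beta, covariates, u=1.0, v=0.0)
_, lam_hi_v = zip_parameters(alpha, covariates, beta, covariates, u=0.0, v=1.0)
```

A subject with u_i = 1 has a higher abstinence probability than an average subject (u_i = 0), and a subject with v_i = 1 has an expected count that is larger by a factor of e.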

3.2 Cox proportional hazard submodel

We use the Cox proportional hazards model to model time to terminal events (i.e., dropout). Let t_i be the observed dropout time since study onset for patient i and δ_i be the censoring indicator (δ_i = 1 if dropout is observed, and δ_i = 0 if it is censored). The terminal event intensity (or hazard) at time t_i is

h (t_{i}) = h_{0} (t_{i}) exp {γ W_{i} + ν_{1} u_{i} + ν_{2} v_{i}},

(3)

where h₀(·) is the baseline hazard function and γ represents coefficients for the risk factors in the fixed covariate matrix W_i, which can share part of or all covariates in Z_i and X_i defined in Section 3.1.2. The u_i, v_i are shared random effects accounting for the correlation between the longitudinal and survival processes and ν = (ν₁, ν₂)′ reflects the strength of association between the counts and the survival submodels.

We adopt a piecewise constant baseline hazard to approximate the baseline hazard function h₀(t), which yields good estimators of both the fixed and random effects.^26–28 To specify a piecewise constant (piecewise exponential) baseline hazard, we set a series of fixed cut points 0 = τ₀ < τ₁ < ··· < τ_m, with τ_m equal to the maximum observation time in the dataset, and assume that the baseline hazard is constant within each of the m intervals (τ_{k−1}, τ_k], with h₀ = (h_{0,1}, h_{0,2}, . . . , h_{0,m}). We select the number of cut points to allow at least ten dropout events per interval.
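A piecewise constant baseline hazard makes the cumulative hazard a simple sum over intervals. The sketch below assumes one hazard value per interval between consecutive cut points; the cut points, rates, and function names are hypothetical.

```python
import math

def baseline_hazard(t, cuts, rates):
    """h0(t): the constant rate of the interval (cuts[k], cuts[k+1]] containing t."""
    for k, rate in enumerate(rates):
        if cuts[k] < t <= cuts[k + 1]:
            return rate
    raise ValueError("t must lie in (0, tau_m]")

def cumulative_baseline_hazard(t, cuts, rates):
    """H0(t): integral of h0 from 0 to t, accumulated interval by interval."""
    total = 0.0
    for k, rate in enumerate(rates):
        lo, hi = cuts[k], cuts[k + 1]
        if t > lo:
            total += rate * (min(t, hi) - lo)
    return total

# Hypothetical cut points (in years) and interval-specific rates
cuts = [0.0, 1.0, 3.0, 6.0]
rates = [0.2, 0.5, 0.1]
h_at_2 = baseline_hazard(2.0, cuts, rates)             # rate of the second interval
H_at_2 = cumulative_baseline_hazard(2.0, cuts, rates)  # 0.2 * 1 + 0.5 * 1
baseline_survival_at_2 = math.exp(-H_at_2)
```

Multiplying `H_at_2` by the exponential term of equation (3) gives the subject-specific cumulative hazard.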

We link the longitudinal submodel (1)–(2) and the survival submodel (3) via the random effects u_i and v_i and assume that the longitudinal and survival processes are independent given u_i and v_i. The joint model reduces to the independent model if the association coefficients ν₁ = ν₂ = 0. The joint modeling framework accounts for three correlated processes: abstinence from smoking, amount of smoking by those at risk of smoking over the study period, and time to terminal events.

Let the unknown-parameter vector be θ = (α, β, ω, Σ_u,v, γ, ν, h₀). The conditional likelihood of the observed longitudinal data y_i is

L_{y} (y_{i} ∣ u_{i}, v_{i}) = \prod_{j = 1}^{J_{i}} P (Y_{ij} = 0 ∣ α, β, ω, \Sigma_{u, v})^{I (Y_{ij} = 0)} \, P (Y_{ij} = y_{ij} ∣ α, β, ω, \Sigma_{u, v})^{I (Y_{ij} > 0)} .

The conditional likelihood of event outcomes t_i and δ_i for patient i is

L_{s} (t_{i}, δ_{i} ∣ u_{i}, v_{i}) = h (t_{i})^{δ_{i}} S (t_{i}),

where $S (t_{i}) = exp [- \int_{0}^{t_{i}} h (s) d s]$ denotes the survival function. Finally, the full likelihood of the joint model for patient i is

L (y_{i}, t_{i}, δ_{i}, u_{i}, v_{i}) = L_{y} (y_{i} ∣ u_{i}, v_{i}) L_{s} (t_{i}, δ_{i} ∣ u_{i}, v_{i}) p (u_{i}, v_{i}),

(4)

where p(u_i, v_i) is the joint density function of random effects u_i and v_i.

3.3 Bayesian inference

We use Bayesian inference based on Markov chain Monte Carlo (MCMC) simulations to infer the unknown parameters, summarizing them using posterior means and 95% credible intervals (CI). We select noninformative priors for all the parameters. Specifically, we use the normal distribution N(0, 100) for all components in α, β, γ, and ν. For the covariance matrix Σ_u,v, a commonly used conjugate prior is an inverse-Wishart distribution; however, the standard “noninformative” inverse-Wishart prior cannot guarantee noninformativeness with a single parameter to control the precision of all entries in the covariance matrix.²⁹ Thus, we use the Cholesky decomposition to set the prior distribution and rewrite the covariance matrix as Σ_u,v = LL′, where L is the corresponding lower triangular matrix. We specify a Uniform(0, 10) prior distribution for the diagonal elements in L to ensure positivity and a Normal(0, 100) prior distribution for the off-diagonal elements, allowing for possible negative correlations. This approach yields satisfactory results and has been widely used in longitudinal and survival analysis.^9,30 We obtain initial values for the model parameters by separately modeling the longitudinal and survival data with a zero-inflated count model and a Cox proportional hazards model.
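The Cholesky parameterization can be illustrated by mapping the three free elements of L back to the variance components and correlation. This is a minimal sketch; the element names `l11`, `l21`, and `l22` and the numerical values are assumptions chosen for illustration.

```python
import math

def covariance_from_cholesky(l11, l21, l22):
    """Build Sigma = L L' for L = [[l11, 0], [l21, l22]] and return
    the implied variances and correlation of (u_i, v_i)."""
    sigma_u2 = l11 * l11
    sigma_uv = l11 * l21
    sigma_v2 = l21 * l21 + l22 * l22
    rho = sigma_uv / math.sqrt(sigma_u2 * sigma_v2)
    return {"sigma_u2": sigma_u2, "sigma_v2": sigma_v2, "rho": rho}

# Elements of L chosen to give sigma_u^2 = 1.2, sigma_v^2 = 0.6, rho = -0.5
l11 = math.sqrt(1.2)
l21 = -0.5 * math.sqrt(0.6)
l22 = math.sqrt(0.6) * math.sqrt(1.0 - 0.25)
implied = covariance_from_cholesky(l11, l21, l22)
```

Positive diagonal elements guarantee a positive definite Σ_u,v, while the sign of the off-diagonal element determines the sign of ρ.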

Model fitting is performed by using the BUGS programming language³¹ by specifying the full likelihood and prior distributions of all unknown parameters. To analyze the real data, we run multiple chains with initial values suggested by results of fitting the independent model. As evidence of convergence, we use the Gelman–Rubin diagnostic to ensure that the scale reduction R̂ of all parameters is smaller than 1.1.³² We also examine trace and autocorrelation plots to determine the burn-in length and ensure chain convergence. Finally, we explore other choices of priors to ensure that the results are robust to prior specifications.³²
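For readers who want to reproduce the convergence check outside BUGS, the sketch below computes the classic (non-split) Gelman–Rubin statistic for one parameter from equal-length chains. It is a simplified stand-in for the diagnostics reported by MCMC software, and the simulated chains are purely illustrative.

```python
import math
import random
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])  # W
    between = n * statistics.variance(chain_means)                       # B
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)

rng = random.Random(1)
# Two chains sampling the same distribution (well mixed)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(2)]
# Second chain shifted away from the first (not converged)
stuck = [mixed[0], [x + 5.0 for x in mixed[1]]]
r_hat_mixed = gelman_rubin(mixed)
r_hat_stuck = gelman_rubin(stuck)
```

Chains that explore the same distribution give a value near 1, whereas chains centered at different values give a value well above the 1.1 threshold.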

3.4 Model selection

From a wide variety of model-selection criteria in the Bayesian toolbox, we adopt the deviance information criterion (DIC), the expected Akaike information criterion (EAIC), the expected Bayesian (or Schwarz) information criterion (EBIC), and log-pseudo-marginal likelihood (LPML).^33,34

The DIC provides an assessment of model fit and a penalty for model complexity. Let f(y|θ) represent the likelihood function for the observed data y given the parameter vector θ, and let h(y) be a standardizing function of the data alone. The deviance is defined as D(θ) = −2 log f(y|θ) + 2 log h(y). The DIC is computed as DIC = D̄ + p_D, where D̄ = E_{θ|y}[D(θ)] is the posterior expectation of the deviance, p_D = D̄ − D(θ̄) is the effective number of parameters, which captures model complexity, and D(θ̄) = D(E_{θ|y}[θ]) is the deviance evaluated at the posterior mean of the parameters. A smaller value of DIC indicates a better model fit. We estimate EAIC and EBIC by $\hat{EAIC} = - 2 \bar{D} + 2 ν$ and $\hat{EBIC} = - 2 \bar{D} + ν log (n)$ , where, in this section only, ν is the number of parameters in the model and n is the number of subjects. Models with smaller EAIC and EBIC are preferred.
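The DIC calculation can be sketched from MCMC output as follows. The toy model (normal observations with unit variance, so that the deviance reduces to a sum of squared residuals once constants are absorbed into h(y)) and the stand-in posterior draws are assumptions for illustration only.

```python
def dic_summary(deviance_draws, deviance_at_posterior_mean):
    """Return (D_bar, p_D, DIC) from deviance values evaluated at MCMC draws."""
    d_bar = sum(deviance_draws) / len(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar, p_d, d_bar + p_d

# Toy data and stand-in posterior draws of a single mean parameter theta
y = [1.0, 2.0, 3.0]
theta_draws = [1.5, 2.0, 2.5]

def deviance(theta):
    """Deviance of y_k ~ Normal(theta, 1), up to the standardizing term."""
    return sum((yk - theta) ** 2 for yk in y)

theta_bar = sum(theta_draws) / len(theta_draws)
d_bar, p_d, dic = dic_summary([deviance(t) for t in theta_draws], deviance(theta_bar))
```

Here the average deviance is 2.5 and the deviance at the posterior mean is 2.0, so p_D = 0.5 and DIC = 3.0.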

We also compute the conditional predictive ordinate (CPO) statistics for model selection.³⁴ CPO is derived from the posterior distribution and can be written as CPO_i = ∫ f(y_i|θ)p(θ|D⁽⁻ⁱ⁾)dθ, where D⁽⁻ⁱ⁾ is the full data with observation i deleted. The LPML, defined as $LPML = \sum_{i = 1}^{n} log ({\hat{CPO}}_{i})$ , is used to summarize the CPO_i values.³⁵ Larger values of LPML indicate a better fit.
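In practice CPO_i is usually estimated from the MCMC draws rather than by refitting the model without observation i. The sketch below uses the standard harmonic-mean estimator, i.e., the inverse of the average of 1/f(y_i|θ⁽ˢ⁾) over posterior draws; this estimator is an assumption here, since the text does not state which one was used, and the function names are hypothetical.

```python
import math

def log_cpo(loglik_draws):
    """log CPO_i from log f(y_i | theta^(s)), s = 1..S, via the
    harmonic-mean estimator, evaluated stably on the log scale."""
    neg = [-value for value in loglik_draws]
    top = max(neg)
    log_mean_inverse = top + math.log(sum(math.exp(x - top) for x in neg) / len(neg))
    return -log_mean_inverse

def lpml(loglik_by_subject):
    """Sum of log CPO_i over subjects; each row holds the draws for one subject."""
    return sum(log_cpo(row) for row in loglik_by_subject)

# Hypothetical likelihood values for two subjects and two posterior draws
loglik = [[math.log(0.5), math.log(0.25)],
          [math.log(0.2), math.log(0.2)]]
lpml_value = lpml(loglik)
```

For the first subject the average inverse likelihood is (2 + 4)/2 = 3, so the estimated CPO is 1/3; for the second it equals the common value 0.2.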

4 Simulation Study

We perform extensive simulation studies to compare the performance of the proposed joint models with that of separate independent models. We design the first simulation study to have no correlation between the event time and the zero-inflated longitudinal counts (i.e., ν₁ = 0, ν₂ = 0) and the second simulation study to have a strong correlation (i.e., ν₁ = −0.6, ν₂ = 0.4 for the ZIP joint model and ν₁ = −0.3, ν₂ = 1.2 for the ZIGP joint model). We consider 500 datasets, each with sample size N = 1600. We generate the longitudinal zero-inflated counts under both ZIP and ZIGP distributions. The simulated data structure is similar to that of the ATBC study.

The joint model for the simulation study is

\begin{aligned} logit (ϕ_{ij}) &= α_{0} + α_{1} I ({trt}_{i}) + α_{2} {Time}_{j} + u_{i}, \\ log (λ_{ij}) &= β_{0} + β_{1} I ({trt}_{i}) + β_{2} {Time}_{j} + v_{i}, \\ h (t_{i}) &= h_{0} (t_{i}) \exp \{γ I ({trt}_{i}) + ν_{1} u_{i} + ν_{2} v_{i}\}, \end{aligned}

where I(trt_i) is a binary treatment indicator for subject i and j = 1, . . . , 6 indexes the time points. We set α = (0.8, −0.3, 0.2)′, β = (0.5, 0.4, −0.2)′, and γ = −1.5. For the random effects covariance matrix, we choose ρ = −0.5, $σ_{u}^{2} = 1.2$ , and $σ_{v}^{2} = 0.6$ for ZIP and $σ_{u}^{2} = 1.5$ and $σ_{v}^{2} = 0.7$ for ZIGP. We set the dispersion parameter ω in the ZIGP distribution to 0.3.

The time to terminal event is simulated by using the Cox submodel with a constant baseline hazard, i.e., h₀ = 1. The independent censoring time is generated from a Uniform(2, 6) distribution resulting in 30% censoring, which is similar to that in the ATBC study. We set δ_i = 1 if the event time t_i is not greater than the censoring time and 0 otherwise.
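The event-time generation can be sketched for a single subject as follows. With a constant baseline hazard the event time is exponential with rate equal to the subject-specific hazard. The function name and default arguments are illustrative (defaults follow the ZIP design stated above), and the correlated random effects are drawn by a hand-written two-dimensional Cholesky step, which is one convenient choice rather than the authors' stated procedure.

```python
import math
import random

def simulate_subject(rng, trt, gamma=-1.5, nu=(-0.6, 0.4), h0=1.0,
                     sigma_u2=1.2, sigma_v2=0.6, rho=-0.5):
    """Draw (u, v, observed time, event indicator) for one subject."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    u = math.sqrt(sigma_u2) * z1
    v = math.sqrt(sigma_v2) * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    hazard = h0 * math.exp(gamma * trt + nu[0] * u + nu[1] * v)
    event_time = rng.expovariate(hazard)     # constant hazard => exponential time
    censor_time = rng.uniform(2.0, 6.0)      # independent censoring
    t_obs = min(event_time, censor_time)
    delta = 1 if event_time <= censor_time else 0
    return u, v, t_obs, delta

rng = random.Random(2016)
subjects = [simulate_subject(rng, trt=i % 2) for i in range(1600)]
event_fraction = sum(d for _, _, _, d in subjects) / len(subjects)
```

Setting `nu=(0.0, 0.0)` gives the first simulation scenario, in which the event time does not depend on the random effects.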

We apply the Bayesian approach described in Section 3.3 to obtain samples from the posterior distributions of the parameters of interest. The OpenBUGS code for the ZIP joint model is included in the online appendix as an example. For each dataset in the simulation study, we run two parallel MCMC chains with over-dispersed initial values. We run both chains for 60 000 iterations. The first 40 000 iterations are discarded as burn-in; the remaining 20 000 samples are used to determine the posterior distributions of the parameters of interest. Trace plots indicate good mixing properties of the chains for the model parameters, and the scale reduction R̂ of all parameters is smaller than 1.1. Thus, convergence is achieved.

Through the simulation studies, we want to assess accuracy of both the joint and independent models and determine how parameter estimates, standard errors, and coverage probabilities are affected by ignoring the nonignorable censoring owing to dropout. To that end, we compute the bias (the average of the posterior means minus the true values), standard error (SE: the square root of the average of the posterior variance), standard deviation (SD: the standard deviation of the posterior means), and coverage probabilities (CPs) of 95% equal-tail credible intervals, all displayed in Tables 1 and 2.
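The four performance measures can be computed per parameter from the replicate-level posterior summaries. The sketch below is a minimal illustration; the function name and the three replicate values are hypothetical.

```python
import math
import statistics

def summarize(post_means, post_vars, lowers, uppers, truth):
    """Bias, SE, SD, and coverage probability across simulated datasets."""
    bias = statistics.fmean(post_means) - truth
    se = math.sqrt(statistics.fmean(post_vars))   # root of the mean posterior variance
    sd = statistics.stdev(post_means)             # spread of the posterior means
    covered = [1.0 if lo <= truth <= hi else 0.0 for lo, hi in zip(lowers, uppers)]
    return {"bias": bias, "se": se, "sd": sd, "cp": statistics.fmean(covered)}

# Hypothetical results from three replicates for a parameter with true value 0.8
result = summarize(post_means=[0.7, 0.8, 0.9],
                   post_vars=[0.01, 0.01, 0.01],
                   lowers=[0.5, 0.6, 0.85],
                   uppers=[0.9, 1.0, 1.1],
                   truth=0.8)
```

In this toy input the third interval misses the true value, so the coverage is 2/3, while SE and SD both equal 0.1.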

Table 1.

Simulation results for the independent and joint models when the terminal event is independent of the longitudinal outcomes under both the ZIP and ZIGP distributions.

Parameter	True	Independent model				Joint model
Parameter	True	Bias	SE	SD	CP	Bias	SE	SD	CP
ZIP model
For zero-inflated outcomes
α₀	0.800	−0.006	0.102	0.132	0.956	−0.012	0.110	0.129	0.954
α₁	−0.300	0.001	0.108	0.134	0.940	0.001	0.112	0.125	0.926
α₂	0.200	0.001	0.025	0.032	0.946	0.001	0.025	0.031	0.960
β₀	0.500	−0.011	0.133	0.177	0.948	−0.022	0.135	0.166	0.948
β₁	0.400	0.010	0.170	0.225	0.956	−0.004	0.216	0.237	0.938
β₂	−0.200	−0.001	0.063	0.082	0.954	0.013	0.172	0.211	0.964
ρ	−0.500	0.029	0.153	0.201	0.950	0.008	0.166	0.192	0.942
σ_u²	1.200	0.157	0.370	0.491	0.942	0.108	0.366	0.455	0.936
σ_v²	0.600	0.027	0.082	0.110	0.944	0.030	0.085	0.104	0.944
For survival
γ	−1.500	−0.004	0.083	0.111	0.950	−0.020	0.085	0.107	0.952
h₀	1.000	0.005	0.050	0.067	0.956	0.014	0.051	0.065	0.954
ν₁	0.000					−0.013	0.144	0.268	0.956
ν₂	0.000					−0.020	0.164	0.269	0.952
ZIGP model
For zero-inflated outcomes
α₀	0.800	0.004	0.155	0.145	0.926	0.003	0.151	0.166	0.938
α₁	−0.300	0.008	0.140	0.135	0.940	0.004	0.139	0.152	0.944
α₂	0.200	0.002	0.037	0.039	0.948	0.001	0.039	0.045	0.948
β₀	0.500	−0.011	0.183	0.189	0.946	−0.011	0.177	0.211	0.960
β₁	0.400	0.009	0.213	0.218	0.946	−0.003	0.205	0.243	0.952
β₂	−0.200	−0.012	0.084	0.083	0.948	−0.007	0.082	0.097	0.962
ω	0.300	−0.000	0.049	0.047	0.948	−0.003	0.051	0.053	0.930
ρ	−0.500	0.074	0.194	0.199	0.932	0.048	0.221	0.225	0.902
σ_u²	1.500	0.343	0.616	0.658	0.912	0.275	0.646	0.727	0.904
σ_v²	0.700	0.039	0.140	0.135	0.932	0.035	0.136	0.149	0.948
For survival
γ	−1.500	−0.006	0.088	0.091	0.956	−0.023	0.091	0.104	0.942
h₀	1.000	0.003	0.052	0.053	0.958	0.012	0.053	0.062	0.954
ν₁	0.000					−0.016	0.142	0.283	0.950
ν₂	0.000					−0.023	0.171	0.301	0.964

SE: standard error; SD: standard deviation; CP: coverage probability of the 95% credible interval.

Table 2.

Simulation results using the independent and joint models when the terminal event depends on the longitudinal outcomes under both ZIP and ZIGP distributions.

Parameter	True	Independent model				Joint model
Parameter	True	Bias	SE	SD	CP	Bias	SE	SD	CP
ZIP model
For zero-inflated outcomes
α₀	0.800	−0.069	0.064	0.070	0.772	−0.006	0.061	0.068	0.940
α₁	−0.300	0.057	0.065	0.074	0.862	−0.005	0.066	0.075	0.946
α₂	0.200	−0.030	0.017	0.019	0.554	0.001	0.018	0.020	0.956
β₀	0.500	0.058	0.088	0.097	0.890	−0.005	0.088	0.097	0.922
β₁	0.400	−0.060	0.104	0.119	0.908	0.001	0.104	0.120	0.954
β₂	−0.200	0.084	0.036	0.041	0.358	0.001	0.038	0.043	0.958
ρ	−0.500	−0.033	0.099	0.112	0.900	−0.006	0.098	0.109	0.924
σ_u²	1.200	−0.080	0.203	0.227	0.906	0.014	0.212	0.237	0.934
σ_v²	0.600	0.016	0.058	0.063	0.922	0.013	0.054	0.061	0.934
For survival
γ	−1.500	0.198	0.083	0.101	0.382	−0.013	0.097	0.122	0.972
h₀	0.300	0.013	0.015	0.017	0.846	0.001	0.018	0.021	0.960
ν₁	−0.600					−0.045	0.146	0.177	0.946
ν₂	0.400					−0.038	0.174	0.202	0.938
ZIGP model
For zero-inflated outcomes
α₀	0.800	−0.095	0.097	0.091	0.782	−0.009	0.081	0.083	0.940
α₁	−0.300	0.131	0.083	…	0.626	0.005	0.083	…	0.944
α₂	0.200	−0.088	0.026	0.024	0.048	−0.001	0.024	…	0.944
β₀	0.500	0.042	0.118	…	0.936	−0.011	0.105	0.114	0.974
β₁	0.400	−0.058	0.125	0.130	0.936	−0.001	0.121	0.132	0.962
β₂	−0.200	0.051	0.038	0.040	0.734	0.001	0.044	0.043	0.958
ω	0.300	0.020	0.031	0.032	0.882	0.001	0.029	0.030	0.958
ρ	−0.500	0.050	0.126	0.128	0.904	0.006	0.129	0.123	0.930
σ_u²	1.500	0.030	0.335	0.314	0.916	0.093	0.335	…	0.944
σ_v²	0.700	−0.059	0.092	0.087	0.866	0.011	0.078	0.085	0.960
For survival
γ	−1.500	0.343	0.097	0.088	0.052	−0.006	0.117	0.119	0.956
h₀	0.300	0.010	0.022	0.015	0.812	0.001	0.023	0.022	0.944
ν₁	−0.300					−0.002	0.145	0.191	0.938
ν₂	1.200					0.003	0.189	0.228	0.932

SE: standard error; SD: standard deviation; CP: coverage probability of the 95% credible interval; …: value missing.

In simulation study 1, the data are generated by using the independence model, i.e., the longitudinal counts are not correlated with the terminal event. Table 1 presents the model estimates for the data under both the ZIP and ZIGP distributions for the zero-inflated longitudinal counts. The independent and joint models both result in negligible bias, SEs close to the SDs, and CPs reasonably close to the nominal 95%. We notice that the biases of ρ and $σ_{u}^{2}$ are slightly larger than those of the other parameters and that their CPs deviate slightly from 0.95, indicating difficulty in distinguishing the random effects.¹¹ Of note, the estimates of the association parameters ν₁ and ν₂ in the joint models are correctly near 0, and their CPs are close to 95%. The fact that over-parameterized joint models still provide unbiased and efficient estimates of the parameters indicates that the joint approach is robust to model over-parameterization.

In simulation study 2, the data are generated by using the joint model with zero-inflated longitudinal counts, where the terminal event is correlated with the longitudinal counts through ν₁ = −0.6, ν₂ = 0.4 for the ZIP joint model and ν₁ = −0.3, ν₂ = 1.2 for the ZIGP joint model. Table 2 lists the model estimates for the data under both the ZIP and ZIGP distributions for the zero-inflated longitudinal counts. In general, use of a joint model results in negligible bias, small SEs close to the SDs, and CPs reasonably close to the nominal 95%. However, fitting the independent model to the correlated data results in biased estimates of most parameters and CPs far from the nominal value. These results demonstrate that the joint model can generally recover the true values of parameters in the presence of nonignorable censoring.

5 Data Analysis

We apply the proposed joint model and independent model to the motivating ATBC study. The primary outcome of interest is daily cigarette count, which is zero-inflated with a mean of 15 and range [0, 90] over the study period. The participants are scheduled to visit every four months after the baseline visit and the median follow-up duration is 2.7 years. In the joint model, fixed effects in the longitudinal submodel include age (centered at 57, denoted by Age), years of smoking (centered at 35, denoted by Smokyrs), baseline cigarette counts (centered at 20, denoted by Bsmokcount), and year from baseline (denoted by Visit, not mean centered). Frailty terms in the survival submodel are shared random effects from the two corresponding parts in the zero-inflated longitudinal submodel; the coefficients of those random effects govern the strength of the correlation between the longitudinal and survival submodels. The survival submodel only includes baseline smoking count because it is the only significant predictor.

The longitudinal zero-inflated cigarette count is modeled by using the ZIP and ZIGP models described in Sections 3.1.1 and 3.1.2 with the following linear predictors:

logit (ϕ_{i j}) = α_{0} + α_{1} {Smokyrs}_{i} + α_{2} {Age}_{i} + α_{3} {Visit}_{i j} + α_{4} {Bsmokcount}_{i} + u_{i},

(5)

log (λ_{i j}) = β_{0} + β_{1} {Smokyrs}_{i} + β_{2} {Age}_{i} + β_{3} {Visit}_{i j} + β_{4} {Bsmokcount}_{i} + v_{i} .

(6)

The “survival” time is time to dropout and the model for the dropout hazard is

h (t_{i}) = h_{0} (t_{i}) \exp \{γ {Bsmokcount}_{i} + ν_{1} u_{i} + ν_{2} v_{i}\} .

(7)

First, we fit the ZIGP model for the longitudinal smoking counts in the ATBC data, and the results suggest that the data do not display strong dispersion since the dispersion parameter ω approaches 0. As described in Section 3.1.2, the ZIGP model reduces to the ZIP model in this circumstance. Therefore, for the data analysis, we only consider joint and independent models under the ZIP framework. Table 3 compares those models by using the model-selection criteria described in Section 3.4. The joint model performs better than the corresponding independence model with smaller DIC, EAIC, and EBIC values and larger LPML values, suggesting that the joint model is preferable.

Table 3.

Model comparison statistics for the ATBC dataset.

Criterion	Model
Criterion	ZIP independent model	ZIP joint model
DIC	3 941 316	3 937 350
EAIC	1 971 899	1 969 921
EBIC	1 972 031	1 970 069
LPML	−1 002 022	−999 228.6


DIC: deviance information criterion; EAIC: expected AIC; EBIC: expected BIC; LPML: log-pseudo-marginal likelihood.

Table 4 summarizes the posterior mean, SD, median, and 95% equal-tail credible interval of the parameters from both the joint and independent models. In the joint model, the results of the longitudinal submodel suggest that older participants have a higher probability of smoking abstinence (OR = 1.139, 95% CI [1.118, 1.161]) and lower daily cigarette consumption (RR = 0.991, 95% CI [0.990, 0.992]). Moreover, smoking abstinence is less likely for participants with more prior smoking years (OR = 0.905, 95% CI [0.894, 0.914]), and their daily cigarette consumption is higher (RR = 1.003, 95% CI [1.003, 1.004]). Participants are more likely to be abstinent and have lower cigarette consumption further from baseline. Importantly, the results of the survival submodel suggest that higher baseline cigarette consumption increases the hazard of dropout.
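The odds ratios (OR) and rate ratios (RR) quoted above are obtained by exponentiating the corresponding coefficients in Table 4. The short sketch below reproduces three of them from the rounded posterior means; the dictionary names are illustrative, and small differences in the third decimal can arise because the table reports rounded coefficients.

```python
import math

# Posterior means copied from Table 4 (joint model); names are illustrative
coef_abstinence = {"Age": 0.131, "Smoke years": -0.100}   # logit scale
coef_count = {"Age": -0.009}                              # log scale

odds_ratios = {name: math.exp(b) for name, b in coef_abstinence.items()}
rate_ratios = {name: math.exp(b) for name, b in coef_count.items()}
```

For example, exponentiating the smoking-years coefficient of −0.100 gives an odds ratio of about 0.905 per additional year of smoking.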

Table 4.

Results of fitting both the ZIP joint and independent models in the ATBC dataset.

	Joint model				Independent model
Variables	Mean	SD	Median	95% CI	Mean	SD	Median	95% CI
Probability of smoking abstinence
Intercept	−9.172	0.068	−9.172	[−9.305, −9.047]	−8.969	0.340	−9.060	[−9.277, −8.006]
Smoke years	−0.100	0.006	−0.100	[−0.112, −0.090]	−0.098	0.007	−0.097	[−0.112, −0.085]
Age	0.131	0.010	0.131	[0.112, 0.149]	0.126	0.011	0.123	[0.109, 0.149]
Time (year)	0.974	0.007	0.974	[0.960, 0.988]	0.962	0.025	0.970	[0.903, 0.985]
Baseline cigarettes	−0.098	0.005	−0.098	[−0.107, −0.089]	−0.102	0.010	−0.103	[−0.117, −0.077]
Expected number of cigarettes per day
Intercept	2.882	0.002	2.882	[2.879, 2.885]	2.883	0.006	2.882	[2.877, 2.904]
Smoke years	0.003	0.000	0.003	[0.003, 0.004]	0.003	0.000	0.003	[0.002, 0.004]
Age	−0.009	0.000	−0.009	[−0.010, −0.008]	−0.009	0.000	−0.009	[−0.010, −0.008]
Time (year)	−0.015	0.000	−0.015	[−0.016, −0.015]	−0.015	0.000	−0.015	[−0.016, −0.014]
Baseline cigarettes	0.041	0.000	0.041	[0.041, 0.042]	0.041	0.000	0.041	[0.040, 0.042]
Time to dropout
Baseline cigarettes	0.011	0.001	0.011	[0.009, 0.013]	0.011	0.001	0.011	[0.009, 0.013]
ν₁	−0.009	0.003	−0.009	[−0.015, −0.003]
ν₂	−0.038	0.059	−0.038	[−0.147, 0.080]
Random effects
ρ	−0.237	0.009	−0.237	[−0.256, −0.219]	−0.126	0.249	−0.217	[−0.442, 0.522]
σ_u	5.136	0.048	5.137	[5.038, 5.227]	5.055	0.171	5.095	[4.561, 5.254]
σ_v	0.243	0.001	0.243	[0.241, 0.246]	0.250	0.010	0.246	[0.241, 0.283]

The coefficients of the random effects are ν₁ = −0.009 (95% CI [−0.015, −0.003]) and ν₂ = −0.038 (95% CI [−0.147, 0.080]). The significant coefficient ν₁ in the zero part, whose credible interval excludes zero, indicates that participants with a lower probability of abstinence are more likely to drop out; the interval for ν₂ includes zero. The association between the longitudinal and survival submodels confirms our observations in Section 2. Table 4 also shows a negative correlation coefficient (ρ = −0.237, 95% CI [−0.256, −0.219]) between the random effects in the zero and Poisson parts, confirming the expected result that participants who smoke more cigarettes have a lower probability of abstinence.

We also examine our model’s ability to provide subject-specific predictions conditional on the observed covariates in the ATBC study. We randomly choose two groups of participants: Group 1 includes participants with varying numbers of follow-up visits, and Group 2 includes only subjects with 20 follow-up visits at three visits per year. The participants in each group are randomly selected from those with different average daily cigarette counts and distinct smoking abstinence patterns. We save the estimated subject-specific random effects from the MCMC samples and compute the estimated average daily cigarette count for subject i at visit j by rounding the value of (1 − ϕ̂_ij) λ̂_ij, where ϕ̂_ij and λ̂_ij are computed from Equations (5) and (6) using the estimated parameters. This is an informal method of evaluating the performance of the final model.
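The prediction step can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it assumes that Equations (5) and (6) use a logit link for the abstinence probability ϕ_ij and a log link for the Poisson mean λ_ij, each with an additive subject-specific random effect. The function names, covariate layout, and numerical inputs are hypothetical.

```python
import numpy as np

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))

def predict_zip_mean(x_zero, alpha, u, x_count, beta, v):
    """Rounded ZIP mean (1 - phi) * lambda for each visit of one subject.

    x_zero, x_count : (n_visits, p) covariate matrices for the zero and count parts
    alpha, beta     : coefficient vectors (e.g. posterior means)
    u, v            : estimated random effects of the subject (zero part, count part)
    """
    phi = expit(x_zero @ alpha + u)      # probability of abstinence at each visit
    lam = np.exp(x_count @ beta + v)     # expected cigarettes per day if not abstinent
    return np.rint((1.0 - phi) * lam).astype(int)

# Hypothetical subject with two visits and intercept-only covariates:
# phi = 0.5 and lambda = 20 at every visit.
x = np.ones((2, 1))
pred = predict_zip_mean(x, np.array([0.0]), 0.0, x, np.array([np.log(20.0)]), 0.0)
print(pred)
```

Because the prediction is a marginal mean over the two mixture components, it declines smoothly as the abstinence probability rises, whereas observed counts drop to zero abruptly when a participant quits.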

Table 5 summarizes the predicted average daily number of cigarettes, while the heatmaps of the observed and estimated average daily cigarette counts for both chosen groups are displayed in Figure 2. The observed and estimated average daily cigarette counts are very similar and have consistent trends. These results show that our model has a good capacity to reproduce the smoking patterns for randomly chosen individuals.

Table 5.

The observed and estimated average daily smoking counts and the estimated random effects of two groups of six subjects in the ATBC study.

(a) Group 1: observed average daily smoking counts at each visit (1–24).
Subject	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24
1	40	30	30	30
2	25	30	30	30	30	30	30	30	30	30	30	30	35	30
3	40	35	40	40	30	30	30	0	0	0	0	0	0	0
4	25	15	18	18	18	15	15	15	15	15	15	20	15	15	15
5	15	15	15	10	15	10	10	13	15	15	15	15	15	0	0	0	0	0	0	0	0	0	0	0
6	10	10	10	10	10	10	10	10	10	10	0	0	0	0	0	0	0	0	0	0	0	0	0	0

(b) Group 1: estimated average daily smoking counts at each visit (1–24).
Subject	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24
1	34	34	34	34
2	31	31	31	31	30	30	30	30	30	30	29	29	29	29
3	29	27	26	23	21	18	15	12	10	8	6	5	3	3
4	17	17	17	17	16	16	16	16	16	16	16	16	16	16	16
5	11	11	11	11	11	11	10	10	10	9	8	8	7	6	5	4	4	3	2	2	1	1	1	1
6	9	9	9	8	8	7	7	6	5	5	4	3	3	2	2	1	1	1	1	0	0	0	0	0

(c) Group 2: observed average daily smoking counts at each visit (1–20).
Subject	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20
1	6	6	6	6	6	6	6	6	6	6	6	6	6	6	6	6	6	6	6	6
2	20	20	20	20	20	20	20	20	20	20	20	20	20	20	20	20	20	20	20	20
3	17	17	15	20	20	20	20	20	20	20	20	20	20	20	20	20	20	20	20	5
4	20	20	20	20	20	15	20	20	20	20	20	0	0	0	0	0	0	0	0	0
5	20	20	20	20	20	20	20	15	15	15	20	17	15	15	20	15	20	0	0	0
6	20	20	20	20	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0

(d) Group 2: estimated average daily smoking counts at each visit (1–20).
Subject	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20
1	7	7	7	7	7	7	7	7	7	7	6	6	6	6	6	6	6	6	6	6
2	22	22	21	21	21	21	21	21	21	21	21	20	20	20	20	20	20	20	20	20
3	19	19	19	19	19	19	19	19	18	18	18	18	18	18	18	18	17	17	17	16
4	19	19	18	18	17	17	16	15	14	12	11	9	8	6	5	4	3	2	2	1
5	17	17	17	17	17	17	16	16	16	16	16	16	15	15	15	14	14	13	12	12
6	13	12	10	8	7	6	4	3	3	2	1	1	1	1	0	0	0	0	0	0

Figure 2.

Heatmaps of the observed and estimated average daily smoking counts in two groups of six subjects in the ATBC study.

6 Discussion

In this article, we propose a joint modeling framework to analyze longitudinal zero-inflated counts censored by terminal events. The joint model consists of a zero-inflated count submodel and a Cox survival submodel linked via shared random effects. It accommodates features that often coexist in longitudinal count data, including zero inflation, over-dispersion, and dependent censoring, and it yields better inference for covariate effects on longitudinal counts in the presence of informative dropout. Our simulation study demonstrates that the proposed joint models improve the accuracy of the parameter estimates compared with independent models in the presence of informative censoring. Moreover, our joint models are robust to misspecification, producing results comparable to those of the independent model when the terminal event is independent of the longitudinal process.

The results from the present ATBC study are consistent with previous findings. In a paper modeling smoking patterns via a latent transition model, Luo³⁶ showed that older people have a higher probability of making quit attempts, conditional on the corresponding random effects. In the current study, age similarly has a positive effect on the probability of smoking abstinence and a negative effect on average daily cigarette consumption. Our study adds to previous findings by modeling the cigarette counts directly and by investigating the actual reported daily cigarette consumption with respect to demographic and baseline covariates.

The joint model described herein is a shared random effects model that accommodates monotone missingness in longitudinal data and has the merit of easy implementation in the OpenBUGS software package. Because a mixture of non-monotone and monotone missingness may exist, future work may extend the joint model to account for non-monotone missingness. In addition, dynamic prediction of the time to a terminal event has recently become a research topic of significant clinical interest; future research could consider dynamic prediction under the joint model framework. In this paper, we use the shared random effects method to construct the joint model, which assumes that the two submodels are dependent only via their shared random effects. Alternative approaches, such as selection models and pattern-mixture models, can also accommodate missingness. In the future, these alternative methods can be investigated and compared with the current findings.

Supplementary Material

NIHMS818171-supplement-supp.pdf (22.3 KB, pdf)

Acknowledgments

This work was supported in part by the National Institutes of Health and the National Institute on Alcohol Abuse and Alcoholism, Grant No. R03AA020648. Sheng Luo’s research was supported in part by the National Institute of Neurological Disorders and Stroke under Award No. R01NS091307 and by the National Center for Advancing Translational Sciences under Award No. KL2-TR000370. The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing high-performance computing resources that have contributed to the research results reported in this article (http://www.tacc.utexas.edu). The authors also thank the UTSPH information technology staff for their technical support.

References

1. Maruotti A. A two-part mixed-effects pattern-mixture model to handle zero-inflation and incompleteness in a longitudinal setting. Biometrical Journal. 2011;53(5):716–734. doi: 10.1002/bimj.201000190.
2. DeSantis SM, Lazaridis C, Ji S, Spinale FG. Analyzing propensity matched zero-inflated count outcomes in observational studies. Journal of Applied Statistics. 2013;127:1–15. doi: 10.1080/02664763.2013.834296.
3. Hur K. A random effects zero-inflated Poisson regression model for clustered extra-zero count data. PhD dissertation. University of Illinois; Chicago: 1999.
4. Buu A, Li R, Tan X, Zucker RA. Statistical models for longitudinal zero-inflated count data with applications to the substance abuse field. Statistics in Medicine. 2012;31:4074–4086. doi: 10.1002/sim.5510.
5. Wang H, Heitjan DF. Modeling heaping in self-reported cigarette counts. Statistics in Medicine. 2008;27(19):3789–3804. doi: 10.1002/sim.3281.
6. Lambert D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics. 1992;34:1–14.
7. Ridout M, Hinde J, Demétrio CG. A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives. Biometrics. 2001;57(1):219–223. doi: 10.1111/j.0006-341x.2001.00219.x.
8. Zhu H, Luo S, DeSantis SM. Zero-inflated count models for longitudinal measurements with heterogeneous random effects. Statistical Methods in Medical Research. 2015;23(1):1–16. doi: 10.1177/0962280215588224.
9. Luo S. A Bayesian approach to joint analysis of multivariate longitudinal data and parametric accelerated failure time. Statistics in Medicine. 2014;33(4):580–594. doi: 10.1002/sim.5956.
10. Faucett CL, Thomas DC. Simultaneously modelling censored survival data and repeatedly measured covariates: a Gibbs sampling approach. Statistics in Medicine. 1996;15(15):1663–1685. doi: 10.1002/(SICI)1097-0258(19960815)15:15<1663::AID-SIM294>3.0.CO;2-1.
11. Henderson R, Diggle P, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics. 2000;1(4):465–480. doi: 10.1093/biostatistics/1.4.465.
12. Hatfield LA, Boye ME, Hackshaw MD, Carlin BP. Multilevel Bayesian models for survival times and longitudinal patient-reported outcomes with many zeros. Journal of the American Statistical Association. 2012;107(499):875–885.
13. Tsiatis AA, Davidian M. Joint modeling of longitudinal and time-to-event data: an overview. Statistica Sinica. 2004;14(3):809–834.
14. Wu L, Liu W, Yi GY, Huang Y. Analysis of longitudinal and survival data: joint modeling, inference methods, and issues. Journal of Probability and Statistics. 2012.
15. Yu M, Law NJ, Taylor JM, Sandler HM. Joint longitudinal-survival-cure models and their application to prostate cancer. Statistica Sinica. 2004;14(3):835–862.
16. Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997;53(1):330–339.
17. Tseng YK, Hsieh F, Wang JL. Joint modelling of accelerated failure time and longitudinal data. Biometrika. 2005;92(3):587–603.
18. Brown ER, Ibrahim JG. Bayesian approaches to joint cure-rate and longitudinal models with applications to cancer vaccine trials. Biometrics. 2003;59(3):686–693. doi: 10.1111/1541-0420.00079.
19. Xu J, Zeger SL. Joint analysis of longitudinal data comprising repeated measures and times to events. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2001;50(3):375–387.
20. Chi YY, Ibrahim JG. Joint models for multivariate longitudinal and multivariate survival data. Biometrics. 2006;62(2):432–445. doi: 10.1111/j.1541-0420.2005.00448.x.
21. Elashoff RM, Li G, Li N. An approach to joint analysis of longitudinal measurements and competing risks failure time data. Statistics in Medicine. 2007;26(14):2813–2835. doi: 10.1002/sim.2749.
22. Rizopoulos D, Verbeke G, Lesaffre E, Vanrenterghem Y. A two-part joint model for the analysis of survival and longitudinal binary data with excess zeros. Biometrics. 2008;64(2):611–619. doi: 10.1111/j.1541-0420.2007.00894.x.
23. Xie FC, Wei BC, Lin JG. Score tests for zero-inflated generalized Poisson mixed regression models. Computational Statistics and Data Analysis. 2009;53(9):3478–3489.
24. Virtamo J, Pietinen P, Huttunen J, Korhonen P, Malila N, Virtanen M, et al. Incidence of cancer and mortality following alpha-tocopherol and beta-carotene supplementation: a postintervention follow-up. JAMA: Journal of the American Medical Association. 2003;290(4):476–485. doi: 10.1001/jama.290.4.476.
25. Ntzoufras I. Bayesian modeling using WinBUGS. Vol. 698. John Wiley & Sons; 2011.
26. Lawless J, Zhan M. Analysis of interval-grouped recurrent-event data using piecewise constant rate functions. Canadian Journal of Statistics. 1998;26(4):549–565.
27. Feng S, Wolfe RA, Port FK. Frailty survival model analysis of the national deceased donor kidney transplant dataset using Poisson variance structures. Journal of the American Statistical Association. 2005;100(471).
28. Liu L, Huang X. Joint analysis of correlated repeated measures and recurrent events processes in the presence of death, with application to a study on acquired immune deficiency syndrome. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2009;58(1):65–81.
29. O’Malley AJ, Zaslavsky AM. Domain-level covariance analysis for multilevel survey data with structured nonresponse. Journal of the American Statistical Association. 2008;103(484):1405–1418.
30. Chen BE, Cook RJ, Lawless JF, Zhan M. Statistical methods for multivariate interval-censored recurrent events. Statistics in Medicine. 2005;24(5):671–691. doi: 10.1002/sim.1936.
31. Thomas A, O’Hara B, Ligges U, Sturtz S. Making BUGS open. R News. 2006;6(1):12–17.
32. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis. CRC Press; 2013.
33. Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2002;64(4):583–639.
34. Carlin BP, Louis TA. Bayesian Methods for Data Analysis. CRC Press; 2011.
35. Dey DK, Chen MH, Chang H. Bayesian approach for nonlinear random effects models. Biometrics. 1997;53(4):1239–1252.
36. Luo S. Joint analysis of stochastic processes with application to smoking patterns and insomnia. Statistics in Medicine. 2013;32(29):5133–5144. doi: 10.1002/sim.5906.
