Joint modeling of multivariate longitudinal measurements and survival data with applications to Parkinson’s disease

Bo He; Sheng Luo

doi:10.1177/0962280213480877

Author manuscript; available in PMC: 2014 Oct 16.

Published in final edited form as: Stat Methods Med Res. 2013 Apr 16;25(4):1346–1358. doi: 10.1177/0962280213480877

PMCID: PMC3883896 NIHMSID: NIHMS528523 PMID: 23592717

Abstract

In many clinical trials studying neurodegenerative diseases, including Parkinson’s disease (PD), multiple longitudinal outcomes are collected in order to fully explore the multidimensional impairment caused by these diseases. The follow-up of some patients can be stopped by an outcome-dependent terminal event, e.g. death or dropout. In this article, we develop a joint model that consists of a multilevel item response theory (MLIRT) model for the multiple longitudinal outcomes and a Cox proportional hazard model with piecewise constant baseline hazards for the event time data. Shared random effects are used to link the two models. Model inference is conducted in a Bayesian framework via Markov Chain Monte Carlo simulation implemented in the BUGS language. The proposed model is evaluated by simulation studies and is applied to the DATATOP study, a motivating clinical trial assessing the effect of tocopherol on PD among patients with early PD.

Keywords: joint model, item-response theory, latent variable, Markov Chain Monte Carlo, mixed model

1 Introduction

In many longitudinal studies and clinical trials, researchers collect longitudinal outcomes y. The follow-up may be stopped by a dependent terminal event (e.g. death or dropout) whose probability of occurrence is non-ignorable, i.e. dependent on unobserved values of outcomes or latent variables related to outcomes. The scientific focus is often to study changes in outcomes over time and/or to analyze the relationship between y and time to the terminal event. It has been shown that methods analyzing y alone are biased, while a properly specified joint model provides consistent estimates.¹ The joint modeling approach constructs two sub-models, one for the longitudinal data and one for the event time data, linked by a set of subject-specific random effects.² Many joint models involve a mixed effects model for the longitudinal data and a semiparametric Cox proportional hazard model for the event time.³ Many extensions have been proposed in the joint model literature, such as using both random effects and a latent stochastic process to link the two sub-models¹; using a spline-based approach to capture the non-linear shapes of subject-specific changes for longitudinal outcomes⁴; relaxation of the normality assumption on the random effects⁵; the incorporation of a cured fraction⁶; and multiple event times.⁷

However, in many clinical trials studying neurodegenerative diseases such as Parkinson’s disease (PD), Huntington disease, and Alzheimer’s disease, multiple longitudinal outcomes are collected to fully explore the multidimensional impairment caused by these diseases. To properly analyze these longitudinal data, one has to account for three sources of correlation, i.e. inter-source (different measures at the same visit time), longitudinal (same measure at different visit times), and cross correlation (different measures at different visits).⁸ Multivariate generalized linear mixed effects models have been applied to analyze the multiple longitudinal outcomes in the joint model.⁴ But the computation associated with the high-dimensional integration is complicated and time-consuming. An alternative approach is the latent variable model.⁹ Specifically, a continuous latent variable is introduced to represent patients’ underlying disease severity and the observed longitudinal data can be viewed as measurements of the latent variable. Because all outcomes share the same latent variable, the dimensionality of the data can be reduced and fewer parameters are needed. To this end, multilevel item response theory (MLIRT) models have been widely used to analyze longitudinal data in social, behavioral, and health sciences.^10–15 Within the MLIRT modeling framework, the observed measurements are viewed as imperfect manifestations of the interaction between subject-specific latent traits and measurement-specific parameters. The latent traits are regressed on covariates of interest (e.g. treatment and disease duration) as well as the confounding variables. All three sources of correlation are accounted for via either random effects or covariance matrix. 
Advantages of the MLIRT models include better reflection of multilevel data structure, simultaneous estimation of measurement-specific parameters and covariate effects, and accurate inference about high-level measures.^16,17 Marginal maximum-likelihood method¹⁸ and Bayesian method¹⁹ have been used for the MLIRT model inference. Skrondal and Rabe-Hesketh^20,21 have provided detailed description and summary of the IRT models.

In this article, we propose a joint model with a MLIRT sub-model for the multiple longitudinal data and a Cox proportional hazard sub-model for time to the dependent terminal event. Two sub-models are linked by random effects denoting the subject-specific disease characteristics. We develop a Bayesian approach via Markov Chain Monte Carlo (MCMC) method for parameter estimation. To the best of our knowledge, there has been no previous work on the joint analysis based on the MLIRT modeling framework. The rest of the article is organized as follows. In Section 2, we describe the joint model, Bayesian inference, and model selection criterion. In Section 3, we apply the joint model to a motivating study. In Section 4, simulation studies are conducted to assess the performance of the proposed method. Section 5 provides a summary and discussion.

2 Model

2.1 Model Formulation and Likelihood

Let y_ijk be the observed outcome k from patient i at time point j, where i = 1, …, N, j = 1, …, J, and k = 1, …, K. We have coded all outcomes such that larger values represent worse clinical conditions. Let y_ij = (y_ij₁, …, y_ijk, …, y_ijK)′ be the vector of observations for patient i at visit j and let y_i = (y_i₁′, …, y_iJ′)′ be the outcome vector across visits. Let t_i be the observed event time for patient i, and δ_i (1 if the event is observed and 0 otherwise) be the event indicator. We use an MLIRT sub-model for the multiple longitudinal outcomes and a Cox proportional hazard sub-model for the event time. In the level 1 measurement model within the MLIRT framework, we model the binary outcome, the cumulative probabilities of the ordinal outcome, and the continuous outcome by a two-parameter model,¹⁹ a graded response model,¹⁹ and a common factor model,²² respectively.

logit {p (y_{ijk} = 1 ∣ θ_{i j})} = a_{k} + b_{k} θ_{i j},

(1)

logit {p (y_{ijk} \leq l ∣ θ_{i j})} = a_{k l} - b_{k} θ_{i j}, with l = 1, 2, \dots, n_{k} - 1,

(2)

y_{ijk} = a_{k} + b_{k} θ_{i j} + ε_{ijk},

(3)

where the random error for continuous outcomes is $ε_{ijk} ~ N (0, σ_{k}^{2})$, and a_k and b_k (positive) are the outcome-specific “difficulty” and “discrimination” parameters, respectively. For an ordinal outcome with n_k categories, the order constraint a_k₁ < · · · < a_kl < · · · < a_{kn_k−1} must be satisfied, and the probability of being in a particular category is p(Y_ijk = l | θ_ij) = p(Y_ijk ≤ l | θ_ij) − p(Y_ijk ≤ l − 1 | θ_ij). The continuous latent variable θ_ij represents disease severity for patient i at time j, with a higher value denoting more severe status. In the second level latent trait regression model, we postulate

θ_{i j} = X_{i 0} β_{0} + u_{i 0} + (X_{i 1} β_{1} + u_{i 1}) t_{j},

(4)

where X_i₀ and X_i₁ are the covariates of interest associated with the disease severity; X_i₀ and X_i₁ can share some or all of the covariates. The variable t_j is the visit time with t₁ = 0 for baseline. The random effects u_i₀ and u_i₁ represent the subject-specific baseline disease severity and disease progression rate, respectively, and they follow a bivariate normal distribution with mean 0, variances 1 and $σ_{u}^{2}$, respectively, and correlation coefficient ρ. The regression parameter vectors β₀ and β₁ represent the covariate effects on the baseline disease severity and disease progression rate, respectively. For example, if θ_ij = β₀₁x_i + u_i₀ + [β₁₀ + β₁₁x_i + u_i₁]t_j, where x_i is an indicator variable of treatment (1 if treatment, 0 otherwise), then β₀₁ is the baseline group difference, and β₁₀ and β₁₀ + β₁₁ are the disease progression rates for the placebo and treatment patients, respectively. A significantly negative β₁₁ indicates that the treatment is efficacious in slowing down the disease progression. Note that IRT models are over-parameterized because they have more parameters than can be estimated from the data.¹⁹ Additional constraints are usually required to make the models identifiable. In the aforementioned models, we set Var[u_i₀] = 1 to obtain Var[θ_ij] = 1 at t = 0 (baseline), which makes the discrimination parameter b_k identifiable.
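The two levels of the MLIRT sub-model can be sketched numerically. The snippet below is a minimal illustration, not the authors' BUGS implementation: it evaluates the latent trait of equation (4) for the single-covariate example in the text and converts it into ordinal category probabilities through the graded response model (2). The function names are hypothetical, and the numeric values are borrowed from the simulation settings in Section 4 purely for illustration.

```python
import math

def expit(x):
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def latent_trait(t, x, beta10, beta11, u0, u1, beta01=0.0):
    """Equation (4) with a single binary covariate x (the worked example in the text)."""
    return beta01 * x + u0 + (beta10 + beta11 * x + u1) * t

def graded_response_probs(theta, a_k, b_k):
    """Category probabilities implied by Model (2).

    a_k holds the n_k - 1 strictly increasing thresholds; n_k probabilities are returned.
    """
    if any(lo >= hi for lo, hi in zip(a_k, a_k[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(y <= l | theta) for l = 1, ..., n_k (the last one is 1).
    cum = [expit(a - b_k * theta) for a in a_k] + [1.0]
    # Differences of consecutive cumulative probabilities give P(y = l | theta).
    return [cum[0]] + [cum[l] - cum[l - 1] for l in range(1, len(cum))]

# Illustrative values only: a placebo patient (x = 0) with average random effects at month 9.
theta = latent_trait(t=9, x=0, beta10=0.4, beta11=-0.5, u0=0.0, u1=0.0)
probs = graded_response_probs(theta, a_k=[-0.1, 1, 1.8, 2.6, 3.3, 4], b_k=0.4)
```

Because the cumulative logit decreases in θ_ij, a larger latent trait moves probability mass toward the higher (worse) categories, which matches the coding convention that larger outcome values indicate worse clinical conditions.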

One key assumption in the MLIRT model is that all measurements from each patient are independent conditioning on the random effect vector u_i = (u_i₀, u_i₁)′.¹⁹ The conditional likelihood of the multiple longitudinal outcomes for patient i is

L_{y} (y_{i} ∣ u_{i}) = \prod_{j = 1}^{J} \prod_{k = 1}^{K} p (y_{ijk} ∣ u_{i}),

(5)

where p(y_ijk|u_i) is the conditional density function of y_ijk obtained from Models (1)–(4). Under the Cox proportional hazard sub-model, the hazard of having a terminal event at time t_i is

h (t_{i}) = h_{0} (t_{i}) exp (X_{i} γ + ν_{0} u_{i 0} + ν_{1} u_{i 1}),

(6)

where ν₀ and ν₁ measure the association between the two sub-models. The two sub-models are linked together via the shared random effects u_i₀ and u_i₁, which is a popular approach in joint modeling.^1,3 The covariate vector X_i can be the same as or different from X_i₀ and X_i₁. We have selected a piecewise constant function to approximate the baseline hazard function h₀(t) because models using a piecewise constant baseline hazard yield good estimators for both fixed effects and frailty,^23,24 although fixed cut points need to be specified a priori. Given a set of fixed time points 0 = τ₀ < τ₁ < · · · < τ_m, and the baseline hazard vector g = (g₀, g₁, …, g_m₋₁), we define the piecewise constant hazard function as $h_{0} (t) = \sum_{l = 0}^{m - 1} g_{l} I_{l} (t)$, with indicator function I_l (t) = 1 if τ_l ≤ t < τ_l₊₁ and 0 otherwise. The likelihood of event outcome t_i and δ_i for patient i is

L_{s} (t_{i}, δ_{i} ∣ u_{i}) = h (t_{i})^{δ_{i}} S (t_{i}),

(7)

where the survival function $S (t_{i}) = exp [- \int_{0}^{t_{i}} h (s) d s]$ . Conditional on the random effect vector u_i, y_i is assumed to be independent of t_i. The full likelihood of the joint model for patient i is

p (y_{i}, t_{i}, δ_{i}, u_{i}) = L_{y} (y_{i} ∣ u_{i}) L_{s} (t_{i}, δ_{i} ∣ u_{i}) p (u_{i}),

(8)

where p(u_i) is the density function of u_i. For notational convenience, we let the difficulty parameter vector be $a = {(a_{1}^{'}, \dots, a_{k}^{'}, \dots, a_{K}^{'})}^{'}$, with a_k being a scalar for binary and continuous outcomes and a_k = (a_k₁, …, a_{kn_k−1})′ for ordinal outcomes. Let the discrimination vector be b = (b₁, …, b_K)′ and $β = {(β_{0}^{'}, β_{1}^{'})}^{'}$. The unknown parameter vector is Φ = (a′, b′, β′, γ′, σ_u, ρ, σ_k, ν₀, ν₁, g′)′. We refer to the proposed joint modeling framework (8) as the joint model, and to the model assuming that the occurrence of the terminal event is independent of the longitudinal outcomes (i.e. ν₀ = ν₁ = 0) as the reduced model.
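As a rough numerical companion to equations (6) and (7), the sketch below evaluates the piecewise constant baseline hazard, its integral, and the log of the survival likelihood contribution for one patient. It is only an illustration under stated assumptions: the function names are hypothetical, `lin_pred` stands for X_i γ + ν₀u_i₀ + ν₁u_i₁, and the cut points and hazards in the example are the values used later in the simulation section.

```python
import math

def baseline_hazard(t, tau, g):
    """h0(t) = sum_l g_l * I(tau_l <= t < tau_{l+1})."""
    for l in range(len(g)):
        if tau[l] <= t < tau[l + 1]:
            return g[l]
    raise ValueError("t lies outside [tau_0, tau_m)")

def cumulative_baseline_hazard(t, tau, g):
    """Integral of h0(s) from 0 to t, accumulated interval by interval."""
    total = 0.0
    for l in range(len(g)):
        if t <= tau[l]:
            break
        total += g[l] * (min(t, tau[l + 1]) - tau[l])
    return total

def log_lik_survival(t, delta, lin_pred, tau, g):
    """log L_s(t, delta | u) = delta * log h(t) + log S(t) under Models (6) and (7)."""
    log_h = math.log(baseline_hazard(t, tau, g)) + lin_pred
    log_s = -cumulative_baseline_hazard(t, tau, g) * math.exp(lin_pred)
    return delta * log_h + log_s

# Illustrative values only: an event observed at month 10 with linear predictor 0.
tau = (0, 8, 13, 30)
g = (0.01, 0.05, 0.13)
value = log_lik_survival(t=10, delta=1, lin_pred=0.0, tau=tau, g=g)
```

With these values the cumulative baseline hazard at month 10 is 0.01 × 8 + 0.05 × 2 = 0.18, so the contribution is log(0.05) − 0.18.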

2.2 Bayesian Estimation and Model Selection

We develop a fully Bayesian approach via the MCMC method to estimate the unknown parameters. The model fitting is implemented using the BUGS language. Vague prior distributions are imposed on all parameters. Specifically, a normal distribution N(0, 100) is used for all components in a, β, and γ and for ν₀ and ν₁. We let all components in b and g have Uniform[0, 20] as the prior distribution to ensure non-negativity. To satisfy the order constraint of a_k for the ordinal outcome with n_k categories, we let a_k₁ ~ N(0, 100), and a_kl = a_{k,l−1} + ω_l for l = 2, …, n_k − 1, with ω_l ~ N(0, 100)I(0, ), i.e. a normal distribution left truncated at 0. We use the prior distributions σ_k ~ Gamma(0.01, 0.01) and ρ ~ Uniform[−1, 1]. Multiple chains with over-dispersed initial values are run to analyze the data, and the Gelman–Rubin diagnostic²⁵ is used to ensure that the scale reduction R̂ of every parameter is smaller than 1.1. Moreover, we use trace plots and autocorrelation functions²⁵ to check chain convergence.
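The convergence check can be illustrated with a short calculation of the scale reduction factor for a single scalar parameter. This is a sketch of the basic Gelman–Rubin statistic without the degrees-of-freedom correction; the text does not state which variant the software reported, and the function name `gelman_rubin` is hypothetical.

```python
from statistics import mean, variance

def gelman_rubin(chains):
    """Basic potential scale reduction factor for one scalar parameter.

    chains: list of equal-length lists of post burn-in draws, one list per chain.
    """
    m, n = len(chains), len(chains[0])
    if m < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    chain_means = [mean(c) for c in chains]
    w = mean(variance(c) for c in chains)   # average within-chain variance
    b = n * variance(chain_means)           # between-chain variance
    var_plus = (n - 1) / n * w + b / n      # pooled estimate of the posterior variance
    return (var_plus / w) ** 0.5

# Toy chains: two that agree, and two that are stuck in different regions.
mixed = gelman_rubin([[0.0, 1.0] * 50, [1.0, 0.0] * 50])
stuck = gelman_rubin([[0.0, 1.0] * 50, [10.0, 11.0] * 50])
```

Chains that explore the same region give a value near 1, whereas chains centered on different values give a value well above the 1.1 threshold.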

We have adopted two model selection criteria, i.e. the Deviance Information Criterion (DIC)²⁶ and the Bayes factor (BF).²⁷ The deviance statistic is defined as D(θ) = −2 log f(y|θ) + 2 log h(y), where f(y|θ) is the likelihood function for the observed data y given the parameter vector θ, and h(y) is some standardizing function of the data alone. The DIC is defined as DIC = D̄ + p_D, where D̄ = E_{θ|y}[D(θ)] is the posterior expectation of the deviance, D(θ̄) = D(E_{θ|y}[θ]) is the deviance evaluated at the posterior mean of the parameters, and p_D = D̄ − D(θ̄) is the effective number of parameters, which captures model complexity. A smaller DIC indicates a better fit when comparing models.
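Given MCMC output, the DIC reduces to simple arithmetic on the sampled deviances. The sketch below assumes that the deviance has already been evaluated at every retained draw and once at the posterior mean of the parameters; the function name `dic` and the toy numbers are illustrative only.

```python
from statistics import mean

def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, p_D, D_bar) from per-draw deviances and the deviance at the posterior mean."""
    d_bar = mean(deviance_draws)                 # posterior expectation of the deviance
    p_d = d_bar - deviance_at_posterior_mean     # effective number of parameters
    return d_bar + p_d, p_d, d_bar

# Toy numbers, not taken from the DATATOP analysis.
dic_value, p_d, d_bar = dic([102.0, 104.0, 103.0], 101.0)
```

Here D̄ = 103 and p_D = 2, so DIC = 105; when two models are compared on the same data, the one with the smaller value is preferred.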

The BF is a Bayesian alternative to p values for testing hypotheses and for quantifying the degree to which observed data support or conflict with a hypothesis. Let two competing models be M₁ and M₂. The BF in favor of model M₁ over M₂ is defined as:

BF (M_{1}; M_{2}) = \frac{p (M_{1} ∣ y) / p (M_{2} ∣ y)}{p (M_{1}) / p (M_{2})} = \frac{p (y ∣ M_{1})}{p (y ∣ M_{2})},

(9)

where, for i = 1, 2, p(M_i) is the prior probability of model M_i, p(M_i|y) is the posterior probability of model M_i, and p(y|M_i) is the predictive probability of observing y under model M_i. The latter is p(y|M_i) = ∫ f(y|θ_i, M_i) p(θ_i|M_i)dθ_i, where p(θ_i|M_i) is the prior distribution for the parameter vector θ_i under model M_i. When the BF is greater than 100, decisive evidence is shown in favor of model M₁. To avoid the integral involved in the computation of the BF, the Laplace–Metropolis estimator based on the normal distribution²⁸ is adopted to approximate the predictive probability. Specifically, p(y|M_i) ≈ (2π)^{d_i/2} |Σ_i|^{1/2} f(y|θ̄_i, M_i) p(θ̄_i|M_i), where d_i is the number of parameters in θ_i, Σ_i is the posterior covariance matrix of θ_i, θ̄_i is the posterior mean of the parameters, p(θ̄_i|M_i) is the prior density of the parameters evaluated at θ̄_i, and f(y|θ̄_i, M_i) is the likelihood evaluated at the posterior mean.
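In practice the likelihood term is far too small to evaluate on the natural scale, so the approximation is usually computed on the log scale. The restatement below simply takes logarithms of the Laplace–Metropolis expression and of definition (9); it adds no new assumption beyond the formulas in the text.

```latex
\log p(y \mid M_i) \approx \frac{d_i}{2}\log(2\pi) + \frac{1}{2}\log\lvert\Sigma_i\rvert
  + \log f(y \mid \bar{\theta}_i, M_i) + \log p(\bar{\theta}_i \mid M_i),
\qquad
\log \mathrm{BF}(M_1; M_2) \approx \log p(y \mid M_1) - \log p(y \mid M_2).
```

On this scale, the threshold BF > 100 corresponds to log BF > log 100 ≈ 4.61.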

3 Application

Our work is motivated by the Deprenyl And Tocopherol Antioxidative Therapy of Parkinsonism (DATATOP) study. DATATOP was a double-blind, placebo-controlled multicenter clinical trial to determine whether deprenyl or tocopherol, alone or in combination, administered to patients with early PD would prolong the time until dopaminergic therapy is needed to treat emerging disability.²⁹ A total of 800 patients were randomly assigned in a 2 × 2 factorial design to receive double-placebo, active tocopherol alone, active deprenyl alone, or both active tocopherol and deprenyl. In this article, we investigate the effect of tocopherol, and we define the placebo group as patients who did not receive tocopherol (double-placebo and active deprenyl alone groups, 401 patients) and the treatment group as patients who received tocopherol (active tocopherol alone and both active tocopherol and deprenyl groups, 399 patients). The longitudinal outcomes are the Unified Parkinson’s Disease Rating Scale (UPDRS) total score, Schwab and England activities of daily living (SEADL), the Mini-Mental State Exam (MMSE), and the Hamilton rating scale for depression (HRSD), collected at baseline and at months 1, 3, 9, and 15. The UPDRS total score evaluates patients’ mentation, behavior, activities of daily living, and motor function. It is an approximately continuous variable with integer values from 0 (normal) to 176 (severe).³⁰ SEADL is a measurement of activities of daily living and is an ordinal variable with integer values from 0 (severe) to 100 (normal) in increments of 5.³¹ MMSE measures patients’ cognitive impairment and is an ordinal variable with integer values from 0 (severe) to 30 (normal). HRSD, a depression test measuring the severity of clinical depression symptoms, is an ordinal variable with integer values from 0 (normal) to 52 (severe). 
During the course of the study, 192 and 184 patients in the placebo and treatment groups, respectively, reached a level of functional disability sufficient to warrant the initiation of dopaminergic therapy, which is a symptomatic therapy to provide temporary relief of PD symptoms. In this case, only the observed outcomes before the initiation of dopaminergic therapy can be used in the assessment of treatment efficacy because dopaminergic therapy can significantly change the values of the outcomes for a short period. Therefore, these individuals would have missing data after the initiation of dopaminergic therapy. Figure 1 displays the mean UPDRS, SEADL, MMSE, and HRSD measurements over time for DATATOP patients with follow-up time less than 6 months (dotted line), 6–12 months (dashed line), and more than 12 months (solid line). Patients with shorter follow-up time tend to have higher UPDRS and HRSD values and lower SEADL and MMSE values, indicating worse clinical outcomes. This phenomenon suggests the existence of association between the longitudinal outcomes and the time to dopaminergic therapy.

Figure 1. Mean longitudinal measures over time. Follow-up time: less than 6 months (dotted line), 6–12 months (dashed line), and more than 12 months (solid line).

To analyze the DATATOP dataset, we have recoded the outcomes SEADL and MMSE so that higher values in all outcomes represent worse clinical conditions. Moreover, we combine categories with zero or a small number of individuals in the outcomes SEADL, MMSE, and HRSD so that they have 7, 7, and 10 categories, respectively. The median follow-up time is 14 months (range: 0–25 months). We first perform the Schoenfeld residual test; the non-significant result (p = 0.43) supports the proportionality assumption. To use the MLIRT sub-model, we let X_i₀ = 0 and consider the treatment variable x_i (1 if treatment, and 0 if placebo) as the only covariate in X_i₁. Hence, the level 2 model (4) is θ_ij = u_i₀ + (β₁₀ + β₁₁x_i + u_i₁)t_ij, with visit time being t_ij = (0, 1, 3, 9, 15) and the random effects u_i₀ and u_i₁ representing the subject-specific baseline disease severity and disease progression rate, respectively. The survival time is the time to the initiation of dopaminergic therapy. The treatment variable is the single covariate in the Cox sub-model so that h(t_i) = h₀(t_i) exp(γx_i + ν₀u_i₀ + ν₁u_i₁) in Model (6).

For model selection and comparison, we compute the DIC and BF described in Section 2.2. The joint model has a smaller DIC (53,168) than the reduced model (53,502). The BF in favor of the joint model over the reduced model is much larger than 100, indicating decisive evidence in favor of the joint model according to the interpretation proposed by Kass and Raftery.²⁷ Table 1 compares the posterior means, standard deviations (SD), and 95% equal-tail credible intervals from the reduced and joint models. The results from the joint model indicate that the placebo patients have significant disease progression at the rate of 0.392 units per month (β̂₁₀, 95% CI: [0.343, 0.446]). In comparison, the treatment patients have a disease progression rate of 0.345 units per month (β̂₁₀ + β̂₁₁, 95% CI: [0.237, 0.461]); the tocopherol effect on the progression rate, −0.047 units per month, is not significant (β̂₁₁, 95% CI: [−0.106, 0.015]). Moreover, tocopherol decreases the hazard of the initiation of dopaminergic therapy by an estimated 5% (γ̂ = −0.054, 1 − exp(−0.054) = 0.05; 95% CI for the relative change in hazard, exp(γ) − 1: [−0.27, 0.24]). The non-significant tocopherol effect is consistent with Shoulson.²⁹ We observe that ν̂₀ and ν̂₁ are positive and significantly different from zero (ν̂₀ = 0.348, 95% CI: [0.144, 0.511], and ν̂₁ = 3.854, 95% CI: [2.497, 5.642]), suggesting that patients with worse baseline disease severity (larger u_i₀) and faster disease progression rate (larger u_i₁) tend to have a higher hazard of needing dopaminergic therapy, and vice versa. Both the reduced and joint models give similar estimates of the outcome-specific parameters (a and b).

Table 1.

Parameter estimations from joint and reduced modeling and model comparison based on DATATOP trial.

Parameter	Reduced model: Mean	Reduced model: SD	Reduced model: 95% CI	Joint model: Mean	Joint model: SD	Joint model: 95% CI
For longitudinal outcomes
β₁₀	0.307	0.022	0.264, 0.351	0.392	0.026	0.343, 0.446
β₁₁	−0.051	0.030	−0.108, 0.008	−0.047	0.031	−0.106, 0.015
ρ	0.297	0.070	0.162, 0.439	0.415	0.062	0.294, 0.535
σ_u	0.241	0.017	0.209, 0.275	0.287	0.023	0.244, 0.334
For survival
γ	−0.036	0.100	−0.237, 0.154	−0.054	0.138	−0.321, 0.216
ν₀				0.348	0.093	0.144, 0.511
ν₁				3.854	0.804	2.497, 5.642

To visualize the difference in the disease progression rates in the two groups, Figure 2 displays the estimates of the latent disease severity θ_ij of 100 randomly selected patients at each visit, together with the lowess smooth curves (based on all patients) denoted by the dashed (placebo group) and solid (treatment group) lines. Figure 2 suggests that the two groups have similar disease progression rates before month 9 and that the placebo patients deteriorate at a slightly faster rate starting from month 9, as manifested by the departure of the two curves.

Figure 2. Estimates of the subject-specific disease severity θ_ij at each visit and the lowess curves for the two groups.

Table 1 also shows a positive correlation coefficient ρ between u_i₀ and u_i₁ (0.415, 95% CI: [0.294, 0.535]), suggesting that patients with worse baseline disease severity tend to have faster disease deterioration, and vice versa. To obtain more insight into u_i₀, u_i₁, and ρ, we plot in Figure 3 u_i₀ (upper panel) and u_i₁ (lower panel) with their 95% credible intervals. Patients are sorted so that patients on the left have milder disease at baseline and slower disease progression (larger ranks), while patients on the right have more severe disease at baseline and faster disease progression (smaller ranks). For clarity, only patients with the smallest 100 and the largest 100 ranks are displayed in the figure. We use two patients as an example to illustrate the effect of ρ. Patient 551 has the worst baseline disease severity and ranks No. 8 in the disease progression rate. Patient 528 has the fastest disease progression rate and ranks No. 5 in the baseline disease severity.

Figure 3. The ranking of subject-specific baseline disease severity (upper panel) and disease progression rate (lower panel) with point estimates and 95% CI. The numbers in the figures are patient numbers.

4 Simulation

In this section, we conduct two simulation studies to compare the performance of the proposed joint model and the reduced model. In the first simulation study, there is a strong correlation between the survival time and the longitudinal outcome (i.e. ν₀ = 0.4, ν₁ = 1), whereas in the second simulation study, there is no correlation (i.e. ν₀ = ν₁ = 0). The simulated datasets have a data structure and parameters similar to the DATATOP study. In each simulation study, we simulate 500 datasets with sample size N = 800 (400 in both treatment and placebo groups).

We simulate one continuous (y_ij₁) and three ordinal (denoted by y_ij₂, y_ij₃, and y_ij₄ with 7, 7, and 10 categories, respectively) outcomes at five visits (baseline and months 1, 3, 9, and 15). The treatment variable (x_i = 1 if treatment, and 0 if placebo) is the only covariate under consideration and we assume that the treatment is effective. The level 2 model (4) is θ_ij = u_i₀ + (β₁₀ + β₁₁x_i + u_i₁)t_ij, with visit time being t_ij = (0, 1, 3, 9, 15), and the Cox sub-model (6) is h(t_i) = h₀(t_i) exp(γx_i + ν₀u_i₀ + ν₁u_i₁). We set β₁₀ = 0.4, β₁₁ = −0.5, and γ = −0.7. Note that β₁₁ is negative so that we expect the treated patients to have smaller θ_ij and better clinical status. Similarly, γ is negative so that the treated patients are expected to have a smaller event hazard at any specific time. We simulate random effects u_i = (u_i₀, u_i₁)′ ~ N₂(0, Σ), where $Σ = ((1, ρ σ_{u}), (ρ σ_{u}, σ_{u}^{2}))$ and ρ = 0.4, σ_u = 1.3. For the continuous outcome y_ij₁, we set a₁ = 25, b₁ = 10 and σ₁ = 5, and simulate from $N (a_{1} + b_{1} θ_{i j}, σ_{1}^{2})$. For the ordinal outcomes, we let a₂ = (−2.7, −0.6, 2, 2.8, 5, 6), b₂ = 2, a₃ = (−0.1, 1, 1.8, 2.6, 3.3, 4), b₃ = 0.4, a₄ = (−1, −0.1, 0.5, 1, 1.5, 2, 2.4, 2.8, 3.3), b₄ = 0.7, and use Model (2) to obtain the probability of being in each category for each ordinal outcome at every visit. Then, the three ordinal outcomes are simulated from multinomial distributions.
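The data-generating steps for the longitudinal part can be sketched as follows. This is an illustrative reconstruction using only the Python standard library, not the authors' simulation code: the function names are hypothetical, the correlated random effects are obtained from two independent standard normals, and each ordinal outcome is drawn by inverting the cumulative probabilities of Model (2), which is equivalent to a single multinomial draw.

```python
import math
import random

def expit(x):
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def draw_random_effects(rng, rho=0.4, sigma_u=1.3):
    """Draw (u0, u1) with Var(u0) = 1, Var(u1) = sigma_u^2 and correlation rho."""
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return z0, sigma_u * (rho * z0 + math.sqrt(1.0 - rho ** 2) * z1)

def draw_ordinal(rng, theta, a_k, b_k):
    """Draw a category in 1..n_k: the smallest l with v < P(y <= l | theta)."""
    v = rng.random()
    for l, a in enumerate(a_k, start=1):
        if v < expit(a - b_k * theta):
            return l
    return len(a_k) + 1

def simulate_patient(rng, x, visits=(0, 1, 3, 9, 15), beta10=0.4, beta11=-0.5):
    """Simulate one patient's random effects and outcomes at every visit."""
    u0, u1 = draw_random_effects(rng)
    rows = []
    for t in visits:
        theta = u0 + (beta10 + beta11 * x + u1) * t
        y1 = rng.gauss(25 + 10 * theta, 5)  # continuous outcome, Model (3)
        y2 = draw_ordinal(rng, theta, [-2.7, -0.6, 2, 2.8, 5, 6], 2)
        y3 = draw_ordinal(rng, theta, [-0.1, 1, 1.8, 2.6, 3.3, 4], 0.4)
        y4 = draw_ordinal(rng, theta, [-1, -0.1, 0.5, 1, 1.5, 2, 2.4, 2.8, 3.3], 0.7)
        rows.append((t, y1, y2, y3, y4))
    return (u0, u1), rows

rng = random.Random(2013)  # arbitrary seed, for reproducibility only
effects, rows = simulate_patient(rng, x=1)
```

Outcomes recorded after the simulated terminal event would then be set to missing, as in the DATATOP data.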

The time to the terminal event is simulated from the Cox sub-model with a piecewise constant baseline hazard function. Given a set of fixed time points 0 = τ₀ < τ₁ < · · · < τ_m and the baseline hazard vector g = (g₀, g₁, …, g_m₋₁), we define the piecewise constant baseline hazard function as $h_{0} (t) = \sum_{l = 0}^{m - 1} g_{l} I_{l} (t)$, with I_l (t) = 1 if τ_l ≤ t < τ_l₊₁ and 0 otherwise. For a given interval τ_a ≤ t_i < τ_a₊₁ with a = 0, …, m − 1, the survival function is $S (t_{i}) = exp {[- \sum_{l = 0}^{a - 1} g_{l} (τ_{l + 1} - τ_{l}) - g_{a} (t_{i} - τ_{a})] \times exp (X_{i} γ + ν_{0} u_{i 0} + ν_{1} u_{i 1})}$. Solving this equation for t_i gives

t_{i} = τ_{a} - \frac{log [S (t_{i})]}{g_{a} exp (X_{i} γ + ν_{0} u_{i 0} + ν_{1} u_{i 1})} - \frac{\sum_{l = 0}^{a - 1} g_{l} (τ_{l + 1} - τ_{l})}{g_{a}} .

(10)

The condition τ_a ≤ t_i< τ_a₊₁ imposes the following constraint: $- exp (X_{i} γ + ν_{0} u_{i 0} + ν_{1} u_{i 1}) \times \sum_{l = 0}^{a} g_{l} (τ_{l + 1} - τ_{l}) < log [S (t_{i})] \leq - exp (X_{i} γ + ν_{0} u_{i 0} + ν_{1} u_{i 1}) \sum_{l = 0}^{a - 1} g_{l} (τ_{l + 1} - τ_{l})$ . To generate the event time t_i, we set the piecewise baseline hazard vector g = (0.01, 0.05, 0.13) at the fixed time points τ = (0, 8, 13, 30). We generate the censoring time from Uniform[10, 20] and δ_i = 1 if the event time generated from equation (10) is not larger than the censoring time.
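The event-time generation amounts to inverse-transform sampling: draw S(t_i) uniformly on (0, 1], locate the interval whose cumulative hazard bounds satisfy the constraint above, and solve the survival function for t_i within that interval. The sketch below follows that recipe with the stated g, τ, and censoring distribution. The function names are hypothetical, `lin_pred` stands for X_i γ + ν₀u_i₀ + ν₁u_i₁, and returning infinity when the draw falls beyond the last cut point is a handling choice made here; such times are always censored because the censoring time is at most 20.

```python
import math
import random

def event_time_from_log_s(log_s, lin_pred, tau=(0, 8, 13, 30), g=(0.01, 0.05, 0.13)):
    """Solve S(t) = exp(log_s) for t under the piecewise constant baseline hazard."""
    target = -log_s / math.exp(lin_pred)  # required cumulative baseline hazard
    cum = 0.0                             # cumulative baseline hazard up to tau[l]
    for l in range(len(g)):
        width = tau[l + 1] - tau[l]
        if target < cum + g[l] * width:   # the interval constraint on log S(t)
            return tau[l] + (target - cum) / g[l]
        cum += g[l] * width
    return math.inf                       # event would occur after the last cut point

def draw_observed(rng, lin_pred):
    """Return (observed time, delta) with Uniform[10, 20] censoring."""
    log_s = math.log(1.0 - rng.random())  # 1 - random() lies in (0, 1], so the log is finite
    event = event_time_from_log_s(log_s, lin_pred)
    censor = rng.uniform(10, 20)
    return min(event, censor), int(event <= censor)

rng = random.Random(7)  # arbitrary seed, for reproducibility only
observed = [draw_observed(rng, lin_pred=-0.7) for _ in range(1000)]
```

For example, with a linear predictor of 0 and log S(t_i) = −0.18, the first interval contributes 0.01 × 8 = 0.08 to the cumulative hazard, and the remaining 0.10 is reached 0.10 / 0.05 = 2 months into the second interval, giving t_i = 10.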

In each simulation study, we run two parallel MCMC chains with over-dispersed initial values. Each chain is run for 10,000 iterations. The first 5000 iterations are discarded as burn-in, and the remaining 5000 samples are used to obtain the posterior distribution of the parameters. We have computed the bias (the average of the posterior means minus the true values), standard error (SE, the square root of the average of the posterior variance), SD (the standard deviation of the posterior means), and coverage probabilities (CPs) of 95% equal-tail credible intervals from the reduced and joint models.
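The four summary measures can be computed from per-dataset posterior summaries as sketched below. The function name and argument layout are hypothetical, and the use of the sample (n − 1) standard deviation for SD is an assumption, since the text does not specify the divisor.

```python
import math
from statistics import mean, stdev

def summarize_replicates(true_value, post_means, post_sds, ci_lower, ci_upper):
    """Bias, SE, SD and coverage probability across simulated datasets for one parameter."""
    covered = [lo <= true_value <= hi for lo, hi in zip(ci_lower, ci_upper)]
    return {
        "bias": mean(post_means) - true_value,            # average posterior mean minus truth
        "se": math.sqrt(mean(s ** 2 for s in post_sds)),  # root of the average posterior variance
        "sd": stdev(post_means),                          # spread of the posterior means
        "cp": sum(covered) / len(covered),                # share of intervals containing the truth
    }

# Toy input with two replicates, not taken from the simulation study.
summary = summarize_replicates(
    true_value=0.4,
    post_means=[0.3, 0.5],
    post_sds=[0.1, 0.1],
    ci_lower=[0.1, 0.45],
    ci_upper=[0.5, 0.7],
)
```

An SE close to the SD indicates that the posterior standard deviation is a reasonable estimate of the sampling variability of the posterior mean.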

Table 2 displays the results from the first simulation study, in which the occurrence of the terminal event is strongly correlated with the longitudinal outcomes. The joint model generally provides estimates with negligible bias, SE close to SD, and CPs reasonably close to 0.95. We notice that the CP of ν₁ is slightly off from 0.95, indicating some difficulty in distinguishing the random effects, as reported in Henderson et al.¹ These results suggest that the joint model can generally recover the true values in the presence of a dependent terminal event. In contrast, the reduced model gives severely biased estimates, and its CPs are far from the nominal value. Specifically, the treatment effect parameter β₁₁ is biased toward zero, so a treatment effect is less likely to be detected when one is present. Because the parameters ν₀ and ν₁ are set to be positive, the patients with worse baseline disease severity (larger u_i₀) and faster disease progression rate (larger u_i₁) tend to have a terminal event earlier. By ignoring this phenomenon and treating the missing data after the terminal event as missing at random, the reduced model tends to reduce the difference between the two groups and therefore underestimates the treatment effect. This finding is consistent with the literature on univariate longitudinal data analysis with dependent dropout.³² In addition, both models provide reasonable estimates of the difficulty and discrimination parameter vectors a and b.

Table 2.

Simulation results from the reduced and joint models when the terminal event is dependent on the longitudinal outcomes.

Parameter (true value)	Reduced model: Bias	Reduced model: SE	Reduced model: SD	Reduced model: CP	Joint model: Bias	Joint model: SE	Joint model: SD	Joint model: CP
For longitudinal outcomes
β₁₀ = 0.4	−0.193	0.065	0.065	0.178	0.027	0.073	0.079	0.906
β₁₁ = −0.5	0.068	0.089	0.089	0.862	−0.011	0.095	0.096	0.944
ρ = 0.4	−0.033	0.037	0.036	0.860	0.006	0.036	0.038	0.902
σ_u = 1.3	−0.099	0.047	0.045	0.466	0.022	0.054	0.053	0.930
For survival
γ = −0.7	0.209	0.112	0.115	0.536	−0.072	0.151	0.152	0.910
ν₀ = 0.4					−0.001	0.075	0.075	0.934
ν₁ = 1.0					0.070	0.090	0.096	0.896

Table 3 displays the results from the second simulation study, in which the reduced model is the correct model. The reduced model provides estimates with small bias and CPs reasonably close to the nominal level. The results indicate that the reduced model can successfully recover the true values under an independent terminal event. In comparison, although the joint model is overparameterized in this setting, its results still have reasonably small bias and CPs close to the nominal level, and its SEs are not inflated. The estimates of the parameters ν₀ and ν₁ are close to zero, as they should be, suggesting that the joint model is still a reasonable model even when it is overparameterized. Moreover, both models provide reasonable estimates of the difficulty and discrimination parameter vectors a and b.

Table 3.

Simulation results from the reduced and joint models when the terminal event is independent of the longitudinal outcomes.

Parameter (true value)	Reduced model: Bias	Reduced model: SE	Reduced model: SD	Reduced model: CP	Joint model: Bias	Joint model: SE	Joint model: SD	Joint model: CP
For longitudinal outcomes
β₁₀ = 0.4	0.002	0.068	0.070	0.932	0.001	0.068	0.069	0.930
β₁₁ = −0.5	−0.003	0.092	0.095	0.944	−0.001	0.093	0.096	0.946
ρ = 0.4	0.004	0.034	0.035	0.934	0.005	0.034	0.035	0.934
σ_u = 1.3	0.003	0.048	0.049	0.946	0.001	0.048	0.048	0.954
For survival
γ = −0.7	−0.028	0.129	0.128	0.944	−0.009	0.141	0.138	0.950
ν₀ = 0					−0.009	0.075	0.078	0.940
ν₁ = 0					0.006	0.061	0.062	0.926

In conclusion, the simulation results suggest that in the presence of a dependent terminal event, the joint model provides more accurate estimates of the MLIRT and Cox regression parameters and the random effects parameters. Under an independent terminal event, the joint model provides results comparable with those of the reduced model.

5 Discussion

In clinical trials, it is quite common to have longitudinal outcomes subject to a dependent terminal event. Previous work on joint modeling for this type of data has mainly focused on a single longitudinal outcome while accounting for dependent censoring. In this article, we have proposed a joint modeling framework to analyze multiple longitudinal outcomes subject to a dependent terminal event using the MLIRT sub-model and the Cox proportional hazard sub-model. The two sub-models are linked together via shared random effects representing the subject-specific baseline disease severity and disease progression rate, respectively. The proposed joint model has a better fit than the reduced model in the analysis of the DATATOP dataset. We have found that tocopherol does not significantly slow PD progression. Moreover, we have identified a significant positive association between the multiple longitudinal outcomes and the terminal event, in addition to a significant positive correlation between the baseline disease severity and the disease progression rate. The simulation studies have shown that in the presence of a dependent terminal event, the joint model successfully recovers the true parameters, whereas the reduced model underestimates the treatment effect and has large bias in the regression and random effects parameters. Under the scenario of an independent terminal event, the joint model provides results comparable with those of the reduced model.

Our method can be extended to robust inference to handle outlying observations in the longitudinal outcomes. One direction is to relax the normality assumption for the random errors of the continuous outcome in favor of long-tailed or heavy-tailed distributions, e.g. normal/independent distributions,³³ skew-normal independent distributions,³⁴ and generalized skew-elliptical distributions.³⁵ Another issue concerns the assumption of a homogeneous random effects covariance matrix (the matrix is the same for all subjects). Accounting for heterogeneity in the random effects covariance matrix has been investigated in generalized linear models,³⁶ non-linear mixed models,³⁷ and linear mixed models.³⁸ The use of a heterogeneous random effects covariance matrix in the joint modeling framework of the MLIRT models warrants further investigation.
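As a small illustration of the first direction, a normal/independent error can be written as a normal draw divided by the square root of an independent positive mixing variable. The sketch below uses a Gamma(df/2, rate df/2) mixing variable, which yields Student-t errors, one member of that family. The function name and parameter values are hypothetical, and this is not part of the model fitted in this article.

```python
import numpy as np

def draw_student_t_errors(rng, size, sigma, df):
    """Draw heavy-tailed errors as a scale mixture of normals (hypothetical helper).

    e = z / sqrt(w), with z ~ N(0, sigma^2) and w ~ Gamma(shape=df/2, rate=df/2),
    so that e follows a scaled Student-t distribution with df degrees of freedom.
    """
    # NumPy parameterizes the gamma by scale = 1 / rate.
    w = rng.gamma(shape=df / 2.0, scale=2.0 / df, size=size)
    z = rng.normal(loc=0.0, scale=sigma, size=size)
    return z / np.sqrt(w)
```

Small values of the mixing variable inflate individual errors, which is how such a specification can accommodate outlying observations. Fixing the mixing variable at 1 recovers the normal error model.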

Acknowledgments

This work was supported by two NIH/NINDS grants U01NS043127 and U01NS43128. Computations were performed on the high-performance Linux cluster system at the University of Texas School of Public Health (UTSPH). The authors express appreciation to the UTSPH information technology staff for their technical support of the cluster.

Footnotes

Reprints and permissions: sagepub.co.uk/journalsPermissions.nav

References

1. Henderson R, Diggle P, Dobson A. Joint modelling of measurements and event time data. Biostatistics. 2000;1:465–480. doi: 10.1093/biostatistics/1.4.465.
2. Tsiatis A, Davidian M. Joint modelling of longitudinal and time-to-event data: an overview. Stat Sin. 2004;14:809–834.
3. Wulfsohn M, Tsiatis A. A joint model for survival and longitudinal data measured with error. Biometrics. 1997;53:330–339.
4. Brown E, Ibrahim J, DeGruttola V. A flexible B-Spline model for multiple longitudinal biomarkers and survival. Biometrics. 2005;61:64–73. doi: 10.1111/j.0006-341X.2005.030929.x.
5. Rizopoulos D, Ghosh P. A Bayesian semiparametric multivariate joint model for multiple longitudinal outcomes and a time-to-event. Stat Med. 2011;30:1366–1380. doi: 10.1002/sim.4205.
6. Brown E, Ibrahim J. Bayesian approaches to joint cure-rate and longitudinal models with applications to cancer vaccine trials. Biometrics. 2003;59:686–693. doi: 10.1111/1541-0420.00079.
7. Elashoff R, Li G, Li N. An approach to joint analysis of longitudinal measurements and competing risks failure time data. Stat Med. 2007;26:2813–2835. doi: 10.1002/sim.2749.
8. O’Brien L, Fitzmaurice G. Analysis of longitudinal multiple-source binary data using generalized estimating equations. J R Stat Soc Ser C Appl Stat. 2004;53:177–193.
9. Wang C, Douglas J, Anderson S. Item response models for joint analysis of quality of life and survival. Stat Med. 2002;21:129–142. doi: 10.1002/sim.989.
10. Glas C, Geerlings H, Van de Laar M, et al. Analysis of longitudinal randomized clinical trials using item response models. Contemp Clin Trials. 2008;30:158–170. doi: 10.1016/j.cct.2008.12.003.
11. Huang L, Wang W. The generalized multilevel facets model for longitudinal data. J Educ Behav Stat. 2012;37:231–255.
12. Wang W, Liu C. Formulation and application of the generalized multilevel facets model. Educ Psychol Meas. 2007;67:583–605.
13. Bacci S, Caviezel V. Multilevel IRT models for the university teaching evaluation. J Appl Stat. 2011;38:2775–2791.
14. Douglas J. Item response models for longitudinal quality of life data in clinical trials. Stat Med. 1999;18:2917–2931. doi: 10.1002/(sici)1097-0258(19991115)18:21<2917::aid-sim204>3.0.co;2-n.
15. Andrade D, Tavares H. Item response theory for longitudinal data: population parameter estimation. J Multivariate Anal. 2005;95:1–22.
16. Maier K. A Rasch hierarchical measurement model. J Educ Behav Stat. 2001;26:307–330.
17. Kamata A. Item analysis by the hierarchical generalized linear model. J Educ Meas. 2001;38:79–93.
18. Mislevy R. Estimation of latent group effects. J Am Stat Assoc. 1985;80:993–997.
19. Fox J. Bayesian item response modeling: theory and applications. New York, USA: Springer-Verlag; 2010.
20. Skrondal A, Rabe-Hesketh S. Generalized latent variable modeling: multilevel, longitudinal, and structural equation models. Boca Raton, FL: CRC Press; 2004.
21. Skrondal A, Rabe-Hesketh S. Latent variable modelling: a survey. Scand J Stat. 2007;34:712–745.
22. Lord F, Novick M, Birnbaum A. Statistical theories of mental test scores. Boston, MA: Addison-Wesley; 1968.
23. Lawless J, Zhan M. Analysis of interval-grouped recurrent-event data using piecewise constant rate functions. Canad J Stat. 1998;26:549–565.
24. Feng S, Wolfe R, Port F. Frailty survival model analysis of the national deceased donor kidney transplant dataset using Poisson variance structures. J Am Stat Assoc. 2005;100:728–735.
25. Gelman A, Carlin J, Stern H, et al. Bayesian data analysis. Boca Raton, FL: CRC Press; 2004.
26. Spiegelhalter D, Best N, Carlin B, et al. Bayesian measures of model complexity and fit. J R Stat Soc Ser B Stat Methodol. 2002;64:583–639.
27. Kass R, Raftery A. Bayes factors. J Am Stat Assoc. 1995;90:773–795.
28. Lewis S, Raftery A. Estimating Bayes factors via posterior simulation with the Laplace–Metropolis estimator. J Am Stat Assoc. 1997;92:648–655.
29. Shoulson I. DATATOP: a decade of neuroprotective inquiry. Parkinson Study Group. Deprenyl and Tocopherol Antioxidative Therapy of Parkinsonism. Ann Neurol. 1998;44:S160–S166.
30. Bushnell D, Martin M. Quality of life and Parkinson’s disease: translation and validation of the US Parkinson’s disease questionnaire (PDQ-39). Qual Life Res. 1999;8:345–350. doi: 10.1023/a:1008979705027.
31. McRae C, Diem G, Vo A, et al. Schwab & England: standardization of administration. Movement Disord. 2000;15:335–336. doi: 10.1002/1531-8257(200003)15:2<335::aid-mds1022>3.0.co;2-v.
32. Touloumi G, Babiker AG, Pocock SJ, et al. Impact of missing data due to drop-outs on estimators for rates of change in longitudinal studies: a simulation study. Stat Med. 2001;20:3715–3728. doi: 10.1002/sim.1114.
33. Lange K, Sinsheimer J. Normal/independent distributions and their applications in robust regression. J Comput Graph Stat. 1993;2:175–198.
34. Lachos V, Ghosh P, Arellano-Valle R. Likelihood based inference for skew-normal independent linear mixed models. Stat Sin. 2010;20:303.
35. Genton M, Loperfido N. Generalized skew-elliptical distributions and their quadratic forms. Ann Inst Stat Math. 2005;57:389–401.
36. Chiu T, Leonard T, Tsui K. The matrix-logarithmic covariance model. J Am Stat Assoc. 1996;91:198–210.
37. Davidian M, Giltinan D. Nonlinear models for repeated measurement data. Vol. 62. Boca Raton, FL: Chapman & Hall/CRC; 1995.
38. Pourahmadi M, Daniels M. Dynamic conditionally linear mixed models for longitudinal data. Biometrics. 2002;58:225–231. doi: 10.1111/j.0006-341x.2002.00225.x.
