Author manuscript; available in PMC: 2011 Feb 28.
Published in final edited form as: Stat Med. 2010 Feb 28;29(5):546–557. doi: 10.1002/sim.3798

Joint modeling of longitudinal ordinal data and competing risks survival times and analysis of the NINDS rt-PA stroke trial

Ning Li 1,*, Robert M Elashoff 2, Gang Li 2, Jeffrey Saver 3
PMCID: PMC2822130  NIHMSID: NIHMS156012  PMID: 19943331

SUMMARY

Existing joint models for longitudinal and survival data do not apply to longitudinal ordinal outcomes with possibly non-ignorable missing values arising from multiple causes. We propose a joint model for longitudinal ordinal measurements and competing risks failure time data, in which a partial proportional odds model for the longitudinal ordinal outcome is linked to the event times by latent random variables. At the survival endpoint, our model adopts the competing risks framework to model multiple failure types at the same time. The partial proportional odds model, an extension of the popular proportional odds model for ordinal outcomes, is more flexible and also provides a tool to test the proportional odds assumption. We use a likelihood approach and derive an EM algorithm to obtain the maximum likelihood estimates of the parameters. We further show that all the parameters at the survival endpoint are identifiable from the data. Our joint model enables one to make inference for both the longitudinal ordinal outcome and the failure times simultaneously. In addition, the inference at the longitudinal endpoint is adjusted for possible non-ignorable missing data caused by the failure times. We apply the method to the NINDS rt-PA stroke trial. Our study considers the modified Rankin Scale only. Other ordinal outcomes in the trial, such as the Barthel and Glasgow scales, can be treated in the same way.

1. INTRODUCTION

In clinical trials, longitudinal ordinal outcomes are commonly encountered, and quite often some observations are missing due to dropout or death. If the probability of dropout or death is related to the unobserved values, the missing-data mechanism is called missing not at random (MNAR) or non-ignorable [1]. One example is the clinical trial of intravenous recombinant tissue-plasminogen activator (rt-PA) in patients with acute stroke [2]. In this study, patients treated with rt-PA were compared with those given placebo to look for an improvement from baseline in the score on the Modified Rankin Scale, an ordinal measure of degree of disability with categories ranging from no symptoms or no significant disability to severe disability or death. During follow-up, patients could drop out, die, or experience treatment failure. A treatment failure occurs if the patient remains in severe disability after treatment initiation. Both death and dropout could result in non-ignorable missing values in the Modified Rankin Scale because these events are highly related to the disease condition of the patients. The problem is further complicated by the fact that treatment failure, death, and dropout are potentially correlated. Clinicians suggest using treatment failure and death to provide additional information on the treatment efficacy. In this trial we are interested in estimating the treatment effects on both the longitudinal measurements of the Modified Rankin Scale and the risk of treatment failure or death. The estimates need to be adjusted for possible non-ignorable missing data in the Modified Rankin Scale and for informative censoring of treatment failure or death by dropout.

The non-ignorable missing data problem in longitudinal studies has motivated a growing literature on joint analysis of the repeated measurements and the missing data mechanism. A large body of work exists for normally distributed longitudinal measurements in the setting of linear mixed effects models or marginal models [3 - 9]. These were also extended to generalized longitudinal measurements with exponential family distributions [10, 11, 12]. However, these approaches cannot be used for longitudinal ordinal outcomes, which are encountered very often in medical studies. There have been very limited efforts to extend the joint analysis to longitudinal ordinal measurements. Molenberghs, Kenward, and Lesaffre proposed a model for longitudinal ordinal data with nonrandom drop-out, which linked the multivariate Dale model for longitudinal ordinal data to a logistic regression model for drop-out [13]. A pattern-mixture model was developed by Kaciroti et al. to analyze clustered longitudinal ordinal data with non-ignorable missing values [14]. These methods assume finite, discrete missing data patterns and thus are not applicable to the aforementioned NINDS rt-PA stroke trial, where the death time is continuous and there are multiple reasons leading to non-ignorable missing data. For the NINDS rt-PA stroke trial, a competing risks framework is essential to distinguish treatment failure/death from dropout, because failure or death is an important clinical endpoint for evaluating the treatment efficacy in addition to the longitudinal measurements of the Modified Rankin Scale. To the best of our knowledge, we are the first to consider competing risks failure times to deal with possible non-ignorable missing values in longitudinal ordinal measurements.

In this article we formulate a joint model which consists of the following two components: (1) a partial proportional odds model for the longitudinal ordinal outcome, which extends the model proposed by Peterson and Harrell [15] to correlated ordinal observations. Such extensions have been studied by Hedeker and Mermelstein [16, 17]. The partial proportional odds model is built upon the popular proportional odds model for ordinal data [18], but allows non-proportional odds for a subset of the predictors. It is a more flexible approach and at the same time provides a useful tool to test the proportional odds assumption. (2) a cause-specific hazards model for the competing risks failure times data to allow for multiple risks at the survival endpoint [19], in which we incorporate a frailty to take into account correlations between the failure times. We further show that the frailty can be identified from the data. The two sub-models are associated through the joint distribution of the random effects in (1) and (2) so that the event time processes (e.g., missing data mechanism) can depend on both observed and missing measurements in the longitudinal endpoint. Our joint model not only enables one to make inference for both the longitudinal ordinal outcome and the failure times simultaneously, but also adjusts estimated quantities of the longitudinal measurements for possible non-ignorable missing data caused by the failure times. Our model further extends the previous methods in that it considers multiple failure types with potential correlations at the survival endpoint.

This paper is organized as follows. Section 2 describes the joint model and its likelihood function, and further shows that all the components at the survival endpoint, especially the frailty, are identifiable. Section 3 proposes an EM algorithm for the maximum likelihood estimates of the joint model and a profile likelihood approach for standard error estimation. Section 4 contains an application of the method to the NINDS rt-PA stroke trial. Some simulation studies are provided in Section 5. The final section contains a discussion.

2. THE JOINT MODEL AND THE LIKELIHOOD FUNCTION

Our joint model consists of two linked sub-models: (1) a partial proportional odds model for the longitudinal ordinal repeated measurements; (2) a cause-specific hazards model for the competing risks failure time data. Sub-model (1) is an extension of the partial proportional odds model proposed by Peterson and Harrell [15] to allow for multiple observations on each study subject by incorporating subject-specific random effects. Suppose we have n subjects under study, each with ni observations, i = 1, . . . , n. Let Yij denote the jth response for subject i, where Yij takes values in {1, . . . , K} for some integer K ≥ 2, Xij a p × 1 vector of predictors, X~ij an s × 1 vector, s ≤ p, containing a subset of the p predictors for which the proportional odds assumption may not be satisfied, and Wij a q × 1 vector of predictors for the random effects. The partial proportional odds model for Yij is written as:

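$$P(Y_{ij} \le k \mid X_{ij}, \tilde{X}_{ij}, W_{ij}, \theta, \beta, \alpha, b_i) = \frac{1}{1 + \exp\left(-\theta_k - X_{ij}^T\beta - \tilde{X}_{ij}^T\alpha_k - W_{ij}^T b_i\right)} \qquad (1)$$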

for k = 1, . . . , K − 1, where θ = (θ1, . . . , θK−1)T with θ1 < θ2 < · · · < θK−1, β = (β1, . . . , βp)T are the fixed effects of Xij, αk = (αk1, . . . , αks)T is an s × 1 vector of regression coefficients with α1 = 0, so that X~ijTαk is an increment associated with the logit of the probability of Yij ≤ k compared to that of Yij ≤ 1, and bi ~ Nq(0, Σb) is a vector of random effects for subject i. Let the vector α = (α2T, . . . , αK−1T)T.
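To make Model (1) concrete, the following minimal Python sketch turns the cumulative probabilities into category probabilities. The function names and the example inputs are illustrative assumptions and are not taken from the paper; the numerical values reuse the true parameter values of the simulation study in Section 5.

```python
import numpy as np

def cumulative_probs(theta, beta, alpha, x, x_tilde, w, b):
    """P(Y <= k), k = 1..K-1, under the partial proportional odds model (1).

    theta: (K-1,) increasing thresholds; beta: (p,) proportional-odds effects;
    alpha: (K-1, s) non-proportional increments with alpha[0] = 0;
    x, x_tilde, w: covariate vectors; b: (q,) random effects.
    """
    eta = theta + x @ beta + alpha @ x_tilde + w @ b
    return 1.0 / (1.0 + np.exp(-eta))

def category_probs(theta, beta, alpha, x, x_tilde, w, b):
    """P(Y = k), k = 1..K, as successive differences with pi(0) = 0, pi(K) = 1."""
    cum = cumulative_probs(theta, beta, alpha, x, x_tilde, w, b)
    return np.diff(np.concatenate(([0.0], cum, [1.0])))

# Illustrative inputs: K = 3, X = (t, x, t*x) with t = 1 and x = 1.
theta = np.array([-0.5, 1.0])
beta = np.array([-1.0, 1.5, 0.8])
alpha = np.array([[0.0], [0.0]])      # alpha_1 = 0 by definition, alpha_2 = 0
probs = category_probs(theta, beta, alpha,
                       x=np.array([1.0, 1.0, 1.0]), x_tilde=np.array([1.0]),
                       w=np.array([1.0]), b=np.array([0.0]))
print(probs, probs.sum())
```

Because αk varies with k, the cumulative probabilities of a partial proportional odds model are not guaranteed to be increasing in k for every covariate value, so a fitted model should be checked for negative category probabilities.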

We assume a proportional cause-specific hazards sub-model for the competing risks failure time data. Let Zi(t) denote the associated l × 1 vector of time-dependent predictors and Ci = (Ti, Di) denote the survival data on subject i, where Ti is the failure time or censoring time, and Di takes value from {0, 1, . . . , g}, with Di = 0 indicating a censored event and Di = d showing that subject i fails from the dth type of failure, where d = 1, . . . , g. The sub-model for Ci is specified as

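$$\lambda_d(t; Z_i(t), u_i, \gamma, \nu) = \lim_{h \to 0} h^{-1} P\left(t \le T_i < t + h, D_i = d \mid T_i \ge t, Z_i(t), u_i, \gamma, \nu\right) = \lambda_{0d}(t) \exp\left(Z_i(t)^T\gamma_d + \nu_d u_i\right) \qquad (2)$$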

for d = 1, . . . , g, where λd(t; Zi(t), ui, γ, ν) is the instantaneous failure rate due to type d at time t given Zi(t) and the frailty ui and in the presence of all other failure types, λ0d(t) is a completely unspecified baseline hazard function for risk d, γ = (γ1T, . . . , γgT)T is a vector of fixed unknown regression coefficients, and ν = (ν1, . . . , νg)T collects the coefficients of the frailty ui for the g competing risks. This model is an extension of the cause-specific hazards model for competing risks survival data [19] by including subject-specific random effects ui. The random effects ui can be interpreted as unobservable traits that are shared by all the g event processes on the same subject and induce correlations among different failure types. Note that we do not assume the latent failure times are independent conditional on ui and the covariates, and allow existence of other sources of correlations among the failure times that are not accounted for in (2). Throughout, the censoring mechanism is assumed to be independent of the survival time. Dependent (or informative) censoring can be treated as one of the g types of failures. The association between Y and C is modeled by the assumption that the random effects ui and bi jointly have a multivariate normal distribution:

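$$a_i = \begin{pmatrix} b_i \\ u_i \end{pmatrix} \sim N_{q+1}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \Sigma_b & \Sigma_{bu}^T \\ \Sigma_{bu} & \sigma_u^2 \end{pmatrix} \right).$$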

The parameter ν1 is set to 1 to ensure identifiability. A Wald test can be used to test the null hypothesis H0: Σbu = 0 for the association between Y and C. It is easily seen that the joint model reduces to separate analyses of the two endpoints if Σbu = 0. The correlation between bi and ui can also be derived from Σbu, and this, together with the magnitude of ν, can be used to determine the strength of association between Y and C. We further assume Y and C are independent given the latent random effects ai and the covariates. Note that our joint model allows measurements in Y after event times, which is necessary in the rt-PA stroke trial since the Modified Rankin Scale can be observed after treatment failure.
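As an illustration of how Σbu ties the two sub-models together, the sketch below builds the covariance matrix of ai for a scalar random intercept (q = 1) from the two variances and the correlation ρbu, draws random effects from it, and recovers the correlation empirically. The helper name is hypothetical; the numerical values are the true values used in the simulation study of Section 5.

```python
import numpy as np

def random_effects_cov(sigma_b2, sigma_u2, rho_bu):
    """Covariance matrix of a_i = (b_i, u_i) for q = 1 (hypothetical helper)."""
    cov_bu = rho_bu * np.sqrt(sigma_b2 * sigma_u2)   # Sigma_bu = rho * sd_b * sd_u
    return np.array([[sigma_b2, cov_bu],
                     [cov_bu, sigma_u2]])

Sigma = random_effects_cov(sigma_b2=1.0, sigma_u2=0.5, rho_bu=-0.9)

rng = np.random.default_rng(0)
a = rng.multivariate_normal(mean=np.zeros(2), cov=Sigma, size=5000)
b, u = a[:, 0], a[:, 1]

# With Sigma_bu = 0 the two columns would be independent and the joint model
# would reduce to two separate analyses.
rho_hat = np.corrcoef(b, u)[0, 1]
print(Sigma)
print(rho_hat)
```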

For competing risks failure data, it is well known that the distribution of (T, D) is the identified minimum and that the joint distribution of the underlying failure times is not identifiable from the data [20]. Under the assumptions that variation with observed regressors {exp(Z(t)T γd), d = 1, . . . , g} contains a non-empty open set in Rg and that the expectation of the frailty term exp(u) is finite, Abbring and van den Berg proved that the parameters of a mixed proportional cause-specific hazards model are identifiable based on competing risks survival data [21]. Their arguments can be applied to establish the identifiability of the parameters in our Model (2). They also established the identifiability of the joint distribution of the latent failure times by further assuming independence between the latent failure times conditional on the covariates and random effects. In this paper we only need to identify the parameters of the mixed proportional cause-specific hazards model, rather than the joint distribution of the latent failure times, from the observed competing risks survival data. Therefore, we do not require the independence assumption between the latent failure times.

Let Ψ = (θ, β, α, γ, ν, Σ, λ01(t), . . . , λ0g(t)) collect all the parameters in (1) and (2), where Σ is the variance-covariance matrix of ai. We assume that the missing values in the longitudinal measurements caused by reasons other than the events are missing at random. For the notation, we write Yi = (Yi1, . . . , Yini)T, Y = (Y1T, . . . , YnT)T, and C = (T1, D1, . . . , Tn, Dn)T. Let πij(k) stand for the probability that Yij ≤ k given the covariates and the random effects, so that πij(K) = 1 and πij(0) = 0 for all i and j. The observed-data likelihood function for Ψ is therefore

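$$\begin{aligned}
L(\Psi; Y, C) &\propto \prod_{i=1}^{n} f(Y_i, C_i \mid \Psi) \\
&= \prod_{i=1}^{n} \int_a f(Y_i \mid C_i, a, \Psi)\, f(C_i \mid a, \Psi)\, f(a \mid \Psi)\, da \\
&= \prod_{i=1}^{n} \int_a f(Y_i \mid a, \Psi)\, f(C_i \mid a, \Psi)\, f(a \mid \Psi)\, da \\
&= \prod_{i=1}^{n} \int_a \left[ \prod_{j=1}^{n_i} \prod_{k=1}^{K} \{\pi_{ij}(k) - \pi_{ij}(k-1)\}^{I(Y_{ij} = k)} \right] \left\{ \prod_{d=1}^{g} \lambda_d(T_i; Z_i(T_i), u, \gamma_d, \nu_d)^{I(D_i = d)} \right\} \\
&\qquad \times \exp\left[ -\int_0^{T_i} \left\{ \sum_{d=1}^{g} \lambda_d(t; Z_i(t), u, \gamma_d, \nu_d) \right\} dt \right] \times \frac{1}{\sqrt{(2\pi)^{q+1} |\Sigma|}} \exp\left( -\tfrac{1}{2} a^T \Sigma^{-1} a \right) da. \qquad (3)
\end{aligned}$$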

Here we rely on the assumption that Yi and Ci are independent conditional on the covariates and the random effects.
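The integral over the random effects in (3) has no closed form. One common way to approximate such an integral numerically is Gauss-Hermite quadrature, sketched below for a single subject. This is only an illustration under simplifying assumptions that are not in the paper: a scalar random intercept (q = 1, Wij = 1), time-fixed covariates, and constant baseline hazards (the paper leaves λ0d unspecified). The function name, argument layout, and example values are hypothetical.

```python
import numpy as np

def subject_likelihood(y, eta_fixed, T, D, lam0, lin_surv, nu, Sigma, n_nodes=20):
    """Approximate one subject's factor of likelihood (3) by Gauss-Hermite quadrature.

    y: (n_i,) integer responses in 1..K; eta_fixed: (n_i, K-1) fixed-effect part
    theta_k + X'beta + X~'alpha_k; T, D: event time and type (0 = censored);
    lam0: (g,) constant baseline hazards (ASSUMPTION); lin_surv: (g,) values
    Z'gamma_d; nu: (g,) frailty coefficients; Sigma: 2 x 2 covariance of (b, u).
    """
    x, w = np.polynomial.hermite.hermgauss(n_nodes)   # weight function exp(-x^2)
    L = np.linalg.cholesky(Sigma)                     # a = L z has covariance Sigma
    rows = np.arange(len(y))
    total = 0.0
    for xi, wi in zip(x, w):
        for xj, wj in zip(x, w):
            b, u = L @ (np.sqrt(2.0) * np.array([xi, xj]))
            # Longitudinal factor: product over visits of pi(y) - pi(y - 1).
            cum = 1.0 / (1.0 + np.exp(-(eta_fixed + b)))
            cum = np.hstack([np.zeros((len(y), 1)), cum, np.ones((len(y), 1))])
            f_y = np.prod(cum[rows, y] - cum[rows, y - 1])
            # Survival factor: hazard of the observed cause times overall survival.
            haz = lam0 * np.exp(lin_surv + nu * u)
            f_c = np.exp(-T * haz.sum())
            if D > 0:
                f_c *= haz[D - 1]
            total += wi * wj * f_y * f_c
    return total / np.pi                              # normalising constant in 2-D

# Hypothetical inputs for one subject with two visits and K = 3.
Sigma = np.array([[1.0, -0.9 * np.sqrt(0.5)], [-0.9 * np.sqrt(0.5), 0.5]])
lam0, lin_surv, nu = np.array([0.15, 0.25]), np.array([0.6, 0.0]), np.array([1.0, 0.5])
eta_fixed = np.array([[-0.5, 1.0], [-1.0, 0.5]])
lik = subject_likelihood(np.array([2, 1]), eta_fixed, 1.2, 2, lam0, lin_surv, nu, Sigma)
print(lik)
```

A useful sanity check is that summing the result over all K possible values of a single response returns the likelihood of the survival data alone, since the category probabilities sum to one at every quadrature node.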

3. ESTIMATION AND INFERENCE

The observed-data likelihood is difficult to maximize directly because of integration with respect to the latent random effects ai. The procedure can be simplified using the complete-data likelihood conditional on the random effects:

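$$\begin{aligned}
L(\Psi; Y, C, a) &\propto \prod_{i=1}^{n} \left[ \prod_{j=1}^{n_i} \prod_{k=1}^{K} \{\pi_{ij}(k) - \pi_{ij}(k-1)\}^{I(Y_{ij} = k)} \right] \left\{ \prod_{d=1}^{g} \lambda_d(T_i; Z_i(T_i), u_i, \gamma_d, \nu_d)^{I(D_i = d)} \right\} \\
&\qquad \times \exp\left[ -\int_0^{T_i} \left\{ \sum_{d=1}^{g} \lambda_d(t; Z_i(t), u_i, \gamma_d, \nu_d) \right\} dt \right] \times \frac{1}{\sqrt{(2\pi)^{q+1} |\Sigma|}} \exp\left( -\tfrac{1}{2} a_i^T \Sigma^{-1} a_i \right). \qquad (4)
\end{aligned}$$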

The maximum likelihood estimates of Ψ can be obtained by an EM-algorithm which iterates between an E-step in which the expected logarithm of the complete-data likelihood (4) is computed conditional on the observed data and the current estimates of the parameters, and an M-step in which the new parameter estimates are calculated by maximizing this expected log-likelihood. The cumulative hazards of the baseline functions in λd are chosen to be step functions with jumps at observed event times. We need to solve score equations in the maximization step. There are no closed-form solutions for θ, β, α, γ, and ν, for which we use a one-step Newton-Raphson algorithm. These parameter estimates in the M-step depend on the conditional expectations of functions of ai, which are evaluated in the E-step in each iteration. The algorithm iterates between the E-step and the M-step until the estimates converge. Please refer to the Appendix for more detail.
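The step-function form of the cumulative baseline hazards can be illustrated with a Breslow-type update, which is the usual closed-form M-step for a non-parametric baseline hazard in joint-model EM algorithms. The paper's exact update is given in its Appendix, which is not reproduced here, so the sketch below is an assumption about the general shape of that step rather than the authors' formula. It also assumes time-fixed covariates, and the function and argument names are hypothetical.

```python
import numpy as np

def baseline_hazard_jumps(T, D, d, expected_risk):
    """Breslow-type jumps of the cumulative baseline hazard for failure type d.

    T, D: observed times and event types (0 = censored);
    expected_risk[i]: E-step value of E[exp(Z_i' gamma_d + nu_d u_i) | data],
    assumed constant in time (ASSUMPTION: time-fixed covariates).
    """
    event_times = np.unique(T[D == d])            # jumps only at type-d event times
    jumps = np.empty(len(event_times))
    for m, t in enumerate(event_times):
        n_events = np.sum((T == t) & (D == d))    # type-d events at time t
        at_risk = T >= t                          # subjects still under observation
        jumps[m] = n_events / expected_risk[at_risk].sum()
    return event_times, jumps

# Toy data: four subjects, two type-1 events, unit expected risk.
T = np.array([1.0, 2.0, 3.0, 4.0])
D = np.array([1, 0, 1, 2])
times, jumps = baseline_hazard_jumps(T, D, d=1, expected_risk=np.ones(4))
print(times, jumps, np.cumsum(jumps))
```

Within an EM iteration, the conditional expectations passed in as `expected_risk` would be recomputed in each E-step before the jumps are updated.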

The dimension of the maximum likelihood estimate of Ψ increases with the sample size because the baseline hazard functions λ0d are non-parametric. This motivates a profile likelihood approach for the standard error estimates of the parametric components θ, β, α, γ, ν, and Σ, in which the baseline hazard functions have been profiled out. We propose to approximate the variance-covariance matrix of the estimate of Ω = (θ, β, α, γ, ν, Σ) by inverting the empirical Fisher information obtained from the profile likelihood. Let l(i)(Ω̂; Y, C) denote the observed score vector from the profile likelihood on the ith subject evaluated at Ω̂. The observed information matrix of Ω can be approximated by

$$\sum_{i=1}^{n} l^{(i)}(\hat{\Omega}; Y, C)\, l^{(i)}(\hat{\Omega}; Y, C)^T.$$
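The empirical Fisher information described above is a sum of outer products of per-subject score vectors. A minimal sketch of turning such scores into standard errors follows; the function name and the toy score matrix are illustrative assumptions, and computing the profile-likelihood scores themselves is not shown.

```python
import numpy as np

def se_from_scores(scores):
    """Standard errors from per-subject score vectors.

    scores: (n, m) array whose i-th row is the profile-likelihood score of
    subject i evaluated at the estimate. The empirical information is the sum
    of outer products, i.e. scores.T @ scores.
    """
    info = scores.T @ scores
    cov = np.linalg.inv(info)          # approximate variance-covariance matrix
    return np.sqrt(np.diag(cov))

# Toy example with four subjects and two parameters.
scores = np.array([[1.0, 0.0],
                   [0.0, 2.0],
                   [1.0, 0.0],
                   [0.0, 2.0]])
print(se_from_scores(scores))
```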

4. ANALYSIS OF THE NINDS rt-PA TRIAL

The NINDS rt-PA trial of intravenous recombinant tissue plasminogen activator (rt-PA) in patients with acute ischemic stroke compares rt-PA with placebo using a randomized double-blind design. A total of 624 patients entered the study and were randomized to one of two groups of 312 patients each. Among other measures of efficacy, the modified Rankin Scale was recorded at baseline, 7–10 days, 3 months, 6 months, and 12 months post stroke onset. The measure is on an ordinal scale, and some of the categories were pooled as follows: 1 = no symptoms or no significant disability despite symptoms, 2 = slight disability, 3 = moderate disability or moderately severe disability, 4 = severe disability or dead. Although death is included in one of the levels, we do not impute missing data in the modified Rankin Scale after death, but account for the time to death at the survival endpoint. Out of the 624 patients, 25 dropped out before 12 months (14 in the rt-PA group and 11 in the placebo group) and 168 died (78 in the rt-PA group and 90 in the placebo group). A treatment failure occurs if the patient remains in severe disability in two consecutive observations after randomization. We observed 54 treatment failures, of which 17 died later. The average number of visits is 4.25, and the percentage of missing data in the modified Rankin Scale at 12 months is 30%. The missing data after death or dropout could be non-ignorable, since patients with a higher Rankin score would be more likely to die or drop out of the study because of low efficacy of the treatment.

In this example we illustrate the application of the joint model using the subset of patients whose disease subtypes are small vessel occlusive disease, large vessel atherosclerosis or cardioembolic stroke, or unknown reasons. The following covariates were considered in modeling the longitudinal ordinal modified Rankin Scale post stroke onset: treatment group (rt-PA or placebo), the three subtypes of acute stroke (small vessel occlusive disease, large vessel atherosclerosis or cardioembolic stroke, and unknown reasons), modified Rankin Scale prior to stroke onset (based on the original definition without collapsing categories), and time since randomization. We adopt an unstructured time trend using three dummy variables, time3, time6, and time12, for 3, 6, and 12 months respectively, so that the measure at 7–10 days serves as the reference. There are 587 patients included in the analysis, and their baseline characteristics and changes in the modified Rankin Scale over time are summarized in Table 1. The disease subtypes and the baseline modified Rankin Scale are distributed evenly between the groups, but the rt-PA group has a significantly lower modified Rankin Scale after treatment initiation. For both groups, the modified Rankin Scale decreased over time. The Kendall’s tau correlations among the modified Rankin Scale at 7–10 days, 3, 6, and 12 months range from 0.65 to 0.87 (p-values < 0.0001).

Table 1.

Baseline characteristics of study subjects and changes in the modified Rankin Scale over time. Values are frequency (%) for the disease subtypes and mean (standard deviation) for the modified Rankin Scale.

| | rt-PA group (n = 292) | Placebo group (n = 295) | p-value |
| --- | --- | --- | --- |
| Disease subtypes | | | |
| small vessel | 31 (10.62%) | 30 (10.17%) | |
| large vessel or cardioembolic stroke | 181 (61.99%) | 184 (62.37%) | 0.9804 (a) |
| Modified Rankin Scale | | | |
| modified Rankin Scale prior onset | 0.27 (0.73) | 0.29 (0.80) | 0.9872 (b) |
| modified Rankin Scale (7–10 days) | 2.55 (1.20) | 2.90 (1.07) | 0.0006 (b) |
| modified Rankin Scale (3 months) | 1.97 (1.07) | 2.27 (1.02) | 0.0017 (b) |
| modified Rankin Scale (6 months) | 1.91 (1.04) | 2.18 (1.04) | 0.0060 (b) |
| modified Rankin Scale (12 months) | 1.81 (1.01) | 2.13 (1.04) | 0.0015 (b) |

(a) The p-values are calculated using the chi-square test.

(b) The p-values are calculated using the Wilcoxon rank-sum test.

In the joint model, we are interested in modeling two competing risks at the survival endpoint: the time to dropout (risk 1) and the time to death or remaining in severe disability (risk 2). We combine death and remaining in severe disability into one risk because both events are strong evidence of low treatment efficacy. One dummy variable "group" is created treating the placebo arm as the reference group, and another two dummy variables "small vessel" and "large vessel or cardioembolic stroke" are generated to represent the two blocks, treating the block "unknown reasons" as the reference. We carried out the likelihood ratio test to assess the fit of the proportional odds assumption by expanding the vector X~ and testing α = 0, and identified divergence of the block effects from the proportional odds assumption.

The results of the joint model are shown in Table 2, where at the longitudinal endpoint we have X~ = (small vessel, large vessel or cardioembolic stroke)T. Since we could not identify significant interaction effects between group and the time trend, these terms were not considered in our final model. As shown in Table 2, there are significant effects of treatment, modified Rankin Scale prior onset, time3, time6, time12, and the interaction between large vessel or cardioembolic stroke and the treatment. The Rankin scale has a decreasing trend over time: conditional on other covariates and the random effects, the cumulative odds ratio for Y ≤ k, k = 1, 2, 3, is exp(2.12) = 8.33 at 3 months compared to 7–10 days post stroke onset (95% confidence interval (5.63, 12.33)), and is 9.68 and 11.59 at 6 months and 12 months, with 95% confidence intervals (6.54, 14.32) and (7.53, 17.84), respectively. The block effects do not satisfy the proportional odds assumption. Compared to the patients with unknown reasons (as stated in the database), the small vessel patients have lower Rankin scales, and the conditional cumulative odds ratio is exp(3.49) for Y ≤ 1, and is exp(3.77) and exp(6.14) for Y ≤ 2 and Y ≤ 3. Note that the estimate 6.14 may not be reliable because there are only 4 patients with Y = 4 in the small vessel stratum. The patients with large vessel or cardioembolic stroke tend to have higher Rankin scales than the patients with unknown reasons. The treatment is not as effective for the large vessel or cardioembolic stroke patients as for the patients with unknown reasons, and there is no significant difference in the treatment effects between the patients with small vessel and those with unknown reasons. For the patients with unknown reasons, the conditional cumulative odds ratio is exp(1.48) = 4.39 for Y ≤ 1, 2, and 3 comparing the rt-PA group to the placebo group (95% confidence interval (2.30, 8.39)). In contrast, in the large vessel or cardioembolic stroke patients, the conditional cumulative odds ratio is exp(1.48 − 2.27) = 0.45 (95% confidence interval (0.09, 2.31)).
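The odds ratios and confidence intervals quoted for single coefficients are consistent with exponentiating the Table 2 estimates and their Wald limits (estimate ± 1.96 × SE). The short sketch below redoes that arithmetic; the function name is hypothetical, and the use of 1.96 is an inference from the reported numbers rather than something the paper states.

```python
import math

def odds_ratio_wald_ci(estimate, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit) for a log-odds estimate."""
    return (math.exp(estimate),
            math.exp(estimate - z * se),
            math.exp(estimate + z * se))

# Estimates (SE) copied from Table 2, proportional odds part.
table2 = {"time3": (2.12, 0.20), "time6": (2.27, 0.20),
          "time12": (2.45, 0.22), "group": (1.48, 0.33)}

for name, (est, se) in table2.items():
    or_, lo, hi = odds_ratio_wald_ci(est, se)
    print(f"{name}: OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

The interval for the combined effect exp(1.48 − 2.27) cannot be reproduced this way, because it also depends on the covariance between the two estimates, which Table 2 does not report.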

Table 2.

Results from the joint analysis for the NINDS study

| Parameter | Estimate (SE) |
| --- | --- |
| Longitudinal outcome | |
| Proportional odds (PO) (cumulative prob of Y ≤ k, k = 1, 2, 3) | |
| group | 1.48 (0.33) |
| modified Rankin Scale prior onset | −1.67 (0.27) |
| time3 | 2.12 (0.20) |
| time6 | 2.27 (0.20) |
| time12 | 2.45 (0.22) |
| small vessel × group | −0.74 (1.26) |
| large vessel or cardioembolic stroke × group | −2.27 (0.76) |
| Partial PO (cumulative prob of Y ≤ 1) | |
| small vessel | 3.49 (0.68) |
| large vessel or cardioembolic stroke | −1.04 (0.44) |
| Partial PO (cumulative prob of Y ≤ 2) | |
| small vessel | 3.77 (0.68) |
| large vessel or cardioembolic stroke | −1.36 (0.39) |
| Partial PO (cumulative prob of Y ≤ 3) | |
| small vessel | 6.14 (1.15) |
| large vessel or cardioembolic stroke | −0.64 (0.49) |
| Cause-specific hazards | |
| Risk 1: dropout | |
| group | 0.23 (0.47) |
| modified Rankin Scale prior onset | 0.06 (0.42) |
| small vessel | 0.55 (0.57) |
| large vessel | −0.29 (0.51) |
| small vessel × group | 0.04 (1.14) |
| large vessel or cardioembolic stroke | 0.30 (1.02) |
| Risk 2: death or remaining in severe disability | |
| group | −0.46 (0.27) |
| modified Rankin Scale prior onset | 0.53 (0.17) |
| small vessel | −2.07 (0.79) |
| large vessel | 0.37 (0.27) |
| small vessel × group | 0.35 (1.49) |
| large vessel or cardioembolic stroke | 0.81 (0.54) |
| Random effects | |
| σb² | 34.66 (3.94) |
| σu² | 0.51 (0.07) |
| ρbu | −0.997 (0.19) |
| ν2 | 3.12 (0.51) |

p-value < 0.05

We do not observe significant treatment effects at the survival endpoint for either the time to dropout or the time to death or remaining in severe disability. There appears to be a higher risk of death or remaining in severe disability in the patients with a higher prior onset Rankin scale. On the other hand, the patients with small vessel tend to have a lower risk for this event than those with unknown reasons. The estimate of ν2 is positive, suggesting that the two risks are positively correlated, i.e., the patients with a higher risk of dropout are more likely to experience death or remaining in severe disability. There is a negative correlation (ρbu < 0) between the random intercept bi in sub-model (1) and the frailty ui in sub-model (2), which indicates that patients with higher Rankin scales tend to have a higher risk of dropout, death, or remaining in severe disability. We also carried out separate analyses of the longitudinal Rankin scale measurements using either the Wilcoxon two-sample test or a partial proportional odds model assuming an ignorable missing data mechanism. Significant treatment effects were found with both methods, and the partial proportional odds model produced estimates similar to those of our joint model. However, this is not always the case in the presence of non-ignorable missing data. In the next section we show using simulation studies that the separate analysis of the longitudinal ordinal data could give rise to biased estimates and poor inference when there are non-ignorable missing data.

5. SIMULATION STUDIES

Tables 3 and 4 summarize the simulation results on 200 Monte Carlo samples with the sample size n = 200 and 500, respectively. We generated data from Models (1)-(2) with K = 3. For each simulated dataset, we apply both the joint model and the separate analyses of the longitudinal outcome and the competing risks survival times using Models (1) and (2), respectively. The covariate vector is Xij = (tij, xi, tijxi)T, where tij = 0, 0.5, . . . , up to 4 (may be censored by the failure times) is the visit time, xi ~ Bernoulli(0.5) is the treatment group indicator, and tijxi is the interaction between the two. We further set X~ij = xi and Wij = 1, so that bi is the random intercept for subject i and its variance is σb². The true values of the parameters β, α, θ, and σb² are given in Tables 3 and 4. We simulated two competing risks with λ01 = 0.15, λ02 = 0.25, Zi(t) = (zi, xi)T with zi ~ N(2, 1), and ui ~ N(0, σu²). The random intercept bi in Model (1) and ui have a bivariate normal distribution with correlation ρbu. The censoring time τi for subject i was generated from an exponential distribution with mean 10. We could only observe one failure type on each study subject, depending on which happens first. Furthermore, censoring could occur if τi is smaller than both failure times. In our simulation the rate of risk 1 is around 44%, that of risk 2 is around 37%, and the censoring rate is around 19%. The simulated bias, standard error, and 95% confidence interval coverage probability (CP) are given in Tables 3 and 4.
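The data-generating mechanism described above can be sketched as follows for one subject. This is not the authors' simulation code. In particular, drawing independent exponential latent failure times is an assumption (it is one standard way to obtain constant cause-specific hazards), the function name is hypothetical, and the event and censoring rates produced by this sketch have not been checked against the rates reported in the text.

```python
import numpy as np

def simulate_subject(rng, theta, beta, alpha2, gamma, nu, lam0, Sigma,
                     visit_times, cens_mean=10.0):
    """Draw one subject's (x, T, D, t, y) under Models (1)-(2) with K = 3."""
    b, u = rng.multivariate_normal(np.zeros(2), Sigma)   # random intercept, frailty
    x = int(rng.binomial(1, 0.5))                        # treatment indicator
    z = rng.normal(2.0, 1.0)                             # baseline covariate

    # ASSUMPTION: independent exponential latent times, one per failure type.
    rates = lam0 * np.exp(gamma @ np.array([z, x]) + nu * u)
    latent = rng.exponential(1.0 / rates)
    tau = rng.exponential(cens_mean)                     # censoring time
    T = min(latent.min(), tau)
    D = 0 if tau < latent.min() else int(latent.argmin()) + 1

    # Ordinal responses at the scheduled visits that occur no later than T.
    t = visit_times[visit_times <= T]
    X = np.column_stack([t, np.full_like(t, x), t * x])  # (t, x, t*x)
    alpha = np.array([0.0, alpha2])                      # alpha_1 = 0 by definition
    eta = theta[None, :] + (X @ beta)[:, None] + x * alpha[None, :] + b
    cum = 1.0 / (1.0 + np.exp(-eta))                     # P(Y <= k), k = 1, 2
    y = 1 + (rng.uniform(size=len(t))[:, None] > cum).sum(axis=1)
    return x, T, D, t, y

# True parameter values as listed in Tables 3 and 4.
theta = np.array([-0.5, 1.0])
beta = np.array([-1.0, 1.5, 0.8])
gamma = np.array([[0.8, -1.0], [0.5, -1.0]])             # rows: gamma_1, gamma_2
nu = np.array([1.0, 0.5])                                # nu_1 fixed at 1
lam0 = np.array([0.15, 0.25])
cov_bu = -0.9 * np.sqrt(1.0 * 0.5)
Sigma = np.array([[1.0, cov_bu], [cov_bu, 0.5]])
visit_times = np.arange(0.0, 4.5, 0.5)

rng = np.random.default_rng(2010)
data = [simulate_subject(rng, theta, beta, 0.0, gamma, nu, lam0, Sigma, visit_times)
        for _ in range(200)]
print(np.bincount([D for _, _, D, _, _ in data], minlength=3) / len(data))
```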

Table 3.

Comparison of the joint model and the separate analysis of the longitudinal outcome (n = 200)

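| Parameter | True | Separate: Bias | Separate: SE | Separate: CP | Joint: Bias | Joint: SE | Joint: CP | MSE_S/MSE_J |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Longitudinal, fixed effects | | | | | | | | |
| β1 | −1 | 0.322 | 0.301 | 0.795 | −0.020 | 0.155 | 0.970 | 7.954 |
| β2 | 1.5 | 0.030 | 0.352 | 0.940 | 0.014 | 0.294 | 0.960 | 1.441 |
| β3 | 0.8 | −0.216 | 0.528 | 0.935 | 0.039 | 0.287 | 0.990 | 3.879 |
| α2 | 0 | −0.023 | 0.285 | 0.950 | −0.008 | 0.236 | 0.975 | 1.466 |
| θ1 | −0.5 | 0.017 | 0.161 | 0.960 | −0.010 | 0.165 | 0.950 | 0.959 |
| θ2 | 1 | 0.023 | 0.178 | 0.955 | 0.015 | 0.153 | 0.950 | 1.363 |
| Longitudinal, random effects | | | | | | | | |
| σb² | 1 | −0.059 | 0.589 | 0.900 | 0.021 | 0.346 | 0.955 | 2.916 |
| Survival, fixed effects | | | | | | | | |
| γ11 | 0.8 | −0.015 | 0.152 | 0.950 | 0.062 | 0.163 | 0.965 | 0.767 |
| γ12 | −1 | 0.014 | 0.308 | 0.950 | −0.079 | 0.297 | 0.955 | 1.006 |
| γ21 | 0.5 | 0.009 | 0.170 | 0.970 | 0.041 | 0.159 | 0.965 | 1.075 |
| γ22 | −1 | −0.024 | 0.301 | 0.970 | −0.072 | 0.269 | 0.975 | 1.176 |
| Survival, random effects | | | | | | | | |
| ν2 | 0.5 | −0.192 | 0.894 | 0.985 | 0.044 | 0.419 | 0.960 | 4.711 |
| σu² | 0.5 | −0.112 | 0.418 | 0.910 | 0.203 | 0.473 | 0.930 | 0.707 |
| Covariance | | | | | | | | |
| ρbu | −0.9 | | | | 0.048 | 0.127 | 0.990 | |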

Note: Large bias and poor CP are highlighted in boldface.

Table 4.

Comparison of the joint model and the separate analysis of the longitudinal outcome (n = 500)

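| Parameter | True | Separate: Bias | Separate: SE | Separate: CP | Joint: Bias | Joint: SE | Joint: CP | MSE_S/MSE_J |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Longitudinal, fixed effects | | | | | | | | |
| β1 | −1 | 0.311 | 0.189 | 0.565 | −0.007 | 0.101 | 0.960 | 12.92 |
| β2 | 1.5 | −0.005 | 0.228 | 0.950 | 0.014 | 0.193 | 0.960 | 1.389 |
| β3 | 0.8 | −0.169 | 0.340 | 0.900 | 0.012 | 0.176 | 0.980 | 4.632 |
| α2 | 0 | −0.015 | 0.169 | 0.960 | −0.014 | 0.154 | 0.940 | 1.204 |
| θ1 | −0.5 | 0.023 | 0.100 | 0.945 | −0.009 | 0.102 | 0.945 | 1.004 |
| θ2 | 1 | 0.014 | 0.112 | 0.950 | 0.008 | 0.106 | 0.930 | 1.127 |
| Longitudinal, random effects | | | | | | | | |
| σb² | 1 | −0.078 | 0.337 | 0.920 | 0.030 | 0.225 | 0.960 | 2.322 |
| Survival, fixed effects | | | | | | | | |
| γ11 | 0.8 | 0.008 | 0.137 | 0.960 | 0.026 | 0.109 | 0.950 | 1.450 |
| γ12 | −1 | −0.007 | 0.224 | 0.955 | −0.038 | 0.195 | 0.945 | 1.273 |
| γ21 | 0.5 | 0.004 | 0.139 | 0.965 | 0.022 | 0.094 | 0.955 | 2.075 |
| γ22 | −1 | −0.015 | 0.212 | 0.940 | −0.036 | 0.180 | 0.955 | 1.340 |
| Survival, random effects | | | | | | | | |
| ν2 | 0.5 | −0.046 | 0.859 | 0.970 | 0.027 | 0.268 | 0.950 | 10.20 |
| σu² | 0.5 | −0.082 | 0.423 | 0.940 | 0.125 | 0.281 | 0.925 | 1.963 |
| Covariance | | | | | | | | |
| ρbu | −0.9 | | | | 0.061 | 0.088 | 0.970 | |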

Note: Large bias and poor CP are highlighted in boldface.

Compared to the joint model, the separate analysis produces relatively large bias in the time trend β1 and the interaction with the treatment β3. With the negative correlation between bi and ui (ρbu < 0) and the positive correlation between the two failure times (ν2 > 0), a subject with a higher ordinal outcome tends to have a higher risk of experiencing both failures and thus leaves the study early, so that the observed time trend is under-estimated (note that we model the probability of Yij ≤ k), which results in a low confidence interval coverage for β1. Because the treatment lowers the ordinal outcome, the event rates are unbalanced between the two groups, and the estimated difference in the time trend (β3) is also biased. These biases do not vanish as we increase the sample size to 500, and the CP for β1 is even poorer. The separate analysis of the competing risks data also underestimates ν2 when n = 200 and produces larger empirical standard errors for γ. In the joint analysis the missing data mechanism has been modeled together with the longitudinal measurements, so that we are able to obtain almost unbiased estimates of β. Furthermore, by combining information from the longitudinal endpoint, the joint model is more efficient in estimating γ and ν2. Overall the joint model performs better asymptotically (n = 500), with smaller mean square errors for all the parameters. Finally, we observe that estimation of σu² requires a relatively large sample size in both the joint model and the separate analysis.
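The last column of Tables 3 and 4 is consistent with taking the mean square error as bias² + SE² for each approach and then forming the ratio of the separate analysis to the joint model. The paper does not spell out this formula, so it is stated here as an inference from the tabulated values; the function name is hypothetical.

```python
def mse_ratio(bias_sep, se_sep, bias_joint, se_joint):
    """Ratio MSE_S / MSE_J with MSE taken as bias^2 + SE^2 (inferred formula)."""
    mse_sep = bias_sep ** 2 + se_sep ** 2
    mse_joint = bias_joint ** 2 + se_joint ** 2
    return mse_sep / mse_joint

# Rows for beta_1 and beta_2 from Table 3 (n = 200).
ratio_beta1 = mse_ratio(0.322, 0.301, -0.020, 0.155)
ratio_beta2 = mse_ratio(0.030, 0.352, 0.014, 0.294)
print(round(ratio_beta1, 3), round(ratio_beta2, 3))
```

A ratio above 1 means the joint model has the smaller mean square error for that parameter.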

To further compare the performance of the joint model and the separate analysis under a more general scenario, we conducted a second set of simulations in which the random effects ai were generated from a multivariate t-distribution with 5 degrees of freedom (d.f. = 5), but the data were analyzed on the basis of the assumptions specified in Models (1)-(2). The t-distribution has longer tails than the normal distribution, and the latter is recovered as a limiting case as the d.f. goes to infinity. The results of the simulations are given in Tables 5 and 6. We do not show the estimates for the random components, since the estimates and the true parameter values are no longer comparable under model misspecification. Similar to the results in Tables 3 and 4, bias in the estimates of β1, β3, and ν2 is identified in the separate analysis, but now ν2 tends to be over-estimated, and its bias does not vanish as the sample size increases to 500. The joint model in general produces more accurate point estimates than the separate analysis. The standard error estimation methods in both approaches are not robust to model misspecification, as some parameters show poor confidence interval coverage probabilities, especially as the sample size gets large. The separate analysis tends to have larger variances (or SE) in the parameter estimates for the fixed effects at the longitudinal endpoint than those in Table 3. However, the impact of the model misspecification on the variances of the estimates in the joint model is minimal. Comparisons of the mean square errors between the two approaches again suggest that the joint model is superior to the separate analysis.
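Random effects with heavier tails can be drawn by dividing a multivariate normal vector by the square root of an independent scaled chi-square variable, which is the standard construction of a multivariate t-distribution. The sketch below assumes that Σ plays the role of the scale matrix; the paper does not say whether the scale matrix or the covariance matrix was matched to the normal case, and the function name is hypothetical.

```python
import numpy as np

def sample_mvt(rng, Sigma, df, size):
    """Draw `size` vectors from a multivariate t with scale matrix Sigma."""
    z = rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma, size=size)
    g = rng.chisquare(df, size=size) / df      # one mixing variable per draw
    return z / np.sqrt(g)[:, None]             # the same divisor scales b and u

cov_bu = -0.9 * np.sqrt(1.0 * 0.5)
Sigma = np.array([[1.0, cov_bu], [cov_bu, 0.5]])

rng = np.random.default_rng(5)
a_t = sample_mvt(rng, Sigma, df=5, size=20000)
print(np.corrcoef(a_t[:, 0], a_t[:, 1])[0, 1])
```

Under this construction the correlation between bi and ui is unchanged, while the covariance matrix is Σ multiplied by d.f./(d.f. − 2), which is one reason the variance components are not directly comparable with the normal case.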

Table 5.

Comparison of the joint model and the separate analysis of the longitudinal outcome when the underlying distribution of ai is multivariate t with d.f. = 5 (n = 200)

                       Separate                    Joint
Parameter     True     Bias     SE      CP        Bias     SE      CP      MSE_S/MSE_J
Longitudinal
 Fixed effects
  β1          −1       0.398   0.266   0.740    −0.023   0.153   0.945     9.573
  β2           1.5     0.036   0.371   0.980     0.064   0.338   0.980     1.174
  β3           0.8    −0.228   0.565   0.910    −0.041   0.291   0.960     4.298
  α2           0      −0.035   0.314   0.930     0.024   0.229   0.950     1.883
  θ1          −0.5     0.027   0.178   0.945    −0.029   0.156   0.970     1.287
  θ2           1       0.037   0.195   0.955    −0.022   0.154   0.970     1.628
Survival
 Fixed effects
  γ11          0.8    −0.051   0.190   0.870     0.020   0.165   0.980     1.401
  γ12         −1       0.079   0.291   0.915    −0.042   0.283   0.970     1.111
  γ21          0.5    −0.021   0.171   0.970     0.027   0.150   0.970     1.278
  γ22         −1       0.004   0.301   0.975    −0.034   0.300   0.970     0.994
 Random effects
  ν2           0.5     0.114   0.878   0.955     0.049   0.387   0.960     5.151

Note: Large bias and poor CP are highlighted in boldface.

Table 6.

Comparison of the joint model and the separate analysis of the longitudinal outcome when the underlying distribution of ai is multivariate t with d.f. = 5 (n = 500)

                       Separate                    Joint
Parameter     True     Bias     SE      CP        Bias     SE      CP      MSE_S/MSE_J
Longitudinal
 Fixed effects
  β1          −1       0.400   0.183   0.405    −0.027   0.104   0.895    16.76
  β2           1.5     0.029   0.233   0.950     0.029   0.201   0.975     1.337
  β3           0.8    −0.275   0.362   0.825    −0.065   0.182   0.895     5.533
  α2           0       0.005   0.182   0.945    −0.001   0.149   0.960     1.493
  θ1          −0.5     0.037   0.109   0.935    −0.011   0.107   0.930     1.145
  θ2           1       0.036   0.122   0.940    −0.003   0.097   0.950     1.718
Survival
 Fixed effects
  γ11          0.8    −0.031   0.144   0.800     0.016   0.109   0.970     1.788
  γ12         −1       0.034   0.245   0.865    −0.020   0.194   0.945     1.609
  γ21          0.5    −0.002   0.138   0.925     0.011   0.094   0.965     2.127
  γ22         −1      −0.031   0.234   0.925    −0.025   0.178   0.935     1.725
 Random effects
  ν2           0.5     0.103   0.909   0.870     0.019   0.214   0.950    18.13

Note: Large bias and poor CP are highlighted in boldface.
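The last column of Tables 5 and 6 can be checked approximately from the reported bias and SE. The sketch below assumes that the mean square error is computed as bias² + SE², which is our reading of the column rather than a definition stated in this section; the helper names are hypothetical. Because the tabulated inputs are rounded, the reproduced ratios agree with the table only up to rounding.

```python
def mse(bias, se):
    """Mean square error under the assumed decomposition bias^2 + SE^2."""
    return bias ** 2 + se ** 2

def mse_ratio(bias_sep, se_sep, bias_joint, se_joint):
    """Ratio of the separate-analysis MSE to the joint-model MSE."""
    return mse(bias_sep, se_sep) / mse(bias_joint, se_joint)

# Rows for the time trend in Table 5 (n = 200) and Table 6 (n = 500).
ratio_n200 = mse_ratio(0.398, 0.266, -0.023, 0.153)
ratio_n500 = mse_ratio(0.400, 0.183, -0.027, 0.104)
print(ratio_n200, ratio_n500)
```

Both values fall close to the tabulated ratios of 9.573 and 16.76, and a ratio above 1 favors the joint model.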

6. DISCUSSION

The proposed model extends existing methods to handle longitudinal ordinal data with possible non-ignorable missing data using a partial proportional odds model. It adopts a competing risks framework for the missing data mechanism and is therefore more general in distinguishing the different events that cause missing data in the study. On the basis of the arguments given in Abbring and van den Berg [21], it is easy to show that all the parameters at the survival endpoint are identifiable. Our joint model enables one to make inferences for both endpoints simultaneously, while at the same time adjusting estimated quantities of the longitudinal measurements for possible non-ignorable missing data caused by the failures. Using simulations we show that the joint analysis performs better than the separate analysis, even under model misspecification where the underlying distribution of the random effects ai has longer-than-normal tails. Use of the partial proportional odds model also enables us to test the proportional odds assumption for the ordinal measures. If the sample size permits, one could start with the full partial proportional odds model by setting X̃ij = Xij and eliminate the non-significant covariates from X̃ij by backward selection. In our joint analysis setting, the correlations among the longitudinal ordinal data are modeled through the random effects, which makes it difficult to obtain the fitted correlations as an output of the model fitting process. If these correlations are of interest to the investigator, marginal models for multivariate ordinal data could be used instead. Because our joint model involves infinite dimensional parameters in the baseline hazard functions, a rigorous treatment of the asymptotic properties of the maximum likelihood estimates warrants future research.

Model selection, as in any regression setting, is an important problem in joint analysis, but it has not been fully addressed in the joint modeling literature. In the application to the stroke study, we use a likelihood ratio test to assess the proportional odds assumption by expanding the covariate vector X̃. Popular model selection criteria such as the Akaike information criterion (AIC) are difficult to apply in the presence of nonparametric baseline functions for the cause-specific hazards: in semiparametric models like our joint model, the number of nuisance parameters in the baseline hazard functions increases with the sample size. The partial likelihood approach for Cox models is also inapplicable because the frailty term introduces correlation with the longitudinal measurements. Some authors have presented profile likelihood methods for model selection in the context of frailty models [22], and it is possible to extend their work to joint models. Future research in this direction is warranted.

ACKNOWLEDGEMENTS

The publicly available dataset for the NINDS rt-PA Stroke Trials was downloaded through the National Technical Information Service website. We are grateful to NIH and the NINDS rt-PA Study Group for making this dataset available as a public resource.

Appendix. The EM Algorithm

E-step

In the E-step of the (m + 1)th iteration, conditional on the observed data and the parameter estimates from the mth iteration, we evaluate

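\[
E_{a_i \mid Y_i, C_i, \Psi^{(m)}}\big(h(a_i)\big)
= \int h(a_i)\, f(a_i \mid Y_i, C_i, \Psi^{(m)})\, da_i
= \frac{\int h(a_i)\, f(Y_i, C_i, a_i \mid \Psi^{(m)})\, da_i}{f(Y_i, C_i \mid \Psi^{(m)})}
= \frac{\int h(a_i)\, f(Y_i \mid a_i, \Psi^{(m)})\, f(C_i \mid a_i, \Psi^{(m)})\, f(a_i \mid \Psi^{(m)})\, da_i}{\int f(Y_i \mid a_i, \Psi^{(m)})\, f(C_i \mid a_i, \Psi^{(m)})\, f(a_i \mid \Psi^{(m)})\, da_i}. \tag{5}
\]

The integrals can be evaluated using Gauss-Hermite quadrature.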
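To illustrate the quadrature step, the following is a minimal sketch for a scalar, zero-mean normal random effect; the random effects in the model are multivariate, so this is a simplification. The function name and the toy likelihoods are hypothetical. The expectation is computed as a ratio of two quadrature sums, so normalizing constants of the prior cancel, as in the last expression of the E-step formula.

```python
import numpy as np

def posterior_expectation(h, lik, sigma, n_nodes=20):
    """Approximate E[h(a) | data] for a ~ N(0, sigma^2) by Gauss-Hermite quadrature.

    lik(a) is the conditional likelihood of the observed data given a,
    evaluated elementwise on an array of quadrature points.
    """
    # Nodes and weights for integrals of the form  int exp(-x^2) g(x) dx.
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    a = np.sqrt(2.0) * sigma * x          # change of variable to the N(0, sigma^2) scale
    weights = w * lik(a)                  # unnormalized posterior weights
    return np.sum(weights * h(a)) / np.sum(weights)

# Toy check: one observation y = 1 with y | a ~ N(a, 1) and a ~ N(0, 1),
# for which the posterior mean of a is sigma^2 * y / (sigma^2 + 1).
post_mean = posterior_expectation(lambda a: a,
                                  lambda a: np.exp(-0.5 * (1.0 - a) ** 2),
                                  sigma=1.0)
```

In the joint model, `lik` would be the product of the longitudinal and competing risks contributions for subject i, and the same weights can be reused for every function h required in the M-step.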

M-step

Let E stand for E_{a_i | Y_i, C_i, Ψ^{(m)}}. For the components of Σ, we have

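\[
\Sigma_b^{(m+1)} = \frac{1}{n} \sum_{i=1}^{n} E(b_i b_i^T), \tag{6}
\]
\[
\sigma_u^{2(m+1)} = \frac{1}{n} \sum_{i=1}^{n} E(u_i^2), \tag{7}
\]

and

\[
\Sigma_{bu}^{(m+1)} = \frac{1}{n} \sum_{i=1}^{n} E(b_i u_i). \tag{8}
\]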

Suppose there are qd distinct failure times due to the dth cause and write td1 ≤ . . . ≤ tdqd for d = 1, . . . , g. Let R(tdj) be the risk set at time tdj, and ndj be the number of failures due to cause d at time tdj. The cumulative baseline hazard function for cause d is

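\[
H_{0d}^{(m+1)}(t_{dq}) = \sum_{j=1}^{q} \lambda_{0d}^{(m+1)}(t_{dj})
= \sum_{j=1}^{q} \frac{n_{dj}}{\sum_{r \in R(t_{dj})} \exp\!\big(Z_r(t_{dj})^T \gamma_d^{(m)}\big)\, E\!\big(\exp(\nu_d^{(m)} u_r)\big)}. \tag{9}
\]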
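The update for the cumulative baseline hazard has the form of a Breslow-type estimator. A minimal sketch follows, assuming time-fixed covariates so that each subject carries a single precomputed risk score exp(Z_r^T γ_d) E(exp(ν_d u_r)); the function and argument names are hypothetical.

```python
import numpy as np

def baseline_hazard(times, causes, cause, risk_scores):
    """Breslow-type hazard increments and cumulative hazard for one failure cause.

    times       : observed time T_i for each subject
    causes      : failure type D_i (0 denotes censoring)
    cause       : the cause d being updated
    risk_scores : exp(Z_r' gamma_d) * E[exp(nu_d * u_r)] for each subject (assumed given)
    """
    event_times = np.unique(times[causes == cause])   # sorted distinct failure times
    increments = np.array([
        # number of cause-d failures at t, divided by the summed scores of the risk set
        np.sum((times == t) & (causes == cause)) / np.sum(risk_scores[times >= t])
        for t in event_times
    ])
    return event_times, increments, np.cumsum(increments)

# With unit risk scores the estimator reduces to the Nelson-Aalen estimator.
times = np.array([1.0, 2.0, 3.0])
causes = np.array([1, 0, 1])
t_d, lam_d, H_d = baseline_hazard(times, causes, cause=1, risk_scores=np.ones(3))
```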

No closed-form solutions exist for θ, β, α, γ, and ν, which are updated by a one-step Newton-Raphson algorithm in each iteration:

\[
\theta_k^{(m+1)} = \theta_k^{(m)} - \frac{S_{\theta_k}^{(m)}}{I_{\theta_k}^{(m)}}, \tag{10}
\]

where k = 1, . . . , K - 1, with Iθk(m) and Sθk(m) being

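\[
I_{\theta_k}^{(m)} = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \Bigg[ I(Y_{ij}=k)\, E\Bigg\{ \frac{\big(\pi_{ij}(k) - \pi_{ij}^2(k)\big)\big(1 - 2\pi_{ij}(k)\big)}{\pi_{ij}(k) - \pi_{ij}(k-1)} - \frac{\big(\pi_{ij}(k) - \pi_{ij}^2(k)\big)^2}{\big(\pi_{ij}(k) - \pi_{ij}(k-1)\big)^2} \Bigg\}
- I(Y_{ij}=k+1)\, E\Bigg\{ \frac{\big(\pi_{ij}(k) - \pi_{ij}^2(k)\big)\big(1 - 2\pi_{ij}(k)\big)}{\pi_{ij}(k+1) - \pi_{ij}(k)} + \frac{\big(\pi_{ij}(k) - \pi_{ij}^2(k)\big)^2}{\big(\pi_{ij}(k+1) - \pi_{ij}(k)\big)^2} \Bigg\} \Bigg], \tag{11}
\]
\[
S_{\theta_k}^{(m)} = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \Bigg[ I(Y_{ij}=k)\, E\Bigg\{ \frac{\pi_{ij}(k) - \pi_{ij}^2(k)}{\pi_{ij}(k) - \pi_{ij}(k-1)} \Bigg\} - I(Y_{ij}=k+1)\, E\Bigg\{ \frac{\pi_{ij}(k) - \pi_{ij}^2(k)}{\pi_{ij}(k+1) - \pi_{ij}(k)} \Bigg\} \Bigg], \tag{12}
\]
\[
\beta^{(m+1)} = \beta^{(m)} + \big(I_{\beta}^{(m)}\big)^{-1} S_{\beta}^{(m)}, \tag{13}
\]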

with Iβ(m) and Sβ(m) being

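\[
I_{\beta}^{(m)} = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \sum_{k=1}^{K} I(Y_{ij}=k) \Big[ E\big\{\pi_{ij}(k) - \pi_{ij}^2(k)\big\} + E\big\{\pi_{ij}(k-1) - \pi_{ij}^2(k-1)\big\} \Big] X_{ij} X_{ij}^T, \tag{14}
\]
\[
S_{\beta}^{(m)} = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \sum_{k=1}^{K} I(Y_{ij}=k) \Big[ 1 - E\{\pi_{ij}(k)\} - E\{\pi_{ij}(k-1)\} \Big] X_{ij}, \tag{15}
\]
\[
\alpha_k^{(m+1)} = \alpha_k^{(m)} - \big(I_{\alpha_k}^{(m)}\big)^{-1} S_{\alpha_k}^{(m)}, \tag{16}
\]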

where k = 2, . . . , K - 1, with Iαk(m) and Sαk(m) being

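\[
I_{\alpha_k}^{(m)} = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \Bigg[ I(Y_{ij}=k)\, E\Bigg\{ \frac{\big(\pi_{ij}(k) - \pi_{ij}^2(k)\big)\big(1 - 2\pi_{ij}(k)\big)}{\pi_{ij}(k) - \pi_{ij}(k-1)} - \frac{\big(\pi_{ij}(k) - \pi_{ij}^2(k)\big)^2}{\big(\pi_{ij}(k) - \pi_{ij}(k-1)\big)^2} \Bigg\} \tilde{X}_{ij} \tilde{X}_{ij}^T
- I(Y_{ij}=k+1)\, E\Bigg\{ \frac{\big(\pi_{ij}(k) - \pi_{ij}^2(k)\big)\big(1 - 2\pi_{ij}(k)\big)}{\pi_{ij}(k+1) - \pi_{ij}(k)} + \frac{\big(\pi_{ij}(k) - \pi_{ij}^2(k)\big)^2}{\big(\pi_{ij}(k+1) - \pi_{ij}(k)\big)^2} \Bigg\} \tilde{X}_{ij} \tilde{X}_{ij}^T \Bigg], \tag{17}
\]
\[
S_{\alpha_k}^{(m)} = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \Bigg[ I(Y_{ij}=k)\, E\Bigg\{ \frac{\pi_{ij}(k) - \pi_{ij}^2(k)}{\pi_{ij}(k) - \pi_{ij}(k-1)} \Bigg\} \tilde{X}_{ij} - I(Y_{ij}=k+1)\, E\Bigg\{ \frac{\pi_{ij}(k) - \pi_{ij}^2(k)}{\pi_{ij}(k+1) - \pi_{ij}(k)} \Bigg\} \tilde{X}_{ij} \Bigg], \tag{18}
\]
\[
\gamma_d^{(m+1)} = \gamma_d^{(m)} + \big(I_{\gamma_d}^{(m)}\big)^{-1} S_{\gamma_d}^{(m)}, \tag{19}
\]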

where d = 1, . . . , g, with Iγd(m) and Sγd(m) being

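\[
I_{\gamma_d}^{(m)} = \sum_{i=1}^{n} \sum_{t_{dj} \le T_i} \lambda_{0d}^{(m+1)}(t_{dj}) \exp\!\big(Z_i(t_{dj})^T \gamma_d^{(m)}\big) \times E\big(\exp(\nu_d^{(m)} u_i)\big)\, Z_i(t_{dj}) Z_i(t_{dj})^T, \tag{20}
\]
\[
S_{\gamma_d}^{(m)} = \sum_{i=1}^{n} \Bigg\{ I(D_i = d)\, Z_i(T_i) - \sum_{t_{dj} \le T_i} \lambda_{0d}^{(m+1)}(t_{dj}) \exp\!\big(Z_i(t_{dj})^T \gamma_d^{(m)}\big) \times E\big(\exp(\nu_d^{(m)} u_i)\big)\, Z_i(t_{dj}) \Bigg\}, \tag{21}
\]
\[
\nu_d^{(m+1)} = \nu_d^{(m)} + \frac{S_{\nu_d}^{(m)}}{I_{\nu_d}^{(m)}}, \tag{22}
\]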

where d = 2, . . . , g, with Iνd(m) and Sνd(m) being

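\[
I_{\nu_d}^{(m)} = \sum_{i=1}^{n} \sum_{t_{dj} \le T_i} \lambda_{0d}^{(m+1)}(t_{dj}) \exp\!\big(Z_i(t_{dj})^T \gamma_d^{(m+1)}\big) \times E\big(u_i^2 \exp(\nu_d^{(m)} u_i)\big), \tag{23}
\]
\[
S_{\nu_d}^{(m)} = \sum_{i=1}^{n} \Bigg\{ I(D_i = d)\, E(u_i) - \sum_{t_{dj} \le T_i} \lambda_{0d}^{(m+1)}(t_{dj}) \exp\!\big(Z_i(t_{dj})^T \gamma_d^{(m+1)}\big) \times E\big(u_i \exp(\nu_d^{(m)} u_i)\big) \Bigg\}. \tag{24}
\]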
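As an illustration of the one-step Newton-Raphson updates, the scalar update for ν_d can be sketched as follows. The sketch assumes time-fixed covariates and that the subject-level quantities (the posterior expectations from the E-step and the cumulative baseline hazard at T_i multiplied by exp(Z_i^T γ_d)) have already been computed; all names are hypothetical.

```python
import numpy as np

def nu_one_step(nu, is_cause_d, E_u, cum_haz_score, E_u_exp, E_u2_exp):
    """One Newton-Raphson step for the frailty coefficient nu_d.

    is_cause_d    : indicator I(D_i = d)
    E_u           : E(u_i)
    cum_haz_score : sum over t_dj <= T_i of lambda_0d(t_dj) * exp(Z_i' gamma_d)
    E_u_exp       : E(u_i * exp(nu * u_i))
    E_u2_exp      : E(u_i^2 * exp(nu * u_i))
    """
    score = np.sum(is_cause_d * E_u - cum_haz_score * E_u_exp)
    info = np.sum(cum_haz_score * E_u2_exp)   # negative second derivative, positive
    return nu + score / info

# Hypothetical inputs for two subjects.
nu_new = nu_one_step(
    nu=0.5,
    is_cause_d=np.array([1.0, 0.0]),
    E_u=np.array([0.2, -0.1]),
    cum_haz_score=np.array([0.5, 1.0]),
    E_u_exp=np.array([0.3, -0.05]),
    E_u2_exp=np.array([1.2, 0.9]),
)
```

Only a single step is taken within each M-step; the expectations are then refreshed in the next E-step.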

Because the model requires that the elements of θ satisfy θ1 < θ2 < · · · < θK−1, we start the EM algorithm with initial values of θ in increasing order. In each M-step, we monitor the order of the updated θ and switch the values of some components, if necessary, to maintain the monotonicity. However, in our simulations and the real data analysis we have not encountered situations where values needed to be switched.

REFERENCES

  • 1. Little RJA, Rubin DB. Statistical Analysis with Missing Data. 2nd edn. Wiley; Hoboken: 2002.
  • 2. The National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group. Tissue plasminogen activator for acute ischemic stroke. The New England Journal of Medicine. 1995;333:1581–1587. doi: 10.1056/NEJM199512143332401.
  • 3. Diggle P, Kenward MG. Informative drop-out in longitudinal data analysis (with discussion). Applied Statistics. 1994;43:49–93.
  • 4. Little RJA. A class of pattern-mixture models for normal incomplete data. Biometrika. 1994;81:471–483.
  • 5. Henderson R, Diggle P, Dobson A. Joint modeling of longitudinal measurements and event time data. Biostatistics. 2000;1(4):465–480. doi: 10.1093/biostatistics/1.4.465.
  • 6. Hogan JW, Daniels MJ. A hierarchical modelling approach to analysing longitudinal data with drop-out and non-compliance, with application to an equivalence trial in paediatric acquired immune deficiency syndrome. Applied Statistics. 2002;51:1–21.
  • 7. Zeng D, Cai J. Simultaneous modelling of survival and longitudinal data with an application to repeated quality of life measures. Lifetime Data Analysis. 2005;11:151–174. doi: 10.1007/s10985-004-0381-0.
  • 8. Elashoff RM, Li G, Li N. An approach to joint analysis of longitudinal measurements and competing risks failure time data. Statistics in Medicine. 2007;26:2813–2835. doi: 10.1002/sim.2749.
  • 9. Elashoff RM, Li G, Li N. A joint model for longitudinal measurements and survival data in the presence of multiple failure types. Biometrics. 2008;64:762–771. doi: 10.1111/j.1541-0420.2007.00952.x.
  • 10. Follmann D, Wu M. An approximate generalized linear model with random effects for informative missing data. Biometrics. 1995;51:151–168.
  • 11. Ibrahim JG, Chen M-H, Lipsitz SR. Missing responses in generalised linear mixed models when the missing data mechanism is nonignorable. Biometrika. 2001;88:551–564.
  • 12. Roy J, Daniels MJ. A general class of pattern mixture models for nonignorable dropout with many possible dropout times. Biometrics. 2008;64:538–545. doi: 10.1111/j.1541-0420.2007.00884.x.
  • 13. Molenberghs G, Kenward MG, Lesaffre E. The analysis of longitudinal ordinal data with nonrandom drop-out. Biometrika. 1997;84:33–44.
  • 14. Kaciroti NA, Raghunathan TE, Schork MA, Clark NM, Gong M. A Bayesian approach for clustered longitudinal ordinal outcome with nonignorable missing data: evaluation of an asthma education program. Journal of the American Statistical Association. 2006;101:435–446.
  • 15. Peterson B, Harrell FE. Partial proportional odds models for ordinal response variables. Applied Statistics. 1990;39:205–217.
  • 16. Hedeker D, Mermelstein RJ. A multilevel thresholds of change model for analysis of stages of change data. Multivariate Behavioral Research. 1998;33:427–455.
  • 17. Hedeker D, Mermelstein RJ. Analysis of longitudinal substance use outcomes using ordinal random-effects regression models. Addiction. 2000;95(Supplement 3):S381–S394. doi: 10.1080/09652140020004296.
  • 18. McCullagh P. Regression models for ordinal data. Journal of the Royal Statistical Society, Series B. 1980;42:109–142.
  • 19. Prentice RL, Kalbfleisch JD, Peterson AV, Flournoy N, Farewell VT, Breslow NE. The analysis of failure times in the presence of competing risks. Biometrics. 1978;34:541–554.
  • 20. Tsiatis A. A nonidentifiability aspect of the problem of competing risks. Proceedings of the National Academy of Sciences of the United States of America. 1975;72:20–22. doi: 10.1073/pnas.72.1.20.
  • 21. Abbring JH, van den Berg GJ. The identifiability of the mixed proportional hazards competing risks model. The Journal of the Royal Statistical Society, Series B. 2003;65:701–710.
  • 22. Xu R, Gamst A, Donohue M, Vaida F, Harrington DP. Using profile likelihood for semiparametric model selection with application to proportional hazards mixed models. Harvard University Biostatistics Working Paper Series. 2006. http://www.bepress.com/harvardbiostat/paper43.
