Abstract
In biomedical or public health research, it is common for both survival time and longitudinal categorical outcomes to be collected for a subject, along with the subject's characteristics or risk factors. Investigators are often interested in finding important variables for predicting both survival time and longitudinal outcomes, which could be correlated within the same subject. Existing approaches for such joint analyses deal with continuous longitudinal outcomes; new statistical methods need to be developed for categorical longitudinal outcomes. We propose to simultaneously model the survival time with a stratified Cox proportional hazards model and the longitudinal categorical outcomes with a generalized linear mixed model. Random effects are introduced to account for the dependence between survival time and longitudinal outcomes due to unobserved factors. The Expectation-Maximization (EM) algorithm is used to derive the point estimates for the model parameters, and the observed information matrix is adopted to estimate their asymptotic variances. Asymptotic properties of our proposed maximum likelihood estimators are established using the theory of empirical processes. The method is demonstrated to perform well in finite samples via simulation studies. We illustrate our approach with data from the Carolina Head and Neck Cancer Study (CHANCE) and compare the results from our simultaneous analysis with those from separately conducted analyses using the generalized linear mixed model and the Cox proportional hazards model. Our proposed method identifies more predictors than the separate analyses do.
Keywords: EM algorithm, Generalized linear mixed model, Maximum likelihood estimator, Random effect, Simultaneous modeling, Stratified Cox proportional hazards model
1 Introduction
In biomedical or public health research, it is common that both longitudinal outcomes over time and a survival endpoint are collected for a subject, along with the subject's characteristics or risk factors. Investigators are often interested in finding important variables that predict both longitudinal outcomes and survival time. Since longitudinal outcomes and survival time are dependent, it is natural to analyze the two outcomes jointly.
Among the existing approaches for longitudinal data and survival time, the selection model and the pattern mixture model have been widely used. The selection model estimates the distribution of survival time given longitudinal data. The selection model with continuous longitudinal data was studied by Tsiatis, De Gruttola, and Wulfsohn (1995), Faucett and Thomas (1996), Wulfsohn and Tsiatis (1997), Henderson, Diggle and Dobson (2000), Tsiatis and Davidian (2001), Xu and Zeger (2001a,b), Song, Davidian and Tsiatis (2002), Tseng, Hsieh and Wang (2005), Song and Wang (2007) and Ye, Lin and Taylor (2008) among others. The selection model with categorical longitudinal data was considered by Faucett, Schenker and Elashoff (1998), Huang et al. (2001), Xu and Zeger (2001a,b), Lin, McCulloch, and Mayne (2002), Chen, Ibrahim, and Lipsitz (2002), Larsen (2004), Yao (2008), and Chakraborty and Das (2010) among others. The pattern mixture model focuses on the trend of longitudinal outcomes conditional on survival time. The pattern mixture model with continuous longitudinal outcomes was studied by Wu and Carroll (1988), Wu and Bailey (1989), Schluchter (1992), Hogan and Laird (1997), Ribaudo, Thompson and Allen-Mersh (2000) and more recently by Ding and Wang (2008). Pulkstenis, Ten Have and Landis (1998) considered the pattern mixture model of binary longitudinal outcomes with informative dropout. Albert and Follmann (2000) proposed to model repeated count data subject to informative dropout, and Albert, Follmann, Wang and Suh (2002) and Albert and Follmann (2007) studied binary longitudinal data with informative missingness. However, these methods cannot be applied directly to assess covariate effects on both outcomes. Simultaneous modeling of the longitudinal and survival data is needed for such purpose.
Xu and Zeger (2001b) and Zeng and Cai (2005a) proposed simultaneous models of longitudinal outcome and survival time. In their articles, heterogeneity caused by unobserved factors is represented using subject-specific random effects. Xu and Zeger (2001b) considered both continuous and categorical longitudinal outcomes and proposed a Bayesian approach using MCMC for estimation. Zeng and Cai (2005a) considered continuous longitudinal outcomes and adopted the EM algorithm for estimation. In their approach, given the random effects, the survival time and the repeated measurements of the longitudinal outcomes are assumed to follow a Cox proportional hazards model and a Gaussian distribution, respectively. Recently, simultaneous models with varied types of survival events and random effect structures have been studied (Elashoff, Li and Ni 2007, 2008; Liu, Ma and O'Quigley 2008; Rizopoulos, Verbeke and Molenberghs 2008; Rizopoulos, Verbeke, Lesaffre and Vanrenterghem 2008). Bayesian methods were also proposed for inference (Wang and Taylor 2001; Brown and Ibrahim 2003; Dunson and Herring 2005; Chen, Ghosh, Raghunathan, and Sargent 2009; Hu, Li and Li 2009; Huang, Li, Elashoff and Pan 2011). However, among all the aforementioned simultaneous models, most studies, except Xu and Zeger (2001b), Dunson and Herring (2005), Rizopoulos, Verbeke, Lesaffre and Vanrenterghem (2008), and Chen, Ghosh, Raghunathan, and Sargent (2009), are restricted to continuous longitudinal outcomes.
For non-continuous longitudinal outcomes, Rizopoulos, Verbeke, Lesaffre and Vanrenterghem (2008) studied binary data with excess zeros, extending the earlier work of Rizopoulos, Verbeke and Molenberghs (2008), which assumed an accelerated failure time model and used a copula function for the random effects in the continuous longitudinal and survival processes. Dunson and Herring (2005) proposed a general underlying Poisson variable framework for discrete survival and longitudinal outcomes, accommodating dependence through an additive gamma frailty model for the Poisson means within a Bayesian approach. Chen, Ghosh, Raghunathan, and Sargent (2009) considered a latent variable-based multivariate regression model with a structured variance-covariance matrix, assuming probit models for two binary outcomes and a log-normal accelerated failure time model for the survival outcome, and conducted Bayesian inference through the Markov chain Monte Carlo (MCMC) method.
Compared to the studies of continuous longitudinal data and survival time, relatively little work has been done in the simultaneous modeling framework for categorical longitudinal data and survival time. However, the longitudinal outcomes may not be continuous in some biomedical studies, for example, where the outcomes are disease symptoms with categories of mild/moderate/severe, quality of life measurements with categories of dissatisfied/satisfied, or dichotomized test results with categories of positive/negative. With these categorical longitudinal outcomes, the existing theory for continuous longitudinal outcomes cannot be applied directly, and the numerical algorithms need to be modified. Therefore, in this paper, we investigate the simultaneous modeling of survival time and longitudinal categorical outcomes. Survival time is modeled with a Cox proportional hazards model with an unspecified baseline hazard rate, and the hazards model is further extended to allow multiple strata. Random effects are introduced into the proposed models to account for the dependence between survival time and longitudinal outcomes due to unobserved factors. We also establish the theoretical justification of the asymptotic properties of the maximum likelihood estimates by employing the theory of empirical processes. The contributions of this paper to the recent developments in the simultaneous modeling of categorical longitudinal data and survival data since Xu and Zeger (2001b) are the following: (1) We propose efficient estimation based on nonparametric maximum likelihood estimation (NPMLE) with no assumption on the baseline hazard rates, whereas Dunson and Herring (2005), Rizopoulos, Verbeke, Lesaffre and Vanrenterghem (2008), and Chen, Ghosh, Raghunathan, and Sargent (2009) considered parametric models for survival outcomes.
(2) We implement our proposed model via an Expectation-Maximization (EM) algorithm, whereas Xu and Zeger (2001b), Dunson and Herring (2005), and Chen, Ghosh, Raghunathan, and Sargent (2009) proposed Bayesian estimation methods. (3) We provide an asymptotic theory for our efficient estimators.
The outline of this paper is as follows. In Section 2, we present a simultaneous model for longitudinal categorical outcomes and survival time and describe the inference procedure. Asymptotic properties of the proposed estimators are investigated in Section 3, and numerical results from simulation studies are given in Section 4. Our proposed method is illustrated with data from the Carolina Head and Neck Cancer Study (CHANCE) in Section 5. In Section 6, we discuss some further considerations and generalizations. The EM algorithms are provided in Appendix A, and the proofs of the asymptotic results are given in Appendix B.
2 Model and Inference Procedure
2.1 Model formulation and notation
We use Y(t) to denote the value of a longitudinal marker process at time t. Suppose Y(t) follows a distribution belonging to the exponential family, in order to accommodate both continuous and categorical measurements. Let T denote the survival time, and suppose that T is possibly right censored. Suppose a set of n subjects is followed over an interval [0,τ], where τ is the study end time. Denote by bi, i = 1,…,n, a vector of subject-specific random effects of dimension db; the bi's are mutually independent and identically distributed from a multivariate normal distribution with mean zero and covariance matrix Σb.
Given the random effects bi, the observed covariates, and the observed outcome history up to time t, we assume that the longitudinal outcome Yi(t) at time t for subject i follows a distribution from the exponential family with density

f(Yi(t) | bi) = exp{[Yi(t)ηi(t) − a(ηi(t))]/A(Di(t;ϕ)) + c(Yi(t); Di(t;ϕ))},   (1)

with mean μi(t) = E{Yi(t) | bi} and variance vi(t) = Var{Yi(t) | bi}, satisfying

g(μi(t)) = ηi(t) = Xi(t)β + X̃i(t)bi

and vi(t) = v(μi(t))A(Di(t;ϕ)), where g(·) and v(·) are known link and variance functions, respectively, Xi(t) and X̃i(t) are the row vectors of the observed covariates for subject i, and β is a column vector of coefficients for Xi(t). The random effect bi is allowed to differ for different individuals. Additionally, Xi(t) and X̃i(t) can be completely different or share some components, and Xi(t) may include dummy variables for different strata.
Given the random effects bi, the observed covariates, and the observed survival history before time t, the conditional hazard rate function for the survival time Ti of subject i is assumed to follow a stratified multiplicative hazards model,

λi(t | bi) = λSi(t) exp{Zi(t)γ + Z̃i(t)(ψ ∘ bi)},   (2)

where Zi(t) and Z̃i(t) are the row vectors of the observed covariates and may share some components, ψ is a vector of coefficients for the random effects, γ is a column vector of coefficients for Zi(t), and λs(t) is the s-th stratum baseline hazard rate function, so that the baseline hazard rate is allowed to vary across levels of the stratification variable. Note that Zi(t) does not include dummy variables for strata since the baseline hazard rate is stratum-specific. We assume common fixed effects and random effects across strata in both the hazard and longitudinal models. However, the model may allow for possibly different covariate effects for different strata, which can be achieved by including interaction terms of the covariates with the indicator variables for the stratification variable. Subjects in different strata are assumed to be independent. Here, for any vectors a1 and a2 of the same dimension, a1 ∘ a2 denotes the component-wise product. In addition, ψ and the transpose of Z̃i(t) have the same dimension as bi.
Under models (1) and (2), the two outcomes Y(t) and T are independent conditional on the covariates and random effect. The parameter ψ in model (2) characterizes the dependence between the longitudinal outcomes and the survival time due to latent random effect: When the k-th component of ψ is 0 (i.e. ψk = 0), it implies that the dependence between the survival time and longitudinal responses is not due to the corresponding latent variable bik; ψk ≠ 0 implies that such dependence may be due to the corresponding latent variable bik.
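To see numerically how a shared random effect induces marginal dependence between the two outcomes, the following small simulation (illustrative only; the sample size, σb = 0.7, five repeated binary measurements per subject, and a unit baseline hazard are our choices, not the paper's) compares the empirical correlation between a subject's average longitudinal outcome and the survival time under ψ = 1 versus ψ = 0:

```python
import numpy as np

rng = np.random.default_rng(0)

def marginal_corr(psi, n=4000, sigma_b=0.7, n_rep=5):
    """Given b the two outcomes are independent, but marginally
    psi != 0 induces dependence through the shared random effect b."""
    b = rng.normal(0.0, sigma_b, n)
    # binary longitudinal outcomes: logit P(Y=1|b) = -1 + b, n_rep repeats
    p = 1.0 / (1.0 + np.exp(-(-1.0 + b)))
    ybar = rng.binomial(n_rep, p) / n_rep          # subject-level average outcome
    # survival time: hazard exp(psi*b), i.e. T | b ~ Exponential(rate exp(psi*b))
    t = rng.exponential(np.exp(-psi * b))
    return np.corrcoef(ybar, t)[0, 1]
```

With ψ = 1, a larger b raises both the response probability and the hazard, so the average outcome and the survival time are clearly negatively correlated; with ψ = 0 the correlation is near zero.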
Let ni be the number of the observed longitudinal measurements for subject i, and assume that the distributions of ni and the observation times for longitudinal measurements are independent of the parameters of interest conditional on bi in this joint model. We also assume ni is bounded, which is a reasonable assumption in many biomedical studies. The observed data from n subjects are (ni, Yij, Xij, X̃ij), j = 1,…,ni, i = 1,…,n, and (Vi, Δi, Si, Zi(·), Z̃i(·)), where for subject i, Xij and X̃ij are the j-th observations of Xi(t) and X̃i(t), respectively, Ci is the right-censoring time and is independent of Ti and Yi(t) given the covariates and the random effects, Vi = min(Ti, Ci), Si denotes the stratum, and Δi = I(Ti ≤ Ci).
Our goal is to estimate and make inferences on the parameters θ = (βT, ϕT, Vech(Σb)T, ψT, γT)T and the baseline cumulative hazard functions with S strata, Λ(t) = (Λ1(t),…,ΛS(t))T, where Λs(t) = ∫0t λs(u)du, s = 1,…,S. The parameters β and ϕ are from the longitudinal model, ψ and γ are from the hazard model, and Σb is associated with the random effects. The Vech(·) operator creates a column vector from a matrix by stacking the diagonal and upper-triangle elements of the matrix.
2.2 Inference procedure
For all n subjects, we write Y = (Y11,…,Y1n1,…,Yn1,…,Ynnn)T, V = (V1,…,Vn)T, Δ = (Δ1,…,Δn)T and b = (b1T,…,bnT)T. We also denote X, X̃, Z and Z̃ as block diagonal matrices with the i-th diagonal components Xi, X̃i, Zi and Z̃i, respectively, and S = (S1,…,Sn)T. Then, the likelihood function of the complete data has the form

Lc(θ,Λ; Y,V,Δ,b) = ∏i=1n [∏j=1ni f(Yij | bi)] × [λSi(Vi) exp{Zi(Vi)γ + Z̃i(Vi)(ψ ∘ bi)}]Δi exp{−∫0Vi exp{Zi(t)γ + Z̃i(t)(ψ ∘ bi)} dΛSi(t)} × ϕ(bi; Σb),

where f(· | bi) is the density in (1) and ϕ(·; Σb) denotes the N(0, Σb) density, and the full likelihood function of the observed data for the parameter (θ,Λ) is obtained by integrating over the random effects,

L(θ,Λ) = ∏i=1n ∫ [∏j=1ni f(Yij | bi)] [λSi(Vi) exp{Zi(Vi)γ + Z̃i(Vi)(ψ ∘ bi)}]Δi exp{−∫0Vi exp{Zi(t)γ + Z̃i(t)(ψ ∘ bi)} dΛSi(t)} ϕ(bi; Σb) dbi.   (3)
The proposed estimation method calculates the maximum likelihood estimates of (θ, Λ(t)). We let each Λs(t) of Λ(t), s = 1,…,S, be a non-decreasing, right-continuous step function with jumps only at the observed failure times belonging to stratum s.
The EM algorithm is used for calculating the maximum likelihood estimates. In the EM algorithm, bi, i = 1,…,n, are treated as missing data. The M-step solves the conditional expectations, given the observations, of the score equations from the complete data, where the conditional expectations are evaluated in the E-step. The procedure iterates between the following two steps until convergence is achieved: at the k-th iteration,
(1) E-step
Calculate the conditional expectations of some known functions of bi needed in the next M-step, for subject i with Si = s, given the observations and the current estimate (θ(k), Λ(k)). To do this, denote q(bi) and Ê[q(bi)] as a known function and its conditional expectation, respectively. By some algebra, Ê[q(bi)] can be expressed in terms of a vector of new variables zG following a multivariate Gaussian distribution with mean zero. The conditional expectation is calculated using the Gauss-Hermite quadrature numerical approximation with 20 quadrature points.
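The change of variables behind the Gauss-Hermite approximation can be sketched as follows: for a scalar random effect b ~ N(0, σ²) a priori and a conditional log-likelihood log w(b) of the observations given b, the posterior expectation E[q(b) | observations] is a ratio of two integrals, each approximated with the Hermite nodes and weights (a minimal one-dimensional sketch, not the paper's implementation; `q` and `log_w` are placeholders for the problem-specific functions):

```python
import numpy as np

def posterior_expectation(q, log_w, sigma, n_points=20):
    """Approximate E[q(b) | data] where b ~ N(0, sigma^2) a priori and
    log_w(b) is the conditional log-likelihood of the data given b.
    Gauss-Hermite rule: int f(x) exp(-x^2) dx ~ sum_k w_k f(x_k)."""
    x, w = np.polynomial.hermite.hermgauss(n_points)
    b = np.sqrt(2.0) * sigma * x        # change of variables b = sqrt(2)*sigma*x
    lw = log_w(b)
    lw = lw - lw.max()                  # stabilize the exponentials
    num = np.sum(w * np.exp(lw) * q(b))
    den = np.sum(w * np.exp(lw))
    return num / den
```

The normalizing constant of the prior and the factor from the change of variables cancel in the ratio, so only the nodes, weights, and likelihood values are needed.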
(2) M-step
After differentiating the conditional expectation of the complete-data log-likelihood given the observations and the current estimate (θ(k), Λ(k)), the updated estimate (θ(k+1), Λ(k+1)) is obtained as follows: (β(k+1), ϕ(k+1)) solves the conditional expectation of the complete-data log-likelihood score equation using one-step Newton-Raphson iteration; (ψ(k+1), γ(k+1)) solves the conditional expectation of the partial-likelihood score equation from the full data using one-step Newton-Raphson iteration; and Λs(k+1) is obtained as an empirical function with jumps only at the observed failure times,

Λs(k+1)(t) = Σ{i: Si = s, Vi ≤ t} Δi / [ Σ{j: Sj = s, Vj ≥ Vi} E{exp(Zj(Vi)γ(k+1) + Z̃j(Vi)(ψ(k+1) ∘ bj)) | observations} ],

a Breslow-type estimator with the random effects integrated out via their conditional expectations.
The expressions of the conditional expectation and the conditional score equations calculated in the E- and M-steps for binary and Poisson longitudinal outcomes with survival time are given respectively in Appendices A.1 and A.2.
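For intuition, the baseline cumulative hazard update in the M-step has the form of a Breslow estimator. The sketch below computes such jump sizes within a single stratum, using plain exponentiated scores in place of the conditional expectations over the random effects that the actual M-step requires (function names are ours, not the paper's):

```python
import numpy as np

def breslow_jumps(v, delta, score):
    """Jump sizes of a Breslow-type baseline cumulative hazard estimator
    within one stratum: dLambda(t) = (#events at t) / sum_{j: V_j >= t} exp(score_j).
    In the EM M-step, exp(score_j) would be replaced by its conditional
    expectation over the random effects (omitted here for brevity)."""
    v = np.asarray(v, dtype=float)
    delta = np.asarray(delta, dtype=int)
    score = np.asarray(score, dtype=float)
    jumps = {}
    for t in np.unique(v[delta == 1]):                # distinct failure times
        d = np.sum((v == t) & (delta == 1))           # events at time t
        jumps[t] = d / np.sum(np.exp(score[v >= t]))  # risk-set denominator
    return jumps

def cum_hazard(jumps, t):
    """Cumulative hazard at t: sum of jumps at failure times <= t."""
    return sum(s for u, s in jumps.items() if u <= t)
```

With all scores equal to zero and no censoring, this reduces to the Nelson-Aalen estimator.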
The observed information matrix is adopted to obtain the variance estimate for (θ̂, Λ̂). For the numerical calculation of the observed information matrix, we consider Λs{Vi}, the jump size of Λs(t) at Vi belonging to stratum s with Δi = 1, instead of λs(Vi). That is, we work with Λ{·} = (Λ1{·}T,…,ΛS{·}T)T, where Λs{·} = (Λs{Ts1},…,Λs{Tsms})T for the ms failure times among the ns subjects (0 ≤ ms ≤ ns) of the s-th stratum, s = 1,…,S. Then, by the Louis (1982) formula, the observed information matrix is

I(θ̂, Λ̂{·}) = E[Bc(θ,Λ{·}; Y,V,b) | observations] − E[Uc(θ,Λ{·}; Y,V,b)Uc(θ,Λ{·}; Y,V,b)T | observations] + E[Uc(θ,Λ{·}; Y,V,b) | observations] E[Uc(θ,Λ{·}; Y,V,b) | observations]T,

evaluated at (θ̂, Λ̂{·}), where Uc(θ,Λ{·}; Y,V,b) and Bc(θ,Λ{·}; Y,V,b) are respectively the first-derivative vector and the negative second-derivative matrix of the complete-data log-likelihood lc(θ,Λ{·}; Y,V,b) with respect to (θ,Λ{·}). The variance of θ̂ is asymptotically equal to the corresponding sub-matrix of the inverse of the calculated observed information matrix. The variance of Λ̂(t) is obtained using the estimated variances and covariances corresponding to the jump sizes Λ{T} with T ≤ t at the observed failures, taken from the inverse of the observed information matrix. In the EM algorithm for variance estimation, we evaluate these conditional expectations only at the last iteration of the EM procedure for point estimation, where the conditional expectation of Uc is zero and the last term above vanishes.
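The Louis (1982) identity can be checked on a toy model where the observed information is known in closed form: for Y | b ~ N(θ + b, 1) with b ~ N(0, 1), the observed-data distribution is N(θ, 2), so the observed information per subject is 1/2 regardless of y. The sketch below (our toy example, not the paper's model) evaluates the three conditional expectations in the identity by Gauss-Hermite quadrature:

```python
import numpy as np

def louis_observed_info(y, theta, n_points=20):
    """Observed information for the toy model Y|b ~ N(theta + b, 1), b ~ N(0,1),
    via Louis's identity:
        I_obs = E[Bc|Y] - E[Uc^2|Y] + (E[Uc|Y])^2,
    where the complete-data score is Uc = y - theta - b and Bc = 1.
    Posterior expectations are computed by Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(n_points)
    b = np.sqrt(2.0) * x                    # nodes for b ~ N(0,1)
    logw = -0.5 * (y - theta - b) ** 2      # conditional log-likelihood of y given b
    wt = w * np.exp(logw - logw.max())
    wt /= wt.sum()                          # posterior weights of b given y
    Uc = y - theta - b
    E_U = np.sum(wt * Uc)
    E_U2 = np.sum(wt * Uc ** 2)
    E_B = 1.0                               # Bc is constant in this toy model
    return E_B - E_U2 + E_U ** 2
```

The complete-data information (1) minus the missing information (1/2 plus the score-mean correction) recovers the observed information 1/2 for any y and θ.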
3 Asymptotic Properties
To study the asymptotic properties of the proposed estimators θ̂n and Λ̂n(t) = (Λ̂1n(t),…,Λ̂Sn(t))T, we assume the following conditions.
(A1) The true parameter belongs to a known compact set Θ which lies in the interior of the domain for θ.
(A2) The true baseline hazard rate function λ0(t) = (λ10(t),…,λS0(t)) is bounded and positive in [0,τ].
(A3) For the censoring time Ci, P(Ci = τ | covariates) > 0 with probability one.
(A4) For the number of observed longitudinal measurements per subject ni, P(ni ≥ db) > 0, and P(ni ≤ n0) = 1 for some integer n0.
(A5) Both XTX and X̃TX̃ are full rank with positive probability. Moreover, if there exist constant vectors c1 and c2 such that, with positive probability, for any t, Z(t)c1 + Z̃(t)c2 = α0(t) for a deterministic function α0(t), then c1 = 0, c2 = 0, and α0(t) = 0.
Assumption (A3) means that, by the end of the study, some proportion of the subjects will still be alive and thus censored at the study end time τ, so the maximum right-censoring time equals τ. Assumption (A4) implies that some proportion of the subjects have at least db longitudinal observations, and that there exists an integer n0 such that P(ni ≤ n0) = 1. Consistency and the asymptotic distribution of the proposed estimators are summarized in the following two theorems. The proofs of Theorems 1 and 2 are given in Appendices B.1 and B.2, respectively; we present outlines of the proofs here.
Theorem 1
Under the assumptions (A1)~(A5), as n→∞, the maximum likelihood estimator (θ̂n, Λ̂n) is consistent under the product norm of the Euclidean distance and the supremum norm on [0,τ]. That is, |θ̂n − θ0| + Σs=1S supt∈[0,τ] |Λ̂sn(t) − Λs0(t)| → 0 a.s., where |·| denotes the Euclidean norm.
Consistency in Theorem 1 can be proved by verifying the following three steps. First, we show that the maximum likelihood estimate exists; this can be achieved by showing that the jump sizes Λ̂s{Vi}, with Δi = 1, are finite. Second, we show that, with probability one, Λ̂sn(τ), s = 1,…,S, are bounded as n → ∞; this can be proved by showing that the total jump size of Λ̂sn is bounded. Third, given that the second step holds, by Helly's selection theorem (van der Vaart, 1998), we can choose a subsequence of Λ̂sn that weakly converges to some right-continuous monotone function with probability one. For any subsequence, we can find a further subsequence, still denoted as (θ̂n, Λ̂n), such that (θ̂n, Λ̂n) → (θ*, Λ*). Using the empirical process formulation and the relevant Donsker properties together with parameter identifiability, we can show that θ* = θ0 and Λ* = Λ0. Based on these results, we can conclude that, with probability one, θ̂n converges to θ0 and Λ̂sn(t) converges to Λs0(t) in [0,τ], s = 1,…,S. Moreover, since Λs0(t) is continuous in [0,τ], the latter can be strengthened to uniform convergence; that is, supt∈[0,τ] |Λ̂sn(t) − Λs0(t)| → 0 almost surely.
Theorem 2
Under the assumptions (A1)~(A5), n1/2(θ̂n − θ0, Λ̂n − Λ0) weakly converges to a Gaussian random element in Rdθ × (l∞[0,τ])S, and the estimator θ̂n is asymptotically efficient, where dθ is the dimension of θ and l∞[0,τ] is the normed space containing all the bounded functions on [0,τ].
Once consistency is established, the conditions of Theorem 3.3.1 in van der Vaart and Wellner (1996), which imply the asymptotic normality in Theorem 2, can be verified via the tools of empirical processes. These conditions are restated in Theorem 4 of Parner (1998). The smoothness conditions in Theorem 4 of Parner (1998) can be verified using the regularity of the log-likelihood function in terms of the model parameters and the Donsker properties of the score operators. By Theorem 3.3.1 of van der Vaart and Wellner (1996), n1/2(θ̂n − θ0, Λ̂n − Λ0) weakly converges to a Gaussian process, and, by Proposition 3.3.1 in Bickel et al. (1993), θ̂n is an efficient estimator of θ0.
When the sample size increases, the number of event times also increases. However, our asymptotic theory is not based on classical parametric asymptotic theory; instead, it is based on modern empirical process theory, where the nuisance parameter is allowed to be infinite dimensional. Thus, the asymptotic properties of the proposed estimator are not affected by the growing number of event times.
4 Simulation Studies
In this section, we present some results from our simulation studies. Two sets of simulations with different generalized linear mixed models for the longitudinal outcomes are performed. Binary and Poisson data are considered for longitudinal process in the first and second sets of simulations, respectively.
4.1 Binary longitudinal outcomes and survival time
In this first set of simulations, we assume Yij to be the j-th binary outcome of the i-th subject following

P(Yij = 1 | bi) = exp(ηij)/{1 + exp(ηij)},   (4)

with ηij = Xijβ + bi = β0 + β1X1i + β2X2i + β3X3ij + bi for j = 1,…,ni, and consider the hazard of the i-th subject at time t to be

λi(t | bi) = λ(t) exp(γ1Z1i + γ2Z2i + ψbi),   (5)

where bi ~ N(0, σb²) and, for simplicity, one stratum of hazard for survival time is simulated.
X1i ≡ Z1i are generated from a Bernoulli distribution with success probability 0.5, and X2i ≡ Z2i are simulated from the uniform distribution between 0 and 1. They are included in the hazard and longitudinal models, the latter with a fixed-effect intercept. One additional covariate, X3ij, the time at measurement, is included in the longitudinal model. We suppose the longitudinal data are observed every 0.3 units of time, so X3ij takes values in increments of 0.3 ranging from 0 through 2.4. The longitudinal data are generated from the Bernoulli distribution with the success probability P(Yij = 1 | bi) given in (4), and the average number of longitudinal observations (ni) per subject is 3, with a range of 1 to 8. To generate the survival time, we first generate ui from the Uniform(0,1) distribution. For a given constant hazard λ, the survival time is then generated by ti = −log(ui) × exp{−(ψbi + γ1Z1i + γ2Z2i)}/λ. Censoring time is generated from the uniform distribution between 0.4 and 2.4 so that the censoring proportion is around 25~35%. The observed survival time is the minimum of the generated survival and censoring times. For the comparison of the estimated baseline cumulative hazards over simulations, we consider three time points, 0.9, 1.4, and 1.9, which correspond to the quartiles of the true survival distribution. These three time points are not the only distinct survival times; they are selected for reporting the estimated cumulative hazard function.
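The generation scheme above can be sketched as follows (an illustrative reimplementation, not the paper's R code; we assume a constant baseline hazard λ, that γ1 multiplies Z1i in the survival-time generator, and, for simplicity, draw the longitudinal outcomes at all nine scheduled times without truncating at the observed survival time):

```python
import numpy as np

rng = np.random.default_rng(2024)

def simulate_subject(beta, gamma, psi, sigma_b2, lam=1.0):
    """One subject under the Section 4.1 design (a sketch)."""
    x1 = rng.binomial(1, 0.5)                 # X1i = Z1i ~ Bernoulli(0.5)
    x2 = rng.uniform()                        # X2i = Z2i ~ Uniform(0,1)
    b = rng.normal(0.0, np.sqrt(sigma_b2))    # random effect b_i
    # inverse-transform survival time for constant hazard lam*exp(linear predictor)
    u = rng.uniform()
    t = -np.log(u) * np.exp(-(psi * b + gamma[0] * x1 + gamma[1] * x2)) / lam
    c = rng.uniform(0.4, 2.4)                 # censoring time
    v, delta = min(t, c), int(t <= c)         # observed time and event indicator
    # binary longitudinal outcomes scheduled every 0.3 time units in [0, 2.4]
    obs_times = np.arange(0.0, 2.4 + 1e-9, 0.3)
    eta = beta[0] + beta[1] * x1 + beta[2] * x2 + beta[3] * obs_times + b
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    return y, obs_times, v, delta

y, times, v, delta = simulate_subject(
    beta=[-1.0, 1.0, -0.5, -0.5], gamma=[-1.0, 1.0], psi=-1.0, sigma_b2=0.5)
```

In the paper's design, only the measurements taken before the observed survival time would be retained, which yields the reported 1 to 8 observations per subject.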
We consider ψ values of −1, 0, and 1 for negative, zero, and positive dependency between the longitudinal process and the survival time model, respectively. The parameters in the two models are chosen as β0 = −1, β1 = 1, β2 = −0.5, β3 = −0.5, σb² = 0.5, ψ = −1, 0, or 1, γ1 = −1, γ2 = 1, and λ(t) = 1. Different sample sizes (n = 200, 400) are simulated with 1000 replications. The results of the maximum likelihood estimates for θ and the baseline cumulative hazards at the three time points, with their respective standard error estimates, are reported in Table 1. The simulation study is conducted using R.
Table 1.
Summary of simulation results of maximum likelihood estimation for binary longitudinal outcomes and survival time.
| ψ | Par. | TRUE | Est. (n=200) | SSD | ESE | CP | Est. (n=400) | SSD | ESE | CP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| −1 | β0 | −1 | −1.019 | .271 | .273 | .953 | −1.002 | .189 | .192 | .960 |
| | β1 | 1 | 1.018 | .235 | .241 | .962 | 1.000 | .169 | .169 | .951 |
| | β2 | −.5 | −.494 | .413 | .396 | .942 | −.498 | .275 | .277 | .946 |
| | β3 | −.5 | −.469 | .223 | .221 | .947 | −.484 | .158 | .154 | .946 |
| | σb² | .5 | .518 | .216 | .270 | .966 | .519 | .164 | .190 | .955 |
| | ψ | −1 | −.967 | .471 | .584 | .923 | −.988 | .364 | .420 | .921 |
| | γ1 | −1 | −1.002 | .242 | .254 | .961 | −1.002 | .177 | .183 | .960 |
| | γ2 | 1 | 1.001 | .379 | .393 | .962 | .995 | .278 | .279 | .953 |
| | Λ(.9) | .9 | .922 | .228 | .227 | .959 | .909 | .167 | .157 | .943 |
| | Λ(1.4) | 1.4 | 1.442 | .397 | .389 | .944 | 1.421 | .283 | .269 | .952 |
| | Λ(1.9) | 1.9 | 1.956 | .594 | .600 | .953 | 1.950 | .456 | .426 | .949 |
| 0 | β0 | −1 | −1.016 | .281 | .276 | .957 | −1.004 | .191 | .193 | .956 |
| | β1 | 1 | 1.003 | .258 | .247 | .936 | .994 | .170 | .173 | .957 |
| | β2 | −.5 | −.476 | .414 | .401 | .941 | −.485 | .275 | .281 | .959 |
| | β3 | −.5 | −.496 | .235 | .238 | .957 | −.498 | .169 | .167 | .948 |
| | σb² | .5 | .500 | .233 | .287 | .957 | .497 | .181 | .200 | .952 |
| | ψ | 0 | .021 | .331 | .378 | .996 | .001 | .235 | .244 | .990 |
| | γ1 | −1 | −1.031 | .197 | .191 | .949 | −1.014 | .133 | .131 | .953 |
| | γ2 | 1 | 1.036 | .314 | .315 | .953 | 1.015 | .223 | .218 | .944 |
| | Λ(.9) | .9 | .912 | .189 | .187 | .952 | .906 | .134 | .129 | .942 |
| | Λ(1.4) | 1.4 | 1.450 | .315 | .307 | .958 | 1.418 | .215 | .207 | .950 |
| | Λ(1.9) | 1.9 | 1.990 | .485 | .464 | .948 | 1.935 | .322 | .308 | .948 |
| 1 | β0 | −1 | −1.014 | .273 | .284 | .952 | −1.007 | .200 | .198 | .955 |
| | β1 | 1 | 1.017 | .252 | .251 | .956 | 1.011 | .176 | .176 | .952 |
| | β2 | −.5 | −.518 | .428 | .412 | .953 | −.512 | .287 | .287 | .950 |
| | β3 | −.5 | −.540 | .245 | .248 | .954 | −.520 | .176 | .174 | .946 |
| | σb² | .5 | .543 | .250 | .300 | .947 | .524 | .179 | .209 | .966 |
| | ψ | 1 | .956 | .488 | .609 | .898 | .992 | .366 | .450 | .930 |
| | γ1 | −1 | −1.000 | .264 | .255 | .945 | −.998 | .176 | .183 | .953 |
| | γ2 | 1 | 1.009 | .381 | .395 | .961 | .990 | .283 | .280 | .953 |
| | Λ(.9) | .9 | .921 | .235 | .228 | .961 | .918 | .166 | .159 | .937 |
| | Λ(1.4) | 1.4 | 1.443 | .412 | .393 | .963 | 1.430 | .275 | .271 | .956 |
| | Λ(1.9) | 1.9 | 1.976 | .677 | .620 | .957 | 1.946 | .416 | .424 | .959 |
In Table 1, "TRUE" gives the true values of the parameters; the averages of the maximum likelihood estimates from the EM algorithm are in "Est."; the sample standard deviations from the 1000 simulations are reported in "SSD"; "ESE" is the average of the 1000 standard error estimates based on the observed information matrix; and "CP" is the coverage proportion of the 95% confidence intervals based on the estimated standard error "ESE". The Satterthwaite (1946) method is used for the coverage probability of σb².
From Table 1, we can see that even for the smaller sample size (n=200), the bias of the estimates from the EM algorithm is negligible in most cases. The estimated standard errors calculated from the observed information matrix are close to the sample standard deviations of the 1000 estimates, and the 95% confidence interval coverage rates are close to 0.95 except those for ψ. However, the coverage rate for ψ improves for larger sample sizes: additional simulations we conducted with a sample size of 800 gave coverage rates of 95.5%, 95.9% and 95.9% for ψ = −1, 0 and 1, respectively. In addition, the simulations show that the variances of the estimators decrease as the sample size (n) increases. We can also see that the estimates are fairly robust and close to the true values for all the different ψ values.
4.2 Poisson longitudinal outcomes and survival time
In the second set of simulations, we assume Yij to follow a Poisson distribution,

P(Yij = y | bi) = μijy exp(−μij)/y!,  y = 0, 1, 2, …,  with log(μij) = ηij,

where ηij is defined as in Section 4.1. We also consider the same hazards model and simulation setting as in Section 4.1, except that σb² = 0.2. The simulated Poisson longitudinal outcomes range from 0 to 7 with an average of 0.5.
Table 2 shows that, overall, the estimates perform well even for the smaller sample size n = 200, with small biases except for ψ. We conducted additional simulations with sample sizes of 800 and 1000, and the bias for ψ decreases as the sample size increases. The estimated standard errors using the observed information matrix are close to the sample standard deviations, and the 95% confidence interval coverage rates are close to 0.95 except for σb² and for ψ when ψ = 0.
Table 2.
Summary of simulation results of maximum likelihood estimation for Poisson longitudinal outcomes and survival time.
| ψ | Par. | TRUE | Est. (n=200) | SSD | ESE | CP | Est. (n=400) | SSD | ESE | CP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| −1 | β0 | −1 | −.995 | .203 | .196 | .940 | −1.005 | .138 | .138 | .948 |
| | β1 | 1 | .998 | .178 | .171 | .942 | 1.007 | .119 | .121 | .945 |
| | β2 | −.5 | −.510 | .273 | .264 | .946 | −.502 | .192 | .186 | .935 |
| | β3 | −.5 | −.489 | .150 | .151 | .950 | −.492 | .107 | .106 | .949 |
| | σb² | .2 | .196 | .074 | .092 | .983 | .200 | .055 | .065 | .976 |
| | ψ | −1 | −1.038 | .716 | .771 | .947 | −1.003 | .503 | .514 | .934 |
| | γ1 | −1 | −1.025 | .227 | .228 | .969 | −1.014 | .160 | .158 | .954 |
| | γ2 | 1 | 1.034 | .371 | .358 | .947 | 1.009 | .248 | .248 | .959 |
| | Λ(.9) | .9 | .918 | .222 | .208 | .943 | .909 | .141 | .142 | .944 |
| | Λ(1.4) | 1.4 | 1.456 | .403 | .357 | .943 | 1.424 | .235 | .237 | .953 |
| | Λ(1.9) | 1.9 | 1.999 | .632 | .560 | .953 | 1.948 | .368 | .366 | .961 |
| 0 | β0 | −1 | −1.007 | .202 | .199 | .943 | −1.003 | .137 | .140 | .957 |
| | β1 | 1 | 1.010 | .184 | .175 | .932 | .998 | .120 | .124 | .951 |
| | β2 | −.5 | −.513 | .283 | .268 | .937 | −.505 | .193 | .189 | .942 |
| | β3 | −.5 | −.500 | .164 | .161 | .951 | −.489 | .114 | .114 | .950 |
| | σb² | .2 | .199 | .074 | .097 | .981 | .206 | .055 | .070 | .973 |
| | ψ | 0 | .006 | .571 | .610 | .993 | .011 | .391 | .387 | .978 |
| | γ1 | −1 | −1.039 | .188 | .194 | .958 | −1.015 | .128 | .132 | .953 |
| | γ2 | 1 | 1.021 | .326 | .319 | .948 | 1.003 | .226 | .219 | .944 |
| | Λ(.9) | .9 | .917 | .191 | .188 | .952 | .909 | .129 | .130 | .950 |
| | Λ(1.4) | 1.4 | 1.448 | .313 | .308 | .953 | 1.432 | .206 | .210 | .950 |
| | Λ(1.9) | 1.9 | 2.006 | .478 | .473 | .954 | 1.966 | .312 | .316 | .950 |
| 1 | β0 | −1 | −1.014 | .195 | .202 | .952 | −1.004 | .138 | .142 | .954 |
| | β1 | 1 | 1.014 | .180 | .178 | .954 | 1.008 | .126 | .125 | .951 |
| | β2 | −.5 | −.512 | .273 | .271 | .947 | −.511 | .190 | .191 | .945 |
| | β3 | −.5 | −.514 | .174 | .172 | .943 | −.509 | .123 | .122 | .959 |
| | σb² | .2 | .201 | .083 | .098 | .967 | .204 | .060 | .070 | .960 |
| | ψ | 1 | .993 | .664 | .768 | .942 | 1.008 | .477 | .512 | .942 |
| | γ1 | −1 | −1.030 | .230 | .224 | .952 | −1.003 | .158 | .157 | .950 |
| | γ2 | 1 | 1.014 | .363 | .354 | .949 | 1.006 | .252 | .247 | .942 |
| | Λ(.9) | .9 | .925 | .22 | .207 | .949 | .91 | .144 | .142 | .948 |
| | Λ(1.4) | 1.4 | 1.46 | .389 | .351 | .95 | 1.435 | .246 | .237 | .942 |
| | Λ(1.9) | 1.9 | 2.018 | .639 | .554 | .957 | 1.957 | .373 | .365 | .947 |
From Table 2, σb² is seemingly underestimated, with higher-than-nominal coverage rates, but the coverage rate improves for larger sample sizes. This implies that the variance of σ̂b² may not be estimated well with a small sample size for the Poisson longitudinal distribution. Meanwhile, the test for σb² is conservative with a small sample size, but the type I error becomes closer to the nominal level as the sample size increases. Profile likelihood may be an alternative estimation approach for σb². The 95% confidence interval coverage for ψ = 0 also appears to be higher than the nominal level, but an additional simulation with a sample size of 800 shows that the coverage rate reduces to the 95% nominal level. Table 2 also shows that the variances of the estimators decrease for larger sample sizes, and the estimates are fairly robust and close to the true values for all three ψ values.
In all the simulations, the EM algorithm converged within 60 iterations. The CPU time used for the 1000 data sets in the simulation studies was about 6 hours for sample size 200 (averaging 20 seconds per data set) and 15 hours for sample size 400 (averaging about 1 minute per data set) on a computer with a 64-bit operating system.
5 Analysis of the CHANCE Study
The Carolina Head and Neck Cancer Study (CHANCE) is a population-based epidemiologic study conducted at 60 hospitals in 46 counties in North Carolina from 2002 through 2006 (Divaris et al. 2010). Patients were diagnosed with head and neck cancer (oral, pharyngeal, and laryngeal cancer) during 2002–2006. Their survival status was collected up to 2007, and QoL was evaluated over time for three years after diagnosis. QoL information was collected through questionnaires. Based on summary scores of the five domains of self-perceived quality of life, namely Physical Well-Being (PWB), Social/Family Well-Being (SWB), Emotional Well-Being (EWB), Functional Well-Being (FWB), and Head and Neck Cancer Specific symptoms (HNCS), each patient's QoL was classified as satisfaction or dissatisfaction with life. Survival time is defined as the time from diagnosis to death. Demographic and lifestyle characteristics, medical histories, and clinical factors were also collected. By December 2009, QoL information had been obtained from the 554 head and neck cancer patients included in the analysis. Based on the death information through 2007 available from the National Death Index (NDI), 85 of the 554 patients died, so the censoring rate is about 85%. The number of observations per patient ranges from 1 to 3 with an average of 1.93, which may look sparse. However, even though the number of longitudinal measurements per subject is small, our estimation pools information from all the subjects, so the total number of measurements used for estimation is not sparse. It is of interest to identify the variables that are associated with both QoL satisfaction and survival time for patients with head and neck cancer. In particular, we are interested in the comparison between African-Americans and Whites, since African-Americans are known to have a higher incidence of head and neck cancer and worse survival than Whites.
The longitudinal QoL satisfaction outcomes and survival time are correlated within a patient, and this dependency should be taken into account in the analysis.
We apply our proposed method to the Head and Neck Cancer Specific symptoms (HNCS) domain of QoL together with survival time. The longitudinal HNCS QoL outcomes are binary, coded 1 ("satisfied") and 0 ("dissatisfied"). We are interested in investigating which factors are related to QoL satisfaction and the risk of death. In the full models for both longitudinal QoL and survival time, we consider race (African-Americans, Whites), the number of 12 oz. beers consumed per week (None, <1, 1–4, 5–14, 15–29, ≥ 30), household income (0–10K, 20–30K, 40–50K, ≥ 60K), surgery (Yes/No), radiation therapy (Yes/No), chemotherapy (Yes/No), primary tumor site (Oral & Pharyngeal, Laryngeal), and tumor stage (I, II, III, IV) as categorical variables, and age at diagnosis (range: 24–80), the number of persons supported by household income (range: 1–5), body mass index (BMI) (range: 15.66–56.28), and the total number of medical conditions reported (range: 0–6) as continuous variables. Additionally, two interactions with race, i.e., race × the total number of medical conditions reported and race × tumor site, are included in both models since we are particularly interested in the differences in QoL and survival between African-Americans and Whites. Time at survey measurement is also included as a covariate for the longitudinal outcomes. A random intercept accounting for the dependence between QoL satisfaction and the risk of death is included in both models and is assumed to follow a normal distribution with mean zero. In addition to the simultaneous analysis, we also conduct separate analyses, fitting the generalized linear mixed model to the longitudinal QoL outcomes and the Cox proportional hazards model to the survival times, and compare the results with those from our proposed simultaneous method.
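To make the shared random-intercept structure concrete, the following sketch simulates one subject under a simplified version of the joint model: a single random intercept b enters the logistic model for the binary QoL outcomes and, scaled by ψ, the hazard of death. The function name, the constant baseline hazard, and the omission of the time covariate are our illustrative assumptions, not the fitted CHANCE model.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_subject(x, beta, gamma, psi, sigma_b, times, base_rate=0.1):
    # shared random intercept links the two submodels
    b = rng.normal(0.0, sigma_b)
    # longitudinal binary outcomes: logit P(Y_ij = 1) = x'beta + b
    eta = x @ beta + b
    p = 1.0 / (1.0 + np.exp(-eta))
    y = rng.binomial(1, p, size=len(times))
    # survival: hazard = base_rate * exp(x'gamma + psi * b),
    # i.e. exponential survival time given the random effect
    rate = base_rate * np.exp(x @ gamma + psi * b)
    t = rng.exponential(1.0 / rate)
    return y, t
```

With ψ < 0, a latent factor that raises the satisfaction probability simultaneously lowers the hazard, which is the dependence structure described above.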
After fitting the simultaneous models with all the covariates, we use backward variable selection based on the likelihood ratio test (LRT) and find that surgery, chemotherapy, tumor site, age at diagnosis, and both interactions are not statistically significant at the 0.05 level in either the model for HNCS QoL satisfaction or the model for survival time. We remove these variables and refit the simultaneous models. The LRT then shows that race, radiation therapy, the number of persons supported by household income, BMI, and the total number of medical conditions reported are not statistically significant for the risk of death. We further reduce the models by removing these variables from the hazards model and refit the reduced simultaneous models.
Table 3 gives the results from these final models. From the "Simultaneous" columns, we see that the number of 12 oz. beers consumed per week, household income, tumor stage, and the total number of medical conditions reported are significantly associated with both patients' HNCS QoL satisfaction and the hazard of death. With 30 or more 12 oz. beers consumed per week as the reference group, all categories of smaller consumption are associated with higher HNCS QoL satisfaction and a lower risk of death. Higher household income is in general associated with higher HNCS QoL satisfaction and a lower risk of death. Both HNCS QoL satisfaction and the risk of death differ significantly across tumor stages, and patients with a greater number of reported medical conditions have lower HNCS QoL satisfaction and a higher risk of death. For instance, with log-scale estimates of 1.060 and −1.076 for HNCS QoL satisfaction and death respectively, patients who consumed 5 to 14 12 oz. beers per week have 2.886 times the odds of HNCS QoL satisfaction and 0.341 times the hazard of death compared with those who consumed 30 or more, after adjusting for the other covariates in the model. Looking at the number of medical conditions reported, each additional medical condition reported decreases the odds of HNCS QoL satisfaction by 16% and increases the hazard of death by 29%. Meanwhile, race (African-American), radiation therapy, the number of persons supported by household income, and BMI are selected only in the HNCS QoL longitudinal model. African-Americans, patients not treated with radiation therapy, patients in families with fewer persons supported by household income, and patients with higher BMI are more likely to be satisfied with longitudinal HNCS QoL, while the risk of death is not affected by these factors.
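The odds and hazard ratios quoted above follow directly from exponentiating the log-scale estimates in Table 3; a quick arithmetic check (variable names are ours):

```python
import math

# log-scale estimates from Table 3 (simultaneous analysis)
log_or_beer_5to14 = 1.060    # beta_5: HNCS QoL model, "5 to 14" beers
log_hr_beer_5to14 = -1.076   # gamma_4: hazards model, "5 to 14" beers
log_or_conditions = -0.175   # beta_16: per additional medical condition
log_hr_conditions = 0.256    # gamma_12: per additional medical condition

print(round(math.exp(log_or_beer_5to14), 3))           # 2.886 times the odds of satisfaction
print(round(math.exp(log_hr_beer_5to14), 3))           # 0.341 times the hazard of death
print(round(100 * (1 - math.exp(log_or_conditions))))  # odds decreased by 16%
print(round(100 * (math.exp(log_hr_conditions) - 1)))  # hazard increased by 29%
```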
Furthermore, we also find that time at survey measurement is statistically significant in the HNCS QoL longitudinal model, implying that patients become more satisfied over time. The parameter ψ for the dependence between longitudinal HNCS QoL and survival time is negative, with p-value 0.131. This suggests that the longitudinal HNCS QoL and survival time are marginally correlated, and that some latent factors which increase HNCS QoL satisfaction also decrease the risk of death.
Table 3.
Analyses results for the HNCS and survival time of the CHANCE study
| | | Simultaneous | | | Separate | | |
|---|---|---|---|---|---|---|---|
| Parameter | | Est. | ESE | P-value | Est. | ESE | P-value |
< HNCS QoL longitudinal model > | |||||||
Intercept | β 0 | .744 | .538 | .167 | 1.190 | .390 | .002 |
Race (ref= White) | |||||||
– African American | β 1 | .564 | .229 | .014 | .511 | .256 | .047 |
# of 12 oz. beers consumed per week (ref=30 or more) | |||||||
– None | β 2 | .636 | .269 | .018 | .622 | .300 | .038 |
– less than 1 | β 3 | .830 | .357 | .020 | .735 | .396 | .064 |
– 1 to 4 | β 4 | 1.302 | .294 | <.001 | 1.268 | .326 | <.001 |
– 5 to 14 | β 5 | 1.060 | .251 | <.001 | 1.018 | .279 | <.001 |
– 15 to 29 | β 6 | .601 | .289 | .037 | .547 | .327 | .095 |
Household income (ref= level1: 0–10K) | |||||||
– level2: 20–30K | β 7 | −.271 | .231 | .241 | −.328 | .258 | .204 |
– level3: 40–50K | β 8 | .297 | .255 | .245 | .250 | .282 | .376 |
– level4: ≥ 60K | β 9 | 1.199 | .274 | <.001 | 1.045 | .286 | <.001 |
Radiation therapy (ref= No) | |||||||
– Yes | β 10 | −1.132 | .260 | <.001 | −1.048 | .280 | <.001 |
Tumor stage (ref= I) | |||||||
– II | β 11 | −.416 | .300 | .166 | −.352 | .330 | .286 |
– III | β 12 | −1.335 | .284 | <.001 | −1.198 | .314 | <.001 |
– IV | β 13 | −1.175 | .254 | <.001 | −1.057 | .277 | <.001 |
# of persons supported by household income | β 14 | −.189 | .084 | .025 | |||
BMI | β 15 | .041 | .015 | .007 | |||
Total # of medical conditions reported | β 16 | −.175 | .080 | .030 | |||
Time at survey measurement (years) | β 17 | .241 | .066 | <.001 | .254 | .067 | <.001 |
variance of random effects | | .303 | .173 | .013 | 1.169 | .257 | |
< Hazards model > | |||||||
Random effect coefficient | ψ | −1.427 | .946 | .131 | |||
# of 12 oz. beers consumed per week (ref=30 or more) | |||||||
– None | γ 1 | −.772 | .386 | .045 | |||
– less than 1 | γ 2 | −.155 | .426 | .715 | |||
– 1 to 4 | γ 3 | −.802 | .414 | .053 | |||
– 5 to 14 | γ 4 | −1.076 | .383 | .005 | |||
– 15 to 29 | γ 5 | −.591 | .399 | .139 | |||
Household income (ref= level1: 0–10K) | |||||||
– level2: 20–30K | γ 6 | −.218 | .294 | .459 | −.219 | .263 | .406 |
– level3: 40–50K | γ 7 | −.941 | .371 | .011 | −.928 | .331 | .005 |
– level4: ≥ 60K | γ 8 | −1.463 | .401 | <.001 | −1.393 | .358 | <.001 |
Tumor stage (ref= I) | |||||||
– II | γ 9 | −.199 | .465 | .668 | −.295 | .435 | .498 |
– III | γ 10 | .235 | .433 | .588 | .136 | .389 | .727 |
– IV | γ 11 | 1.059 | .360 | .003 | .914 | .295 | .002 |
Total # of medical conditions reported | γ 12 | .256 | .110 | .020 | .205 | .091 | .025 |
P-value for testing the variance of the random effects being zero is based on a 50:50 mixture of a point mass at 0 and a χ2 distribution with 1 degree of freedom.
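Under equal mixing probabilities, the p-value for a positive LRT statistic is half of the χ²₁ tail probability. A small stdlib sketch (the helper name is ours, not the paper's):

```python
import math

def mixture_pvalue(lrt_stat):
    # LRT for a variance component on the boundary of the parameter space:
    # the null distribution is a 50:50 mixture of a point mass at 0 and a
    # chi-square with 1 df, so a positive statistic gets half the chi2_1 tail
    if lrt_stat <= 0:
        return 1.0
    # chi2_1 survival function via the Gaussian tail: P(chi2_1 > t) = erfc(sqrt(t/2))
    return 0.5 * math.erfc(math.sqrt(lrt_stat / 2.0))
```

For example, the usual 3.84 cutoff of the χ²₁ test corresponds to a mixture p-value of about 0.025, so the naive (unmixed) test would be conservative.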
For the purpose of comparison, we conducted separate analyses for longitudinal HNCS QoL and survival time; the final results are given in the last three columns of Table 3. The generalized linear mixed model (GLMM) and the Cox proportional hazards model are used for the longitudinal outcomes and survival time, respectively. The GLMM also accommodates individual heterogeneity through subject-specific random effects, although it does not incorporate the correlation between the longitudinal outcomes and survival time. Comparing the simultaneous and separate analyses in Table 3, we can see that the simultaneous analysis additionally identifies the number of persons supported by household income, BMI, and the total number of medical conditions reported in the HNCS QoL longitudinal model (p-values = 0.025, 0.007, and 0.030, respectively) and the number of 12 oz. beers consumed per week in the hazards model (p-values = 0.045 and 0.005 for 'None' and '5 to 14') as significant, while they are not statistically significant in the separate analyses.
Figure 1 shows the estimated baseline cumulative hazard rates over follow-up time with 95% confidence intervals. Since the baseline cumulative hazard rates are bounded below by 0, we first log-transformed the estimated baseline cumulative hazard rates, obtained the 95% lower and upper bounds on the log scale, and then transformed them back to the original scale. The estimated baseline cumulative hazard rates appear flat within the first year but soon increase roughly linearly. Figure 2 shows the Kaplan-Meier estimates (solid line) and the predicted survival probabilities based on the simultaneous analysis (dashed line). The two survival curves are very close to each other, which suggests that our proposed method fits the data well.
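The log-scale construction described above amounts to a delta-method interval: se(log Λ̂) ≈ se(Λ̂)/Λ̂, so the bounds are symmetric on the log scale and guaranteed positive after back-transformation. A minimal sketch (the function name and inputs are illustrative, not the CHANCE estimates):

```python
import numpy as np

def baseline_ci(lam_hat, se_lam, z=1.96):
    # delta method: Var(log L) ~ Var(L) / L^2, so se(log L) = se(L) / L
    se_log = se_lam / lam_hat
    # symmetric 95% interval on the log scale, exponentiated back
    return lam_hat * np.exp(-z * se_log), lam_hat * np.exp(z * se_log)
```

Both bounds stay positive even when the naive interval lam_hat ± 1.96·se_lam would cross zero.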
Fig. 1.
Estimated baseline cumulative hazards (solid line) with 95% confidence interval (dashed lines) by the simultaneous analysis of HNCS QoL longitudinal outcome and survival time
Fig. 2.
Kaplan-Meier estimates (solid line) and the predicted survival probabilities based on the simultaneous analysis of HNCS QoL longitudinal outcome and survival time (dashed line)
6 Concluding Remarks
We have proposed a method for the simultaneous modeling of longitudinal outcomes, either categorical or continuous, with a generalized linear mixed model and survival time with a stratified multiplicative proportional hazards model linked through random effects. We have also developed a maximum likelihood estimation method for the proposed simultaneous model and presented the asymptotic properties of the proposed estimators. The estimation procedure using the EM algorithm has been assessed via simulation studies: the proposed estimators performed well in finite samples, and the variance estimates based on the observed information matrix approximated the true variances well. The proposed method was applied to data from the CHANCE study.
When the dimension of the random effects is high, the computational burden increases due to high-dimensional Gauss-Hermite quadrature (GQ) integration, and convergence could be slow. To handle such situations, alternative numerical methods such as adaptive quadrature or Markov chain Monte Carlo (MCMC) may be useful.
A stratified Cox proportional hazards model is considered for the survival data, with each stratum having its own unspecified baseline hazard. Even though more strata introduce more baseline hazard functions, the number of parameters associated with the jumps of the cumulative hazard functions remains equal to the total number of observed failures. Therefore, additional strata do not increase the computational complexity, while stratification provides a more flexible and robust structure when we believe that the survival experiences of certain groups are very different and the hazards are not proportional over time.
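The parameter-counting argument can be illustrated with a small sketch (a hypothetical helper, not from the paper): each stratum-specific baseline cumulative hazard jumps only at the observed failure times within that stratum, so the total number of jump-size parameters equals the total number of failures, however the strata are formed.

```python
def baseline_jump_times(times, deltas, strata):
    # collect, per stratum, the observed failure times (delta == 1);
    # these are the only points where the stratum's baseline cumulative
    # hazard jumps, so total jump-size parameters == total failures
    jumps = {}
    for t, d, s in zip(times, deltas, strata):
        if d == 1:
            jumps.setdefault(s, []).append(t)
    return {s: sorted(ts) for s, ts in jumps.items()}
```

Splitting one stratum into two only redistributes the same failure times between two baseline functions; it does not create new jump parameters.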
In our proposed method, all the information on survival, longitudinal outcomes, and covariates is used. As a result, the parameter estimates can be more efficient. The proposed model also generalizes previous work to general longitudinal outcomes. Future work includes relaxing the normality assumption for the random effects, considering generalization to mixed types of longitudinal outcomes, and improving computational efficiency.
Acknowledgements
The authors thank the editor, the associate editor, and two referees of Statistics in Biosciences for their valuable suggestions, which considerably improved this article.
Appendix A. EM Algorithms
A.1. EM algorithm – Binary longitudinal data and survival time
(1) E-step
For binary longitudinal outcomes and survival time, we calculate the conditional expectation of q(bi) for subject i with Si = s, given the observations and the current estimate (θ(k),Λs(k)), for some known function q(·). The conditional expectation, denoted by E[q(bi)|θ(k),Λs(k)], can be expressed as follows: given the current estimate (θ(k),Λs(k)),
(6)
where
(7)
Σb(k)1/2 is the unique non-negative square root of Σb(k) (i.e., Σb(k)1/2Σb(k)1/2 = Σb(k)), and zG follows a multivariate Gaussian distribution with mean zero.
(2) M-step
Since the parameter ϕ is set to 1 for the logistic distribution, we estimate only β in the longitudinal process. β(k+1) solves the conditional expectation of the complete-data log-likelihood score equation using a one-step Newton-Raphson iteration. The remaining estimators have the same expressions as in Section 2.2.
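As a numerical illustration of such E-step expectations, the ratio E[q(b)L(Y|b)] / E[L(Y|b)] under a normal random effect can be approximated by Gauss-Hermite quadrature as a ratio of two weighted sums. The sketch below assumes a single random intercept with a logistic link and, for brevity, omits the survival-likelihood contribution that the paper's E-step includes; all names are ours.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def e_step_expectation(q, y, X, beta, sigma_b, n_nodes=30):
    # approximate E[q(b_i) | Y_i] for a random-intercept logistic model;
    # the change of variables b = sqrt(2) * sigma_b * z absorbs the
    # N(0, sigma_b^2) density into the Gauss-Hermite weight exp(-z^2)
    nodes, weights = hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma_b * nodes                        # abscissas on the b scale
    eta = X @ beta                                            # fixed-effect linear predictor, shape (n_i,)
    p = 1.0 / (1.0 + np.exp(-(eta[None, :] + b[:, None])))    # P(Y_ij = 1 | b), shape (n_nodes, n_i)
    lik = np.prod(np.where(y[None, :] == 1, p, 1.0 - p), axis=1)  # conditional likelihood at each node
    return np.sum(weights * lik * q(b)) / np.sum(weights * lik)
```

The common normalizing constant of the normal density cancels in the ratio, so only the weighted conditional likelihoods are needed.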
A.2. EM algorithm – Poisson longitudinal data and survival time
(1) E-step
For Poisson longitudinal outcomes and survival time, given the current estimate (θ(k),Λs(k)), the conditional expectation E[q(bi)|θ(k),Λs(k)] can be expressed as in (6), with R(zG) defined as in (7), and zG follows a multivariate Gaussian distribution with mean zero.
(2) M-step
Since the parameter ϕ is set to 1 for the Poisson distribution, we estimate only β in the longitudinal process. β(k+1) solves the conditional expectation of the complete-data log-likelihood score equation using a one-step Newton-Raphson iteration. The remaining estimators have the same expressions as in Section 2.2.
Appendix B. Proofs for Theorems
In Appendices B.1 and B.2, we sketch the proofs of Theorem 1 and Theorem 2. All detailed technical proofs are available from the authors. From (3) in Section 2.2, we have the observed log-likelihood function lf(θ,Λ;Y,V) = log{Lf(θ,Λ;Y,V)}. We obtain and use the following modified objective function ln(θ,Λ), given by replacing λs(Vi) with Λs{Vi} in lf(θ,Λ;Y,V), where Λs{Vi} is the jump size of Λs(t) at the observed time Vi with Δi = 1,
(8)
and the maximum likelihood estimator maximizes ln(θ,Λ) over a space in which each Λs is a right-continuous step function. For the proofs of both Theorem 1 and Theorem 2, this modified objective function is used in place of the observed log-likelihood function.
B.1. Proof of consistency – Theorem 1
Consistency can be proved by verifying the following three steps. First, we show that the maximum likelihood estimate exists. Second, we show that, with probability one, Λ̂sn(τ), s = 1,…,S, are bounded as n → ∞. Third, if the second step is true, then by Helly's selection theorem (van der Vaart 1998), we can choose a subsequence of Λ̂sn that weakly converges to some right-continuous monotone function Λs* with probability one. For any subsequence, we can find a further subsequence, still denoted by (θ̂n, Λ̂sn), such that θ̂n converges to some θ*. Thus, in the third step, we show θ* = θ0 and Λs* = Λs0. Once the three steps are completed, we can conclude that, with probability one, θ̂n converges to θ0 and Λ̂sn(t) converges to Λs0(t) in [0,τ], s = 1,…,S. Moreover, since Λs0(t) is right-continuous in [0,τ], the latter convergence can be strengthened to uniform convergence, that is, supt∈[0,τ]|Λ̂sn(t) − Λs0(t)| → 0 a.s. This completes the proof of Theorem 1.
In the first step, since θ belongs to a compact set Θ by Assumption (A1), it is sufficient to show that Λs{Vi} with Δi = 1 is finite. For each subject i with Δi = 1, after simple algebra, we have from (8) that
if Λs{Vi} → ∞ for some i with Δi = 1, then ln(θ,Λ) → −∞, which contradicts the boundedness of ln(θ,Λ). Therefore, we conclude that Λs{·} must be finite. By this conclusion and Assumption (A1), the maximum likelihood estimate exists.
In the second step, we define and rescale by the factor . Then, we let denote the rescaled function; that is, . thus, . To prove this second step, it is sufficient to show is bounded. After some algebra in (8), we obtain that, for any ,
where , and . Thus, since where , it follows that
(9)
According to the assumption (A2), there exist some positive constants C1,C2 and C3 such that . By denoting b0 as a vector of variables following a standard multivariate normal distribution, from concavity of the logarithm function, in the third term of (9),
where C4 and C5 are positive constants. Since it is easily verified that , by the strong law of large numbers and assumption (A4), the third term of (9) can be bounded by a constant C6. Then, (9) becomes
(10)
where C7 is a constant. On the other hand, since, for any Γ > 0 and x > 0, Γ log(1+x/Γ) ≤ Γ(x/Γ) = x, we have that e−x ≤ (1+x/Γ)−Γ. Therefore, with , (10) gives that
(11)
where C8(Γ) is a deterministic function of Γ. For the s-th stratum, (11) is that
By the strong law of large numbers, . Then, we can choose Γ large enough such that . Thus, we obtain . In other words,
If we denote Bs0 = exp{2(C7+C8(Γ))/(ΓP(Vi=τ,Si=s))}, we conclude that . Note that the above arguments hold for every sample in the probability space except a set with zero probability. Therefore, we have shown that, with probability one, is bounded for any sample size n.
In the third step, for convenience, we omit the index i. Then, for the number of observed longitudinal measurements per subject, we use nN instead of the ni without subscript i since we denoted sample size as n. Use O to abbreviate the observed statistics and for a subject, and define
and a class , where Bs0 is the constant given in the second step and contains all nondecreasing functions in [0,τ]. Employing the empirical process formulation, the class can be proved to be P-Donsker by Theorems 2.5.6 and 2.7.5 in van der Vaart and Wellner (1996). In stratum s, let ms denote the number of subjects, and let Vsk and Δsk denote the observed time and censoring indicator for the k-th subject, respectively. By differentiating (8) with respect to Λs{Vsk}, we obtain
We also construct , another step function with the jump size , given by
Through arguments using empirical processes and the relevant properties of P-Donsker and Glivenko-Cantelli classes, we can prove that Λ̂sn(t) uniformly converges to Λs0(t) in [0,τ]. Next, by the bounded convergence theorem, the fact that θ̂n converges to θ* and Λ̂sn weakly converges to Λs*, and the Arzelà-Ascoli theorem, we can prove that uniformly converges to . Then, from , using the properties of the Glivenko-Cantelli class and Kullback-Leibler information, the following holds with probability one,
(12)
Our proof will be completed if we can show θ* = θ0 and Λs* = Λs0 from (12). To show that β* = β0, ϕ* = ϕ0, and Σb* = Σb0, we let Δs = 0 and Vs = 0 in (12). By comparing the coefficients of YTY and Y in the exponential part and the constant term outside the exponential part, and using assumption (A5), we obtain ϕ* = ϕ0, β* = β0, and Σb* = Σb0. To show that ψ* = ψ0, γ* = γ0, and Λs* = Λs0, we let Δs = 0 in (12). By arguments similar to those for β* = β0, ϕ* = ϕ0, and Σb* = Σb0, both sides of (12) with Δs = 0 are expressed as expected values with respect to the random effects b following a multivariate normal distribution with mean zero and covariance Σb0. By the fact that, for any fixed , treating as a parameter in the normal family, b is the complete statistic for , and by assumptions (A2) and (A5), we have ψ* = ψ0, γ* = γ0, and Λs* = Λs0. Therefore, the proof of Theorem 1 is completed.
B.2. Proof of asymptotic normality – Theorem 2
The asymptotic distribution of the proposed estimators can be derived by verifying the conditions of Theorem 3.3.1 in van der Vaart and Wellner (1996); it will then be shown that the limiting distribution is Gaussian. For completeness, we use Theorem 4 in Parner (1998), which restates Theorem 3.3.1 of van der Vaart and Wellner (1996).
Theorem 4 (Parner 1998)
Let Un and U be random maps and a fixed map, respectively, from Ξ to a Banach space such that:
(a) ∥√n(Un − U)(ξ̂n) − √n(Un − U)(ξ0)∥ = oP(1 + √n∥ξ̂n − ξ0∥);
(b) the sequence √n(Un − U)(ξ0) converges in distribution to a tight random element Z;
(c) the function ξ → U(ξ) is Fréchet differentiable at ξ0 with a continuously invertible derivative ▽Uξ0 (on its range);
(d) ξ̂n satisfies Un(ξ̂n) = oP(n−1/2) and converges in outer probability to ξ0, and U(ξ0) = 0.
Then √n(ξ̂n − ξ0) converges in distribution to −▽Uξ0−1(Z).
In our situation, the parameter for a fixed small constant δ. We define , where ∥h2∥v is the total variation of h2 in [0,τ] defined as
and also define that, for stratum s,
and
where lθ (θ,Λs) is the first derivative of the log-likelihood function from one single subject belonging to stratum s, denoted by l(O;θ,Λs), with respect to θ, and lΛs (θ,Λs) is the derivative of l(O;θ,Λsε) at ε = 0, where . Therefore, both Ums and Us map from Ξ to , and is an empirical process in the space .
Denoting corresponding to θ=(βT,ϕT,Vech(Σb)T,ψT,γT)T, for any , the class
can be shown as P-Donsker. From the P-Donsker property, it is also implied that
as ∥θ–θ0∥+supt∈[0,τ]|Λs(t)–Λs0(t)|→0. Then, for the conditions (a)–(d) in Theorem 4 of Parner (1998), we have that
(a) follows from Lemma 3.3.5 (p. 311) of van der Vaart and Wellner (1996);
(b) holds as a result of P-Donsker property and the convergence is defined in the metric space by the Donsker theorem in van der Vaart and Wellner (1996);
(d) is true because (θ̂n, Λ̂sn) maximizes Pmsl(O;θ,Λs), (θ0,Λs0) maximizes Pl(O;θ,Λs), and (θ̂n, Λ̂sn) converges to (θ0,Λs0) by Theorem 1;
The first half of condition (c), that the function ξ → U(ξ) is Fréchet differentiable at ξ0, is proved by showing there exists a bounded linear operator for the function.
Thus, it only remains to prove the second half of condition (c), that the derivative ▽Uξ0 is continuously invertible on its range. From the proof of the Fréchet differentiability of U(ξ) at ξ0, we have that, for any (θ1,Λs1) and (θ2,Λs2) in Ξ,
(13)
where both Ω1 and Ω2 are bounded linear operators on , and Ω = (Ω1,Ω2) maps to Rd × BV[0,τ], where BV[0,τ] contains all the functions with finite total variation in [0,τ]. The explicit expressions of Ω1 and Ω2 can be obtained from the P-Donsker property and the derivation of ▽Uξ0 by definition. Thus, ▽Uξ0 is a linear operator from to itself. We note that proving that ▽Uξ0 is continuously invertible is equivalent to showing that Ω is invertible. Then, by Theorem 4.25 of Rudin (1973), for the proof of invertibility of Ω, it is sufficient to verify that Ω is one-to-one: if Ω[h1,h2] = 0, then, by choosing θ1 − θ2 = ε*h1 and Λs1 − Λs2 = ε* ∫h2dΛs0 in (13) for a small constant ε*, we obtain
Since ▽Uξ0 (h1, ∫h2dΛs0)[h1,h2] is the negative information matrix in the submodel (θ0+εh1,Λs0+ε ∫h2dΛs0), the score function along this submodel is lθ(θ0,Λs0)T h1+lΛs(θ0, Λs0)[h2] = 0; that is, with probability one, the numerator of the score function
(14)
where A′(D(tj;ϕ0)) and C′(Yj;D(tj;ϕ0)) are the derivatives of A(D(tj;ϕ)) and C(Yj;D(tj;ϕ)) with respect to ϕ evaluated at ϕ0, and B′(β0;b) is the derivative of B(β;b) with respect to β evaluated at β0. The proof of invertibility of Ω will be completed if we can show h1 = 0 and h2(t) = 0 from (14).
To show h1 = 0, we particularly let Δs = 0 and Vs = 0 in (14). Examining the coefficient for Y and the constant terms without Y, and using assumption (A5) and arguments similar to those in Appendix B.1, gives Db = 0. On the other hand, letting Δs = 0 in (14) with assumptions (A2) and (A5) and similar arguments leads to h2(t) = 0. Hence, the proof of condition (c) is completed.
Since the conditions (a)–(d) have been proved, Theorem 3.3.1 of van der Vaart and Wellner (1996) concludes that weakly converges to a tight random element in . Then, we have
where oP(1) is a random variable which converges to zero in probability in , and, from (13),
By denoting , we have , and by replacing (h1,h2) with in the above two equations, we obtain
(15)
We can see that the first term on the right-hand side in (15) is , which is an empirical process in the space . Furthermore, it is already shown that is P-Donsker. Therefore, weakly converges to a Gaussian process in .
Choosing h2 = 0 in (15) and applying Proposition 3.3.1 in Bickel et al. (1993) concludes that the proposed maximum likelihood estimator of θ0 is efficient. Therefore, Theorem 2 is proved.
Contributor Information
Jaeun Choi, Department of Health Care Policy, Harvard Medical School 180 Longwood Avenue, Boston, MA 02115, USA Tel.: +617-432-0183, Fax: +617-432-0173 choi@hcp.med.harvard.edu.
Jianwen Cai, Department of Biostatistics, University of North Carolina at Chapel Hill McGavran-Greenberg Hl, 135 Dauer Drive, CB 7420, Chapel Hill, NC 27599, USA cai@bios.unc.edu.
Donglin Zeng, Department of Biostatistics, University of North Carolina at Chapel Hill McGavran-Greenberg Hl, 135 Dauer Drive, CB 7420, Chapel Hill, NC 27599, USA dzeng@bios.unc.edu.
Andrew F. Olshan, Department of Epidemiology, University of North Carolina at Chapel Hill McGavran-Greenberg Hl, 135 Dauer Drive, CB 7435, Chapel Hill, NC 27599, USA andyolshan@unc.edu
References
- Albert PS, Follmann DA. Modeling Repeated Count Data Subject to Informative Dropout. Biometrics. 2000;56:667–677. doi: 10.1111/j.0006-341x.2000.00667.x. [DOI] [PubMed] [Google Scholar]
- Albert PS, Follmann DA. Random Effects and Latent processes approaches for analyzing binary longitudinal data with missingness: a comparison of approaches using opiate clinical trial data. Stat Methods Med Res. 2007;16:417–439. doi: 10.1177/0962280206075308. [DOI] [PubMed] [Google Scholar]
- Albert PS, Follmann DA, Wang SA, Suh EB. A Latent Autoregressive Model for Longitudinal Binary Data subject to Informative Missingness. Biometrics. 2002;58:631–642. doi: 10.1111/j.0006-341x.2002.00631.x. [DOI] [PubMed] [Google Scholar]
- Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press; Baltimore: 1993. [Google Scholar]
- Brown ER, Ibrahim JG. A Bayesian Semiparametric Joint Hierarchical Model for Longitudinal and Survival Data. Biometrics. 2003;59:221–228. doi: 10.1111/1541-0420.00028. [DOI] [PubMed] [Google Scholar]
- Chakraborty A, Das K. Inferences for joint modelling of repeated ordinal scores and time to event data. Comput Math Methods Med. 2010;11:281–295. doi: 10.1080/17486701003789096. [DOI] [PubMed] [Google Scholar]
- Chen W, Ghosh D, Raghunathan TE, Sargent DJ. Bayesian Variable Selection with Joint Modeling of Categorical and Survival Outcomes: An Application to Individualizing Chemotherapy Treatment in Advanced Colorectal Cancer. Biometrics. 2009;65:1030–1040. doi: 10.1111/j.1541-0420.2008.01181.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen MH, Ibrahim JG, Lipsitz SR. Bayesian methods for missing covariates in cure rate models. Lifetime Data Anal. 2002;8:117–146. doi: 10.1023/a:1014835522957. [DOI] [PubMed] [Google Scholar]
- Ding J, Wang JL. Modeling Longitudinal Data with Nonparametric Multiplicative Random Effects Jointly with Survival Data. Biometrics. 2008;64:546–556. doi: 10.1111/j.1541-0420.2007.00896.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Divaris K, Olshan AF, Smith J, Bell ME, Weissler MC, Funkhouser WK, Bradshaw PT. Oral Health and Risk for Head and Neck Squamous Cell Carcinoma: the Carolina Head and Neck Cancer Study. Cancer Cause Control. 2010;21:567–575. doi: 10.1007/s10552-009-9486-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dunson DB, Herring AH. Bayesian latent variable models for mixed discrete outcomes. Biostatistics. 2005;6:11–25. doi: 10.1093/biostatistics/kxh025. [DOI] [PubMed] [Google Scholar]
- Elashoff RM, Li G, Li N. An Approach to Joint Analysis of Longitudinal Measurements and Competing Risks Failure Time Data. Stat Med. 2007;26:2813–2835. doi: 10.1002/sim.2749. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Elashoff RM, Li G, Li N. A Joint Model for Longitudinal Measurements and Survival Data in the Presence of Multiple Failure Types. Biometrics. 2008;64:762–771. doi: 10.1111/j.1541-0420.2007.00952.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Faucett CL, Schenker N, Elashoff RM. Analysis of Censored Survival Data with Intermittently Observed Time-Dependent Binary Covariates. J Amer Statist Assoc. 1998;93:427–437. [Google Scholar]
- Faucett CL, Thomas DC. Simultaneously Modeling Censored Survival Data and Repeatedly Measured Covariates: A Gibbs Sampling Approach. Stat Med. 1996;15:1663–1685. doi: 10.1002/(SICI)1097-0258(19960815)15:15<1663::AID-SIM294>3.0.CO;2-1. [DOI] [PubMed] [Google Scholar]
- Gueorguieva RV, Agresti A. A Correlated Probit Model for Joint Modeling of Clustered Binary and Continuous Responses. J Amer Statist Assoc. 2001;96:1102–1112. [Google Scholar]
- Henderson R, Diggle P, Dobson A. Joint Modeling of Longitudinal Measurements and Event Time Data. Biostatistics. 2000;1:465–480. doi: 10.1093/biostatistics/1.4.465. [DOI] [PubMed] [Google Scholar]
- Hogan J, Laird N. Mixture Models for the Joint Distribution of Repeated Measures and Event Times. Stat Med. 1997;16:239–257. doi: 10.1002/(sici)1097-0258(19970215)16:3<239::aid-sim483>3.0.co;2-x. [DOI] [PubMed] [Google Scholar]
- Hu W, Li G, Li N. A Bayesian Approach to Joint Analysis of Longitudinal Measurements and Competing Risks Failure Time Data. Stat Med. 2009;28:1601–1619. doi: 10.1002/sim.3562. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Huang X, Li G, Elashfoff RM, Pan J. A General Joint Model for Longitudinal Measurements and Competing Risks Survival Data with Heterogeneous Random Effects. Lifetime Data Anal. 2011;17:80–100. doi: 10.1007/s10985-010-9169-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Huang W, Zeger S, Anthony J, Garrett E. Latent Variable Model for Joint Analysis of Multiple Repeated Measures and Bivariate Event times. J Amer Statist Assoc. 2001;96:906–14. [Google Scholar]
- Larsen K. Joint analysis of time-to-event and multiple binary indicators of latent classes. Biometrics. 2004;60:85–92. doi: 10.1111/j.0006-341X.2004.00141.x. [DOI] [PubMed] [Google Scholar]
- Lin HQ, McCulloch CE, Mayne ST. Maximum Likelihood Estimation in the Joint Analysis of Time-to-Event and Multiple Longitudinal Variables. Stat Med. 2002;21:2369–2382. doi: 10.1002/sim.1179. [DOI] [PubMed] [Google Scholar]
- Liu L, Ma JZ, O’Quigley J. Joint Analysis of Multi-Level Repeated Measures Data and Survival: An Application to the End Stage Renal Disease (ESRD) Data. Stat Med. 2008;27:5676–5691. doi: 10.1002/sim.3392. [DOI] [PubMed] [Google Scholar]
- Louis TA. Finding the Observed Information Matrix when Using the EM Algorithm. J Royal Statist Soc B. 1982;44:226–233. [Google Scholar]
- Parner E. Asymptotic Theory for the Correlated Gamma-frailty Model. Ann Stat. 1998;26:183–214. [Google Scholar]
- Pulkstenis EP, Ten Have TR, Landis JR. Model for the Analysis of Binary Longitudinal Pain Data Subject to Informative Dropout through Remedication. J Amer Statist Assoc. 1998;93:438–450. [Google Scholar]
- Ribaudo HJ, Thompson SG, Allen-Mersh TG. A Joint Analysis of Quality of Life and Survival Using a Random Effect Selection Model. Stat Med. 2000;19:3237–3250. doi: 10.1002/1097-0258(20001215)19:23<3237::aid-sim624>3.0.co;2-q. [DOI] [PubMed] [Google Scholar]
- Rizopoulos D, Verbeke G, Lesaffre E, Vanrenterghem Y. A Two-Part Joint Model for the Analysis of Survival and Longitudinal Binary Data with Excess Zeros. Biometrics. 2008;64:611–619. doi: 10.1111/j.1541-0420.2007.00894.x. [DOI] [PubMed] [Google Scholar]
- Rizopoulos D, Verbeke G, Molenberghs G. Shared Parameter Models under Random Effects Misspecification. Biometrika. 2008;95:63–74. [Google Scholar]
- Satterthwaite FE. An Approximate Distribution of Estimates of Variance Components. Biometrics Bulletin. 1946;2:110–114. [PubMed] [Google Scholar]
- Schluchter MD. Methods for the Analysis of Informatively Censored Longitudinal Data. Stat Med. 1992;11:1861–1870. doi: 10.1002/sim.4780111408. [DOI] [PubMed] [Google Scholar]
- Song X, Davidian M, Tsiatis AA. A Semiparametric Likelihood Approach to Joint Modeling of Longitudinal and Time-to-Event Data. Biometrics. 2002;58:742–753. doi: 10.1111/j.0006-341x.2002.00742.x. [DOI] [PubMed] [Google Scholar]
- Song X, Wang CY. Semiparametric Approaches for Joint Modeling of Longitudinal and Survival Data with Time-Varying Coefficients. Biometrics. 2007;64:557–566. doi: 10.1111/j.1541-0420.2007.00890.x. [DOI] [PubMed] [Google Scholar]
- Tseng YK, Hsieh F, Wang JL. Joint Modelling of Accelerated Failure Time and Longitudinal Data. Biometrika. 2005;92:587–603. [Google Scholar]
- Tsiatis AA, De Gruttola V, Wulfsohn M. Modeling the Relationship of Survival to Longitudinal Data Measured with Error. Applications to Survival and CD4 Counts in Patients with AIDS. J Amer Statist Assoc. 1995;90:27–37. [Google Scholar]
- Tsiatis AA, Davidian M. A Semiparametric Estimator for the Proportional Hazards Model with Longitudinal Covariates Measured with Error. Biometrika. 2001;88:447–458. [Google Scholar]
- van der Vaart AW. Asymptotic Statistics. Cambridge University Press; 1998. [Google Scholar]
- van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Springer-Verlag; New York: 1996. [Google Scholar]
- Wang Y, Taylor JMG. Jointly Modeling Longitudinal and Event Time Data with Application to Acquired Immunodeficiency Syndrome. J Amer Statist Assoc. 2001;96:895–905. [Google Scholar]
- Wu M, Bailey K. Estimation and Comparison of Changes in the Presence of Informative Right Censoring: Conditional Linear Model. Biometrics. 1989;45:939–955. [PubMed] [Google Scholar]
- Wu M, Carroll R. Estimation and Comparison of Changes in the Presence of Informative Right Censoring by Modelling the Censoring Process. Biometrics. 1988;44:175–188. [Google Scholar]
- Wulfsohn M, Tsiatis AA. A Joint Model for Survival and Longitudinal Data Measured with Error. Biometrics. 1997;53:330–339. [PubMed] [Google Scholar]
- Xu J, Zeger S. The Evaluation of Multiple Surrogate Endpoints. Biometrics. 2001a;57:81–87. doi: 10.1111/j.0006-341x.2001.00081.x. [DOI] [PubMed] [Google Scholar]
- Xu J, Zeger S. Joint Analysis of Longitudinal Data Comprising Repeated Measures and Times to Events. Appl Stat. 2001b;50:375–387. [Google Scholar]
- Yao F. Functional Approach of Flexibly Modelling Generalized Longitudinal Data and Survival Time. J Statist Plann Inference. 2008;138:995–1009. [Google Scholar]
- Ye W, Lin XH, Taylor JMG. Semiparametric Modeling of Longitudinal Measurements and Time-to-Event Data: A Two-Stage Regression Calibration Approach. Biometrics. 2008;64:1238–1246. doi: 10.1111/j.1541-0420.2007.00983.x. [DOI] [PubMed] [Google Scholar]
- Zeng D, Cai J. Simultaneous Modelling of Survival and Longitudinal Data with an Application to Repeated Quality of Life Measures. Lifetime Data Anal. 2005a;11:151–174. doi: 10.1007/s10985-004-0381-0. [DOI] [PubMed] [Google Scholar]
- Zeng D, Cai J. Asymptotic Results for Maximum Likelihood Estimators in Joint Analysis of Repeated Measurements and Survival Time. Ann Stat. 2005b;33:2132–2163. [Google Scholar]