Abstract
In biomedical or public health research, it is common for both survival time and longitudinal categorical outcomes to be collected for a subject, along with the subject's characteristics or risk factors. Investigators are often interested in finding important variables for predicting both survival time and longitudinal outcomes, which could be correlated within the same subject. Existing approaches for such joint analyses deal with continuous longitudinal outcomes; new statistical methods need to be developed for categorical longitudinal outcomes. We propose to simultaneously model the survival time with a stratified Cox proportional hazards model and the longitudinal categorical outcomes with a generalized linear mixed model. Random effects are introduced to account for the dependence between survival time and longitudinal outcomes due to unobserved factors. The Expectation-Maximization (EM) algorithm is used to derive the point estimates for the model parameters, and the observed information matrix is adopted to estimate their asymptotic variances. Asymptotic properties of our proposed maximum likelihood estimators are established using the theory of empirical processes. The method is demonstrated to perform well in finite samples via simulation studies. We illustrate our approach with data from the Carolina Head and Neck Cancer Study (CHANCE) and compare the results from our simultaneous analysis with those from separately conducted analyses using the generalized linear mixed model and the Cox proportional hazards model. Our proposed method identifies more predictors than the separate analyses do.
Keywords: EM algorithm, Generalized linear mixed model, Maximum likelihood estimator, Random effect, Simultaneous modeling, Stratified Cox proportional hazards model
1 Introduction
In biomedical or public health research, it is common that both longitudinal outcomes over time and a survival endpoint are collected for a subject, along with the subject's characteristics or risk factors. Investigators are often interested in finding important variables that predict both longitudinal outcomes and survival time. Since longitudinal outcomes and survival time are dependent, it is natural to analyze the two outcomes jointly.
Among the existing approaches for longitudinal data and survival time, the selection model and the pattern mixture model have been widely used. The selection model estimates the distribution of survival time given longitudinal data. The selection model with continuous longitudinal data was studied by Tsiatis, De Gruttola, and Wulfsohn (1995), Faucett and Thomas (1996), Wulfsohn and Tsiatis (1997), Henderson, Diggle and Dobson (2000), Tsiatis and Davidian (2001), Xu and Zeger (2001a,b), Song, Davidian and Tsiatis (2002), Tseng, Hsieh and Wang (2005), Song and Wang (2007) and Ye, Lin and Taylor (2008) among others. The selection model with categorical longitudinal data was considered by Faucett, Schenker and Elashoff (1998), Huang et al. (2001), Xu and Zeger (2001a,b), Lin, McCulloch, and Mayne (2002), Chen, Ibrahim, and Lipsitz (2002), Larsen (2004), Yao (2008), and Chakraborty and Das (2010) among others. The pattern mixture model focuses on the trend of longitudinal outcomes conditional on survival time. The pattern mixture model with continuous longitudinal outcomes was studied by Wu and Carroll (1988), Wu and Bailey (1989), Schluchter (1992), Hogan and Laird (1997), Ribaudo, Thompson and Allen-Mersh (2000) and more recently by Ding and Wang (2008). Pulkstenis, Ten Have and Landis (1998) considered the pattern mixture model of binary longitudinal outcomes with informative dropout. Albert and Follmann (2000) proposed to model repeated count data subject to informative dropout, and Albert, Follmann, Wang and Suh (2002) and Albert and Follmann (2007) studied binary longitudinal data with informative missingness. However, these methods cannot be applied directly to assess covariate effects on both outcomes. Simultaneous modeling of the longitudinal and survival data is needed for such purpose.
Xu and Zeger (2001b) and Zeng and Cai (2005a) proposed simultaneous models of longitudinal outcome and survival time. In their articles, heterogeneity caused by unobserved factors is represented using subject-specific random effects. Xu and Zeger (2001b) considered both continuous and categorical longitudinal outcomes and proposed a Bayesian approach using MCMC for estimation. Zeng and Cai (2005a) considered continuous longitudinal outcomes and adopted the EM algorithm for estimation. In their approach, given the random effects, the survival time and the repeated measurements of the longitudinal outcomes are assumed to follow a Cox proportional hazards model and a Gaussian distribution, respectively. Recently, simultaneous models with varied types of survival events and random effect structures have been studied (Elashoff, Li and Ni 2007, 2008; Liu, Ma and O'Quigley 2008; Rizopoulos, Verbeke and Molenberghs 2008; Rizopoulos, Verbeke, Lesaffre and Vanrenterghem 2008). Bayesian methods were also proposed for inference (Wang and Taylor 2001; Brown and Ibrahim 2003; Dunson and Herring 2005; Chen, Ghosh, Raghunathan, and Sargent 2009; Hu, Li and Li 2009; Huang, Li, Elashoff and Pan 2011). However, among all the aforementioned simultaneous models, most studies, except Xu and Zeger (2001b), Dunson and Herring (2005), Rizopoulos, Verbeke, Lesaffre and Vanrenterghem (2008), and Chen, Ghosh, Raghunathan, and Sargent (2009), are restricted to continuous longitudinal outcomes.
For non-continuous longitudinal outcomes, Rizopoulos, Verbeke, Lesaffre and Vanrenterghem (2008) studied binary data with excess zeros, extending the earlier work of Rizopoulos, Verbeke and Molenberghs (2008), which assumed an accelerated failure time model and used a copula function for the random effects in the continuous longitudinal and survival processes. Dunson and Herring (2005) proposed a general underlying Poisson variable framework for discrete survival and longitudinal outcomes, accommodating dependence through an additive gamma frailty model for the Poisson means within a Bayesian approach. Chen, Ghosh, Raghunathan, and Sargent (2009) considered a latent variable-based multivariate regression model with a structured variance-covariance matrix, assuming probit models for two binary outcomes and a log-normal accelerated failure time model for the survival outcome, and conducted Bayesian inference through the Markov chain Monte Carlo (MCMC) method.
Compared to the studies of continuous longitudinal data and survival time, relatively little work has been done in the simultaneous modeling framework for categorical longitudinal data and survival time. However, the longitudinal outcomes may not be continuous in some biomedical studies, for example, where the outcomes are disease symptoms with categories of mild/moderate/severe, quality of life measurements with categories of dissatisfied/satisfied, or dichotomized test results with categories of positive/negative. With these categorical longitudinal outcomes, the existing theory for continuous longitudinal outcomes cannot be applied directly, and the numerical algorithms need to be modified. Therefore, in this paper, we investigate the simultaneous modeling of survival time and longitudinal categorical outcomes. Survival time is modeled with a Cox proportional hazards model with an unspecified baseline hazard rate, and the hazards model is further extended to allow multiple strata. Random effects are introduced into the proposed models to account for the dependence between survival time and longitudinal outcomes due to unobserved factors. We also establish the theoretical justification of the asymptotic properties of the maximum likelihood estimates by employing the theory of empirical processes. The contributions of this paper to the recent developments in the simultaneous modeling of categorical longitudinal data and survival data since Xu and Zeger (2001b) are the following: (1) We propose efficient estimation based on nonparametric maximum likelihood estimation (NPMLE) with no assumption on the baseline hazard rates, whereas Dunson and Herring (2005), Rizopoulos, Verbeke, Lesaffre and Vanrenterghem (2008), and Chen, Ghosh, Raghunathan, and Sargent (2009) considered parametric models for survival outcomes.
(2) We implement our proposed model via an Expectation-Maximization (EM) algorithm, whereas Xu and Zeger (2001b), Dunson and Herring (2005), and Chen, Ghosh, Raghunathan, and Sargent (2009) proposed Bayesian estimation methods. (3) We provide an asymptotic theory for our efficient estimators.
The outline of this paper is as follows. In Section 2, we present a simultaneous model for longitudinal categorical outcomes and survival time and describe the inference procedure. Asymptotic properties of the proposed estimators are investigated in Section 3, and numerical results from simulation studies are given in Section 4. Our proposed method is illustrated with data from the Carolina Head and Neck Cancer Study (CHANCE) in Section 5. In Section 6, we discuss some further considerations and generalizations. The EM algorithms are provided in Appendix A, and the proofs of the asymptotic results are given in Appendix B.
2 Model and Inference Procedure
2.1 Model formulation and notation
We use Y(t) to denote the value of a longitudinal marker process at time t. Suppose Y(t) follows a distribution belonging to the exponential family, in order to accommodate both continuous and categorical measurements. Let T denote the survival time, and suppose that T is possibly right censored. Suppose a set of n subjects is followed over an interval [0,τ], where τ is the study end time. Denote by bi, i = 1,…,n, a vector of subject-specific random effects of dimension db; the bi's are mutually independent and identically distributed from a multivariate normal distribution with mean zero and covariance matrix Σb.
Given the random effects bi, the observed covariates, and the observed outcome history up to time t, we assume that the longitudinal outcome Yi(t) at time t for subject i follows a distribution from the exponential family with density

f(Yi(t) | bi) = exp{[Yi(t)ηi(t) − a(ηi(t))]/A(Di(t;ϕ)) + c(Yi(t); Di(t;ϕ))},   (1)

with mean μi(t) = E{Yi(t) | bi} and variance vi(t) = Var{Yi(t) | bi}, satisfying

g(μi(t)) = ηi(t) = Xi(t)β + X̃i(t)bi

and vi(t) = v(μi(t))A(Di(t;ϕ)), where g(·) and v(·) are known link and variance functions, respectively, Xi(t) and X̃i(t) are the row vectors of the observed covariates for subject i, and β is a column vector of coefficients for Xi(t). The random effect bi is allowed to differ for different individuals. Additionally, Xi(t) and X̃i(t) can be completely different or share some components, and Xi(t) may include dummy variables for different strata.
Given the random effects bi, the observed covariates, and the observed survival history before time t, the conditional hazard rate function for the survival time Ti of subject i is assumed to follow a stratified multiplicative hazards model,

λi(t | bi) = λSi(t) exp{Zi(t)γ + Z̃i(t)(ψ ∘ bi)},   (2)

where Zi(t) and Z̃i(t) are the row vectors of the observed covariates and may share some components, ψ is a vector of coefficients for the random effects, γ is a column vector of coefficients for Zi(t), and λs(t) is the s-th stratum baseline hazard rate function, so that the baseline hazard rate is allowed to vary across levels of the stratification variable. Note that Zi(t) does not include dummy variables for strata since the baseline hazard rate is stratum-specific. We assume common fixed effects and random effects across strata in both the hazard and longitudinal models. However, the model may allow for possibly different covariate effects for different strata, which can be achieved by including interaction terms of the covariates with the indicator variables for the stratification variable. Subjects in different strata are assumed to be independent. Here, for any vectors a1 and a2 of the same dimension, a1 ∘ a2 denotes the component-wise product. In addition, ψ and the transpose of Z̃i(t) have the same dimension as bi.
Under models (1) and (2), the two outcomes Y(t) and T are independent conditional on the covariates and random effect. The parameter ψ in model (2) characterizes the dependence between the longitudinal outcomes and the survival time due to latent random effect: When the k-th component of ψ is 0 (i.e. ψk = 0), it implies that the dependence between the survival time and longitudinal responses is not due to the corresponding latent variable bik; ψk ≠ 0 implies that such dependence may be due to the corresponding latent variable bik.
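To see numerically how a shared random effect induces marginal dependence between the two outcomes, the following small simulation (illustrative only; the sample size, σb = 0.7, five repeated binary measurements per subject, and a unit baseline hazard are our choices, not the paper's) compares the empirical correlation between a subject's average longitudinal outcome and the survival time under ψ = 1 versus ψ = 0:

```python
import numpy as np

rng = np.random.default_rng(0)

def marginal_corr(psi, n=4000, sigma_b=0.7, n_rep=5):
    """Given b the two outcomes are independent, but marginally
    psi != 0 induces dependence through the shared random effect b."""
    b = rng.normal(0.0, sigma_b, n)
    # binary longitudinal outcomes: logit P(Y=1|b) = -1 + b, n_rep repeats
    p = 1.0 / (1.0 + np.exp(-(-1.0 + b)))
    ybar = rng.binomial(n_rep, p) / n_rep          # subject-level average outcome
    # survival time: hazard exp(psi*b), i.e. T | b ~ Exponential(rate exp(psi*b))
    t = rng.exponential(np.exp(-psi * b))
    return np.corrcoef(ybar, t)[0, 1]
```

With ψ = 1, a larger b raises both the response probability and the hazard, so the average outcome and the survival time are clearly negatively correlated; with ψ = 0 the correlation is near zero.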
Let ni be the number of the observed longitudinal measurements for subject i, and assume that the distributions of ni and the observation times for longitudinal measurements are independent of the parameters of interest conditional on bi in this joint model. We also assume ni is bounded, which is a reasonable assumption in many biomedical studies. The observed data from n subjects are (ni, Yij, Xij, X̃ij), j = 1,…,ni, i = 1,…,n, and (Vi, Δi, Si, Zi(·), Z̃i(·)), where for subject i, Xij and X̃ij are the j-th observations of Xi(t) and X̃i(t), respectively, Ci is the right-censoring time and is independent of Ti and Yi(t) given the covariates and the random effects, Vi = min(Ti, Ci), Si denotes the stratum, and Δi = I(Ti ≤ Ci).
Our goal is to estimate and make inferences on the parameters θ = (βT, ϕT, Vech(Σb)T, ψT, γT)T and the baseline cumulative hazard functions with S strata, Λ(t) = (Λ1(t),…,ΛS(t))T, where Λs(t) = ∫0t λs(u)du, s = 1,…,S. The parameters β and ϕ are from the longitudinal model, ψ and γ are from the hazard model, and Σb is associated with the random effects. The Vech(·) operator creates a column vector from a matrix by stacking the diagonal and upper-triangle elements of the matrix.
2.2 Inference procedure
For all n subjects, we write Y = (Y11,…,Y1n1,…,Yn1,…,Ynnn)T, V = (V1,…,Vn)T, Δ = (Δ1,…,Δn)T and b = (b1T,…,bnT)T. We also denote X, X̃, Z and Z̃ as block diagonal matrices with the i-th diagonal components Xi, X̃i, Zi and Z̃i, respectively, and S = (S1,…,Sn)T. Then, the likelihood function of the complete data has the form

Lc(θ,Λ; Y,V,Δ,b) = ∏i=1n [∏j=1ni f(Yij | bi)] × [λSi(Vi) exp{Zi(Vi)γ + Z̃i(Vi)(ψ ∘ bi)}]Δi exp{−∫0Vi exp{Zi(t)γ + Z̃i(t)(ψ ∘ bi)} dΛSi(t)} × ϕ(bi; Σb),

where f(· | bi) is the density in (1) and ϕ(·; Σb) denotes the N(0, Σb) density, and the full likelihood function of the observed data for the parameter (θ,Λ) is obtained by integrating over the random effects,

L(θ,Λ) = ∏i=1n ∫ [∏j=1ni f(Yij | bi)] [λSi(Vi) exp{Zi(Vi)γ + Z̃i(Vi)(ψ ∘ bi)}]Δi exp{−∫0Vi exp{Zi(t)γ + Z̃i(t)(ψ ∘ bi)} dΛSi(t)} ϕ(bi; Σb) dbi.   (3)
The proposed estimation method calculates the maximum likelihood estimates of (θ, Λ(t)). We let each Λs(t) of Λ(t), s = 1,…,S, be a non-decreasing, right-continuous step function with jumps only at the observed failure times belonging to stratum s.
The EM algorithm is used for calculating the maximum likelihood estimates. In the EM algorithm, bi, i = 1,…,n, are treated as missing data. The M-step solves the conditional expectations, given the observations, of the score equations from the complete data, where the conditional expectations are evaluated in the E-step. The procedure iterates between the following two steps until convergence is achieved: at the k-th iteration,
(1) E-step
Calculate the conditional expectations of some known functions of bi needed in the next M-step, for subject i with Si = s, given the observations and the current estimate (θ(k), Λ(k)). To do this, denote q(bi) and Ê[q(bi)] as a known function and its conditional expectation, respectively. By some algebra, Ê[q(bi)] can be expressed in terms of a vector of new variables zG following a multivariate Gaussian distribution with mean zero. The conditional expectation is calculated using the Gauss-Hermite quadrature numerical approximation with 20 quadrature points.
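The change of variables behind the Gauss-Hermite approximation can be sketched as follows: for a scalar random effect b ~ N(0, σ²) a priori and a conditional log-likelihood log w(b) of the observations given b, the posterior expectation E[q(b) | observations] is a ratio of two integrals, each approximated with the Hermite nodes and weights (a minimal one-dimensional sketch, not the paper's implementation; `q` and `log_w` are placeholders for the problem-specific functions):

```python
import numpy as np

def posterior_expectation(q, log_w, sigma, n_points=20):
    """Approximate E[q(b) | data] where b ~ N(0, sigma^2) a priori and
    log_w(b) is the conditional log-likelihood of the data given b.
    Gauss-Hermite rule: int f(x) exp(-x^2) dx ~ sum_k w_k f(x_k)."""
    x, w = np.polynomial.hermite.hermgauss(n_points)
    b = np.sqrt(2.0) * sigma * x        # change of variables b = sqrt(2)*sigma*x
    lw = log_w(b)
    lw = lw - lw.max()                  # stabilize the exponentials
    num = np.sum(w * np.exp(lw) * q(b))
    den = np.sum(w * np.exp(lw))
    return num / den
```

The normalizing constant of the prior and the factor from the change of variables cancel in the ratio, so only the nodes, weights, and likelihood values are needed.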
(2) M-step
After differentiating the conditional expectation of the complete-data log-likelihood given the observations and the current estimate (θ(k), Λ(k)), the updated estimate (θ(k+1), Λ(k+1)) is obtained as follows: (β(k+1), ϕ(k+1)) solves the conditional expectation of the complete-data log-likelihood score equation using one-step Newton-Raphson iteration; (ψ(k+1), γ(k+1)) solves the conditional expectation of the partial-likelihood score equation from the full data using one-step Newton-Raphson iteration; and Λs(k+1) is obtained as an empirical function with jumps only at the observed failure times,

Λs(k+1)(t) = Σ{i: Si = s, Vi ≤ t} Δi / [ Σ{j: Sj = s, Vj ≥ Vi} E{exp(Zj(Vi)γ(k+1) + Z̃j(Vi)(ψ(k+1) ∘ bj)) | observations} ],

a Breslow-type estimator with the random effects integrated out via their conditional expectations.
The expressions of the conditional expectation and the conditional score equations calculated in the E- and M-steps for binary and Poisson longitudinal outcomes with survival time are given respectively in Appendices A.1 and A.2.
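For intuition, the baseline cumulative hazard update in the M-step has the form of a Breslow estimator. The sketch below computes such jump sizes within a single stratum, using plain exponentiated scores in place of the conditional expectations over the random effects that the actual M-step requires (function names are ours, not the paper's):

```python
import numpy as np

def breslow_jumps(v, delta, score):
    """Jump sizes of a Breslow-type baseline cumulative hazard estimator
    within one stratum: dLambda(t) = (#events at t) / sum_{j: V_j >= t} exp(score_j).
    In the EM M-step, exp(score_j) would be replaced by its conditional
    expectation over the random effects (omitted here for brevity)."""
    v = np.asarray(v, dtype=float)
    delta = np.asarray(delta, dtype=int)
    score = np.asarray(score, dtype=float)
    jumps = {}
    for t in np.unique(v[delta == 1]):                # distinct failure times
        d = np.sum((v == t) & (delta == 1))           # events at time t
        jumps[t] = d / np.sum(np.exp(score[v >= t]))  # risk-set denominator
    return jumps

def cum_hazard(jumps, t):
    """Cumulative hazard at t: sum of jumps at failure times <= t."""
    return sum(s for u, s in jumps.items() if u <= t)
```

With all scores equal to zero and no censoring, this reduces to the Nelson-Aalen estimator.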
The observed information matrix is adopted to obtain the variance estimate for (θ̂, Λ̂). For the numerical calculation of the observed information matrix, we consider Λs{Vi}, the jump size of Λs(t) at Vi belonging to stratum s with Δi = 1, instead of λs(Vi). That is, we work with Λ{·} = (Λ1{·}T,…,ΛS{·}T)T, where Λs{·} = (Λs{Ts1},…,Λs{Tsms})T for the ms failure times among the ns subjects (0 ≤ ms ≤ ns) of the s-th stratum, s = 1,…,S. Then, by the Louis (1982) formula, the observed information matrix is

I(θ̂, Λ̂{·}) = E[Bc(θ,Λ{·}; Y,V,b) | observations] − E[Uc(θ,Λ{·}; Y,V,b)Uc(θ,Λ{·}; Y,V,b)T | observations] + E[Uc(θ,Λ{·}; Y,V,b) | observations] E[Uc(θ,Λ{·}; Y,V,b) | observations]T,

evaluated at (θ̂, Λ̂{·}), where Uc(θ,Λ{·}; Y,V,b) and Bc(θ,Λ{·}; Y,V,b) are respectively the first-derivative vector and the negative second-derivative matrix of the complete-data log-likelihood lc(θ,Λ{·}; Y,V,b) with respect to (θ,Λ{·}). The variance of θ̂ is asymptotically equal to the corresponding sub-matrix of the inverse of the calculated observed information matrix. The variance of Λ̂(t) is obtained using the estimated variances and covariances corresponding to the jump sizes Λ{T} with T ≤ t at the observed failures, taken from the inverse of the observed information matrix. In the EM algorithm for variance estimation, we evaluate these conditional expectations only at the last iteration of the EM procedure for point estimation, where the conditional expectation of Uc is zero and the last term above vanishes.
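The Louis (1982) identity can be checked on a toy model where the observed information is known in closed form: for Y | b ~ N(θ + b, 1) with b ~ N(0, 1), the observed-data distribution is N(θ, 2), so the observed information per subject is 1/2 regardless of y. The sketch below (our toy example, not the paper's model) evaluates the three conditional expectations in the identity by Gauss-Hermite quadrature:

```python
import numpy as np

def louis_observed_info(y, theta, n_points=20):
    """Observed information for the toy model Y|b ~ N(theta + b, 1), b ~ N(0,1),
    via Louis's identity:
        I_obs = E[Bc|Y] - E[Uc^2|Y] + (E[Uc|Y])^2,
    where the complete-data score is Uc = y - theta - b and Bc = 1.
    Posterior expectations are computed by Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(n_points)
    b = np.sqrt(2.0) * x                    # nodes for b ~ N(0,1)
    logw = -0.5 * (y - theta - b) ** 2      # conditional log-likelihood of y given b
    wt = w * np.exp(logw - logw.max())
    wt /= wt.sum()                          # posterior weights of b given y
    Uc = y - theta - b
    E_U = np.sum(wt * Uc)
    E_U2 = np.sum(wt * Uc ** 2)
    E_B = 1.0                               # Bc is constant in this toy model
    return E_B - E_U2 + E_U ** 2
```

The complete-data information (1) minus the missing information (1/2 plus the score-mean correction) recovers the observed information 1/2 for any y and θ.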
3 Asymptotic Properties
To study the asymptotic properties of the proposed estimators θ̂n and Λ̂n(t) = (Λ̂1n(t),…,Λ̂Sn(t))T, we assume the following conditions.
(A1) The true parameter belongs to a known compact set Θ which lies in the interior of the domain for θ.
(A2) The true baseline hazard rate function λ0(t) = (λ10(t),…,λS0(t)) is bounded and positive in [0,τ].
(A3) For the censoring time Ci, P(Ci = τ | covariates) > 0 with probability one.
(A4) For the number of observed longitudinal measurements per subject ni, P(ni ≥ db) > 0, and P(ni ≤ n0) = 1 for some integer n0.
(A5) Both XTX and X̃TX̃ are full rank with positive probability. Moreover, if there exist constant vectors c1 and c2 such that, with positive probability, for any t, Z(t)c1 + Z̃(t)c2 = α0(t) for a deterministic function α0(t), then c1 = 0, c2 = 0, and α0(t) = 0.
Assumption (A3) means that, by the end of the study, some proportion of the subjects will still be alive and thus censored at the study end time τ, so the maximum right-censoring time equals τ. Assumption (A4) implies that some proportion of the subjects have at least db longitudinal observations, and that there exists an integer n0 such that P(ni ≤ n0) = 1. Consistency and the asymptotic distribution of the proposed estimators are summarized in the following two theorems. The proofs of Theorems 1 and 2 are given in Appendices B.1 and B.2, respectively; we present outlines of the proofs here.
Theorem 1
Under the assumptions (A1)~(A5), as n→∞, the maximum likelihood estimator (θ̂n, Λ̂n) is consistent under the product norm of the Euclidean distance and the supremum norm on [0,τ]. That is, |θ̂n − θ0| + Σs=1S supt∈[0,τ] |Λ̂sn(t) − Λs0(t)| → 0 a.s., where |·| denotes the Euclidean norm.
Consistency in Theorem 1 can be proved by verifying the following three steps. First, we show that the maximum likelihood estimate exists; this can be achieved by showing that the jump sizes Λ̂s{Vi}, with Δi = 1, are finite. Second, we show that, with probability one, Λ̂sn(τ), s = 1,…,S, are bounded as n → ∞; this can be proved by showing that the total jump size of Λ̂sn is bounded. Third, given that the second step holds, by Helly's selection theorem (van der Vaart, 1998), we can choose a subsequence of Λ̂sn that weakly converges to some right-continuous monotone function with probability one. For any subsequence, we can find a further subsequence, still denoted as (θ̂n, Λ̂n), such that (θ̂n, Λ̂n) → (θ*, Λ*). Using the empirical process formulation and the relevant Donsker properties together with parameter identifiability, we can show that θ* = θ0 and Λ* = Λ0. Based on these results, we can conclude that, with probability one, θ̂n converges to θ0 and Λ̂sn(t) converges to Λs0(t) in [0,τ], s = 1,…,S. Moreover, since Λs0(t) is continuous in [0,τ], the latter can be strengthened to uniform convergence; that is, supt∈[0,τ] |Λ̂sn(t) − Λs0(t)| → 0 almost surely.
Theorem 2
Under the assumptions (A1)~(A5), n1/2(θ̂n − θ0, Λ̂n − Λ0) weakly converges to a Gaussian random element in Rdθ × (l∞[0,τ])S, and the estimator θ̂n is asymptotically efficient, where dθ is the dimension of θ and l∞[0,τ] is the normed space containing all the bounded functions on [0,τ].
Once consistency is established, the conditions of Theorem 3.3.1 in van der Vaart and Wellner (1996), which imply the asymptotic normality in Theorem 2, can be verified via the tools of empirical processes. These conditions are restated in Theorem 4 of Parner (1998). The smoothness conditions in Theorem 4 of Parner (1998) can be verified using the regularity of the log-likelihood function in terms of the model parameters and the Donsker properties of the score operators. By Theorem 3.3.1 of van der Vaart and Wellner (1996), n1/2(θ̂n − θ0, Λ̂n − Λ0) weakly converges to a Gaussian process, and, by Proposition 3.3.1 in Bickel et al. (1993), θ̂n is an efficient estimator of θ0.
When the sample size increases, the number of event times also increases. However, our asymptotic theory is not based on classical parametric asymptotic theory; instead, it is based on modern empirical process theory, where the nuisance parameter is allowed to be infinite dimensional. Thus, the asymptotic properties of the proposed estimator are not affected by the growing number of event times.
4 Simulation Studies
In this section, we present some results from our simulation studies. Two sets of simulations with different generalized linear mixed models for the longitudinal outcomes are performed. Binary and Poisson data are considered for longitudinal process in the first and second sets of simulations, respectively.
4.1 Binary longitudinal outcomes and survival time
In this first set of simulations, we assume Yij to be the j-th binary outcome of the i-th subject following

P(Yij = 1 | bi) = exp(ηij)/{1 + exp(ηij)},   (4)

with ηij = Xijβ + bi = β0 + β1X1i + β2X2i + β3X3ij + bi for j = 1,…,ni, and consider the hazard of the i-th subject at time t to be

λi(t | bi) = λ(t) exp(γ1Z1i + γ2Z2i + ψbi),   (5)

where bi ~ N(0, σb²) and, for simplicity, one stratum of hazard for survival time is simulated.
X1i ≡ Z1i are generated from a Bernoulli distribution with success probability 0.5, and X2i ≡ Z2i are simulated from the uniform distribution between 0 and 1. They are included in the hazard and longitudinal models, the latter with a fixed-effect intercept. One additional covariate, X3ij, the time at measurement, is included in the longitudinal model. We suppose the longitudinal data are observed every 0.3 units of time, so X3ij takes values in increments of 0.3 ranging from 0 through 2.4. The longitudinal data are generated from the Bernoulli distribution with the success probability P(Yij = 1 | bi) given in (4), and the average number of longitudinal observations (ni) per subject is 3, with a range of 1 to 8. To generate the survival time, we first generate ui from the Uniform(0,1) distribution. For a given constant hazard λ, the survival time is then generated by ti = −log(ui) × exp{−(ψbi + γ1Z1i + γ2Z2i)}/λ. Censoring time is generated from the uniform distribution between 0.4 and 2.4 so that the censoring proportion is around 25~35%. The observed survival time is the minimum of the generated survival and censoring times. For the comparison of the estimated baseline cumulative hazards over simulations, we consider three time points, 0.9, 1.4, and 1.9, which correspond to the quartiles of the true survival distribution. These three time points are not the only distinct survival times; they are selected for reporting the estimated cumulative hazard function.
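The generation scheme above can be sketched as follows (an illustrative reimplementation, not the paper's R code; we assume a constant baseline hazard λ, that γ1 multiplies Z1i in the survival-time generator, and, for simplicity, draw the longitudinal outcomes at all nine scheduled times without truncating at the observed survival time):

```python
import numpy as np

rng = np.random.default_rng(2024)

def simulate_subject(beta, gamma, psi, sigma_b2, lam=1.0):
    """One subject under the Section 4.1 design (a sketch)."""
    x1 = rng.binomial(1, 0.5)                 # X1i = Z1i ~ Bernoulli(0.5)
    x2 = rng.uniform()                        # X2i = Z2i ~ Uniform(0,1)
    b = rng.normal(0.0, np.sqrt(sigma_b2))    # random effect b_i
    # inverse-transform survival time for constant hazard lam*exp(linear predictor)
    u = rng.uniform()
    t = -np.log(u) * np.exp(-(psi * b + gamma[0] * x1 + gamma[1] * x2)) / lam
    c = rng.uniform(0.4, 2.4)                 # censoring time
    v, delta = min(t, c), int(t <= c)         # observed time and event indicator
    # binary longitudinal outcomes scheduled every 0.3 time units in [0, 2.4]
    obs_times = np.arange(0.0, 2.4 + 1e-9, 0.3)
    eta = beta[0] + beta[1] * x1 + beta[2] * x2 + beta[3] * obs_times + b
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    return y, obs_times, v, delta

y, times, v, delta = simulate_subject(
    beta=[-1.0, 1.0, -0.5, -0.5], gamma=[-1.0, 1.0], psi=-1.0, sigma_b2=0.5)
```

In the paper's design, only the measurements taken before the observed survival time would be retained, which yields the reported 1 to 8 observations per subject.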
We consider ψ values of −1, 0, and 1 for negative, zero, and positive dependency between the longitudinal process and the survival time model, respectively. The parameters in the two models are chosen as β0 = −1, β1 = 1, β2 = −0.5, β3 = −0.5, σb² = 0.5, ψ = −1, 0, or 1, γ1 = −1, γ2 = 1, and λ(t) = 1. Different sample sizes (n = 200, 400) are simulated with 1000 replications. The results of the maximum likelihood estimates for θ and the baseline cumulative hazards at the three time points, with their respective standard error estimates, are reported in Table 1. The simulation study is conducted using R.
Table 1.
Summary of simulation results of maximum likelihood estimation for binary longitudinal outcomes and survival time.
| ψ | Par. | TRUE | Est. (n=200) | SSD | ESE | CP | Est. (n=400) | SSD | ESE | CP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| −1 | β0 | −1 | −1.019 | .271 | .273 | .953 | −1.002 | .189 | .192 | .960 |
| | β1 | 1 | 1.018 | .235 | .241 | .962 | 1.000 | .169 | .169 | .951 |
| | β2 | −.5 | −.494 | .413 | .396 | .942 | −.498 | .275 | .277 | .946 |
| | β3 | −.5 | −.469 | .223 | .221 | .947 | −.484 | .158 | .154 | .946 |
| | σb² | .5 | .518 | .216 | .270 | .966 | .519 | .164 | .190 | .955 |
| | ψ | −1 | −.967 | .471 | .584 | .923 | −.988 | .364 | .420 | .921 |
| | γ1 | −1 | −1.002 | .242 | .254 | .961 | −1.002 | .177 | .183 | .960 |
| | γ2 | 1 | 1.001 | .379 | .393 | .962 | .995 | .278 | .279 | .953 |
| | Λ(.9) | .9 | .922 | .228 | .227 | .959 | .909 | .167 | .157 | .943 |
| | Λ(1.4) | 1.4 | 1.442 | .397 | .389 | .944 | 1.421 | .283 | .269 | .952 |
| | Λ(1.9) | 1.9 | 1.956 | .594 | .600 | .953 | 1.950 | .456 | .426 | .949 |
| 0 | β0 | −1 | −1.016 | .281 | .276 | .957 | −1.004 | .191 | .193 | .956 |
| | β1 | 1 | 1.003 | .258 | .247 | .936 | .994 | .170 | .173 | .957 |
| | β2 | −.5 | −.476 | .414 | .401 | .941 | −.485 | .275 | .281 | .959 |
| | β3 | −.5 | −.496 | .235 | .238 | .957 | −.498 | .169 | .167 | .948 |
| | σb² | .5 | .500 | .233 | .287 | .957 | .497 | .181 | .200 | .952 |
| | ψ | 0 | .021 | .331 | .378 | .996 | .001 | .235 | .244 | .990 |
| | γ1 | −1 | −1.031 | .197 | .191 | .949 | −1.014 | .133 | .131 | .953 |
| | γ2 | 1 | 1.036 | .314 | .315 | .953 | 1.015 | .223 | .218 | .944 |
| | Λ(.9) | .9 | .912 | .189 | .187 | .952 | .906 | .134 | .129 | .942 |
| | Λ(1.4) | 1.4 | 1.450 | .315 | .307 | .958 | 1.418 | .215 | .207 | .950 |
| | Λ(1.9) | 1.9 | 1.990 | .485 | .464 | .948 | 1.935 | .322 | .308 | .948 |
| 1 | β0 | −1 | −1.014 | .273 | .284 | .952 | −1.007 | .200 | .198 | .955 |
| | β1 | 1 | 1.017 | .252 | .251 | .956 | 1.011 | .176 | .176 | .952 |
| | β2 | −.5 | −.518 | .428 | .412 | .953 | −.512 | .287 | .287 | .950 |
| | β3 | −.5 | −.540 | .245 | .248 | .954 | −.520 | .176 | .174 | .946 |
| | σb² | .5 | .543 | .250 | .300 | .947 | .524 | .179 | .209 | .966 |
| | ψ | 1 | .956 | .488 | .609 | .898 | .992 | .366 | .450 | .930 |
| | γ1 | −1 | −1.000 | .264 | .255 | .945 | −.998 | .176 | .183 | .953 |
| | γ2 | 1 | 1.009 | .381 | .395 | .961 | .990 | .283 | .280 | .953 |
| | Λ(.9) | .9 | .921 | .235 | .228 | .961 | .918 | .166 | .159 | .937 |
| | Λ(1.4) | 1.4 | 1.443 | .412 | .393 | .963 | 1.430 | .275 | .271 | .956 |
| | Λ(1.9) | 1.9 | 1.976 | .677 | .620 | .957 | 1.946 | .416 | .424 | .959 |
In Table 1, "TRUE" gives the true values of the parameters; the averages of the maximum likelihood estimates from the EM algorithm are in "Est."; the sample standard deviations from the 1000 simulations are reported in "SSD"; "ESE" is the average of the 1000 standard error estimates based on the observed information matrix; and "CP" is the coverage proportion of the 95% confidence intervals based on the estimated standard error "ESE". The Satterthwaite (1946) method is used for the coverage probability of σb².
From Table 1, we can see that even for the smaller sample size (n=200), the bias of the estimates from the EM algorithm is negligible in most cases. The estimated standard errors calculated from the observed information matrix are close to the sample standard deviations of the 1000 estimates, and the 95% confidence interval coverage rates are close to 0.95 except those for ψ. However, the coverage rate for ψ improves for larger sample sizes: additional simulations we conducted with a sample size of 800 gave coverage rates of 95.5%, 95.9% and 95.9% for ψ = −1, 0 and 1, respectively. In addition, the simulations show that the variances of the estimators decrease as the sample size (n) increases. We can also see that the estimates are fairly robust and close to the true values for all the different ψ values.
4.2 Poisson longitudinal outcomes and survival time
In the second set of simulations, we assume Yij to follow a Poisson distribution,

P(Yij = y | bi) = μijy exp(−μij)/y!,  y = 0, 1, 2, …,  with log(μij) = ηij,

where ηij is defined as in Section 4.1. We also consider the same hazards model and simulation setting as in Section 4.1, except that σb² = 0.2. The simulated Poisson longitudinal outcomes range from 0 to 7 with an average of 0.5.
Table 2 shows that, overall, the estimates perform well even for the smaller sample size n = 200, with small biases except for ψ. We conducted additional simulations with sample sizes of 800 and 1000, and the bias for ψ decreases as the sample size increases. The estimated standard errors using the observed information matrix are close to the sample standard deviations, and the 95% confidence interval coverage rates are close to 0.95 except for σb² and for ψ when ψ = 0.
Table 2.
Summary of simulation results of maximum likelihood estimation for Poisson longitudinal outcomes and survival time.
| ψ | Par. | TRUE | Est. (n=200) | SSD | ESE | CP | Est. (n=400) | SSD | ESE | CP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| −1 | β0 | −1 | −.995 | .203 | .196 | .940 | −1.005 | .138 | .138 | .948 |
| | β1 | 1 | .998 | .178 | .171 | .942 | 1.007 | .119 | .121 | .945 |
| | β2 | −.5 | −.510 | .273 | .264 | .946 | −.502 | .192 | .186 | .935 |
| | β3 | −.5 | −.489 | .150 | .151 | .950 | −.492 | .107 | .106 | .949 |
| | σb² | .2 | .196 | .074 | .092 | .983 | .200 | .055 | .065 | .976 |
| | ψ | −1 | −1.038 | .716 | .771 | .947 | −1.003 | .503 | .514 | .934 |
| | γ1 | −1 | −1.025 | .227 | .228 | .969 | −1.014 | .160 | .158 | .954 |
| | γ2 | 1 | 1.034 | .371 | .358 | .947 | 1.009 | .248 | .248 | .959 |
| | Λ(.9) | .9 | .918 | .222 | .208 | .943 | .909 | .141 | .142 | .944 |
| | Λ(1.4) | 1.4 | 1.456 | .403 | .357 | .943 | 1.424 | .235 | .237 | .953 |
| | Λ(1.9) | 1.9 | 1.999 | .632 | .560 | .953 | 1.948 | .368 | .366 | .961 |
| 0 | β0 | −1 | −1.007 | .202 | .199 | .943 | −1.003 | .137 | .140 | .957 |
| | β1 | 1 | 1.010 | .184 | .175 | .932 | .998 | .120 | .124 | .951 |
| | β2 | −.5 | −.513 | .283 | .268 | .937 | −.505 | .193 | .189 | .942 |
| | β3 | −.5 | −.500 | .164 | .161 | .951 | −.489 | .114 | .114 | .950 |
| | σb² | .2 | .199 | .074 | .097 | .981 | .206 | .055 | .070 | .973 |
| | ψ | 0 | .006 | .571 | .610 | .993 | .011 | .391 | .387 | .978 |
| | γ1 | −1 | −1.039 | .188 | .194 | .958 | −1.015 | .128 | .132 | .953 |
| | γ2 | 1 | 1.021 | .326 | .319 | .948 | 1.003 | .226 | .219 | .944 |
| | Λ(.9) | .9 | .917 | .191 | .188 | .952 | .909 | .129 | .130 | .950 |
| | Λ(1.4) | 1.4 | 1.448 | .313 | .308 | .953 | 1.432 | .206 | .210 | .950 |
| | Λ(1.9) | 1.9 | 2.006 | .478 | .473 | .954 | 1.966 | .312 | .316 | .950 |
| 1 | β0 | −1 | −1.014 | .195 | .202 | .952 | −1.004 | .138 | .142 | .954 |
| | β1 | 1 | 1.014 | .180 | .178 | .954 | 1.008 | .126 | .125 | .951 |
| | β2 | −.5 | −.512 | .273 | .271 | .947 | −.511 | .190 | .191 | .945 |
| | β3 | −.5 | −.514 | .174 | .172 | .943 | −.509 | .123 | .122 | .959 |
| | σb² | .2 | .201 | .083 | .098 | .967 | .204 | .060 | .070 | .960 |
| | ψ | 1 | .993 | .664 | .768 | .942 | 1.008 | .477 | .512 | .942 |
| | γ1 | −1 | −1.030 | .230 | .224 | .952 | −1.003 | .158 | .157 | .950 |
| | γ2 | 1 | 1.014 | .363 | .354 | .949 | 1.006 | .252 | .247 | .942 |
| | Λ(.9) | .9 | .925 | .22 | .207 | .949 | .91 | .144 | .142 | .948 |
| | Λ(1.4) | 1.4 | 1.46 | .389 | .351 | .95 | 1.435 | .246 | .237 | .942 |
| | Λ(1.9) | 1.9 | 2.018 | .639 | .554 | .957 | 1.957 | .373 | .365 | .947 |
From Table 2, σb² is seemingly underestimated, with higher-than-nominal coverage rates, but the coverage rate improves for larger sample sizes. This implies that the variance of σ̂b² may not be estimated well with a small sample size for the Poisson longitudinal distribution. Meanwhile, the test for σb² is conservative with a small sample size, but the type I error becomes closer to the nominal level as the sample size increases. Profile likelihood may be an alternative estimation approach for σb². The 95% confidence interval coverage for ψ = 0 also appears to be higher than the nominal level, but an additional simulation with a sample size of 800 shows that the coverage rate reduces to the 95% nominal level. Table 2 also shows that the variances of the estimators decrease for larger sample sizes, and the estimates are fairly robust and close to the true values for all three ψ values.
In all the simulations, the EM algorithm converged within 60 iterations. The CPU time used for the 1000 data sets in the simulation studies was about 6 hours for sample size 200 (averaging 20 seconds per data set) and 15 hours for sample size 400 (averaging about 1 minute per data set) on a computer with a 64-bit operating system.
5 Analysis of the CHANCE Study
The Carolina Head and Neck Cancer Study (CHANCE) is a population-based epidemiologic study conducted at 60 hospitals in 46 counties in North Carolina from 2002 through 2006 (Divaris et al. 2010). Patients were diagnosed with head and neck cancer (oral, pharyngeal, and laryngeal cancer) during 2002–2006. Their survival status was collected up to 2007, and QoL was evaluated over time for three years after diagnosis. QoL information was collected through questionnaires. Based on summary scores of the five domains of self-perceived quality of life, namely Physical Well-Being (PWB), Social/Family Well-Being (SWB), Emotional Well-Being (EWB), Functional Well-Being (FWB), and Head and Neck Cancer Specific symptoms (HNCS), each patient's QoL was classified as satisfaction or dissatisfaction with life. Survival time is defined as the time from diagnosis to death. Demographic and lifestyle characteristics, medical histories, and clinical factors were also collected. By December 2009, QoL information had been obtained from the 554 head and neck cancer patients included in the analysis. Based on the death information through 2007 available from the National Death Index (NDI), 85 of the 554 patients died, so the censoring rate is about 85%. The number of observations per patient ranges from 1 to 3 with an average of 1.93, which may look sparse. However, even though the number of longitudinal measurements per subject is small, our estimation pools information from all the subjects, so the total number of measurements used for estimation is not sparse. It is of interest to identify the variables that are associated with both QoL satisfaction and survival time for patients with head and neck cancer. In particular, we are interested in the comparison between African-Americans and Whites, since African-Americans are known to have a higher incidence of head and neck cancer and worse survival than Whites.
The longitudinal QoL satisfaction outcomes and survival time are correlated within a patient, and this dependency should be taken into account in the analysis.
We apply our proposed method to the Head and Neck Cancer Specific symptoms (HNCS) domain of QoL together with survival time. The longitudinal HNCS QoL outcomes are binary, coded 1 ("satisfied") and 0 ("dissatisfied"). We are interested in investigating which factors are related to QoL satisfaction and the risk of death. In the full models for both longitudinal QoL and survival time, we consider race (African-Americans, Whites), the number of 12 oz. beers consumed per week (None, <1, 1–4, 5–14, 15–29, ≥ 30), household income (0–10K, 20–30K, 40–50K, ≥ 60K), surgery (Yes/No), radiation therapy (Yes/No), chemotherapy (Yes/No), primary tumor site (Oral & Pharyngeal, Laryngeal), and tumor stage (I, II, III, IV) as categorical variables, and age at diagnosis (range: 24–80), the number of persons supported by household income (range: 1–5), body mass index (BMI) (range: 15.66–56.28), and the total number of medical conditions reported (range: 0–6) as continuous variables. Additionally, two interactions with race, i.e., race × the total number of medical conditions reported and race × tumor site, are included in both models since we are particularly interested in the differences in QoL and survival between African-Americans and Whites. Time at survey measurement is also included as a covariate for the longitudinal outcomes. A random intercept accounting for the dependence between QoL satisfaction and the risk of death is included in both models and is assumed to follow a normal distribution with mean zero. In addition to the simultaneous analysis, we also conduct separate analyses, fitting the generalized linear mixed model to the longitudinal QoL outcomes and the Cox proportional hazards model to the survival times, and compare the results with those from our proposed simultaneous method.
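To make the shared random-intercept structure concrete, the following sketch simulates one subject under a simplified version of the joint model: a single random intercept b enters the logistic model for the binary QoL outcomes and, scaled by ψ, the hazard of death. The function name, the constant baseline hazard, and the omission of the time covariate are our illustrative assumptions, not the fitted CHANCE model.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_subject(x, beta, gamma, psi, sigma_b, times, base_rate=0.1):
    # shared random intercept links the two submodels
    b = rng.normal(0.0, sigma_b)
    # longitudinal binary outcomes: logit P(Y_ij = 1) = x'beta + b
    eta = x @ beta + b
    p = 1.0 / (1.0 + np.exp(-eta))
    y = rng.binomial(1, p, size=len(times))
    # survival: hazard = base_rate * exp(x'gamma + psi * b),
    # i.e. exponential survival time given the random effect
    rate = base_rate * np.exp(x @ gamma + psi * b)
    t = rng.exponential(1.0 / rate)
    return y, t
```

With ψ < 0, a latent factor that raises the satisfaction probability simultaneously lowers the hazard, which is the dependence structure described above.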
After fitting the simultaneous models with all the covariates, we use backward variable selection based on the likelihood ratio test (LRT) and find that surgery, chemotherapy, tumor site, age at diagnosis, and both interactions are not statistically significant at the 0.05 level in either the model for HNCS QoL satisfaction or the model for survival time. We remove these variables and refit the simultaneous models. The LRT then shows that race, radiation therapy, the number of persons supported by household income, BMI, and the total number of medical conditions reported are not statistically significant for the risk of death. We further reduce the models by removing these variables from the hazards model and refit the reduced simultaneous models.
Table 3 gives the results from these final models. From the "Simultaneous" columns, we see that the number of 12 oz. beers consumed per week, household income, tumor stage, and the total number of medical conditions reported are significantly associated with both patients' HNCS QoL satisfaction and the hazard of death. With 30 or more 12 oz. beers consumed per week as the reference group, all categories of smaller consumption are associated with higher HNCS QoL satisfaction and a lower risk of death. Higher household income is in general associated with higher HNCS QoL satisfaction and a lower risk of death. Both HNCS QoL satisfaction and the risk of death differ significantly across tumor stages, and patients with a greater number of reported medical conditions have lower HNCS QoL satisfaction and a higher risk of death. For instance, with log-scale estimates of 1.060 and −1.076 for HNCS QoL satisfaction and death respectively, patients who consumed 5 to 14 12 oz. beers per week have 2.886 times the odds of HNCS QoL satisfaction and 0.341 times the hazard of death compared with those who consumed 30 or more, after adjusting for the other covariates in the model. Looking at the number of medical conditions reported, each additional medical condition reported decreases the odds of HNCS QoL satisfaction by 16% and increases the hazard of death by 29%. Meanwhile, race (African-American), radiation therapy, the number of persons supported by household income, and BMI are selected only in the HNCS QoL longitudinal model. African-Americans, patients not treated with radiation therapy, patients in families with fewer persons supported by household income, and patients with higher BMI are more likely to be satisfied with longitudinal HNCS QoL, while the risk of death is not affected by these factors.
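The odds and hazard ratios quoted above follow directly from exponentiating the log-scale estimates in Table 3; a quick arithmetic check (variable names are ours):

```python
import math

# log-scale estimates from Table 3 (simultaneous analysis)
log_or_beer_5to14 = 1.060    # beta_5: HNCS QoL model, "5 to 14" beers
log_hr_beer_5to14 = -1.076   # gamma_4: hazards model, "5 to 14" beers
log_or_conditions = -0.175   # beta_16: per additional medical condition
log_hr_conditions = 0.256    # gamma_12: per additional medical condition

print(round(math.exp(log_or_beer_5to14), 3))           # 2.886 times the odds of satisfaction
print(round(math.exp(log_hr_beer_5to14), 3))           # 0.341 times the hazard of death
print(round(100 * (1 - math.exp(log_or_conditions))))  # odds decreased by 16%
print(round(100 * (math.exp(log_hr_conditions) - 1)))  # hazard increased by 29%
```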
Furthermore, we also find that time at survey measurement is statistically significant in the HNCS QoL longitudinal model, implying that patients become more satisfied over time. The parameter ψ for the dependence between longitudinal HNCS QoL and survival time is negative, with p-value 0.131. This suggests that the longitudinal HNCS QoL and survival time are marginally correlated, and that some latent factors which increase HNCS QoL satisfaction also decrease the risk of death.
Table 3.
Analyses results for the HNCS and survival time of the CHANCE study
| | | Simultaneous | | | Separate | | |
|---|---|---|---|---|---|---|---|
| Parameter | | Est. | ESE | P-value | Est. | ESE | P-value |
< HNCS QoL longitudinal model > | |||||||
Intercept | β 0 | .744 | .538 | .167 | 1.190 | .390 | .002 |
Race (ref= White) | |||||||
– African American | β 1 | .564 | .229 | .014 | .511 | .256 | .047 |
# of 12 oz. beers consumed per week (ref=30 or more) | |||||||
– None | β 2 | .636 | .269 | .018 | .622 | .300 | .038 |
– less than 1 | β 3 | .830 | .357 | .020 | .735 | .396 | .064 |
– 1 to 4 | β 4 | 1.302 | .294 | <.001 | 1.268 | .326 | <.001 |
– 5 to 14 | β 5 | 1.060 | .251 | <.001 | 1.018 | .279 | <.001 |
– 15 to 29 | β 6 | .601 | .289 | .037 | .547 | .327 | .095 |
Household income (ref= level1: 0–10K) | |||||||
– level2: 20–30K | β 7 | −.271 | .231 | .241 | −.328 | .258 | .204 |
– level3: 40–50K | β 8 | .297 | .255 | .245 | .250 | .282 | .376 |
– level4: ≥ 60K | β 9 | 1.199 | .274 | <.001 | 1.045 | .286 | <.001 |
Radiation therapy (ref= No) | |||||||
– Yes | β 10 | −1.132 | .260 | <.001 | −1.048 | .280 | <.001 |
Tumor stage (ref= I) | |||||||
– II | β 11 | −.416 | .300 | .166 | −.352 | .330 | .286 |
– III | β 12 | −1.335 | .284 | <.001 | −1.198 | .314 | <.001 |
– IV | β 13 | −1.175 | .254 | <.001 | −1.057 | .277 | <.001 |
# of persons supported by household income | β 14 | −.189 | .084 | .025 | |||
BMI | β 15 | .041 | .015 | .007 | |||
Total # of medical conditions reported | β 16 | −.175 | .080 | .030 | |||
Time at survey measurement (years) | β 17 | .241 | .066 | <.001 | .254 | .067 | <.001 |
variance of random effects | | .303 | .173 | .013 | 1.169 | .257 | |
< Hazards model > | |||||||
Random effect coefficient | ψ | −1.427 | .946 | .131 | |||
# of 12 oz. beers consumed per week (ref=30 or more) | |||||||
– None | γ 1 | −.772 | .386 | .045 | |||
– less than 1 | γ 2 | −.155 | .426 | .715 | |||
– 1 to 4 | γ 3 | −.802 | .414 | .053 | |||
– 5 to 14 | γ 4 | −1.076 | .383 | .005 | |||
– 15 to 29 | γ 5 | −.591 | .399 | .139 | |||
Household income (ref= level1: 0–10K) | |||||||
– level2: 20–30K | γ 6 | −.218 | .294 | .459 | −.219 | .263 | .406 |
– level3: 40–50K | γ 7 | −.941 | .371 | .011 | −.928 | .331 | .005 |
– level4: ≥ 60K | γ 8 | −1.463 | .401 | <.001 | −1.393 | .358 | <.001 |
Tumor stage (ref= I) | |||||||
– II | γ 9 | −.199 | .465 | .668 | −.295 | .435 | .498 |
– III | γ 10 | .235 | .433 | .588 | .136 | .389 | .727 |
– IV | γ 11 | 1.059 | .360 | .003 | .914 | .295 | .002 |
Total # of medical conditions reported | γ 12 | .256 | .110 | .020 | .205 | .091 | .025 |
P-value for testing the variance of the random effects being zero is based on a 50:50 mixture of a point mass at 0 and a χ2 distribution with 1 degree of freedom.
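Under equal mixing probabilities, the p-value for a positive LRT statistic is half of the χ²₁ tail probability. A small stdlib sketch (the helper name is ours, not the paper's):

```python
import math

def mixture_pvalue(lrt_stat):
    # LRT for a variance component on the boundary of the parameter space:
    # the null distribution is a 50:50 mixture of a point mass at 0 and a
    # chi-square with 1 df, so a positive statistic gets half the chi2_1 tail
    if lrt_stat <= 0:
        return 1.0
    # chi2_1 survival function via the Gaussian tail: P(chi2_1 > t) = erfc(sqrt(t/2))
    return 0.5 * math.erfc(math.sqrt(lrt_stat / 2.0))
```

For example, the usual 3.84 cutoff of the χ²₁ test corresponds to a mixture p-value of about 0.025, so the naive (unmixed) test would be conservative.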
For the purpose of comparison, we conducted separate analyses for longitudinal HNCS QoL and survival time; the final results are given in the last three columns of Table 3. The generalized linear mixed model (GLMM) and the Cox proportional hazards model are used for the longitudinal outcomes and survival time, respectively. The GLMM also accommodates individual heterogeneity through subject-specific random effects, although it does not incorporate the correlation between the longitudinal outcomes and survival time. Comparing the simultaneous and separate analyses in Table 3, we can see that the simultaneous analysis additionally identifies the number of persons supported by household income, BMI, and the total number of medical conditions reported in the HNCS QoL longitudinal model (p-values = 0.025, 0.007, and 0.030, respectively) and the number of 12 oz. beers consumed per week in the hazards model (p-values = 0.045 and 0.005 for 'None' and '5 to 14') as significant, while they are not statistically significant in the separate analyses.
Figure 1 shows the estimated baseline cumulative hazard rates over follow-up time with 95% confidence intervals. Since the baseline cumulative hazard rates are bounded below by 0, we first log-transformed the estimated baseline cumulative hazard rates, obtained the 95% lower and upper bounds on the log scale, and then transformed them back to the original scale. The estimated baseline cumulative hazard rates appear flat within the first year but soon increase roughly linearly. Figure 2 shows the Kaplan-Meier estimates (solid line) and the predicted survival probabilities based on the simultaneous analysis (dashed line). The two survival curves are very close to each other, which suggests that our proposed method fits the data well.
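The log-scale construction described above amounts to a delta-method interval: se(log Λ̂) ≈ se(Λ̂)/Λ̂, so the bounds are symmetric on the log scale and guaranteed positive after back-transformation. A minimal sketch (the function name and inputs are illustrative, not the CHANCE estimates):

```python
import numpy as np

def baseline_ci(lam_hat, se_lam, z=1.96):
    # delta method: Var(log L) ~ Var(L) / L^2, so se(log L) = se(L) / L
    se_log = se_lam / lam_hat
    # symmetric 95% interval on the log scale, exponentiated back
    return lam_hat * np.exp(-z * se_log), lam_hat * np.exp(z * se_log)
```

Both bounds stay positive even when the naive interval lam_hat ± 1.96·se_lam would cross zero.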
Fig. 1.
Estimated baseline cumulative hazards (solid line) with 95% confidence interval (dashed lines) by the simultaneous analysis of HNCS QoL longitudinal outcome and survival time
Fig. 2.
Kaplan-Meier estimates (solid line) and the predicted survival probabilities based on the simultaneous analysis of HNCS QoL longitudinal outcome and survival time (dashed line)
6 Concluding Remarks
We have proposed a method for the simultaneous modeling of longitudinal outcomes, either categorical or continuous, with a generalized linear mixed model and survival time with a stratified multiplicative proportional hazards model linked through random effects. We have also developed a maximum likelihood estimation method for the proposed simultaneous model and presented the asymptotic properties of the proposed estimators. The estimation procedure using the EM algorithm has been assessed via simulation studies: the proposed estimators performed well in finite samples, and the variance estimates based on the observed information matrix approximated the true variances well. The proposed method was applied to data from the CHANCE study.
When the dimension of the random effects is high, the computational burden increases due to high-dimensional Gauss-Hermite quadrature (GQ) integration, and convergence could be slow. To handle such situations, alternative numerical methods such as adaptive quadrature or Markov chain Monte Carlo (MCMC) may be useful.
A stratified Cox proportional hazards model is considered for the survival data, with each stratum having its own unspecified baseline hazard. Even though more strata introduce more baseline hazard functions, the number of parameters associated with the jumps of the cumulative hazard functions remains equal to the total number of observed failures. Therefore, additional strata do not increase the computational complexity, while stratification provides a more flexible and robust structure when we believe that the survival experiences of certain groups are very different and the hazards are not proportional over time.
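The parameter-counting argument can be illustrated with a small sketch (a hypothetical helper, not from the paper): each stratum-specific baseline cumulative hazard jumps only at the observed failure times within that stratum, so the total number of jump-size parameters equals the total number of failures, however the strata are formed.

```python
def baseline_jump_times(times, deltas, strata):
    # collect, per stratum, the observed failure times (delta == 1);
    # these are the only points where the stratum's baseline cumulative
    # hazard jumps, so total jump-size parameters == total failures
    jumps = {}
    for t, d, s in zip(times, deltas, strata):
        if d == 1:
            jumps.setdefault(s, []).append(t)
    return {s: sorted(ts) for s, ts in jumps.items()}
```

Splitting one stratum into two only redistributes the same failure times between two baseline functions; it does not create new jump parameters.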
In our proposed method, all the information on survival, longitudinal outcomes, and covariates is used. As a result, the parameter estimates can be more efficient. The proposed model also generalizes previous work to general longitudinal outcomes. Future work includes relaxing the normality assumption for the random effects, considering generalization to mixed types of longitudinal outcomes, and improving computational efficiency.
Acknowledgements
The authors thank the editor, the associate editor, and two referees of Statistics in Biosciences for their valuable suggestions, which considerably improved this article.
Appendix A. EM Algorithms
A.1. EM algorithm – Binary longitudinal data and survival time
(1) E-step
For binary longitudinal outcomes and survival time, we calculate the conditional expectation of q(bi) for subject i with Si = s, given the observations and the current estimate (θ(k),Λs(k)), for some known function q(·). The conditional expectation, denoted by E[q(bi)|θ(k),Λs(k)], can be expressed as follows: given the current estimate (θ(k),Λs(k)),
(6)
where
(7)
Σb(k)1/2 is the unique non-negative square root of Σb(k) (i.e., Σb(k)1/2Σb(k)1/2 = Σb(k)), and zG follows a multivariate Gaussian distribution with mean zero.
(2) M-step
Since the parameter ϕ is set to 1 for the logistic distribution, we estimate only β in the longitudinal process. β(k+1) solves the conditional expectation of the complete-data log-likelihood score equation using a one-step Newton-Raphson iteration. The remaining estimators have the same expressions as in Section 2.2.
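As a numerical illustration of such E-step expectations, the ratio E[q(b)L(Y|b)] / E[L(Y|b)] under a normal random effect can be approximated by Gauss-Hermite quadrature as a ratio of two weighted sums. The sketch below assumes a single random intercept with a logistic link and, for brevity, omits the survival-likelihood contribution that the paper's E-step includes; all names are ours.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def e_step_expectation(q, y, X, beta, sigma_b, n_nodes=30):
    # approximate E[q(b_i) | Y_i] for a random-intercept logistic model;
    # the change of variables b = sqrt(2) * sigma_b * z absorbs the
    # N(0, sigma_b^2) density into the Gauss-Hermite weight exp(-z^2)
    nodes, weights = hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma_b * nodes                        # abscissas on the b scale
    eta = X @ beta                                            # fixed-effect linear predictor, shape (n_i,)
    p = 1.0 / (1.0 + np.exp(-(eta[None, :] + b[:, None])))    # P(Y_ij = 1 | b), shape (n_nodes, n_i)
    lik = np.prod(np.where(y[None, :] == 1, p, 1.0 - p), axis=1)  # conditional likelihood at each node
    return np.sum(weights * lik * q(b)) / np.sum(weights * lik)
```

The common normalizing constant of the normal density cancels in the ratio, so only the weighted conditional likelihoods are needed.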
A.2. EM algorithm – Poisson longitudinal data and survival time
(1) E-step
For Poisson longitudinal outcomes and survival time, given the current estimate (θ(k),Λs(k)), the conditional expectation E[q(bi)|θ(k),Λs(k)] can be expressed as in (6), with R(zG) defined as in (7), and zG follows a multivariate Gaussian distribution with mean zero.
(2) M-step
Since the parameter ϕ is set to 1 for the Poisson distribution, we estimate only β in the longitudinal process. β(k+1) solves the conditional expectation of the complete-data log-likelihood score equation using a one-step Newton-Raphson iteration. The remaining estimators have the same expressions as in Section 2.2.
Appendix B. Proofs for Theorems
In Appendices B.1 and B.2, we sketch the proofs of Theorem 1 and Theorem 2. All detailed technical proofs are available from the authors. From (3) in Section 2.2, we have the observed log-likelihood function lf(θ,Λ;Y,V) = log{Lf(θ,Λ;Y,V)}. We obtain and use the following modified objective function ln(θ,Λ), given by replacing λs(Vi) with Λs{Vi} in lf(θ,Λ;Y,V), where Λs{Vi} is the jump size of Λs(t) at the observed time Vi with Δi = 1,
(8)
and the maximum likelihood estimator maximizes ln(θ,Λ) over a space in which each Λs is a right-continuous step function. For the proofs of both Theorem 1 and Theorem 2, this modified objective function is used in place of the observed log-likelihood function.
B.1. Proof of consistency – Theorem 1
Consistency can be proved by verifying the following three steps. First, we show that the maximum likelihood estimate exists. Second, we show that, with probability one, Λ̂sn(τ), s = 1,…,S, are bounded as n → ∞. Third, if the second step is true, then by Helly's selection theorem (van der Vaart 1998), we can choose a subsequence of Λ̂sn that weakly converges to some right-continuous monotone function Λs* with probability one. For any subsequence, we can find a further subsequence, still denoted by (θ̂n, Λ̂sn), such that θ̂n converges to some θ*. Thus, in the third step, we show θ* = θ0 and Λs* = Λs0. Once the three steps are completed, we can conclude that, with probability one, θ̂n converges to θ0 and Λ̂sn(t) converges to Λs0(t) in [0,τ], s = 1,…,S. Moreover, since Λs0(t) is right-continuous in [0,τ], the latter convergence can be strengthened to uniform convergence, that is, supt∈[0,τ]|Λ̂sn(t) − Λs0(t)| → 0 a.s. This completes the proof of Theorem 1.
In the first step, since θ belongs to a compact set Θ by Assumption (A1), it is sufficient to show that Λs{Vi} with Δi = 1 is finite. For each subject i with Δi = 1, after simple algebra, we have from (8) that
if Λs{Vi} → ∞ for some i with Δi = 1, then ln(θ,Λ) → −∞, which contradicts the boundedness of ln(θ,Λ). Therefore, we conclude that Λs{·} must be finite. By this conclusion and Assumption (A1), the maximum likelihood estimate exists.
In the second step, we define and rescale by the factor . Then, we let denote the rescaled function; that is, . thus, . To prove this second step, it is sufficient to show is bounded. After some algebra in (8), we obtain that, for any ,
where , and . Thus, since where , it follows that
(9)
According to the assumption (A2), there exist some positive constants C1,C2 and C3 such that . By denoting b0 as a vector of variables following a standard multivariate normal distribution, from concavity of the logarithm function, in the third term of (9),
where C4 and C5 are positive constants. Since it is easily verified that , by the strong law of large numbers and assumption (A4), the third term of (9) can be bounded by a constant C6. Then, (9) becomes
(10)
where C7 is a constant. On the other hand, since, for any Γ > 0 and x > 0, Γ log(1+x/Γ) ≤ Γ(x/Γ) = x, we have that e−x ≤ (1+x/Γ)−Γ. Therefore, with , (10) gives that
(11)
where C8(Γ) is a deterministic function of Γ. For the s-th stratum, (11) is that
By the strong law of large numbers, . Then, we can choose Γ large enough such that . Thus, we obtain . In other words,
If we denote Bs0 = exp{2(C7+C8(Γ))/(ΓP(Vi=τ,Si=s))}, we conclude that . Note that the above arguments hold for every sample in the probability space except a set with zero probability. Therefore, we have shown that, with probability one, is bounded for any sample size n.
In the third step, for convenience, we omit the index i. Then, for the number of observed longitudinal measurements per subject, we use nN instead of the ni without subscript i since we denoted sample size as n. Use O to abbreviate the observed statistics and for a subject, and define
and a class , where Bs0 is the constant given in the second step and contains all nondecreasing functions in [0,τ]. Employing the empirical process formulation, the class can be proved to be P-Donsker by Theorems 2.5.6 and 2.7.5 in van der Vaart and Wellner (1996). In stratum s, let ms denote the number of subjects, and let Vsk and Δsk denote the observed time and censoring indicator for the k-th subject, respectively. By differentiating (8) with respect to Λs{Vsk}, we obtain
We also construct , another step function with the jump size , given by
Through arguments using empirical processes and the relevant properties of P-Donsker and Glivenko-Cantelli classes, we can prove that Λ̂sn(t) uniformly converges to Λs0(t) in [0,τ]. Next, by the bounded convergence theorem, the fact that θ̂n converges to θ* and Λ̂sn weakly converges to Λs*, and the Arzelà-Ascoli theorem, we can prove that uniformly converges to . Then, from , using the properties of the Glivenko-Cantelli class and Kullback-Leibler information, the following holds with probability one,
(12)
Our proof will be completed if we can show θ* = θ0 and Λs* = Λs0 from (12). To show that β* = β0, ϕ* = ϕ0, and Σb* = Σb0, we let Δs = 0 and Vs = 0 in (12). By comparing the coefficients of YTY and Y in the exponential part and the constant term outside the exponential part, and using assumption (A5), we obtain ϕ* = ϕ0, β* = β0, and Σb* = Σb0. To show that ψ* = ψ0, γ* = γ0, and Λs* = Λs0, we let Δs = 0 in (12). By arguments similar to those for β* = β0, ϕ* = ϕ0, and Σb* = Σb0, both sides of (12) with Δs = 0 are expressed as expected values with respect to the random effects b following a multivariate normal distribution with mean zero and covariance Σb0. By the fact that, for any fixed , treating as a parameter in the normal family, b is the complete statistic for , and by assumptions (A2) and (A5), we have ψ* = ψ0, γ* = γ0, and Λs* = Λs0. Therefore, the proof of Theorem 1 is completed.
B.2. Proof of asymptotic normality – Theorem 2
The asymptotic distribution of the proposed estimators can be derived by verifying the conditions of Theorem 3.3.1 in van der Vaart and Wellner (1996); it will then be shown that the limiting distribution is Gaussian. For completeness, we use Theorem 4 in Parner (1998), which restates Theorem 3.3.1 of van der Vaart and Wellner (1996).
Theorem 4 (Parner 1998)
Let Un and U be random maps and a fixed map, respectively, from Ξ to a Banach space such that:
(a) ∥√n(Un − U)(ξ̂n) − √n(Un − U)(ξ0)∥ = oP(1 + √n∥ξ̂n − ξ0∥);
(b) the sequence √n(Un − U)(ξ0) converges in distribution to a tight random element Z;
(c) the function ξ → U(ξ) is Fréchet differentiable at ξ0 with a continuously invertible derivative ▽Uξ0 (on its range);
(d) ξ̂n satisfies Un(ξ̂n) = oP(n−1/2) and converges in outer probability to ξ0, and U(ξ0) = 0.
Then √n(ξ̂n − ξ0) converges in distribution to −▽Uξ0−1(Z).
In our situation, the parameter for a fixed small constant δ. We define , where ∥h2∥v is the total variation of h2 in [0,τ] defined as
and also define that, for stratum s,
and
where lθ (θ,Λs) is the first derivative of the log-likelihood function from one single subject belonging to stratum s, denoted by l(O;θ,Λs), with respect to θ, and lΛs (θ,Λs) is the derivative of l(O;θ,Λsε) at ε = 0, where . Therefore, both Ums and Us map from Ξ to , and is an empirical process in the space .
Denoting corresponding to θ=(βT,ϕT,Vech(Σb)T,ψT,γT)T, for any , the class
can be shown as P-Donsker. From the P-Donsker property, it is also implied that
as ∥θ–θ0∥+supt∈[0,τ]|Λs(t)–Λs0(t)|→0. Then, for the conditions (a)–(d) in Theorem 4 of Parner (1998), we have that
(a) follows from Lemma 3.3.5 (p. 311) of van der Vaart and Wellner (1996);
(b) holds as a result of P-Donsker property and the convergence is defined in the metric space by the Donsker theorem in van der Vaart and Wellner (1996);
(d) is true because (θ̂n, Λ̂sn) maximizes Pmsl(O;θ,Λs), (θ0,Λs0) maximizes Pl(O;θ,Λs), and (θ̂n, Λ̂sn) converges to (θ0,Λs0) by Theorem 1;
The first half of condition (c), that the function ξ → U(ξ) is Fréchet differentiable at ξ0, is proved by showing there exists a bounded linear operator for the function.
Thus, it only remains to prove the second half of condition (c), that the derivative ▽Uξ0 is continuously invertible on its range. From the proof of the Fréchet differentiability of U(ξ) at ξ0, we have that, for any (θ1,Λs1) and (θ2,Λs2) in Ξ,
(13)
where both Ω1 and Ω2 are bounded linear operators on , and Ω = (Ω1,Ω2) maps to Rd × BV[0,τ], where BV[0,τ] contains all the functions with finite total variation in [0,τ]. The explicit expressions of Ω1 and Ω2 can be obtained from the P-Donsker property and the derivation of ▽Uξ0 by definition. Thus, ▽Uξ0 is a linear operator from to itself. We note that proving that ▽Uξ0 is continuously invertible is equivalent to showing that Ω is invertible. Then, by Theorem 4.25 of Rudin (1973), for the proof of invertibility of Ω, it is sufficient to verify that Ω is one-to-one: if Ω[h1,h2] = 0, then, by choosing θ1 − θ2 = ε*h1 and Λs1 − Λs2 = ε* ∫h2dΛs0 in (13) for a small constant ε*, we obtain
Since ▽Uξ0 (h1, ∫h2dΛs0)[h1,h2] is the negative information matrix in the submodel (θ0+εh1,Λs0+ε ∫h2dΛs0), the score function along this submodel is lθ(θ0,Λs0)T h1+lΛs(θ0, Λs0)[h2] = 0; that is, with probability one, the numerator of the score function
(14)
where A′(D(tj;ϕ0)) and C′(Yj;D(tj;ϕ0)) are the derivatives of A(D(tj;ϕ)) and C(Yj;D(tj;ϕ)) with respect to ϕ evaluated at ϕ0, and B′(β0;b) is the derivative of B(β;b) with respect to β evaluated at β0. The proof of invertibility of Ω will be completed if we can show h1 = 0 and h2(t) = 0 from (14).
To show h1 = 0, we particularly let Δs = 0 and Vs = 0 in (14). Examining the coefficient for Y and the constant terms without Y, and using assumption (A5) and arguments similar to those in Appendix B.1, gives Db = 0. On the other hand, letting Δs = 0 in (14) with assumptions (A2) and (A5) and similar arguments leads to h2(t) = 0. Hence, the proof of condition (c) is completed.
Since the conditions (a)–(d) have been proved, Theorem 3.3.1 of van der Vaart and Wellner (1996) concludes that weakly converges to a tight random element in . Then, we have
where oP(1) is a random variable which converges to zero in probability in , and, from (13),
By denoting , we have , and by replacing (h1,h2) with in the above two equations, we obtain
(15)
We can see that the first term on the right-hand side in (15) is , which is an empirical process in the space . Furthermore, it is already shown that is P-Donsker. Therefore, weakly converges to a Gaussian process in .
Choosing h2 = 0 in (15) and applying Proposition 3.3.1 in Bickel et al. (1993) concludes that the proposed maximum likelihood estimator of θ0 is efficient. Therefore, Theorem 2 is proved.
Contributor Information
Jaeun Choi, Department of Health Care Policy, Harvard Medical School 180 Longwood Avenue, Boston, MA 02115, USA Tel.: +617-432-0183, Fax: +617-432-0173 choi@hcp.med.harvard.edu.
Jianwen Cai, Department of Biostatistics, University of North Carolina at Chapel Hill McGavran-Greenberg Hl, 135 Dauer Drive, CB 7420, Chapel Hill, NC 27599, USA cai@bios.unc.edu.
Donglin Zeng, Department of Biostatistics, University of North Carolina at Chapel Hill McGavran-Greenberg Hl, 135 Dauer Drive, CB 7420, Chapel Hill, NC 27599, USA dzeng@bios.unc.edu.
Andrew F. Olshan, Department of Epidemiology, University of North Carolina at Chapel Hill McGavran-Greenberg Hl, 135 Dauer Drive, CB 7435, Chapel Hill, NC 27599, USA andyolshan@unc.edu
References
- Albert PS, Follmann DA. Modeling Repeated Count Data Subject to Informative Dropout. Biometrics. 2000;56:667–677. doi: 10.1111/j.0006-341x.2000.00667.x. [DOI] [PubMed] [Google Scholar]
- Albert PS, Follmann DA. Random Effects and Latent processes approaches for analyzing binary longitudinal data with missingness: a comparison of approaches using opiate clinical trial data. Stat Methods Med Res. 2007;16:417–439. doi: 10.1177/0962280206075308. [DOI] [PubMed] [Google Scholar]
- Albert PS, Follmann DA, Wang SA, Suh EB. A Latent Autoregressive Model for Longitudinal Binary Data subject to Informative Missingness. Biometrics. 2002;58:631–642. doi: 10.1111/j.0006-341x.2002.00631.x. [DOI] [PubMed] [Google Scholar]
- Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press; Baltimore: 1993. [Google Scholar]
- Brown ER, Ibrahim JG. A Bayesian Semiparametric Joint Hierarchical Model for Longitudinal and Survival Data. Biometrics. 2003;59:221–228. doi: 10.1111/1541-0420.00028. [DOI] [PubMed] [Google Scholar]
- Chakraborty A, Das K. Inferences for joint modelling of repeated ordinal scores and time to event data. Comput Math Methods Med. 2010;11:281–295. doi: 10.1080/17486701003789096. [DOI] [PubMed] [Google Scholar]
- Chen W, Ghosh D, Raghunathan TE, Sargent DJ. Bayesian Variable Selection with Joint Modeling of Categorical and Survival Outcomes: An Application to Individualizing Chemotherapy Treatment in Advanced Colorectal Cancer. Biometrics. 2009;65:1030–1040. doi: 10.1111/j.1541-0420.2008.01181.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen MH, Ibrahim JG, Lipsitz SR. Bayesian methods for missing covariates in cure rate models. Lifetime Data Anal. 2002;8:117–146. doi: 10.1023/a:1014835522957. [DOI] [PubMed] [Google Scholar]
- Ding J, Wang JL. Modeling Longitudinal Data with Nonparametric Multiplicative Random Effects Jointly with Survival Data. Biometrics. 2008;64:546–556. doi: 10.1111/j.1541-0420.2007.00896.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Divaris K, Olshan AF, Smith J, Bell ME, Weissler MC, Funkhouser WK, Bradshaw PT. Oral Health and Risk for Head and Neck Squamous Cell Carcinoma: the Carolina Head and Neck Cancer Study. Cancer Cause Control. 2010;21:567–575. doi: 10.1007/s10552-009-9486-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dunson DB, Herring AH. Bayesian latent variable models for mixed discrete outcomes. Biostatistics. 2005;6:11–25. doi: 10.1093/biostatistics/kxh025. [DOI] [PubMed] [Google Scholar]
- Elashoff RM, Li G, Li N. An Approach to Joint Analysis of Longitudinal Measurements and Competing Risks Failure Time Data. Stat Med. 2007;26:2813–2835. doi: 10.1002/sim.2749. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Elashoff RM, Li G, Li N. A Joint Model for Longitudinal Measurements and Survival Data in the Presence of Multiple Failure Types. Biometrics. 2008;64:762–771. doi: 10.1111/j.1541-0420.2007.00952.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Faucett CL, Schenker N, Elashoff RM. Analysis of Censored Survival Data with Intermittently Observed Time-Dependent Binary Covariates. J Amer Statist Assoc. 1998;93:427–437. [Google Scholar]
- Faucett CL, Thomas DC. Simultaneously Modeling Censored Survival Data and Repeatedly Measured Covariates: A Gibbs Sampling Approach. Stat Med. 1996;15:1663–1685. doi: 10.1002/(SICI)1097-0258(19960815)15:15<1663::AID-SIM294>3.0.CO;2-1. [DOI] [PubMed] [Google Scholar]
- Gueorguieva RV, Agresti A. A Correlated Probit Model for Joint Modeling of Clustered Binary and Continuous Responses. J Amer Statist Assoc. 2001;96:1102–1112. [Google Scholar]
- Henderson R, Diggle P, Dobson A. Joint Modeling of Longitudinal Measurements and Event Time Data. Biostatistics. 2000;1:465–480. doi: 10.1093/biostatistics/1.4.465. [DOI] [PubMed] [Google Scholar]
- Hogan J, Laird N. Mixture Models for the Joint Distribution of Repeated Measures and Event Times. Stat Med. 1997;16:239–257. doi: 10.1002/(sici)1097-0258(19970215)16:3<239::aid-sim483>3.0.co;2-x. [DOI] [PubMed] [Google Scholar]
- Hu W, Li G, Li N. A Bayesian Approach to Joint Analysis of Longitudinal Measurements and Competing Risks Failure Time Data. Stat Med. 2009;28:1601–1619. doi: 10.1002/sim.3562. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Huang X, Li G, Elashfoff RM, Pan J. A General Joint Model for Longitudinal Measurements and Competing Risks Survival Data with Heterogeneous Random Effects. Lifetime Data Anal. 2011;17:80–100. doi: 10.1007/s10985-010-9169-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Huang W, Zeger S, Anthony J, Garrett E. Latent Variable Model for Joint Analysis of Multiple Repeated Measures and Bivariate Event times. J Amer Statist Assoc. 2001;96:906–14. [Google Scholar]
- Larsen K. Joint analysis of time-to-event and multiple binary indicators of latent classes. Biometrics. 2004;60:85–92. doi: 10.1111/j.0006-341X.2004.00141.x. [DOI] [PubMed] [Google Scholar]
- Lin HQ, McCulloch CE, Mayne ST. Maximum Likelihood Estimation in the Joint Analysis of Time-to-Event and Multiple Longitudinal Variables. Stat Med. 2002;21:2369–2382. doi: 10.1002/sim.1179. [DOI] [PubMed] [Google Scholar]
- Liu L, Ma JZ, O’Quigley J. Joint Analysis of Multi-Level Repeated Measures Data and Survival: An Application to the End Stage Renal Disease (ESRD) Data. Stat Med. 2008;27:5676–5691. doi: 10.1002/sim.3392. [DOI] [PubMed] [Google Scholar]
- Louis TA. Finding the Observed Information Matrix when Using the EM Algorithm. J Royal Statist Soc B. 1982;44:226–233. [Google Scholar]
- Parner E. Asymptotic Theory for the Correlated Gamma-frailty Model. Ann Stat. 1998;26:183–214. [Google Scholar]
- Pulkstenis EP, Ten Have TR, Landis JR. Model for the Analysis of Binary Longitudinal Pain Data Subject to Informative Dropout through Remedication. J Amer Statist Assoc. 1998;93:438–450. [Google Scholar]
- Ribaudo HJ, Thompson SG, Allen-Mersh TG. A Joint Analysis of Quality of Life and Survival Using a Random Effect Selection Model. Stat Med. 2000;19:3237–3250. doi: 10.1002/1097-0258(20001215)19:23<3237::aid-sim624>3.0.co;2-q. [DOI] [PubMed] [Google Scholar]
- Rizopoulos D, Verbeke G, Lesaffre E, Vanrenterghem Y. A Two-Part Joint Model for the Analysis of Survival and Longitudinal Binary Data with Excess Zeros. Biometrics. 2008;64:611–619. doi: 10.1111/j.1541-0420.2007.00894.x. [DOI] [PubMed] [Google Scholar]
- Rizopoulos D, Verbeke G, Molenberghs G. Shared Parameter Models under Random Effects Misspecification. Biometrika. 2008;95:63–74. [Google Scholar]
- Satterthwaite FE. An Approximate Distribution of Estimates of Variance Components. Biometrics Bulletin. 1946;2:110–114. [PubMed] [Google Scholar]
- Schluchter MD. Methods for the Analysis of Informatively Censored Longitudinal Data. Stat Med. 1992;11:1861–1870. doi: 10.1002/sim.4780111408. [DOI] [PubMed] [Google Scholar]
- Song X, Davidian M, Tsiatis AA. A Semiparametric Likelihood Approach to Joint Modeling of Longitudinal and Time-to-Event Data. Biometrics. 2002;58:742–753. doi: 10.1111/j.0006-341x.2002.00742.x. [DOI] [PubMed] [Google Scholar]
- Song X, Wang CY. Semiparametric Approaches for Joint Modeling of Longitudinal and Survival Data with Time-Varying Coefficients. Biometrics. 2007;64:557–566. doi: 10.1111/j.1541-0420.2007.00890.x. [DOI] [PubMed] [Google Scholar]
- Tseng YK, Hsieh F, Wang JL. Joint Modelling of Accelerated Failure Time and Longitudinal Data. Biometrika. 2005;92:587–603. [Google Scholar]
- Tsiatis AA, De Gruttola V, Wulfsohn M. Modeling the Relationship of Survival to Longitudinal Data Measured with Error. Applications to Survival and CD4 Counts in Patients with AIDS. J Amer Statist Assoc. 1995;90:27–37. [Google Scholar]
- Tsiatis AA, Davidian M. A Semiparametric Estimator for the Proportional Hazards Model with Longitudinal Covariates Measured with Error. Biometrika. 2001;88:447–458. [Google Scholar]
- van der Vaart AW. Asymptotic Statistics. Cambridge University Press; 1998. [Google Scholar]
- van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Springer-Verlag; New York: 1996. [Google Scholar]
- Wang Y, Taylor JMG. Jointly Modeling Longitudinal and Event Time Data with Application to Acquired Immunodeficiency Syndrome. J Amer Statist Assoc. 2001;96:895–905. [Google Scholar]
- Wu M, Bailey K. Estimation and Comparison of Changes in the Presence of Informative Right Censoring: Conditional Linear Model. Biometrics. 1989;45:939–955. [PubMed] [Google Scholar]
- Wu M, Carroll R. Estimation and Comparison of Changes in the Presence of Informative Right Censoring by Modelling the Censoring Process. Biometrics. 1988;44:175–188. [Google Scholar]
- Wulfsohn M, Tsiatis AA. A Joint Model for Survival and Longitudinal Data Measured with Error. Biometrics. 1997;53:330–339. [PubMed] [Google Scholar]
- Xu J, Zeger S. The Evaluation of Multiple Surrogate Endpoints. Biometrics. 2001a;57:81–87. doi: 10.1111/j.0006-341x.2001.00081.x. [DOI] [PubMed] [Google Scholar]
- Xu J, Zeger S. Joint Analysis of Longitudinal Data Comprising Repeated Measures and Times to Events. Appl Stat. 2001b;50:375–387. [Google Scholar]
- Yao F. Functional Approach of Flexibly Modelling Generalized Longitudinal Data and Survival Time. J Statist Plann Inference. 2008;138:995–1009. [Google Scholar]
- Ye W, Lin XH, Taylor JMG. Semiparametric Modeling of Longitudinal Measurements and Time-to-Event Data: A Two-Stage Regression Calibration Approach. Biometrics. 2008;64:1238–1246. doi: 10.1111/j.1541-0420.2007.00983.x. [DOI] [PubMed] [Google Scholar]
- Zeng D, Cai J. Simultaneous Modelling of Survival and Longitudinal Data with an Application to Repeated Quality of Life Measures. Lifetime Data Anal. 2005a;11:151–174. doi: 10.1007/s10985-004-0381-0. [DOI] [PubMed] [Google Scholar]
- Zeng D, Cai J. Asymptotic Results for Maximum Likelihood Estimators in Joint Analysis of Repeated Measurements and Survival Time. Ann Stat. 2005b;33:2132–2163. [Google Scholar]