Author manuscript; available in PMC: 2018 Feb 22.
Published in final edited form as: Ann Stat. 2012 Sep 5;40(3):1465–1488. doi: 10.1214/12-AOS996

MODELING LEFT-TRUNCATED AND RIGHT-CENSORED SURVIVAL DATA WITH LONGITUDINAL COVARIATES

Yu-Ru Su 1, Jane-Ling Wang 2
PMCID: PMC5822752  NIHMSID: NIHMS937870  PMID: 29479122

Abstract

There is a surge in medical follow-up studies that include longitudinal covariates in the modeling of survival data. So far, the focus has been largely on right censored survival data. We consider survival data that are subject to both left truncation and right censoring. Left truncation is well known to produce a biased sample. The sampling bias issue has been resolved in the literature for the case that involves baseline or time-varying covariates that are observable. The problem remains open, however, for the important case where longitudinal covariates are present in survival models. A joint likelihood approach has been shown in the literature to provide an effective way to overcome those difficulties for right censored data, but this approach faces substantial additional challenges in the presence of left truncation. Here we thus propose an alternative likelihood to overcome these difficulties and show that the regression coefficient in the survival component can be estimated unbiasedly and efficiently. Issues about the bias for the longitudinal component are discussed. The new approach is illustrated numerically through simulations and data from a multi-center AIDS cohort study.

Keywords and phrases: Likelihood approach, Semiparametric efficiency, Biased sample, EM algorithm, Monte Carlo integration

1. Introduction

Since the seminal paper by Wulfsohn and Tsiatis (1997), longitudinal covariates have played an increasingly important role in the modeling of survival data. One major challenge in incorporating longitudinal covariates is that simple approaches, such as the partial likelihood method for the Cox proportional hazards model (Cox, 1972), often require knowledge of the entire longitudinal process. This is often not feasible in practice, since follow-up checks take place at discrete and intermittent time points. A common practice is to impute the unobserved values of the longitudinal processes and then apply the partial likelihood approach to the imputed data. This is called a two-stage approach: the longitudinal process is imputed at the first stage, and the partial likelihood approach is then employed to estimate the parameters in the survival model at the second stage. The most common imputation method is to use the most recent observed value of the patient to impute a missing value, the so-called last-value-carried-forward method, which has been adopted in standard software such as SAS and R. Additional two-stage procedures were developed by Tsiatis, DeGruttola and Wulfsohn (1995) and Dafni and Tsiatis (1998).
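As an illustration of the imputation step described above, the following is a minimal sketch of last-value-carried-forward imputation. The function name `lvcf` and the visit times and measurements are hypothetical and chosen only for illustration; they are not taken from any data set in this paper.

```python
from bisect import bisect_right

def lvcf(times, values, t):
    """Return the most recent value observed at or before time t.

    `times` must be sorted in increasing order. Returns None when no
    measurement is available at or before t.
    """
    k = bisect_right(times, t)          # number of visits with time <= t
    return values[k - 1] if k > 0 else None

# Hypothetical visit schedule (years) and measurements for one subject.
visit_times = [0.0, 0.5, 1.4, 3.0]
cd4_values = [520, 480, 390, 210]

# Impute the covariate at times where no measurement was taken.
imputed = [lvcf(visit_times, cd4_values, t) for t in (0.2, 1.4, 2.9)]
print(imputed)
```

In this toy example the value imputed at time 2.9 is the one recorded at time 1.4, although the next measurement (at time 3.0) is much lower. This is the kind of discrepancy that grows when visits are infrequent.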

It is easy to foresee serious biases with such an imputation method if the follow-up schedule is infrequent over time, and also when the longitudinal covariates are contaminated by noise or measurement errors. Both scenarios provide strong motivation to find alternative approaches. The approach developed by Wulfsohn and Tsiatis (1997) to model the survival and longitudinal data simultaneously through their joint likelihood is attractive on two counts: (i) the resulting parametric estimators are semiparametrically efficient when the baseline hazard function is unknown, and (ii) the joint likelihood procedure is often insensitive to the normality assumption on the longitudinal data, provided there is a reasonable number of repeated measurements available for the longitudinal processes; see Zeng and Cai (2005) and Dupuy, Grama and Mesbah (2006) for (i) and Song, Davidian and Tsiatis (2002), Tsiatis and Davidian (2004), and Hsieh, Tseng and Wang (2006) for (ii).

The above joint likelihood approach not only successfully removes the biases on the survival component but also leads to efficient estimation. A historical example for the joint likelihood approach is the investigation of CD4 T-cell counts as a biomarker of time-to-death or time-to-AIDS (DeGruttola and Tu, 1994; Wulfsohn and Tsiatis, 1997; Henderson, Diggle and Dobson, 2000). In these and other works, the survival time is subject to the usual right censoring. However, left truncation is common for studies with delayed entry. Specifically, if the recruitment of patients continues after the onset time of a study, those that have already experienced the event are often excluded from the study, which then results in left truncation of the event-time. Patients who remain in the study are further subject to the usual right censoring, so the sample consists of left truncated and right censored (LTRC) survival times. It is well known that left truncation is a biased sampling plan as subjects with shorter survival times tend to be excluded from the sample. As a result, the longitudinal measurements are also sampled with bias.

An example of a left truncated and right censored longitudinal study is the Italian multi-center HIV (human immunodeficiency virus) study (Rezza et al. (1989); The-Italian-Seroconversion-Study (1992)), where the primary endpoint is the time from HIV positive to AIDS onset, i.e. the incubation period of AIDS. In this study, patients who had developed AIDS at the time of recruitment were excluded from the study, resulting in left truncation of the survival data, and CD4 counts for those who were HIV positive but AIDS free were measured at each follow-up visit. As there are no procedures available to handle such data properly, we develop in this paper a semiparametric joint likelihood approach to accommodate LTRC survival data with longitudinal covariates that are measured intermittently.

Although there is a sizable literature on jointly modeling right-censored survival and longitudinal data (see Wulfsohn and Tsiatis (1997), Henderson, Diggle and Dobson (2000), Song, Davidian and Tsiatis (2002) and the review paper by Tsiatis and Davidian (2004)), the extension to LTRC survival data turns out to be nontrivial due to the left truncation feature of the data. To see this, consider first the simpler case of left truncated data with time-independent covariates or no covariates at all. Lynden-Bell (1971), Woodroofe (1985), and Wang (1987) investigated estimation of the survival function when subjects come from the same population, i.e. there are no covariates involved. Here, one only needs to adjust the risk set for truncated data to reach a suitable extension of the Kaplan-Meier estimator. For time-independent covariates, Andersen et al. (1993) considered estimation under the Cox model and showed that the partial likelihood approach for right censored data still works for LTRC survival data when one conditions on the values of the covariates and truncation times.
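The risk-set adjustment mentioned above for the covariate-free case can be sketched as follows. This is a minimal illustration of a product-limit estimator with delayed entry, in which a subject counts as at risk at time y only when entry < y ≤ exit. The function name and the four-subject data set are hypothetical.

```python
def product_limit_ltrc(entry, exit_time, event):
    """Product-limit survival estimate for left-truncated, right-censored data.

    entry[i]     : truncation (study entry) time of subject i
    exit_time[i] : observed event or censoring time of subject i
    event[i]     : 1 if the event was observed, 0 if censored
    Returns a list of (event time, estimated survival just after it).
    """
    event_times = sorted({z for z, d in zip(exit_time, event) if d == 1})
    surv, curve = 1.0, []
    for y in event_times:
        # Risk set adjusted for delayed entry: entered before y, still under observation at y.
        at_risk = sum(1 for t, z in zip(entry, exit_time) if t < y <= z)
        deaths = sum(1 for z, d in zip(exit_time, event) if d == 1 and z == y)
        surv *= 1.0 - deaths / at_risk
        curve.append((y, surv))
    return curve

# Hypothetical data: two subjects enter at time 0, two enter late.
entry = [0.0, 0.0, 1.0, 2.0]
exit_time = [2.0, 3.0, 4.0, 5.0]
event = [1, 0, 1, 1]
curve = product_limit_ltrc(entry, exit_time, event)
print(curve)
```

The subject entering at time 2.0 is not counted in the risk set at the event time 2.0, which is exactly how the estimator differs from the ordinary Kaplan-Meier estimator.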

For time-dependent covariates, the partial likelihood approach of Andersen et al. (1993) can still be employed if the entire covariate history is available for all subjects. This is not the case for longitudinal covariates that are observed intermittently at discrete time points. Since imputation methods lead to biased estimates, bias corrected approaches have been employed in the literature for right censored data with longitudinal covariates. In particular, Wang (2006) proposed a method to correct the bias through the partial score equation. Such approaches are termed “corrected score” methods, which originate from studies of measurement errors. While corrected score methods typically lead to √n-consistent estimators for the regression parameters in the Cox model, they are not efficient and easy to derive. Extension of the corrected score methods to LTRC (left-truncated and right censored) data might be feasible but has not been explored. In this paper, we adopt the full and joint likelihood approach of the survival and longitudinal data due to its aforementioned efficiency and robustness features. Unfortunately, direct maximization of the full joint likelihood is much more complicated than in the cases with no left truncation. We discovered a modified likelihood that is simpler, yet retains the efficiency of the full likelihood approach, as described in Section 2.

The rest of the paper is organized as follows. In Section 2, we introduce a joint model setting for both the survival time and longitudinal processes and propose a modified likelihood approach for statistical inference. An EM algorithm to maximize the modified likelihood is derived in Section 3, along with the large sample properties of the nonparametric maximum modified likelihood estimator (NPMMLE), including consistency, asymptotic normality, and efficiency. Numerical performance of the proposed estimating procedure is validated through simulation studies in Section 4 and illustrated through the Italian HIV study in Section 5. Section 6 contains some discussion.

2. Joint modeling under LTRC

We consider the setting in which the survival time Y* of a subject is subject to random left truncation by T*, so a subject is enrolled in a study only if Y* ≥ T*. Let n be the total number of subjects enrolled in the study. Under such a biased sampling plan, and to avoid confusion in notation, we denote the survival and truncation times of the ith enrolled subject by (Yi, Ti), which are sampled from the subpopulation of (Yi*, Ti*) for which Yi* ≥ Ti*. Upon entering the study, these n subjects are subject to the usual right censoring, so the final observed survival data for the ith subject form a triplet (Ti, Zi, Δi), where Zi = min(Yi, Ci) is the time of the endpoint event or the drop-out (censoring) time Ci, whichever occurs first, and Δi = I(Yi ≤ Ci) is the censoring indicator.

In reality, drop-out or censoring can occur only after a subject is enrolled into the study. This implies that the right-censoring time Ci is greater than the truncation time Ti, for i = 1, …, n. Therefore, we introduce a positive random variable Ui to represent the time from entry into the study to drop-out from the study, i.e. Ui = Ci − Ti.
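The sampling scheme described above can be sketched in a few lines. The exponential distributions and their rates below are assumptions made purely for illustration (they are not the settings used later in the simulations), and the function name `draw_ltrc_sample` is hypothetical. The latent survival time is kept in each record only so that the selection effect can be inspected; in practice it is not observed for censored subjects.

```python
import random

def draw_ltrc_sample(n, rng):
    """Draw n enrolled subjects; each record is (t, z, delta, y)."""
    sample = []
    while len(sample) < n:
        y_star = rng.expovariate(1.0)   # latent survival time Y* (assumed rate)
        t_star = rng.expovariate(1.0)   # latent truncation time T* (assumed rate)
        if y_star < t_star:
            continue                    # event before entry: subject is never enrolled
        u = rng.expovariate(0.5)        # time from entry to drop-out, U > 0 (assumed rate)
        c = t_star + u                  # censoring time C = T + U
        z = min(y_star, c)              # observed time Z = min(Y, C)
        delta = 1 if y_star <= c else 0 # censoring indicator I(Y <= C)
        sample.append((t_star, z, delta, y_star))
    return sample

rng = random.Random(0)
enrolled = draw_ltrc_sample(2000, rng)
mean_y_enrolled = sum(rec[3] for rec in enrolled) / len(enrolled)
print(round(mean_y_enrolled, 2))
```

With these assumed distributions the population mean of Y* is 1, whereas the mean among enrolled subjects is 1.5 in theory (the larger of two independent unit exponentials), which makes the length bias induced by truncation visible.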

In addition to the survival data, baseline and longitudinal covariates are collected intermittently for the ith subject from the time the subject enters the study until the observational limit Zi. This results in ni repeated measurements, denoted by W⃗i = (Wi1, Wi2, …, Wini), where the measurements are taken at time points s⃗i = (si1, si2, …, sini). It is important to note here that the observed W⃗i are also subject to the same biased sampling plan as the survival data, so there is a background longitudinal vector, which we will denote by W⃗i* for the ith subject enrolled in the study. Therefore, W⃗i is sampled from the subpopulation of W⃗* for which Yi* ≥ Ti*, and values beyond Zi are not observed. For simplicity of notation, we assume in this section that there is only one longitudinal covariate, but additional longitudinal or baseline covariates can be handled easily. The AIDS data discussed in Section 5 contain two longitudinal covariates: one is observed intermittently, while the complete history of the other, the time-dependent treatment indicator, is available.

2.1. The Joint Models

Since repeated measurements from the same subjects are likely to be correlated, we introduce a latent q × 1 random vector Ai to account for their dependency and assume a common parametric density function fA(·|α) with an unknown parameter α for Ai. A linear mixed effects model will be considered for the longitudinal covariate:

$$
\vec{W}_i = \vec{X}_i(\vec{s}_i) + \vec{\varepsilon}_i = g(\vec{s}_i) A_i + \vec{\varepsilon}_i, \tag{2.1}
$$

where g(·) is a known q-dimensional function and the ni × 1 vector εi plays the role of measurement errors, sampled from a multivariate normal distribution with independent marginal distribution 𝒩(0, σ2), and independent of all other aforementioned random variables.

For the survival time Yi, a proportional hazards model is employed, and the hazard rate of Yi at time t given Ai is

$$
\lambda_{Y_i}(t \mid A_i) = \lambda_0(t) \exp\{\beta X_i(t)\}, \tag{2.2}
$$

where λ0 is the baseline hazard rate and β is the regression coefficient. The truncation time Ti and the time Ui, from entry to drop-out, are assumed to have distribution function FT*(·) and FU(·) respectively. We adopt the standard assumption in survival analysis, that Yi,Ti and Ui are conditionally independent given the covariates. This is equivalent to assuming conditional independence of Yi,Ti, and Ui given the value of Ai. We also assume that Ti and Ui are independent of Ai and the parameters in the models for either the survival or longitudinal parts are noninformative.

2.2. A Modified Likelihood Approach

For the model described in the previous subsection, the parameters of interest are (β, α, σ², Λ0(·)), where the first three components are in the Euclidean space whereas Λ0(t) = ∫_0^t λ0(u) du, the cumulative hazard function, is in a functional space; hence the model is semiparametric. Since a likelihood approach usually provides the most efficient estimating procedure, we first consider the full likelihood function L_i^O based on the observations (ti, zi, δi, w⃗i) from the ith subject. The derivation of the full likelihood from the ith subject is shown below.

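$$
\begin{aligned}
L_i^O &= f_{(T,Y,\Delta,\vec{W})}(t_i, z_i, \delta_i, \vec{w}_i)
       = \frac{f_{(T^*,Y^*,\Delta^*,\vec{W}^*)}(t_i, z_i, \delta_i, \vec{w}_i)}{P(Y^* \ge T^*)} \\
&= \left\{ \int [f_{Y^*}(z_i \mid A_i^* = a_i)]^{\delta_i} [S_{Y^*}(z_i \mid A_i^* = a_i)]^{1-\delta_i}
   f_{\vec{W}^*}(\vec{w}_i \mid A_i^* = a_i)\, f_{A^*}(a_i)\, da_i \right\}
   \frac{f_{T^*}(t_i)}{P(Y_i^* \ge T_i^*)} \\
&= \left\{ \int \frac{[f_{Y^*}(z_i \mid A_i^* = a_i)]^{\delta_i} [S_{Y^*}(z_i \mid A_i^* = a_i)]^{1-\delta_i}}{S_{Y^*}(t_i \mid A_i^* = a_i)}
   f_{\vec{W}^*}(\vec{w}_i \mid A_i^* = a_i)\,
   \frac{S_{Y^*}(t_i \mid A_i^* = a_i)\, f_{A^*}(a_i)}{S_{Y^*}(t_i)}\, da_i \right\}
   \frac{S_{Y^*}(t_i)\, f_{T^*}(t_i)}{P(Y_i^* \ge T_i^*)} \\
&= \left\{ \int [f_{Y^*}(z_i \mid Y_i^* \ge t_i, A_i^* = a_i)]^{\delta_i} [S_{Y^*}(z_i \mid Y_i^* \ge t_i, A_i^* = a_i)]^{1-\delta_i}
   f_{\vec{W}^*}(\vec{w}_i \mid A_i^* = a_i)\, f_{A^*}(a_i \mid Y_i^* \ge t_i)\, da_i \right\}
   f_{T^*}(t_i \mid Y_i^* \ge T_i^*),
\end{aligned}
\tag{2.3}
$$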

where fV is the density function of the random variable V in the subscript, and SV is the corresponding survival function. In (2.3), besides the baseline hazard function λ0, the density function fT* also serves as a nonparametric component. Because of these two nonparametric components, the full likelihood function is unbounded, so we resort to the nonparametric maximum likelihood approach, which leads to a similar scenario as in conventional survival analysis that the full likelihood is the same as the conditional likelihood given the left-truncation time. This has been explored in the literature (Andersen et al., 1993; Klein and Moeschberger, 2003) for LTRC data with baseline covariates and was first explored in Wang (1987) for the simpler situation of left truncated data that came from a single population. Following a similar argument as in Wang (1987), we found that the full likelihood can be simplified to the following conditional likelihood for the ith subject as

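$$
L_i^C = \left\{ \int [f_{Y^*}(z_i \mid Y_i^* \ge t_i, A_i^* = a_i)]^{\delta_i} [S_{Y^*}(z_i \mid Y_i^* \ge t_i, A_i^* = a_i)]^{1-\delta_i}
f_{\vec{W}^*}(\vec{w}_i \mid A_i^* = a_i)\, f_{A^*}(a_i \mid Y_i^* \ge t_i)\, da_i \right\}. \tag{2.4}
$$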

Next, we consider the nonparametric maximum likelihood estimators (NPMLE) of the survival component, which, by an argument similar to that for jointly modeling right-censored data and their longitudinal covariates (Zeng and Cai, 2005; Dupuy, Grama and Mesbah, 2006), leads to a piecewise constant baseline cumulative hazard function with jumps at each uncensored event time (i.e. at Yi, whenever Δi = 1). Letting nu denote the total number of uncensored events, the baseline cumulative hazard function is thus re-parameterized as an nu-dimensional vector.

So far, the derivation of the likelihood function and NPMLE follows a path similar to the much investigated case of joint modeling with right censored data, where NPMLEs for the parametric component enjoy nice asymptotic properties and are semiparametrically efficient. Despite these similarities, the left truncation feature triggers complications in the estimation of the finite dimensional parameter in the joint LTRC model. First, as shown in the Appendix, the parameter α associated with the latent variable A* is not identifiable. This is a consequence of the biased sampling plan, since the samples are actually drawn from the subpopulation Y* ≥ T*. Consequently, only E(A*|Y* ≥ T*) and var(A*|Y* ≥ T*) could be identified under the normality assumption. Thus, while it is possible to identify the unknown parameters of Y* and T* based on the joint conditional distribution of (Y*, T*)|Y* ≥ T*, where the notation (·|Y* ≥ T*) stands for a random variable/vector sampled from the subpopulation with Y* ≥ T*, there is not enough information to recover E(A*) and var(A*), and hence the true longitudinal parameters α.

A second complication is that the score equations for the survival components, β and Λ0, are much more complicated than in the situation under a right censored only model, as they require estimation of the expectations of nonlinear functions of the observed data along with the parameters of interest (see Appendix A.1). This motivates us to modify the likelihood so as to simplify the estimation of all parameters that are identifiable. Our proposal is to aim at the following modified likelihood, denoted by Lm, as an alternative to the full (equivalently, the conditional) likelihood in (2.4). The modified likelihood is

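$$
L_m = \prod_{i=1}^{n} \left\{ \int [f_{Y^*}(z_i \mid Y_i^* \ge t_i, A_i^* = a_i)]^{\delta_i} [S_{Y^*}(z_i \mid Y_i^* \ge t_i, A_i^* = a_i)]^{1-\delta_i}
f_{\vec{W}^*}(\vec{w}_i \mid A_i^* = a_i)\, f_{A^*}(a_i)\, da_i \right\} f_{T^*}(t_i \mid Y_i^* \ge T_i^*), \tag{2.5}
$$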

where the lower case variables denote the values of the corresponding upper case variables, e.g. δi is the value of Δi. The estimators obtained by maximizing the modified likelihood, where the nonparametric cumulative hazard function is replaced by a step function, will be referred to as the nonparametric maximum modified likelihood estimators (NPMMLE) hereafter.

The difference between (2.4) and (2.5) is that fA*(ai | Yi* ≥ ti) in the likelihood (2.4) is replaced by fA*(ai) in (2.5). This is motivated by the facts that fA*(a | Y* ≥ t) = [SY*(t | A* = a) / SY*(t)] fA*(a) and E[SY*(t | A*) / SY*(t)] = 1 for any t, and that, as shown in Lemma A.1 in the Appendix, the score functions of the survival parameters from (2.5) are asymptotically the same as those from (2.4). Theoretical results in the next section and numerical evidence in Section 4 demonstrate good performance of the estimators of all the survival parameters, (β, Λ0(·)), and of the measurement error variance σ² of the longitudinal component that we derived from this modified likelihood.
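The two identities invoked in this motivation follow from Bayes' rule and the law of total probability. A short derivation is sketched below, writing S_{Y*}(t) = P(Y* ≥ t) and treating A* as having density f_{A*}; the display is an expository aid and not a reproduction of the Appendix.

```latex
\begin{align*}
f_{A^*}(a \mid Y^* \ge t)
  &= \frac{P(Y^* \ge t \mid A^* = a)\, f_{A^*}(a)}{P(Y^* \ge t)}
   = \frac{S_{Y^*}(t \mid A^* = a)}{S_{Y^*}(t)}\, f_{A^*}(a), \\[4pt]
E\!\left[\frac{S_{Y^*}(t \mid A^*)}{S_{Y^*}(t)}\right]
  &= \frac{1}{S_{Y^*}(t)} \int S_{Y^*}(t \mid A^* = a)\, f_{A^*}(a)\, da
   = \frac{S_{Y^*}(t)}{S_{Y^*}(t)} = 1 .
\end{align*}
```

In words, the conditional density of the random effects among subjects at risk at time t is the unconditional density reweighted by a factor whose mean is one.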

3. EM-algorithm and asymptotic properties

Let γ = (β, α, σ2) be the finite dimensional parameter in the joint survival and longitudinal model, and Λ be a step function. The log modified likelihood is

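$$
l_m(\gamma, \Lambda) = \sum_{i=1}^{n} \ln \int \left[ \Lambda\{z_i\} \exp\{\beta g(z_i) a_i\} \right]^{\delta_i}
\exp\Big\{ -\sum_{j:\, t_i < y_j^0 \le z_i} \Lambda\{y_j^0\} \exp\{\beta g(y_j^0) a_i\} \Big\}
(2\pi\sigma^2)^{-n_i/2} \exp\Big\{ -\sum_{j=1}^{n_i} [w_{ij} - g(s_{ij}) a_i]^2 / (2\sigma^2) \Big\}
f_{A^*}(a_i)\, da_i,
$$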

where Λ{·} is the jump size of Λ at the time point in its argument, and y_j^0 is the jth observed survival time, sorted in increasing order. Moreover, τ1 and τ2 denote the lower bound of the truncation time and the largest censoring time corresponding to the end of the study.

Since directly maximizing the proposed modified likelihood involves integration of a complex function with respect to the random effects, we employ the expectation-maximization (EM) algorithm (Laird and Ware, 1982) to stabilize the maximization procedure. In the implementation of the EM algorithm, a Monte Carlo integration approach is used to approximate the expectation terms of functions h(A*) appearing in the E-step. A one-step Newton-Raphson method is applied to solve the nonlinear equations in the M-step. The posterior density of the random effects Ai given the observed data from the ith subject, oi = (ti, zi, δi, w⃗i), is of the form

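$$
\begin{aligned}
f_{A \mid O}(a \mid o_i)
&= \frac{f_{(Y,\Delta) \mid (A,T)}(z_i, \delta_i \mid a, t_i) \times f_{A \mid \vec{W}}(a \mid \vec{w}_i)}
        {\int f_{(Y,\Delta) \mid (A,T)}(z_i, \delta_i \mid a, t_i) \times f_{A \mid \vec{W}}(a \mid \vec{w}_i)\, da} \\
&= \frac{\left[\Lambda\{z_i\} \exp\{\beta g(z_i) a\}\right]^{\delta_i}
         \exp\Big\{ -\sum_{j:\, t_i < y_j^0 \le z_i} \Lambda\{y_j^0\} \exp\{\beta g(y_j^0) a\} \Big\} \times f_{A \mid \vec{W}}(a \mid \vec{w}_i)}
        {\int \left[\Lambda\{z_i\} \exp\{\beta g(z_i) a\}\right]^{\delta_i}
         \exp\Big\{ -\sum_{j:\, t_i < y_j^0 \le z_i} \Lambda\{y_j^0\} \exp\{\beta g(y_j^0) a\} \Big\} \times f_{A \mid \vec{W}}(a \mid \vec{w}_i)\, da}.
\end{aligned}
$$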

For a simpler implementation of the algorithm, we shall impose a normal assumption on the random effects and assume that Ai, i = 1, …, n, follow a normal distribution N(μ, Σ), where (μ, Σ) plays the role of the parameter α.
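A minimal sketch of how such a Monte Carlo E-step might look is given below. It is not the implementation used in this paper. It assumes a scalar random intercept (q = 1 and g ≡ 1), so that the distribution of A given the longitudinal measurements is a univariate normal available in closed form. Draws from that normal are then reweighted by the survival part of the posterior, which is a self-normalized importance sampling scheme. All names and numerical values are hypothetical.

```python
import math
import random

def posterior_expectation(h, w_obs, t_i, z_i, delta_i, jump_times, jumps,
                          beta, mu, tau2, sigma2, n_draws=5000, seed=0):
    """Approximate E[h(A) | o_i] for a scalar random intercept A ~ N(mu, tau2).

    w_obs are the longitudinal measurements W_ij = A + eps_ij, eps_ij ~ N(0, sigma2).
    jump_times and jumps describe the step-function cumulative baseline hazard.
    """
    rng = random.Random(seed)
    n_i = len(w_obs)
    # Normal-normal conjugacy gives the law of A given the measurements.
    prec = 1.0 / tau2 + n_i / sigma2
    post_mean = (mu / tau2 + sum(w_obs) / sigma2) / prec
    post_sd = math.sqrt(1.0 / prec)
    # Total baseline hazard mass over the at-risk window (t_i, z_i].
    mass = sum(lam for y, lam in zip(jump_times, jumps) if t_i < y <= z_i)
    num = den = 0.0
    for _ in range(n_draws):
        a = rng.gauss(post_mean, post_sd)
        risk = math.exp(beta * a)                       # exp{beta * g * a} with g = 1
        # Survival part of the posterior; the factor Lambda{z_i}^delta cancels in the ratio.
        weight = risk ** delta_i * math.exp(-mass * risk)
        num += weight * h(a)
        den += weight
    return num / den

args = dict(w_obs=[1.0, 1.2, 0.8], t_i=0.0, z_i=2.0, delta_i=0,
            jump_times=[0.5, 1.5], jumps=[0.5, 0.5], mu=0.0, tau2=1.0, sigma2=0.5)
est_no_effect = posterior_expectation(lambda a: a, beta=0.0, **args)
est_with_effect = posterior_expectation(lambda a: a, beta=1.0, **args)
print(est_no_effect, est_with_effect)
```

When β = 0 the survival data carry no information about A and the estimate reduces to the mean implied by the longitudinal data alone. With β > 0, a censored subject who has survived the window is pulled toward smaller values of A.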

By taking the first derivative of the expected log modified likelihood calculated in the E-step with respect to each parameter, the NPMMLE, β̂, {λ̂k, k = 1, …, nu}, σ̂², μ̂, and Σ̂, can be obtained through the following formulae, where λk is the jump size of Λ at the kth sorted observed survival time,

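$$
\begin{aligned}
\hat{\lambda}_k &= \frac{1}{\sum_{i:\, t_i < y_k^0 \le z_i} E[\exp\{\beta g(y_k^0) A_i\} \mid o_i]}, \quad k = 1, \ldots, n_u, \\
\hat{\sigma}^2 &= \frac{1}{\sum_{i=1}^{n} n_i} \sum_{i=1}^{n} \sum_{j=1}^{n_i} E[(w_{ij} - g(s_{ij}) A_i)^2 \mid o_i], \\
\hat{\mu} &= \frac{1}{n} \sum_{i=1}^{n} E(A_i \mid o_i), \\
\hat{\Sigma} &= \frac{1}{n} \sum_{i=1}^{n} E[(A_i - \hat{\mu})(A_i - \hat{\mu})^T \mid o_i],
\end{aligned}
$$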

and β̂ is the root of the score s(β), which is solved by a one-step Newton-Raphson method with the updating rule

$$
\beta_{new} = \beta_{old} - \frac{s(\beta_{old})}{s'(\beta_{old})},
$$

where

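$$
\begin{aligned}
s(\beta) &= \sum_{i=1}^{n} \delta_i \left[ g(z_i) E(A_i \mid o_i)
 - \frac{\sum_{j:\, t_j < z_i \le z_j} E\big(g(z_i) A_j \exp\{\beta g(z_i) A_j\} \mid o_j\big)}
        {\sum_{j:\, t_j < z_i \le z_j} E\big(\exp\{\beta g(z_i) A_j\} \mid o_j\big)} \right], \\
s'(\beta) &= \sum_{i=1}^{n} \delta_i \left\{ \left[
   \frac{\sum_{j:\, t_j < z_i \le z_j} E\big(g(z_i) A_j \exp\{\beta g(z_i) A_j\} \mid o_j\big)}
        {\sum_{j:\, t_j < z_i \le z_j} E\big(\exp\{\beta g(z_i) A_j\} \mid o_j\big)} \right]^2
 - \frac{\sum_{j:\, t_j < z_i \le z_j} E\big((g(z_i) A_j)^2 \exp\{\beta g(z_i) A_j\} \mid o_j\big)}
        {\sum_{j:\, t_j < z_i \le z_j} E\big(\exp\{\beta g(z_i) A_j\} \mid o_j\big)} \right\}.
\end{aligned}
$$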
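The one-step Newton-Raphson update for β can be sketched as follows. This is an illustrative sketch rather than the authors' code. It assumes a scalar random effect with g ≡ 1, and it represents each posterior expectation by a weighted average over Monte Carlo draws stored with the subject. The dictionary keys and the two-subject data set are hypothetical.

```python
import math

def weighted_mean(sub, h):
    """Approximate E[h(A) | o] from a subject's weighted Monte Carlo draws."""
    total = sum(sub["weights"])
    return sum(w * h(a) for a, w in zip(sub["draws"], sub["weights"])) / total

def newton_step(beta, subjects):
    """Return beta - s(beta) / s'(beta) with g = 1 and a scalar random effect."""
    score, dscore = 0.0, 0.0
    for i in subjects:
        if i["delta"] != 1:
            continue                                   # only uncensored subjects contribute
        # Risk set at z_i, adjusted for left truncation: t_j < z_i <= z_j.
        risk_set = [j for j in subjects if j["t"] < i["z"] <= j["z"]]
        d = sum(weighted_mean(j, lambda a: math.exp(beta * a)) for j in risk_set)
        n1 = sum(weighted_mean(j, lambda a: a * math.exp(beta * a)) for j in risk_set)
        n2 = sum(weighted_mean(j, lambda a: a * a * math.exp(beta * a)) for j in risk_set)
        score += weighted_mean(i, lambda a: a) - n1 / d
        dscore += (n1 / d) ** 2 - n2 / d
    return beta - score / dscore

# Hypothetical toy data with degenerate posteriors (one draw per subject).
toy = [
    {"t": 0.0, "z": 1.0, "delta": 1, "draws": [1.0], "weights": [1.0]},
    {"t": 0.0, "z": 2.0, "delta": 1, "draws": [0.0], "weights": [1.0]},
]
beta_new = newton_step(0.0, toy)
print(beta_new)
```

With a single draw per subject the update coincides with a Newton step for the ordinary Cox partial likelihood with left-truncated risk sets, which provides a simple check of the formulae.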

Except for α, the proposed nonparametric maximum modified likelihood estimates (NPMMLE) of the parameters enjoy nice properties that are similar to those of the NPMLE, as illustrated in the next two theorems. Below we list some regularity conditions needed for the theorems.

  • C1

    The parameter space of the finite dimensional parameters, Sγ, is bounded and closed on Euclidean space. The true value γ0 is an interior point of Sγ.

  • C2

    On the parameter space of β, (exp{βg(S)A*}|Y* ≥ T*) is bounded below by m and above by M with probability 1.

  • C3

    P(Tτ1 and Yτ2) > 0. This ensures that not all data are truncated or censored.

  • C4

    Eθ0{exp[β0g(u)A*]I(T* < u ≤ Y*)|Y* ≥ T*} is bounded away from 0 on the parameter space of β. Here Eθ0(·) stands for the expectation taken under the true value of the parameter θ0.

  • C5

    g(t) is of uniformly bounded variation on [τ1, τ2], and there exists a constant D such that P(ni ≤ D) = 1, ∀ i.

  • C6

    The distribution fA*(·|α) is continuous with respect to α and has continuous second derivative with respect to α. Moreover, the Fisher information matrix obtained from fA* for α is positive definite.

Theorem 1

Consistency of the estimators. Under the regularity conditions C1–C5, the NPMMLE of (β0, σ0², Λ0), denoted as (β̂n, σ̂n², Λ̂n), is consistent under the Euclidean norm |·| and the supremum norm ‖·‖∞ on [τ1, τ2], respectively.

For H = {h = (h1, h2, h3)} and 0 < p < ∞, let Hp = {h ∈ H : ‖h1‖, |h2|, ‖h3‖υ ≤ p} be a collection of directions that are used in the Appendix. The notation ‖·‖υ denotes the total variation of the function in the norm plus the absolute value of this function evaluated at 0. The next theorem shows that the NPMMLE converges in distribution to a Gaussian element in the parameter space at the √n rate.

Theorem 2

Asymptotic normality and efficiency. Under the regularity conditions C1–C6, the process √n(α̂n − E(α̂n), σ̂n² − σ0², β̂n − β0, Λ̂n − Λ0) converges in distribution to a mean zero Gaussian process G in the functional space l∞(Hp) on Hp. Moreover, the NPMMLE β̂ is semiparametrically efficient for β0.

Proofs of these two theorems are provided in the Appendix.

For estimating the standard errors of the NPMMLE, we recommend using the bootstrap procedure instead of the profile likelihood approach in Murphy and van der Vaart (2000) and Zeng and Cai (2005), which did not work well for LTRC data due to the high fluctuation of the estimated profile likelihood function and possibly negative estimates of the standard error. The performance of the bootstrap procedure for estimating the standard errors of the NPMLE under joint modeling with right censoring has been studied by Tseng, Hsieh and Wang (2005) for the accelerated failure time model, and by Hsieh, Tseng and Wang (2006) for the Cox model. The results in these two papers support the validity of the bootstrap method in the scope of joint modeling. Our simulation results reported in Section 4 also support the use of the bootstrap approach. In comparison, the bootstrap method is more reliable than the profile likelihood method, albeit at a higher computational cost.
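The bootstrap procedure recommended above resamples whole subjects with replacement and refits the model on each resample. A generic sketch is shown below. The `estimator` argument is a placeholder for the full EM fit, which is far more expensive than the sample mean used here purely to keep the example runnable; the data are simulated and hypothetical.

```python
import random
import statistics

def bootstrap_se(data, estimator, n_boot=50, seed=0):
    """Nonparametric bootstrap standard error of estimator(data).

    data      : list with one record per subject (subjects are resampled as units)
    estimator : function mapping a list of records to a scalar estimate
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]  # draw subjects with replacement
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)

# Stand-in example: the "fit" is a sample mean of 200 hypothetical observations.
data_rng = random.Random(1)
data = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
se_hat = bootstrap_se(data, statistics.mean, n_boot=50, seed=0)
print(se_hat)
```

Resampling at the subject level keeps each subject's survival triplet together with its longitudinal measurements, so the within-subject dependence is preserved in every resample.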

4. Simulation Study

To verify numerically the validity of the proposed procedure, we conducted simulations under five different settings. Since there is an intrinsic bias on the longitudinal component, the simulations focus on the performance of the estimate of β and how it is affected by the level of contamination from the measurement errors and the variation of the random effects. As a benchmark setting, we considered a linear trend in time with random effects on the longitudinal covariate and assessed the influence of the variance of the random slope on the accuracy of estimating β. The left-truncation times are generated from an exponential distribution with parameter 1, while the right-censoring times are from an exponential distribution with parameter 3. The baseline hazard rate is that of an exponential distribution with mean 1. All five simulation settings have sample size n = 200 with true values β = 1, μ = (2, 0.5) and (σ11, σ12) = (0.5, −0.001). The values of (σ22, σ²) differ across the five settings and are set as: (0.01, 0.1), (0.01, 0.4), (0.01, 0.025), (0.0025, 0.1) and (0.04, 0.1). The first three settings demonstrate the impact of contamination by measurement errors, while the last two illustrate the effect of the variation of the random slope.
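A data-generating step of this kind can be sketched as follows. This is an illustrative reading of the first setting and not the authors' simulation code. It assumes g(t) = (1, t), a constant baseline hazard equal to 1, and truncation times with rate 1; censoring and the longitudinal measurements are omitted. Survival times are obtained by inverting the cumulative hazard, which has a closed form for a linear trajectory. Function names are hypothetical.

```python
import math
import random

def survival_time(e, a0, a1, beta, lam0):
    """Solve Lambda(t | a) = e for t when the hazard is lam0 * exp(beta * (a0 + a1 * t))."""
    k = lam0 * math.exp(beta * a0)
    c = beta * a1
    if abs(c) < 1e-12:
        return e / k                    # constant hazard
    arg = 1.0 + c * e / k
    if arg <= 0.0:
        return math.inf                 # only possible when c < 0: the event never occurs
    return math.log(arg) / c

def draw_enrolled(n, rng, beta=1.0, lam0=1.0, mu=(2.0, 0.5),
                  s11=0.5, s12=-0.001, s22=0.01):
    """Draw n enrolled subjects (a0, a1, t, y) satisfying y >= t."""
    out = []
    while len(out) < n:
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Bivariate normal random effects via a Cholesky factor.
        a0 = mu[0] + math.sqrt(s11) * z1
        a1 = mu[1] + s12 / math.sqrt(s11) * z1 + math.sqrt(s22 - s12 ** 2 / s11) * z2
        y_star = survival_time(rng.expovariate(1.0), a0, a1, beta, lam0)
        t_star = rng.expovariate(1.0)   # truncation time
        if y_star >= t_star:            # left truncation: keep only subjects still at risk
            out.append((a0, a1, t_star, y_star))
    return out

rng = random.Random(1)
enrolled = draw_enrolled(1000, rng)
mean_intercept = sum(rec[0] for rec in enrolled) / len(enrolled)
print(round(mean_intercept, 2))
```

Because β > 0, subjects with a large random intercept tend to fail before entry, so the average intercept among enrolled subjects falls below its population value of 2. This is the selection effect on the longitudinal component that the simulations are designed to expose.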

Simulation results based on 100 Monte Carlo samples are reported in Table 1. Results under the first three settings suggest that β can be estimated unbiasedly and that measurement errors affect the precision, but not the magnitude of the biases. As expected, a higher level of noise contamination leads to a less precise estimate of β and a higher chance of divergence in the algorithm. In all three settings, the variance of measurement errors can be estimated with high accuracy and precision. Comparing the results under the first, fourth and fifth settings in Table 1, we observe that the variance of the random slopes has little effect on the performance of β̂.

Table 1.

Simulation results under five settings with sample size 200 and varying values of σ22 and σ². The actual targets of the longitudinal estimates are conditional quantities, which are listed after the slash next to the true longitudinal value in the Parameter column. The mean and SD of the estimates based on 100 Monte Carlo samples are reported in the Average of NPMMLE and SE(MC) columns.

Case  Parameter (true/target)   Average of NPMMLE  SE(MC)     MSE     Convergence rate
1     β (1)                     0.9923             0.1633     0.0267  98%
      σ² (0.1)                  0.0998             0.0021     5e-6
      μ1 (2/1.73)               1.7461             0.0478     0.0668
      μ2 (0.50/0.50)            0.4545             0.0985     0.0118
      σ11 (0.50/0.45)           0.4634             0.0527     0.0041
      σ12 (−0.001/−0.001)       −0.0424            0.0453     0.0038
      σ22 (0.01/0.01)           0.0738             0.0409     0.0057

2     β (1)                     0.9185             0.1765     0.0378  72%
      σ² (0.4)                  0.4003             0.0086     7e-5
      μ1 (2/1.74)               1.7455             0.0531     0.0676
      μ2 (0.50/0.50)            0.3801             0.1640     0.413
      σ11 (0.5/0.45)            0.4730             0.0505     0.0033
      σ12 (−0.001/−0.001)       −0.1122            0.0917     0.0208
      σ22 (0.01/0.01)           0.1856             0.1156     0.0442

3     β (1)                     1.0380             0.1548     0.0254  96%
      σ² (0.025)                0.0250             4.8283e-4  ≃0
      μ1 (2/1.73)               1.7443             0.0468     0.0676
      μ2 (0.50/0.50)            0.4900             0.0643     0.0042
      σ11 (0.50/0.45)           0.4520             0.0534     0.0052
      σ12 (−0.0004/−0.0004)     −0.0193            0.0338     0.0015
      σ22 (0.01/0.01)           0.0571             0.0219     0.0027

4     β (1)                     0.9684             0.1504     0.0236  98%
      σ² (0.1)                  0.0997             0.0023     5e-6
      μ1 (2/1.74)               1.7464             0.0460     0.0664
      μ2 (0.50/0.50)            0.4491             0.0948     0.0116
      σ11 (0.5/0.45)            0.4497             0.0423     0.0043
      σ12 (−0.001/−0.0007)      −0.0439            0.0518     0.0045
      σ22 (0.0025/0.0025)       0.0797             0.0479     0.0072

5     β (1)                     0.9934             0.1567     0.0246  95%
      σ² (0.1)                  0.0996             0.0020     4e-6
      μ1 (2/1.74)               1.7522             0.0464     0.0636
      μ2 (0.50/0.50)            0.4498             0.1168     0.0162
      σ11 (0.50/0.45)           0.4559             0.0442     0.0039
      σ12 (−0.001/−0.002)       −0.0433            0.0692     0.0066
      σ22 (0.04/0.04)           0.1186             0.0642     0.0159
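As a reading aid for Table 1, the reported MSE values appear to be consistent with the usual decomposition MSE = bias² + SE², with the bias measured against the true (unconditional) parameter value. The short check below uses three rows from the first setting as transcribed from the table; the tolerance allows for rounding in the reported figures.

```python
# (label, true value, average of NPMMLE, SE(MC), reported MSE), setting 1 of Table 1
rows = [
    ("beta", 1.0, 0.9923, 0.1633, 0.0267),
    ("mu1", 2.0, 1.7461, 0.0478, 0.0668),
    ("mu2", 0.5, 0.4545, 0.0985, 0.0118),
]

def mse_from_bias_and_se(true_value, average, se):
    """Mean squared error as squared bias plus variance."""
    bias = average - true_value
    return bias ** 2 + se ** 2

recomputed = {label: mse_from_bias_and_se(tv, avg, se) for label, tv, avg, se, _ in rows}
for label, _, _, _, reported in rows:
    print(label, round(recomputed[label], 4), reported)
```

For μ1 almost all of the MSE comes from the squared bias, whereas for β it comes almost entirely from the sampling variance.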

The results for the longitudinal part echo the above discussion of the non-identifiability of the parameter α, as the means of the random intercept and random slopes (shown in the Average of NPMMLE column of Table 1) are consistently underestimated. The actual targets of the estimates are the conditional quantities given after the slash in the Parameter column of Table 1. The sizes of the biases vary with the level of truncation probability and the size of the measurement errors, and can be very small for the mean of the random slope, e.g. in setting 3, where the measurement error is small. Thus, this bias problem in estimating the longitudinal component may elude researchers, while it is a cause of substantial concern in settings with large error variances.

To make statistical inference about the parameters of interest, it is necessary to obtain an estimate of the standard error of the NPMMLE, especially for β. We tried the approaches in Murphy and van der Vaart (2000) and Louis (1982), but neither worked, so we propose to use a bootstrap method (Tseng, Hsieh and Wang, 2005) for estimating the standard error of the NPMMLE and present the results in Table 2. Only the results for estimating the standard errors of β̂ and σ̂² are shown, since these parameters are estimable. Table 2 supports the use of the bootstrap procedure, as the estimated standard error from the bootstrap method is close to the standard deviation from the 100 Monte Carlo samples, even when the degree of error contamination is large or the random slopes vary widely.

Table 2.

Performance of the bootstrap standard error estimates, SE(BT), of β̂ and σ̂², based on 50 resamples.

Case  Parameter     SE(MC)     SE(BT)
1     β (1)         0.1633     0.1692
      σ² (0.1)      0.0021     0.0020

2     β (1)         0.1765     0.1813
      σ² (0.4)      0.0086     0.0091

3     β (1)         0.1548     0.1523
      σ² (0.025)    4.8283e-4  5e-4

4     β (1)         0.1504     0.1539
      σ² (0.1)      0.0023     0.0020

5     β (1)         0.1567     0.1531
      σ² (0.1)      0.0020     0.0021

5. Data example: multi-center HIV study

In this section, we conduct an analysis on the data from a multi-center HIV study in Italy. Details of the study design and a previous analysis can be found in Rezza et al. (1989) and The-Italian-Seroconversion-Study (1992). There were 448 HIV-positive patients in the data. The primary event of interest is the incubation period of acquired immunodeficiency syndrome (AIDS), i.e. time (in years) from detection of HIV-infection until the onset of AIDS. There were 140 patients who received the HAART treatment at various times, resulting in a longitudinal treatment indicator that is fully observable, so no modeling of this process is necessary. However, there is a second longitudinal covariate, the CD4 counts, that are observed only intermittently at follow-up visits, motivating the need to model the survival and longitudinal covariates jointly. The main biomedical interest lies in determining the effect of the HAART treatment on reducing the risk of developing AIDS, and the association between the incubation period of AIDS and CD4 T-cell counts in HIV-infected subjects.

For each of the 448 subjects in the study, the longitudinal measurements of CD4 T-cell counts were recorded intermittently along with the time to AIDS or dropout from the study. The total number of longitudinal measurements is 4442 and the average number of longitudinal measurements is 9.92 per patient.

One feature of these data is that the incubation period is subject to left-truncation and right-censoring, since patients were recruited to the study at various times after the study began, and only patients who had not developed AIDS at the time of recruitment are included in the study. Moreover, only 147 out of the 448 patients (about 33%) developed AIDS by the end of the study, so the right censoring rate is quite high for these data.

To model the longitudinal CD4 counts, we adopt a linear mixed effects model on log(CD4 + 1) with changing intercepts and slopes at the time of HAART treatment. Thus,

$$
W_i(s_{ij}) = X_i(s_{ij}) + \varepsilon_{ij} = A_{i0} + A_{i1} s_{ij} + A_{i2} I(s_{ij} > V_i) + A_{i3} s_{ij} I(s_{ij} > V_i) + \varepsilon_{ij},
$$

where εij is from a normal distribution N(0, σ²), Ai = (Ai0, Ai1, Ai2, Ai3) is from a 4-dimensional multivariate normal distribution with a 4 × 1 mean vector μ and a 4 × 4 covariance matrix Σ, and Vi represents the time, measured from becoming HIV-positive, at which the subject received HAART. For those who never received HAART, Vi is defined to be infinity. For the time-to-AIDS, we assume a Cox model with Xi(t), the underlying log(CD4 + 1) process, as a time-dependent covariate along with another time-dependent covariate, the treatment indicator I(t > Vi), which is completely observed. The resulting model is:

$$
\lambda(t \mid A_i) = \lambda_0(t) \exp\{\beta_1 X_i(t) + \beta_2 I(t > V_i)\}.
$$

From the EM algorithm with Monte Carlo approximation, the coefficient β̂1 for the underlying log(CD4 + 1) process is estimated to be −0.5762 (p-value < 0.001), while the coefficient β̂2 for the longitudinal treatment indicator is estimated to be −1.2189 (p-value < 0.001). As expected, CD4 counts are negatively associated with the risk of AIDS. A one-unit decline in log(CD4 + 1) is associated with a 78% increase in the risk of AIDS. In addition to its effect on CD4 counts, HAART has an additional effect on reducing the risk of AIDS: it significantly reduces the risk of developing AIDS by 70% after controlling for the CD4 counts. Through the analysis, we confirm that HAART effectively reduces the risk of developing AIDS, both through its positive association with patients’ CD4 counts and through a direct effect on the risk of developing AIDS.
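The percentages quoted above follow from exponentiating the estimated coefficients of the Cox model. The short computation below reproduces them from the two reported estimates; the variable names are chosen for illustration.

```python
import math

beta1_hat = -0.5762   # coefficient of the log(CD4 + 1) process
beta2_hat = -1.2189   # coefficient of the HAART indicator

# Hazard ratio for a one-unit DECLINE in log(CD4 + 1): exp(-beta1).
hr_cd4_decline = math.exp(-beta1_hat)
# Hazard ratio for being on HAART versus not, at the same CD4 level: exp(beta2).
hr_haart = math.exp(beta2_hat)

pct_increase = 100.0 * (hr_cd4_decline - 1.0)
pct_reduction = 100.0 * (1.0 - hr_haart)
print(round(pct_increase), round(pct_reduction))
```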

6. Conclusions and discussion

We have shown, both theoretically and empirically, that jointly modeling the time-to-event and longitudinal covariates is an effective approach when the time-to-event is subject to both left truncation and right censoring (LTRC). However, the extension from right censoring to LTRC is not trivial. By modifying the joint likelihood, we have shown that the NPMMLE leads to consistent and asymptotically efficient estimation of the survival component and the measurement error variance under a semiparametric Cox model. We have also demonstrated that the corresponding EM algorithm for locating the NPMMLE has good empirical performance and asymptotic properties under the assumption of normal random effects. It is not only computationally effective but also robust against departures from the normality assumption.

However, one caveat is the estimability of the longitudinal component. Although we can recover the conditional distribution of the longitudinal parameter, α, given Y ≥ T, the parameter α itself cannot be estimated properly through the modified likelihood because of the biased sampling plan. Additional strong and possibly unverifiable assumptions might be needed in order to recover the parameter α of the random effects. What we have accomplished in this paper is to remove the bias in the estimation of the survival components that is attributable to the discrete measurement schedule and the measurement errors of the longitudinal covariates, thus permitting asymptotically valid and efficient inference for the survival-related parameters, which are crucial for the evaluation of therapies.

A final issue of interest is the prediction of survival probabilities. In the presence of a time-dependent covariate, the concept of the hazard rate function itself is based on a conditional probability formulated as

$$\lambda(t \mid \bar{X}(t))\,\Delta t = P(T \in [t, t + \Delta t] \mid \bar{X}(t), T \ge t).$$

Here both internal and external covariates can be included, although the internal covariate up to time t implicitly contains the information that the subject survives up to time t. This does not cause any problem in the definition of the hazard function, since it is conditional on T ≥ t. However, if we consider the (subject-specific) survival probability, P(T ≥ t | X̄(t)), then its value is actually 1 (Sec. 6.3.2, Kalbfleisch and Prentice 2002), since the process X(t) itself contains the information that T ≥ t. This undesirable feature can be avoided under the framework of joint modeling, as the latent longitudinal covariate X(t) is completely determined by the random effect A and the time point t through the longitudinal submodel. Consequently, the survival probability should be defined as P(T ≥ t | A). In fact, it is meaningful to predict future survival probabilities, P(T ≥ s | A), for any s > t. This is one of the benefits of joint modeling in the presence of (internal) longitudinal covariates: it is possible to make predictions, albeit with some error, whereas this is not possible under the partial likelihood approach. In summary, the joint modeling approach affords a more meaningful definition of the present and future survival probability through P(T ≥ s | A), where A is the random term linking the two submodels. Evaluating the prediction errors and the associated statistical inference could be an interesting future research project.
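To make the quantity P(T ≥ s | A) concrete, the sketch below evaluates exp{−∫ λ0(u) exp(β1 X(u) + β2 I(u > V)) du} over [0, s] by a midpoint rule, with X(u) given by the piecewise-linear longitudinal submodel of the data analysis. The constant baseline hazard, the function name, and all numerical inputs are assumptions made for illustration; the paper leaves λ0 unspecified and does not prescribe this computation.

```python
import math

def survival_given_random_effects(s, a, v, beta1, beta2, baseline_hazard, steps=2000):
    """Midpoint-rule approximation of P(T >= s | A = a) under the Cox submodel."""
    def linear_predictor(u):
        on_haart = 1.0 if u > v else 0.0
        x_u = a[0] + a[1] * u + on_haart * (a[2] + a[3] * u)  # latent X(u)
        return beta1 * x_u + beta2 * on_haart

    width = s / steps
    cum_hazard = 0.0
    for k in range(steps):
        u = (k + 0.5) * width  # midpoint of the k-th subinterval
        cum_hazard += baseline_hazard(u) * math.exp(linear_predictor(u)) * width
    return math.exp(-cum_hazard)

# Hypothetical inputs: a constant baseline hazard and made-up random effects.
constant_baseline = lambda u: 0.5
a_example = (6.0, -0.1, 0.2, 0.05)
for s in (2.0, 5.0, 8.0):
    p = survival_given_random_effects(s, a_example, 5.0, -0.5762, -1.2189, constant_baseline)
    print(s, round(p, 4))
```

Because the trajectory is a deterministic function of A and time, the same routine can be evaluated at any future s, which is the point made in the paragraph above.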

Acknowledgments

The authors thank the Associate Editor and reviewers for insightful comments. This work is partially supported by an NIH grant 1R01AG025218-01.

Appendix

A.1. Likelihood and the score equations

By imposing a normality assumption N(μ, Σ) on the random effects Ai, the full likelihood in (2.3) from the ith subject becomes

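$$L_i^O \propto \frac{f_T(t_i)\,\sigma^{-n_i}\int \lambda_0(z_i)^{\delta_i}\exp\Big\{\delta_i\beta g(z_i)a_i - \sum_{j=1}^{n_i}[w_{ij} - g(s_{ij})a_i]^2/(2\sigma^2)\Big\}\,Q_1(z_i, a_i)\,da_i}{\int_0^\infty\!\int Q_1(t, a_i)\,f_T(t)\,da_i\,dt},$$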

where $Q_1(u, a) = \exp\Big\{-\int_0^u \exp[\beta g(t)a]\,d\Lambda_0(t) - (a - \mu)^T\Sigma^{-1}(a - \mu)/2\Big\}$. Following arguments similar to those in Wang (1987), combined with Vardi (1985), we can prove that the NPMLEs of all finite-dimensional parameters are the same as those from the conditional likelihood of (zi, δi, w⃗i) given $(Y_i > t_i)$. Moreover, by a proof similar to that of the classical Cox model for right censored data, the NPMLE from the conditional likelihood is attained by discrete baseline hazard functions that assign positive masses only at uncensored survival times, $(y_1^0, \ldots, y_{n_u}^0)$.

Let oi = (ti, zi, δi, w⃗i) denote the observed data for the ith subject. The first derivative of the log full likelihood leads to the following score functions:

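$$s^o_{\sigma^2} = \sum_{i=1}^n\Big\{\sum_{j=1}^{n_i}E\{[w_{ij} - A_i g(s_{ij})]^2 \mid o_i\} - n_i\sigma^2\Big\}\Big/\sigma^3,$$
$$s^o_{\mu} = \Sigma^{-1}\sum_{i=1}^n E\{(A_i - \mu) - [E(A_i \mid Y_i \ge T_i) - \mu] \mid o_i\} = \Sigma^{-1}\sum_{i=1}^n[E(A_i \mid o_i) - E(A_i \mid Y_i \ge T_i)],$$
$$s^o_{\Sigma} = \frac{1}{2}\Sigma^{-1}\sum_{i=1}^n\Big\{E[(A_i - \mu)(A_i - \mu)^T \mid o_i] - E[(A_i - \mu)(A_i - \mu)^T \mid Y_i \ge T_i]\Big\}\Sigma^{-1},$$
$$s^o_{\Lambda_k} = \frac{1}{\Lambda_k} - \sum_{i:\,t_i < y_k^0 \le z_i}E\{\exp[\beta g(y_k^0)A_i] \mid o_i\} - Q_2(y_k^0),$$
$$s^o_{\beta} = \sum_{i=1}^n\delta_i g(y_i)E(A_i \mid o_i) - \sum_{i=1}^n\sum_{j:\,t_i < y_j^0 \le z_i}\Lambda_j E\{g(y_j^0)A_i\exp[\beta g(y_j^0)A_i] \mid o_i\} - Q_3,$$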

where

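$$Q_2(y) = \sum_{i:\,y \le t_i}E\{\exp[\beta g(y)A_i] \mid o_i\} - nE\{\exp[\beta g(y)A_i]\,I(y \le T_i) \mid Y_i \ge T_i\},$$
$$Q_3 = \sum_{i=1}^n\sum_{j:\,y_j^0 \le t_i}\Lambda_j E\{g(y_j^0)A_i\exp[\beta g(y_j^0)A_i] \mid o_i\} - nE\Big\{\sum_{j:\,y_j^0 \le T_i}\Lambda_j g(y_j^0)A_i\exp[\beta g(y_j^0)A_i] \,\Big|\, Y_i \ge T_i\Big\}.$$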
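As a reading aid, the origin of Q2 can be sketched as follows. This is our own reconstruction from the form of the full likelihood, not a derivation given by the authors: it assumes that the risk sum in the full-likelihood score runs over all subjects with event or censoring time at or after the jump point, and that the normalizing denominator contributes the term with the factor n. Splitting the risk sum at the truncation time then gives

```latex
\begin{aligned}
\frac{1}{\Lambda_k}
  &- \sum_{i:\, y_k^0 \le z_i} E\{\exp[\beta g(y_k^0) A_i] \mid o_i\}
   + n\,E\{\exp[\beta g(y_k^0) A_i]\, I(y_k^0 \le T_i) \mid Y_i \ge T_i\} \\
  &= \frac{1}{\Lambda_k}
   - \sum_{i:\, t_i < y_k^0 \le z_i} E\{\exp[\beta g(y_k^0) A_i] \mid o_i\}
   - Q_2(y_k^0).
\end{aligned}
```

Under this reading, Q2 collects exactly the contributions of subjects who have not yet entered the study at the jump point, net of their expected value, so dropping it leaves the usual left-truncated risk set. An analogous split of the double sum would give Q3.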

The score equations, $s^o_{\mu}$ and $s^o_{\Sigma}$, corresponding to the longitudinal data reveal that the estimable terms are the conditional mean and covariance matrix of the random effects given that Y* ≥ T rather than μ and Σ.

The score functions for λk, k = 1, …, nu, and β have more complicated forms than those from a partial likelihood under the standard Cox model subject to LTRC. The complication is due to the additional terms Q2 and Q3, which require estimation of the expectations of nonlinear functions of the observed data along with the parameters of interest. If we drop these two terms from $s^o_{\Lambda_k}$ and $s^o_{\beta}$, the modified score functions, $s_{\Lambda_k} = s^o_{\Lambda_k} + Q_2(y_k^0)$ and $s_{\beta} = s^o_{\beta} + Q_3$, are exactly the score functions from the modified likelihood. The next lemma validates the use of the modified likelihood (2.5).

Lemma 1

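  1. $E_{\theta_0}(s_{\Lambda_k}) = E_{\theta_0}(s^o_{\Lambda_k})$ and $E_{\theta_0}(s_{\beta}) = E_{\theta_0}(s^o_{\beta})$. This provides Fisher consistency of the estimators from (2.5).

  2. Under the regularity conditions for the law of large numbers and Slutsky's theorem, $n^{-1}(s_{\Lambda_k} - s^o_{\Lambda_k}) = o_p(1)$ and $n^{-1}(s_{\beta} - s^o_{\beta}) = o_p(1)$.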

Proof

The proof follows from a simple derivation and applications of the law of large numbers along with Slutsky's theorem.

This lemma demonstrates the asymptotic equivalence of the score functions for the survival-related parameters from (2.3) and (2.5). The latter is computationally simpler to maximize and thus more attractive than the full likelihood.
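The score functions above are built from conditional expectations of functions of the random effects given the observed data, which generally have no closed form. One generic way to approximate such an expectation is self-normalized Monte Carlo: draw from the random-effects distribution and weight each draw by the likelihood of the observed data. The sketch below does this for a deliberately simplified setting (a scalar random slope and weights based on the longitudinal measurements only, so that the answer can be checked against the closed-form normal posterior mean). It is an assumption-laden illustration, not the authors' E-step, which would also include the survival contribution in the weights.

```python
import math
import random

def mc_conditional_mean(h, w_obs, s_times, mu, tau, sigma, n_draws=20000, seed=0):
    """Approximate E[h(A) | w] for A ~ N(mu, tau^2) and w_j = A*s_j + eps_j,
    eps_j ~ N(0, sigma^2), by self-normalized importance sampling."""
    rng = random.Random(seed)
    numerator = 0.0
    denominator = 0.0
    for _ in range(n_draws):
        a = rng.gauss(mu, tau)  # draw from the random-effects distribution
        # Weight: likelihood of the observed measurements given A = a.
        squared_error = sum((w - a * s) ** 2 for w, s in zip(w_obs, s_times))
        weight = math.exp(-squared_error / (2 * sigma ** 2))
        numerator += h(a) * weight
        denominator += weight
    return numerator / denominator

# Hypothetical measurements at times 1 and 2.
w_obs, s_times = [1.1, 1.9], [1.0, 2.0]
estimate = mc_conditional_mean(lambda a: a, w_obs, s_times, mu=0.0, tau=1.0, sigma=1.0)
# Closed-form posterior mean for this conjugate normal case, used as a check.
exact = sum(s * w for s, w in zip(s_times, w_obs)) / (1.0 + sum(s * s for s in s_times))
print(estimate, exact)
```

Replacing the identity function by a nonlinear function of the random effect gives the kind of term that enters the score functions for the survival parameters.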

A.2. Proof of the consistency of the NPMMLE

The proof of consistency includes four major steps and is elaborated below.

STEP 1. Existence of the NPMMLE of (γ, Λ)

We begin by proving that the candidates for the maximizer, $\Lambda_{n_u}$, have a finite and bounded jump at each observed survival time. For simplicity, we use the vector $\vec{\lambda}_{n_u} = (\lambda_1, \ldots, \lambda_{n_u})$ to express the jump sizes of $\Lambda_{n_u}$ at the ordered survival times. The boundedness of the jump sizes can be demonstrated by proving the existence of an upper bound B ∈ ℝ by contradiction. Suppose that for any B ∈ ℝ, there exist $\vec{\lambda}_{n_u,B} = (\lambda_{1,B}, \ldots, \lambda_{n_u,B}) \in \mathbb{R}^{n_u}\setminus[0, B]^{n_u}$ and $\gamma_B \in S_\gamma$ such that $L_m(\gamma_B, \vec{\lambda}_{n_u,B}) > L_m(\gamma, \vec{\lambda}_{n_u})$ for all $(\gamma, \vec{\lambda}_{n_u}) \in S_\gamma \times [0, B]^{n_u}$. The first part of $L_m(\gamma_B, \vec{\lambda}_{n_u,B})$ contributed by the ith subject is bounded above by

$$\big(\Lambda_{n_u,B}\{z_i\}\,M\big)^{\delta_i} \times \exp\Big\{-m\sum_{j:\,t_i < y_j^0 \le z_i}\lambda_{j,B}\Big\},$$

where m and M are defined in assumption C.2. Since $\vec{\lambda}_{n_u,B} \in \mathbb{R}^{n_u}\setminus[0, B]^{n_u}$, at least one jump size, say $\lambda_{i_0,B}$, is greater than B. This implies that $\sum_{j:\,t_{i_0} < y_j^0 \le z_{i_0}}\lambda_{j,B} > B$, and hence that $L_m(\gamma_B, \vec{\lambda}_{n_u,B}) \to 0$ as B → ∞. Thus $L_m(\gamma, \vec{\lambda}_{n_u}) = 0$ for all $(\gamma, \vec{\lambda}_{n_u}) \in S_\gamma \times \mathbb{R}^{n_u}$, which is a contradiction. This demonstrates the boundedness of the jump sizes of $\Lambda_{n_u}$. Together with the compactness of $S_\gamma$ provided by assumption C.1, this establishes the existence of the NPMMLE of (γ, Λ).

STEP 2. Almost sure boundedness of Λ̂(τ2) as n → ∞

For any fixed sample size n, the estimated cumulative hazard function evaluated at the endpoint of the study can be expressed as

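$$\hat{\Lambda}(\tau_2) = \sum_{k=1}^n\frac{\delta_k I(z_k \le \tau_2)}{\sum_{i=1}^n E_{\hat\theta}[\exp\{\hat\beta g(z_k)A_i\} \mid o_i]\,I(t_i < z_k \le z_i)} \le \sum_{k=1}^n\frac{\delta_k I(z_k \le \tau_2)}{m\sum_{i=1}^n I(t_i < z_k \le z_i)} \le \sum_{k=1}^n\frac{\delta_k I(z_k \le \tau_2)}{m\sum_{i=1}^n I(t_i \le \tau_1)\,I(\tau_2 \le z_i)}, \quad \text{(A.1)}$$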

where m is the lower bound of exp{βg(Y)A*} given Y* ≥ T*, which exists under assumption C.2. By the law of large numbers and the continuous mapping theorem, we have the following two limits as n → ∞:

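$$\frac{1}{n}\sum_{k=1}^n\delta_k I(z_k \le \tau_2) \to E(\Delta I(Y \le \tau_2)) < 1, \quad\text{and}\quad \frac{1}{\frac{1}{n}\sum_{i=1}^n I(t_i \le \tau_1 \text{ and } \tau_2 \le z_i)} \to \frac{1}{P(T \le \tau_1 \text{ and } Y \ge \tau_2)} < \infty, \quad \text{(A.2)}$$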

where the finiteness of the second limit follows from assumption C.3. Therefore, there exists an upper bound of Λ̂(τ2) even when n goes to infinity. Moreover, since the terms inside the summation in (A.1) are all strictly positive, Λ̂(τ2) is always greater than 0. Thus Λ̂(τ2) has been shown to be bounded almost surely as n → ∞.
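The estimator in (A.1) is a Breslow-type sum in which subject i contributes to the risk set at an event time only if it has already entered the study and has not yet left. A minimal sketch of that structure is given below. The function and argument names are ours, and the `weight` callable is a stand-in (an assumption) for the conditional expectations in the denominator of (A.1), which in practice come from the E-step.

```python
def cumulative_hazard(t, entry_times, exit_times, events, weight):
    """Breslow-type estimate with delayed entry:
    sum over events z_k <= t of 1 / sum_i weight(i, z_k) * I(t_i < z_k <= z_i).
    Assumes every subject enters strictly before it exits (t_i < z_i)."""
    total = 0.0
    for z_k, d_k in zip(exit_times, events):
        if d_k == 1 and z_k <= t:
            at_risk = sum(
                weight(i, z_k)
                for i, (t_i, z_i) in enumerate(zip(entry_times, exit_times))
                if t_i < z_k <= z_i  # entered before z_k and still under observation
            )
            total += 1.0 / at_risk
    return total

# Toy data: three subjects, events for the first and third.
entry_times = [0.0, 1.0, 2.0]
exit_times = [3.0, 4.0, 5.0]
events = [1, 0, 1]
unit_weight = lambda i, u: 1.0
print(cumulative_hazard(5.0, entry_times, exit_times, events, unit_weight))
```

If every weight is at least m, each denominator is at least m times the size of the risk set, which is the first inequality in (A.1).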

STEP 3. Uniform convergence of $(\hat\sigma_n^2, \hat\beta_n, \hat\Lambda_n)$ to $(\sigma_0^2, \beta_0, \Lambda_0)$

We have shown in Step 2 that Λ̂(τ2) is finite. Combining this with the fact that Λ̂ is a right-continuous and nondecreasing step function, the Helly selection theorem implies that there exists a subsequence of Λ̂ converging pointwise to a right-continuous and monotone function Λ* with probability 1. Moreover, by the Bolzano-Weierstrass theorem, there is a sub-subsequence of γ̂ that converges to some γ*. Therefore, there exists a sub-subsequence of θ̂n, denoted by θ̂η(n), that converges to θ* = (γ*, Λ*). We next show that $\theta^* = (\alpha_0, \sigma_0^2, \beta_0, \Lambda_0)$, where α0 is the limit of α̂. Here a new term, defined as

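$$\bar{\Lambda}_n(t) = \frac{1}{n}\sum_{k=1}^n\frac{\delta_k I(z_k \le t)}{\frac{1}{n}\sum_{i=1}^n E_{\theta_0}[\exp\{\beta_0 g(z_k)A_i\} \mid o_i]\,I(t_i < z_k \le z_i)},$$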

is introduced to serve as a bridge between Λ̂n and Λ0.

We first show the convergence of Λ̄n to Λ0 as follows. We will use the property that the class of all functions from a closed set to ℝ that are uniformly bounded and of bounded variation is Glivenko-Cantelli. Consider the denominator of Λ̄n. The assumptions imply that functions of the form $u \mapsto E_{\theta_0}[\exp\{\beta_0 g(u)A^*\}\,I(T < u \le Y) \mid o]$, where o denotes the observed data of a subject, are uniformly bounded and of bounded variation, so the class of these functions is Glivenko-Cantelli. Therefore,

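$$\frac{1}{n}\sum_{i=1}^n E_{\theta_0}[\exp\{\beta_0 g(u)A_i\} \mid o_i]\,I(t_i < u \le z_i) \to E_{\theta_0}[\exp\{\beta_0 g(u)A\}\,I(T < u \le Y) \mid Y \ge T] \quad \text{(A.3)}$$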

uniformly on [τ1, τ2]. Together with assumption C.4, this yields the uniform convergence of the reciprocal of the left-hand side of (A.3) to the reciprocal of the right-hand side. Moreover, uniform boundedness and bounded variation of the functions t ↦ ΔI(Y ≤ t) imply the Glivenko-Cantelli property of the class consisting of them. Thus, we also have

$$\frac{1}{n}\sum_{i=1}^n\Delta_i I(Y_i < t) \to E_{\theta_0}[\Delta I(Y < t)] \quad \text{(A.4)}$$

uniformly on [τ1, τ2]. Since $\Lambda_0(t) = E\left[\dfrac{\Delta I(Y \le t)}{E\{\exp[\beta_0 g(u)A]\,I(T \le u \le Y) \mid Y \ge T\}\big|_{u=Y}}\right]$, combining the convergence of the reciprocals in (A.3) with (A.4), we obtain that Λ̄n converges uniformly to Λ0 on [τ1, τ2]. By considering the uniform convergence of the ratio Λ̂{u}/Λ̄{u} to dΛ*(u)/dΛ0(u) for u ∈ [τ1, τ2], as demonstrated on pages 2146–2147 in Zeng and Cai (2005), the uniform convergence of Λ̂ to Λ* is established. The remaining task is to prove the equivalence of θ* = (γ*, Λ*) and $\theta_0 = (\beta_0, \sigma_0, \alpha, \Lambda_0)$. This can be done by considering the empirical mean of the distance between $\lim(\hat\theta_n)$ and $\lim(\beta_0, \sigma_0, \alpha, \bar\Lambda_n)$ and demonstrating that $E_{\theta_0}[l_m(\theta)/l_m(\theta_0)] = 0$ almost surely, as shown on page 910 in Dupuy, Grama and Mesbah (2006). Thus $(\hat\sigma^2, \hat\beta, \hat\Lambda)$ converges uniformly to $(\sigma_0^2, \beta_0, \Lambda_0)$.

A.3. Proof of asymptotic normality of the NPMMLE

We will apply Theorem 3.3.1 in van der Vaart and Wellner (1996) to prove the asymptotic normality of the NPMMLE (γ̂, Λ̂). The proof consists of four steps to verify each of the four conditions in their theorem.

STEP 1. Fréchet differentiability of the score functions

To simplify notation, the parameter σ2 will be combined with α into γ1 = (σ2, α), so that the single parameter γ1 denotes the parameters of the measurement error ε and the latent random variable A*. Thus, the new parameter vector is θ = (γ1, β, Λ).

Consider a one-dimensional submodel along the direction (h1, h2, h3) of the form

$$\theta_t = (\gamma_1 + th_1,\ \beta + th_2,\ \Lambda_t(h_3)),$$

where

$$\Lambda_t(h_3)(\cdot) = \int_0^{\cdot}(1 + th_3(u))\,d\Lambda(u),$$

$h_1 \in \mathbb{R}^d$, $h_2 \in \mathbb{R}$, and $h_3$ is a function of bounded variation on [0, τ2]. Let $H = \{h = (h_1, h_2, h_3)\}$ and $H_p = \{h \in H : \|h_1\|, |h_2|, \|h_3\|_\upsilon \le p\}$. The notation ‖·‖υ denotes the absolute value evaluated at 0 plus the total variation of the argument. The imputed log-likelihood contributed by the ith subject evaluated at θ, given the current parameter value θ̃, is denoted by $l_{\tilde\theta,i}(\theta)$. The corresponding score function of the local parameter t is

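$$\frac{\partial}{\partial t}l_{\tilde\theta,i}(\theta_t) = h_2E_{\tilde\theta}\Big[\delta_i g(z_i)A_i - \int_{t_i}^{z_i}g(u)A_i\exp\{(\beta + th_2)g(u)A_i\}(1 + th_3(u))\,d\Lambda(u) \,\Big|\, o_i\Big] - E_{\tilde\theta}\Big[\int_{t_i}^{z_i}h_3(u)\exp\{(\beta + th_2)g(u)A_i\}\,d\Lambda(u) \,\Big|\, o_i\Big] + h_1^TE_{\tilde\theta}\Big[\frac{\partial}{\partial\gamma_1}\log f_{\varepsilon,A}(\varepsilon_i, A_i \mid \gamma_1 + th_1) \,\Big|\, o_i\Big] + \frac{\delta_i h_3(z_i)}{1 + th_3(z_i)}.$$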

Thus the imputed score function of t contributed by the n subjects evaluated at t = 0 is

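$$S_{n,\tilde\theta}(\theta)(h) = \frac{1}{n}\sum_{i=1}^n\frac{\partial}{\partial t}l_{\tilde\theta,i}(\theta_t)\Big|_{t=0} = h_1^TS_{n,\tilde\theta,1}(\theta) + h_2S_{n,\tilde\theta,2}(\theta) + S_{n,\tilde\theta,3}(\theta)(h_3), \quad \text{(A.5)}$$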

where

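$$S_{n,\tilde\theta,1}(\theta) = \frac{1}{n}\sum_{i=1}^nE_{\tilde\theta}\Big[\frac{\partial}{\partial\gamma_1}\log f_{\varepsilon,A}(\varepsilon_i, A_i \mid \gamma_1) \,\Big|\, o_i\Big],$$
$$S_{n,\tilde\theta,2}(\theta) = \frac{1}{n}\sum_{i=1}^nE_{\tilde\theta}\Big[\delta_i g(z_i)A_i - \int_{t_i}^{z_i}g(u)A_i\exp\{\beta g(u)A_i\}\,d\Lambda(u) \,\Big|\, o_i\Big],$$
$$S_{n,\tilde\theta,3}(\theta)(h_3) = \frac{1}{n}\sum_{i=1}^n\Big\{\delta_i h_3(z_i) - E_{\tilde\theta}\Big[\int_{t_i}^{z_i}h_3(u)\exp\{\beta g(u)A_i\}\,d\Lambda(u) \,\Big|\, o_i\Big]\Big\}.$$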

By defining $\theta(h) = (\gamma_1, \beta, \Lambda)(h_1, h_2, h_3) = h_1^T\gamma_1 + h_2\beta + \int_0^{\tau_2}h_3(u)\,d\Lambda(u)$, where $h \in H_p$, the parameter θ can be regarded as a functional on $H_p$, the parameter space Θ = {θ} is a subspace of $l^\infty(H_p)$, and the score in (A.5) is a random map from Θ to a Banach space which contains functions (operations) of h.

Besides the above imputed score, we also need the mean imputed score function of t under the true value θ0 and denote it as

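$$S_{\tilde\theta}(\theta)(h) = E_{\theta_0}\Big[\frac{\partial}{\partial t}l_{\tilde\theta}(\theta_t)\Big|_{t=0}\Big] = h_1^TS_{\tilde\theta,1}(\theta) + h_2S_{\tilde\theta,2}(\theta) + S_{\tilde\theta,3}(\theta)(h_3),$$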

where

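$$S_{\tilde\theta,1}(\theta) = E_{\theta_0}\Big\{E_{\tilde\theta}\Big[\frac{\partial}{\partial\gamma_1}\log f_{\varepsilon,A}(\varepsilon_i, A_i \mid \gamma_1) \,\Big|\, o_i\Big]\Big\},$$
$$S_{\tilde\theta,2}(\theta) = E_{\theta_0}\Big\{E_{\tilde\theta}\Big[\Delta_i g(Y_i)A_i - \int_{T_i}^{Y_i}g(u)A_i\exp\{\beta g(u)A_i\}\,d\Lambda(u) \,\Big|\, o_i\Big]\Big\},$$
$$S_{\tilde\theta,3}(\theta)(h_3) = E_{\theta_0}\Big\{\Delta_i h_3(Y_i) - E_{\tilde\theta}\Big[\int_{T_i}^{Y_i}h_3(u)\exp\{\beta g(u)A_i\}\,d\Lambda(u) \,\Big|\, o_i\Big]\Big\}.$$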

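To prove the Fréchet differentiability of the map $\theta \mapsto S_{\theta_0}(\theta)$ at θ0, where $\theta_0 = (\gamma_{10}, \beta_0, \Lambda_0)$ with $\gamma_{10} = (\sigma_0^2, \alpha)$, we need to calculate the corresponding derivative. First, we introduce the notation $\nabla_\theta S_{\tilde\theta}(\theta_0) = \frac{\partial}{\partial t}S_{\tilde\theta}(\theta_0 + t\theta)\big|_{t=0}$, where $\theta_0 + t\theta = (\alpha_0 + t\alpha, \beta_0 + t\beta, \Lambda_0(\cdot) + t\Lambda(\cdot))$. Then,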

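$$\nabla_\theta S_{\tilde\theta}(\theta_0)(h) = \frac{\partial}{\partial t}S_{\tilde\theta}(\theta_0 + t\theta)(h)\Big|_{t=0} = \frac{\partial}{\partial t}E_{\theta_0}\Big\{h_1^TE_{\tilde\theta}\Big[\frac{\partial}{\partial(\gamma_{10} + t\gamma_1)}\log f_{\varepsilon,A}(\varepsilon_i, A_i \mid \gamma_{10} + t\gamma_1) \,\Big|\, o_i\Big] + h_2E_{\tilde\theta}\Big[\Delta_i g(Y_i)A_i - \int_{T_i}^{Y_i}g(u)A_i\exp\{(\beta_0 + t\beta)g(u)A_i\}(d\Lambda_0(u) + t\,d\Lambda(u)) \,\Big|\, o_i\Big] + \Delta_i h_3(Y_i) - E_{\tilde\theta}\Big[\int_{T_i}^{Y_i}h_3(u)\exp\{(\beta_0 + t\beta)g(u)A_i\}(d\Lambda_0(u) + t\,d\Lambda(u)) \,\Big|\, o_i\Big]\Big\}\Big|_{t=0}. \quad \text{(A.6)}$$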

Using the chain rule, equation (A.6) can be simplified as

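$$\gamma_1^T\sigma_{\tilde\theta,1}(h) - \beta\,\sigma_{\tilde\theta,2}(h) - \int_0^{\tau_2}\sigma_{\tilde\theta,3}(h)(u)\,d\Lambda(u),$$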

where

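$$\sigma_{\tilde\theta,1}(h) = E_{\theta_0}\Big\{h_1^TE_{\tilde\theta}\Big[\frac{\partial^2}{\partial\gamma_1\,\partial\gamma_1^T}\log f_{\varepsilon,A}(\varepsilon_i, A_i \mid \gamma_{10}) \,\Big|\, o_i\Big]\Big\}, \quad \text{(A.7)}$$
$$\sigma_{\tilde\theta,2}(h) = E_{\theta_0}\Big\{E_{\tilde\theta}\Big[\int_0^{\tau_2}[h_2g(u)A_i + h_3(u)]\,g(u)A_i\exp\{\beta_0g(u)A_i\}\,I(T_i < u \le Y_i)\,d\Lambda_0(u) \,\Big|\, o_i\Big]\Big\}, \quad \text{(A.8)}$$
$$\sigma_{\tilde\theta,3}(h)(u) = E_{\theta_0}\Big\{E_{\tilde\theta}\Big[[h_2g(u)A_i + h_3(u)]\exp\{\beta_0g(u)A_i\}\,I(T_i < u \le Y_i) \,\Big|\, o_i\Big]\Big\}. \quad \text{(A.9)}$$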

Evaluating (A.6) at the true value θ0 leads to

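$$\nabla_\theta S_{\theta_0}(\theta_0)(h) = \gamma_1^T\sigma_{\theta_0,1}(h) - \beta\,\sigma_{\theta_0,2}(h) - \int_0^{\tau_2}\sigma_{\theta_0,3}(h)(u)\,d\Lambda(u), \quad \text{(A.10)}$$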

where each σ-function has the same form as the corresponding function in (A.7), (A.8), or (A.9), with the double expectation $E_{\theta_0}\{E_{\tilde\theta}[\,\cdot \mid o_i]\}$ replaced by $E_{\theta_0}\{\cdot\}$. Now apply the Taylor expansion of $\exp\{(\beta_0 + t\beta)g(u)A_i\}$ at t = 0, to get

$$S_{\theta_0}(\theta_0 + t\theta) - S_{\theta_0}(\theta_0) - t\,\nabla_\theta S_{\theta_0}(\theta_0) = o(t),$$

where the small-o function does not depend on θ. Therefore,

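$$\frac{\big\|S_{\theta_0}(\theta_0 + t\theta) - S_{\theta_0}(\theta_0) - t\,\nabla_\theta S_{\theta_0}(\theta_0)\big\|_p}{t} \to 0, \quad \text{as } t \to 0$$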

uniformly in θ = (γ1, β, Λ). Thus the Fréchet derivative of the mapping $\theta \mapsto S_{\theta_0}(\theta)$ evaluated at θ0 takes the form (A.10). We will use the notation $\dot{S}_{\theta_0}(\theta_0)(\theta)$ to denote it.

STEP 2. Continuous invertibility of $\dot{S}_{\theta_0}(\theta_0)(\theta)$

The continuous invertibility of the Fréchet derivative can be established by showing that there exists some number c > 0 such that

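$$\inf_{\theta \in \mathrm{lin}\,\Theta}\frac{\big\|\dot{S}_{\theta_0}(\theta_0)(\theta)\big\|_{l^\infty(H_p)}}{\|\theta\|_{l^\infty(H_p)}} > c. \quad \text{(A.11)}$$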

Since $\dot{S}_{\theta_0}(\theta_0)(\theta)$ can be expressed as a linear combination of the three σ-operators according to (A.6), it is necessary to check the continuous invertibility of those σ-operators. The proof is similar to the arguments in the Appendix of Zeng and Cai (2005). Through the continuous invertibility of $\sigma_{\theta_0}$, the lower bound c can be found as $q/(3p)$, where q satisfies $\sigma_{\theta_0}^{-1}(H_q) \subseteq H_p$. Details on finding the lower bound are analogous to the approach in Dupuy, Grama and Mesbah (2006) (page 915). Thus the derivative $\dot{S}_{\theta_0}(\theta_0)$ is continuously invertible.

STEP 3. Convergence in distribution to a tight element

In this step, the convergence of $\sqrt{n}\,(S_{n,\hat\theta_n} - S_{\theta_0})(\theta_0)$ in distribution will be demonstrated. Since $S_{\theta_0}(\theta_0)$ is the mean of the score function evaluated at the true value of θ, it is equal to zero. Then

$$[S_{n,\hat\theta_n} - S_{\theta_0}](\theta_0)(h) = \frac{1}{n}\sum_{i=1}^n\big[D_{i,1}(h) + D_{i,2}(h) + \delta_i h_3(y_i) + D_{i,3}(h)\big],$$

where

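$$D_{i,1}(h) = h_1^TE_{\hat\theta_n}\Big[\frac{\partial}{\partial\gamma_1}\log f_{\varepsilon,A}(\varepsilon_i, A_i \mid \gamma_{10}) \,\Big|\, o_i\Big],$$
$$D_{i,2}(h) = h_2E_{\hat\theta_n}\Big[\delta_i g(z_i)A_i - \int_{t_i}^{z_i}g(u)A_i\exp\{\beta_0g(u)A_i\}\,d\Lambda_0(u) \,\Big|\, o_i\Big],$$
$$D_{i,3}(h) = -E_{\hat\theta_n}\Big[\int_{t_i}^{z_i}h_3(u)\exp\{\beta_0g(u)A_i\}\,d\Lambda_0(u) \,\Big|\, o_i\Big].$$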

The class $\{\frac{1}{n}(D_{i,1} + D_{i,2})(h) : \|h_1\| + |h_2| \le p\}$ is bounded Donsker, since it is a finite dimensional class of measurable score functions. Moreover, since any class of real-valued functions on [0, τ2] that are uniformly bounded and of bounded variation is Donsker, the class $\{\delta h_3(y) : h_3 \in BV_p\}$ is Donsker. The Donsker property of the class $\{\frac{1}{n}D_{i,3}(h) : h_3 \in BV_p\}$ also follows from this fact. We have thus shown that the class $\{[S_{n,\hat\theta_n} - S_{\theta_0}](\theta_0)(h) : \|h_1\| + |h_2| \le p,\ h_3 \in BV_p\}$ is Donsker, since the sum of bounded Donsker classes is also Donsker. This implies

$$\sqrt{n}\,(S_{n,\hat\theta_n} - S_{\theta_0})(\theta_0) \xrightarrow{D} Z,$$

a tight Gaussian process in $l^\infty(H_p)$.

STEP 4. Verification of conditions 1 and 4

Condition 4 holds by the consistency of the estimator θ̂n. Condition 1 can be verified by considering the Donsker property of the class $\{S_{\cdot,\theta}(\theta)(h) - S_{\cdot,\theta_0}(\theta_0)(h) : \|\theta - \theta_0\|_p < \nu,\ h \in H_p\}$ for some ν > 0, where $S_{\cdot,\theta}(\theta)(h)$ is the general form of $S_{i,\theta}(\theta)(h) = \frac{\partial}{\partial t}l_{\theta,i}(\theta_t)\big|_{t=0}$. We omit the details since they are similar to those for the case of right-censored data, considered in Zeng and Cai (2005).

We have verified the four conditions needed for the asymptotic distribution of the NPMMLE θ̂n, and therefore

$$\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{D} -\dot{S}_{\theta_0}(\theta_0)^{-1}Z,$$

as n → ∞.

Using the form of the Fréchet derivative in (A.6), one finds that there exists a linear operator $\sigma = (\sigma_{\theta_0,1}, \sigma_{\theta_0,2}, \sigma_{\theta_0,3})$ that maps $H_p$ to $\mathbb{R}^{d+1} \times BV_p$, such that

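$$\dot{S}_{\theta_0}(\theta_0)(\theta_1 - \theta_2)(h) = (\gamma_{11} - \gamma_{12})^T\sigma_{\theta_0,1}(h) - (\beta_1 - \beta_2)\,\sigma_{\theta_0,2}(h) - \int_0^{\tau_2}\sigma_{\theta_0,3}(h)(u)\,d(\Lambda_1 - \Lambda_2)(u).$$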

The continuous invertibility of the σ operator has been shown already, so its inverse operator, denoted by σ−1, exists. Since

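$$\sqrt{n}\,\dot{S}_{\theta_0}(\theta_0)(\hat\gamma_1 - \gamma_{10},\ \hat\beta - \beta_0,\ \hat\Lambda - \Lambda_0)(h) = -\sqrt{n}\,\{S_{n,\theta_0}(h) - S_{\theta_0}(\theta_0)(h)\} + o_p(1),$$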

by applying the inverse operator σ−1 on both sides we obtain that

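$$\sqrt{n}\,\Big\{(\hat\gamma_1 - \gamma_{10})^T\tilde{h}_1 - (\hat\beta - \beta_0)\tilde{h}_2 - \int_0^{\tau_2}\tilde{h}_3(u)\,d(\hat\Lambda - \Lambda_0)(u)\Big\} = -\sqrt{n}\,\{S_{n,\theta_0}(h) - S_{\theta_0}(\theta_0)(h)\} + o_p(1), \quad \text{(A.12)}$$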

where $\tilde{h} = (\tilde{h}_1, \tilde{h}_2, \tilde{h}_3) = \sigma^{-1}(h)$. If $\tilde{h}_1$ and $\tilde{h}_3$ in (A.12) are chosen to be 0, then this reduces to

$$\sqrt{n}\,\{(\hat\beta - \beta_0)\tilde{h}_2\} = \sqrt{n}\,\{S_{n,\theta_0}(h) - S_{\theta_0}(\theta_0)(h)\} + o_p(1),$$

where the latter term is in the form of a linear combination of score functions for the parameters. Since the score functions derived from the modified likelihood are asymptotically equivalent to those from the full likelihood by Lemma 1, the influence function is the same as the efficient influence function for $\beta_0\tilde{h}_2$ by its uniqueness in the linear span of the scores. Thus the estimator β̂ is efficient for β0.

References

  1. Andersen PK, Borgan O, Gill RD, Keiding N. Statistical models based on counting processes. Springer-Verlag; 1993.
  2. Cox DR. Regression models and life tables (with discussion). Journal of the Royal Statistical Society Series B - Statistical Methodology. 1972;34:187–220.
  3. Dafni UG, Tsiatis AA. Evaluating surrogate markers of clinical outcome measured with error. Biometrics. 1998;54:1445–62.
  4. DeGruttola V, Tu X. Modeling progression of CD4-lymphocyte count and its relationship to survival time. Biometrics. 1994;50:1003–1014.
  5. Dupuy JF, Grama I, Mesbah M. Asymptotic theory for the Cox model with missing time-dependent covariate. Annals of Statistics. 2006;34:903–924.
  6. Henderson R, Diggle P, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics. 2000;1:465–480. doi: 10.1093/biostatistics/1.4.465.
  7. Hsieh F, Tseng YK, Wang JL. Joint Modelling of Survival and Longitudinal Data: Likelihood Approach Revisited. Biometrics. 2006;62:1037–1043. doi: 10.1111/j.1541-0420.2006.00570.x.
  8. Klein JP, Moeschberger ML. Survival analysis: techniques for censored and truncated data. Springer; 2003.
  9. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
  10. Louis TA. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society Series B - Statistical Methodology. 1982;44:226–233.
  11. Lynden-Bell D. A method of allowing for known observational selection in small samples applied to 3CR quasars. Monthly Notices of the Royal Astronomical Society. 1971;155:95–118.
  12. Murphy SA, van der Vaart AW. On profile likelihood. Journal of the American Statistical Association. 2000;95:449–485.
  13. Rezza G, Lazzarin A, Angarano G, Sinicco A, Pristerá R, Tirelli U, Salassa B, Ricchi E, Aiuti F, Menniti-Ippolito F. The natural history of HIV infection in intravenous drug users: risk of disease progression in a cohort of seroconverters. AIDS. 1989;3:87–90. doi: 10.1097/00002030-198902000-00006.
  14. Song X, Davidian M, Tsiatis AA. A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data. Biometrics. 2002;58:742–753. doi: 10.1111/j.0006-341x.2002.00742.x.
  15. The-Italian-Seroconversion-Study. Disease progression and early predictors of AIDS in HIV-seroconverted injecting drug users. AIDS. 1992;6:421–426.
  16. Tseng YK, Hsieh F, Wang JL. Joint modelling of accelerated failure time and longitudinal data. Biometrika. 2005;92:587–603.
  17. Tsiatis AA, Davidian M. Joint modeling of longitudinal and time-to-event data: An overview. Statistica Sinica. 2004;14:809–834.
  18. Tsiatis AA, DeGruttola V, Wulfsohn M. Modeling the relationship of survival to longitudinal data measured with error: Applications to survival and CD4 counts in patients with AIDS. Journal of the American Statistical Association. 1995;90:23–37.
  19. van der Vaart AW, Wellner JA. Weak convergence and empirical processes. Springer; New York: 1996.
  20. Vardi Y. Empirical distributions in selection bias models. Annals of Statistics. 1985;13:178–203.
  21. Wang MC. Product limit estimates: a generalized maximum likelihood study. Communications in Statistics - Theory and Methods. 1987;16:3117–3132.
  22. Wang CY. Corrected score estimator for joint modeling of longitudinal and failure time data. Statistica Sinica. 2006;16:235–253.
  23. Woodroofe M. Estimating a distribution function with truncated data. Annals of Statistics. 1985;13:163–177.
  24. Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997;53:330–339.
  25. Zeng D, Cai J. Asymptotic results for maximum likelihood estimators in joint analysis of repeated measurements and survival time. Annals of Statistics. 2005;33:2132–2163.
