MODELING LEFT-TRUNCATED AND RIGHT-CENSORED SURVIVAL DATA WITH LONGITUDINAL COVARIATES

Yu-Ru Su; Jane-Ling Wang

doi:10.1214/12-AOS996

Author manuscript; available in PMC: 2018 Feb 22.

Published in final edited form as: Ann Stat. 2012 Sep 5;40(3):1465–1488. doi: 10.1214/12-AOS996

PMCID: PMC5822752 NIHMSID: NIHMS937870 PMID: 29479122

Abstract

There is a surge in medical follow-up studies that include longitudinal covariates in the modeling of survival data. So far, the focus has been largely on right censored survival data. We consider survival data that are subject to both left truncation and right censoring. Left truncation is well known to produce a biased sample. The sampling bias issue has been resolved in the literature for the case that involves baseline or time-varying covariates that are observable. The problem remains open, however, for the important case where longitudinal covariates are present in survival models. A joint likelihood approach has been shown in the literature to provide an effective way to overcome those difficulties for right censored data, but this approach faces substantial additional challenges in the presence of left truncation. Here we thus propose an alternative likelihood to overcome these difficulties and show that the regression coefficient in the survival component can be estimated unbiasedly and efficiently. Issues about the bias for the longitudinal component are discussed. The new approach is illustrated numerically through simulations and data from a multi-center AIDS cohort study.

Keywords and phrases: Likelihood approach, Semiparametric efficiency, Biased sample, EM algorithm, Monte Carlo integration

1. Introduction

Since the seminal paper by Wulfsohn and Tsiatis (1997), longitudinal covariates have played an increasingly important role in the modeling of survival data. One major challenge in incorporating longitudinal covariates is that simple approaches, such as the partial likelihood method for the Cox proportional hazards model (Cox, 1972), often require knowledge of the entire longitudinal process. This is often not feasible in practice, because follow-up checks occur at discrete and intermittent time points. A common practice is to impute the unobserved values of the longitudinal processes and then apply the partial likelihood approach to the imputed data. This is called a two-stage approach: the longitudinal process is imputed at the first stage, and the partial likelihood approach is then employed to estimate parameters in the survival model at the second stage. The most common imputation method uses the most recent observed value of the patient to impute a missing value, the so-called last-value-carry-forward method, which has been adopted in standard software such as SAS and R. Additional two-stage procedures were developed by Tsiatis, DeGruttola and Wulfsohn (1995) and Dafni and Tsiatis (1998).
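To make the last-value-carry-forward idea concrete, the following is a minimal Python sketch. The function name `lvcf` and the example CD4-like values are hypothetical and are not taken from any of the cited software; the sketch only illustrates how a value observed at an earlier visit is reused at later query times.

```python
import numpy as np

def lvcf(visit_times, values, query_times):
    """Last-value-carry-forward imputation (illustrative sketch).

    For each query time, return the value recorded at the most recent
    visit at or before that time; NaN if no visit has occurred yet.
    """
    visit_times = np.asarray(visit_times, dtype=float)
    values = np.asarray(values, dtype=float)
    query_times = np.asarray(query_times, dtype=float)
    order = np.argsort(visit_times)
    visit_times, values = visit_times[order], values[order]
    # Index of the last visit with time <= query time (-1 if there is none).
    idx = np.searchsorted(visit_times, query_times, side="right") - 1
    out = np.full(query_times.shape, np.nan)
    has_past = idx >= 0
    out[has_past] = values[idx[has_past]]
    return out

# Hypothetical visits at times 0, 1 and 2.5 with made-up measurements.
imputed = lvcf([0.0, 1.0, 2.5], [500.0, 420.0, 310.0], [0.5, 1.0, 2.0, 3.0])
```

The imputed value stays constant between visits, which is why sparse follow-up schedules and noisy measurements can distort the covariate that enters the partial likelihood.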

It is easy to foresee serious biases with such an imputation method if the follow-up schedule is infrequent over time, and also when the longitudinal covariates are contaminated by noise or measurement errors. Both scenarios provide strong motivation to find alternative approaches. The approach developed by Wulfsohn and Tsiatis (1997) to model the survival and longitudinal data simultaneously through their joint likelihood is attractive on two counts: (i) the resulting parametric estimators are semiparametrically efficient when the baseline hazard function is unknown, and (ii) the joint likelihood procedure is often insensitive to the normality assumption on the longitudinal data, if there is a reasonable number of repeated measurements available for the longitudinal processes. See Zeng and Cai (2005) and Dupuy, Grama and Mesbah (2006) for (i), and Song, Davidian and Tsiatis (2002), Tsiatis and Davidian (2004), and Hsieh, Tseng and Wang (2006) for (ii).

The above joint likelihood approach not only successfully removes the biases on the survival component but also leads to efficient estimation. A historical example for the joint likelihood approach is the investigation of CD4 T-cell counts as a biomarker of time-to-death or time-to-AIDS (DeGruttola and Tu, 1994; Wulfsohn and Tsiatis, 1997; Henderson, Diggle and Dobson, 2000). In these and other works, the survival time is subject to the usual right censoring. However, left truncation is common for studies with delayed entry. Specifically, if the recruitment of patients continues after the onset time of a study, those that have already experienced the event are often excluded from the study, which then results in left truncation of the event-time. Patients who remain in the study are further subject to the usual right censoring, so the sample consists of left truncated and right censored (LTRC) survival times. It is well known that left truncation is a biased sampling plan as subjects with shorter survival times tend to be excluded from the sample. As a result, the longitudinal measurements are also sampled with bias.

An example of a left truncated and right censored longitudinal study is the Italian multi-center HIV (human immunodeficiency virus) study (Rezza et al. (1989); The-Italian-Seroconversion-Study (1992)), where the primary endpoint is the time from HIV positive to AIDS onset, i.e. the incubation period of AIDS. In this study, patients who had developed AIDS by the time of recruitment were excluded from the study, resulting in left truncation of the survival data, and CD4 counts for those who were HIV positive but AIDS free were measured at each follow-up visit. As there are no procedures available to handle such data properly, we develop in this paper a semiparametric joint likelihood approach to accommodate LTRC survival data with longitudinal covariates that are measured intermittently.

Although there is a sizable literature on jointly modeling right-censored survival and longitudinal data (see Wulfsohn and Tsiatis (1997), Henderson, Diggle and Dobson (2000), Song, Davidian and Tsiatis (2002) and the review paper by Tsiatis and Davidian (2004)), the extension to LTRC survival data turns out to be nontrivial due to the left truncation feature of the data. To see this, consider first the simpler case of left truncated data with time-independent covariates or no covariates at all. Lynden-Bell (1971), Woodroofe (1985), and Wang (1987) investigated estimation of the survival function when subjects come from the same population, i.e. there are no covariates involved. Here, one only needs to adjust the risk set for truncated data to reach a suitable extension of the Kaplan-Meier estimator. For time-independent covariates, Andersen et al. (1993) considered estimation under the Cox model and showed that the partial likelihood approach for right censored data still works for LTRC survival data when one conditions on the values of the covariates and truncation times.
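The risk-set adjustment mentioned above can be sketched in a few lines of Python. This is an illustrative product-limit estimator with delayed entry, not code from the cited papers; the function name and the toy data are hypothetical, and the sketch assumes every subject has an entry time strictly before its observed time.

```python
import numpy as np

def ltrc_product_limit(entry, time, event):
    """Product-limit survival estimate with delayed entry (illustrative).

    A subject is at risk at event time y only if entry < y <= time,
    so subjects who have not yet entered the study are excluded.
    Returns (distinct event times, survival estimates just after them).
    """
    entry = np.asarray(entry, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv, s = [], 1.0
    for y in event_times:
        at_risk = np.sum((entry < y) & (time >= y))
        deaths = np.sum(event & (time == y))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Toy data: four subjects with staggered entry; the second one is censored.
times, surv = ltrc_product_limit(entry=[0, 0, 1, 2], time=[2, 3, 4, 5],
                                 event=[1, 0, 1, 1])
```

With no delayed entry (all entry times equal to zero and positive observed times) the risk sets reduce to the usual ones and the estimator coincides with Kaplan-Meier.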

For time-dependent covariates, the partial likelihood approach of Andersen et al. (1993) can still be employed if the entire covariate history is available for all subjects. This is not the case for longitudinal covariates that are observed intermittently at discrete time points. Since imputation methods lead to biased estimates, bias-corrected approaches have been employed in the literature for right censored data with longitudinal covariates. In particular, Wang (2006) proposed a method to correct the bias through the partial score equation. Such an approach is termed a “corrected score” method, which originates from studies of measurement errors. While corrected score methods typically lead to $\sqrt{n}$ -consistent estimators for the regression parameters in the Cox model, they are not efficient and are not easy to derive. Extension of the corrected score methods to LTRC (left-truncated and right censored) data might be feasible but has not been explored. In this paper, we adopt the full and joint likelihood approach of the survival and longitudinal data due to its aforementioned efficiency and robustness features. Unfortunately, direct maximization of the full joint likelihood is much more complicated than in the case with no left truncation. We discovered a modified likelihood that is simpler, yet retains the efficiency of the full likelihood approach, as described in Section 2.

The rest of the paper is organized as follows. In Section 2, we introduce a joint model setting for both the survival time and longitudinal processes and propose a modified likelihood approach for statistical inference. An EM algorithm to maximize the modified likelihood is derived in Section 3, along with the large sample properties of the nonparametric maximum modified likelihood estimator (NPMMLE), including consistency, asymptotic normality, and efficiency. Numerical performance of the proposed estimating procedure is validated through simulation studies in Section 4 and illustrated through the Italian HIV study in Section 5. Section 6 contains some discussion.

2. Joint modeling under LTRC

We consider the setting in which the survival time Y* of a subject is subject to random left truncation by T*, so a subject is enrolled in a study only if Y* ≥ T*. Let n be the total number of subjects enrolled in the study. With such a biased sampling plan, to avoid confusion in notation, we denote the survival and truncation times of the ith enrolled subject as (Y_i, T_i), which are sampled from the joint subpopulation of ( $Y_{i}^{*}, T_{i}^{*}$ ), where $Y_{i}^{*} \geq T_{i}^{*}$ . Upon entering the study, these n subjects are subject to the usual right censorship, so the final observed survival data for the ith subject is a triplet (T_i, Z_i, Δ_i), where Z_i = min(Y_i, C_i) is the time of the endpoint event or the drop-out (censoring) time C_i, whichever occurs first, and Δ_i = I(Y_i ≤ C_i) is the censoring indicator (Δ_i = 1 when the event is observed).

In reality, drop-out or censoring only occurs when a subject is enrolled into the study. This fact implies that the right-censoring time C_i is greater than the truncation time T_i, for i = 1, …, n. Therefore, we introduce a positive random variable U_i to represent the time from entry into the study to drop-out from the study, i.e. U_i = C_i − T_i.
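The sampling scheme described above can be mimicked with a short simulation. The sketch below is only illustrative: the exponential distributions and their means are hypothetical choices, covariates are ignored, and `sample_ltrc` is a name introduced here. It shows how truncation discards pairs with Y* < T* and how the censoring time is built as C = T + U.

```python
import numpy as np

def sample_ltrc(n, rng, mean_y=1.0, mean_t=1.0, mean_u=3.0):
    """Draw n enrolled subjects (T_i, Z_i, Delta_i) under LTRC sampling.

    Hypothetical setup: Y*, T* and U are independent exponentials.
    """
    t_obs, z_obs, delta = [], [], []
    while len(t_obs) < n:
        y_star = rng.exponential(mean_y)   # latent survival time Y*
        t_star = rng.exponential(mean_t)   # latent truncation time T*
        if y_star < t_star:
            continue                       # truncated: never enters the sample
        u = rng.exponential(mean_u)        # time from entry to drop-out
        c = t_star + u                     # censoring time, always after entry
        t_obs.append(t_star)
        z_obs.append(min(y_star, c))
        delta.append(int(y_star <= c))
    return np.array(t_obs), np.array(z_obs), np.array(delta)

rng = np.random.default_rng(2012)
t, z, d = sample_ltrc(200, rng)
```

By construction every observed time Z_i is at least the entry time T_i, and the enrolled survival times are stochastically larger than those in the background population, which is the sampling bias discussed in the text.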

In addition to the survival data, baseline and longitudinal covariates are collected intermittently for the ith subject from the time the subject enters the study until the observational limit Z_i. This results in n_i repeated measurements, denoted by W⃗_i = (W_i1, W_i2, …, W_{in_i}), where the measurements are taken at time points s⃗_i = (s_i1, s_i2, …, s_{in_i}). It is important to note that the observed W⃗_i are also subject to the same biased sampling plan as the survival data, so there is a background longitudinal vector, which we will denote as ${\vec{W}}_{i}^{*}$ for the ith subject enrolled in the study. Therefore, W⃗_i is sampled from the subpopulation of ${\vec{W}}_{i}^{*}$ where $Y_{i}^{*} \geq T_{i}^{*}$ , and values beyond Z_i are not observed. For simplicity of notation, we assume in this section that there is only one longitudinal covariate, but additional longitudinal or baseline covariates can be handled easily. The AIDS data discussed in Section 5 contain two longitudinal covariates: one is observed intermittently, while the complete history of the other, the time-dependent treatment indicator, is available.

2.1. The Joint Models

Since repeated measurements from the same subjects are likely to be correlated, we introduce a latent q × 1 random vector $A_{i}^{*}$ to account for their dependency and assume a common parametric density function $f_{A}^{*} (\cdot | α)$ with an unknown parameter α for $A_{i}^{*}$ . A linear mixed effects model will be considered for the longitudinal covariate:

$${\vec{W}}_{i}^{*} = X_{i} ({\vec{s}}_{i}) + ε_{i} = g ({\vec{s}}_{i}) A_{i}^{*} + ε_{i}, \tag{2.1}$$

where X_i(t) = g(t)A_i* is the underlying covariate trajectory, g(·) is a known q-dimensional function, and the n_i × 1 vector ε_i plays the role of measurement errors. The components of ε_i are independent 𝒩(0, σ²) random variables, and ε_i is independent of all other aforementioned random variables.

For the survival time $Y_{i}^{*}$ , a proportional hazards model is employed, and the hazard rate of $Y_{i}^{*}$ at time t given $A_{i}^{*}$ is

$$λ_{Y_{i}^{*}} (t | A_{i}^{*}) = λ_{0} (t) \exp (β X_{i} (t)), \tag{2.2}$$

where λ₀ is the baseline hazard rate and β is the regression coefficient. The truncation time $T_{i}^{*}$ and the time U_i, from entry to drop-out, are assumed to have distribution functions F_T*(·) and F_U(·), respectively. We adopt the standard assumption in survival analysis that $Y_{i}^{*}, T_{i}^{*}$ and U_i are conditionally independent given the covariates. This is equivalent to assuming conditional independence of $Y_{i}^{*}, T_{i}^{*}$ , and U_i given the value of $A_{i}^{*}$ . We also assume that $T_{i}^{*}$ and U_i are independent of $A_{i}^{*}$ and the parameters in the models for either the survival or longitudinal parts are noninformative.
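As an illustration of how models (2.1) and (2.2) fit together, the sketch below draws one subject from the joint model. It rests on assumptions made only for this example: a random intercept and slope with g(t) = (1, t), a constant baseline hazard λ₀(t) ≡ 1, and hypothetical function names. Under these assumptions the cumulative hazard has a closed form, so the survival time can be generated by inverting it at a unit-exponential draw.

```python
import numpy as np

def cumulative_hazard(t, a, beta):
    """H(t | a) = int_0^t exp(beta * (a[0] + a[1] * u)) du, baseline hazard 1."""
    b = beta * a[1]
    if abs(b) < 1e-12:
        return np.exp(beta * a[0]) * t
    return np.exp(beta * a[0]) * np.expm1(b * t) / b

def invert_cumulative_hazard(e, a, beta):
    """Solve H(t | a) = e for t; returns inf if H never reaches e."""
    b = beta * a[1]
    scaled = e * np.exp(-beta * a[0])
    if abs(b) < 1e-12:
        return scaled
    arg = 1.0 + b * scaled
    return np.log(arg) / b if arg > 0 else np.inf

def draw_subject(beta, mu, Sigma, sigma2, visit_times, rng):
    """Draw (A*, Y*, W*) for one subject; g(t) = (1, t) is an assumption."""
    a = rng.multivariate_normal(mu, Sigma)       # latent random effects A*
    e = rng.exponential(1.0)                     # unit-exponential draw
    y = invert_cumulative_hazard(e, a, beta)     # survival time Y*
    s = np.asarray(visit_times, dtype=float)
    x = a[0] + a[1] * s                          # trajectory X(s) = g(s) A*
    w = x + rng.normal(0.0, np.sqrt(sigma2), s.shape)  # add measurement error
    return a, y, w

rng = np.random.default_rng(1)
a, y, w = draw_subject(1.0, np.array([2.0, 0.5]), np.diag([0.5, 0.01]),
                       0.1, [0.1, 0.4, 0.9], rng)
```

The same latent vector A* drives both the measurements W* and the hazard of Y*, which is the source of the dependence that the joint likelihood exploits.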

2.2. A Modified Likelihood Approach

For the model described in the previous subsection, the parameters of interest are (β, α, σ² and Λ₀(·)), where the first three components are in the Euclidean space whereas $Λ_{0} (t) = \int_{0}^{t} λ_{0} (u) du$ , the cumulative hazard function, is in a functional space, hence the model is semiparametric. Since a likelihood approach usually provides the most efficient estimating procedure, we first consider the full likelihood function $L_{i}^{O}$ based on the observations (t_i, z_i, δ_i, w⃗_i) from the ith subject. The derivation of the full likelihood from the ith subject is shown below.

$$\begin{aligned}
L_{i}^{O} &= f_{(T, Y, Δ, W)} (t_{i}, z_{i}, δ_{i}, w_{i}) = \frac{f_{(T^{*}, Y^{*}, Δ^{*}, W^{*})} (t_{i}, z_{i}, δ_{i}, w_{i})}{P (Y^{*} \geq T^{*})} \\
&= \frac{\int {[f_{Y^{*}} (z_{i} | A_{i}^{*} = a_{i})]}^{δ_{i}} {[S_{Y^{*}} (z_{i} | A_{i}^{*} = a_{i})]}^{1 - δ_{i}} f_{W^{*}} (w_{i} | A_{i}^{*} = a_{i}) f_{A^{*}} (a_{i}) \, {da}_{i} \; f_{T^{*}} (t_{i})}{P (Y_{i}^{*} \geq T_{i}^{*})} \\
&= \int \frac{{[f_{Y^{*}} (z_{i} | A_{i}^{*} = a_{i})]}^{δ_{i}} {[S_{Y^{*}} (z_{i} | A_{i}^{*} = a_{i})]}^{1 - δ_{i}}}{S_{Y^{*}} (t_{i} | A_{i}^{*} = a_{i})} f_{W^{*}} (w_{i} | A_{i}^{*} = a_{i}) \frac{S_{Y^{*}} (t_{i} | A_{i}^{*} = a_{i}) f_{A^{*}} (a_{i})}{S_{Y^{*}} (t_{i})} \, {da}_{i} \; \frac{S_{Y^{*}} (t_{i}) f_{T^{*}} (t_{i})}{P (Y_{i}^{*} \geq T_{i}^{*})} \\
&= \int {[f_{Y^{*}} (z_{i} | Y_{i}^{*} \geq t_{i}, A_{i}^{*} = a_{i})]}^{δ_{i}} {[S_{Y^{*}} (z_{i} | Y_{i}^{*} \geq t_{i}, A_{i}^{*} = a_{i})]}^{1 - δ_{i}} f_{W^{*}} (w_{i} | A_{i}^{*} = a_{i}) f_{A^{*}} (a_{i} | Y_{i}^{*} \geq t_{i}) \, {da}_{i} \; f_{T^{*}} (t_{i} | Y_{i}^{*} \geq T_{i}^{*}),
\end{aligned} \tag{2.3}$$

where f_V is the density function of the random variable V in the subscript, and S_V is the corresponding survival function. In (2.3), besides the baseline hazard function λ₀, the density function f_T* also serves as a nonparametric component. Because of these two nonparametric components, the full likelihood function is unbounded, so we resort to the nonparametric maximum likelihood approach. As in conventional survival analysis, this leads to a full likelihood that is equivalent to the conditional likelihood given the left-truncation time. This has been explored in the literature (Andersen et al., 1993; Klein and Moeschberger, 2003) for LTRC data with baseline covariates and was first explored in Wang (1987) for the simpler situation of left truncated data that came from a single population. Following a similar argument as in Wang (1987), we found that the full likelihood can be simplified to the following conditional likelihood for the ith subject:

$$L_{i}^{C} = \int {[f_{Y^{*}} (z_{i} | Y_{i}^{*} \geq t_{i}, A_{i}^{*} = a_{i})]}^{δ_{i}} {[S_{Y^{*}} (z_{i} | Y_{i}^{*} \geq t_{i}, A_{i}^{*} = a_{i})]}^{1 - δ_{i}} f_{W^{*}} (w_{i} | A_{i}^{*} = a_{i}) f_{A^{*}} (a_{i} | Y_{i}^{*} \geq t_{i}) \, {da}_{i} . \tag{2.4}$$

Next, we consider the nonparametric maximum likelihood estimators (NPMLE) of the survival component. By an argument similar to that for joint modeling of right-censored data and their longitudinal covariates (Zeng and Cai, 2005; Dupuy, Grama and Mesbah, 2006), the NPMLE of the baseline cumulative hazard function is a step function with jumps at each uncensored event time (i.e. at Y_i, whenever Δ_i = 1). Let n_u denote the total number of uncensored events; the baseline cumulative hazard function is thus re-parameterized as an n_u-dimensional vector of jump sizes.

So far, the derivation of the likelihood function and NPMLE follows a similar path as the much investigated case of a joint modeling setting with right censored data, where NPMLE’s for the parametric component enjoy nice asymptotic properties and are semiparametrically efficient. Despite these similarities, the left truncation feature triggers complications in the estimation of the finite dimensional parameter in the joint LTRC model. First, as shown in the Appendix, the parameter α associated with the latent variable A* is not identifiable. This is a consequence of the biased sampling plan, since the samples are actually drawn from the subpopulation Y* ≥ T*. Consequently, only E(A*|Y* ≥ T*) and var(A*|Y* ≥ T*) could be identified under the normality assumption. Thus, while it is possible to identify the unknown parameters of Y* and T* based on the joint conditional distribution of (Y*, T*)|Y* ≥ T*, where the notation (·|Y* ≥ T*) stands for a random variable/ vector sampled from the subpopulation with Y* ≥ T*, there is not enough information to recover E(A*) and var(A*) and hence the true longitudinal parameters α.

A second complication is that the score equations for the survival components, β and Λ₀, are much more complicated than under a right-censored-only model: as shown in Appendix A.1, they require estimation of the expectations of nonlinear functions of the observed data along with the parameters of interest. This motivates us to modify the likelihood so as to simplify the estimation of all parameters that are identifiable. Our proposal is to use the following modified likelihood, denoted by L^m, as an alternative to the full (equivalently, the conditional) likelihood in (2.4). The modified likelihood is

$$L^{m} = \prod_{i = 1}^{n} \int {[f_{Y^{*}} (z_{i} | Y_{i}^{*} \geq t_{i}, A_{i}^{*} = a_{i})]}^{δ_{i}} {[S_{Y^{*}} (z_{i} | Y_{i}^{*} \geq t_{i}, A_{i}^{*} = a_{i})]}^{1 - δ_{i}} f_{W^{*}} (w_{i} | A_{i}^{*} = a_{i}) f_{A^{*}} (a_{i}) \, {da}_{i} \; f_{T^{*}} (t_{i} | Y_{i}^{*} \geq T_{i}^{*}), \tag{2.5}$$

where the lower case variables denote the values of the corresponding upper case variables, e.g. δ_i is the value of Δ_i. The estimators obtained by maximizing the modified likelihood, where the nonparametric cumulative hazard function is replaced by a step function, will be referred to as the nonparametric maximum modified likelihood estimators (NPMMLE) hereafter.

The difference between (2.4) and (2.5) is that $f_{A^{*}} (a_{i} | Y_{i}^{*} \geq t_{i})$ in the full likelihood (2.4) is replaced by f_A*(a_i) in (2.5). This is motivated by the fact that $f_{A^{*}} (a | Y^{*} \geq t) = \frac{S_{Y^{*}} (t | A^{*} = a)}{S_{Y^{*}} (t)} f_{A^{*}} (a)$ and $E [\frac{S_{Y^{*}} (t | A^{*})}{S_{Y^{*}} (t)}] = 1$ , for any t, and that, as shown in Lemma A.1 in the Appendix, the score functions of the survival parameters from (2.5) are asymptotically the same as those from (2.4). Theoretical results in the next section and numerical evidence in Section 4 demonstrate good performance of the estimators of all the survival parameters, (β, Λ₀(·)), and of the measurement error variance σ² of the longitudinal component that we derived from this modified likelihood.
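The two facts used above can be checked numerically in a toy case. The sketch below assumes, purely for illustration, a scalar random effect A* ~ N(μ, sd²), a time-constant covariate X(t) = A*, and baseline hazard 1, so that S(t | a) = exp(−t e^{βa}); all numerical values are hypothetical. Expectations over A* are computed by Gauss-Hermite quadrature.

```python
import numpy as np

# Quadrature rule for expectations under a standard normal distribution.
nodes, weights = np.polynomial.hermite_e.hermegauss(60)
weights = weights / np.sqrt(2.0 * np.pi)   # weights now sum to 1

mu, sd, beta, t = 2.0, 0.7, 1.0, 0.5       # hypothetical values
a = mu + sd * nodes                        # support points for A*

s_cond = np.exp(-t * np.exp(beta * a))     # S(t | A* = a)
s_marg = np.sum(weights * s_cond)          # S(t) = E[S(t | A*)]

ratio = s_cond / s_marg                    # density ratio f(a | Y* >= t) / f(a)
ratio_mean = np.sum(weights * ratio)       # E[S(t | A*) / S(t)]
cond_mean = np.sum(weights * a * ratio)    # E(A* | Y* >= t)
```

Here `ratio_mean` equals 1 up to rounding, in line with the identity in the text, while `cond_mean` falls below μ when β > 0: subjects with large A* fail early and are under-represented among survivors, which is the bias that affects the longitudinal parameters.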

3. EM-algorithm and asymptotic properties

Let γ = (β, α, σ²) be the finite dimensional parameter in the joint survival and longitudinal model, and Λ be a step function. The log modified likelihood is

$$l^{m} (γ, Λ) = \sum_{i = 1}^{n} \ln \int {[Λ\{z_{i}\} \exp\{β g (z_{i}) a_{i}\}]}^{δ_{i}} \exp\Big\{- \sum_{j : t_{i} < y_{j}^{0} \leq z_{i}} Λ\{y_{j}^{0}\} \exp\{β g (y_{j}^{0}) a_{i}\}\Big\} {(2 π σ^{2})}^{- n_{i} / 2} \exp\Big\{- \sum_{j = 1}^{n_{i}} {[w_{ij} - g (s_{ij}) a_{i}]}^{2} / (2 σ^{2})\Big\} f_{A^{*}} (a_{i}) \, {da}_{i},$$

where Λ{·} is the jump size of Λ at the respective time point in the argument, and $y_{j}^{0}$ is the jth sorted observed survival time in increasing order. Moreover, τ₁ and τ₂ denote the lower bound of truncation time and the largest censoring time corresponding to the end of the study.

Since directly maximizing the proposed modified likelihood involves integration of a complex function with respect to the random effects, we employ the expectation-maximization (EM) algorithm (Laird and Ware, 1982) to stabilize the maximization procedure. In the implementation of the EM algorithm, a Monte Carlo integration approach is used to approximate the expectation terms of functions h(A*) appearing in the E-step. A one-step Newton-Raphson method is applied to solve the nonlinear equations in the M-step. The posterior density of the random effects $A_{i}^{*}$ given the observed data from the ith subject, o_i = (t_i, z_i, δ_i, w⃗_i), is of the form

$$\begin{aligned}
f_{A^{*} | O} (a | o_{i}) &= \frac{f_{(Y, Δ) | (A, T)} (z_{i}, δ_{i} | a, t_{i}) \times f_{A^{*} | W^{*}} (a | {\vec{w}}_{i})}{\int f_{(Y, Δ) | (A, T)} (z_{i}, δ_{i} | a, t_{i}) \times f_{A^{*} | W^{*}} (a | {\vec{w}}_{i}) \, da} \\
&= \frac{{[Λ\{z_{i}\} \exp\{β g (z_{i}) a\}]}^{δ_{i}} \exp\Big\{- \sum_{j : t_{i} < y_{j}^{0} \leq z_{i}} Λ\{y_{j}^{0}\} \exp\{β g (y_{j}^{0}) a\}\Big\} \times f_{A^{*} | W^{*}} (a | {\vec{w}}_{i})}{\int {[Λ\{z_{i}\} \exp\{β g (z_{i}) a\}]}^{δ_{i}} \exp\Big\{- \sum_{j : t_{i} < y_{j}^{0} \leq z_{i}} Λ\{y_{j}^{0}\} \exp\{β g (y_{j}^{0}) a\}\Big\} \times f_{A^{*} | W^{*}} (a | {\vec{w}}_{i}) \, da} .
\end{aligned}$$

For a simpler implementation of the algorithm, we shall impose a normal assumption on the random effects and assume that $A_{i}^{*}$ , i = 1, …, n, follow a normal distribution N(μ, Σ), where (μ, Σ) plays the role of the parameter α.
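One way the Monte Carlo E-step could be organized under the normal assumption is sketched below. This is an illustration, not the authors' implementation: it assumes g(t) = (1, t), draws from the normal density of A* given the longitudinal measurements, and reweights the draws by the survival factor of the posterior density (self-normalized importance sampling). The function names and the argument layout are hypothetical.

```python
import numpy as np

def g(t):
    """Hypothetical design: random intercept and slope, g(t) = (1, t)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.column_stack([np.ones_like(t), t])

def posterior_expectation(h, t_i, z_i, delta_i, w_i, s_i, beta,
                          jump_times, jumps, sigma2, mu, Sigma, rng,
                          n_draws=2000):
    """Approximate E[h(A*) | o_i] by weighted Monte Carlo draws.

    jump_times, jumps: NumPy arrays with the jump locations and sizes of
    the step function Lambda. h maps an (n_draws, q) array of draws to an
    array whose first axis has length n_draws.
    """
    G = g(s_i)
    # Normal density of A* given w_i (linear mixed model conjugacy).
    cov = np.linalg.inv(np.linalg.inv(Sigma) + G.T @ G / sigma2)
    mean = cov @ (np.linalg.solve(Sigma, mu) + G.T @ w_i / sigma2)
    draws = rng.multivariate_normal(mean, cov, size=n_draws)
    # Survival factor: only jumps in (t_i, z_i] contribute (left truncation).
    in_window = (jump_times > t_i) & (jump_times <= z_i)
    lin = draws @ g(jump_times[in_window]).T
    log_w = -(np.exp(beta * lin) @ jumps[in_window])
    if delta_i:
        log_w = log_w + beta * (draws @ g(z_i)[0])
    wts = np.exp(log_w - log_w.max())   # stabilize before normalizing
    wts /= wts.sum()
    return wts @ h(draws)
```

For example, `posterior_expectation(lambda a: a, ...)` approximates E(A_i* | o_i), and passing `lambda a: np.exp(beta * (a @ g(y)[0]))` gives the terms E[exp{βg(y)A_i*} | o_i] needed in the M-step. The jump Λ{z_i} is omitted from the weights because it cancels between numerator and denominator.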

By taking the first derivative of the log modified likelihood calculated in the E-step with respect to each parameter, the NPMMLE, β̂, {λ̂_k, k = 1, …, n_u}, σ̂², μ̂, and Σ̂, can be obtained through the following formulae, where λ_k is the jump size of Λ at the kth sorted observed survival time,

$${\hat{λ}}_{k} = \frac{1}{\sum_{i : t_{i} < y_{k}^{0} \leq z_{i}} E [\exp\{β g (y_{k}^{0}) A_{i}^{*}\} | o_{i}]}, \quad k = 1, \dots, n_{u},$$

$${\hat{σ}}^{2} = \frac{1}{\sum_{i = 1}^{n} n_{i}} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} E [{(w_{ij} - g (s_{ij}) A_{i}^{*})}^{2} | o_{i}],$$

$$\hat{μ} = \frac{1}{n} \sum_{i = 1}^{n} E (A_{i}^{*} | o_{i}),$$

$$\hat{Σ} = \frac{1}{n} \sum_{i = 1}^{n} E [(A_{i}^{*} - \hat{μ}) {(A_{i}^{*} - \hat{μ})}^{T} | o_{i}],$$

and β̂ is the root of the score s(β), which is solved by a one-step Newton-Raphson method with the updating rule

$$β_{new} = β_{old} - \frac{s (β_{old})}{s' (β_{old})},$$

where

$$s (β) = \sum_{i = 1}^{n} δ_{i} \left[g (z_{i}) E (A_{i}^{*} | o_{i}) - \frac{\sum_{j : t_{j} < z_{i} \leq z_{j}} E (g (z_{i}) A_{j}^{*} \exp\{β g (z_{i}) A_{j}^{*}\} | o_{j})}{\sum_{j : t_{j} < z_{i} \leq z_{j}} E (\exp\{β g (z_{i}) A_{j}^{*}\} | o_{j})}\right],$$

$$s' (β) = \sum_{i = 1}^{n} δ_{i} \left\{{\left[\frac{\sum_{j : t_{j} < z_{i} \leq z_{j}} E (g (z_{i}) A_{j}^{*} \exp\{β g (z_{i}) A_{j}^{*}\} | o_{j})}{\sum_{j : t_{j} < z_{i} \leq z_{j}} E (\exp\{β g (z_{i}) A_{j}^{*}\} | o_{j})}\right]}^{2} - \frac{\sum_{j : t_{j} < z_{i} \leq z_{j}} E ({(g (z_{i}) A_{j}^{*})}^{2} \exp\{β g (z_{i}) A_{j}^{*}\} | o_{j})}{\sum_{j : t_{j} < z_{i} \leq z_{j}} E (\exp\{β g (z_{i}) A_{j}^{*}\} | o_{j})}\right\} .$$
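The M-step formulae for the jump sizes and the Newton-Raphson update of β can be sketched as follows. This is an illustrative reading of the formulae, not the authors' code: posterior expectations are represented by weighted draws of A_j* (for instance from a Monte Carlo E-step), the names `m_step` and `weighted_moments` are hypothetical, and ties among event times are assumed absent.

```python
import numpy as np

def weighted_moments(beta, x, wts):
    """E(e^{bx}|o), E(x e^{bx}|o), E(x^2 e^{bx}|o) from weighted draws x."""
    e = np.exp(beta * x)
    return wts @ e, wts @ (x * e), wts @ (x * x * e)

def m_step(beta, t, z, delta, draws, wts, g):
    """One Newton-Raphson update of beta plus the jump sizes of Lambda.

    draws[j]: (n_draws, q) posterior draws of A_j; wts[j]: normalized
    weights; g(y): length-q vector. Risk set at z_i: {j : t_j < z_i <= z_j}.
    """
    n = len(z)
    post_mean = [wts[j] @ draws[j] for j in range(n)]
    score, deriv, jumps = 0.0, 0.0, {}
    for i in range(n):
        if not delta[i]:
            continue                       # only uncensored times contribute
        gy = g(z[i])
        s0 = s1 = s2 = 0.0
        for j in range(n):
            if t[j] < z[i] <= z[j]:        # subject j at risk, truncation-adjusted
                m0, m1, m2 = weighted_moments(beta, draws[j] @ gy, wts[j])
                s0 += m0; s1 += m1; s2 += m2
        jumps[z[i]] = 1.0 / s0             # jump of Lambda at this event time
        score += gy @ post_mean[i] - s1 / s0
        deriv += (s1 / s0) ** 2 - s2 / s0
    return beta - score / deriv, jumps

# Degenerate check: point-mass posteriors reduce to a Cox-type Newton step.
t = [0.0, 0.0, 0.0]; z = [1.0, 2.0, 3.0]; delta = [1, 1, 0]
draws = [np.array([[1.0]]), np.array([[0.0]]), np.array([[2.0]])]
wts = [np.array([1.0])] * 3
beta_new, jumps = m_step(0.0, t, z, delta, draws, wts, lambda y: np.array([1.0]))
```

When each posterior collapses to a single point, the update coincides with a Newton step for the Cox partial likelihood with delayed-entry risk sets, which is a convenient sanity check for an implementation.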

Except for α, the proposed nonparametric maximum modified likelihood estimates (NPMMLE) of the parameters enjoy nice properties that are similar to those of the NPMLE, as illustrated in the next two theorems. Below we list some regularity conditions needed for the theorems.

C1. The parameter space of the finite dimensional parameters, S_γ, is bounded and closed on Euclidean space. The true value γ₀ is an interior point of S_γ.

C2. On the parameter space of β, (exp{βg(S)A*}|Y* ≥ T*) is bounded below by m and above by M with probability 1.

C3. P(T ≤ τ₁ and Y ≥ τ₂) > 0. This ensures that not all data are truncated or censored.

C4. E_θ₀{exp[β₀g(u)A*]I(T* < u ≤ Y*)|Y* ≥ T*} is bounded away from 0 on the parameter space of β. Here E_θ₀(·) stands for the expectation taken under the true value of the parameter θ₀.

C5. g(t) is of uniformly bounded variation on [τ₁, τ₂], and there exists a constant D such that P(n_i ≤ D) = 1, ∀ i.

C6. The distribution f_A*(·|α) is continuous with respect to α and has continuous second derivative with respect to α. Moreover, the Fisher information matrix obtained from f_A* for α is positive definite.

Theorem 1

Consistency of the estimators. Under the regularity conditions C1–C5, the NPMMLE of (β₀, $σ_{0}^{2}$ , Λ₀), denoted as ( ${\hat{β}}_{n}, {\hat{σ}}_{n}^{2}, {\hat{Λ}}_{n}$ ), is consistent under the Euclidean norm |·| and supremum norm ‖·‖_∞ on [τ₁, τ₂] respectively.

For H = {h = (h₁, h₂, h₃)} and 0 < p < ∞, let H_p = {h ∈ H :‖h₁‖, |h₂|, ‖h₃‖_υ ≤ p} be a collection of directions that are used in the Appendix. The notation ‖·‖_υ denotes the total variation of the function inside the norm plus the absolute value of this function evaluated at 0. The next theorem shows that the NPMMLE converges in distribution to a Gaussian element in the parameter space at a $\sqrt{n}$ -rate.

Theorem 2

Asymptotic normality and efficiency. Under the regularity conditions C1–C6, the process $\sqrt{n} ({\hat{α}}_{n} - E ({\hat{α}}_{n}), {\hat{σ}}_{n}^{2} - σ_{0}^{2}, {\hat{β}}_{n} - β_{0}, {\hat{Λ}}_{n} - Λ_{0})$ converges in distribution to a mean zero Gaussian process G in the functional space l_∞(H_p) on H_p. Moreover, the NPMMLE β̂ is semiparametrically efficient for β₀.

Proofs of these two theorems are provided in the Appendix.

For estimating the standard errors of the NPMMLE, we recommend using the bootstrap procedure instead of the profile likelihood approach in Murphy and van der Vaart (2000) and Zeng and Cai (2005), which did not work well for LTRC data due to the high fluctuation of the estimated profile likelihood function and possibly negative estimates of the standard error. The performance of the bootstrap procedure for estimating the standard errors of the NPMLE under joint modeling with right-censoring has been studied by Tseng, Hsieh and Wang (2005) for the accelerated failure time model, and by Hsieh, Tseng and Wang (2006) for the Cox model. The results in these two papers support the validity of the bootstrap method in the scope of joint modeling. Our simulation results reported in Section 4 also support the use of the bootstrap approach. Compared with the profile likelihood method, the bootstrap method is more reliable but comes at a higher computational cost.
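A generic subject-level bootstrap for standard errors can be sketched as below. The sketch is illustrative only: `bootstrap_se` is a hypothetical helper, `estimator` stands in for the full EM fit (which is not reproduced here), and the sample mean is used as a cheap placeholder so that the example runs quickly.

```python
import numpy as np

def bootstrap_se(data, estimator, n_boot, rng):
    """Bootstrap standard error by resampling subjects with replacement.

    data: array with one row (or entry) per subject, so that a subject's
    survival and longitudinal records are always resampled together.
    """
    n = len(data)
    reps = np.array([estimator(data[rng.integers(0, n, size=n)])
                     for _ in range(n_boot)])
    return reps.std(axis=0, ddof=1)

# Placeholder demonstration with the sample mean as the estimator.
rng = np.random.default_rng(7)
data = rng.normal(size=400)
se = bootstrap_se(data, np.mean, n_boot=200, rng=rng)
```

In the joint-model setting each replicate requires rerunning the EM algorithm on the resampled subjects, which is where the higher computational cost arises.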

4. Simulation Study

To verify numerically the validity of the proposed procedure, we conducted simulations under five different settings. Since there is an intrinsic bias on the longitudinal component, the simulations focus on the performance of the estimate of β and how it would be affected by the level of contamination from the measurement errors and the variation of the random effects. As a benchmark setting, we considered a linear trend in time with random effects on the longitudinal covariate and assessed the influence of the variance of the random slope on the accuracy of estimating β. The left-truncation times are generated from an exponential distribution with parameter 1, while the right-censoring times are from an exponential distribution with parameter 3. The baseline hazard rate is that of an exponential distribution with mean 1. All five simulation settings have sample size n = 200 with true values β = 1, μ = (2, 0.5) and (σ₁₁, σ₁₂) = (0.5, −0.001). The values of (σ₂₂, σ²) are different for the five settings and set as: (0.01,0.1), (0.01,0.4), (0.01,0.025), (0.0025,0.1) and (0.04,0.1). The first three settings demonstrate the impact of contamination by measurement errors, while the last two illustrate the effect of the variation of the random slope.

Simulation results based on 100 Monte Carlo samples are reported in Table 1. Results under the first three settings suggest that β can be estimated unbiasedly and that measurement errors affect the precision, but not the magnitude of the biases. As expected, a higher level of noise contamination leads to a less precise estimate of β and a higher chance of divergence in the algorithm. In all three settings, the variance of measurement errors can be estimated with high accuracy and precision. Comparing the results under the first, fourth and fifth settings in Table 1, we observe that the variance of the random slopes has little effect on the performance of β̂.

Table 1.

Simulation results under five settings with sample size 200 and varying values of σ₂₂ and σ². The actual targets of the longitudinal estimates are conditional quantities marked as $μ_{1}^{*}$ and $μ_{2}^{*}$ etc. and are listed next to the true longitudinal value in the first column. The mean and SD of the estimates based on 100 Monte Carlo samples are reported in the second and third columns.

Case	Parameter	Average of NPMMLE	SE(MC)	MSE	Convergence rate
1	β(1)	0.9923	0.1633	0.0267	98%
	σ²(0.1)	0.0998	0.0021	5e-6
	$μ_{1} / μ_{1}^{*}$ (2/1.73)	1.7461	0.0478	0.0668
	$μ_{2} / μ_{2}^{*}$ (0.50/0.50)	0.4545	0.0985	0.0118
	$σ_{11} / σ_{11}^{*}$ (0.50/0.45)	0.4634	0.0527	0.0041
	$σ_{12} / σ_{12}^{*}$ (−0.001/−0.001)	−0.0424	0.0453	0.0038
	$σ_{22} / σ_{22}^{*}$ (0.01/0.01)	0.0738	0.0409	0.0057

2	β(1)	0.9185	0.1765	0.0378	72%
	σ²(0.4)	0.4003	0.0086	7e-5
	$μ_{1} / μ_{1}^{*}$ (2/1.74)	1.7455	0.0531	0.0676
	$μ_{2} / μ_{2}^{*}$ (0.50/0.50)	0.3801	0.1640	0.0413
	$σ_{11} / σ_{11}^{*}$ (0.5/0.45)	0.4730	0.0505	0.0033
	$σ_{12} / σ_{12}^{*}$ (−0.001/−0.001)	−0.1122	0.0917	0.0208
	$σ_{22} / σ_{22}^{*}$ (0.01/0.01)	0.1856	0.1156	0.0442

3	β(1)	1.0380	0.1548	0.0254	96%
	σ²(0.025)	0.0250	4.8283e-4	≃0
	$μ_{1} / μ_{1}^{*}$ (2/1.73)	1.7443	0.0468	0.0676
	$μ_{2} / μ_{2}^{*}$ (0.50/0.50)	0.4900	0.0643	0.0042
	$σ_{11} / σ_{11}^{*}$ (0.50/0.45)	0.4520	0.0534	0.0052
	$σ_{12} / σ_{12}^{*}$ (−0.0004/−0.0004)	−0.0193	0.0338	0.0015
	$σ_{22} / σ_{22}^{*}$ (0.01/0.01)	0.0571	0.0219	0.0027

4	β(1)	0.9684	0.1504	0.0236	98%
	σ²(0.1)	0.0997	0.0023	5e-6
	$μ_{1} / μ_{1}^{*}$ (2/1.74)	1.7464	0.0460	0.0664
	$μ_{2} / μ_{2}^{*}$ (0.50/0.50)	0.4491	0.0948	0.0116
	$σ_{11} / σ_{11}^{*}$ (0.5/0.45)	0.4497	0.0423	0.0043
	$σ_{12} / σ_{12}^{*}$ (−0.001/−0.0007)	−0.0439	0.0518	0.0045
	$σ_{22} / σ_{22}^{*}$ (0.0025/0.0025)	0.0797	0.0479	0.0072

5	β(1)	0.9934	0.1567	0.0246	95%
	σ²(0.1)	0.0996	0.0020	4e-6
	$μ_{1} / μ_{1}^{*}$ (2/1.74)	1.7522	0.0464	0.0636
	$μ_{2} / μ_{2}^{*}$ (0.50/0.50)	0.4498	0.1168	0.0162
	$σ_{11} / σ_{11}^{*}$ (0.50/0.45)	0.4559	0.0442	0.0039
	$σ_{12} / σ_{12}^{*}$ (−0.001/−0.002)	−0.0433	0.0692	0.0066
	$σ_{22} / σ_{22}^{*}$ (0.04/0.04)	0.1186	0.0642	0.0159

The results for the longitudinal part echo the above discussion of the non-identifiability of the parameter α, as the means of the random intercept and random slopes (shown in the second column of Table 1) are consistently underestimated. The actual targets of the estimates are the conditional quantities marked as $μ_{1}^{*}$ and $μ_{2}^{*}$ etc. in the first column of Table 1. The sizes of the biases vary with the level of truncation probability and size of measurement errors and can be very small for the mean of the random slope, e.g. in setting 3, where the measurement error is small. Thus, this bias problem in estimating the longitudinal component may elude researchers, while it is a cause of substantial concern in settings with large error variances.
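The MSE column of Table 1 appears to be consistent with the usual decomposition into squared bias plus variance, with the bias measured against the unconditional true value. This reading is an inference from the reported numbers rather than something stated in the text; the short check below reproduces two entries of Case 1 under that assumption.

```python
def mse(average, true_value, se):
    """Mean squared error as squared bias plus squared Monte Carlo SE."""
    return (average - true_value) ** 2 + se ** 2

# Case 1 of Table 1: beta (true value 1) and mu_1 (true value 2).
beta_case1 = mse(0.9923, 1.0, 0.1633)
mu1_case1 = mse(1.7461, 2.0, 0.0478)
```

For β the squared bias is negligible next to the variance, whereas for μ₁ the MSE is dominated by the squared bias, reflecting that the estimate targets the conditional quantity rather than the true value.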

To make statistical inference about the parameters of interest, it is necessary to get an estimate of the standard error of the NPMMLE, especially for β. We tried the approaches in Murphy and van der Vaart (2000) and Louis (1982), but neither worked, so we propose to use a bootstrap method (Tseng, Hsieh and Wang, 2005) for estimating the standard error of the NPMMLE and present the results in Table 2. Only the results for estimating the standard errors of β̂ and σ̂² are shown, since they are estimable. Table 2 supports the use of the bootstrap procedure, as the estimated standard error from the bootstrap method is close to the standard deviation from the 100 Monte Carlo samples, even when the degree of error contamination is large or the random slopes vary widely.

Table 2.

Performance of the bootstrap standard error estimates, SE(BT), of β̂ and σ̂², based on 50 resamples. SE(MC) is the standard deviation over the 100 Monte Carlo samples.

Case	Parameter	SE(MC)	SE(BT)
1	β(1)	0.1633	0.1692
	σ²(0.1)	0.0021	0.0020

2	β(1)	0.1765	0.1813
	σ²(0.4)	0.0086	0.0091

3	β(1)	0.1548	0.1523
	σ²(0.025)	4.8283e-4	5e-4

4	β(1)	0.1504	0.1539
	σ²(0.1)	0.0023	0.0020

5	β(1)	0.1567	0.1531
	σ²(0.1)	0.0020	0.0021


5. Data example: multi-center HIV study

In this section, we analyze data from a multi-center HIV study in Italy. Details of the study design and a previous analysis can be found in Rezza et al. (1989) and The-Italian-Seroconversion-Study (1992). The data include 448 HIV-positive patients. The primary outcome of interest is the incubation period of acquired immunodeficiency syndrome (AIDS), i.e. the time (in years) from detection of HIV infection until the onset of AIDS. There were 140 patients who received the HAART treatment at various times, resulting in a longitudinal treatment indicator that is fully observable, so no modeling of this process is necessary. However, there is a second longitudinal covariate, the CD4 count, which is observed only intermittently at follow-up visits, motivating the need to model the survival and longitudinal covariates jointly. The main biomedical interest lies in determining the effect of the HAART treatment on reducing the risk of developing AIDS, and the association between the incubation period of AIDS and CD4 T-cell counts in HIV-infected subjects.

For each of the 448 subjects in the study, the longitudinal measurements of CD4 T-cell counts were recorded intermittently along with the time to AIDS or dropout from the study. The total number of longitudinal measurements is 4442 and the average number of longitudinal measurements is 9.92 per patient.

One feature of this data is that the incubation period is subject to left-truncation and right-censoring, since patients were recruited to the study at various times after the study began, and only patients who have not developed AIDS at the time of recruitment are included in the study. Moreover, only 147 out of the 448 patients (about 33%) developed AIDS by the end of the study, so the right censoring rate is quite high for this data.

To model the longitudinal CD4 counts, we adopt a linear mixed effects model on log(CD4 + 1) with changing intercepts and slopes at the time of HAART treatment. Thus,

W_{i}^{*} (s_{ij}) = X_{i} (s_{ij}) + ε_{ij} = A_{i 0}^{*} + A_{i 1}^{*} s_{ij} + A_{i 2}^{*} I (s_{ij} > V_{i}) + A_{i 3}^{*} s_{ij} I (s_{ij} > V_{i}) + ε_{ij},

where ε_ij follows a normal distribution N(0, σ²), ${\vec{A}}_{i}^{*} = (A_{i 0}^{*}, A_{i 1}^{*}, A_{i 2}^{*}, A_{i 3}^{*})$ follows a 4-dimensional multivariate normal distribution with a 4 × 1 mean vector μ and a 4 × 4 covariance matrix Σ, and V_i is the time, measured from the detection of HIV infection, at which the ith patient received HAART. For those who never received HAART, V_i is defined to be infinity. For the time-to-AIDS, we assume a Cox model with X_i(t), the underlying log(CD4 + 1) process, as a time-dependent covariate along with another time-dependent treatment indicator, I(t > V_i), which is completely observed. The resulting model is:

λ (t | {\vec{A}}_{i}^{*}) = λ_{0} (t) exp (β_{1} X_{i} (t) + β_{2} I (t > V_{i})) .
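The two submodels can be written out as a short sketch. The function names and the values of `a_i` and `v_i` below are hypothetical; only the functional forms follow the two displayed equations, and the baseline hazard λ₀(t) is left unspecified, so the second function returns the hazard relative to λ₀(t).

```python
import numpy as np

def x_trajectory(s, a, v):
    """Latent log(CD4 + 1) path X_i(s): intercept and slope both change after HAART at time v."""
    s = np.asarray(s, dtype=float)
    after = (s > v).astype(float)      # I(s > V_i); v = np.inf encodes "never treated"
    return a[0] + a[1] * s + a[2] * after + a[3] * s * after

def relative_hazard(t, a, v, beta1, beta2):
    """exp(beta1 * X_i(t) + beta2 * I(t > V_i)), i.e. the hazard divided by lambda_0(t)."""
    t = np.asarray(t, dtype=float)
    return np.exp(beta1 * x_trajectory(t, a, v) + beta2 * (t > v))

a_i = np.array([6.0, -0.2, 0.3, 0.15])   # hypothetical random effects (A_i0, ..., A_i3)
v_i = 3.0                                # hypothetical HAART start, years after detection
times = np.array([1.0, 2.0, 4.0, 5.0])

print(x_trajectory(times, a_i, v_i))                   # treated after year 3
print(x_trajectory(times, a_i, np.inf))                # same subject, never treated
print(relative_hazard(times, a_i, v_i, -0.5, -1.0))    # hypothetical coefficients
```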

From the EM algorithm with Monte Carlo approximation, the slope, β̂₁, for the underlying log(CD4 + 1) process is estimated to be −0.5762 (p-value < 0.001), while the slope, β̂₂, for the longitudinal treatment indicator is estimated to be −1.2189 (p-value < 0.001). As expected, CD4 counts are negatively associated with the risk of AIDS: a one-unit decline in log(CD4 + 1) is associated with a 78% increase in the risk of AIDS. Beyond its effect on CD4 counts, HAART has an additional effect on the risk of AIDS: it significantly reduces the risk of developing AIDS by 70% after controlling for the CD4 counts. The analysis thus confirms that HAART effectively reduces the risk of developing AIDS, both through its positive association with patients’ CD4 counts and through a direct effect on the risk of developing AIDS.
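The two percentages follow from exponentiating the estimated coefficients of the Cox model. The short calculation below uses only the two estimates reported above; the variable names are ours.

```python
import math

beta1_hat = -0.5762   # coefficient of the latent log(CD4 + 1) process
beta2_hat = -1.2189   # coefficient of the HAART indicator I(t > V_i)

hr_cd4_decline = math.exp(-beta1_hat)   # hazard ratio for a one-unit decline in log(CD4 + 1)
hr_haart = math.exp(beta2_hat)          # hazard ratio, after versus before HAART

pct_increase = 100 * (hr_cd4_decline - 1)
pct_reduction = 100 * (1 - hr_haart)

print(f"one-unit decline in log(CD4 + 1): {pct_increase:.0f}% higher hazard")
print(f"HAART, given the CD4 path:        {pct_reduction:.0f}% lower hazard")
```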

6. Conclusions and discussion

We have shown, both theoretically and empirically, that jointly modeling the time-to-event and longitudinal covariates is an effective approach when the time-to-event is subject to both left truncation and right censoring. However, the extension from right-censorship to LTRC is not trivial. By modifying the joint likelihood, we have shown that the NPMMLE leads to consistent and asymptotically efficient estimation of the survival component and measurement error variance under the setting of a semiparametric Cox model. We have also demonstrated that the corresponding EM algorithm to locate the NPMMLE has good empirical performance and asymptotic properties under the assumption of normal random effects. It is not only computationally effective but also robust against departures from the normality assumption.

However, one caveat is the estimability of the longitudinal component. Although we can recover the conditional distribution of the longitudinal parameter, α, given Y ≥ T, the parameter α itself cannot be estimated properly through the modified likelihood due to the biased sampling plan. Additional strong and possibly unverifiable assumptions might be needed in order to recover the parameter α of the random effects. What we have accomplished in this paper is to successfully remove the bias for the estimation of the survival components attributed to the discrete measurement schedule and measurement errors of the longitudinal covariates, thus permitting asymptotically valid and efficient inference for the survival related parameters, which are crucial for the evaluation of therapies.

A final issue of interest is the prediction of survival probabilities. In the presence of a time-dependent covariate, the concept of the hazard rate function itself is based on a conditional probability formulated as

λ (t | \bar{X} (t)) Δ t = P (T \in [t, t + Δ t] | \bar{X} (t), T \geq t) .

Here both internal and external covariates can be included, although the internal covariate up to time t implicitly contains the information that this subject survives up to time t. This does not cause any problem in the definition of the hazard function since it is conditional on T ≥ t. However, if we consider the (subject-specific) survival probability, P(T ≥ t|X̄ (t)), then its value is actually 1 (Sec. 6.3.2, Kalbfleisch and Prentice 2002), since the process X(t) itself contains the information that T ≥ t. This undesirable feature can be avoided under the framework of joint modeling, as the latent longitudinal covariate X(t) is completely determined by the random effect A and the time point t through the submodel in the longitudinal part. Consequently, the survival probability should be defined as P(T ≥ t|A). In fact, it is meaningful to predict future survival probabilities, P(T ≥ s|A), for any s > t. This is one of the benefits of joint modeling in the presence of (internal) longitudinal covariates as it is possible to make predictions, albeit with some errors, whereas this is not possible under the partial likelihood approach. In summary, the joint modeling approach affords a more meaningful definition of the present and future survival probability through P(T ≥ s|A), where A is the random term linking the two submodels together. Evaluating the prediction errors and associated statistical inference could be an interesting future research project.
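Under a Cox model with latent covariate X(u), the probability P(T ≥ s|A) equals exp{−∫₀ˢ λ₀(u) exp[βX(u)] du}. The sketch below evaluates this by a trapezoid rule. It assumes a linear path X(u) = A₀ + A₁u and a user-supplied baseline hazard; the function name, the constant baseline hazard and all numerical values are hypothetical, and in practice A would be replaced by a prediction of the random effect, which is the source of the prediction error mentioned above.

```python
import numpy as np

def survival_given_a(s, a, beta, baseline_hazard, n_grid=2001):
    """P(T >= s | A = a) = exp(-int_0^s lambda_0(u) exp(beta * X(u)) du), X(u) = a[0] + a[1] * u."""
    u = np.linspace(0.0, s, n_grid)
    x = a[0] + a[1] * u                                   # latent covariate path X(u)
    integrand = baseline_hazard(u) * np.exp(beta * x)
    cum_hazard = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(u))  # trapezoid rule
    return float(np.exp(-cum_hazard))

a_hat = np.array([0.5, -0.1])                      # hypothetical random intercept and slope
constant_hazard = lambda u: np.full_like(u, 0.1)   # hypothetical lambda_0(u) = 0.1

for s in (1.0, 3.0, 5.0):
    print(s, survival_given_a(s, a_hat, beta=1.0, baseline_hazard=constant_hazard))
```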

Acknowledgments

The authors thank the Associate Editor and reviewers for insightful comments. This work is partially supported by an NIH grant 1R01AG025218-01.

Appendix

A.1. Likelihood and the score equations

By imposing a normality assumption N(μ, Σ) on the random effects $A_{i}^{*}$ , the full likelihood in (2.3) from the ith subject becomes

L_{i}^{O} \propto \frac{f_{T^{*}} (t_{i}) σ^{- n_{i}} λ_{0} {(z_{i})}^{δ_{i}} \int_{- \infty}^{\infty} exp {δ_{i} β g (z_{i}) a_{i} - \sum_{j = 1}^{n_{i}} {[w_{ij} - g (s_{ij}) a_{i}]}^{2} / (2 σ^{2})} Q_{1} (z_{i}, a_{i}) {da}_{i}}{\int_{0}^{\infty} \int_{- \infty}^{\infty} Q_{1} (t, a_{i}) f_{T} (t) {da}_{i} dt},

where $Q_{1} (u, a) = exp {- \int_{0}^{u} exp [β g (t) a] d Λ_{0} (t) - {(a - μ)}^{T} Σ^{- 1} (a - μ) / 2}$ . Following similar arguments as in Wang (1987) and combining with Vardi (1985), we can prove that the NPMLEs of all finite-dimensional parameters are the same as those from the conditional likelihood of (z_i, δ_i, w⃗_i) given ( $Y_{i}^{*} > t_{i}$ ). Moreover, by a proof similar to that of the classical Cox model for right censored data, the NPMLE from the conditional likelihood is attained by discrete baseline hazard functions that assign positive masses only at uncensored survival times, ( $y_{1}^{0}, \dots, y_{n_{u}}^{0}$ ).

Let o_i = (t_i, z_i, δ_i, w⃗_i) denote the observed data for the ith subject. The first derivative of the log full likelihood leads to the following score functions:

s_{σ^{2}}^{o} = \sum_{i = 1}^{n} {\sum_{j = 1}^{n_{i}} E {{[w_{ij} - A_{i}^{*} g (s_{ij})]}^{2} | o_{i}} - n_{i} σ^{2}} / σ^{3},

s_{μ}^{o} = Σ^{- 1} \sum_{i = 1}^{n} E {(A_{i}^{*} - μ) - [E (A_{i}^{*} | Y_{i}^{*} \geq T_{i}^{*}) - μ] | o_{i}} = Σ^{- 1} \sum_{i = 1}^{n} [E (A_{i}^{*} | o_{i}) - E (A_{i}^{*} | Y_{i}^{*} \geq T_{i}^{*})],

s_{Σ}^{o} = \frac{1}{2} Σ^{- 1} \sum_{i = 1}^{n} {E [(A_{i}^{*} - μ) {(A_{i}^{*} - μ)}^{T} | o_{i}] - E [(A_{i}^{*} - μ) {(A_{i}^{*} - μ)}^{T} | Y_{i}^{*} \geq T_{i}^{*}]} Σ^{- 1},

s_{Λ_{k}}^{o} = \frac{1}{Λ_{k}} - \sum_{i : t_{i} < y_{k}^{0} \leq z_{i}} E {exp [β g (y_{k}^{0}) A_{i}^{*}] | o_{i}} - Q_{2} (y_{k}^{0}),

s_{β}^{o} = \sum_{i = 1}^{n} δ_{i} g (y_{i}) E (A_{i}^{*} | o_{i}) - \sum_{i = 1}^{n} \sum_{j : t_{i} < y_{j}^{0} \leq z_{i}} Λ_{j} E {g (y_{j}^{0}) A_{i}^{*} exp [β g (y_{j}^{0}) A_{i}^{*}] | o_{i}} - Q_{3},

where

Q_{2} (y) = \sum_{i : y \leq t_{i}} E {exp [β g (y) A_{i}^{*}] | o_{i}} - nE {exp [β g (y) A_{i}^{*}] I (y \leq T_{i}) | Y_{i}^{*} \geq T_{i}^{*}},

Q_{3} = \sum_{i = 1}^{n} \sum_{j : y_{j}^{0} \leq t_{i}} Λ_{j} E {g (y_{j}^{0}) A_{i}^{*} exp [β g (y_{j}^{0}) A_{i}^{*}] | o_{i}} - nE {\sum_{j : y_{j}^{0} \leq t_{i}} Λ_{j} g (y_{j}^{0}) A_{i}^{*} exp [β g (y_{j}^{0}) A_{i}^{*}] | Y_{i}^{*} \geq T_{i}^{*}} .

The score equations, $s_{μ}^{o}$ and $s_{Σ}^{o}$ , corresponding to the longitudinal data reveal that the estimable terms are the conditional mean and covariance matrix of the random effects given that Y* ≥ T rather than μ and Σ.
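The conditional expectations E(· | o_i) in these score functions generally have no closed form, which is where Monte Carlo integration enters the EM algorithm. The following is a minimal sketch of one way to approximate them, not the authors' implementation: it assumes g(u) = (1, u), a two-dimensional normal random effect, a discrete baseline hazard with given jump sizes, and the survival contribution over the window (t_i, z_i] as in the modified likelihood. Draws from N(μ, Σ) are weighted by the likelihood of the observed data given the draw. The function name and all arguments are hypothetical.

```python
import numpy as np

def mc_conditional_mean(h, w, s, t_entry, z, delta, mu, Sigma, sigma2, beta,
                        jump_times, jumps, n_draws=20000, seed=0):
    """Monte Carlo approximation of E[h(A) | o_i] using weighted draws from N(mu, Sigma)."""
    rng = np.random.default_rng(seed)
    a = rng.multivariate_normal(mu, Sigma, size=n_draws)      # (M, 2) draws of A
    g = lambda u: np.column_stack([np.ones_like(u), u])       # assumed design g(u) = (1, u)

    # Longitudinal part: Gaussian log-density of the measurements w given A (up to a constant).
    resid = w[None, :] - a @ g(s).T                            # (M, n_i) residuals
    log_wt = -0.5 * np.sum(resid**2, axis=1) / sigma2

    # Survival part: cumulative hazard accumulated only over the window (t_entry, z].
    in_window = (jump_times > t_entry) & (jump_times <= z)
    risk = np.exp(beta * (a @ g(jump_times[in_window]).T))     # (M, K) relative risks
    log_wt -= risk @ jumps[in_window]
    if delta == 1:                                             # event term exp(beta * g(z) A)
        log_wt += beta * (a @ g(np.array([z], dtype=float)).T)[:, 0]

    wt = np.exp(log_wt - log_wt.max())                         # stabilise before normalising
    return (wt[:, None] * h(a)).sum(axis=0) / wt.sum()

# Hypothetical subject with three measurements, censored at z = 3 after entry at t = 0.5.
mu0, Sigma0 = np.array([2.0, 0.5]), np.diag([0.5, 0.04])
s_i, w_i = np.array([0.0, 1.0, 2.0]), np.array([1.9, 2.5, 3.1])
post_mean = mc_conditional_mean(lambda a: a, w_i, s_i, 0.5, 3.0, 0, mu0, Sigma0, 0.1, 0.3,
                                np.array([1.0, 2.0]), np.array([0.05, 0.05]))
print(post_mean)
```

Sampling from N(μ, Σ) keeps the sketch short but can be inefficient when the data are very informative about A_i; a proposal centred closer to the conditional distribution would reduce the Monte Carlo error.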

The score functions for λ_k, k = 1, …, n_u, and β have more complicated forms than those from a partial likelihood under the standard Cox model subject to LTRC. The complication is due to the additional terms Q₂ and Q₃, which require estimation of the expectations of nonlinear functions of the observed data along with the parameters of interest. If we drop these two terms from $s_{Λ_{k}}^{o}$ and $s_{β}^{o}$ , the modified score functions, $s_{Λ_{k}} = s_{Λ_{k}}^{o} + Q_{2} (y_{k}^{0})$ and $s_{β} = s_{β}^{o} + Q_{3}$ , are exactly the score functions from the modified likelihood. The next lemma validates the use of the modified likelihood (2.5).

Lemma 1

$E_{θ_{0}} (s_{Λ_{k}}) = E_{θ_{0}} (s_{Λ_{k}}^{o})$ and $E_{θ_{0}} (s_{β}) = E_{θ_{0}} (s_{β}^{o})$ . This provides Fisher consistency of the estimators from (2.5).
Under the regularity conditions for the law of large numbers and Slutsky's theorem, $n^{- 1} (s_{Λ_{k}} - s_{Λ_{k}}^{o}) = o_{p} (1)$ and $n^{- 1} (s_{β} - s_{β}^{o}) = o_{p} (1)$ .

Proof

The proof follows from a simple derivation and applications of the Law of Large Numbers along with Slutsky's theorem.

This lemma demonstrates the asymptotic equivalence of the score functions for the survival-related parameters from (2.3) and (2.5). The latter is computationally simpler to maximize and thus more attractive than the full likelihood.

A.2. Proof of the consistency of the NPMMLE

The proof of consistency consists of three major steps, which are elaborated below.

STEP 1. Existence of the NPMMLE of (γ, Λ)

We begin by proving that the candidates for the maximizer, Λ_{n_u}, have a finite and bounded jump at each observed survival time. For simplicity, we use the vector λ⃗_{n_u} = (λ₁, …, λ_{n_u}) to denote the jump sizes of Λ_{n_u} at the ordered survival times. The boundedness of the jump sizes can be demonstrated by proving the existence of an upper bound B ∈ ℝ by contradiction. Suppose that for any B ∈ ℝ, there exist λ⃗_{n_u,B} = (λ_{1,B}, …, λ_{n_u,B}) ∈ ℝ^{n_u}\[0, B]^{n_u} and γ_B ∈ S_γ such that L^m(γ_B, λ⃗_{n_u,B}) > L^m(γ, λ⃗_{n_u}) for all (γ, λ⃗_{n_u}) ∈ S_γ × [0, B]^{n_u}. The first part of L^m(γ_B, λ⃗_{n_u,B}) contributed by the ith subject is bounded above by

{(Λ_{n_{u}, B} {z_{i}} M)}^{δ_{i}} \times exp {- m \sum_{j : t_{i} < y_{j}^{0} \leq z_{i}} λ_{j, B}},

where m and M are defined in assumption C.2. Since λ⃗_{n_u,B} ∈ ℝ^{n_u}\[0, B]^{n_u}, at least one jump size, say λ_{i₀,B}, is greater than B. It follows that $\sum_{j : t_{i_{0}} < y_{j}^{0} \leq z_{i_{0}}} λ_{j, B} > B$ , which implies that L^m(γ_B, λ⃗_{n_u,B}) → 0 as B → ∞. Thus L^m(γ, λ⃗_{n_u}) = 0 for all (γ, λ⃗_{n_u}) ∈ S_γ × ℝ^{n_u}, which is a contradiction. This demonstrates the boundedness of the jump sizes of Λ_{n_u}. Together with the compactness of S_γ provided by assumption C.1, this establishes the existence of the NPMMLE of (γ, Λ).

STEP 2. Almost sure boundedness of Λ̂(τ₂) as n → ∞

For any fixed sample size n, the estimated cumulative hazard function evaluated at the endpoint of the study can be expressed as

\hat{Λ} (τ_{2}) = \sum_{k = 1}^{n} \frac{δ_{k} I (z_{k} \leq τ_{2})}{\sum_{i = 1}^{n} E_{\hat{θ}} [exp {\hat{β} g (z_{k}) A_{i}^{*}} | o_{i}] I (t_{i} < z_{k} \leq z_{i})} \leq \sum_{k = 1}^{n} \frac{δ_{k} I (z_{k} \leq τ_{2})}{m \sum_{i = 1}^{n} I (t_{i} < z_{k} \leq z_{i})} \leq \frac{\sum_{k = 1}^{n} δ_{k} I (z_{k} \leq τ_{2})}{m \sum_{i = 1}^{n} I (t_{i} \leq τ_{1}) I (τ_{2} \leq z_{i})},

(A.1)

where m is the lower bound of exp{βg(Y)A*}|Y* ≥ T*, which exists under assumption C.2. By the Law of Large Numbers and the continuous mapping theorem, we have the following two limits as n → ∞:

\begin{matrix} \frac{1}{n} \sum_{k = 1}^{n} δ_{k} I (z_{k} \leq τ_{2}) \to E (Δ I (Y \leq τ_{2})) < 1, \\ and \\ \frac{1}{\frac{1}{n} \sum_{i = 1}^{n} I (t_{i} \leq τ_{1} and τ_{2} \leq z_{i})} \to \frac{1}{P (T \leq τ_{1} and Y \geq τ_{2})} < \infty, \end{matrix}

(A.2)

where the finiteness of the second limit follows from assumption C.3. Therefore, there exists an upper bound of Λ̂(τ₂) even when n goes to infinity. Moreover, since the terms inside the summation in (A.1) are all strictly positive, Λ̂(τ₂) is always greater than 0. Thus Λ̂(τ₂) has been shown to be bounded almost surely as n → ∞.
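The first expression in (A.1) is a Breslow-type sum whose risk sets respect the delayed entry, since subject i contributes at z_k only if t_i < z_k ≤ z_i. A minimal sketch follows, with the conditional expectations E[exp{β̂g(z_k)A_i*} | o_i] supplied as a precomputed matrix. The function name and the toy data are hypothetical.

```python
import numpy as np

def cumulative_hazard_at(tau, t_entry, z, delta, weights):
    """Evaluate the Breslow-type sum in (A.1) at time tau.

    weights[i, k] stands in for E[exp(beta * g(z_k) * A_i) | o_i]; computing it
    requires the E-step, so here it is simply an input.
    """
    total = 0.0
    for k in range(len(z)):
        if delta[k] == 1 and z[k] <= tau:
            at_risk = (t_entry < z[k]) & (z[k] <= z)     # I(t_i < z_k <= z_i)
            total += 1.0 / np.sum(weights[at_risk, k])
    return total

# Toy data: the third subject enters late (t = 1.5), so it is not at risk at z = 1.
t_entry = np.array([0.0, 0.0, 1.5])
z = np.array([1.0, 2.0, 3.0])
delta = np.array([1, 1, 1])
unit_weights = np.ones((3, 3))

print(cumulative_hazard_at(3.0, t_entry, z, delta, unit_weights))
```

With unit weights the sum reduces to a Nelson-Aalen-type estimator with delayed entry; replacing every weight by a lower bound m > 0 can only increase each term, which is the first inequality in (A.1).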

STEP 3. Uniform convergence of ( ${\hat{σ}}_{n}^{2}, {\hat{β}}_{n}, {\hat{Λ}}_{n}$ ) to ( $σ_{0}^{2}, β_{0}, Λ_{0}$ )

We have shown in Step 2 that Λ̂(τ₂) is finite. Combining this with the fact that Λ̂ is a right-continuous and nondecreasing step function, the Helly selection theorem implies that there exists a subsequence of Λ̂ converging pointwise to a right-continuous and monotone function Λ* with probability 1. Moreover, by the Bolzano-Weierstrass theorem, there is a sub-subsequence of γ̂ which converges to some γ*. Therefore, there exists a sub-subsequence of θ̂_n, denoted by θ̂_η(n), that converges to θ* = (γ*, Λ*). We next show that $θ^{*} = (α_{0}^{*}, σ_{0}^{2}, β_{0}, Λ_{0})$ , where $α_{0}^{*}$ is the limit of α̂. Here a new term, defined as

{\bar{Λ}}_{n} (t) = \frac{1}{n} \sum_{k = 1}^{n} \frac{δ_{k} I (z_{k} \leq t)}{\frac{1}{n} \sum_{i = 1}^{n} E_{θ_{0}} [exp {β_{0} g (z_{k}) A_{i}^{*}} | o_{i}] I (t_{i} < z_{k} \leq z_{i})},

is introduced to serve as a bridge between Λ̂_n and Λ₀.

We first show the convergence of Λ̄_n to Λ₀ as follows. We will use a property that the class of all functions from a closed set to ℝ, which are uniformly bounded and of bounded variation, is Glivenko-Cantelli. Consider the denominator of Λ̄_n. The assumptions imply that functions of the form u → E_θ₀[exp{β₀g(u)A*}I(T < u ≤ Y)|o], where o denotes the observed data of a subject, are uniformly bounded and of bounded variation, so the class of these functions is Glivenko-Cantelli. Therefore,

\frac{1}{n} \sum_{i = 1}^{n} E_{θ_{0}} [exp {β_{0} g (u) A_{i}^{*}} | o_{i}] I (t_{i} < u \leq z_{i}) \to E_{θ_{0}} {exp [β_{0} g (u) A^{*}] I (T < u \leq Y) | Y^{*} \geq T^{*}}

(A.3)

uniformly on [τ₁, τ₂]. Together with assumption C.4, this gives the uniform convergence of the inverse of the left-hand side of (A.3) to the inverse of the right-hand side. Moreover, the uniform boundedness and bounded variation of the functions t → ΔI(Y ≤ t) imply that the class consisting of them is Glivenko-Cantelli. Thus, we also have

\frac{1}{n} \sum_{i = 1}^{n} Δ_{i} I (Y_{i} \leq t) \to E_{θ_{0}} [Δ I (Y \leq t)]

(A.4)

uniformly on [τ₁, τ₂]. Since $Λ_{0} (t) = E [\frac{Δ I (Y \leq t)}{E {exp [β_{0} g (Y) A^{*}] I (T \leq u \leq Y) | Y^{*} \geq T^{*}} | u = Y}]$ , combining the convergence of the inverse of both sides in (A.3) and (A.4), we obtain that Λ̄_n converges uniformly to Λ₀ on [τ₁, τ₂]. By considering the uniform convergence of the ratio Λ̂{u}/Λ̄{u} to dΛ*(u)/dΛ₀(u) for u ∈ [τ₁, τ₂], as demonstrated on pages 2146–2147 in Zeng and Cai (2005), the uniform convergence of Λ̂ to Λ* is established. The remaining task is to prove the equivalence of θ* = (γ*, Λ*) and $θ_{0}^{*} = (β_{0}, σ_{0}^{2}, α^{*}, Λ_{0})$ . This can be done by considering the empirical mean of the distance between $l_{i}^{m} ({\hat{θ}}_{n})$ and $l_{i}^{m} (β_{0}, σ_{0}^{2}, α^{*}, {\bar{Λ}}_{n})$ and demonstrating that $E_{θ_{0}^{*}} [l_{m} (θ^{*}) / l_{m} (θ_{0}^{*})] = 0$ almost surely, as shown on page 910 in Dupuy, Grama and Mesbah (2006). Thus (σ̂², β̂, Λ̂) converges uniformly to ( $σ_{0}^{2}, β_{0}, Λ_{0}$ ).

A.3. Proof of asymptotic normality of the NPMMLE

We will apply Theorem 3.3.1 in van der Vaart and Wellner (1996) to prove the asymptotic normality of the NPMMLE (γ̂, Λ̂). The proof consists of four steps to verify each of the four conditions in their theorem.

STEP 1. Fréchet differentiability of the score functions

For notation simplification, the parameter σ² will be combined with α into γ₁ = (σ², α) so that the single parameter γ₁ denotes the parameter of the measurement error ε and the latent random variable A*. Thus, the new parameter vector is θ = (γ₁, β, Λ).

Consider a one-dimensional submodel along the direction (h₁, h₂, h₃) of the form

θ_{t} = (γ_{1} + {th}_{1}, β + {th}_{2}, Λ_{t} (h_{3})),

where

Λ_{t} (h_{3}) (\cdot) = \int_{0}^{\cdot} (1 + {th}_{3} (u)) d Λ (u),

h₁ ∈ ℝ^d, h₂ ∈ ℝ, and h₃ is a bounded-variation function on [0, τ₂]. Let H = {h = (h₁, h₂, h₃)} and H_p = {h ∈ H :‖h₁‖, |h₂|, ‖h₃‖_υ ≤ p}. The notation ‖·‖_υ denotes the absolute value evaluated at 0 plus the total variation of the argument. The imputed log-likelihood contributed by the ith subject evaluated at θ, given the current value of parameter denoted as θ̃, is denoted by l_θ̃,i (θ). The corresponding score function of the local parameter t is

\frac{\partial}{\partial t} l_{\tilde{θ}, i} (θ_{t}) = h_{2} E_{\tilde{θ}} [δ_{i} g (z_{i}) A_{i}^{*} - \int_{t_{i}}^{z_{i}} g (u) A_{i}^{*} exp {(β + {th}_{2}) g (u) A_{i}^{*}} (1 + {th}_{3} (u)) d Λ (u) | o_{i}] - E_{\tilde{θ}} [\int_{t_{i}}^{z_{i}} h_{3} (u) exp {(β + {th}_{2}) g (u) A_{i}^{*}} d Λ (u) | o_{i}] + h_{1} E_{\tilde{θ}} [\frac{\partial}{\partial t} f_{ε, A^{*}} (ε, A_{i}^{*} | γ_{1} + {th}_{1}) | o_{i}] + \frac{δ_{i} h_{3} (z_{i})}{1 + {th}_{3} (z_{i})} .

Thus the imputed score function of t contributed by the n subjects evaluated at t = 0 is

S_{n, \tilde{θ}} (θ) (h) = \frac{1}{n} \sum_{i = 1}^{n} {\frac{\partial}{\partial t} l_{\tilde{θ}, i} (θ_{t}) |}_{t = 0} = h_{1}^{T} S_{n, \tilde{θ}, 1} (θ) + h_{2} S_{n, \tilde{θ}, 2} (θ) + S_{n, \tilde{θ}, 3} (θ) (h_{3}),

(A.5)

where

S_{n, \tilde{θ}, 1} (θ) = \frac{1}{n} \sum_{i = 1}^{n} E_{\tilde{θ}} [\frac{\partial}{\partial γ_{1}} log f_{ε, A^{*}} (ε_{i}, A_{i}^{*} | γ_{1}) | o_{i}],

S_{n, \tilde{θ}, 2} (θ) = \frac{1}{n} \sum_{i = 1}^{n} E_{\tilde{θ}} [δ_{i} g (z_{i}) A_{i}^{*} - \int_{t_{i}}^{z_{i}} g (u) A_{i}^{*} exp {β g (u) A_{i}^{*}} d Λ (u) | o_{i}],

S_{n, \tilde{θ}, 3} (θ) (h_{3}) = \frac{1}{n} \sum_{i = 1}^{n} {δ_{i} h_{3} (z_{i}) - E_{\tilde{θ}} [\int_{t_{i}}^{z_{i}} h_{3} (u) exp {β g (u) A_{i}^{*}} d Λ (u) | o_{i}]} .

By defining $θ (h) = (γ_{1}, β, Λ) (h_{1}, h_{2}, h_{3}) = h_{1}^{T} γ_{1} + h_{2} β + \int_{0}^{τ_{2}} h_{3} (u) d Λ (u)$ , where h ∈ H_p, the parameter θ can be regarded as a functional on H_p, the parameter space Θ = {θ} is a subspace of l^∞(H_p), and the score in (A.5) is a random map from Θ to a Banach space which contains functions (operations) of h.

Besides the above imputed score, we also need the mean imputed score function of t under the true value θ₀ and denote it as

S_{\tilde{θ}} (θ) (h) = E_{θ_{0}} [{\frac{\partial}{\partial t} l_{\tilde{θ}} (θ_{t}) |}_{t = 0}] = h_{1}^{T} S_{\tilde{θ}, 1} (θ) + h_{2} S_{\tilde{θ}, 2} (θ) + S_{\tilde{θ}, 3} (θ) (h_{3}),

where

S_{\tilde{θ}, 1} (θ) = E_{θ_{0}} {E_{\tilde{θ}} [\frac{\partial}{\partial γ_{1}} log f_{ε, A^{*}} (ε_{i}, A_{i}^{*} | γ_{1}) | o_{i}]},

S_{\tilde{θ}, 2} (θ) = E_{θ_{0}} {E_{\tilde{θ}} [Δ_{i} g (Y_{i}) A_{i}^{*} - \int_{T_{i}}^{Y_{i}} g (u) A_{i}^{*} exp {β g (u) A_{i}^{*}} d Λ (u) | o_{i}]},

S_{\tilde{θ}, 3} (θ) (h_{3}) = E_{θ_{0}} {Δ_{i} h_{3} (Y_{i}) - E_{\tilde{θ}} [\int_{T_{i}}^{Y_{i}} h_{3} (u) exp {β g (u) A_{i}^{*}} d Λ (u) | o_{i}]} .

To prove the Fréchet differentiability of the map $θ \to S_{θ_{0}^{*}} (θ)$ at $θ_{0}^{*}$ , where $θ_{0}^{*} = (γ_{10}^{*}, β_{0}, Λ_{0})$ with $γ_{10}^{*} = (σ_{0}^{2}, α^{*})$ , we need to calculate the corresponding derivative. First, we introduce the notation $\nabla_{θ} S_{\tilde{θ}} (θ_{0}^{*}) = {\frac{\partial}{\partial t} S_{\tilde{θ}} (θ_{0}^{*} + t θ) |}_{t = 0}$ , where $θ_{0}^{*} + t θ = (γ_{10}^{*} + t γ_{1}, β_{0} + t β, Λ_{0} (\cdot) + t Λ (\cdot))$ . Then,

\nabla_{θ} S_{\tilde{θ}} (θ_{0}^{*}) (h) = {\frac{\partial}{\partial t} S_{\tilde{θ}} (θ_{0}^{*} + t θ) (h) |}_{t = 0} = {\frac{\partial}{\partial t} E_{θ_{0}^{*}} {h_{1}^{T} E_{\tilde{θ}} [\frac{\partial}{\partial (γ_{10}^{*} + t γ_{1})} log f_{ε, A^{*}} (ε_{i}, A_{i}^{*} | γ_{10}^{*} + t γ_{1}) | o_{i}] + h_{2} E_{\tilde{θ}} [Δ_{i} g (Y_{i}) A_{i}^{*} - \int_{T_{i}}^{Y_{i}} g (u) A_{i}^{*} exp {(β_{0} + t β) g (u) A_{i}^{*}} (d Λ_{0} (u) + td Λ (u)) | o_{i}] + Δ_{i} h_{3} (Y_{i}) - E_{\tilde{θ}} [\int_{T_{i}}^{Y_{i}} h_{3} (u) exp {(β_{0} + t β) g (u) A_{i}^{*}} (d Λ_{0} (u) + td Λ (u)) | o_{i}]} |}_{t = 0} .

(A.6)

Using the chain rule, equation (A.6) can be simplified as

- γ_{1}^{T} σ_{\tilde{θ}, 1} (h) - β σ_{\tilde{θ}, 2} (h) - \int_{0}^{τ_{2}} σ_{\tilde{θ}, 3} (h) (u) d Λ (u),

where

σ_{\tilde{θ}, 1} (h) = - E_{θ_{0}^{*}} {h_{1}^{T} E_{\tilde{θ}} [\frac{\partial^{2}}{\partial γ_{1} \partial γ_{1}^{T}} log f_{ε, A^{*}} (ε_{i}, A_{i}^{*} | γ_{10}^{*}) | o_{i}]},

(A.7)

σ_{\tilde{θ}, 2} (h) = E_{θ_{0}^{*}} {E_{\tilde{θ}} [\int_{0}^{τ_{2}} [h_{2} g (u) A_{i}^{*} + h_{3} (u)] g (u) A_{i}^{*} exp {β_{0} g (u) A_{i}^{*}} I (T_{i} < u \leq Y_{i}) d Λ_{0} (u) | o_{i}]},

(A.8)

σ_{\tilde{θ}, 3} (h) (u) = E_{θ_{0}^{*}} {E_{\tilde{θ}} [[h_{2} g (u) A_{i}^{*} + h_{3} (u)] exp {β_{0} g (u) A_{i}^{*}} I (T_{i} < u \leq Y_{i}) | o_{i}]} .

(A.9)

Evaluating (A.6) at the true value $θ_{0}^{*}$ leads to

\nabla_{θ} S_{θ_{0}^{*}} (θ_{0}^{*}) (h) = - γ_{1}^{T} σ_{θ_{0}^{*}, 1} (h) - β σ_{θ_{0}^{*}, 2} (h) - \int_{0}^{τ_{2}} σ_{θ_{0}^{*}, 3} (h) (u) d Λ (u),

(A.10)

where each of the σ-functions has the same form as the corresponding function listed in (A.7), (A.8), or (A.9) with the double expectation $E_{θ_{0}^{*}} {E_{\tilde{θ}} [\cdot | o_{i}]}$ replaced by $E_{θ_{0}^{*}} {\cdot}$ . Now apply the Taylor expansion of $exp {(β_{0} + t β) g (u) A_{i}^{*}}$ at t = 0, to get

S_{θ_{0}^{*}} (θ_{0}^{*} + t θ) - S_{θ_{0}^{*}} (θ_{0}^{*}) - t \nabla_{θ} S_{θ_{0}^{*}} (θ_{0}^{*}) = o (t),

where the small-o function does not depend on θ. Therefore,

\frac{{‖ S_{θ_{0}^{*}} (θ_{0}^{*} + t θ) - S_{θ_{0}^{*}} (θ_{0}^{*}) - t \nabla_{θ} S_{θ_{0}^{*}} (θ_{0}^{*}) ‖}_{p}}{t} \to 0, as t \to 0

uniformly in θ = (γ₁, β, Λ). Thus the Fréchet derivative of the mapping $θ \to S_{θ_{0}^{*}} (θ)$ evaluated at $θ_{0}^{*}$ takes the form (A.10). We will use the notation ${\dot{S}}_{θ_{0}^{*}} (θ_{0}^{*}) (θ)$ to denote it.

STEP 2. Continuous invertibility of ${\dot{S}}_{θ_{0}^{*}} (θ_{0}^{*}) (θ)$

The continuous invertibility of the Fréchet derivative can be established by showing that there exists some number c > 0 such that

inf_{θ \in lin Θ} \frac{{‖ {\dot{S}}_{θ_{0}^{*}} (θ_{0}^{*}) (θ) ‖}_{l^{\infty} (H)}}{{‖ θ ‖}_{l^{\infty} (H)}} > c .

(A.11)

Since ${\dot{S}}_{θ_{0}^{*}} (θ_{0}^{*}) (θ)$ can be expressed as a linear combination of the three σ-operators according to (A.6), it is necessary to check the continuous invertibility of those σ-operators. The proof is similar to the arguments in the Appendix of Zeng and Cai (2005). Through the continuous invertibility of $σ_{θ_{0}^{*}}$ , the lower bound c can be found as $\frac{q}{3 p}$ , where q satisfies $σ_{θ_{0}^{*}}^{- 1} (H_{q}) \subseteq H_{p}$ . Details to find the lower bound are analogous to the approach in Dupuy, Grama and Mesbah (2006) (page 915). Thus the derivative ${\dot{S}}_{θ_{0}^{*}} (θ_{0}^{*})$ is continuously invertible.

STEP 3. Convergence in distribution to a tight element

In this step, the convergence of $\sqrt{n} (S_{n, {\hat{θ}}_{n}} - S_{θ_{0}^{*}}) (θ_{0}^{*})$ in distribution will be demonstrated. Since $S_{θ_{0}^{*}} (θ_{0}^{*})$ is the mean of the score function evaluated at the true value of θ, it is equal to zero. Then

[S_{n, {\hat{θ}}_{n}} - S_{θ_{0}^{*}}] (θ_{0}^{*}) (h) = \frac{1}{n} \sum_{i = 1}^{n} [D_{i, 1} (h) + D_{i, 2} (h) + δ_{i} h_{3} (y_{i}) + D_{i, 3} (h)],

where

D_{i, 1} (h) = h_{1}^{T} E_{{\hat{θ}}_{n}} [\frac{\partial}{\partial γ_{1}} log f_{ε, A^{*}} (ε_{i}, A_{i}^{*} | γ_{10}^{*}) | o_{i}],

D_{i, 2} (h) = h_{2} E_{{\hat{θ}}_{n}} [δ_{i} g (z_{i}) A_{i}^{*} - \int_{t_{i}}^{z_{i}} g (u) A_{i}^{*} exp {β_{0} g (u) A_{i}^{*}} d Λ_{0} (u) | o_{i}],

D_{i, 3} (h) = - E_{{\hat{θ}}_{n}} [\int_{t_{i}}^{z_{i}} h_{3} (u) exp {β_{0} g (u) A_{i}^{*}} d Λ_{0} (u) | o_{i}] .

The class ${\frac{1}{n} \sum (D_{i, 1} + D_{i, 2}) (h) : ‖ h_{1} ‖ + | h_{2} | \leq p}$ is bounded Donsker, since it is a finite dimensional class of measurable score functions. Moreover, since any class of real-valued functions on [0, τ₂] that are uniformly bounded and bounded in variation is Donsker, the class {δh₃(y) : h₃ ∈ BV_p} is Donsker. The Donsker property of the class ${\frac{1}{n} \sum D_{i, 3} (h) : h_{3} \in {BV}_{p}}$ also follows from this fact. We have thus shown that the class ${[S_{n, {\hat{θ}}_{n}} - S_{θ_{0}^{*}}] (θ_{0}^{*}) (h) : ‖ h_{1} ‖ + | h_{2} | \leq p, h_{3} \in {BV}_{p}}$ is Donsker, since the sum of bounded Donsker classes is also Donsker. This implies

\sqrt{n} (S_{n, {\hat{θ}}_{n}} - S_{θ_{0}^{*}}) (θ_{0}^{*}) \overset{D}{\to} Z,

a tight Gaussian process in l^∞(H_p).

STEP 4. Verification of conditions 1 and 4

Condition 4 holds by the consistency of the estimator θ̂_n. Condition 1 can be verified by considering the Donsker property of the class ${S_{\cdot, θ} (θ) (h) - S_{\cdot, θ_{0}^{*}} (θ_{0}^{*}) (h) : {‖ θ - θ_{0}^{*} ‖}_{p} < ν, h \in H_{p}}$ for some ν > 0, where S_·,θ(θ)(h) is the general form of ${S_{i, θ} (θ) (h) = \frac{\partial}{\partial t} l_{θ, i} (θ_{t}) |}_{t = 0}$ . We omit the details since they are similar to those for the case of right-censored data, considered in Zeng and Cai (2005).

We have verified the four conditions needed for the asymptotic distribution of the NPMMLE θ̂_n, and therefore

\sqrt{n} ({\hat{θ}}_{n} - θ_{0}^{*}) \overset{D}{\to} - {\dot{S}}_{θ_{0}^{*}}^{- 1} (θ_{0}^{*}) Z,

as n → ∞.

Using the form of the Fréchet derivative in (A.6), one finds that there exists a linear operator $σ = (σ_{θ_{0}^{*}, 1}, σ_{θ_{0}^{*}, 2}, σ_{θ_{0}^{*}, 3})$ that maps H_p to ℝ^{d+1} × BV_p, such that

{\dot{S}}_{θ_{0}^{*}} (θ_{0}^{*}) (θ_{1} - θ_{2}) (h) = - {(γ_{11} - γ_{12})}^{T} σ_{θ_{0}^{*}, 1} (h) - (β_{1} - β_{2}) σ_{θ_{0}^{*}, 2} (h) - \int_{0}^{τ_{2}} σ_{θ_{0}^{*}, 3} (h) (u) d (Λ_{1} - Λ_{2}) (u) .

The continuous invertibility of the σ operator has been shown already, so its inverse operator, denoted by σ⁻¹, exists. Since

\sqrt{n} {\dot{S}}_{θ_{0}^{*}} (θ_{0}^{*}) ({\hat{γ}}_{1} - γ_{10}^{*}, \hat{β} - β_{0}, \hat{Λ} - Λ_{0}) (h) = \sqrt{n} {S_{n, θ_{0}^{*}} (h) - S_{θ_{0}^{*}} (θ_{0}^{*}) (h)} + o_{p} (1),

by applying the inverse operator σ⁻¹ on both sides we obtain that

\sqrt{n} {- {({\hat{γ}}_{1} - γ_{10}^{*})}^{T} h_{1} - (\hat{β} - β_{0}) h_{2} - \int_{0}^{τ_{2}} h_{3} (u) d (\hat{Λ} - Λ_{0}) (u)} = \sqrt{n} {S_{n, θ_{0}^{*}} (\tilde{h}) - S_{θ_{0}^{*}} (θ_{0}^{*}) (\tilde{h})} + o_{p} (1),

(A.12)

where h̃ = (h̃₁, h̃₂, h̃₃) = σ⁻¹(h). If h₁ and h₃ in (A.12) are chosen to be 0, then this reduces to

\sqrt{n} {- (\hat{β} - β_{0}) h_{2}} = \sqrt{n} {S_{n, θ_{0}^{*}} (\tilde{h}) - S_{θ_{0}^{*}} (θ_{0}^{*}) (\tilde{h})} + o_{p} (1),

where the latter term is in the form of linear combinations of score functions for the parameters. Since the score functions derived from the modified likelihood are asymptotically equivalent to those from the full likelihood by Lemma 1, the influence function is the same as the efficient influence function for β₀h₂ by its uniqueness in the linear span of the scores. Thus the estimator β̂ is efficient for β₀.

References

Andersen PK, Borgan O, Gill RD, Keiding N. Statistical models based on counting processes. Springer - Verlag; 1993. [Google Scholar]
Cox DR. Regression models and life tables (with discussion) Journal of the Royal Statistical Society Series B - Statistical Methodology. 1972;34:187–220. [Google Scholar]
Dafni UG, Tsiatis AA. Evaluating surrogate markers of clinical outcome measured with error. Biometrics. 1998;54:1445–62. [PubMed] [Google Scholar]
DeGruttola V, Tu X. Modeling progression of CD4-lymphocyte count and its relationship to survival time. Biometrics. 1994;50:1003–1014. [PubMed] [Google Scholar]
Dupuy JF, Grama I, Mesbah M. Asymptotic theory for the cox model with missing time-depedent covariate. Annals of Statistics. 2006;34:903–924. [Google Scholar]
Henderson R, Diggle P, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics. 2000;1:465–480. doi: 10.1093/biostatistics/1.4.465. [DOI] [PubMed] [Google Scholar]
Hsieh F, Tseng YK, Wang JL. Joint Modelling of Survival and Longitudinal Data Likelihood Approach Revisited. Biometrics. 2006;62:1037–1043. doi: 10.1111/j.1541-0420.2006.00570.x. [DOI] [PubMed] [Google Scholar]
Klein JP, Moeschberger ML. Survival analysis: techniques for censored and truncated data. Springer; 2003. [Google Scholar]
Laird NM, Ware JH. Random-effects Models for longitudinal data. Biometrics. 1982;38:963–974. [PubMed] [Google Scholar]
Louis TA. Finding the observed information matrix when using the em algorithm. Journal of the Royal Statistical Society Series B - Statistical Methodology. 1982;44:226–233. [Google Scholar]
Lynden-Bell D. A method of allowing for known observational selection in small samples applied to 3CR quasars. Monthly Notices of the Royal Astronomy Society. 1971;155:95–118. [Google Scholar]
Murphy SA, van der Vaart AW. On profile likelihood. Journal of American Statistical Association. 2000;95:449–485. [Google Scholar]
Rezza G, Lazzarin A, Angarano G, Sinicco A, Pristerá R, Tirelli U, Salassa B, Ricchi E, Aiuti F, Menniti-lppolito F. Tje natural history of HIV infection in intravenous drug users: risk of disease progression in a cohort of serconverters. AIDS. 1989;3:87–90. doi: 10.1097/00002030-198902000-00006. [DOI] [PubMed] [Google Scholar]
Song X, Davidian M, Tsiatis AA. A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data. Biometrics. 2002;58:742–753. doi: 10.1111/j.0006-341x.2002.00742.x.
The Italian Seroconversion Study. Disease progression and early predictors of AIDS in HIV-seroconverted injecting drug users. AIDS. 1992;6:421–426.
Tseng YK, Hsieh F, Wang JL. Joint modelling of accelerated failure time and longitudinal data. Biometrika. 2005;92:587–603.
Tsiatis AA, Davidian M. Joint modeling of longitudinal and time-to-event data: An overview. Statistica Sinica. 2004;14:809–834.
Tsiatis AA, DeGruttola V, Wulfsohn M. Modeling the relationship of survival to longitudinal data measured with error: Applications to survival and CD4 counts in patients with AIDS. Journal of the American Statistical Association. 1995;90:23–37.
van der Vaart AW, Wellner JA. Weak convergence and empirical processes. Springer; New York: 1996.
Vardi Y. Empirical distributions in selection bias models. Annals of Statistics. 1985;13:178–203.
Wang MC. Product limit estimates: a generalized maximum likelihood study. Communications in Statistics - Theory and Methods. 1987;16:3117–3132.
Wang CY. Corrected score estimator for joint modeling of longitudinal and failure time data. Statistica Sinica. 2006;16:235–253.
Woodroofe M. Estimating a distribution function with truncated data. Annals of Statistics. 1985;13:163–177.
Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997;53:330–339.
Zeng D, Cai J. Asymptotic Results for maximum likelihood estimators in joint analysis of repeated measurements and survival time. Annals of Statistics. 2005;33:2132–2163.

