Author manuscript; available in PMC: 2019 Mar 6.
Published in final edited form as: J Am Stat Assoc. 2017 Nov 13;113(521):357–368. doi: 10.1080/01621459.2016.1255637

Conditional modeling of longitudinal data with terminal event

Shengchun Kong 1, Bin Nan 2, John D Kalbfleisch 3, Rajiv Saran 4, Richard Hirth 5
PMCID: PMC6402357  NIHMSID: NIHMS1502792  PMID: 30853735

Abstract

We consider a random effects model for longitudinal data with the occurrence of an informative terminal event that is subject to right censoring. Existing methods for analyzing such data include the joint modeling approach using latent frailty and the marginal estimating equation approach using inverse probability weighting; in both cases the effect of the terminal event on the response variable is not explicit and thus not easily interpreted. In contrast, we treat the terminal event time as a covariate in a conditional model for the longitudinal data, which provides a straight-forward interpretation while keeping the usual relationship of interest between the longitudinally measured response variable and covariates for times that are far from the terminal event. A two-stage semiparametric likelihood-based approach is proposed for estimating the regression parameters; first, the conditional distribution of the right-censored terminal event time given other covariates is estimated and then the likelihood function for the longitudinal event given the terminal event and other regression parameters is maximized. The method is illustrated by numerical simulations and by analyzing medical cost data for patients with end-stage renal disease. Desirable asymptotic properties are provided.

Keywords: Cox regression, Empirical process, Mixed effects model, Pseudo-maximum likelihood estimation

1 Introduction

In longitudinal studies, the collection of information can be stopped at the end of the study, at the time of dropout of a study participant, or at the time of a terminal event. Death, the most common terminal event, often occurs in cohort studies of older populations and in fatal disease follow-up studies, e.g., organ failure or cancer studies. Other types of terminal events also exist; for example, the final menstrual period is a terminal event for menstrual cycle data.

The current literature has primarily focused on modeling the longitudinally measured response variable and covariates given that the terminal event has not yet happened; see e.g. Tsiatis and Davidian (2004), Hsieh et al. (2006), Ding and Wang (2008), Albert and Shih (2010). If the terminal event is ignorable (Little and Rubin, 2002), then a likelihood-based estimation of regression parameters is straightforward. Oftentimes, however, the terminal event time is non-ignorable. Two types of approaches are widely used for longitudinal data analysis with non-ignorable terminal events: the joint modeling approach using latent frailty and the marginal estimating equation approach using inverse probability weighting. In the former, the relationship between the terminal event and the longitudinal data is indirectly modeled through the shared random effect. The latter approach is appropriate when the terminal event is simply censoring the observations of the longitudinal process, which is in fact continuing but unobserved; its use when the terminal event stops the longitudinal process is more controversial. Similar approaches have also been used in the context of recurrent events correlated with a terminal event; for example, see Ghosh and Lin (2002), Huang and Wang (2004), Zeng and Lin (2009), Albert and Shih (2010), Kalbfleisch et al. (2013), among many others.

These modeling strategies, however, are not as useful as one might wish for many longitudinal studies where the explicit effect of the terminal event time on the longitudinal measures is of interest. For example, medical payments in dialysis patients (Liu et al., 2007) and cancer patients (Chan and Wang, 2010) tend to increase when patients approach death; functional limitations in an aging population (Sowers et al., 2007) become more severe when people are closer to the end of life; and menstrual cycles become longer and more variable when women approach menopause (Harlow et al., 2008). In these cases, a question of interest is how the impending terminal event affects the longitudinal measures, and for this question, a model for the longitudinal event conditional on the terminal event seems particularly useful and appropriate.

In this article, we propose a random effects model for repeated measures which includes the event time as an additional (fixed effect) covariate, and thus provides a more intuitive and meaningful interpretation of the effect of the terminal event time. The proposed conditional modeling strategy keeps the usual relationship of interest between the longitudinally measured response variable and covariates when the data collection time is far from the occurrence of the terminal event, but the response variable becomes increasingly dependent on the terminal event time when the data collection time is close to the terminal event. Since the terminal event time is subject to right censoring, the regression model with the terminal event time as a covariate falls into a general framework of regression with censored covariate. For this situation, the complete case analysis by dropping observations with censored event times will be shown to be a valid estimating approach under the usual noninformative conditional independent censoring assumption for the censoring time.

We propose a semiparametric, likelihood-based approach for parameter estimation in a linear regression model with a nonlinear component for the censored covariate, that utilizes both the complete and censored data. The proposed method is shown to be consistent and asymptotically normal under a set of mild regularity conditions, and is more efficient than the complete case analysis. The proofs of the asymptotic properties rely heavily on empirical process theory. A referee drew our attention to Li et al. (2013), which has a similar aim of recovering information from censored data. We comment further on this work later in the Discussion Section.

The rest of the article is organized as follows. We describe the proposed model in Section 2 and the two-stage estimating method in Section 3. The asymptotic properties are outlined in Section 4 with proofs given in the Appendix. Section 5 contains numerical results followed by a brief discussion. Detailed technical preparations are provided in the online Supplementary Material.

2 A Nonlinear Regression Model with Mixed Effects and Censored Covariate

2.1 Complete data model with observed terminal event time

For a subject $i$, denote the terminal event time by $S_i$, the baseline covariates by a vector $X_i$ whose first element is 1, the longitudinal response by $Y_{ij}$, and the prespecified visit time by $t_{ij}$, where $i = 1, \cdots, n$ and $j = 1, \cdots, n_i$. For given $S_i$, we model $Y_{ij}$ with the following mixed effects model for longitudinal data:

$$Y_{ij} = X_i'\beta + g(S_i - t_{ij}, \xi) + Z_i' b_i + U_i(t_{ij}) + \varepsilon_{ij}, \tag{1}$$

where $\beta$ is a vector of regression coefficients with length $p_1$, $b_i$ is an independent random effects vector of length $q_1$ associated with covariates $Z_i$, $U_i(t)$ is an independent stochastic process, $\varepsilon_{ij}$, $j = 1, \ldots, n_i$, are independent measurement errors, $g$ is a known function that satisfies Condition 1 in the Appendix, and $\xi$ is a vector with length $p_2$. The function $g(t, \xi) \to 0$ as $t \to \infty$, so that model (1) reduces to a simpler relationship of interest between the longitudinally measured response variable $Y_{ij}$ and covariates $X_i$ when $t_{ij}$ is distant from $S_i$, while the response becomes increasingly related to the terminal event when $t_{ij}$ is close to $S_i$. Motivated by figure 5 in Chan and Wang (2010), we can choose $g(S - t, \xi)$ to be a normal kernel, $g(S - t, \xi) = \xi_1 e^{-(S - t - \xi_2)^2 \xi_3}$ with $\xi = (\xi_1, \xi_2, \xi_3)$. Other examples include an exponential kernel, $g(t, \xi) = \xi_1 e^{-(t \xi_2)}$.
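The two kernel choices can be sketched in a few lines. This is only an illustration: it assumes the normal kernel is read as $\xi_1 \exp\{-(S - t - \xi_2)^2 \xi_3\}$ and the exponential kernel as $\xi_1 \exp(-t \xi_2)$, and the function names and numerical values are hypothetical.

```python
import numpy as np

def normal_kernel(d, xi1, xi2, xi3):
    """g(d, xi) = xi1 * exp(-(d - xi2)^2 * xi3), where d = S - t is the
    time remaining before the terminal event (assumed reading of the kernel)."""
    d = np.asarray(d, dtype=float)
    return xi1 * np.exp(-((d - xi2) ** 2) * xi3)

def exponential_kernel(d, xi1, xi2):
    """g(d, xi) = xi1 * exp(-d * xi2) (assumed reading of the kernel)."""
    d = np.asarray(d, dtype=float)
    return xi1 * np.exp(-d * xi2)

# Hypothetical parameter values, chosen only to show the shape of g.
d = np.array([0.0, 1.0, 2.0, 5.0, 20.0])
g_normal = normal_kernel(d, xi1=4.0, xi2=1.0, xi3=1.2)
g_exp = exponential_kernel(d, xi1=4.0, xi2=0.5)
```

Both kernels vanish as the distance to the terminal event grows, which is the property that lets model (1) reduce to the usual regression far from the terminal event.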

We make the following additional assumptions: (i) $b_i$ follows a normal distribution $N(0, D(\varphi))$, where $D$ is a positive definite matrix depending on a parameter vector $\varphi$ with length $q_2$; (ii) $U_i(t)$ is a mean zero Gaussian process with a given covariance function $\mathrm{cov}(U_i(t_1), U_i(t_2)) = \kappa(\nu, \rho; t_1, t_2)$ that depends on a parameter vector $\nu$ with length $q_3$ and a scalar $\rho$; for example, $U_i(t)$ can be the nonhomogeneous Ornstein-Uhlenbeck (NOU) process satisfying $\mathrm{var}(U_i(t)) = \nu(t)$ with $\log(\nu(t)) = \nu_0 + \nu_1 t$ and $\mathrm{corr}(U_i(t_1), U_i(t_2)) = \rho^{|t_1 - t_2|}$; (iii) $\varepsilon_{ij}$ follows a normal distribution $N(0, \sigma^2)$; and (iv) $b_i$, $U_i(t)$, and $\varepsilon_{ij}$ are mutually independent.

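For a vector $t = (t_1, \cdots, t_m)$, denote $g(t, \xi) = (g(t_1, \xi), \cdots, g(t_m, \xi))'$. Let $Y_i = (Y_{i1}, \cdots, Y_{in_i})'$, $t_i = (t_{i1}, \cdots, t_{in_i})$, $\mathbf{X}_i = (X_i, \cdots, X_i)_{p_1 \times n_i}$ and $\mathbf{Z}_i = (Z_i, \cdots, Z_i)_{q_1 \times n_i}$. When $S_i$ is observed, from (1) we have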

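$$f_{\theta,\phi}(Y_i \mid S_i, X_i) = (2\pi)^{-n_i/2} |\Sigma_i|^{-1/2} \exp\left\{ -\left(Y_i - \mathbf{X}_i'\beta - g(S_i 1_i - t_i, \xi)\right)' \Sigma_i^{-1} \left(Y_i - \mathbf{X}_i'\beta - g(S_i 1_i - t_i, \xi)\right) / 2 \right\}, \tag{2}$$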

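where $1_i = (1, \cdots, 1)'$ with length $n_i$, $\theta = (\beta, \xi)'$ with length $p = p_1 + p_2$, $\phi = (\varphi, \nu, \rho, \sigma^2)'$ with length $q = q_2 + q_3 + 2$, and $\Sigma_i = \mathbf{Z}_i' D \mathbf{Z}_i + \Gamma_i + \sigma^2 I_i$, where $I_i$ is the $n_i \times n_i$ identity matrix and $\Gamma_i$ is the covariance matrix of $(U_i(t_{i1}), \cdots, U_i(t_{in_i}))'$.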
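As a rough illustration of how the marginal covariance and the Gaussian density in (2) fit together, the sketch below assembles the covariance for a random-intercept model with an NOU process and evaluates the log-density. It assumes a scalar random intercept (so the random-effect covariate is 1 and $D$ is a single variance); the function names and numerical values are hypothetical.

```python
import numpy as np

def nou_cov(t, nu0, nu1, rho):
    """NOU covariance at times t: var U(t) = exp(nu0 + nu1 * t),
    corr(U(t1), U(t2)) = rho ** |t1 - t2|."""
    t = np.asarray(t, dtype=float)
    sd = np.exp(0.5 * (nu0 + nu1 * t))
    corr = rho ** np.abs(t[:, None] - t[None, :])
    return sd[:, None] * corr * sd[None, :]

def marginal_cov(t, d_var, nu0, nu1, rho, sigma2):
    """Sigma_i for a random intercept (assumption: Z_i = 1, D = d_var)."""
    n = len(t)
    return d_var * np.ones((n, n)) + nou_cov(t, nu0, nu1, rho) + sigma2 * np.eye(n)

def log_density(y, mean, cov):
    """Log of the multivariate normal density with the given mean and covariance."""
    resid = np.asarray(y, dtype=float) - np.asarray(mean, dtype=float)
    _, logdet = np.linalg.slogdet(cov)
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (len(resid) * np.log(2 * np.pi) + logdet + quad)

# Hypothetical visit times, parameters, responses, and mean vector.
t = np.array([0.0, 1.0, 2.0])
rho = np.exp(-1) / (1 + np.exp(-1))
Sigma = marginal_cov(t, d_var=np.exp(-0.5), nu0=1.0, nu1=-1.0, rho=rho, sigma2=np.exp(-0.1))
ll = log_density(y=[1.0, 0.5, 2.0], mean=[1.0, 1.0, 1.0], cov=Sigma)
```

In the model, the mean passed to `log_density` would be the fixed-effect term plus the kernel evaluated at the time remaining before the terminal event for each visit.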

A semiparametric mixed effects model could also be considered, where g is an unknown function that can be estimated by smoothing splines. We focus on the parametric model (1) to more simply illustrate the proposed methodology.

2.2 Observed data model with potentially censored terminal event time

Let $C_i$ be the censoring time for the $i$th subject. If $S_i \le C_i$, then $S_i$ is observed; otherwise $S_i$ is right-censored by $C_i$. We denote the observed time by $V_i = \min(S_i, C_i)$ and the censoring indicator by $\Delta_i = 1(S_i \le C_i)$. Note that $t_{ij} \le V_i$ for all $i = 1, \cdots, n$, $j = 1, \cdots, n_i$. Here, we assume that $C_i$ and $(S_i, Y_i)$ are conditionally independent given $X_i$.

For notational simplicity, assume that the random effect Z is a sub-vector of X. For a single subject, we observe (V, Δ, Y, X). The likelihood function for the observed data (V, Δ, Y, X) can be factored into

$$f_1(V, \Delta, Y, X) = f_2(V, \Delta \mid Y, X)\, f_3(Y \mid X)\, f_4(X),$$

where f1 denotes the joint density of (V, Δ, Y, X), f2 denotes the conditional density of (V, Δ) given (Y, X), f3 denotes the conditional density of Y given X, and f4 denotes the marginal density of X. Since the conditional independence of C and (S, Y) given X implies that C and S are conditionally independent given (Y, X), we have

$$f_2(V, \Delta \mid Y, X) = \{f_S(S \mid Y, X)\, \bar{G}_C(S \mid Y, X)\}^{\Delta} \{\bar{F}_S(C \mid Y, X)\, g_C(C \mid Y, X)\}^{1-\Delta}, \tag{3}$$

where $f_S$ denotes the conditional density of $S$ given $(Y, X)$, $g_C$ denotes the conditional density of $C$ given $(Y, X)$, with $\bar{F}_S$ and $\bar{G}_C$ being the corresponding conditional survival functions. Further assuming noninformative censoring, we can drop $g_C(C \mid Y, X)$ and $\bar{G}_C(S \mid Y, X)$. Going through conditional arguments using Bayes' rule and dropping $f_4(X)$, we obtain the likelihood function

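$$L(V, \Delta, Y, X) = \{f_{\theta,\phi}(Y \mid S, X)\, f_5(S \mid X)\}^{\Delta} \left\{\int_C^{\infty} f_{\theta,\phi}(Y \mid s, X)\, dF_5(s \mid X)\right\}^{1-\Delta}, \tag{4}$$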

where $f_5(S \mid X)$ is the conditional density of $S$ given $X$, and $F_5(S \mid X)$ is the corresponding cumulative distribution function. In (4), only $f_{\theta,\phi}$ contains the parameter of interest $\theta$ and nuisance parameter $\phi$, whereas $F_5$ (or $f_5$) is an additional nuisance parameter.

In (4), $\{f_{\theta,\phi}(Y \mid S, X)\, f_5(S \mid X)\}^{\Delta}$ is for a subject with observed terminal event time, which yields the fully observed data likelihood, and $\{\int_C^{\infty} f_{\theta,\phi}(Y \mid s, X)\, dF_5(s \mid X)\}^{1-\Delta}$ is for a subject with censored terminal event time. In Section 4, we show that the complete case analysis, which drops the second part in (4), yields a consistent and asymptotically normally distributed estimator, but is inefficient compared to an approach that also utilizes the censored data. From the second part in (4), we see that the amount of efficiency gain depends on how well we can estimate the right tail of the conditional distribution $F_5(s \mid X)$ beyond $C$. We consider a semiparametric approach that allows reliable extrapolation beyond $C$ and is robust against any parametric assumption.

Since Ci and (Si, Yi) are conditionally independent given Xi and Ci is random, all the commonly used semiparametric models for right-censored data allow extrapolation beyond Ci. Here, we propose the most widely used Cox regression model (Cox, 1972). Other viable models include the accelerated failure time model, the additive hazard model, and the transformation model (Kalbfleisch and Prentice, 2002). Suppose the hazard function of S given X has the following form:

$$\lambda(s \mid X) = \lambda(s) \exp(\alpha' X), \tag{5}$$

where α is the regression parameter with an unknown true value α0, and λ(·) is the baseline hazard function. The conditional cumulative distribution function is then given by

$$\eta(s; X) \equiv F_5(s \mid X) = 1 - \exp\{-\Lambda(s) \exp(\alpha' X)\},$$

where $\Lambda(s) = \int_0^s \lambda(u)\, du$ is the cumulative baseline hazard function with an unknown true value $\Lambda_0$. Note that $X$ appears in both models (2) and (5), but these two instances may refer to different regressions. For example, $X_1$ might be a covariate in (2) whereas $X_1^2$ is a covariate in (5). The same $X$ is used to denote all fully observed covariates for notational simplicity. The log-likelihood function then becomes

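$$\log L = \Delta \log f_{\theta,\phi}(Y \mid S, X) + \Delta \log \dot{\eta}(S; X) + (1 - \Delta) \log \int_C^{\infty} f_{\theta,\phi}(Y \mid u, X)\, d\eta(u; X). \tag{6}$$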

A similar idea has been used by Lu et al. (2010), but for a different problem. Lu et al. (2010) considered longitudinal data analysis with an event time, which does not terminate the observed data.
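To make the censored-subject term in (6) concrete, the following sketch evaluates the integral numerically in a deliberately simplified setting. The simplifications are assumptions made for illustration only: a single longitudinal measurement, one scalar covariate, a constant baseline hazard (so the cumulative baseline hazard is linear), a normal kernel, and a truncated integration range. All names and values are hypothetical.

```python
import numpy as np

def eta(s, x, alpha, lam0):
    """Cox-model c.d.f. of S given x under a constant baseline hazard lam0
    (assumption): eta(s; x) = 1 - exp(-lam0 * s * exp(alpha * x))."""
    return 1.0 - np.exp(-lam0 * s * np.exp(alpha * x))

def normal_pdf(y, mean, sigma):
    """Univariate normal density."""
    return np.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def censored_term(y, t, x, c, beta, gamma, mu, xi, sigma, alpha, lam0,
                  upper=200.0, n_grid=20001):
    """Trapezoid approximation of the integral over (c, upper) of
    f(y | u, x) d eta(u; x); `upper` stands in for infinity."""
    u = np.linspace(c, upper, n_grid)
    mean = beta * x + gamma * np.exp(-xi * (u - t - mu) ** 2)
    hazard = lam0 * np.exp(alpha * x)
    eta_density = hazard * np.exp(-hazard * u)   # derivative of eta(u; x) in u
    integrand = normal_pdf(y, mean, sigma) * eta_density
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(u)))

# Hypothetical inputs: one response y measured at t = 0 for a subject censored at c = 2.
value = censored_term(y=1.2, t=0.0, x=1.0, c=2.0, beta=1.0, gamma=4.0,
                      mu=1.0, xi=1.2, sigma=1.0, alpha=0.3, lam0=0.5)
```

When the kernel coefficient is zero, the response density no longer depends on the terminal event time and the integral collapses to the density times the survival probability at the censoring time, which gives a simple sanity check for the quadrature.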

3 The Pseudo-likelihood Method

The log likelihood function (6) involves an unknown distribution function η and the corresponding density function η̇. Hence a maximum likelihood estimate, if it exists, can be complicated. We propose a tractable two-stage pseudo-likelihood approach in which the nuisance parameters (ϕ, η) are estimated in stage 1, and the parameter of interest θ is then estimated by maximizing (6) in stage 2 with nuisance parameters replaced by their estimators obtained in stage 1 (Kong and Nan, 2016). Details are given below:

  • Stage 1. Nuisance parameter estimation. The dispersion parameter $\phi$ is estimated by the complete case analysis of the nonlinear regression model (2); the Cox model regression coefficient $\alpha$ is estimated by maximizing the partial likelihood, and the cumulative baseline hazard $\Lambda$ is estimated with the Breslow estimator (Breslow, 1972). Denote the estimators by $\tilde{\phi}_n$, $\tilde{\alpha}_n$, and $\tilde{\Lambda}_n$, respectively. The c.d.f. $\eta(s; X)$ is estimated by $\tilde{\eta}_n(s; X) = 1 - \exp\{-\tilde{\Lambda}_n(s) \exp(\tilde{\alpha}_n' X)\}$, which is asymptotically equivalent to the product integral expression. It can be shown that all the estimates obtained in Stage 1 have desirable statistical properties. In particular, $\tilde{\eta}_n$ is $n^{1/2}$-consistent in a finite interval (see Lemma A.3 in the Supplementary Material), and $\tilde{\phi}_n$ obtained from the complete case analysis is $n^{1/2}$-consistent (see Theorem 4.1 in Section 4).

  • Stage 2. Pseudo-likelihood estimation of θ. Replacing (ϕ, η) by their Stage 1 estimates (ϕ̃n, η̃n) in the log likelihood function yields the following log pseudo-likelihood function for a random sample of n subjects:
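    $$pl_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left\{ \Delta_i \log f_{\theta,\tilde{\phi}_n}(Y_i \mid S_i, X_i) + (1 - \Delta_i) \log \int_{C_i}^{\infty} f_{\theta,\tilde{\phi}_n}(Y_i \mid u, X_i)\, d\tilde{\eta}_n(u; X_i) \right\} \tag{7}$$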
    Note that the term Δ log η̇ in (6) is dropped because it does not involve θ. However, if one wants to maximize the log-likelihood directly without using the two-stage approach, then this term cannot be omitted.
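The two stages can be sketched as follows. This is one plausible discretization, not necessarily the authors' implementation: it assumes the Cox coefficient has already been estimated (only the linear predictors are passed in), assumes no tied event times, and treats the integral against the step function from Stage 1 as a finite sum over its jumps at observed event times beyond the censoring time. Any mass beyond the largest observed event time is left unassigned here. The function names are hypothetical.

```python
import numpy as np

def breslow(v, delta, lin_pred):
    """Breslow cumulative baseline hazard at the ordered event times
    (assumes no tied times). lin_pred holds the fitted linear predictors."""
    v, delta, lin_pred = map(np.asarray, (v, delta, lin_pred))
    order = np.argsort(v)
    v, delta, risk = v[order], delta[order], np.exp(lin_pred[order])
    at_risk = np.cumsum(risk[::-1])[::-1]   # sum of exp(lin_pred_j) over j with V_j >= V_k
    jumps = delta / at_risk                 # zero at censored times
    keep = delta == 1
    return v[keep], np.cumsum(jumps)[keep]

def eta_tilde(cum_haz, lin_pred_i):
    """Estimated c.d.f. 1 - exp{-Lambda_n(s) exp(lin_pred_i)} at the event times."""
    return 1.0 - np.exp(-cum_haz * np.exp(lin_pred_i))

def censored_contribution(c_i, lin_pred_i, event_times, cum_haz, f_given_u):
    """Sum of f(Y_i | u, X_i) times the jump of the estimated c.d.f. at u,
    over observed event times u > c_i."""
    eta_vals = eta_tilde(cum_haz, lin_pred_i)
    jumps = np.diff(np.concatenate(([0.0], eta_vals)))
    beyond = event_times > c_i
    return float(np.sum(f_given_u(event_times[beyond]) * jumps[beyond]))

# Toy data: three uncensored subjects with zero linear predictors.
event_times, cum_haz = breslow([3.0, 1.0, 2.0], [1, 1, 1], [0.0, 0.0, 0.0])
```

For a censored subject, `f_given_u` would be the Gaussian density (2) viewed as a function of the unknown terminal event time, with the dispersion parameters fixed at their Stage 1 estimates; the logarithm of `censored_contribution` then enters (7).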

Let θ̂n denote the pseudo-likelihood estimator. Since it is obtained by maximizing the objective function (7), its asymptotic properties can be obtained from M-estimation theory, see van der Vaart (2002), Wellner and Zhang (2007) and Li and Nan (2011).

The estimates (η̃n, Λ̃n) are obtained using a standard package for the Cox regression model. The estimates (θ̃n, ϕ̃n) from complete case analysis are obtained by maximizing $\frac{1}{n} \sum_{i=1}^{n} \{\Delta_i \log f_{\theta,\phi}(Y_i \mid S_i, X_i)\}$ using a Newton-Raphson algorithm, where multiple initial values are tried. The two-stage estimator θ̂n is also obtained from a Newton-Raphson algorithm with the complete case analysis estimator θ̃n as the initial value.

4 Asymptotic Properties

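Let $l_0(\theta, \phi; Y, X, \Delta, V) = \Delta \log f_{\theta,\phi}(Y \mid S, X)$. This is the first part in the log-likelihood for the observed data. Then

$$\begin{aligned} l(\theta, \phi, \eta; Y, X, \Delta, V) &\equiv l(\theta, \phi, \eta(\alpha, \Lambda); Y, X, \Delta, V) \\ &= l_0(\theta, \phi; Y, X, \Delta, V) + (1 - \Delta) \log \int_C^{\infty} f_{\theta,\phi}(Y \mid u, X)\, d\eta(u; X) \\ &= l_0(\theta, \phi; Y, X, \Delta, V) + (1 - \Delta) \log \int_C^{\infty} f_{\theta,\phi}(Y \mid u, X)\, d\left[1 - \exp\{-\Lambda(u) \exp(\alpha' X)\}\right], \end{aligned}$$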

which is (6) with Δ log η̇ dropped.

A set of regularity conditions is introduced in the Appendix. Some conditions are commonly assumed for the Cox regression model; other conditions are for the mixed effects model, which are easily verified for a smooth function g and the NOU process. We will use standard empirical process notation from now on. In particular, ℙn is the empirical measure and Pf = ∫ fdP for a probability measure P and a function f.

Under the conditional independent censoring assumption, the estimators from the complete case analysis and the two-stage procedure, respectively, are consistent and asymptotically normal. These results are given in the following two theorems.

Theorem 4.1. (Complete case)

Assume that $C$ and $(S, Y)$ are independent given $X$. Under Conditions 1, 2(a), and 3–5, the complete case analysis estimator $(\tilde{\theta}_n, \tilde{\phi}_n)$ that maximizes $\mathbb{P}_n l_0(\theta, \phi; Y, X, \Delta, V)$ converges in outer probability to $(\theta_0, \phi_0)$, and $\sqrt{n}\{(\tilde{\theta}_n, \tilde{\phi}_n) - (\theta_0, \phi_0)\}$ converges in distribution to a mean zero normal random variable with variance $J_1^{-1} Q_1 J_1^{-1}$, where $J_1$ and $Q_1$ are provided in the Appendix.

Theorem 4.2. (Two-stage)

Assume that $C$ and $(S, Y)$ are independent given $X$. Under Conditions 1–8, the two-stage pseudo-likelihood estimator $\hat{\theta}_n$ that maximizes (7) converges in outer probability to $\theta_0$, and $\sqrt{n}(\hat{\theta}_n - \theta_0)$ converges in distribution to a mean zero normal random variable with variance $J_2^{-1} Q_2 J_2^{-1}$, where $J_2$ and $Q_2$ are defined in the Appendix.

The proof of consistency is similar to Li and Nan (2011) and van der Vaart (2002). The proof of asymptotic normality is based on general M-estimation theory, similar to Li and Nan (2011) and Wellner and Zhang (2007); it relies heavily on empirical process theory and is given in the Appendix.

Because the asymptotic variance of θ̂n has a very complicated expression that does not yield a simply computed estimate from the observed data, we use the bootstrap variance estimator.
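A generic bootstrap variance estimator can be sketched as below. It assumes resampling is done at the subject level and that the whole estimation procedure is re-run on each resample; the `estimator` callback and the toy data are hypothetical stand-ins for the two-stage procedure and the study data.

```python
import numpy as np

def bootstrap_variance(subject_data, estimator, n_boot=100, seed=0):
    """Resample subjects with replacement, re-run `estimator` on each
    resample, and return the componentwise sample variance of the estimates."""
    rng = np.random.default_rng(seed)
    n = len(subject_data)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # subject indices drawn with replacement
        estimates.append(estimator([subject_data[i] for i in idx]))
    return np.var(np.asarray(estimates, dtype=float), axis=0, ddof=1)

# Toy usage: the "estimator" is a sample mean of one number per subject.
toy = list(np.random.default_rng(3).normal(size=300))
toy_var = bootstrap_variance(toy, lambda d: np.mean(d), n_boot=500, seed=1)
```

For the sample mean, the result should be close to the usual variance of the mean, which is a convenient check before plugging in a more expensive estimator.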

5 Numerical Results

5.1 Simulations

We conduct simulations to investigate the finite sample performance of the proposed method. Simulation data sets are generated from the nonlinear model with mixed effects,

$$Y_{ij} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \gamma \exp\{-(S_i - T_{ij} - \mu)^2 \xi\} + b_i + U_i(T_{ij}) + \varepsilon_{ij},$$

where $\beta_0 = 1$, $\beta_1 = 1$, $\beta_2 = -3$, $\mu = 1$, and $\gamma = 4$. The random effect $b_i \sim N(0, \exp(-0.5))$, the error term $\varepsilon_{ij} \sim N(0, \exp(-0.1))$, and $U_i(t)$ is an NOU process with $\nu_0 = 1$, $\nu_1 = -1$ and $\rho = \exp(-1)/(1 + \exp(-1))$. The two fully observed covariates are $X_{1i}$ and $X_{2i}$, where $X_{1i} \sim \mathrm{Bernoulli}(0.5)$ and $X_{2i} \sim N(0, 1)$ truncated at $\pm 3$. The terminal event time is $S_i = 4 + S_{0i}$, where $S_{0i}$ follows an exponential distribution with conditional hazard function $\exp(-1 - 6X_{1i} + 4X_{2i})$. To generate the censoring time $C_i$, we first generate $C_{0i} = \kappa \tilde{C}_{0i}$, where $\tilde{C}_{0i}$ follows an exponential distribution with conditional hazard function $\exp(-3 - X_{1i} + X_{2i})$, then set $C_i = t_{ij}$, where $j$ satisfies $t_{ij} \le C_{0i}$ and $t_{i,j+1} > C_{0i}$, assuming $t_{i,n_i+1} = \infty$. The constant $\kappa$ is chosen to yield 40% censoring. For each subject $i$, there are 10 scheduled visit times, and the first visit time $t_{i1}$ is 0. There are two different settings to generate the subsequent visit times: (1) equally spaced time intervals with $t_{ij} = j - 1$, $j = 2, \cdots, 10$; (2) non-equally spaced time intervals with the subsequent visit times generated recursively from $t_{ij} = t_{i,j-1} + \min(4, W_i)$ for $j = 2, \cdots, 10$, where $W_i$ follows an exponential distribution with conditional hazard function $\exp(-3 - X_{1i} + X_{2i})$. In each setting, $\xi$ takes two different values, 1.2 and 0.2, corresponding to a sharp and a flat nonlinear predictor in the regression model, respectively.
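The data-generating mechanism for the equally spaced setting can be sketched as follows. Several points are assumptions rather than details taken from the text: the kernel is read as $\gamma \exp\{-\xi (S_i - T_{ij} - \mu)^2\}$, truncation is done by rejection sampling, and the scaling constant `kappa` is left as a free argument (its default of 1.0 is a placeholder, not the value that yields 40% censoring). The function name is hypothetical.

```python
import numpy as np

def simulate_subject(rng, kappa=1.0, xi=1.2, n_visits=10):
    """Draw one subject under setting (1) and return the observed data."""
    x1 = rng.binomial(1, 0.5)
    x2 = rng.normal()
    while abs(x2) > 3:                                   # N(0, 1) truncated at +/-3
        x2 = rng.normal()
    t = np.arange(n_visits, dtype=float)                 # scheduled visits 0, 1, ..., 9
    s = 4.0 + rng.exponential(1.0 / np.exp(-1 - 6 * x1 + 4 * x2))
    c0 = kappa * rng.exponential(1.0 / np.exp(-3 - x1 + x2))
    c = t[t <= c0].max()                                 # last scheduled visit not after c0
    v, delta = min(s, c), int(s <= c)
    tv = t[t <= v]                                       # visits actually observed
    # NOU process: var = exp(nu0 + nu1 * t) with nu0 = 1, nu1 = -1; corr = rho^{|t1 - t2|}
    rho = np.exp(-1) / (1 + np.exp(-1))
    sd = np.exp(0.5 * (1.0 - tv))
    cov = sd[:, None] * rho ** np.abs(tv[:, None] - tv[None, :]) * sd[None, :]
    u = np.linalg.cholesky(cov) @ rng.standard_normal(len(tv))
    b = rng.normal(0.0, np.sqrt(np.exp(-0.5)))           # random intercept
    eps = rng.normal(0.0, np.sqrt(np.exp(-0.1)), size=len(tv))
    mean = 1.0 + 1.0 * x1 - 3.0 * x2 + 4.0 * np.exp(-xi * (s - tv - 1.0) ** 2)
    return {"x": (x1, x2), "t": tv, "y": mean + b + u + eps, "v": v, "delta": delta}

rng = np.random.default_rng(2017)
sample = [simulate_subject(rng) for _ in range(300)]
```

In practice `kappa` would be tuned, for example by a short grid search over simulated data, until the fraction of subjects with `delta == 0` is close to the target censoring rate.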

We simulate 500 replications for each scenario with sample size 300. The biases and variances of the proposed method are compared with those of the full data and complete case analyses. The full data analysis represents the case in which all data are available; in other words, there is no censoring, so it includes more visits and serves as a benchmark. The complete case analysis simply eliminates subjects with censored terminal event times. For the proposed two-stage method, we report the 90% and 95% coverage proportions, for which the variance estimators are obtained from 100 bootstrap samples. The results are presented in Tables 1–4.

Table 1.

Simulation results for equally spaced time intervals with sharp nonlinear term. varb = bootstrap variance estimator; CR = coverage rate.

| Method | Statistic | β0 = 1 | β1 = 1 | β2 = −3 | μ = 1 | γ = 4 | ξ = 1.2 |
|---|---|---|---|---|---|---|---|
| Full data | bias | −0.0064 | 0.0011 | −0.0002 | 0.0001 | −0.0032 | 0.0031 |
| | var | 0.0801 | 0.0115 | 0.0082 | 0.0002 | 0.0140 | 0.0032 |
| Two-stage | bias | −0.0153 | 0.0045 | 0.0003 | −0.0010 | −0.0030 | 0.0039 |
| | var | 0.0973 | 0.0144 | 0.0098 | 0.0003 | 0.0166 | 0.0040 |
| | varb | 0.1094 | 0.0161 | 0.0102 | 0.0003 | 0.0151 | 0.0043 |
| | 90% CR | 0.904 | 0.876 | 0.898 | 0.918 | 0.896 | 0.912 |
| | 95% CR | 0.966 | 0.944 | 0.960 | 0.972 | 0.952 | 0.946 |
| Complete case | bias | −0.0092 | 0.0010 | 0.0041 | −0.0008 | −0.0024 | 0.0024 |
| | var | 0.1217 | 0.0208 | 0.0130 | 0.0003 | 0.0242 | 0.0053 |

Table 4.

Simulation results for non-equally spaced time intervals with flat nonlinear term. varb = bootstrap variance estimator; CR = coverage rate.

| Method | Statistic | β0 = 1 | β1 = 1 | β2 = −3 | μ = 1 | γ = 4 | ξ = 0.2 |
|---|---|---|---|---|---|---|---|
| Full data | bias | −0.0141 | −0.0044 | 0.0336 | −0.0005 | 0.0015 | 0.0037 |
| | var | 0.1951 | 0.0189 | 0.0789 | 0.0036 | 0.0187 | 0.0013 |
| Two-stage | bias | −0.0371 | 0.0035 | 0.0401 | −0.0055 | −0.0053 | 0.0048 |
| | var | 0.2471 | 0.0239 | 0.1019 | 0.0054 | 0.0225 | 0.0018 |
| | varb | 0.3158 | 0.0227 | 0.1485 | 0.0063 | 0.0220 | 0.0019 |
| | 90% CR | 0.916 | 0.896 | 0.882 | 0.910 | 0.926 | 0.886 |
| | 95% CR | 0.966 | 0.952 | 0.946 | 0.954 | 0.956 | 0.940 |
| Complete case | bias | −0.0306 | −0.0133 | 0.0772 | −0.0059 | −0.0021 | 0.0049 |
| | var | 0.4136 | 0.0388 | 0.1896 | 0.0078 | 0.0356 | 0.0026 |

The results suggest that the biases for the proposed two-stage method are minimal and comparable to both the full data analysis and the complete case analysis. From the tables, it can be seen that the proposed method is much more efficient than the complete case analysis, and the bootstrap method performs well in estimating the variance, yielding reasonable coverage rates for all the scenarios.
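The efficiency claim can be quantified directly from the reported variances. The short sketch below computes the ratio of the complete case variance to the two-stage variance for each parameter, using the empirical variances listed in Tables 1 and 4; the variable names are illustrative.

```python
import numpy as np

params = ["beta0", "beta1", "beta2", "mu", "gamma", "xi"]

# Empirical variances copied from the "var" rows of Tables 1 and 4.
var_two_stage = {
    "Table 1": [0.0973, 0.0144, 0.0098, 0.0003, 0.0166, 0.0040],
    "Table 4": [0.2471, 0.0239, 0.1019, 0.0054, 0.0225, 0.0018],
}
var_complete_case = {
    "Table 1": [0.1217, 0.0208, 0.0130, 0.0003, 0.0242, 0.0053],
    "Table 4": [0.4136, 0.0388, 0.1896, 0.0078, 0.0356, 0.0026],
}

# Ratios above 1 mean the two-stage estimator has the smaller variance.
ratio = {
    name: np.array(var_complete_case[name]) / np.array(var_two_stage[name])
    for name in var_two_stage
}
for name, r in ratio.items():
    print(name, dict(zip(params, np.round(r, 2))))
```

The ratios range from about 1.0 to 1.5 in Table 1 and from about 1.4 to 1.9 in Table 4, so the gain over the complete case analysis is larger in the non-equally spaced, flat-kernel scenario.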

We run additional simulations to further investigate the impact of survival model misspecification in (5). The results are provided in the Supplementary Material, where it is shown that misspecification of the Cox regression model can yield biased results, and that the bias increases as the severity of misspecification of (5) grows; this indicates the importance of model-checking before implementing the proposed two-stage method.

5.2 End-stage renal disease

We consider data on inpatient hospital costs of patients with end-stage renal disease (ESRD) as reported in an analysis file provided by the United States Renal Data System (USRDS); this provides an illustrative example of longitudinal data with a terminal event. These costs are of substantial interest, since Medicare paid about $10.5 billion in 2012 for inpatient costs (USRDS 2014 annual report). We focus on the monthly inpatient costs paid by Medicare; these costs are terminated by the occurrence of death, and Chan and Wang (2010) and Liu et al. (2007) suggested that the medical payment pattern changes when patients approach death. We explore this issue taking account of patient level covariates.

For illustrative purposes, we selected a 2% random sample of the white and black patients whose service started in the calendar year 2007, and who were 65 years or older at baseline. The average age at baseline was 76.1 and follow-up ended on December 31st, 2010. Of the 840 patients selected for analysis, 65.5% died during the follow-up period. Others were censored through loss to follow up or at the end of the study. The average follow-up time for medical payment was 23.4 months. For convenience, we assume that the inpatient cost rate is constant within each hospitalization. For example, if a hospitalization starts on April 21st and ends on May 10th with the amount $3,000, then the medical payment is $1,500 for April and $1,500 for May, since the stay covers 10 days in each month. Usually the month of death is shorter than other months. For example, a death on April 15th only has a half month to accrue spending. We therefore consider the “spending rate” for the month in which death occurs. For example, for a death on April 15th with the April medical payment amount $3,000, we scale up the payment to $6,000 for that month in the analysis.

Age, log transformed body mass index (BMI), heart disease and lung disease are used as predictors for the death hazard. All of them are significant, with p-values < 0.0001, 0.0011, < 0.0001 and 0.0127, respectively. The goodness of fit for the Cox regression model is checked in Figure 1. Dotted lines in the first row of Figure 1 are the plots of 20 realizations from the distributions of the score processes. The observed score processes are presented with solid lines, which fluctuate randomly around zero. From Figure 1, we see that the proportional hazards model for age and log transformed BMI fits the data reasonably well, with goodness-of-fit empirical p-values of 0.485 and 0.284, respectively, based on 1000 simulated martingale residual score processes (Lin et al., 1993). A plot of log Λ̂0(t) versus log t is displayed in the lower panel of Figure 1. The approximate parallelism of the curves suggests that the proportional hazards model for lung disease and heart disease provides a reasonably good approximation, except at early times for lung disease.
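The monthly cost construction described for the payment data can be sketched as follows. The day-counting convention (both the admission and discharge dates are counted) is an assumption, and the function names and the example stay are hypothetical.

```python
from calendar import monthrange
from collections import defaultdict
from datetime import date, timedelta

def allocate_by_month(start, end, amount):
    """Split `amount` across calendar months at a constant daily rate,
    counting both the first and last day of the stay (an assumption)."""
    n_days = (end - start).days + 1
    per_day = amount / n_days
    totals = defaultdict(float)
    day = start
    while day <= end:
        totals[(day.year, day.month)] += per_day
        day += timedelta(days=1)
    return dict(totals)

def death_month_rate(amount, death_date):
    """Scale the death-month payment up to a full-month spending rate."""
    days_in_month = monthrange(death_date.year, death_date.month)[1]
    return amount * days_in_month / death_date.day

# Hypothetical stay: 20 days in June and 10 days in July, 3000 dollars in total.
alloc = allocate_by_month(date(2008, 6, 11), date(2008, 7, 10), 3000.0)
# Death on April 15th with 3000 dollars accrued in April.
rate = death_month_rate(3000.0, date(2008, 4, 15))
```

With these inputs the stay contributes 2000 dollars to June and 1000 dollars to July, and the death-month payment is scaled from 3000 to 6000 dollars.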

Figure 1. Goodness of fit of the Cox model for the ESRD data.

Since the distribution of monthly Medicare payment, Y, is highly skewed, we consider a log transformation log(Y/1000 + 1). Figure 2 shows the final six-month trajectories of monthly inpatient costs (log transformed) for 30 randomly selected patients who died during follow-up (dotted lines). Many show an increasing and then decreasing pattern before death. We consider a normal kernel in the nonlinear mixed model.

Figure 2. Monthly inpatient costs (log transformed). The solid line is the average of the estimated log transformed monthly cost. The shaded area is its 95% pointwise confidence band. The dotted lines are 30 randomly selected subjects with terminal event.

Exploration of the data showed a similar pattern after entry as described in Liu et al. (2007). Inpatient costs tended to increase over the first two months after entry, and then showed an approximately linear decreasing pattern through to the eighth month. Hence, we create three variables to capture this effect, where Start1 = 1(Month = 1), Start2 = Month × 1(2 ≤ Month ≤ 7) and Start3 = 1(Month ≥ 8). Diabetes, heart disease and race are also the covariates of interest, whereas age, BMI, sex and lung disease are not significantly associated. The final models are

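$$\log(Y_i(t_{ij})/1000 + 1) = \beta_0 + \beta_1 \mathrm{Start1} + \beta_2 \mathrm{Start2} + \beta_3 \mathrm{Start3} + \beta_4 \mathrm{Diabetes} + \beta_5 \mathrm{Heart} + \beta_6 \mathrm{Race} + \gamma \exp\{-\xi (S_i - t_{ij} - \mu)^2\} + b_i + U_i(t_{ij}) + \varepsilon_{ij},$$

$$\lambda_i(s) = \lambda_0(s) \exp(\alpha_1 \mathrm{Age} + \alpha_2 \log(\mathrm{BMI}) + \alpha_3 \mathrm{Lung} + \alpha_4 \mathrm{Heart}).$$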
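The entry-period covariates and the response transformation used in the cost model are simple to construct. The sketch below follows the definitions given in the text; the function names are hypothetical, and months are assumed to be counted from 1 at the start of service.

```python
import math

def start_covariates(month):
    """Return (Start1, Start2, Start3) for a given month since entry:
    Start1 = 1(month = 1), Start2 = month * 1(2 <= month <= 7), Start3 = 1(month >= 8)."""
    start1 = 1 if month == 1 else 0
    start2 = month if 2 <= month <= 7 else 0
    start3 = 1 if month >= 8 else 0
    return start1, start2, start3

def transform_cost(y_dollars):
    """Log transformation applied to the monthly payment: log(Y / 1000 + 1)."""
    return math.log(y_dollars / 1000 + 1)

design = [start_covariates(m) for m in (1, 5, 8)]
```

Exactly one of the three variables is nonzero in any month, so the first seven months get their own level and slope while later months share a single intercept shift.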

Table 5 shows the regression coefficient estimates, where we see that the proposed two-stage method yields similar point estimates with smaller estimated variances compared to the complete case analysis, indicating the efficiency gain of the proposed method.

Table 5.

Longitudinal data analysis results for the inpatient cost paid by Medicare with death as a covariate.

| Parameter | Complete case: estimate | Complete case: var (×10⁻³) | Complete case: p-value | Two-stage: estimate | Two-stage: var (×10⁻³) | Two-stage: p-value |
|---|---|---|---|---|---|---|
| Start1 | −0.65 | 4.10 | < 0.0001 | −0.59 | 2.00 | < 0.0001 |
| Start2 | −0.07 | 0.11 | < 0.0001 | −0.07 | 0.04 | < 0.0001 |
| Start3 | −0.46 | 3.54 | < 0.0001 | −0.50 | 2.78 | < 0.0001 |
| Diabetes | 0.06 | 2.51 | 0.20 | 0.09 | 1.46 | 0.01 |
| Heart | 0.12 | 3.31 | 0.04 | 0.11 | 1.80 | 0.01 |
| Race | 0.14 | 2.71 | 0.007 | 0.11 | 1.81 | 0.01 |
| γ | 1.54 | 3.67 | < 0.0001 | 1.57 | 3.59 | < 0.0001 |
| ξ | 0.99 | 42.90 | < 0.0001 | 0.96 | 30.95 | < 0.0001 |
| μ | 0.78 | 5.92 | < 0.0001 | 0.76 | 5.30 | < 0.0001 |

The estimated averages of log transformed Medicare payments are presented with a solid line in Figure 2. A Q-Q plot is used to check the normal error assumption, where only one residual is randomly selected for each patient to avoid the correlation within subjects, see Figure 3. The linear pattern is consistent with the nonlinear mixed model with normal kernel. We checked many random selections, and the plot is typical.
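The residual check can be sketched as follows: one residual is drawn at random per patient, the selected residuals are standardized, and they are paired with standard normal quantiles. The plotting positions $(k - 0.5)/n$ and the standardization are assumptions, and the function name and input layout are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

def qq_points(residuals_by_patient, seed=0):
    """Pick one residual per patient at random and return
    (theoretical normal quantiles, sorted standardized residuals)."""
    rng = random.Random(seed)
    picked = sorted(rng.choice(r) for r in residuals_by_patient.values() if r)
    m, s = mean(picked), stdev(picked)
    standardized = [(r - m) / s for r in picked]
    n = len(picked)
    theoretical = [NormalDist().inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]
    return theoretical, standardized

# Toy residuals for 20 hypothetical patients, three visits each.
toy_residuals = {i: [0.1 * i - 1.0, 0.3, -0.2] for i in range(20)}
theo, obs = qq_points(toy_residuals, seed=1)
```

Repeating the call with different seeds mimics checking many random selections; an approximately straight line in each plot of `obs` against `theo` supports the normal error assumption.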

Figure 3. QQ plot for inpatient cost model.

6 Discussion

We consider the identity link and a Gaussian error in this article. The proposed two-stage method could be generalized to non-Gaussian error, logistic or Poisson regression provided the model is identifiable and regularity conditions are suitably modified.

We allow only time-independent covariates in this article for simplicity. Time-dependent covariates are often of interest in longitudinal data analysis and survival analysis. Implementation of the two-stage method for time-dependent covariates involves extrapolating $\eta(u; \bar{X}(C))$ beyond $C$, where $\bar{X}(C)$ is the history of the time-dependent covariates $X$ up to time $C$. It involves predicting the censored covariate process, which will be explored elsewhere. An alternative is an estimating equation approach using inverse probability weighting, which would only use the subjects with observed terminal event time.

The function $g(S - t, \xi)$ in (1) that we consider in this article is a known nonlinear function up to the parameter $\xi$. In practice, smoothing techniques can be used to determine an appropriate parametric functional form of $g$ or to examine the fit of the data to a hypothesized $g$. For example, we fitted the model (1) but approximated $g$ with cubic B-splines with 20 knots over the entire observation window of 48 months. This yielded an estimate of $g$ that was very similar to the proposed Gaussian form.

We only considered the intercept parameter as a function of $S - t$ in this article to illustrate the basic concept of the proposed methodology. This modeling strategy extends naturally to regression models with a time-varying coefficient for each regressor. Such an extension is under investigation.

The major difference between our work and that of Li et al. (2013) is that the function $g$ in model (1), together with $\beta_0$, can be viewed as the intercept parameter which depends on $S - t$, but all other variables in the model are with reference to time $t$ in the same way as in the usual regression models for longitudinal data. On the other hand, none of the regression parameters in Li et al. (2013) varies with $S - t$, but all the variables in their model (including the error terms) are with reference to the reverse time scale, $S - t$. See their equation (2). This leads to different model interpretations.


Table 2.

Simulation results for equally spaced time intervals with flat nonlinear term. varb = bootstrap variance estimator; CR = coverage rate.

| Method | Statistic | β0 = 1 | β1 = 1 | β2 = −3 | μ = 1 | γ = 4 | ξ = 0.2 |
|---|---|---|---|---|---|---|---|
| Full data | bias | 0.0081 | −0.0116 | 0.0209 | −0.0007 | 0.0009 | 0.0001 |
| | var | 0.1279 | 0.0130 | 0.0455 | 0.0009 | 0.0121 | 0.0005 |
| Two-stage | bias | 0.0090 | −0.0121 | 0.0229 | 0.0003 | −0.0046 | 0.0010 |
| | var | 0.1599 | 0.0161 | 0.0561 | 0.0014 | 0.0152 | 0.0007 |
| | varb | 0.1745 | 0.0166 | 0.0641 | 0.0015 | 0.0152 | 0.0007 |
| | 90% CR | 0.902 | 0.890 | 0.912 | 0.904 | 0.910 | 0.902 |
| | 95% CR | 0.958 | 0.948 | 0.946 | 0.950 | 0.964 | 0.952 |
| Complete case | bias | 0.0044 | −0.0196 | 0.0463 | −0.0004 | −0.0062 | −0.0004 |
| | var | 0.2248 | 0.0239 | 0.0821 | 0.0017 | 0.0236 | 0.0008 |

Table 3.

Simulation results for non-equally spaced time intervals with sharp nonlinear term. varb = bootstrap variance estimator; CR = coverage rate.

| Method | Statistic | β0 = 1 | β1 = 1 | β2 = −3 | μ = 1 | γ = 4 | ξ = 1.2 |
|---|---|---|---|---|---|---|---|
| Full data | bias | −0.0045 | 0.0020 | 0.0068 | 0.0016 | −0.0051 | 0.0059 |
| | var | 0.1474 | 0.0203 | 0.0187 | 0.0005 | 0.0177 | 0.0084 |
| Two-stage | bias | −0.0238 | 0.0089 | 0.0111 | 0.0015 | −0.0046 | 0.0090 |
| | var | 0.1675 | 0.0235 | 0.0253 | 0.0007 | 0.0229 | 0.0113 |
| | varb | 0.1550 | 0.0220 | 0.0276 | 0.0007 | 0.0213 | 0.0135 |
| | 90% CR | 0.866 | 0.884 | 0.884 | 0.888 | 0.914 | 0.910 |
| | 95% CR | 0.942 | 0.936 | 0.938 | 0.944 | 0.950 | 0.960 |
| Complete case | bias | −0.0295 | 0.0093 | 0.0217 | 0.0004 | 0.0002 | 0.0120 |
| | var | 0.2480 | 0.0366 | 0.0353 | 0.0010 | 0.0340 | 0.0161 |

Acknowledgments

The data used in this paper were made available by the U.S. Renal Data System. This study was supported in part by the U.S. Renal Data System under Contract No. NO1-DK-9-2344 (National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland). The data analysis was completed while Shengchun Kong was Assistant Professor of Statistics at Purdue University.

The research is supported in part by NIH grant R01-AG036802 and NSF grants DMS-1007590 and DMS-1407142, and with Federal funds from the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN276201400001C. The data reported here have been supplied by the United States Renal Data System (USRDS). The interpretation and reporting of these data are the responsibility of the authors and in no way should be seen as an official policy or interpretation of the U.S. government.

A Appendix

A.1 Regularity conditions

Denote the true value of θ by θ0, the true value of ϕ by ϕ0, the sample space of response variable Y by 𝒴, the sample space of covariate X by 𝒳, the sample space of random effect Z by 𝒵 ⊂ 𝒳, the parameter space of θ by Θ, the parameter space of ϕ by Φ, and the parameter space of η by ℱ. In addition to the assumptions of bounded support for X, bounded parameter spaces Θ and Φ, conditional independence between C and (S, Y) given X, and non-informative censoring, we provide a set of regularity conditions in the following:

Condition 1

The third derivatives $|\partial^3 g(t,\xi)/(\partial\xi_i\,\partial\xi_j\,\partial\xi_k)|$ and $|\partial^3 g(t,\xi)/(\partial t\,\partial\xi_j\,\partial\xi_k)|$ are bounded uniformly for all ξ ∈ Ξ and bounded t.

Condition 2

  (a) $P l_0(\theta,\phi;Y,X,\Delta,V)$ has a unique maximizer $(\theta_0,\phi_0)$;

  (b) $P l(\theta,\phi_0,\eta_0;Y,X,\Delta,V)$ has a unique maximizer $\theta_0$.

Condition 3

The eigenvalues of Σ(ϕ) lie in [λ1, λ2], where 0 < λ1 < λ2 < ∞, for any ϕ ∈ Φ and Z ∈ 𝒵.

Condition 4

The absolute values of all the elements in $\partial^3\Sigma(\phi)/(\partial\phi_i\,\partial\phi_j\,\partial\phi_k)$ are bounded uniformly for all ϕ ∈ Φ and Z ∈ 𝒵.

Condition 5

The study stops at a finite time τ > 0 such that $\inf_{x\in\mathcal{X}} P(C \ge \tau \mid X = x) = \omega_1 > 0$ and $\inf_{x\in\mathcal{X}} P(S \ge \tau \mid X = x) = \omega_2 > 0$ for constants ω1 and ω2.

Condition 6

The conditional distribution of S given X possesses a continuous Lebesgue density.

Condition 7

The information matrix of the partial likelihood for the Cox regression model at the true parameter values is positive definite.

Condition 8

There exist constants δ1 > 0 and δ2 > 0 such that $\int_C^\tau f_{\theta,\phi}(Y\mid s,X)\,d\eta(s;X) \ge \delta_1$ with probability 1 for any θ ∈ Θ and $|\phi-\phi_0| + \|\eta-\eta_0\| < \delta_2$.

REMARK

Condition 1 holds for many smooth functions g, e.g. $g(t,\xi) = \xi_1\exp\{-(t-\xi_2)^2\xi_3\}$ or $g(t,\xi) = \xi_1\exp\{-(t\xi_2)\}$. Bounded third derivatives imply bounded second derivatives, which is adequate for the proof of consistency. We implemented the numerical studies with g being the normal kernel. When $g(t,\xi) = \xi_1\exp\{-(t-\xi_2)^2\xi_3\}$, Condition 2(a) implies $\xi_1\xi_3 \ne 0$; by Theorem 2.1 of Lehmann (1998), this condition holds provided model (1) is identifiable. Condition 2(b) is for the consistency of the proposed two-stage estimator $\hat\theta_n$, and may be unnecessarily strong, as can be seen from the following. In the proof of Theorem 4.2, we can show that $P\ddot{l}_{11}(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V) = P\{\partial^2 l(\theta,\phi_0,\eta_0;Y,X,\Delta,V)/(\partial\theta\,\partial\theta')|_{\theta=\theta_0}\}$ is negative definite by Condition 2(a). Thus $P\ddot{l}_{11}(\theta,\phi_0,\eta_0;Y,X,\Delta,V)$, a continuous matrix function of θ, is also negative definite in a neighborhood of θ0, which guarantees that θ0 is the unique maximizer of $P l(\theta,\phi_0,\eta_0;Y,X,\Delta,V)$ in a neighborhood of θ0. The initial value we use in the algorithm for maximizing (7) is obtained from the complete case analysis, which is shown to be $n^{1/2}$-consistent; thus, the solution of the proposed two-stage method is likely to be in the same neighborhood, and therefore also consistent without the uniqueness requirement in Condition 2(b).
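As a rough illustration of the smoothness discussed in this remark, the sketch below evaluates the gradient of a normal-kernel g with respect to ξ and checks it against finite differences. The parametrization g(t, ξ) = ξ1 exp{−ξ3 (t − ξ2)²}, the parameter values, and all function names are assumptions made for illustration; they are not taken from the paper's code.

```python
import math


def g(t, xi):
    """Assumed normal-kernel form: xi1 * exp(-xi3 * (t - xi2)**2)."""
    xi1, xi2, xi3 = xi
    return xi1 * math.exp(-xi3 * (t - xi2) ** 2)


def grad_g(t, xi):
    """Analytic gradient of g with respect to (xi1, xi2, xi3)."""
    xi1, xi2, xi3 = xi
    d = t - xi2
    e = math.exp(-xi3 * d * d)
    return [e, 2.0 * xi1 * xi3 * d * e, -xi1 * d * d * e]


def numeric_grad_g(t, xi, h=1e-6):
    """Central finite-difference gradient, used only as a check."""
    out = []
    for k in range(3):
        up = list(xi)
        dn = list(xi)
        up[k] += h
        dn[k] -= h
        out.append((g(t, up) - g(t, dn)) / (2.0 * h))
    return out


# Illustrative parameter values and a bounded grid of t values.
xi = (4.0, 1.0, 0.2)
grid = [0.5 * k for k in range(11)]  # t from 0 to 5
max_err = max(
    abs(a - b)
    for t in grid
    for a, b in zip(grad_g(t, xi), numeric_grad_g(t, xi))
)
print(max_err)
```

Each gradient component is a polynomial in t − ξ2 times a decaying exponential, so it stays bounded on bounded sets of t and ξ; the same reasoning extends to the higher-order derivatives that Condition 1 refers to.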

Conditions 3–4 automatically hold for model (1) with the NOU process if |ρ| ≤ 1 − δ and $t_{i,k+1} - t_{i,k} \ge \varepsilon$, i = 1, ⋯, n, k = 1, ⋯, ni − 1, where δ > 0 and ε > 0; they are parallel to the conditions of bounded derivatives of the log likelihood in Theorem 1.1 and Theorem 2.3 of Lehmann (1998).

Conditions 5–7 are usual assumptions for Cox regression models (Andersen and Gill, 1982; Nan and Wellner, 2013). From Condition 5, we have

$$l(\theta,\phi,\eta;Y,X,\Delta,V) = \Delta\log f_{\theta,\phi}(Y\mid S,X) + (1-\Delta)\log\int_C^\tau f_{\theta,\phi}(Y\mid u,X)\,d\left[1-\exp\{-\Lambda(u)\exp(\alpha'X)\}\right]. \tag{8}$$
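The structure of (8) can be sketched numerically for a single subject. The example below is a simplified illustration under several assumptions that are not from the paper: one scalar measurement Y at time t with Gaussian error variance `sigma2`, a scalar covariate, a normal-kernel g with an assumed parametrization, and a step-function cumulative baseline hazard supplied as hypothetical `jumps` (a stand-in for a Breslow-type estimate). With a step function, the integral in (8) reduces to a finite sum over the jump times in (C, τ].

```python
import math


def normal_pdf(y, mean, var):
    """Density of N(mean, var) at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def g(u, xi):
    """Assumed normal-kernel shape for the terminal-event effect."""
    xi1, xi2, xi3 = xi
    return xi1 * math.exp(-xi3 * (u - xi2) ** 2)


def eta(cum_hazard, alpha, x):
    """Cox-model distribution function 1 - exp{-Lambda * exp(alpha * x)}."""
    return 1.0 - math.exp(-cum_hazard * math.exp(alpha * x))


def loglik_one(y, t, x, delta, v, beta, xi, sigma2, alpha, jumps, tau):
    """Log-likelihood contribution in the form of (8) for one subject.

    jumps: time-sorted (time, cumulative baseline hazard) pairs of a step
    function; a hypothetical input, not the paper's estimator.
    """
    if delta == 1:
        # Terminal event observed at v: plain conditional density.
        return math.log(normal_pdf(y, x * beta + g(v - t, xi), sigma2))
    total, prev_hazard = 0.0, 0.0
    for u, hazard in jumps:
        # Probability mass that eta(.; x) places on the jump time u.
        mass = eta(hazard, alpha, x) - eta(prev_hazard, alpha, x)
        if v < u <= tau:
            total += normal_pdf(y, x * beta + g(u - t, xi), sigma2) * mass
        prev_hazard = hazard
    if total <= 0.0:
        # The integral must stay away from zero; compare Condition 8.
        raise ValueError("no mass on (C, tau]")
    return math.log(total)


jumps = [(1.0, 0.10), (2.0, 0.25), (3.0, 0.45), (4.0, 0.70)]
common = dict(beta=1.0, xi=(4.0, 1.0, 0.2), sigma2=1.0, alpha=0.5,
              jumps=jumps, tau=4.0)
ll_observed = loglik_one(y=2.0, t=1.0, x=0.5, delta=1, v=2.0, **common)
ll_censored = loglik_one(y=2.0, t=1.0, x=0.5, delta=0, v=1.5, **common)
print(ll_observed, ll_censored)
```

The censored branch averages the conditional density over the possible terminal event times after the censoring time, weighted by the estimated conditional distribution of S given X. The guard on `total` reflects the role of a lower bound on that integral.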

Condition 8 is mainly for technical convenience. One way to obtain Condition 8 might be to truncate the response variable Y such that |Y| ≤ M < ∞ for a large constant M. In our simulations, however, we do not implement such truncation but still obtain satisfactory results.

A.2 Proofs of Theorems 4.1 and 4.2

Lemmas A.1–A.5 used in the following proofs are provided in the online Supplementary Material.

A.2.1 Proof of consistency in Theorem 4.1 for complete case analysis estimator

Proof

From Corollary 3.2.3 in van der Vaart and Wellner (1996), we need to show that (i) $P l_0(\theta_0,\phi_0;Y,X,\Delta,V) > \sup_{(\theta,\phi)\notin G} P l_0(\theta,\phi;Y,X,\Delta,V)$ for any open set G that contains (θ0, ϕ0); and (ii) $\sup_{(\theta,\phi)}\|(\mathbb{P}_n - P)\,l_0(\theta,\phi;Y,X,\Delta,V)\| \to 0$. Condition (i) follows from Condition 2(a) and the non-informative censoring assumption. Condition (ii) holds because the class of functions $\{-\Delta(Y - X\beta - g(S\mathbf{1}-t,\xi))'\Sigma(\phi)^{-1}(Y - X\beta - g(S\mathbf{1}-t,\xi))/2 - \log|\Sigma(\phi)|/2 : \theta\in\Theta,\ \phi\in\Phi\}$ is Glivenko-Cantelli by Lemma A.4.

A.2.2 Proof of asymptotic normality in Theorem 4.1 for complete case analysis estimator

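Denote the element-wise product of two matrices A and B by A * B. Let

$$A_j(\phi) = \partial\Sigma(\phi)/\partial\phi_j,\quad A_{jk}(\phi) = \partial^2\Sigma(\phi)/(\partial\phi_j\,\partial\phi_k),\quad j = 1,\dots,q,\ k = 1,\dots,q;\qquad r(\theta;V,Y,X) = Y - X\beta - g(S\mathbf{1}-t,\xi).$$

Proof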

The proof follows Lemma A.1 with ψ = (θ, ϕ). Here

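$$m(\theta,\phi;Y,X,\Delta,V) = l_0(\theta,\phi;Y,X,\Delta,V).$$

The first order derivative of $l_0(\theta,\phi;Y,X,\Delta,V)$ equals

$$\dot{l}_0(\theta,\phi;Y,X,\Delta,V) = \begin{pmatrix}\dot{l}_{01}(\theta,\phi;Y,X,\Delta,V)\\ \dot{l}_{02}(\theta,\phi;Y,X,\Delta,V)\end{pmatrix},$$

where

$$\dot{l}_{01}(\theta,\phi;Y,X,\Delta,V) = \partial l_0(\theta,\phi;Y,X,\Delta,V)/\partial\theta = \Delta D_2(\theta;V,X)'\Sigma(\phi)^{-1}r(\theta;V,Y,X)$$

with

$$D_2(\theta;V,X) = \left(X,\ \partial g(V\mathbf{1}-t,\xi)/\partial\xi'\right), \tag{9}$$

and

$$\dot{l}_{02}(\theta,\phi;Y,X,\Delta,V) = \partial l_0(\theta,\phi;Y,X,\Delta,V)/\partial\phi = C(\theta,\phi;Y,X,\Delta,V) = \left(C_1(\theta,\phi;Y,X,\Delta,V),\dots,C_q(\theta,\phi;Y,X,\Delta,V)\right)' \tag{10}$$

with

$$C_j(\theta_0,\phi_0;Y,X,\Delta,V) = -\Delta\,\mathrm{tr}\left[\Sigma(\phi_0)^{-1}A_j(\phi_0)\right]/2 + \Delta\,r(\theta_0;V,Y,X)'\Sigma(\phi_0)^{-1}A_j(\phi_0)\Sigma(\phi_0)^{-1}r(\theta_0;V,Y,X)/2. \tag{11}$$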
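The form of the covariance score in (11) can be checked numerically. The sketch below is a toy verification under stated assumptions: Δ = 1, a residual vector of length 2, and a hypothetical two-parameter covariance Σ(ϕ) with variance ϕ1 and correlation ϕ2 (not the NOU covariance of model (1)). It compares the analytic expression −tr[Σ⁻¹A_j]/2 + r′Σ⁻¹A_jΣ⁻¹r/2 with a central finite difference of the Gaussian log-likelihood.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def det2(m):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = m
    return a * d - b * c


def matmul(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def quad(r, m):
    """Quadratic form r' m r for a length-2 vector r."""
    return sum(r[i] * m[i][j] * r[j] for i in range(2) for j in range(2))


def sigma(phi):
    """Hypothetical covariance: variance phi[0], correlation phi[1]."""
    s2, rho = phi
    return [[s2, s2 * rho], [s2 * rho, s2]]


def a_j(phi, j):
    """A_j(phi) = d Sigma(phi) / d phi_j for the covariance above."""
    s2, rho = phi
    if j == 0:
        return [[1.0, rho], [rho, 1.0]]
    return [[0.0, s2], [s2, 0.0]]


def loglik(r, phi):
    """Gaussian log-likelihood with Delta = 1, additive constant dropped."""
    s = sigma(phi)
    return -0.5 * quad(r, inv2(s)) - 0.5 * math.log(det2(s))


def score_phi(r, phi, j):
    """C_j in the form of (11) with Delta = 1."""
    s_inv = inv2(sigma(phi))
    a = a_j(phi, j)
    s_inv_a = matmul(s_inv, a)
    trace = s_inv_a[0][0] + s_inv_a[1][1]
    return -0.5 * trace + 0.5 * quad(r, matmul(s_inv_a, s_inv))


r = [0.7, -1.2]
phi = [2.0, 0.3]
h = 1e-6
max_gap = 0.0
for j in range(2):
    up = list(phi)
    dn = list(phi)
    up[j] += h
    dn[j] -= h
    numeric = (loglik(r, up) - loglik(r, dn)) / (2.0 * h)
    max_gap = max(max_gap, abs(numeric - score_phi(r, phi, j)))
print(max_gap)
```

Agreement up to finite-difference error supports the signs in (11): the trace term enters with a minus sign and the quadratic form with a plus sign.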

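The second order derivative of $l_0(\theta,\phi;Y,X,\Delta,V)$ equals

$$\ddot{l}_0(\theta,\phi;Y,X,\Delta,V) = \begin{pmatrix}\ddot{l}_{011}(\theta,\phi;Y,X,\Delta,V) & \ddot{l}_{021}(\theta,\phi;Y,X,\Delta,V)\\ \ddot{l}_{021}(\theta,\phi;Y,X,\Delta,V)' & \ddot{l}_{022}(\theta,\phi;Y,X,\Delta,V)\end{pmatrix},$$

where

$$\ddot{l}_{011}(\theta,\phi;Y,X,\Delta,V) = \partial^2 l_0(\theta,\phi;Y,X,\Delta,V)/\{\partial\theta\,\partial\theta'\} = -\Delta D_2(\theta;V,X)'\Sigma^{-1}(\phi)D_2(\theta;V,X) + \Delta D_3(\theta,\phi;V,Y,X)$$

with

$$D_{3jk}(\theta,\phi;V,Y,X) = \begin{cases} 0, & j \le p_1 \text{ or } k \le p_2,\\[4pt] \left(\dfrac{\partial^2 g(V\mathbf{1}-t,\xi)}{\partial\xi_{j-p_1}\,\partial\xi_{k-p_2}}\right)'\Sigma(\phi)^{-1}r(\theta;V,Y,X), & j > p_1 \text{ and } k > p_2, \end{cases} \tag{12}$$

$$\ddot{l}_{021}(\theta,\phi;Y,X,\Delta,V) = \partial^2 l_0(\theta,\phi;Y,X,\Delta,V)/\{\partial\phi\,\partial\theta\} = \left(\ddot{l}_{0211}(\theta,\phi;Y,X,\Delta,V),\dots,\ddot{l}_{021q}(\theta,\phi;Y,X,\Delta,V)\right)$$

with

$$\ddot{l}_{021j}(\theta,\phi;Y,X,\Delta,V) = -\Delta D_2(\theta;V,X)'\Sigma(\phi)^{-1}A_j(\phi)\Sigma(\phi)^{-1}r(\theta;V,Y,X),$$

and

$$\ddot{l}_{022}(\theta,\phi;Y,X,\Delta,V) = \partial^2 l_0(\theta,\phi;Y,X,\Delta,V)/\{\partial\phi\,\partial\phi'\} = \begin{pmatrix}\ddot{l}_{02211}(\theta,\phi;Y,X,\Delta,V) & \cdots & \ddot{l}_{0221q}(\theta,\phi;Y,X,\Delta,V)\\ \vdots & & \vdots\\ \ddot{l}_{022q1}(\theta,\phi;Y,X,\Delta,V) & \cdots & \ddot{l}_{022qq}(\theta,\phi;Y,X,\Delta,V)\end{pmatrix}$$

with

$$\begin{aligned}\ddot{l}_{022jk}(\theta,\phi;Y,X,\Delta,V) ={}& -\Delta\,\mathrm{tr}\left[-\Sigma(\phi)^{-1}A_j(\phi)\Sigma(\phi)^{-1}A_k(\phi) + \Sigma(\phi)^{-1}A_{jk}(\phi)\right]/2\\ & - \Delta\,r(\theta;V,Y,X)'\Sigma(\phi)^{-1}\left\{A_j(\phi)\Sigma(\phi)^{-1}A_k(\phi) - A_{jk}(\phi) + A_k(\phi)\Sigma(\phi)^{-1}A_j(\phi)\right\}\Sigma(\phi)^{-1}r(\theta;V,Y,X)/2.\end{aligned}$$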

Condition A1 holds from consistency. Condition A2 holds since for any u,

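$$\int f_{\theta_0,\phi_0}(y\mid u,x)\,r(\theta_0;u,y,x)\,dy = 0, \tag{13}$$

$$\int f_{\theta_0,\phi_0}(y\mid u,x)\,r(\theta_0;u,y,x)\,r(\theta_0;u,y,x)'\,dy = \Sigma(\phi_0). \tag{14}$$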

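We have

$$P\ddot{l}_0(\theta_0,\phi_0;Y,X,\Delta,V) = -\begin{pmatrix} D_4(\theta_0,\phi_0) & 0\\ 0 & D_5(\phi_0)\end{pmatrix},$$

where

$$D_4(\theta_0,\phi_0) = P\left\{\Delta D_2(\theta_0;V,X)'\Sigma(\phi_0)^{-1}D_2(\theta_0;V,X)\right\},$$

$$D_5(\phi_0) = \begin{pmatrix} D_{511}(\phi_0) & \cdots & D_{51q}(\phi_0)\\ \vdots & & \vdots\\ D_{5q1}(\phi_0) & \cdots & D_{5qq}(\phi_0)\end{pmatrix},$$

with

$$D_{5jk}(\phi_0) = P\left\{\Delta\,\mathrm{tr}\left[\Sigma(\phi_0)^{-1}A_k(\phi_0)\Sigma(\phi_0)^{-1}A_j(\phi_0)\right]/2\right\} = P\left\{\Delta\,\mathrm{tr}\left[\Sigma(\phi_0)^{-1/2}A_k(\phi_0)\Sigma(\phi_0)^{-1}A_j(\phi_0)\Sigma(\phi_0)^{-1/2}\right]/2\right\}.$$

Hence,

$$D_5(\phi_0) = P\left\{\Delta D_1(\phi_0;X)'D_1(\phi_0;X)/2\right\}, \tag{15}$$

where

$$D_1(\phi_0;X) = \left(\mathrm{vec}\left(\Sigma(\phi_0)^{-1/2}A_1(\phi_0)\Sigma(\phi_0)^{-1/2}\right),\ \cdots,\ \mathrm{vec}\left(\Sigma(\phi_0)^{-1/2}A_q(\phi_0)\Sigma(\phi_0)^{-1/2}\right)\right).$$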

Thus, Pm̈(θ0, ϕ0; Y, X, Δ, V) is negative definite from Condition 2(a).

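From (13), Condition A3 holds. Condition A4 holds automatically. Condition A5 holds if the class of functions $\{-\Delta\,\mathrm{tr}[\Sigma(\phi)^{-1}A_j(\phi)]/2 + \Delta\,r(\theta;V,Y,X)'\Sigma(\phi)^{-1}A_j(\phi)\Sigma(\phi)^{-1}r(\theta;V,Y,X)/2 : j = 1,\dots,q,\ |\theta-\theta_0| < \delta,\ |\phi-\phi_0| < \delta\}$ is Donsker for some δ > 0 and satisfies $P|\dot{m}(\theta,\phi;Y,X,\Delta,V) - \dot{m}(\theta_0,\phi_0;Y,X,\Delta,V)|^2 \to 0$ as $|(\theta,\phi) - (\theta_0,\phi_0)| \le \delta_n \downarrow 0$. These two conditions hold from Conditions 1, 3–5, and Theorem 2.10.6 of van der Vaart and Wellner (1996). Condition A6 holds from Taylor expansion and Conditions 1 and 3–5. Hence,

$$\sqrt{n}\left((\theta_n,\phi_n) - (\theta_0,\phi_0)\right) = -\left[P\ddot{l}_0(\theta_0,\phi_0;Y,X,\Delta,V)\right]^{-1}\sqrt{n}\,\mathbb{P}_n\dot{l}_0(\theta_0,\phi_0;Y,X,\Delta,V) + o_p(1),$$

which converges weakly to a mean zero normal random variable with variance $J_1^{-1}Q_1J_1^{-1}$, where $J_1 = -P\ddot{l}_0(\theta_0,\phi_0;Y,X,\Delta,V)$ and $Q_1 = P\{\dot{l}_0(\theta_0,\phi_0;Y,X,\Delta,V)^{\otimes 2}\}$. Furthermore,

$$\sqrt{n}(\phi_n - \phi_0) = D_5(\phi_0)^{-1}\sqrt{n}\,\mathbb{P}_n C(\theta_0,\phi_0;\tilde{Y},\tilde{X},\tilde{\Delta},\tilde{V}) + o_p(1), \tag{16}$$

where $D_5(\phi_0)$ and $C(\theta_0,\phi_0;\tilde{Y},\tilde{X},\tilde{\Delta},\tilde{V})$ are defined in (15) and (10), respectively.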
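The sandwich form J⁻¹QJ⁻¹ of the limiting variance can be illustrated with a plug-in computation for a scalar M-estimator. The sketch below is a generic toy example, not the paper's likelihood: the criterion m(θ; y) = yθ − exp(θ) (a Poisson-type working log-likelihood), the data, and all function names are assumptions chosen so that the result can be verified by hand.

```python
import math


def score(theta, y):
    """First derivative of the toy criterion m(theta; y) = y*theta - exp(theta)."""
    return y - math.exp(theta)


def hessian(theta, y):
    """Second derivative of the toy criterion (does not depend on y)."""
    return -math.exp(theta)


def sandwich_variance(score_fn, hessian_fn, theta_hat, data):
    """Plug-in estimate of J^{-1} Q J^{-1} for a scalar parameter.

    J is minus the average second derivative and Q is the average squared
    score, both evaluated at theta_hat.
    """
    n = len(data)
    j_hat = -sum(hessian_fn(theta_hat, y) for y in data) / n
    q_hat = sum(score_fn(theta_hat, y) ** 2 for y in data) / n
    return q_hat / (j_hat ** 2)


counts = [0, 2, 1, 4, 3, 1, 0, 5, 2, 2]
# The maximizer of the toy criterion is the log of the sample mean.
theta_hat = math.log(sum(counts) / len(counts))
avar = sandwich_variance(score, hessian, theta_hat, counts)
std_err = math.sqrt(avar / len(counts))
print(avar, std_err)
```

For this toy criterion the sandwich reduces to the sample variance divided by the squared sample mean, which is the delta-method variance of the log of a sample mean. In the complete case analysis the same recipe applies with vector-valued scores and matrix inverses in place of the scalar division.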

A.2.3 Proof of consistency in Theorem 4.2 for two-stage estimator

Proof

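From Condition 2(b), we have

$$\sup_{d(\theta,\theta_0)>\delta} P l(\theta,\phi_0,\eta_0;Y,X,\Delta,V) < P l(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V) \tag{17}$$

holds for every δ > 0. By the definition of $\hat\theta_n$, we have

$$\mathbb{P}_n l(\hat\theta_n,\phi_n,\eta_n;Y,X,\Delta,V) \ge \mathbb{P}_n l(\theta_0,\phi_n,\eta_n;Y,X,\Delta,V) = \mathbb{P}_n l(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V) + o_p(1), \tag{18}$$

where the equality is obtained by Lemma A.4 and Lemma A.5. The class of functions {l(θ, ϕ, η; Y, X, Δ, V) : θ ∈ Θ, ϕ ∈ Φ, η ∈ ℱ} is Donsker from Lemma A.4. Hence it is Glivenko-Cantelli, and we then have

$$\begin{aligned} 0 &\le P l(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V) - P l(\hat\theta_n,\phi_0,\eta_0;Y,X,\Delta,V)\\ &= \mathbb{P}_n l(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V) - \mathbb{P}_n l(\hat\theta_n,\phi_0,\eta_0;Y,X,\Delta,V) + o_p(1)\\ &\le \mathbb{P}_n l(\hat\theta_n,\phi_n,\eta_n;Y,X,\Delta,V) - \mathbb{P}_n l(\hat\theta_n,\phi_0,\eta_0;Y,X,\Delta,V) + o_p(1) &\text{(19)}\\ &= P l(\hat\theta_n,\phi_n,\eta_n;Y,X,\Delta,V) - P l(\hat\theta_n,\phi_0,\eta_0;Y,X,\Delta,V) + o_p(1)\\ &= o_p(1), &\text{(20)}\end{aligned}$$

where (19) is obtained from (18) and (20) is obtained by Lemma A.5. By inequality (17), for every δ > 0 we have

$$\{d(\hat\theta_n,\theta_0) \ge \delta\} \subset \{P l(\hat\theta_n,\phi_0,\eta_0;Y,X,\Delta,V) < P l(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V)\},$$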

with the sequence of the events on the right going to a null event in view of inequality (20), which yields the almost sure (thus in probability) convergence of θ̂n. This argument is taken from the proof of Theorem 5.8 in van der Vaart (2002) and the proof of Theorem 3 in Li and Nan (2011).

A.2.4 Proof of asymptotic normality in Theorem 4.2 for two-stage estimator

Proof

The proof follows Lemma A.2. Here

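$$m(\theta,\phi,\eta;Y,X,\Delta,V) = l(\theta,\phi,\eta;Y,X,\Delta,V).$$

The partial derivative of $l(\theta,\phi,\eta;Y,X,\Delta,V)$ with respect to θ equals

$$\begin{aligned}\dot{l}_1(\theta,\phi,\eta;Y,X,\Delta,V) ={}& \Delta D_2(\theta;V,X)'\Sigma(\phi)^{-1}r(\theta;V,Y,X)\\ & + (1-\Delta)\left[\int_C^\tau f_{\theta,\phi}(Y\mid u,X)\,d\eta(u;X)\right]^{-1}\left[\int_C^\tau f_{\theta,\phi}(Y\mid u,X)D_2(\theta;u,X)'\Sigma(\phi)^{-1}r(\theta;u,Y,X)\,d\eta(u;X)\right],\end{aligned}$$

where $D_2(\theta;u,X)$ is defined in (9).

The second order derivative of $l(\theta,\phi,\eta;Y,X,\Delta,V)$ with respect to θ equals

$$\begin{aligned}\ddot{l}_{11}(\theta,\phi,\eta;Y,X,\Delta,V) ={}& -\Delta D_2(\theta;V,X)'\Sigma(\phi)^{-1}D_2(\theta;V,X) + \Delta D_3(\theta,\phi;V,Y,X) + (1-\Delta)\\ &\times\Bigg\{\left[\int_C^\tau f_{\theta,\phi}(Y\mid u,X)\left\{-D_2(\theta;u,X)'\Sigma(\phi)^{-1}D_2(\theta;u,X) + D_3(\theta,\phi;u,Y,X) + \left[D_2(\theta;u,X)'\Sigma(\phi)^{-1}r(\theta;u,Y,X)\right]^{\otimes 2}\right\}d\eta(u;X)\right]\left[\int_C^\tau f_{\theta,\phi}(Y\mid u,X)\,d\eta(u;X)\right]^{-1}\\ &\quad - \left[\int_C^\tau f_{\theta,\phi}(Y\mid u,X)D_2(\theta;u,X)'\Sigma(\phi)^{-1}r(\theta;u,Y,X)\,d\eta(u;X)\right]^{\otimes 2}\left[\int_C^\tau f_{\theta,\phi}(Y\mid u,X)\,d\eta(u;X)\right]^{-2}\Bigg\},\end{aligned}$$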

where D3(θ, ϕ; V, Y, X) is defined in (12).

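B1 holds from Theorem 4.1, Lemma A.3 and consistency of the two-stage estimator. From (13) and (14),

$$P\ddot{l}_{11}(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V) = -P\left\{\Delta D_2(\theta_0;V,X)'\Sigma^{-1}(\phi_0)D_2(\theta_0;V,X) + (1-\Delta)\left[\int_C^\tau f_{\theta_0,\phi_0}(Y\mid u,X)\,d\eta_0(u;X)\right]^{-2}\left[\int_C^\tau f_{\theta_0,\phi_0}(Y\mid u,X)D_2(\theta_0;u,X)'\Sigma(\phi_0)^{-1}r(\theta_0;u,Y,X)\,d\eta_0(u;X)\right]^{\otimes 2}\right\}, \tag{21}$$

which is negative definite from Condition 2(b); thus, B2 holds. From (13), B3 holds. B4 holds automatically.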

Since

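$$A_1/B_1 - A_2/B_2 = \{A_1(B_2 - B_1)\}/(B_1B_2) + (A_1 - A_2)/B_2,$$

under Conditions 1, 3–5 and 8, we have

$$P\left|\dot{l}_1(\theta,\phi,\eta;Y,X,\Delta,V) - \dot{l}_1(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V)\right|^2 \to 0$$

as |(θ, ϕ) − (θ0, ϕ0)| ≤ δn ↓ 0 by continuity and Condition 8. Similar to the proof of Lemma A.4, the class of functions $\{\int_C^\tau f_{\theta,\phi}(Y\mid u,X)D_2(\theta;u,X)'\Sigma(\phi)^{-1}r(\theta;u,Y,X)\,d\eta(u;X) : \theta\in\Theta,\ \phi\in\Phi,\ \eta\in\mathcal{F}\}$ is Donsker. Hence, $\{\dot{l}_1(\theta,\phi,\eta;Y,X,\Delta,V) : \theta\in\Theta,\ \phi\in\Phi\}$ is Donsker from Section 2.10.2 of van der Vaart and Wellner (1996) and Condition 8. Furthermore, from Corollary 2.3.12 of van der Vaart and Wellner (1996), B5 holds. Under Conditions 3–5 and 8, similar to the proof of Theorem 1 in Kong and Nan (2016), we can show that B6 holds. In particular, in B6,

$$P\ddot{l}_{12}(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V) = \left(P\ddot{l}_{121}(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V),\dots,P\ddot{l}_{12q}(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V)\right)$$

with

$$\begin{aligned}P\ddot{l}_{12j}(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V) = -P\Bigg((1-\Delta)&\left[\int_C^\tau f_{\theta_0,\phi_0}(Y\mid u,X)D_2(\theta_0;u,X)'\Sigma(\phi_0)^{-1}r(\theta_0;u,Y,X)\,d\eta_0(u;X)\right]\\ &\times\left[\int_C^\tau f_{\theta_0,\phi_0}(Y\mid u,X)\left\{r(\theta_0;u,Y,X)'\Sigma(\phi_0)^{-1}A_j(\phi_0)\Sigma(\phi_0)^{-1}r(\theta_0;u,Y,X) - \mathrm{tr}\left[\Sigma(\phi_0)^{-1}A_j(\phi_0)\right]\right\}d\eta_0(u;X)\right]\\ &\times\left[\int_C^\tau f_{\theta_0,\phi_0}(Y\mid u,X)\,d\eta_0(u;X)\right]^{-2}/2\Bigg),\end{aligned}$$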

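and

$$\begin{aligned}&P\ddot{l}_{13}(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V)[\eta_n-\eta_0]\\ &\quad= -P\Bigg((1-\Delta)\left[\int_C^\tau f_{\theta_0,\phi_0}(Y\mid u,X)D_2(\theta_0;u,X)'\Sigma(\phi_0)^{-1}r(\theta_0;u,Y,X)\,d\eta_0(u;X)\right]\left[\int_C^\tau f_{\theta_0,\phi_0}(Y\mid u,X)\,d\{\eta_n(u;X)-\eta_0(u;X)\}\right]\left[\int_C^\tau f_{\theta_0,\phi_0}(Y\mid u,X)\,d\eta_0(u;X)\right]^{-2}\Bigg)\\ &\quad= -P\Bigg((1-\Delta)\left[\int_C^\tau f_{\theta_0,\phi_0}(Y\mid u,X)D_2(\theta_0;u,X)'\Sigma(\phi_0)^{-1}r(\theta_0;u,Y,X)\,d\eta_0(u;X)\right]\\ &\qquad\times\Bigg[f_{\theta_0,\phi_0}(Y\mid\tau,X)\{\eta_n(\tau;X)-\eta_0(\tau;X)\} - f_{\theta_0,\phi_0}(Y\mid C,X)\{\eta_n(C;X)-\eta_0(C;X)\}\\ &\qquad\quad - \int_C^\tau f_{\theta_0,\phi_0}(Y\mid u,X)\left(\partial g(u\mathbf{1}-t,\xi_0)/\partial u\right)'\Sigma(\phi_0)^{-1}r(\theta_0;u,Y,X)\{\eta_n(u;X)-\eta_0(u;X)\}\,du\Bigg]\left[\int_C^\tau f_{\theta_0,\phi_0}(Y\mid u,X)\,d\eta_0(u;X)\right]^{-2}\Bigg)\\ &\quad= \mathbb{G}_n\{G(\theta_0,\phi_0,\eta_0;\tilde{X},\tilde{\Delta},\tilde{V})\} + o_p(1),\end{aligned} \tag{22}$$

where

$$\begin{aligned}G(\theta_0,\phi_0,\eta_0;\tilde{X},\tilde{\Delta},\tilde{V}) ={}& -P\left\{E_1(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V)E_2(\theta_0,\phi_0,\eta_0;Y,X,\tau)A_1(\eta_0;\tau,X;\tilde{X},\tilde{\Delta},\tilde{V})\right\}\\ & + P\left\{E_1(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V)E_2(\theta_0,\phi_0,\eta_0;Y,X,C)A_1(\eta_0;C,X;\tilde{X},\tilde{\Delta},\tilde{V})\right\}\\ & + P\left\{\int_C^\tau E_1(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V)E_2(\theta_0,\phi_0,\eta_0;Y,X,u)E_3(\theta_0,\phi_0;Y,X,u)A_1(\eta_0;u,X;\tilde{X},\tilde{\Delta},\tilde{V})\,du\right\}\end{aligned}$$

with

$$E_1(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V) = (1-\Delta)\left[\int_C^\tau f_{\theta_0,\phi_0}(Y\mid u,X)\,d\eta_0(u;X)\right]^{-2}\left[\int_C^\tau f_{\theta_0,\phi_0}(Y\mid u,X)D_2(\theta_0;u,X)'\Sigma(\phi_0)^{-1}r(\theta_0;u,Y,X)\,d\eta_0(u;X)\right],$$

$$E_2(\theta_0,\phi_0,\eta_0;Y,X,u) = f_{\theta_0,\phi_0}(Y\mid u,X)\left[1-\eta_0(u;X)\right]\exp(\alpha_0'X),$$

$$E_3(\theta_0,\phi_0;Y,X,u) = \left(\partial g(u\mathbf{1}-t,\xi_0)/\partial u\right)'\Sigma(\phi_0)^{-1}r(\theta_0;u,Y,X),$$

and $A_1(\eta_0;u,X;\tilde{X},\tilde{\Delta},\tilde{V})$ is defined in Lemma A.3.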

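Hence by Lemma A.2 and the central limit theorem,

$$\sqrt{n}(\hat\theta_n - \theta_0) = J_2^{-1}\sqrt{n}\,\mathbb{P}_n\dot{l}_1(\theta_0,\phi_0,\eta_0;X) + J_2^{-1}\sqrt{n}\,P\{\ddot{l}_{12}(\theta_0,\phi_0,\eta_0;X)\}(\phi_n-\phi_0) + J_2^{-1}\sqrt{n}\,P\{\ddot{l}_{13}(\theta_0,\phi_0,\eta_0;X)[\eta_n-\eta_0]\} + o_p(1),$$

which converges weakly to a mean zero normal random variable with variance $J_2^{-1}Q_2J_2^{-1}$ from (16) and (22), where

$$J_2 = -P\{\ddot{l}_{11}(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V)\},$$

$$Q_2 = P\left[\dot{l}_1(\theta_0,\phi_0,\eta_0;\tilde{Y},\tilde{X},\tilde{\Delta},\tilde{V}) + G(\theta_0,\phi_0,\eta_0;\tilde{X},\tilde{\Delta},\tilde{V}) + P\{\ddot{l}_{12}(\theta_0,\phi_0,\eta_0;Y,X,\Delta,V)\}D_5(\phi_0)^{-1}C(\theta_0,\phi_0;\tilde{Y},\tilde{X},\tilde{\Delta},\tilde{V})\right]^{\otimes 2}$$

with $D_5(\phi_0)$ and $C(\theta_0,\phi_0;\tilde{Y},\tilde{X},\tilde{\Delta},\tilde{V})$ defined in (15) and (10), respectively.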
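Because Q2 involves the influence of both first-stage estimators, a plug-in variance estimate is cumbersome, and the simulation tables report a bootstrap variance estimator (varb). The sketch below shows one common way such an estimator could be organized: a subject-level nonparametric bootstrap that repeats both stages on every resample. The resampling scheme, the toy two-stage estimator, and all names are assumptions for illustration; they are not taken from the paper or its supplementary R code.

```python
import random


def stage_one(sample):
    """Toy nuisance estimate: mean of the first coordinate."""
    return sum(z for z, _ in sample) / len(sample)


def stage_two(sample, nuisance):
    """Toy target estimate that plugs in the stage-one nuisance value."""
    return sum(y for _, y in sample) / (len(sample) * nuisance)


def two_stage(sample):
    """Run both stages on the same sample."""
    return stage_two(sample, stage_one(sample))


def bootstrap_variance(sample, estimator, n_boot=500, seed=0):
    """Variance of the estimator over resamples of subjects with replacement.

    Both stages are recomputed on each resample, so the variability of the
    first stage is propagated into the final estimate.
    """
    rng = random.Random(seed)
    n = len(sample)
    reps = []
    for _ in range(n_boot):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        reps.append(estimator(resample))
    mean_rep = sum(reps) / n_boot
    return sum((r - mean_rep) ** 2 for r in reps) / (n_boot - 1)


# Synthetic data: z is kept above 1 so the toy ratio is well defined.
data_rng = random.Random(42)
data = []
for _ in range(100):
    z = 1.0 + data_rng.random()
    data.append((z, 2.0 * z + data_rng.gauss(0.0, 0.5)))

var_b = bootstrap_variance(data, two_stage, n_boot=200, seed=1)
print(two_stage(data), var_b)
```

The key design point is that the resampling unit is the subject and the whole pipeline is rerun on each resample; bootstrapping only the second stage with the first-stage estimates held fixed would ignore the contributions that appear as the G and C terms in Q2.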

Footnotes

SUPPLEMENTARY MATERIAL

The online supplement contains general theorems about M-estimators, technical lemmas, and additional simulation. It also contains R code for implementing the methods developed here.

Contributor Information

Shengchun Kong, Gilead Sciences, Inc., Foster City, CA 94404.

Bin Nan, Departments of Biostatistics, University of Michigan, Ann Arbor, MI 48109.

John D. Kalbfleisch, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109.

Rajiv Saran, Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109.

Richard Hirth, Department of Health Management and Policy, University of Michigan, Ann Arbor, MI 48109.

References

  1. Albert PS, Shih JH. An approach for jointly modeling multivariate longitudinal measurements and discrete time-to-event data. The Annals of Applied Statistics. 2010;4(3):1517–1532. doi: 10.1214/10-AOAS339.
  2. Andersen PK, Gill RD. Cox’s regression model for counting processes: a large sample study. The Annals of Statistics. 1982;10(4):1100–1120.
  3. Breslow NE. Discussion of “Regression models and life-tables” by D. R. Cox. Journal of the Royal Statistical Society, Series B. 1972;34(2):216–217.
  4. Chan K, Wang M. Backward estimation of stochastic processes with failure events as time origins. The Annals of Applied Statistics. 2010;4(3):1602–1620. doi: 10.1214/09-AOAS319.
  5. Cox DR. Regression models and life-tables (with discussion). Journal of the Royal Statistical Society, Series B. 1972;34(2):187–220.
  6. Ding J, Wang JL. Modeling longitudinal data with nonparametric multiplicative random effects jointly with survival data. Biometrics. 2008;64(2):546–556. doi: 10.1111/j.1541-0420.2007.00896.x.
  7. Ghosh D, Lin DY. Marginal regression models for recurrent and terminal events. Statistica Sinica. 2002;12:663–688.
  8. Harlow SD, Mitchell ES, Crawford S, Nan B, Little R, Taffe J. The restage collaboration: defining optimal bleeding criteria for onset of early menopausal transition. Fertility and Sterility. 2008;89(1):129–140. doi: 10.1016/j.fertnstert.2007.02.015.
  9. Hsieh F, Tseng YK, Wang JL. Joint modeling of survival and longitudinal data: likelihood approach revisited. Biometrics. 2006;62(4):1037–1043. doi: 10.1111/j.1541-0420.2006.00570.x.
  10. Huang CY, Wang MC. Joint modeling and estimation for recurrent event processes and failure time data. Journal of the American Statistical Association. 2004;99(468):1153–1165. doi: 10.1198/016214504000001033.
  11. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2nd ed. Hoboken: John Wiley & Sons, Inc; 2002.
  12. Kalbfleisch JD, Schaubel DE, Ye Y, Gong Q. An estimating function approach to the analysis of recurrent and terminal events. Biometrics. 2013;69(2):366–374. doi: 10.1111/biom.12025.
  13. Kong S, Nan B. Semiparametric approach to regression with a covariate subject to a detection limit. Biometrika. 2016;103(1):161–174.
  14. Lehmann EL. Theory of Point Estimation. New York: Springer-Verlag; 1998.
  15. Li Z, Nan B. Relative risk regression for current status data in case-cohort studies. The Canadian Journal of Statistics. 2011;39(4):557–577.
  16. Li Z, Tosteson TD, Bakitas MA. Joint modeling quality of life and survival using a terminal decline model in palliative care studies. Statistics in Medicine. 2013;32(8):1394–1406. doi: 10.1002/sim.5635.
  17. Lin DY, Wei LJ, Ying Z. Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika. 1993;80(3):557–572.
  18. Little RJ, Rubin DB. Statistical Analysis with Missing Data. 2nd ed. Hoboken: John Wiley & Sons, Inc; 2002.
  19. Liu L, Wolfe RA, Kalbfleisch JD. A shared random effects model for censored medical costs and mortality. Statistics in Medicine. 2007;26(1):139–155. doi: 10.1002/sim.2535.
  20. Lu X, Nan B, Song P, Sowers M. Longitudinal data analysis with event time as a covariate. Statistics in Biosciences. 2010;2(1):65–80.
  21. Nan B, Wellner JA. A general semiparametric z-estimation approach for case-cohort studies. Statistica Sinica. 2013;23:1155–1180.
  22. Sowers M, Tomey K, Jannausch M, Eyvazzdh A, Crutchfield M, Nan B, Randolph J. Physical functioning and menopause states. Obstet Gynecol. 2007;110(6):1290–1296. doi: 10.1097/01.AOG.0000290693.78106.9a.
  23. Tsiatis AA, Davidian M. Joint modeling of longitudinal and time-to-event data: an overview. Statistica Sinica. 2004;14:809–834.
  24. van der Vaart AW. Semiparametric Statistics. In: Bernard P, editor. Lectures on Probability Theory and Statistics, Ecole d’Ete de Probabilites de Saint-Flour XXIX99. Berlin Heidelberg: Springer-Verlag; 2002. pp. 330–457.
  25. van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. New York: Springer-Verlag; 1996.
  26. Wellner JA, Zhang Y. Two likelihood-based semiparametric estimation methods for panel count data with covariates. Annals of Statistics. 2007;35(5):2106–2142.
  27. Zeng D, Lin DY. Semiparametric transformation models with random effects for joint analysis of recurrent and terminal events. Biometrics. 2009;65(3):746–752. doi: 10.1111/j.1541-0420.2008.01126.x.
