Abstract
We consider a random effects model for longitudinal data with the occurrence of an informative terminal event that is subject to right censoring. Existing methods for analyzing such data include the joint modeling approach using latent frailty and the marginal estimating equation approach using inverse probability weighting; in both cases the effect of the terminal event on the response variable is not explicit and thus not easily interpreted. In contrast, we treat the terminal event time as a covariate in a conditional model for the longitudinal data, which provides a straightforward interpretation while keeping the usual relationship of interest between the longitudinally measured response variable and covariates for times that are far from the terminal event. A two-stage semiparametric likelihood-based approach is proposed for estimating the regression parameters: first, the conditional distribution of the right-censored terminal event time given the covariates is estimated; then the regression parameters are estimated by maximizing the likelihood function for the longitudinal data given the terminal event. Desirable asymptotic properties are established, and the method is illustrated by numerical simulations and by analyzing medical cost data for patients with end-stage renal disease.
Keywords: Cox regression, Empirical process, Mixed effects model, Pseudo-maximum likelihood estimation
1 Introduction
In longitudinal studies, the collection of information can be stopped at the end of the study, at the time of dropout of a study participant, or at the time of a terminal event. Death, the most common terminal event, often occurs in cohort studies of older populations and in fatal disease follow-up studies, e.g., organ failure or cancer studies. Other types of terminal events also exist, for example, the final menstrual period is a terminal event for menstrual cycle data.
The current literature has primarily focused on modeling the longitudinally measured response variable and covariates given that the terminal event has not yet happened; see e.g. Tsiatis and Davidian (2004), Hsieh et al. (2006), Ding and Wang (2008), Albert and Shih (2010). If the terminal event is ignorable (Little and Rubin, 2002), then a likelihood-based estimation of regression parameters is straightforward. Oftentimes, however, the terminal event time is non-ignorable. Two types of approaches are widely used for longitudinal data analysis with non-ignorable terminal events: the joint modeling approach using latent frailty and the marginal estimating equation approach using inverse probability weighting. In the former, the relationship between the terminal event and the longitudinal data is indirectly modeled through the shared random effect. The latter approach is appropriate when the terminal event is simply censoring the observations of the longitudinal process, which is in fact continuing but unobserved; its use when the terminal event stops the longitudinal process is more controversial. Similar approaches have also been used in the context of recurrent events correlated with a terminal event; for example, see Ghosh and Lin (2002), Huang and Wang (2004), Zeng and Lin (2009), Albert and Shih (2010), Kalbfleisch et al. (2013), among many others.
These modeling strategies, however, are not as useful as one might wish for many longitudinal studies where the explicit effect of the terminal event time on the longitudinal measures is of interest. For example, medical payments in dialysis patients (Liu et al., 2007) and cancer patients (Chan and Wang, 2010) tend to increase when patients approach death; functional limitations in an aging population (Sowers et al., 2007) become more severe when people are closer to the end of life; and menstrual cycles become longer and more variable when women approach menopause (Harlow et al., 2008). In these cases, a question of interest is how the impending terminal event affects the longitudinal measures; for this question, a model for the longitudinal data conditional on the terminal event seems particularly useful and appropriate.
In this article, we propose a random effects model for repeated measures which includes the event time as an additional (fixed effect) covariate, and thus provides a more intuitive and meaningful interpretation of the effect of the terminal event time. The proposed conditional modeling strategy keeps the usual relationship of interest between the longitudinally measured response variable and covariates when the data collection time is far from the occurrence of the terminal event, but the response variable becomes increasingly dependent on the terminal event time when the data collection time is close to the terminal event. Since the terminal event time is subject to right censoring, the regression model with the terminal event time as a covariate falls into the general framework of regression with a censored covariate. For this situation, the complete case analysis, which drops observations with censored event times, will be shown to be a valid estimation approach under the usual noninformative, conditionally independent censoring assumption.
We propose a semiparametric, likelihood-based approach for parameter estimation in a linear regression model with a nonlinear component for the censored covariate, that utilizes both the complete and censored data. The proposed method is shown to be consistent and asymptotically normal under a set of mild regularity conditions, and is more efficient than the complete case analysis. The proofs of the asymptotic properties rely heavily on empirical process theory. A referee drew our attention to Li et al. (2013), which has a similar aim of recovering information from censored data. We comment further on this work later in the Discussion Section.
The rest of the article is organized as follows. We describe the proposed model in Section 2 and the two-stage estimating method in Section 3. The asymptotic properties are outlined in Section 4 with proofs given in the Appendix. Section 5 contains numerical results followed by a brief discussion. Detailed technical preparations are provided in the online Supplementary Material.
2 A Nonlinear Regression Model with Mixed Effects and Censored Covariate
2.1 Complete data model with observed terminal event time
For a subject i, denote the terminal event time by Si, the baseline covariates by a vector Xi where the first element is 1, the longitudinal response by Yij, and the prespecified visit time by tij, where i = 1, ⋯, n and j = 1, ⋯, ni. For given Si, we model Yij with the following mixed effect model for longitudinal data:
Yij = Xi′β + g(Si − tij, ξ) + Zi′bi + Ui(tij) + εij,  (1)
where β is a vector of regression coefficients with length p1, bi is a random effects vector of length q1 associated with covariates Zi, Ui(t) is an independent stochastic process, εij, j = 1, …, ni, are independent measurement errors, g is a known function that satisfies Condition 1 in the Appendix, and ξ is a vector with length p2. The function g satisfies g(t, ξ) → 0 as t → ∞, so that model (1) reduces to the usual relationship of interest between the longitudinally measured response variable Yij and covariates Xi when tij is distant from Si, while the response becomes increasingly related to the terminal event when tij is close to Si. Motivated by figure 5 in Chan and Wang (2010), we can choose g(S − t, ξ) to be a normal kernel, g(S − t, ξ) = ξ1e−(S−t−ξ2)2ξ3 with ξ = (ξ1, ξ2, ξ3)′. Other examples include an exponential kernel, g(t, ξ) = ξ1e−(t−ξ2) with ξ = (ξ1, ξ2)′.
We make the following additional assumptions: (i) bi follows a normal distribution N(0, D(φ)), where D is a positive definite matrix depending on a parameter vector φ with length q2; (ii) Ui(t) is a mean zero Gaussian process with a given covariance function cov(Ui(t1), Ui(t2)) = κ(ν, ρ; t1, t2) that depends on a parameter vector ν with length q3 and a scalar ρ; for example, Ui(t) can be the nonhomogeneous Ornstein-Uhlenbeck (NOU) process satisfying var(Ui(t)) = ν(t) with log(ν(t)) = ν0 + ν1t and corr(Ui(t1), Ui(t2)) = ρ|t1−t2|; (iii) εij follows a normal distribution N(0, σ2); and (iv) bi, Ui(t), and εij are mutually independent.
For a vector t = (t1, ⋯, tm), denote g(t, ξ) = (g(t1, ξ), ⋯, g(tm, ξ))′, and let Yi = (Yi1, ⋯, Yini)′ and ti = (ti1, ⋯, tini)′. When Si is observed, from (1) we have
Yi | Si, Xi ~ N{(Xi′β)1i + g(Si1i − ti, ξ), Σi},  (2)
where 1i = (1, ⋯, 1)′ with length ni, θ = (β, ξ)′ with length p = p1 + p2, ϕ = (φ, ν, ρ, σ2)′ with length q = q2 + q3 + 2, and Σi = ZiDZi′ + Γi + σ2Ii, where Ii is the ni × ni identity matrix and Γi is the covariance matrix of (U(ti1), ⋯, U(tini))′.
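To make these ingredients concrete, here is a minimal numerical sketch (Python with NumPy; the function names are ours, not from any package) of the normal kernel g and of the marginal covariance Σi = ZiDZi′ + Γi + σ2Ii under the NOU specification:

```python
import numpy as np

def g_normal(u, xi1, xi2, xi3):
    # Normal-kernel terminal-event effect: g(u, xi) = xi1 * exp(-(u - xi2)^2 * xi3),
    # where u = S - t; the effect vanishes as u grows (far from the terminal event).
    return xi1 * np.exp(-((u - xi2) ** 2) * xi3)

def nou_cov(times, nu0, nu1, rho):
    # NOU process covariance: var U(t) = exp(nu0 + nu1 * t),
    # corr(U(t1), U(t2)) = rho ** |t1 - t2|.
    t = np.asarray(times, dtype=float)
    sd = np.exp(0.5 * (nu0 + nu1 * t))
    return np.outer(sd, sd) * rho ** np.abs(t[:, None] - t[None, :])

def marginal_cov(times, Z, D, nu0, nu1, rho, sigma2):
    # Sigma_i = Z D Z' + Gamma_i + sigma^2 I, the covariance in model (2).
    Z = np.atleast_2d(Z)
    return Z @ D @ Z.T + nou_cov(times, nu0, nu1, rho) + sigma2 * np.eye(len(times))

# Example: three visits, a random intercept with variance 0.5,
# a stationary NOU process (nu0 = nu1 = 0) and unit error variance.
Sigma = marginal_cov([0.0, 1.0, 2.0], np.ones((3, 1)), np.array([[0.5]]),
                     nu0=0.0, nu1=0.0, rho=0.5, sigma2=1.0)
```

The example values are illustrative only; in the example, each diagonal entry is 0.5 + 1 + 1 and the (1, 2) entry is 0.5 + 0.5, so the three variance components are easy to trace through the code.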
A semiparametric mixed effects model could also be considered, where g is an unknown function that can be estimated by smoothing splines. We focus on the parametric model (1) to more simply illustrate the proposed methodology.
2.2 Observed data model with potentially censored terminal event time
Let Ci be the censoring time for the ith subject. If Si ≤ Ci, then Si is observed; otherwise Si is right-censored by Ci. We denote the observed time by Vi = min(Si, Ci) and the censoring indicator by Δi = 1(Si ≤ Ci). Note that tij ≤ Vi for all i = 1, ⋯, n and j = 1, ⋯, ni. Here, we assume that Ci and (Si, Yi) are conditionally independent given Xi.
For notational simplicity, assume that the random effect covariate Z is a sub-vector of X. For a single subject, we observe (V, Δ, Y, X). The likelihood function for the observed data (V, Δ, Y, X) can be factored into

f1(V, Δ, Y, X) = f2(V, Δ | Y, X)f3(Y | X)f4(X),
where f1 denotes the joint density of (V, Δ, Y, X), f2 denotes the conditional density of (V, Δ) given (Y, X), f3 denotes the conditional density of Y given X, and f4 denotes the marginal density of X. Since the conditional independence of C and (S, Y) given X implies that C and S are conditionally independent given (Y, X), we have
f2(V, Δ | Y, X) = {fS(V | Y, X)ḠC(V | Y, X)}Δ{gC(V | Y, X)F̄S(V | Y, X)}1−Δ,  (3)
where fS denotes the conditional density of S given (Y, X), gC denotes the conditional density of C given (Y, X), and F̄S and ḠC are the corresponding conditional survival functions. Further assuming noninformative censoring, we can drop the factors gC(V|Y, X) and ḠC(V|Y, X). Going through conditioning arguments using Bayes' rule and dropping f4(X), we obtain the likelihood function
{fθ,ϕ(Y | S, X)f5(S | X)}Δ{∫V∞ fθ,ϕ(Y | s, X) dF5(s | X)}1−Δ,  (4)
where f5(S|X) is the conditional density of S given X, and F5(S|X) is the corresponding cumulative distribution function. In (4), only fθ,ϕ contains the parameter of interest θ and nuisance parameter ϕ, whereas F5 (or f5) is an additional nuisance parameter.
In (4), {fθ,ϕ(Y |S, X)f5(S|X)}Δ is the contribution of a subject with observed terminal event time, which yields the fully observed data likelihood, and {∫V∞ fθ,ϕ(Y | s, X) dF5(s | X)}1−Δ is the contribution of a subject with censored terminal event time. In Section 4, we show that the complete case analysis obtained by dropping the second part in (4) yields a consistent and asymptotically normally distributed estimator, but is inefficient compared to an approach that also utilizes the censored data. From the second part in (4), we see that the amount of efficiency gain depends on how well we can estimate the right tail of the conditional distribution F5(s|X) beyond C. We consider a semiparametric approach that allows reliable extrapolation beyond C without relying on a restrictive parametric assumption.
Since Ci and (Si, Yi) are conditionally independent given Xi and Ci is random, all of the commonly used semiparametric models for right-censored data allow extrapolation beyond Ci. Here we adopt the most widely used one, the Cox regression model (Cox, 1972). Other viable models include the accelerated failure time model, the additive hazards model, and the transformation model (Kalbfleisch and Prentice, 2002). Suppose the hazard function of S given X has the following form:
λ(t | X) = λ(t)eα′X,  (5)
where α is the regression parameter with an unknown true value α0, and λ(·) is the baseline hazard function. The conditional cumulative distribution function is then given by
η(s; X) ≡ F5(s | X) = 1 − exp{−Λ(s)eα′X}, where Λ(t) = ∫0t λ(u)du is the cumulative baseline hazard function with an unknown true value Λ0. Note that X appears in both models (2) and (5), but these two instances may refer to different regressions; for example, X1 might be a covariate in (2) whereas some transformation of X1 is a covariate in (5). The same X is used to denote all fully observed covariates for notational simplicity. The log-likelihood function then becomes
Δ{log fθ,ϕ(Y | S, X) + log η̇(S; X)} + (1 − Δ) log ∫V∞ fθ,ϕ(Y | s, X) dη(s; X).  (6)
A similar idea has been used by Lu et al. (2010), but for a different problem: Lu et al. (2010) considered longitudinal data analysis with an event time that does not terminate the longitudinal measurements.
3 The Pseudo-likelihood Method
The log likelihood function (6) involves an unknown distribution function η and the corresponding density function η̇. Hence a maximum likelihood estimate, if it exists, can be complicated. We propose a tractable two-stage pseudo-likelihood approach in which the nuisance parameters (ϕ, η) are estimated in stage 1, and the parameter of interest θ is then estimated by maximizing (6) in stage 2 with nuisance parameters replaced by their estimators obtained in stage 1 (Kong and Nan, 2016). Details are given below:
Stage 1. Nuisance parameter estimation. The dispersion parameter ϕ is estimated by the complete case analysis of the nonlinear regression model (2); the Cox model regression coefficient α is estimated by maximizing the partial likelihood, and the cumulative baseline hazard Λ is estimated with the Breslow estimator (Breslow, 1972). Denote the estimators by ϕ̃n, α̃n, and Λ̃n, respectively. The c.d.f. η(s; X) is estimated by η̃n(s; X) = 1 − exp{−Λ̃n(s)eα̃n′X}, which is asymptotically equivalent to the product integral expression. It can be shown that all the estimates obtained in Stage 1 have desirable statistical properties. In particular, η̃n is n1/2-consistent in a finite interval, see Lemma A.3 in the Supplementary Material; ϕ̃n obtained from the complete case analysis is n1/2-consistent, see Theorem 4.1 in Section 4.
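A minimal sketch of the Stage 1 survival estimates, assuming the partial-likelihood estimate of α is already available (e.g., from a standard Cox regression package); the helper names below are ours:

```python
import numpy as np

def breslow_cumhaz(t, V, Delta, X, alpha):
    # Breslow estimator of the cumulative baseline hazard Lambda(t),
    # given Cox regression coefficients alpha (assumed already fitted
    # by maximizing the partial likelihood).
    V = np.asarray(V, dtype=float)
    Delta = np.asarray(Delta)
    risk = np.exp(np.asarray(X, dtype=float) @ np.asarray(alpha, dtype=float))
    lam = 0.0
    for v, d in sorted(zip(V, Delta)):
        if v <= t and d == 1:
            lam += 1.0 / risk[V >= v].sum()  # jump size at each event time
    return lam

def eta_tilde(s, x, V, Delta, X, alpha):
    # Estimated conditional c.d.f. of S given X = x under the Cox model:
    # eta(s; x) = 1 - exp{-Lambda(s) * exp(alpha' x)}.
    return 1.0 - np.exp(-breslow_cumhaz(s, V, Delta, X, alpha)
                        * np.exp(np.dot(x, alpha)))

# Toy data: three subjects, all with observed events, no covariate effect
# (alpha = 0), so the Breslow estimator reduces to the Nelson-Aalen estimator.
V = [1.0, 2.0, 3.0]; Delta = [1, 1, 1]; X = [[0.0], [0.0], [0.0]]
Lam2 = breslow_cumhaz(2.0, V, Delta, X, [0.0])   # 1/3 + 1/2 = 5/6
```

In practice one would obtain α̃n and Λ̃n from a standard Cox regression routine; this sketch only makes the mapping from (α̃n, Λ̃n) to η̃n explicit.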
Stage 2. Pseudo-likelihood estimation of θ. Replacing (ϕ, η) by their Stage 1 estimates (ϕ̃n, η̃n) in the log likelihood function yields the following log pseudo-likelihood function for a random sample of n subjects:

ℙn[Δ log fθ,ϕ̃n(Y | S, X) + (1 − Δ) log ∫V∞ fθ,ϕ̃n(Y | s, X) dη̃n(s; X)].  (7)

Note that the term Δ log η̇ in (6) is dropped because it does not involve θ. However, if one wants to maximize the log-likelihood directly without using the two-stage approach, then this term cannot be omitted.
Let θ̂n denote the pseudo-likelihood estimator. Since it is obtained by maximizing the objective function (7), its asymptotic properties can be obtained from M-estimation theory, see van der Vaart (2002), Wellner and Zhang (2007) and Li and Nan (2011).
The estimates (α̃n, Λ̃n), and hence η̃n, are obtained using a standard package for the Cox regression model. The estimates (θ̃n, ϕ̃n) from the complete case analysis are obtained by maximizing the complete case log-likelihood using a Newton-Raphson algorithm, where multiple initial values are tried. The two-stage estimator θ̂n is also obtained from a Newton-Raphson algorithm, with the complete case analysis estimator θ̃n as the initial value.
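The structure of the objective function (7) can be sketched as follows, with the conditional density and the Stage 1 estimate of the conditional law of S abstracted as function arguments (all names are hypothetical, and the toy density below is for illustration only, not the paper's model):

```python
import math

def pseudo_loglik(f_cond, eta_incr, subjects):
    # Log pseudo-likelihood (7): a subject with Delta = 1 contributes
    # log f(Y | S = V, X); a censored subject contributes the log of the
    # estimated-c.d.f.-weighted mixture over candidate event times s_k > V,
    # i.e. log sum_k f(Y | s_k, X) * d_eta_k.
    ll = 0.0
    for y, x, delta, v in subjects:
        if delta == 1:
            ll += math.log(f_cond(y, v, x))
        else:
            s_k, d_k = eta_incr(x)
            mass = sum(f_cond(y, s, x) * d for s, d in zip(s_k, d_k) if s > v)
            ll += math.log(mass)
    return ll

# Toy ingredients:
def f_cond(y, s, x):
    # illustrative density of Y given S = s: N(s, 1)
    return math.exp(-0.5 * (y - s) ** 2) / math.sqrt(2 * math.pi)

def eta_incr(x):
    # discrete Stage-1 estimate of the conditional law of S given x
    return [1.0, 2.0, 3.0], [0.3, 0.4, 0.3]

subjects = [(1.0, None, 1, 1.0),   # observed terminal event at S = 1
            (2.0, None, 0, 1.5)]   # censored at 1.5; mix over s in {2, 3}
ll = pseudo_loglik(f_cond, eta_incr, subjects)
```

Maximizing this objective over θ (here, over the parameters inside `f_cond`) with a Newton-Raphson or quasi-Newton routine gives the two-stage estimator.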
4 Asymptotic Properties
Let l0(θ, ϕ; Y, X, Δ, V) = Δ log fθ,ϕ(Y | S, X). This is the first part in the log-likelihood for the observed data. Then

l(θ, ϕ, η; Y, X, Δ, V) = l0(θ, ϕ; Y, X, Δ, V) + (1 − Δ) log ∫V∞ fθ,ϕ(Y | s, X) dη(s; X),

which is (6) with Δ log η̇ dropped.
A set of regularity conditions is introduced in the Appendix. Some conditions are commonly assumed for the Cox regression model; other conditions are for the mixed effects model, which are easily verified for a smooth function g and the NOU process. We will use standard empirical process notation from now on. In particular, ℙn is the empirical measure and Pf = ∫ fdP for a probability measure P and a function f.
Under the conditional independent censoring assumption, the estimators from the complete case analysis and the two-stage procedure, respectively, are consistent and asymptotically normal. These results are given in the following two theorems.
Theorem 4.1. (Complete case)
Assume that C and (S, Y) are conditionally independent given X. Under Conditions 1, 2(a), and 3–5, the complete case analysis estimator (θ̃n, ϕ̃n) that maximizes ℙnl0(θ, ϕ; Y, X, Δ, V) converges in outer probability to (θ0, ϕ0); and n1/2{(θ̃n, ϕ̃n) − (θ0, ϕ0)} converges in distribution to a mean zero normal random vector with variance J1−1Q1J1−1, where J1 and Q1 are provided in the Appendix.
Theorem 4.2. (Two-stage)
Assume that C and (S, Y) are conditionally independent given X. Under Conditions 1–8, the two-stage pseudo-likelihood estimator θ̂n that maximizes (7) converges in outer probability to θ0; and n1/2(θ̂n − θ0) converges in distribution to a mean zero normal random vector with variance J2−1Q2J2−1, where J2 and Q2 are defined in the Appendix.
The proofs of consistency are similar to those in Li and Nan (2011) and van der Vaart (2002). The proofs of asymptotic normality are based on general M-estimation theory, similar to Li and Nan (2011) and Wellner and Zhang (2007); the detailed proofs rely heavily on empirical process theory and are given in the Appendix.
Because the asymptotic variance of θ̂n has a very complicated expression that does not yield an easily computed estimate from the observed data, we use the bootstrap variance estimator.
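A generic sketch of the nonparametric bootstrap variance estimator (resampling subjects with replacement and re-running the whole two-stage procedure, abstracted here as `estimator`):

```python
import numpy as np

def bootstrap_variance(estimator, data, B=100, seed=0):
    # Nonparametric bootstrap: resample subjects with replacement, re-run the
    # full estimation procedure on each resample, and take the empirical
    # variance of the B replicate estimates.
    rng = np.random.default_rng(seed)
    n = len(data)
    reps = [estimator([data[i] for i in rng.integers(0, n, size=n)])
            for _ in range(B)]
    return np.var(np.asarray(reps), axis=0, ddof=1)

# Sanity check with a simple estimator: the bootstrap variance of the sample
# mean of n = 200 standard normal draws should be close to 1/200 = 0.005.
data = list(np.random.default_rng(1).standard_normal(200))
v = bootstrap_variance(lambda d: float(np.mean(d)), data, B=200)
```

For the two-stage procedure, `estimator` would refit the Cox model, recompute η̃n and ϕ̃n, and re-maximize (7) on each resample, so that the Stage 1 estimation error is reflected in the variance estimate.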
5 Numerical Results
5.1 Simulations
We conduct simulations to investigate the finite sample performance of the proposed method. Simulation data sets are generated from the nonlinear model with mixed effects,
Yij = β0 + β1X1i + β2X2i + γe−(Si−tij−μ)2ξ + bi + Ui(tij) + εij,

where β0 = 1, β1 = 1, β2 = −3, μ = 1, and γ = 4. The random effect bi ~ N(0, exp(−0.5)), the error term εij ~ N(0, exp(−0.1)), and Ui(t) is an NOU process with ν0 = 1, ν1 = −1 and ρ = exp(−1)/(1 + exp(−1)). The two fully observed covariates are X1i and X2i, where X1i ~ Bernoulli(0.5) and X2i ~ N(0, 1) truncated at ±3. The terminal event time is Si = 4 + S0i, where S0i follows an exponential distribution with conditional hazard function exp(−1 − 6X1i + 4X2i). To generate the censoring time Ci, we first generate C0i = κ + C̃0i, where C̃0i follows an exponential distribution with conditional hazard function exp(−3 − X1i + X2i), and then set Ci = tij, where j satisfies tij ≤ C0i and tij+1 > C0i, assuming tini+1 = ∞. The constant κ is chosen to yield 40% censoring. For each subject i, there are 10 scheduled visit times, and the first visit time ti1 is 0. There are two different settings for generating the subsequent visit times: (1) equally spaced time intervals with tij = j − 1, j = 2, ⋯, 10; (2) non-equally spaced time intervals with the subsequent visit times generated recursively from tij = tij−1 + min(4, Wi) for j = 2, ⋯, 10, where Wi follows an exponential distribution with conditional hazard function exp(−3 − X1i + X2i). In each setting, ξ takes two different values, 1.2 and 0.2, corresponding to a sharp and a flat nonlinear predictor in the regression model, respectively.
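A sketch of the data-generating mechanism for one subject in the equally spaced setting (Python/NumPy; the function name is ours, and the shift constant κ is left as an argument):

```python
import numpy as np

rng = np.random.default_rng(2024)

def simulate_subject(kappa=1.0, xi=1.2):
    # One subject from the equally spaced visit design: sharp kernel by
    # default; observation of Y stops at V = min(S, C).
    x1 = float(rng.binomial(1, 0.5))
    x2 = float(np.clip(rng.normal(), -3.0, 3.0))           # truncated N(0, 1)
    S = 4.0 + rng.exponential(1.0 / np.exp(-1 - 6 * x1 + 4 * x2))
    C0 = kappa + rng.exponential(1.0 / np.exp(-3 - x1 + x2))
    t_all = np.arange(10.0)                                 # visits t = 0, ..., 9
    C = t_all[t_all <= C0].max()                            # last visit time <= C0
    V = min(S, C)
    t = t_all[t_all <= V]                                   # visits stop at V
    # random intercept, NOU process and measurement error
    b = rng.normal(0.0, np.exp(-0.25))                      # var = exp(-0.5)
    sd = np.exp(0.5 * (1.0 - t))                            # log var(U(t)) = 1 - t
    rho = np.exp(-1) / (1 + np.exp(-1))
    Gam = np.outer(sd, sd) * rho ** np.abs(t[:, None] - t[None, :])
    U = rng.multivariate_normal(np.zeros(len(t)), Gam)
    eps = rng.normal(0.0, np.exp(-0.05), size=len(t))       # var = exp(-0.1)
    g = 4.0 * np.exp(-((S - t - 1.0) ** 2) * xi)            # gamma = 4, mu = 1
    y = 1.0 + 1.0 * x1 - 3.0 * x2 + g + b + U + eps
    return t, y, S, V, S <= C

sample = [simulate_subject() for _ in range(50)]
```

In the actual study κ is tuned so that roughly 40% of the terminal event times are censored; the sketch takes κ as given.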
We simulate 500 replications for each scenario with sample size 300. The biases and variances of the proposed method are compared with those of the full data and complete case analyses. The full data analysis represents the case where all data are available, in other words there is no censoring; it has more observed visits and serves as a benchmark. The complete case analysis simply eliminates subjects with a censored terminal event time. For the proposed two-stage method, we report the 90% and 95% coverage proportions, for which the variance estimates are obtained from 100 bootstrap samples. The results are presented in Tables 1–4.
Table 1.
Simulation results for equally spaced time intervals with sharp nonlinear term. varb = bootstrap variance estimator; CR = coverage rate
| | | β0 = 1 | β1 = 1 | β2 = −3 | μ = 1 | γ = 4 | ξ = 1.2 |
|---|---|---|---|---|---|---|---|
| Full data | bias | −0.0064 | 0.0011 | −0.0002 | 0.0001 | −0.0032 | 0.0031 |
| | var | 0.0801 | 0.0115 | 0.0082 | 0.0002 | 0.0140 | 0.0032 |
| Two-stage | bias | −0.0153 | 0.0045 | 0.0003 | −0.0010 | −0.0030 | 0.0039 |
| | var | 0.0973 | 0.0144 | 0.0098 | 0.0003 | 0.0166 | 0.0040 |
| | varb | 0.1094 | 0.0161 | 0.0102 | 0.0003 | 0.0151 | 0.0043 |
| | 90% CR | 0.904 | 0.876 | 0.898 | 0.918 | 0.896 | 0.912 |
| | 95% CR | 0.966 | 0.944 | 0.960 | 0.972 | 0.952 | 0.946 |
| Complete case | bias | −0.0092 | 0.0010 | 0.0041 | −0.0008 | −0.0024 | 0.0024 |
| | var | 0.1217 | 0.0208 | 0.0130 | 0.0003 | 0.0242 | 0.0053 |
Table 4.
Simulation results for non-equally spaced time intervals with flat nonlinear term. varb = bootstrap variance estimator; CR = coverage rate
| | | β0 = 1 | β1 = 1 | β2 = −3 | μ = 1 | γ = 4 | ξ = 0.2 |
|---|---|---|---|---|---|---|---|
| Full data | bias | −0.0141 | −0.0044 | 0.0336 | −0.0005 | 0.0015 | 0.0037 |
| | var | 0.1951 | 0.0189 | 0.0789 | 0.0036 | 0.0187 | 0.0013 |
| Two-stage | bias | −0.0371 | 0.0035 | 0.0401 | −0.0055 | −0.0053 | 0.0048 |
| | var | 0.2471 | 0.0239 | 0.1019 | 0.0054 | 0.0225 | 0.0018 |
| | varb | 0.3158 | 0.0227 | 0.1485 | 0.0063 | 0.0220 | 0.0019 |
| | 90% CR | 0.916 | 0.896 | 0.882 | 0.910 | 0.926 | 0.886 |
| | 95% CR | 0.966 | 0.952 | 0.946 | 0.954 | 0.956 | 0.940 |
| Complete case | bias | −0.0306 | −0.0133 | 0.0772 | −0.0059 | −0.0021 | 0.0049 |
| | var | 0.4136 | 0.0388 | 0.1896 | 0.0078 | 0.0356 | 0.0026 |
The results suggest that the biases for the proposed two-stage method are minimal and comparable to both the full data analysis and the complete case analysis. From the tables, it can be seen that the proposed method is much more efficient than the complete case analysis, and the bootstrap method performs well in estimating the variance, yielding reasonable coverage rates for all the scenarios.
We run additional simulations to further investigate the impact of survival model misspecification in (5). The results are provided in the Supplementary Material, where it is shown that misspecification of the Cox regression model can yield biased results, and that the bias increases as the severity of misspecification of (5) grows; this indicates the importance of model-checking before implementing the proposed two-stage method.
5.2 End-stage renal disease
We consider data on inpatient hospital costs of patients with end-stage renal disease (ESRD) as reported in an analysis file provided by the United States Renal Data System (USRDS); this provides an illustrative example of longitudinal data with a terminal event. These costs are of substantial interest, since Medicare paid about $10.5 billion in 2012 for inpatient costs (USRDS 2014 annual report). We focus on the monthly inpatient costs paid by Medicare; these costs are terminated by the occurrence of death, and Chan and Wang (2010) and Liu et al. (2007) suggested that the medical payment pattern changes when patients approach death. We explore this issue taking account of patient level covariates.
For illustrative purposes, we selected a 2% random sample of the white and black patients whose service started in the calendar year 2007, and who were 65 years or older at baseline. The average age at baseline was 76.1 and follow-up ended on December 31st, 2010. Of the 840 patients selected for analysis, 65.5% died during the follow-up period; the others were censored through loss to follow-up or at the end of the study. The average follow-up time for medical payment was 23.4 months. For convenience, we assume that the inpatient cost rate is constant within each hospitalization. For example, if a hospitalization starts on April 11th and ends on May 10th with a total amount of $3,000, then the Medicare payment is $2,000 for April and $1,000 for May. The month of death is usually shorter than other months; for example, a death on April 15th only has half a month to accrue spending. We therefore consider the "spending rate" for the month in which death occurs: for a death on April 15th with an April Medicare payment of $3,000, we scale the payment up to $6,000 for that month in the analysis.

Age, log transformed body mass index (BMI), heart disease and lung disease are used as predictors for the death hazard. All of them are significant, with p-values < 0.0001, 0.0011, < 0.0001 and 0.0127, respectively. The goodness of fit of the Cox regression model is checked in Figure 1. Dotted lines in the first row of Figure 1 are plots of 20 realizations from the distributions of the score processes; the observed score processes, presented with solid lines, fluctuate randomly around zero. From Figure 1, we see that the proportional hazards model for age and log transformed BMI fits the data reasonably well, with goodness-of-fit empirical p-values of 0.485 and 0.284, respectively, based on 1000 simulated martingale residual score processes (Lin et al., 1993). A plot of log Λ̂0(t) versus log t is displayed in the lower panel of Figure 1.
The approximate parallelism of the curves suggests that the proportional hazards model for lung disease and heart disease provides a reasonably good approximation, except at early times for lung disease.
Figure 1.
Goodness of fit of the Cox model for the ESRD data.
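The monthly cost proration and death-month scaling described above can be sketched as follows (hypothetical helper names; we use an April 11th admission so that a constant daily rate gives a $2,000/$1,000 split):

```python
from datetime import date, timedelta
import calendar

def monthly_allocation(start, end, amount):
    # Split a hospitalization's payment across calendar months, assuming a
    # constant daily cost rate over the stay (both endpoint days included).
    n_days = (end - start).days + 1
    rate = amount / n_days
    alloc = {}
    d = start
    while d <= end:
        key = (d.year, d.month)
        alloc[key] = alloc.get(key, 0.0) + rate
        d += timedelta(days=1)
    return alloc

def death_month_rate(month_amount, death_day, year, month):
    # Scale the death month's observed payment up to a full-month "spending
    # rate": a death on the 15th of a 30-day month doubles the amount.
    return month_amount * calendar.monthrange(year, month)[1] / death_day

alloc = monthly_allocation(date(2007, 4, 11), date(2007, 5, 10), 3000.0)
rate = death_month_rate(3000.0, 15, 2007, 4)   # April has 30 days -> 6000.0
```

The 30-day stay splits into 20 April days and 10 May days at $100 per day, and the death-month example reproduces the $6,000 scaled payment in the text.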
Since the distribution of monthly Medicare payment, Y, is highly skewed, we consider a log transformation log(Y/1000 + 1). Figure 2 shows the final six-month trajectories of monthly inpatient costs (log transformed) for 30 randomly selected patients who died during follow-up (dotted lines). Many show an increasing and then decreasing pattern before death. We consider a normal kernel in the nonlinear mixed model.
Figure 2.
Monthly inpatient costs (log transformed). The solid line is the average of the estimated log transformed monthly cost. The shaded area is its 95% pointwise confidence band. The dotted lines are 30 randomly selected subjects with terminal event
Exploration of the data showed a similar pattern after entry as described in Liu et al. (2007): inpatient costs tended to increase over the first two months after entry, and then showed an approximately linear decreasing pattern through to the eighth month. Hence, we create three variables to capture this effect, where Start1 = 1(Month = 1), Start2 = Month × 1(2 ≤ Month ≤ 7) and Start3 = 1(Month ≥ 8). Diabetes, heart disease and race are also covariates of interest, whereas age, BMI, sex and lung disease are not significantly associated with costs. The final longitudinal model thus includes Start1, Start2, Start3, diabetes, heart disease and race as fixed-effect covariates together with the normal kernel term γ exp{−(S − t − μ)2ξ}, and the final Cox model for death includes age, log transformed BMI, heart disease and lung disease.
Table 5 shows the regression coefficient estimates, where we see that the proposed two-stage method yields similar point estimates with smaller estimated variances compared to the complete case analysis, indicating the efficiency gain of the proposed method.
Table 5.
Longitudinal data analysis results for the inpatient cost paid by Medicare with death as a covariate.
| | Complete Case | | | Two-Stage | | |
|---|---|---|---|---|---|---|
| | estimate | var (×10−3) | p-value | estimate | var (×10−3) | p-value |
| Start1 | −0.65 | 4.10 | < 0.0001 | −0.59 | 2.00 | < 0.0001 |
| Start2 | −0.07 | 0.11 | < 0.0001 | −0.07 | 0.04 | < 0.0001 |
| Start3 | −0.46 | 3.54 | < 0.0001 | −0.50 | 2.78 | < 0.0001 |
| Diabetes | 0.06 | 2.51 | 0.20 | 0.09 | 1.46 | 0.01 |
| Heart | 0.12 | 3.31 | 0.04 | 0.11 | 1.80 | 0.01 |
| Race | 0.14 | 2.71 | 0.007 | 0.11 | 1.81 | 0.01 |
| γ | 1.54 | 3.67 | < 0.0001 | 1.57 | 3.59 | < 0.0001 |
| ξ | 0.99 | 42.90 | < 0.0001 | 0.96 | 30.95 | < 0.0001 |
| μ | 0.78 | 5.92 | < 0.0001 | 0.76 | 5.30 | < 0.0001 |
The estimated averages of the log transformed Medicare payments are presented as a solid line in Figure 2. A Q-Q plot is used to check the normal error assumption, where only one residual is randomly selected for each patient to avoid within-subject correlation; see Figure 3. The linear pattern is consistent with the nonlinear mixed model with normal kernel. We checked many random selections, and the displayed plot is typical.
Figure 3.
QQ plot for inpatient cost model.
6 Discussion
We consider the identity link and Gaussian errors in this article. The proposed two-stage method could be generalized to non-Gaussian errors, or to logistic or Poisson regression, provided the model is identifiable and the regularity conditions are suitably modified.
We allow only time-independent covariates in this article for simplicity. Time dependent covariates are often of interest in longitudinal data analysis and survival analysis. Implementation of the two-stage method for time-dependent covariates involves extrapolating η(u; X̄ (C)) beyond C, where X̄ (C) is the history of the time-dependent covariates X up to time C. It involves predicting the censored covariate process, which will be explored elsewhere. An alternative is an estimating equation approach using inverse probability weighting, which would only use the subjects with observed terminal event time.
The function g(S − t, ξ) in (1) that we consider in this article is a known nonlinear function up to the parameter ξ. In practice, smoothing techniques can be used to determine an appropriate parametric functional form of g or to examine the fit of the data to a hypothesized g. For example, we fitted the model (1) but approximated g with cubic B-splines with 20 knots over the entire observation window of 48 months. This yielded an estimate of g that was very similar to the proposed Gaussian form.
We only considered the intercept parameter as a function of S − t in this article to illustrate the basic concept of the proposed methodology. This modeling strategy extends naturally to regression models with a time-varying coefficient for each regressor. Such an extension is under investigation.
The major difference between our work and that of Li et al. (2013) is that the function g in model (1), together with β0, can be viewed as the intercept parameter which depends on S − t, but all other variables in the model are with reference to time t in the same way as in the usual regression models for longitudinal data. On the other hand, none of the regression parameters in Li et al. (2013) varies with S − t, but all the variables in their model (including the error terms) are with reference to the reverse time scale, S − t. See their equation (2). This leads to different model interpretations.
Supplementary Material
Table 2.
Simulation results for equally spaced time intervals with flat nonlinear term. varb = bootstrap variance estimator; CR = coverage rate
| | | β0 = 1 | β1 = 1 | β2 = −3 | μ = 1 | γ = 4 | ξ = 0.2 |
|---|---|---|---|---|---|---|---|
| Full data | bias | 0.0081 | −0.0116 | 0.0209 | −0.0007 | 0.0009 | 0.0001 |
| | var | 0.1279 | 0.0130 | 0.0455 | 0.0009 | 0.0121 | 0.0005 |
| Two-stage | bias | 0.0090 | −0.0121 | 0.0229 | 0.0003 | −0.0046 | 0.0010 |
| | var | 0.1599 | 0.0161 | 0.0561 | 0.0014 | 0.0152 | 0.0007 |
| | varb | 0.1745 | 0.0166 | 0.0641 | 0.0015 | 0.0152 | 0.0007 |
| | 90% CR | 0.902 | 0.890 | 0.912 | 0.904 | 0.910 | 0.902 |
| | 95% CR | 0.958 | 0.948 | 0.946 | 0.950 | 0.964 | 0.952 |
| Complete case | bias | 0.0044 | −0.0196 | 0.0463 | −0.0004 | −0.0062 | −0.0004 |
| | var | 0.2248 | 0.0239 | 0.0821 | 0.0017 | 0.0236 | 0.0008 |
Table 3.
Simulation results for non-equally spaced time intervals with sharp nonlinear term. varb = bootstrap variance estimator; CR = coverage rate
| | | β0 = 1 | β1 = 1 | β2 = −3 | μ = 1 | γ = 4 | ξ = 1.2 |
|---|---|---|---|---|---|---|---|
| Full data | bias | −0.0045 | 0.0020 | 0.0068 | 0.0016 | −0.0051 | 0.0059 |
| | var | 0.1474 | 0.0203 | 0.0187 | 0.0005 | 0.0177 | 0.0084 |
| Two-stage | bias | −0.0238 | 0.0089 | 0.0111 | 0.0015 | −0.0046 | 0.0090 |
| | var | 0.1675 | 0.0235 | 0.0253 | 0.0007 | 0.0229 | 0.0113 |
| | varb | 0.1550 | 0.0220 | 0.0276 | 0.0007 | 0.0213 | 0.0135 |
| | 90% CR | 0.866 | 0.884 | 0.884 | 0.888 | 0.914 | 0.910 |
| | 95% CR | 0.942 | 0.936 | 0.938 | 0.944 | 0.950 | 0.960 |
| Complete case | bias | −0.0295 | 0.0093 | 0.0217 | 0.0004 | 0.0002 | 0.0120 |
| | var | 0.2480 | 0.0366 | 0.0353 | 0.0010 | 0.0340 | 0.0161 |
Acknowledgments
The data used in this paper were made available by the U.S. Renal Data System. This study was supported in part by the U.S. Renal Data System under Contract No. NO1-DK-9-2344 (National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland). The data analysis was completed while Shengchun Kong was Assistant Professor of Statistics at Purdue University.
The research is supported in part by NIH grant R01-AG036802 and NSF grants DMS-1007590 and DMS-1407142, and with Federal funds from the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN276201400001C. The data reported here have been supplied by the United States Renal Data System (USRDS). The interpretation and reporting of these data are the responsibility of the authors and in no way should be seen as an official policy or interpretation of the U.S. government.
A Appendix
A.1 Regularity conditions
Denote the true value of θ by θ0, the true value of ϕ by ϕ0, the sample space of response variable Y by 𝒴, the sample space of covariate X by 𝒳, the sample space of random effect Z by 𝒵 ⊂ 𝒳, the parameter space of θ by Θ, the parameter space of ϕ by Φ, and the parameter space of η by ℱ. In addition to the assumptions of bounded support for X, bounded parameter spaces Θ and Φ, conditional independence between C and (S, Y) given X, and non-informative censoring, we provide a set of regularity conditions in the following:
Condition 1
The third derivatives |∂³g(t, ξ)/(∂ξi∂ξj∂ξk)| and |∂³g(t, ξ)/(∂t∂ξj∂ξk)| are bounded uniformly for all ξ ∈ Ξ and bounded t.
Condition 2
(a) Pl0(θ, ϕ; Y, X, Δ, V) has a unique maximizer (θ0, ϕ0);
(b) Pl(θ, ϕ0, η0; Y, X, Δ, V) has a unique maximizer θ0.
Condition 3
The eigenvalues of Σ(ϕ) lie in [λ1, λ2], where 0 < λ1 < λ2 < ∞, for any ϕ ∈ Φ and Z ∈ 𝒵.
Condition 4
The absolute values of all the elements of ∂³Σ(ϕ)/(∂ϕi∂ϕj∂ϕk) are bounded uniformly for all ϕ ∈ Φ and Z ∈ 𝒵.
Condition 5
The study stops at a finite time τ > 0 such that infx∈𝒳 P(C ≥ τ |X = x) = ω1 > 0 and infx∈𝒳 P(S ≥ τ|X = x) = ω2 > 0 for constants ω1 and ω2.
Condition 6
The conditional distribution of S given X possesses a continuous Lebesgue density.
Condition 7
The information matrix of the partial likelihood for the Cox regression model at the true parameter values is positive definite.
Condition 8
There exist constants δ1 > 0 and δ2 > 0, such that with probability 1 for any θ ∈ Θ and |ϕ − ϕ0| + ‖η − η0‖ < δ2.
Remark
Condition 1 holds for many smooth functions g, e.g., g(t, ξ) = ξ1 exp{(t − ξ2)²ξ3} or g(t, ξ) = ξ1 exp{−(t − ξ2)}. Bounded third derivatives imply bounded second derivatives, which is adequate for the proof of consistency. We implemented the numerical studies with g being the normal kernel. When g(t, ξ) = ξ1 exp{(t − ξ2)²ξ3}, Condition 2(a) implies ξ1ξ3 ≠ 0; by Theorem 2.1 of Lehmann (1998), this condition holds provided model (1) is identifiable. Condition 2(b) is for the consistency of the proposed two-stage estimator θ̂n and may be stronger than necessary, as can be seen from the following. In the proof of Theorem 4.2, we can show that Pl̈11(θ0, ϕ0, η0; Y, X, Δ, V) = P{{∂²l(θ, ϕ0, η0; Y, X, Δ, V)}/(∂θ∂θ′)|θ=θ0} is negative definite by Condition 2(a). Thus Pl̈11(θ, ϕ0, η0; Y, X, Δ, V), a continuous matrix function of θ, is also negative definite in a neighborhood of θ0, which guarantees that θ0 is a unique maximizer of Pl(θ, ϕ0, η0; Y, X, Δ, V) in that neighborhood. The initial value we use in the algorithm for maximizing (7) is obtained from the complete case analysis, which is shown to be n1/2-consistent; thus, the solution of the proposed two-stage method is likely to lie in the same neighborhood, and therefore to be consistent without the uniqueness requirement in Condition 2(b).
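The boundedness in Condition 1 can be spot-checked numerically for a candidate g by finite differences over a bounded grid of t. A minimal sketch, where the grid, step size, and parameter values are illustrative assumptions rather than the paper's settings:

```python
import math

def g(t, xi1, xi2, xi3):
    # candidate trend function from the Remark: g(t, xi) = xi1 * exp{(t - xi2)^2 * xi3};
    # xi3 < 0 gives the normal-kernel shape used in the numerical studies
    return xi1 * math.exp((t - xi2) ** 2 * xi3)

def third_partial(f, point, i, j, k, h=1e-2):
    """Nested central differences estimating a third mixed partial derivative."""
    def shift(p, idx, s):
        q = list(p)
        q[idx] += s
        return tuple(q)
    def d(fun, idx):
        return lambda p: (fun(shift(p, idx, h)) - fun(shift(p, idx, -h))) / (2 * h)
    return d(d(d(lambda p: f(*p), i), j), k)(point)

# scan |d^3 g / (dxi1 dxi2 dxi3)| over a bounded grid of t with xi fixed
grid = [(-2.0 + 0.5 * m, 1.0, 0.0, -0.5) for m in range(9)]
worst = max(abs(third_partial(g, p, 1, 2, 3)) for p in grid)
print(worst)  # stays finite on bounded t, in line with Condition 1
```

With ξ3 < 0 the exponential factor decays in |t − ξ2|, so the mixed partials remain uniformly bounded on any bounded t-range, which is exactly what Condition 1 requires.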
Conditions 3–4 automatically hold for model (1) with the NOU process if |ρ| ≤ 1 − δ, and ti,k+1 − ti,k ≥ ε, i = 1, ⋯, n, k = 1, ⋯, ni − 1, where δ > 0 and ε > 0; they are parallel to the conditions of bounded derivatives of the log likelihood in Theorem 1.1 and Theorem 2.3 of Lehmann (1998).
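Conditions 3–4 can be illustrated concretely. Assuming a common exponentially decaying parameterization Cov(tj, tk) = σ²ρ^{|tj − tk|} for the NOU-type covariance (an assumption for illustration, not the paper's exact formula), bounded correlation |ρ| ≤ 1 − δ together with gaps ti,k+1 − ti,k ≥ ε keeps the eigenvalues of Σ(ϕ) inside a fixed interval, which can be verified via Gershgorin bounds:

```python
def nou_cov(times, sigma2, rho):
    # illustrative stationary parameterization: Cov(t_j, t_k) = sigma2 * rho^{|t_j - t_k|}
    n = len(times)
    return [[sigma2 * rho ** abs(times[j] - times[k]) for k in range(n)]
            for j in range(n)]

def gershgorin_bounds(A):
    """Gershgorin interval containing every eigenvalue of a symmetric matrix."""
    n = len(A)
    lo = min(A[j][j] - sum(abs(A[j][k]) for k in range(n) if k != j) for j in range(n))
    hi = max(A[j][j] + sum(abs(A[j][k]) for k in range(n) if k != j) for j in range(n))
    return lo, hi

# gaps t_{k+1} - t_k >= 1 and |rho| well below 1, mirroring the Remark's requirement
lo, hi = gershgorin_bounds(nou_cov([0.0, 1.0, 2.5, 4.0], sigma2=1.0, rho=0.3))
print(lo > 0 and hi < float("inf"))  # eigenvalues bounded away from 0 and infinity
```

The positive lower Gershgorin bound comes from strict diagonal dominance: with the stated gap and correlation constraints, the off-diagonal row sums stay below σ², so Condition 3 holds with λ1, λ2 not depending on n.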
Conditions 5–7 are usual assumptions for Cox regression models (Andersen and Gill, 1982; Nan and Wellner, 2013). From Condition 5, we have
| (8) |
Condition 8 is mainly for technical convenience. One way to ensure it is to truncate the response variable Y so that |Y| ≤ M < ∞ for a large constant M. In our simulations, however, we do not implement such truncation and still obtain satisfactory results.
A.2 Proofs of Theorems 4.1 and 4.2
Lemmas A.1–A.5, used in the following proofs, are provided in the online Supplementary Material.
A.2.1 Proof of consistency in Theorem 4.1 for complete case analysis estimator
Proof
From Corollary 3.2.3 in van der Vaart and Wellner (1996), we need to show that (i) Pl0(θ0, ϕ0; Y, X, Δ, V) > sup(θ,ϕ)∉G Pl0(θ, ϕ; Y, X, Δ, V) for any open set G containing (θ0, ϕ0); and (ii) sup(θ,ϕ)‖(ℙn − P)l0(θ, ϕ; Y, X, Δ, V)‖ → 0. Condition (i) follows from Condition 2(a) and the non-informative censoring assumption. Condition (ii) holds because the class of functions {−Δ(Y − Xβ − g(S1 − t, ξ))′Σ(ϕ)⁻¹(Y − Xβ − g(S1 − t, ξ))/2 − log |Σ(ϕ)|/2 : θ ∈ Θ, ϕ ∈ Φ} is Glivenko–Cantelli by Lemma A.4.
A.2.2 Proof of asymptotic normality in Theorem 4.1 for complete case analysis estimator
Denote the element-wise product of two matrices A and B by A * B. Let
Proof
The proof follows Lemma A.1 with ψ = (θ, ϕ). Here
The first order derivative of l0(θ, ϕ; Y, X, Δ, V) equals
where
with
| (9) |
and
| (10) |
with
| (11) |
The second order derivative of l0(θ, ϕ; Y, X, Δ, V) equals
where
with
| (12) |
with
and
with
Condition A1 holds from consistency. Condition A2 holds since for any u,
| (13) |
| (14) |
We have
where
with
Hence,
| (15) |
where
Thus, Pm̈(θ0, ϕ0; Y, X, Δ, V) is negative definite from Condition 2(a).
From (13), Condition A3 holds. Condition A4 holds automatically. Condition A5 holds if the class of functions {−Δ tr[Σ(ϕ)⁻¹Aj(ϕ)]/2 + Δ r(θ; V, Y, X)′Σ(ϕ)⁻¹Aj(ϕ)Σ(ϕ)⁻¹r(θ; V, Y, X)/2 : j = 1, ⋯, q, |θ − θ0| < δ, |ϕ − ϕ0| < δ} is Donsker for some δ > 0 and satisfies P|ṁ(θ, ϕ; Y, X, Δ, V) − ṁ(θ0, ϕ0; Y, X, Δ, V)|² → 0 as |(θ, ϕ) − (θ0, ϕ0)| ≤ δn ↓ 0. Both requirements hold by Conditions 1 and 3–5 and Theorem 2.10.6 of van der Vaart and Wellner (1996). Condition A6 holds by Taylor expansion and Conditions 1 and 3–5. Hence,
which converges weakly to a mean zero normal random variable with the sandwich variance J1⁻¹Q1J1⁻¹, where J1 = −Pl̈0(θ0, ϕ0; Y, X, Δ, V) and Q1 = P{l̇0(θ0, ϕ0; Y, X, Δ, V)⊗2}. Furthermore,
| (16) |
where D5(ϕ0) and C(θ0, ϕ0; Ỹ, X̃, Δ̃, Ṽ) are defined in (15) and (10), respectively.
A.2.3 Proof of consistency in Theorem 4.2 for two-stage estimator
Proof
From Condition 2(b), we have
| (17) |
holds for every δ > 0. By the definition of θ̂n, we have
| (18) |
where the equality is obtained by Lemma A.4 and Lemma A.5. The class of functions {l(θ, ϕ, η; Y, X, Δ, V) : θ ∈ Θ, ϕ ∈ Φ, η ∈ ℱ} is Donsker from Lemma A.4. Hence it is Glivenko-Cantelli, and we then have
| (19) |
| (20) |
where (19) is obtained from (18) and (20) is obtained by Lemma A.5. By inequality (17), for every δ > 0 we have
with the sequence of events on the right converging to a null event in view of inequality (20), which yields the almost sure (and hence in probability) convergence of θ̂n. This argument is taken from the proof of Theorem 5.8 in van der Vaart (2002) and the proof of Theorem 3 in Li and Nan (2011).
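As a reading aid, the argmax argument just used follows the classical Wald-type consistency scheme. A schematic version in generic M-estimation notation (illustrative only; the proof above instantiates it with l(θ, ϕ̂n, η̂n; Y, X, Δ, V) and displays (17)–(20)):

```latex
% Near-maximization: \mathbb{P}_n l(\hat\theta_n) \ge \mathbb{P}_n l(\theta_0) - o_P(1).
\begin{aligned}
Pl(\theta_0) - Pl(\hat\theta_n)
  &\le \bigl[\mathbb{P}_n l(\hat\theta_n) - Pl(\hat\theta_n)\bigr]
     + \bigl[Pl(\theta_0) - \mathbb{P}_n l(\theta_0)\bigr] + o_P(1) \\
  &\le 2 \sup_{\theta \in \Theta} \bigl|(\mathbb{P}_n - P)\, l(\theta)\bigr| + o_P(1)
   \;\xrightarrow{\;P\;}\; 0 ,
\end{aligned}
```

and the well-separation of the maximizer guaranteed by Condition 2(b) then converts convergence of the criterion values into convergence of θ̂n itself.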
A.2.4 Proof of asymptotic normality in Theorem 4.2 for two-stage estimator
Proof
The proof follows Lemma A.2. Here
The partial derivative of l(θ, ϕ, η; Y, X, Δ, V) with respect to θ equals
where D2(θ; u, X) is defined in (9).
The second order derivative of l(θ, ϕ, η; Y, X, Δ, V) with respect to θ equals
where D3(θ, ϕ; V, Y, X) is defined in (12).
B1 holds from Theorem 4.1, Lemma A.3, and the consistency of the two-stage estimator. From (13) and (14),
| (21) |
which is negative definite from Condition 2(b); thus, B2 holds. From (13), B3 holds. B4 holds automatically.
Since
under Conditions 1, 3–5 and 8, we have
as |(θ, ϕ) − (θ0, ϕ0)| ≤ δn ↓ 0 by continuity and Condition 8. Similar to the proof of Lemma A.4, the class of functions { : θ ∈ Θ, ϕ ∈ Φ, η ∈ ℱ} is Donsker. Hence {l̇1(θ, ϕ, η; Y, X, Δ, V) : θ ∈ Θ, ϕ ∈ Φ} is Donsker by Section 2.10.2 of van der Vaart and Wellner (1996) and Condition 8. Furthermore, B5 holds by Corollary 2.3.12 of van der Vaart and Wellner (1996). Under Conditions 3–5 and 8, similar to the proof of Theorem 1 in Kong and Nan (2016), we can show that B6 holds. In particular, in B6,
with
and
| (22) |
where
with
and A1(η0; u, X; X̃, Δ̃, Ṽ) is defined in Lemma A.3.
Hence by Lemma A.2 and the central limit theorem,
which converges weakly to a mean zero normal random variable with variance from (16) and (22), where
with D5(ϕ0) and C(θ0, ϕ0; Ỹ, X̃, Δ̃, Ṽ) defined in (15) and (10), respectively.
Footnotes
The online supplement contains general theorems about M-estimators, technical lemmas, and additional simulation. It also contains R code for implementing the methods developed here.
Contributor Information
Shengchun Kong, Gilead Sciences, Inc., Foster City, CA 94404.
Bin Nan, Departments of Biostatistics, University of Michigan, Ann Arbor, MI 48109.
John D. Kalbfleisch, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109.
Rajiv Saran, Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109.
Richard Hirth, Department of Health Management and Policy, University of Michigan, Ann Arbor, MI 48109.
References
- Albert PS, Shih JH. An approach for jointly modeling multivariate longitudinal measurements and discrete time-to-event data. The Annals of Applied Statistics. 2010;4(3):1517–1532. doi: 10.1214/10-AOAS339. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Andersen PK, Gill RD. Cox’s regression model for counting processes: a large sample study. The Annals of Statistics. 1982;10(4):1100–1120. [Google Scholar]
- Breslow NE. Discussion of “Regression models and life-tables” by D. R. Cox. Journal of the Royal Statistical Society, Series B. 1972;34(2):216–217. [Google Scholar]
- Chan K, Wang M. Backward estimation of stochastic processes with failure events as time origins. The Annals of Applied Statistics. 2010;4(3):1602–1620. doi: 10.1214/09-AOAS319. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cox DR. Regression models and life-tables (with discussion). Journal of the Royal Statistical Society, Series B. 1972;34(2):187–220. [Google Scholar]
- Ding J, Wang JL. Modeling longitudinal data with nonparametric multiplicative random effects jointly with survival data. Biometrics. 2008;64(2):546–556. doi: 10.1111/j.1541-0420.2007.00896.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ghosh D, Lin DY. Marginal regression models for recurrent and terminal events. Statistica Sinica. 2002;12:663–688. [Google Scholar]
- Harlow SD, Mitchell ES, Crawford S, Nan B, Little R, Taffe J. The restage collaboration: defining optimal bleeding criteria for onset of early menopausal transition. Fertility and Sterility. 2008;89(1):129–140. doi: 10.1016/j.fertnstert.2007.02.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hsieh F, Tseng YK, Wang JL. Joint modeling of survival and longitudinal data: likelihood approach revisited. Biometrics. 2006;62(4):1037–1043. doi: 10.1111/j.1541-0420.2006.00570.x. [DOI] [PubMed] [Google Scholar]
- Huang CY, Wang MC. Joint modeling and estimation for recurrent event processes and failure time data. Journal of the American Statistical Association. 2004;99(468):1153–1165. doi: 10.1198/016214504000001033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2. Hoboken: John Wiley & Sons, Inc; 2002. [Google Scholar]
- Kalbfleisch JD, Schaubel DE, Ye Y, Gong Q. An estimating function approach to the analysis of recurrent and terminal events. Biometrics. 2013;69(2):366–374. doi: 10.1111/biom.12025. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kong S, Nan B. Semiparametric approach to regression with a covariate subject to a detection limit. Biometrika. 2016;103(1):161–174. [Google Scholar]
- Lehmann EL. Theory of Point Estimation. New York: Springer-Verlag; 1998. [Google Scholar]
- Li Z, Nan B. Relative risk regression for current status data in case-cohort studies. The Canadian Journal of Statistics. 2011;39(4):557–577. [Google Scholar]
- Li Z, Tosteson TD, Bakitas MA. Joint modeling quality of life and survival using a terminal decline model in palliative care studies. Statistics in Medicine. 2013;32(8):1394–1406. doi: 10.1002/sim.5635. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lin DY, Wei LJ, Ying Z. Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika. 1993;80(3):557–572. [Google Scholar]
- Little RJ, Rubin DB. Statistical Analysis with Missing Data. 2. Hoboken: John Wiley & Sons, Inc; 2002. [Google Scholar]
- Liu L, Wolfe RA, Kalbfleisch JD. A shared random effects model for censored medical costs and mortality. Statistics in Medicine. 2007;26(1):139–155. doi: 10.1002/sim.2535. [DOI] [PubMed] [Google Scholar]
- Lu X, Nan B, Song P, Sowers M. Longitudinal data analysis with event time as a covariate. Statistics in Biosciences. 2010;2(1):65–80. [Google Scholar]
- Nan B, Wellner JA. A general semiparametric z-estimation approach for case-cohort studies. Statistica Sinica. 2013;23:1155–1180. [PMC free article] [PubMed] [Google Scholar]
- Sowers M, Tomey K, Jannausch M, Eyvazzdh A, Crutchfield M, Nan B, Randolph J. Physical functioning and menopause states. Obstet Gynecol. 2007;110(6):1290–1296. doi: 10.1097/01.AOG.0000290693.78106.9a. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tsiatis AA, Davidian M. Joint modeling of longitudinal and time-to-event data: an overview. Statistica Sinica. 2004;14:809–834. [Google Scholar]
- van der Vaart AW. In: Semiparametric Statistics. In Lectures on Probability Theory and Statistics, Ecole d’Ete de Probabilites de Saint-Flour XXIX99. Bernard P, editor. Berlin Heidelberg: Springer-Verlag; 2002. pp. 330–457. [Google Scholar]
- van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. New York: Springer-Verlag; 1996. [Google Scholar]
- Wellner JA, Zhang Y. Two likelihood-based semiparametric estimation methods for panel count data with covariates. Annals of Statistics. 2007;35(5):2106–2142. [Google Scholar]
- Zeng D, Lin DY. Semiparametric transformation models with random effects for joint analysis of recurrent and terminal events. Biometrics. 2009;65(3):746–752. doi: 10.1111/j.1541-0420.2008.01126.x. [DOI] [PMC free article] [PubMed] [Google Scholar]