A maximum pseudo-profile likelihood estimator for the Cox model under length-biased sampling

Chiung-Yu Huang; Jing Qin; Dean A Follmann

doi:10.1093/biomet/asr072

2012 Jan 27;99(1):199–210. doi: 10.1093/biomet/asr072


PMCID: PMC3667656 PMID: 23843659

Abstract

This paper considers semiparametric estimation of the Cox proportional hazards model for right-censored and length-biased data arising from prevalent sampling. To exploit the special structure of length-biased sampling, we propose a maximum pseudo-profile likelihood estimator, which can handle time-dependent covariates and is consistent under covariate-dependent censoring. Simulation studies show that the proposed estimator is more efficient than its competitors. A data analysis illustrates the methods and theory.

Keywords: Approximate likelihood, Cross-sectional sampling, Product-limit estimator, Random truncation, Screening trials

1. Introduction

When studying the natural history of a disease, the time from disease onset to an event or failure is usually the focus. An incident cohort approach, which studies initially disease-free subjects from disease onset to failure, can be very inefficient, especially if the disease is uncommon. A prevalent sampling design, which only includes diseased subjects who have not experienced the failure event at the time of recruitment, can be much more efficient. However, the observed survival time is subject to left truncation: those who have experienced the failure event before the recruitment time are not observable. Thus, individuals in the prevalent cohort tend to have slower progression of the disease than those in a typical incident study. As a result, statistical methods such as the Kaplan–Meier estimator that fail to account for left truncation can lead to substantial overestimation of the survival time.

In the case of stable disease, that is, when the occurrence of disease onset follows a stationary Poisson process, the survival time in the prevalent cohort is a biased sample of that in the incident population, where the sampling weight is proportional to the length of the survival time. Similarly, the truncation time, from disease onset to recruitment, in the prevalent cohort is also a biased sample of the uniform truncation time in the incident population, and its distribution is related to the underlying survival distribution in a known fashion. We use the term length-biased sampling for left truncation under the assumption of stationary disease incidence. Examples of length-biased sampling include studies of cancer screening trials (Zelen & Feinleib, 1969; Zelen, 2004), HIV prevalent cohort studies (Lagakos et al., 1988) and unemployment duration (Lancaster, 1979; de Una-Alvarez et al., 2003).

This paper focuses on semiparametric estimation of the Cox proportional hazards model for right-censored survival data under length-biased sampling. Intuitively, efficient estimation can be achieved by maximizing the full semiparametric likelihood with respect to the regression parameter and the baseline hazard function. The maximum likelihood approach, however, involves high-dimensional maximization, and hence may cause computational concerns for large sample sizes. Estimation of a finite-dimensional parameter in the presence of an infinite-dimensional nuisance parameter has been studied by a number of authors. In particular, Severini & Wong (1992) and Zucker (2005) generalized the profile likelihood method by replacing the nuisance parameters in the full likelihood or the partial likelihood with a consistent estimator that may depend on the parameter of interest. In this paper, we follow their idea to propose a semiparametric estimation procedure for the Cox model under length-biased sampling. Specifically, we replace the hazard function in the full likelihood with a Breslow-type estimator for the hazard function to obtain a pseudo-profile likelihood function. Thus, a consistent estimator of the regression parameters can be easily derived by maximizing the pseudo-profile likelihood. Unlike other bias-adjusted risk-set methods, including Ghosh (2008), Tsai (2009) and Qin & Shen (2010), the proposed estimation procedure does not involve estimation of the censoring distribution, so it is expected to be more stable when the censoring proportion is high.

2. Model and estimation methods

2.1. Data and model set-up

For subjects in the target disease population, let T⁰ denote the time from disease incidence to the failure event of interest, W⁰ the calendar time of disease incidence and X⁰ a p × 1 vector of covariates. Assume that the sampling time, ξ, is independent of (W⁰, T⁰, X⁰). An individual qualifies to be sampled at time ξ only if T⁰ + W⁰ ⩾ ξ ⩾ W⁰, that is, if disease onset precedes ξ and the failure event has not yet occurred. Denote by (W, T, X) the random variables from the prevalent population, and let A = ξ − W denote the truncation time from disease onset to recruitment. The probability distribution of (W, T, X) is the same as that of (W⁰, T⁰, X⁰) conditional on T⁰ + W⁰ ⩾ ξ ⩾ W⁰.

In practice, the observation of failure time T in the prevalent cohort is subject to right censoring due to the study ending or premature dropout. The censoring time measured from recruitment, C, is usually assumed to be independent of (T, A) given X. However, the total censoring time A + C and the survival time T are correlated, as they share the same A. Let Y = min(T, A + C) denote the follow-up time until failure or censoring, and let Δ = I (T ⩽ A + C) be the indicator of failure. For subject i ∈ {1, …, n}, denote by x_i the covariate vector, by y_i and a_i the observed follow-up time and truncation time, and by δ_i the indicator of an uncensored event time. The observed data (y_i, a_i, δ_i, x_i) for i = 1, …, n are assumed to be independent and identically distributed realizations of (Y, A, Δ, X).

Denote by f(t | x) and S(t | x) the conditional density function and survival function of T⁰ = t given X⁰ = x, and let $μ (x) = \int_{0}^{\infty} u f (u | x) d u$ be the conditional mean of T⁰ given X⁰ = x. We impose the following conditions for incident population random variables.

Assumption 1. The variable (T⁰, X⁰) is independent of when the disease incidence occurs, W⁰.

Assumption 2. Disease incidence occurs over calendar time at a constant rate.

Under Assumptions 1 and 2, the joint density function of (A, T) given X = x evaluated at (a, t) is f (t | x)μ(x)⁻¹ I (t > a > 0) (Lancaster, 1990, Ch. 3), and the survival time T given X = x has a length-biased density function tf (t | x)μ(x)⁻¹.
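As an illustration that is not part of the original analysis, the following Python sketch simulates prevalent sampling from an incident population and checks two consequences of the joint density above: the sampled survival time has mean E(T⁰²)/E(T⁰), and A/T is uniform on [0, 1]. The exponential survival model, the function name `sample_prevalent_cohort` and all parameter values are assumptions made for the example.

```python
import numpy as np

def sample_prevalent_cohort(n, rate=1.0, xi=100.0, seed=0, batch=200_000):
    """Draw n prevalent cases (a, t) from an exponential incident population."""
    rng = np.random.default_rng(seed)
    a_parts, t_parts, kept = [], [], 0
    while kept < n:
        w0 = rng.uniform(0.0, xi, size=batch)          # onset calendar times (stationary incidence)
        t0 = rng.exponential(1.0 / rate, size=batch)   # incident survival times
        keep = t0 >= xi - w0                           # failure has not occurred by sampling time xi
        a_parts.append(xi - w0[keep])                  # truncation time A = xi - W
        t_parts.append(t0[keep])
        kept += int(keep.sum())
    return np.concatenate(a_parts)[:n], np.concatenate(t_parts)[:n]

a, t = sample_prevalent_cohort(20_000)
print(t.mean())         # should be near E(T0^2)/E(T0) = 2 when rate = 1
print((a / t).mean())   # should be near 1/2, since A given T is uniform on [0, T]
```

Onsets before calendar time 0 are ignored here, which is harmless only because the survival times are short relative to ξ.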

We assume that the survival time T⁰ in the incident population follows the Cox (1972) proportional hazards model λ(t | x) = λ(t) exp(β′x), where λ(t) is an unspecified, continuous baseline hazard function and β is a p × 1 vector of regression parameters. Let $Λ (t) = \int_{0}^{t} λ (u) d u$ be the cumulative baseline hazard function. Under Assumptions 1 and 2 and the independence of C and (T, A) given X, the full likelihood function is proportional to

$$ℒ(β, Λ) = \prod_{i=1}^{n} \frac{f(y_i \mid x_i)^{δ_i} S(y_i \mid x_i)^{1-δ_i}}{μ(x_i)} = \prod_{i=1}^{n} \frac{\{λ(y_i) \exp(β' x_i)\}^{δ_i} \exp\{-Λ(y_i) \exp(β' x_i)\}}{\int_0^\infty \exp\{-Λ(u) \exp(β' x_i)\}\,du}. \tag{1}$$

2.2. Brief review of existing methods

The likelihood (1) can be re-expressed as the product of the truncation likelihood conditional on A and the marginal likelihood of A:

$$ℒ(β, Λ) = ℒ_T(β, Λ) \times ℒ_M(β, Λ) = \prod_{i=1}^{n} \left\{ \frac{f(y_i \mid x_i)^{δ_i} S(y_i \mid x_i)^{1-δ_i}}{S(a_i \mid x_i)} \right\} \times \prod_{i=1}^{n} \left\{ \frac{S(a_i \mid x_i)}{μ(x_i)} \right\}.$$

Written in this way, we see that there is information about the regression parameter β in ℒ_M(β, Λ). The truncation likelihood ℒ_T can be further decomposed as the product of the partial likelihood (Kalbfleisch & Lawless, 1991)

$$ℒ_P(β) = \prod_{i=1}^{n} \left\{ \frac{\exp(β' x_i)}{\sum_{j=1}^{n} \exp(β' x_j) I(a_j ⩽ y_i ⩽ y_j)} \right\}^{δ_i},$$

and the residual likelihood ℒ_R(β, Λ). Wang et al. (1993) showed that ℒ_P is fully efficient with respect to ℒ_T. However, under length-biased sampling the maximum partial likelihood estimator is expected to be inefficient, because it ignores information in ℒ_M(β, Λ).
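The risk-set structure of ℒ_P can be made concrete with a short Python sketch. It is a minimal illustration rather than the authors' software; the function name `log_partial_likelihood` and the toy data are assumptions. Subject j contributes to the denominator at time y_i only when a_j ⩽ y_i ⩽ y_j.

```python
import numpy as np

def log_partial_likelihood(beta, a, y, delta, x):
    """Log of L_P(beta) with left-truncated risk sets {j : a_j <= y_i <= y_j}."""
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    a, y, delta = (np.asarray(v, dtype=float) for v in (a, y, delta))
    lin = x @ beta                                     # beta' x_j for every subject
    # at_risk[i, j] is True when subject j is under observation at time y_i
    at_risk = (a[None, :] <= y[:, None]) & (y[:, None] <= y[None, :])
    denom = at_risk @ np.exp(lin)                      # risk-set sums, one per subject i
    return float(np.sum(delta * (lin - np.log(denom))))

# Toy data (hypothetical): two uncensored subjects, no delayed entry.
print(log_partial_likelihood([0.0], a=[0, 0], y=[1, 2], delta=[1, 1], x=[[0], [1]]))
```

With these toy data and β = 0 the value is −log 2: both subjects are at risk at the first failure and only one at the second.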

Various methods that better exploit the special structure of length-biased survival data have been proposed in the literature. Let G(t) be the survival function of the censoring time C, and let Ĝ (t) be the Kaplan–Meier estimator of G(t) based on {(y_i − a_i, 1 − δ_i) : i = 1, …, n}. Qin & Shen (2010) proposed to solve the weighted estimating equation

$$U_1(β) = \sum_{i=1}^{n} δ_i \left[ x_i - \frac{\sum_{j=1}^{n} δ_j x_j \exp(β' x_j) \{y_j \hat{G}(y_j - a_j)\}^{-1} I(y_j ⩾ y_i)}{\sum_{j=1}^{n} δ_j \exp(β' x_j) \{y_j \hat{G}(y_j - a_j)\}^{-1} I(y_j ⩾ y_i)} \right] = 0,$$

where the contribution of a subject in the risk set is inversely weighted by the probability of the subject being sampled and uncensored. This estimating method, however, might be unstable as the weight function $y_{j}^{- 1} \hat{G} {(y_{j} - a_{j})}^{- 1}$ involves estimation of the tail probability of the censoring distribution. As an alternative, Qin & Shen (2010) considered solving the estimating equation

$$U_2(β) = \sum_{i=1}^{n} δ_i \left[ x_i - \frac{\sum_{j=1}^{n} δ_j x_j \exp(β' x_j) \{\hat{w}_c(y_j)\}^{-1} I(y_j ⩾ y_i)}{\sum_{j=1}^{n} δ_j \exp(β' x_j) \{\hat{w}_c(y_j)\}^{-1} I(y_j ⩾ y_i)} \right] = 0,$$

with $\hat{w}_c(y) = \int_0^y \hat{G}(u)\,du$. Because ŵ_c(y_j) is the integral of the estimated censoring survival function, the weight ŵ_c(y_j)⁻¹ is more stable than the weight $y_j^{-1} \hat{G}(y_j - a_j)^{-1}$ in U₁. A major restriction of the two estimating equation-based methods is that the censoring time must not depend on the covariates. Moreover, the estimating equations only use covariate information from uncensored individuals, suggesting that there is still room for efficiency gains.
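To make the weight in U₂ concrete, the following Python sketch computes the Kaplan–Meier estimator Ĝ from the residual times y_i − a_i, treating censoring as the event, and then integrates the resulting step function to obtain ŵ_c(y). The function names and the toy data are assumptions made for illustration. The tie handling (all subjects with residual time at least u count as at risk) is one common convention, not necessarily the one used by Qin & Shen (2010).

```python
import numpy as np

def km_censoring_survival(a, y, delta):
    """Kaplan-Meier steps of G_hat from residual times y - a, with censoring as the event."""
    r = np.asarray(y, dtype=float) - np.asarray(a, dtype=float)
    cens = 1 - np.asarray(delta, dtype=int)
    times = np.unique(r[cens == 1])                    # sorted distinct censoring times
    surv, s = [], 1.0
    for u in times:
        at_risk = np.sum(r >= u)
        events = np.sum((r == u) & (cens == 1))
        s *= 1.0 - events / at_risk
        surv.append(s)
    return times, np.array(surv)

def integrated_censoring_weight(y0, times, surv):
    """w_c(y0): integral of the step function G_hat over [0, y0]."""
    inside = times < y0
    knots = np.concatenate(([0.0], times[inside], [y0]))
    heights = np.concatenate(([1.0], surv[inside]))    # value of G_hat on each sub-interval
    return float(np.sum(np.diff(knots) * heights))

# Toy data (hypothetical): residual times 1, 2, 3, of which the first and third are censored.
times, surv = km_censoring_survival(a=[0, 0, 0], y=[1, 2, 3], delta=[0, 1, 0])
print(integrated_censoring_weight(2.5, times, surv))
```

Here Ĝ equals 1 on [0, 1), 2/3 on [1, 3) and 0 afterwards, so ŵ_c(2.5) = 1 + 1.5 × 2/3 = 2.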

2.3. Maximum pseudo-profile likelihood estimator

The maximum likelihood estimator could be obtained by applying the semiparametric profile likelihood method (Murphy & van der Vaart, 2000) to deal with the nuisance parameter Λ. For length-biased sampling data, however, maximizing ℒ with respect to Λ for fixed β is computationally difficult because ℒ involves Λ in a complicated way. Instead of profiling out the nonparametric component Λ in ℒ, we propose to replace Λ(t) with a simple estimate that is consistent and has an n^1/2 convergence rate. This approach has been used in various contexts under various names, including pseudo- and estimated-likelihood estimation (Gong & Samaniego, 1981; Pepe & Fleming, 1991; Severini & Wong, 1992; Zucker, 2005).

Our simple estimate is based on profiling the truncation likelihood ℒ_T(β, Λ). Specifically, for fixed β, the truncation likelihood ℒ_T(β, Λ) is maximized by the Breslow-type estimator

$$\hat{Λ}_β(t) = \int_0^t \frac{d\{\sum_{j=1}^{n} δ_j I(y_j ⩽ u)\}}{\sum_{j=1}^{n} \exp(β' x_j) I(a_j ⩽ u ⩽ y_j)}$$

in the class of nondecreasing right-continuous functions which jump only at uncensored failure times. Note that Λ̂_β(t) can be generalized to handle time-varying covariates. Profiling out Λ from the truncation likelihood ℒ_T(β, Λ) yields the partial likelihood, that is, ℒ_T(β, Λ̂_β) = ℒ_P(β). Replacing Λ with Λ̂_β in the full likelihood ℒ, we obtain a pseudo-profile likelihood function,

$$ℒ(β, \hat{Λ}_β) = ℒ_T(β, \hat{Λ}_β) \times ℒ_M(β, \hat{Λ}_β) = ℒ_P(β) \times \prod_{i=1}^{n} \frac{\exp\{-\hat{Λ}_β(a_i) \exp(β' x_i)\}}{\int_0^\infty \exp\{-\hat{Λ}_β(u) \exp(β' x_i)\}\,du}.$$

We propose to estimate the regression parameter β by maximizing the pseudo-profile likelihood.
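The Breslow-type estimator Λ̂_β is a right-continuous step function that jumps at the uncensored failure times, with each jump equal to the number of failures divided by the left-truncated risk-set sum. A minimal Python sketch follows; the function names `breslow_truncated` and `evaluate_step` and the toy data are assumptions made for illustration.

```python
import numpy as np

def breslow_truncated(beta, a, y, delta, x):
    """Jump times and cumulative values of the Breslow-type estimator under left truncation."""
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    a, y, delta = (np.asarray(v, dtype=float) for v in (a, y, delta))
    risk_score = np.exp(x @ beta)
    times = np.unique(y[delta == 1])                   # distinct uncensored failure times
    jumps = np.empty(len(times))
    for k, u in enumerate(times):
        d_u = np.sum((y == u) & (delta == 1))          # number of failures at u
        denom = np.sum(risk_score[(a <= u) & (u <= y)])
        jumps[k] = d_u / denom
    return times, np.cumsum(jumps)

def evaluate_step(t, times, values):
    """Right-continuous step function: cumulative hazard including jumps at times <= t."""
    idx = np.searchsorted(times, t, side="right")
    return np.concatenate(([0.0], values))[idx]

# Toy data (hypothetical): two uncensored subjects, no covariate effect.
times, cumhaz = breslow_truncated([0.0], a=[0, 0], y=[1, 2], delta=[1, 1], x=[[0], [0]])
print(evaluate_step(np.array([0.5, 1.0, 2.5]), times, cumhaz))
```

For these toy data the jumps are 1/2 at time 1 and 1 at time 2, so the evaluated values are 0, 0.5 and 1.5.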

Assume that T⁰, and hence T, has a finite maximal support τ, where τ = sup{t : pr(T⁰ ⩽ t) < 1} < ∞. Then τ is also the maximal support for the truncation time random variable A, as A given T has a uniform distribution on [0, T]. We further assume that C is not degenerate at 0, that is, pr(C > 0) > 0. Then it can be shown that max Δ_i Y_i → τ as n → ∞. Thus, Λ(t) is estimable on the interval [0, τ]; as a result, the conditional mean of T⁰ given X is also estimable. Let N_i (t) = δ_iI (y_i ⩽ t) be the counting process of observed failure events for subject i, and denote $\bar{N}(t) = n^{-1} \sum_{i=1}^{n} N_i(t)$ and F^u(t) = pr(Δ = 1, Y ⩽ t). Define the functions $S^{(k)}(u, β) = n^{-1} \sum_{i=1}^{n} x_i^{\otimes k} \exp(β' x_i) I(a_i ⩽ u ⩽ y_i)$ (k = 0, 1, 2), and let 𝒮^(k)(u, β) = E{X^⊗k exp(β′X)I (A ⩽ u ⩽ Y)} be their expectations. Assume that X is bounded. Then the two classes of functions {ΔI (Y ⩽ t) : t ∈ [0, τ]} and {X^⊗k exp(β′ X)I (A ⩽ t ⩽ Y) : t ∈ [0, τ], β ∈ Θ} are both Glivenko–Cantelli, because the class of indicator functions and the class of bounded monotone functions are Glivenko–Cantelli (van der Vaart & Wellner, 1996, Theorems 2.4.1 and 2.7.5). Moreover, because S⁽⁰⁾(t, β) is bounded away from zero, we can show that sup_{t∈[0,τ],β∈Θ} | Λ̂_β(t) − Λ_β(t) |→ 0 almost surely as n → ∞, where

$$Λ_β(t) = \int_0^t \frac{dF^u(u)}{𝒮^{(0)}(u, β)}. \tag{2}$$

The limit Λ_β(t) of Λ̂_β(t) defines a smooth mapping in β, and it passes through the true baseline cumulative hazard function Λ(t) when β equals the true parameter value. If we regard (2) as a known function of β, the function ℒ(β, Λ_β) can be viewed as the full likelihood function derived under an induced parametric submodel λ(t | x) = λ_β(t) exp(β′x), where λ_β denotes the derivative of Λ_β.

Replacing Λ with Λ̂_β in ℒ(β, Λ), we obtain a log pseudo-profile likelihood function ℓ(β) = ℓ_P(β) + ℓ_M(β), where

$$ℓ_P(β) = \sum_{i=1}^{n} \int_0^τ [β' x_i - \log\{S^{(0)}(u, β)\}]\,dN_i(u)$$

is the log partial likelihood obtained by profiling out Λ from the truncation likelihood ℒ_T, and

$$ℓ_M(β) = -\sum_{i=1}^{n} \{\hat{Λ}_β(a_i) \exp(β' x_i) + \log \hat{μ}_β(x_i)\},$$

with $\hat{μ}_β(x_i) = \int_0^\infty \exp\{-\hat{Λ}_β(u) \exp(β' x_i)\}\,du$. We show in the Appendix that, in a compact neighbourhood of the true regression parameter, ℓ(β) can be approximated by ℓ̃(β) = ℓ̃_P(β) + ℓ̃_M(β), where $\tilde{ℓ}_P(β) = \sum_{i=1}^{n} \int_0^τ [β' x_i - \log\{𝒮^{(0)}(u, β)\}]\,dN_i(u)$, $\tilde{ℓ}_M(β) = -\sum_{i=1}^{n} \{Λ_β(a_i) \exp(β' x_i) + \log μ_β(x_i)\}$ and $μ_β(x_i) = \int_0^\infty \exp\{-Λ_β(u) \exp(β' x_i)\}\,du$. Thus, ℓ and ℓ̃ have similar local behaviour in the compact neighbourhood, and the asymptotic properties of the maximum pseudo-profile likelihood estimator can be investigated through ℓ̃.
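The two pieces of ℓ(β) can be assembled in a few lines. The Python sketch below is an illustration under stated assumptions, not the authors' implementation. The function name is hypothetical; the integral defining μ̂_β is truncated at the largest uncensored failure time, where the step function Λ̂_β stops changing; and the log partial likelihood uses unnormalized risk-set sums, so it differs from ℓ_P above only by a constant that does not depend on β. At least one uncensored observation is required.

```python
import numpy as np

def log_pseudo_profile_likelihood(beta, a, y, delta, x):
    """l(beta) = l_P(beta) + l_M(beta), up to an additive constant free of beta."""
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    a, y, delta = (np.asarray(v, dtype=float) for v in (a, y, delta))
    lin = x @ beta
    score = np.exp(lin)

    # Breslow-type estimator: jumps at the distinct uncensored failure times.
    times = np.unique(y[delta == 1])
    at_risk = (a[None, :] <= times[:, None]) & (times[:, None] <= y[None, :])
    deaths = np.array([np.sum((y == u) & (delta == 1)) for u in times])
    cumhaz = np.cumsum(deaths / (at_risk @ score))
    cumhaz0 = np.concatenate(([0.0], cumhaz))          # the estimator is 0 before the first jump

    # Log partial likelihood with left-truncated risk sets.
    risk_i = (a[None, :] <= y[:, None]) & (y[:, None] <= y[None, :])
    ell_p = np.sum(delta * (lin - np.log(risk_i @ score)))

    # Lambda_hat(a_i): right-continuous step function evaluated at the truncation times.
    cumhaz_at_a = cumhaz0[np.searchsorted(times, a, side="right")]

    # mu_hat(x_i): exact integral of the piecewise-constant integrand on [0, last failure time].
    widths = np.diff(np.concatenate(([0.0], times)))   # lengths of the constant pieces
    mu = np.exp(-np.outer(score, cumhaz0[:-1])) @ widths

    ell_m = -np.sum(cumhaz_at_a * score + np.log(mu))
    return float(ell_p + ell_m)

# Toy data (hypothetical): two uncensored subjects, no covariate effect.
print(log_pseudo_profile_likelihood([0.0], a=[0, 0], y=[1, 2], delta=[1, 1], x=[[0], [0]]))
```

In practice the returned value would be passed to a numerical optimizer over β; that step is omitted here.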

Define the limit function γ(β) = lim_n→∞ n⁻¹ℓ̃(β) = lim_n→∞n⁻¹ {ℓ̃_P(β) + ℓ̃_M(β)}. We denote the true parameter values of the proportional hazards model by {β₀, λ₀(·)}, and define $Λ_{0} (t) = \int_{0}^{t} λ_{0} (u) d u$ . Theorem 1 summarizes the consistency and asymptotic normality of β̂ that maximizes the log pseudo-profile likelihood function ℓ(β), with proofs given in the Appendix.

Theorem 1. Assume the following conditions hold: (a) β₀ lies in the interior of a known compact set Θ in ℝ^p; (b) X is bounded; (c) pr(Y ⩾ t) is a continuous function for t ∈ [0, τ]; and (d) ∂²γ(β₀)/∂β′∂β is nonsingular. Then β̂ → β₀ in probability as n → ∞. Moreover, n^1/2 (β̂ − β₀) converges in distribution to a zero-mean multivariate normal distribution with variance-covariance matrix Σ(β₀), where Σ(β₀) is specified in the Appendix.

While the asymptotic variance Σ(β₀) may be estimated by its empirical version, the computation is quite complicated. Since we have established the asymptotic normality, it is computationally more convenient to use the bootstrap method. The performance of the proposed estimator is evaluated in § 3 via simulations.
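A nonparametric bootstrap for the standard error can be sketched as follows. This is a generic illustration that assumes subjects are resampled with replacement as whole rows of a data array; the source does not spell out the resampling scheme, and the function name `bootstrap_se` and its arguments are hypothetical.

```python
import numpy as np

def bootstrap_se(estimator, data, n_boot=200, seed=0):
    """Bootstrap standard error: resample subjects (rows) with replacement and re-estimate."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    estimates = []
    for _ in range(n_boot):
        rows = rng.integers(0, n, size=n)              # subject indices drawn with replacement
        estimates.append(estimator(data[rows]))
    return np.std(np.asarray(estimates, dtype=float), axis=0, ddof=1)

# Usage: `estimator` maps a resampled data array, e.g. with columns (y, a, delta, x),
# to the estimate of beta obtained by maximizing the pseudo-profile likelihood.
```

Each row must carry the complete record of one subject so that the dependence between y_i, a_i, δ_i and x_i is preserved in every resample.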

2.4. Efficiency considerations

To investigate the potential efficiency gains in the proposed pseudo-profile likelihood estimator, we first consider the case that Λ is parameterized by a vector of q × 1 parameters ν, that is, Λ(t) = Λ(t, ν). For model identifiability, we assume without loss of generality that E(X) = 0. Define the log truncation likelihood function ℓ_T = log(ℒ_T). The proposed method is equivalent to solving the system of estimating equations ∂ℓ_T/∂β + ∂ℓ_M/∂β = 0 and ∂ℓ_T/∂ν = 0. Let η = (∂ℓ_T/∂β, ∂ℓ_T/∂ν, ∂ℓ_M/∂β) be a vector of score functions. Define

$$\begin{pmatrix} a_{11} & a_{12} \\ a_{12}' & a_{22} \end{pmatrix} = -E \begin{pmatrix} \partial^2 ℓ_T/\partial β' \partial β & \partial^2 ℓ_T/\partial β' \partial ν \\ \partial^2 ℓ_T/\partial β \partial ν' & \partial^2 ℓ_T/\partial ν' \partial ν \end{pmatrix},$$

and let b₁ = −E(∂²ℓ_M/∂β′∂β) and b₂ = −E(∂²ℓ_M/∂β′∂ν). Denote ν^* = (β, ν). Then the optimal linear combination of estimating functions is

$$E\left(\frac{\partial η}{\partial ν^*}\right)' \mathrm{var}(η)^{-1} η = \begin{pmatrix} a_{11} & a_{12} & b_1 \\ a_{12}' & a_{22} & b_2' \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & 0 \\ a_{12}' & a_{22} & 0 \\ 0 & 0 & b_1 \end{pmatrix}^{-1} η = \begin{pmatrix} I_p & 0 & I_p \\ 0 & I_q & b_2' b_1^{-1} \end{pmatrix} η, \tag{3}$$

where, for convenience, 0 denotes a matrix of 0s of appropriate dimensions and I_p is a p × p identity matrix. It can be verified that, when evaluated at the true parameter values, b₂ = n/2 × E(X exp(2β′ X) × ∂/∂ν^′[Λ(A)² − E{Λ(A) | X}²]). Hence, if b₂ = 0, the system of estimating equations ∂ℓ_T/∂β + ∂ℓ_M/∂β = 0 and ∂ℓ_T/∂ν = 0 is the optimal linear combination of estimating equations based on η. The partial likelihood method solves the system of estimating equations ∂ℓ_T/∂β = 0 and ∂ℓ_T/∂ν = 0, which also belongs to the class of linear combinations of estimating equations based on η. Thus, the proposed method is more efficient than the partial likelihood method when b₂ = 0.
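The block-matrix algebra behind (3) can be checked numerically. The Python sketch below uses randomly generated placeholder matrices, which are assumptions with no statistical meaning, in place of the information blocks, and verifies that the weight matrix reduces to the stated form, so that the second block row collapses to (0, I_q, 0) when b₂ = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 2, 3

def random_spd(k):
    """Random symmetric positive definite k x k matrix (placeholder for an information block)."""
    m = rng.normal(size=(k, k))
    return m @ m.T + k * np.eye(k)

A = random_spd(p + q)             # plays the role of [[a11, a12], [a12', a22]]
b1 = random_spd(p)
b2 = rng.normal(size=(p, q))      # b2 is p x q, so b2' is q x p

lead = np.hstack([A, np.vstack([b1, b2.T])])                    # E(d eta / d nu*)'
var_eta = np.block([[A, np.zeros((p + q, p))],
                    [np.zeros((p, p + q)), b1]])                # block-diagonal var(eta)
weights = lead @ np.linalg.inv(var_eta)

expected = np.block([[np.eye(p), np.zeros((p, q)), np.eye(p)],
                     [np.zeros((q, p)), np.eye(q), b2.T @ np.linalg.inv(b1)]])
print(np.allclose(weights, expected))   # True
```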

When the baseline hazard function λ is of infinite dimension, the proposed pseudo-profile likelihood method solves the system (van der Vaart, 1998, § 25.12)

$$ℙ_n(\partial ℓ_T/\partial β + \partial ℓ_M/\partial β) = 0, \quad ℙ_n Ψ h - P Ψ h = 0 \quad (h \in ℋ),$$

where Ψ is the score operator (Begun et al., 1983) for Λ based on the truncation likelihood ℓ_T = log ℒ_T and ℋ is an infinite-dimensional class of directions h from which paths of one-dimensional submodels for Λ may approach the true parameter. We use ℙ_n to denote the empirical measure and P to denote the probability measure. Let L²(μ) denote the Hilbert space of square-integrable functions with the inner product 〈g, h〉_μ = ∫ g(u)h(u) dμ(u) for g, h ∈ L² (μ). It is easy to see that ℋ ⊂ L²(Λ). Applying an argument similar to that in van der Vaart (1998, § 25.12.1), we can show that the score operator Ψ : L²(Λ) → L²(P_β,Λ) for Λ is given by $Ψ(h) = \int_0^τ h(u)\,dM(u)$. Let H̄ be a Hilbert space containing ℋ. The adjoint operator Ψ^* : L²(P_β,Λ) → H̄ of Ψ, which satisfies $E\{Ψ(g) h\} = \int_0^τ g(u) Ψ^*(h)(u)\,dΛ(u)$ for all g ∈ H̄ and h ∈ L²(P_β,Λ), can be shown to be Ψ^*(g)(t) = E{g dM(t)}/dΛ(t). It can be further shown that Ψ^*Ψ(h)(t) = E{h(t) exp(β′X)I (Y ⩾ t ⩾ A)} and Ψ^*(∂ℓ_T/∂β)(t) = nE{X exp(β′X)I (Y ⩾ t ⩾ A)} (Murphy & van der Vaart, 2000).

By an argument similar to that above, we show in the Supplementary Material that the score operator Φ : L²(Λ) → L²(P_β,Λ) for Λ based on the marginal likelihood ℓ_M is $Φ(h) = -\int_0^τ [h(u) \exp(β' X) \{I(A ⩾ u) - \mathrm{pr}(A ⩾ u \mid X)\}]\,dΛ(u)$. The adjoint operator Φ^* : L²(P_β,Λ) → H̄ of Φ can be shown to be Φ^*(g)(t) = −E[g{I (A ⩾ t) − pr(A ⩾ t | X)} exp(β′X)]. Moreover, the adjoint operator Φ^* satisfies $Φ^*Φ(h)(t) = E[\int_0^τ h(u) \{I(A ⩾ u) - \mathrm{pr}(A ⩾ u \mid X)\}\,dΛ(u) \{I(A ⩾ t) - \mathrm{pr}(A ⩾ t \mid X)\} \exp(2β' X)]$ and Φ^*(∂ℓ_M/∂β)(t) = nE[{I (A ⩾ t) − pr(A ⩾ t | X)}{Λ(A) − E(Λ(A) | X)}X exp(2β′X)].

Analogous to (3) for parametric models, the optimal combination of estimating functions based on the score operators ∂ℓ_T/∂β, ∂ℓ_M/∂β and Ψ is given by ℙ_n(∂ℓ_T/∂β + ∂ℓ_M/∂β) = 0 and $(ℙ_n - P) Ψ h + (b_2^*)' b_1^{-1} \partial ℓ_M/\partial β = 0$, h ∈ ℋ, where b₁ = − E(∂²ℓ_M/∂β′ ∂β) and $b_2^* = \int_0^τ Φ^*(\partial ℓ_M/\partial β)(t)\,dΛ(t) = n E([Λ(A)^2 - E\{Λ(A) \mid X\}^2] X \exp(2β' X))$. Hence, if $b_2^* = 0$, the proposed pseudo-profile likelihood estimator is the most efficient estimator in the class of linear combinations of estimating functions based on ∂ℓ_T/∂β, ∂ℓ_M/∂β and Ψ. The weight $(b_2^*)' b_1^{-1}$ can be estimated by replacing Λ with Λ̂ in the corresponding empirical estimators. In general, solving the optimal combination of estimating equations is computationally intensive, and hence is impractical. Moreover, there is no guarantee that it works better than the proposed method for small samples.

3. Simulations and data analysis

3.1. Monte-Carlo simulations

We conducted simulations to assess the performance of the proposed methods. In each simulation, 2000 studies were generated, each with n = 400. The sampling time ξ was set to be 100, and the time of disease onset, W⁰, was simulated from a uniform distribution over [0, 100] to mimic the incidence of a stable disease. For each subject, we generated $X_1^0$ from the Bernoulli distribution with $\mathrm{pr}(X_1^0 = 1) = \mathrm{pr}(X_1^0 = 0) = 0.5$ and generated $X_2^0$ from the standard normal distribution. The survival time T⁰ was independently generated from one of three models: (I) an exponential distribution with hazard function $2 \exp(X_1^0 + X_2^0)$, (II) a Weibull distribution with hazard function $2t \exp(X_1^0 + X_2^0)$ or (III) a distribution with hazard function $0.5 (t - 2)^2 \exp(X_1^0 + X_2^0)$. Thus, we simulated failure time distributions with constant, increasing and U-shaped hazards. To form a prevalent cohort of sample size n, realizations of (W⁰, T⁰, $X_1^0$, $X_2^0$) were generated repeatedly until there were n subjects satisfying the sampling constraint W⁰ + T⁰ ⩾ ξ. The time from enrolment at ξ to loss to follow-up was generated from a uniform distribution so that the censoring rate was approximately 0, 30 or 50%.
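The data-generating mechanism for Scenario II can be sketched in Python as follows. This is a reconstruction from the description above, not the authors' code: the function name and the upper bound `cens_upper` of the uniform censoring distribution are assumptions, since the source does not report the bounds used to reach each censoring rate.

```python
import numpy as np

def simulate_scenario_two(n=400, xi=100.0, cens_upper=1.0, seed=0, batch=100_000):
    """Prevalent cohort under hazard 2 t exp(x1 + x2); returns (y, a, delta, x)."""
    rng = np.random.default_rng(seed)
    rows, kept = [], 0
    while kept < n:
        w0 = rng.uniform(0.0, xi, size=batch)                  # calendar time of onset
        x1 = rng.integers(0, 2, size=batch).astype(float)      # Bernoulli(0.5)
        x2 = rng.normal(size=batch)
        # Cumulative hazard is t^2 exp(x1 + x2); invert it at a unit exponential draw.
        t0 = np.sqrt(rng.exponential(size=batch) / np.exp(x1 + x2))
        keep = t0 >= xi - w0                                   # sampling constraint W0 + T0 >= xi
        rows.append(np.column_stack([xi - w0, t0, x1, x2])[keep])
        kept += int(keep.sum())
    a, t, x1, x2 = np.vstack(rows)[:n].T
    c = rng.uniform(0.0, cens_upper, size=n)                   # censoring time measured from enrolment
    y = np.minimum(t, a + c)
    delta = (t <= a + c).astype(int)
    return y, a, delta, np.column_stack([x1, x2])

y, a, delta, x = simulate_scenario_two()
print(1 - delta.mean())   # realized censoring proportion for this choice of cens_upper
```

Tuning `cens_upper` changes the censoring proportion; the value 1.0 is only a placeholder.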

We compared the finite-sample performance of the proposed pseudo-profile likelihood method with those of the weighted estimating equation methods studied in Qin & Shen (2010) and of the popular partial likelihood method for truncated survival time data. By applying these methods to estimate the Cox model λ(t | X₁, X₂) = λ₀(t) exp(β₁X₁ + β₂X₂), we evaluated the relative efficiency by comparing the empirical variance of the maximum partial likelihood estimator to that of the proposed estimator. Table 1 summarizes the empirical bias and empirical standard error of the four estimation methods, together with the relative efficiency. All four estimators are close to their estimands. In the absence of censoring, the pseudo-profile likelihood method has an efficiency gain similar to that of the weighted estimating equation methods in Qin & Shen (2010). Overall, the relative efficiency of the proposed estimator tends to increase with the censoring rate. When the censoring proportion reaches 50%, the pseudo-profile likelihood estimator yields a substantial improvement over the maximum partial likelihood estimator, with an efficiency gain greater than 50% in the exponential and Weibull cases, and an efficiency gain of at least 20% in the U-shaped hazard function scenario. In the presence of censoring, the proposed pseudo-profile method always outperforms its competitors. In some scenarios, weighted estimating equation methods fail to show improvement, as these methods only use covariate information from uncensored subjects.

Table 1.

Summary statistics for the estimated regression parameters under independent censoring

Proportion censored	Estimated coefficient	Partial Bias	Partial SE	WEE-1 Bias	WEE-1 SE	WEE-2 Bias	WEE-2 SE	Profile Bias	Profile SE	RE
Scenario I: λ₀(t) = 2
0%	β̂₁	6	133	−1	98	−1	98	−2	98	1.84
0%	β̂₂	5	83	0	65	0	65	−2	65	1.60
30%	β̂₁	6	151	−46	136	2	120	1	108	1.98
30%	β̂₂	6	94	−57	90	2	82	4	77	1.50
50%	β̂₁	10	171	−113	189	10	157	12	122	1.96
50%	β̂₂	6	112	−125	115	2	98	10	90	1.53
Scenario II: λ₀(t) = 2t
0%	β̂₁	4	118	0	98	0	98	−1	97	1.45
0%	β̂₂	2	74	0	63	0	63	−1	63	1.40
30%	β̂₁	3	132	−21	137	9	121	3	105	1.58
30%	β̂₂	5	89	−28	90	4	80	2	73	1.46
50%	β̂₁	6	159	−101	206	10	154	5	118	1.82
50%	β̂₂	8	100	−97	118	6	96	5	79	1.62
Scenario III: λ₀(t) = 0.5(t − 2)²
0%	β̂₁	6	112	5	104	5	104	5	104	1.17
0%	β̂₂	6	69	5	65	5	65	5	64	1.14
30%	β̂₁	10	134	10	143	10	130	9	122	1.21
30%	β̂₂	7	82	0	85	3	80	4	77	1.14
50%	β̂₁	9	151	−34	216	7	154	6	134	1.27
50%	β̂₂	7	97	−35	123	7	98	5	88	1.20

Partial, the maximum partial likelihood estimator; WEE-1 and WEE-2, estimators derived by solving U₁(β) = 0 and U₂(β) = 0; Profile, the maximum pseudo-profile likelihood estimator; Bias and SE, empirical bias (×1000) and empirical standard deviation (×1000) of 2000 regression parameter estimates; RE, the empirical variance of the maximum partial likelihood estimator divided by that of the maximum pseudo-profile likelihood estimator.
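As a worked check of the RE column, the relative efficiency can be recomputed from the tabulated standard errors. The helper name below is an assumption. Because the tabulated standard errors are rounded, values recomputed this way may differ from the reported RE in the second decimal place.

```python
def relative_efficiency(se_reference, se_proposed):
    """Variance of the reference estimator divided by that of the proposed estimator."""
    return (se_reference / se_proposed) ** 2

# Table 1, Scenario I, 0% censoring, first coefficient: Partial SE = 133, Profile SE = 98.
print(round(relative_efficiency(133, 98), 2))   # 1.84
```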

In addition to better efficiency, another advantage of the proposed pseudo-profile likelihood method is that it does not involve estimation of the censoring distribution. When this distribution depends on the covariate, the estimating equation methods may yield biased estimation. For demonstration, we simulated survival time data under Model (II). The censoring times for subjects with observed covariates X₁ = 1 and X₂ < 0 were generated from an exponential distribution with mean 5 exp(−X₂), while the censoring times for other subjects were generated from a uniform distribution. The overall censoring proportion was set at approximately 30 and 50%. As summarized in Table 2, the estimating equation-based methods yield biased estimators, while the bias of the pseudo-profile estimators remains small.

Table 2.

Empirical bias and standard error of the estimated regression parameters under covariate-dependent censoring

Proportion censored	Estimated coefficient	Partial Bias	Partial SE	WEE-1 Bias	WEE-1 SE	WEE-2 Bias	WEE-2 SE	Profile Bias	Profile SE	RE
30%	β̂₁	7	132	−388	135	−127	119	2	105	1.56
30%	β̂₂	8	87	48	84	30	79	4	73	1.44
50%	β̂₁	10	166	−819	175	−252	161	−2	128	1.69
50%	β̂₂	8	103	82	107	51	97	6	80	1.64

See Table 1 for abbreviations.

3.2. Analysis of Canadian Study of Health and Aging

In this section, we report the results of data analysis for a cohort of prevalent cases in one of the largest epidemiologic studies of dementia, the Canadian Study of Health and Aging. From February 1991 to May 1992, an extensive survey was carried out and a total of 1132 persons aged 65 and older with dementia were identified in this first phase of the study. For each study subject, a diagnosis of possible Alzheimer’s disease, probable Alzheimer’s disease, or vascular dementia was assigned, and the date of dementia onset was determined by interviewing care-givers. Information on mortality was collected between January 1996 and May 1997.

We considered a subset of the study data by excluding those with missing date of onset or missing dementia subtype classification. Moreover, as in Wolfson et al. (2001), those with observed survival times greater than or equal to 20 years were excluded because these subjects are considered unlikely to have Alzheimer’s disease or vascular dementia. As a result, a total of 807 dementia patients were included in our analysis. Among them 388 had a diagnosis of probable Alzheimer’s disease, 249 had possible Alzheimer’s disease and 170 had vascular dementia. In the second phase of the study, a total of 627 deaths were recorded, among whom 302 had a diagnosis of probable Alzheimer’s, 189 had possible Alzheimer’s and 136 had vascular dementia.

The stationarity assumption that the incidence of dementia is constant over time was found to be reasonably met for these data using the method suggested in Wang (1991). To compare the risk of death between different diagnoses, we fit a Cox proportional hazards model for the length-biased survival time data, with indicators of probable Alzheimer’s and vascular dementia as covariates. We applied the pseudo-profile likelihood method, the two weighted estimating equation methods in Qin & Shen (2010), and the partial likelihood method. The estimated regression coefficients are summarized in Table 3. The proposed method yields estimates of the regression parameters similar to those from the estimating equation methods, and the bootstrap standard errors of the proposed estimator are smaller than those of its competitors. The proposed pseudo-profile likelihood method estimates a significantly higher risk of death in patients with probable Alzheimer’s and those with vascular dementia. Specifically, as compared with patients with possible Alzheimer’s, the risk of death increased by 16% among those with probable Alzheimer’s and by 27% among those with vascular dementia. For β₁, the variance ratio for the competitors to the proposed method is always at least 1.67. This suggests that if a competitor method were used in lieu of the proposed method, the study would need to recruit at least 540 more subjects (807 × 0.67 ≈ 540) to achieve the same precision.
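The arithmetic in this paragraph can be reproduced from the entries of Table 3. The Python sketch below is an illustration only; the sample-size calculation relies on the simplifying assumption that the variance of each estimator scales inversely with the number of subjects.

```python
import math

# Profile row of Table 3: estimated log hazard ratios relative to possible Alzheimer's.
estimates = {"probable Alzheimer's": 0.150, "vascular dementia": 0.241}
for label, b in estimates.items():
    print(label, round(100 * (math.exp(b) - 1)))       # percent increase in the hazard

# Standard errors for beta_1 from Table 3.
se_profile = 0.068
se_competitors = {"Partial": 0.089, "EE-1": 0.095, "EE-2": 0.088}
ratios = {k: (v / se_profile) ** 2 for k, v in se_competitors.items()}
smallest = min(ratios.values())                        # about 1.67 (EE-2)

# Extra subjects a competitor would need to match the precision of the proposed
# estimator with n = 807, assuming variance proportional to 1/n.
extra = 807 * (smallest - 1)
print(round(smallest, 2), math.floor(extra))
```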

Table 3.

Estimated regression coefficients for the Canadian study

Method	β₁ (probable Alzheimer’s) Estimate	β₁ SE	β₁ 95% CI	β₂ (vascular dementia) Estimate	β₂ SE	β₂ 95% CI
Partial	0.030	0.089	(−0.142, 0.203)	0.113	0.109	(−0.103, 0.323)
EE-1	0.130	0.095	(−0.058, 0.312)	0.278	0.107	(0.070, 0.497)
EE-2	0.157	0.088	(−0.022, 0.328)	0.257	0.121	(0.038, 0.519)
Profile	0.150	0.068	(0.016, 0.278)	0.241	0.088	(0.066, 0.419)

Partial, maximum partial likelihood estimator; EE-1 and EE-2, estimators derived by solving U₁(β) = 0 and U₂(β) = 0; Profile, the pseudo-profile likelihood estimator; SE, the empirical standard deviation of 2000 bootstrap estimates of the regression parameter.

4. Remark

The validity of the proposed method relies on the assumption of stable disease. When the stationarity assumption fails to hold, it is not uncommon that knowledge about the distribution of disease incidence can be obtained from other sources. If H denotes the distribution of the truncation time in the disease population, then the transformed survival time H(T⁰) is truncated by a uniformly distributed random variable. Thus, it follows from the fact that the Cox model is invariant under monotone transformation, that the regression coefficients in the Cox model can be consistently estimated by applying the proposed method to the transformed data {H(a_i), H(y_i), δ_i} (i = 1, …, n).

Acknowledgments

The authors thank Professors Ian McDowell, Masoud Asgharian and Christina Wolfson for kindly sharing the Canadian Study of Health and Aging data. The core study was funded by the National Health Research and Development Program, Canada. Additional funding was provided by Pfizer Canada Incorporated through the Medical Research Council/Pharmaceutical Manufacturers Association of Canada Health Activity Program, Bayer Incorporated and the British Columbia Health Research Foundation. The authors also thank the referees, associate editor and editor for their comments which improved the presentation of this article.

Appendix.

Proofs

We begin by establishing the consistency of β̂. In view of the proof of van der Vaart (1998, Theorem 5.7), it suffices to show that, as n → ∞, sup_β∈Θ|n⁻¹ℓ(β) − γ(β) |→ 0 almost surely and that β₀ is the unique maximizer of γ (β) in a compact neighbourhood of β₀.

We first show that, for sufficiently large n, ℓ(β) has similar local behaviour to ℓ̃(β) in the compact neighbourhood Θ. Because {exp(β′X)I (A ⩽ t ⩽ Y) : t ∈ [0, τ], β ∈ Θ} is Glivenko–Cantelli and the logarithmic transformation is monotone, log{S⁽⁰⁾(t, β)} − log{𝒮⁽⁰⁾(t, β)} converges to 0 uniformly over β ∈ Θ and t ∈ [0, τ]. Hence, sup_β∈Θ | n⁻¹ℓ_P(β) − n⁻¹ℓ̃_P(β)| → 0 almost surely. Following the result that Λ̂_β (t) converges to Λ_β(t) uniformly over β ∈ Θ and t ∈ [0, τ], exp{−Λ̂_β(t) exp(β′X)} converges to exp{−Λ_β (t) exp(β′X)} uniformly over β ∈ Θ and t ∈ [0, τ]. Hence, μ̂_β(x) converges to μ_β (x) uniformly over β ∈ Θ. For ∊_n > 0 with ∊_n → 0 as n → ∞, define the class ℱ = [ f (t) = {g(t) − Λ_β (t)} exp(β′ X), where β∈Θ, g is nondecreasing and nonnegative and sup_t∈[0,τ] | g(t) − Λ_β (t) | ⩽ ∊_n]. Thus, by definition, sup_f∈ℱ | P f | ⩽ ∊_n × sup_β∈Θ| exp(β′X) |. Moreover, it follows from van der Vaart & Wellner (1996, Theorems 2.7.5 and 2.4.1) that ℱ is Glivenko–Cantelli. Hence, sup_f∈ℱ | ℙ_n f − P f | → 0 almost surely. For sufficiently large n, $| n^{-1} \sum_{i=1}^{n} \{\hat{Λ}_β(a_i) - Λ_β(a_i)\} \exp(β' X_i) | ⩽ \sup_{f \in ℱ} | P f | + \sup_{f \in ℱ} | ℙ_n f - P f |$. Thus, $\sup_{β \in Θ} | n^{-1} \sum_{i=1}^{n} \{\hat{Λ}_β(a_i) - Λ_β(a_i)\} \exp(β' X_i) | \to 0$ almost surely. By a similar argument, we can show that $\sup_{β \in Θ} | n^{-1} \sum_{i=1}^{n} \{\hat{μ}_β(x_i) - μ_β(x_i)\} | \to 0$ almost surely, and hence n⁻¹ℓ_M(β) − n⁻¹ ℓ̃_M(β) → 0 uniformly over β ∈ Θ. Thus, ℓ(β) = ℓ_P(β) + ℓ_M(β) and ℓ̃(β) = ℓ̃_P(β) + ℓ̃_M(β) have similar local behaviour in Θ.

Next, because Θ is compact and the function $m(β) = \int_{0}^{τ} [β' X - \log\{𝒮^{(0)}(u, β)\}] \, dN(u) - Λ_{β}(A) \exp(β' X) - \log μ_{β}(X)$ is continuous and dominated by an integrable function, the class of functions {m(β) : β ∈ Θ} is Glivenko–Cantelli (van der Vaart, 1998, Example 19.8). It follows from a uniform law of large numbers (Pollard, 1990) that sup_β∈Θ | n⁻¹ ℓ̃(β) − γ (β) | → 0 almost surely. Thus, sup_β∈Θ | n⁻¹ℓ(β) − γ (β) | → 0 almost surely as n → ∞.

Below we prove that β₀ is the unique maximizer of γ(β) in a neighbourhood of β₀ by showing that ∂γ(β)/∂β = 0 and that ∂²γ(β)/∂β′∂β is negative definite at β = β₀. Because the partial score function has expectation zero when evaluated at β = β₀ and E [∂ {Λ_β (A) exp(β′X)}/∂β |_β=β₀] = − E{μ_β (X)⁻¹ ∂μ_β (X)/∂β|_β=β₀} by double expectation, we can show that ∂γ (β)/∂β = 0 when β = β₀. Write S_β (u | x) = exp{−Λ_β (u) exp(β′x)}. The second derivative of γ(β) is

\begin{align}
\frac{\partial^{2} γ(β)}{\partial β' \partial β} = {} & E\left[Δ\left\{\frac{𝒮^{(1)}(Y, β)^{\otimes 2}}{𝒮^{(0)}(Y, β)^{2}} - \frac{𝒮^{(2)}(Y, β)}{𝒮^{(0)}(Y, β)}\right\}\right] \tag{A1} \\
& - E\left[\frac{\partial^{2}}{\partial β' \partial β}\{Λ_{β}(A) \exp(β' X)\}\right] \tag{A2} \\
& + E\left[\int_{0}^{τ} \frac{S_{β}(u \mid X)}{μ_{β}(X)} \frac{\partial^{2}}{\partial β' \partial β}\{Λ_{β}(u) \exp(β' X)\} \, du\right] \tag{A3} \\
& - E\left(\int_{0}^{τ} \frac{S_{β}(u \mid X)}{μ_{β}(X)} \left[\frac{\partial}{\partial β}\{Λ_{β}(u) \exp(β' X)\}\right]^{\otimes 2} du\right) \tag{A4} \\
& + E\left(\left[\int_{0}^{τ} \frac{S_{β}(u \mid X)}{μ_{β}(X)} \frac{\partial}{\partial β}\{Λ_{β}(u) \exp(β' X)\} \, du\right]^{\otimes 2}\right). \tag{A5}
\end{align}

By applying the double expectation technique, it can be shown that (A2) + (A3) = 0 for β = β₀. Moreover, by the Cauchy–Schwarz inequality, both (A1) and (A4) + (A5) are negative semidefinite. Hence, it follows from regularity condition (d) that ∂²γ(β)/∂β′∂β is negative definite at β = β₀. Because the function γ(β) is continuous in β, there exists a compact neighbourhood Θ₀ of β₀ such that β₀ is the unique maximizer of γ(β) in Θ₀. This completes the proof of consistency.
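The negative semidefiniteness of (A4) + (A5) can be made explicit. The following sketch assumes that μ_β(x) = ∫₀^τ S_β(u | x) du, so that f_β(u | x) = S_β(u | x)/μ_β(x) is a probability density on [0, τ]. The symbols f_β, h_β and U are introduced here for illustration only and do not appear in the original argument.

```latex
\begin{aligned}
h_{β}(u, X) &= \frac{\partial}{\partial β}\{Λ_{β}(u) \exp(β' X)\},
  \qquad U \mid X \sim f_{β}(\cdot \mid X), \\
\text{(A4)} + \text{(A5)}
  &= -E\left[ E\{h_{β}(U, X)^{\otimes 2} \mid X\}
     - E\{h_{β}(U, X) \mid X\}^{\otimes 2} \right] \\
  &= -E\left[ \operatorname{var}\{h_{β}(U, X) \mid X\} \right] \preceq 0 .
\end{aligned}
```

Under this reading, (A4) + (A5) is minus an expected conditional covariance matrix, which is the multivariate form of the Cauchy–Schwarz step invoked in the proof.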

We now prove the asymptotic normality of the maximum pseudo-profile likelihood estimator. A Taylor series expansion yields 0 = ∂ℓ(β)/∂β|_β=β̂ = ∂ℓ(β)/∂β |_β=β₀ + ∂²ℓ(β)/∂β′∂β|_β=β* (β̂ − β₀), where β* lies between β̂ and β₀. Thus, by consistency of β̂, one has β* → β₀ in probability and

n^{1/2} (\hat{β} - β_{0}) = - \left\{ n^{-1} \left. \frac{\partial^{2} ℓ(β)}{\partial β' \partial β} \right|_{β = β_{0}} \right\}^{-1} \left\{ n^{-1/2} \left. \frac{\partial ℓ(β)}{\partial β} \right|_{β = β_{0}} \right\} + o_{p}(1) .

In what follows, we show that

\begin{align}
n^{-1/2} \left\{ \frac{\partial ℓ_{M}(β)}{\partial β} - \frac{\partial \tilde{ℓ}_{M}(β)}{\partial β} \right\} = {} & - n^{-1/2} \sum_{i=1}^{n} \frac{\partial}{\partial β} \left[ \{\hat{Λ}_{β}(a_{i}) - Λ_{β}(a_{i})\} \exp(β' x_{i}) \right] \tag{A6} \\
& - n^{-1/2} \sum_{i=1}^{n} \left\{ \frac{1}{\hat{μ}_{β}(x_{i})} \frac{\partial \hat{μ}_{β}(x_{i})}{\partial β} - \frac{1}{μ_{β}(x_{i})} \frac{\partial μ_{β}(x_{i})}{\partial β} \right\} \tag{A7}
\end{align}

has an asymptotic independent and identically distributed representation. Let H be the joint probability measure of (A, X) and let Ĥ be the corresponding empirical measure for H. Then the right-hand side of (A6) can be expressed as

\begin{array}{l}
- n^{1/2} \frac{\partial}{\partial β} \int_{-\infty}^{\infty} \int_{0}^{τ} \Big[ \Big\{ \frac{d\bar{N}(u)}{S^{(0)}(u, β)} - \frac{dF^{u}(u)}{𝒮^{(0)}(u, β)} \Big\} \exp(β' x) I(u ⩽ a ⩽ τ) \Big] d\hat{H}(a, x) \\
\quad = - n^{1/2} \frac{\partial}{\partial β} \int_{-\infty}^{\infty} \int_{0}^{τ} \Big[ \frac{d\bar{N}(u) - dF^{u}(u)}{𝒮^{(0)}(u, β)} - \frac{dF^{u}(u)}{𝒮^{(0)}(u, β)^{2}} \{ S^{(0)}(u, β) - 𝒮^{(0)}(u, β) \} \Big] \\
\qquad \times \exp(β' x) I(u ⩽ a ⩽ τ) \, d\hat{H}(a, x) + o_{p}(1) \\
\quad = - n^{-1/2} \sum_{i=1}^{n} \frac{\partial}{\partial β} \int_{-\infty}^{\infty} \int_{0}^{τ} \Big[ \Big\{ \frac{dN_{i}(u)}{𝒮^{(0)}(u, β)} - \frac{\exp(β' x_{i}) I(a_{i} ⩽ u ⩽ y_{i}) \, dF^{u}(u)}{𝒮^{(0)}(u, β)^{2}} \Big\} \\
\qquad \times \exp(β' x) I(u ⩽ a ⩽ τ) \Big] dH(a, x) = - n^{-1/2} \sum_{i=1}^{n} ϕ_{1i}(β) + o_{p}(1) .
\end{array}

Next, applying the functional delta method, we have $n^{1/2} \{\hat{μ}_{β}(x) - μ_{β}(x)\} = n^{-1/2} \sum_{i=1}^{n} ψ_{i}(β, x) + o_{p}(n^{-1/2})$, where

\begin{array}{l}
ψ_{i}(β, x) \\
\quad = \int_{0}^{τ} \int_{0}^{τ} \Big[ S_{β}(u \mid x) \exp(β' x) \Big\{ \frac{dN_{i}(υ)}{𝒮^{(0)}(υ, β)} - \frac{dF^{u}(υ)}{𝒮^{(0)}(υ, β)^{2}} \exp(β' x_{i}) I(a_{i} ⩽ υ ⩽ y_{i}) \Big\} \Big] du .
\end{array}

Thus, (A7) can be expressed as

\begin{array}{l}
n^{1/2} \int_{-\infty}^{\infty} \int_{0}^{τ} \Big[ \frac{1}{μ_{β}(x)} \Big\{ \frac{\partial \hat{μ}_{β}(x)}{\partial β} - \frac{\partial μ_{β}(x)}{\partial β} \Big\} - \frac{1}{μ_{β}(x)^{2}} \frac{\partial μ_{β}(x)}{\partial β} \{ \hat{μ}_{β}(x) - μ_{β}(x) \} \Big] d\hat{H}(a, x) + o_{p}(1) \\
\quad = n^{-1/2} \sum_{i=1}^{n} \int_{-\infty}^{\infty} \int_{0}^{τ} \Big[ \frac{1}{μ_{β}(x)} \Big\{ \frac{\partial ψ_{i}(β, x)}{\partial β} - \frac{ψ_{i}(β, x)}{μ_{β}(x)} \frac{\partial μ_{β}(x)}{\partial β} \Big\} \Big] dH(a, x) + o_{p}(1) \\
\quad = n^{-1/2} \sum_{i=1}^{n} ϕ_{2i}(β) + o_{p}(1) .
\end{array}

Finally, applying the functional delta method, we can obtain the asymptotic representation for the partial score function: $n^{- 1 / 2} \partial ℓ_{P} (β_{0}) / \partial β = n^{- 1 / 2} \sum_{i = 1}^{n} ϕ_{3 i} (β_{0}) + o_{p} (1)$ , where

ϕ_{3i}(β_{0}) = \int_{0}^{τ} \left\{ x_{i} - \frac{𝒮^{(1)}(u, β_{0})}{𝒮^{(0)}(u, β_{0})} \right\} \left\{ dN_{i}(u) - \exp(β_{0}' x_{i}) I(a_{i} ⩽ u ⩽ y_{i}) \, dΛ_{0}(u) \right\} .

Let ϕ_i (β₀) = ϕ_1i (β₀) + ϕ_2i (β₀) + ϕ_3i (β₀). We have $n^{-1/2} \partial ℓ(β) / \partial β |_{β = β_{0}} = n^{-1/2} \sum_{i=1}^{n} κ_{i}(β_{0}) + o_{p}(1)$, where κ_i (β₀) = ϕ_i (β₀) − [∂{Λ_β (a_i) exp(β′x_i)}/∂β + μ_β (x_i)⁻¹∂μ_β (x_i)/∂β] |_β=β₀. Arguing as in the proof of consistency, we can show that, as n → ∞, n⁻¹ {∂²ℓ(β)/∂β′∂β} |_β=β₀ → {∂²γ(β)/∂β′∂β} |_β=β₀ almost surely. Define Γ(β₀) = E{κ_i (β₀)^⊗2}. Hence, under the regularity conditions, as n → ∞, n^{1/2} (β̂ − β₀) converges weakly to a zero-mean multivariate normal distribution with variance-covariance matrix Σ(β₀) = {∂²γ (β₀)/∂β′∂β}⁻¹ Γ (β₀){∂²γ (β₀)/∂β′∂β}⁻¹.
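The sandwich form of Σ(β₀) suggests a plug-in variance estimate. The sketch below is one possible illustration and not necessarily the variance procedure used by the authors. It assumes the analyst has already computed the normalized Hessian n⁻¹∂²ℓ(β)/∂β′∂β at β̂ and estimates κ̂_i of the influence terms; how those are obtained is outside this sketch. The function names `mat_inv`, `mat_mul` and `sandwich_variance` are hypothetical, and plain Python lists are used so the block has no dependencies.

```python
def mat_inv(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    p = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(a)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]

def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def sandwich_variance(hessian, kappas):
    """Return H^{-1} Gamma H^{-1}, with Gamma the average outer product of kappas.

    hessian: normalized second-derivative matrix (assumed supplied by the caller).
    kappas:  list of per-subject influence vectors (assumed supplied by the caller).
    """
    n, p = len(kappas), len(hessian)
    gamma = [[sum(k[r] * k[c] for k in kappas) / n for c in range(p)]
             for r in range(p)]
    h_inv = mat_inv(hessian)
    return mat_mul(mat_mul(h_inv, gamma), h_inv)
```

Under these assumptions, dividing the returned matrix by n gives an approximate variance-covariance matrix for β̂ itself, since the limiting distribution above is stated for n^{1/2}(β̂ − β₀).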

Supplementary material

Supplementary material available at Biometrika online includes the derivation of the score operator Φ and the adjoint operator Φ^* based on the marginal likelihood function ℓ_M.

References

Begun JM, Hall WJ, Huang W-M, Wellner JA. Information and asymptotic efficiency in parametric-nonparametric models. Ann Statist. 1983;11:432–52.
Cox DR. Regression models and life-tables (with discussion). J R Statist Soc B. 1972;34:187–220.
de Una-Alvarez J, Otero-Giraldez M, Alvarez-Llorente G. Estimation under length-bias and right-censoring: an application to unemployment duration analysis for married women. J Appl Statist. 2003;30:283–91.
Ghosh D. Proportional hazards regression for cancer studies. Biometrics. 2008;64:141–8. doi: 10.1111/j.1541-0420.2007.00830.x.
Gong G, Samaniego FJ. Pseudo maximum likelihood estimation: theory and applications. Ann Statist. 1981;9:861–9.
Kalbfleisch JD, Lawless JF. Regression models for right truncated data with applications to AIDS incubation times and reporting lags. Statist Sinica. 1991;1:19–32.
Lagakos S, Barraj L, De Gruttola V. Nonparametric analysis of truncated survival data, with application to AIDS. Biometrika. 1988;75:515–23.
Lancaster T. Econometric methods for the duration of unemployment. Econometrica. 1979;47:939–56.
Lancaster T. The Econometric Analysis of Transition Data. Cambridge, UK: Cambridge University Press; 1990.
Murphy SA, van der Vaart AW. On profile likelihood. J Am Statist Assoc. 2000;95:449–65.
Pepe MS, Fleming TR. A nonparametric method for dealing with mismeasured covariate data. J Am Statist Assoc. 1991;86:108–13.
Pollard D. Empirical Processes: Theory and Applications. Hayward, CA: Institute of Mathematical Statistics; 1990.
Qin J, Shen Y. Statistical methods for analyzing right-censored length-biased data under Cox model. Biometrics. 2010;66:382–92. doi: 10.1111/j.1541-0420.2009.01287.x.
Severini TA, Wong WH. Profile likelihood and conditionally parametric models. Ann Statist. 1992;20:1768–802.
Tsai W-Y. Pseudo-partial likelihood for proportional hazards models with biased-sampling data. Biometrika. 2009;96:601–15. doi: 10.1093/biomet/asp026.
van der Vaart AW. Asymptotic Statistics. Cambridge: Cambridge University Press; 1998.
van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. New York: Springer; 1996.
Wang M-C. Nonparametric estimation from cross-sectional survival data. J Am Statist Assoc. 1991;86:130–43.
Wang M-C, Brookmeyer R, Jewell N. Statistical models for prevalent cohort data. Biometrics. 1993;49:1–11.
Wolfson C, Wolfson DB, Asgharian M, M’Lan CE. A reevaluation of the duration of survival after the onset of dementia. New Engl J Med. 2001;344:1111–6. doi: 10.1056/NEJM200104123441501.
Zelen M. Forward and backward recurrence times and length biased sampling: age specific models. Lifetime Data Anal. 2004;10:325–34. doi: 10.1007/s10985-004-4770-1.
Zelen M, Feinleib M. On the theory of screening for chronic diseases. Biometrika. 1969;56:601–14.
Zucker DM. A pseudo-partial likelihood method for semiparametric survival regression with covariate errors. J Am Statist Assoc. 2005;100:1264–77.
