Biometrika. 2012 Jan 27;99(1):199–210. doi: 10.1093/biomet/asr072

A maximum pseudo-profile likelihood estimator for the Cox model under length-biased sampling

Chiung-Yu Huang 1, Jing Qin 2, Dean A Follmann 3
PMCID: PMC3667656  PMID: 23843659

Abstract

This paper considers semiparametric estimation of the Cox proportional hazards model for right-censored and length-biased data arising from prevalent sampling. To exploit the special structure of length-biased sampling, we propose a maximum pseudo-profile likelihood estimator, which can handle time-dependent covariates and is consistent under covariate-dependent censoring. Simulation studies show that the proposed estimator is more efficient than its competitors. A data analysis illustrates the methods and theory.

Keywords: Approximate likelihood, Cross-sectional sampling, Product-limit estimator, Random truncation, Screening trials

1. Introduction

When studying the natural history of a disease, the time from disease onset to an event or failure is usually the focus. An incident cohort approach, which studies initially disease-free subjects from disease onset to failure, can be very inefficient, especially if the disease is uncommon. A prevalent sampling design, which only includes diseased subjects who have not experienced the failure event at the time of recruitment, can be much more efficient. However, the observed survival time is subject to left truncation: those who have experienced the failure event before the recruitment time are not observable. Thus, individuals in the prevalent cohort tend to have slower progression of the disease than those in a typical incident study. As a result, statistical methods such as the Kaplan–Meier estimator that fail to account for left truncation can lead to substantial overestimation of the survival time.

In the case of stable disease, that is, the occurrence of disease onset follows a stationary Poisson process, the survival time in the prevalent cohort is a biased sample of that in the incident population, where the sampling weight is proportional to the length of the survival time. Similarly, the truncation time, from disease onset to recruitment, in the prevalent cohort is also a biased sample of the uniform truncation time in the incident population, and its distribution is related to the underlying survival distribution in a known fashion. We use the term length-biased sampling for left truncation under the assumption of stationary disease incidence. Examples of length-biased sampling include studies of cancer screening trials (Zelen & Feinleib, 1969; Zelen, 2004), HIV prevalent cohort studies (Lagakos et al., 1988) and unemployment duration (Lancaster, 1979; de Una-Alvarez et al., 2003).
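The effect of length-biased sampling can be seen in a small simulation. The sketch below is illustrative only: it assumes a unit-mean exponential survival time and onset times uniform on [0, ξ] with ξ = 100 (choices made here, not taken from the paper), and the function name `prevalent_sample` is hypothetical. Since the length-biased density is proportional to t f(t), the mean survival time in the prevalent cohort should be close to E(T0²)/E(T0) = 2 rather than 1.

```python
import random

def prevalent_sample(n, xi=100.0, seed=0):
    """Draw n survival times from a prevalent cohort recruited at calendar time xi.

    Illustrative assumptions: onset W0 ~ Uniform(0, xi) (stable disease) and
    incident survival T0 ~ Exponential with mean 1.
    """
    rng = random.Random(seed)
    sampled = []
    while len(sampled) < n:
        w0 = rng.uniform(0.0, xi)   # calendar time of disease onset
        t0 = rng.expovariate(1.0)   # survival time in the incident population
        if w0 + t0 >= xi:           # subject is still event-free at recruitment
            sampled.append(t0)
    return sampled

times = prevalent_sample(2000)
mean_prevalent = sum(times) / len(times)
print(f"incident mean = 1.00, prevalent-cohort mean = {mean_prevalent:.2f}")
```

Longer survivors are over-represented because they remain at risk of being sampled for longer, which is why a naive analysis of the prevalent cohort overstates survival.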

This paper focuses on semiparametric estimation of the Cox proportional hazards model for right-censored survival data under length-biased sampling. Intuitively, efficient estimation can be achieved by maximizing the full semiparametric likelihood with respect to the regression parameter and the baseline hazard function. The maximum likelihood approach, however, involves high-dimensional maximization, and hence may cause computational concerns for large sample sizes. Estimation of a finite-dimensional parameter in the presence of an infinite-dimensional nuisance parameter has been studied by a number of authors. In particular, Severini & Wong (1992) and Zucker (2005) generalized the profile likelihood method by replacing the nuisance parameters in the full likelihood or the partial likelihood with a consistent estimator that may depend on the parameter of interest. In this paper, we follow their idea to propose a semiparametric estimation procedure for the Cox model under length-biased sampling. Specifically, we replace the hazard function in the full likelihood with a Breslow-type estimator for the hazard function to obtain a pseudo-profile likelihood function. Thus, a consistent estimator of the regression parameters can be easily derived by maximizing the pseudo-profile likelihood. Unlike other bias-adjusted risk-set methods, including Ghosh (2008), Tsai (2009) and Qin & Shen (2010), the proposed estimation procedure does not involve estimation of the censoring distribution, so it is expected to be more stable when the censoring proportion is high.

2. Model and estimation methods

2.1. Data and model set-up

For subjects in the target disease population, let T0 denote the time from the disease incidence to the failure event of interest, W0 denote the calendar time of the disease incidence and X0 denote a p × 1 vector of covariates. Assume that the sampling time, ξ, is independent of (W0, T0, X0). An individual would be qualified to be sampled at time ξ only if T0 + W0 − ξ ⩾ 0. Denote by (W, T, X) the random variables from the prevalent population. The probability distribution of (W, T, X) is the same as the probability distribution of (W0, T0, X0) conditional on T0 + W0 ⩾ ξ ⩾ W0. Let A = ξ − W denote the truncation time, that is, the time from disease onset to recruitment.

In practice, the observation of failure time T in the prevalent cohort is subject to right censoring due to the study ending or premature dropout. The censoring time measured from recruitment, C, is usually assumed to be independent of (T, A) given X. However, the total censoring time A + C and the survival time T are correlated, as they share the same A. Let Y = min(T, A + C) denote the follow-up time until failure or censoring, and let Δ = I(T ⩽ A + C) be the indicator of failure. For subject i ∈ {1, …, n}, denote by xi the covariate vector, by yi and ai the observed survival time and truncation time, and by δi the indicator of an uncensored event time. The observed data (yi, ai, δi, xi) for i = 1, …, n are assumed to be independent and identically distributed realizations of (Y, A, Δ, X).

Denote by f(t | x) and S(t | x) the conditional density function and survival function of T0 = t given X0 = x, and let $\mu(x)=\int_0^\infty u f(u \mid x)\,du$ be the conditional mean of T0 given X0 = x. We impose the following conditions for incident population random variables.

Assumption 1. The variable (T0, X0) is independent of when the disease incidence occurs, W0.

Assumption 2. Disease incidence occurs over calendar time at a constant rate.

Under Assumptions 1 and 2, the joint density function of (A, T) given X = x evaluated at (a, t) is $f(t \mid x)\,\mu(x)^{-1} I(t > a > 0)$ (Lancaster, 1990, Ch. 3), and the survival time T given X = x has a length-biased density function $t f(t \mid x)\,\mu(x)^{-1}$.
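As a short worked step under Assumptions 1 and 2, integrating the joint density over the truncation time recovers the length-biased marginal density of T, and dividing the joint density by that marginal shows that A given T = t is uniform on (0, t):

$$\int_0^t \frac{f(t\mid x)}{\mu(x)}\,da=\frac{t\,f(t\mid x)}{\mu(x)},\qquad \frac{f(t\mid x)\,\mu(x)^{-1}}{t\,f(t\mid x)\,\mu(x)^{-1}}=\frac{1}{t}\quad(0<a<t).$$

Integrating over t instead gives the marginal density of the truncation time, $\int_a^\infty f(t\mid x)\,\mu(x)^{-1}\,dt=S(a\mid x)/\mu(x)$.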

We assume that the survival time T0 in the incident population follows the Cox (1972) proportional hazards model λ(t | x) = λ(t) exp(β′x), where λ(t) is an unspecified, continuous baseline hazard function and β is a vector of p × 1 regression parameters. Let $\Lambda(t)=\int_0^t\lambda(u)\,du$ be the cumulative baseline hazard function. Under Assumptions 1 and 2 and the independence of C and (T, A) given X, the full likelihood function is proportional to

$$\mathcal{L}(\beta,\Lambda)=\prod_{i=1}^n\frac{f(y_i\mid x_i)^{\delta_i}S(y_i\mid x_i)^{1-\delta_i}}{\mu(x_i)}=\prod_{i=1}^n\frac{\{\lambda(y_i)\exp(\beta'x_i)\}^{\delta_i}\exp\{-\Lambda(y_i)\exp(\beta'x_i)\}}{\int_0^\infty\exp\{-\Lambda(u)\exp(\beta'x_i)\}\,du}.\tag{1}$$

2.2. Brief review of existing methods

The likelihood (1) can be re-expressed as the product of the truncation likelihood conditional on A and the marginal likelihood of A:

$$\mathcal{L}(\beta,\Lambda)=\mathcal{L}_T(\beta,\Lambda)\times\mathcal{L}_M(\beta,\Lambda)=\prod_{i=1}^n\left\{\frac{f(y_i\mid x_i)^{\delta_i}S(y_i\mid x_i)^{1-\delta_i}}{S(a_i\mid x_i)}\right\}\times\prod_{i=1}^n\left\{\frac{S(a_i\mid x_i)}{\mu(x_i)}\right\}.$$

Written in this way, we see that there is information about the regression parameter β in ℒM(β, Λ). The truncation likelihood ℒT can be further decomposed as the product of the partial likelihood (Kalbfleisch & Lawless, 1991)

$$\mathcal{L}_P(\beta)=\prod_{i=1}^n\left\{\frac{\exp(\beta'x_i)}{\sum_{j=1}^n\exp(\beta'x_j)I(a_j\leqslant y_i\leqslant y_j)}\right\}^{\delta_i},$$

and the residual likelihood ℒR(β, Λ). Wang et al. (1993) showed that ℒP is fully efficient with respect to ℒT. However, under length-biased sampling the maximum partial likelihood estimator is expected to be inefficient, because it ignores information in ℒM(β, Λ).
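The risk set in the partial likelihood for left-truncated data contains subject j at time yi only if aj ⩽ yi ⩽ yj. A minimal sketch of the log partial likelihood for a single scalar covariate follows; the function name and the toy data are hypothetical and serve only to show the risk-set adjustment.

```python
import math

def log_partial_likelihood(beta, y, a, delta, x):
    """Log partial likelihood with left-truncation risk sets (scalar covariate)."""
    total = 0.0
    for i in range(len(y)):
        if delta[i] == 1:
            # Subject j is at risk at y[i] only if it has entered (a[j] <= y[i])
            # and has not yet failed or been censored (y[i] <= y[j]).
            risk = sum(math.exp(beta * x[j])
                       for j in range(len(y)) if a[j] <= y[i] <= y[j])
            total += beta * x[i] - math.log(risk)
    return total

# Toy data: follow-up times, truncation times, failure indicators, covariate.
y_toy = [2.0, 3.0, 5.0]
a_toy = [0.5, 2.5, 1.0]
delta_toy = [1, 1, 0]
x_toy = [0, 1, 1]
lp_at_zero = log_partial_likelihood(0.0, y_toy, a_toy, delta_toy, x_toy)
```

In the toy data the second subject enters at time 2.5, so it is excluded from the risk set of the failure at time 2 even though its follow-up time is larger.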

Various methods that better exploit the special structure of length-biased survival data have been proposed in the literature. Let G(t) be the survival function of the censoring time C, and let Ĝ(t) be the Kaplan–Meier estimator of G(t) based on {(yi − ai, 1 − δi) : i = 1, …, n}. Qin & Shen (2010) proposed to solve the weighted estimating equation

$$U_1(\beta)=\sum_{i=1}^n\delta_i\left[x_i-\frac{\sum_{j=1}^n\delta_jx_j\exp(\beta'x_j)\{y_j\hat G(y_j-a_j)\}^{-1}I(y_j\geqslant y_i)}{\sum_{j=1}^n\delta_j\exp(\beta'x_j)\{y_j\hat G(y_j-a_j)\}^{-1}I(y_j\geqslant y_i)}\right]=0,$$

where the contribution of a subject in the risk set is inversely weighted by the probability of the subject being sampled and uncensored. This estimating method, however, might be unstable as the weight function $y_j^{-1}\hat G(y_j-a_j)^{-1}$ involves estimation of the tail probability of the censoring distribution. As an alternative, Qin & Shen (2010) considered solving the estimating equation

$$U_2(\beta)=\sum_{i=1}^n\delta_i\left[x_i-\frac{\sum_{j=1}^n\delta_jx_j\exp(\beta'x_j)\{\hat w_c(y_j)\}^{-1}I(y_j\geqslant y_i)}{\sum_{j=1}^n\delta_j\exp(\beta'x_j)\{\hat w_c(y_j)\}^{-1}I(y_j\geqslant y_i)}\right]=0,$$

with $\hat w_c(y)=\int_0^y\hat G(u)\,du$. The weight function $\hat w_c(y_j)^{-1}$ is the reciprocal of the integrated censoring survival function, and is more stable than the weight function $y_j^{-1}\hat G(y_j-a_j)^{-1}$ in U1. A major restriction of the two estimating equation-based methods is that the censoring time must not depend on the covariates. Moreover, the estimating equations only use covariate information from uncensored individuals, suggesting that there is still room for efficiency gains.
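The two weights differ only in how the censoring survival function enters. As a rough sketch (the function names are hypothetical and the code is not taken from Qin & Shen, 2010), the Kaplan–Meier estimator of G can be computed from the residual times yi − ai with 1 − δi as the event indicator, and the integrated weight is then the area under the resulting step function.

```python
def km_censoring(resid, cens):
    """Kaplan-Meier estimate of G(t) = pr(C > t) from residual times.

    resid[i] = y_i - a_i and cens[i] = 1 - delta_i (1 if the subject was censored).
    Returns the jump times and the value of the estimate just after each jump.
    """
    times = sorted({t for t, c in zip(resid, cens) if c == 1})
    surv, g = [], 1.0
    for s in times:
        at_risk = sum(1 for t in resid if t >= s)
        events = sum(1 for t, c in zip(resid, cens) if t == s and c == 1)
        g *= 1.0 - events / at_risk
        surv.append(g)
    return times, surv

def integrated_weight(yv, times, surv):
    """Integral of the Kaplan-Meier step function over [0, yv]."""
    total, prev_t, prev_g = 0.0, 0.0, 1.0
    for s, g in zip(times, surv):
        if s >= yv:
            break
        total += prev_g * (s - prev_t)   # area of the flat piece ending at s
        prev_t, prev_g = s, g
    return total + prev_g * (yv - prev_t)

# Toy example: three residual times, all censored.
km_times, km_surv = km_censoring([1.0, 2.0, 3.0], [1, 1, 1])
w_mid = integrated_weight(1.5, km_times, km_surv)
w_end = integrated_weight(3.0, km_times, km_surv)
```

Because the integral averages the estimated survival function over [0, y], it is less sensitive to the poorly estimated tail than a single evaluation at y − a.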

2.3. Maximum pseudo-profile likelihood estimator

The maximum likelihood estimator could be obtained by applying the semiparametric profile likelihood method (Murphy & van der Vaart, 2000) to deal with the nuisance parameter Λ. For length-biased sampling data, however, maximizing ℒ with respect to Λ for fixed β is computationally difficult because ℒ involves Λ in a complicated way. Instead of profiling out the nonparametric component Λ in ℒ, we propose to replace Λ(t) with a simple estimate that is consistent and has an $n^{1/2}$-convergence rate. This approach has been used in various contexts under various names, including pseudo- and estimated-likelihood estimation (Gong & Samaniego, 1981; Pepe & Fleming, 1991; Severini & Wong, 1992; Zucker, 2005).

Our simple estimate is based on profiling the truncation likelihood ℒT(β, Λ). Specifically, for fixed β, the truncation likelihood ℒT(β, Λ) is maximized by the Breslow-type estimator

$$\hat\Lambda_\beta(t)=\int_0^t\frac{d\{\sum_{j=1}^n\delta_jI(y_j\leqslant u)\}}{\sum_{j=1}^n\exp(\beta'x_j)I(a_j\leqslant u\leqslant y_j)}$$

in the class of nondecreasing right-continuous functions which jump only at uncensored failure times. Note that Λ̂β(t) can be generalized to handle time-varying covariates. Profiling out Λ from the truncation likelihood ℒT(β, Λ) yields the partial likelihood, that is, ℒT(β, Λ̂β) = ℒP(β). Replacing Λ with Λ̂β in the full likelihood ℒ, we obtain a pseudo-profile likelihood function,

$$\mathcal{L}(\beta,\hat\Lambda_\beta)=\mathcal{L}_T(\beta,\hat\Lambda_\beta)\times\mathcal{L}_M(\beta,\hat\Lambda_\beta)=\mathcal{L}_P(\beta)\times\prod_{i=1}^n\frac{\exp\{-\hat\Lambda_\beta(a_i)\exp(\beta'x_i)\}}{\int_0^\infty\exp\{-\hat\Lambda_\beta(u)\exp(\beta'x_i)\}\,du}.$$

We propose to estimate the regression parameter β by maximizing the pseudo-profile likelihood.
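The construction can be sketched in a few lines for a single scalar covariate. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the integral in the denominator is truncated at the largest observed failure time (an assumption made here because the step function is flat beyond it), and the maximization uses a crude grid search on toy data.

```python
import math

def risk_sum(beta, u, y, a, x):
    """Sum of exp(beta * x_j) over subjects at risk at time u (a_j <= u <= y_j)."""
    return sum(math.exp(beta * xj) for yj, aj, xj in zip(y, a, x) if aj <= u <= yj)

def breslow(beta, y, a, delta, x):
    """Breslow-type estimator for fixed beta: failure times and cumulative hazard."""
    fail = sorted({yi for yi, di in zip(y, delta) if di == 1})
    cum, total = [], 0.0
    for u in fail:
        d = sum(1 for yi, di in zip(y, delta) if di == 1 and yi == u)
        total += d / risk_sum(beta, u, y, a, x)
        cum.append(total)
    return fail, cum

def log_pseudo_profile(beta, y, a, delta, x):
    """Log partial likelihood plus log marginal likelihood with the Breslow plug-in."""
    fail, cum = breslow(beta, y, a, delta, x)
    log_lp = sum(beta * xi - math.log(risk_sum(beta, yi, y, a, x))
                 for yi, di, xi in zip(y, delta, x) if di == 1)
    log_lm = 0.0
    for ai, xi in zip(a, x):
        e = math.exp(beta * xi)
        # Value of the step function at the truncation time a_i.
        lam_a = max([c for u, c in zip(fail, cum) if u <= ai], default=0.0)
        # Assumption: integrate exp(-Lambda_hat(u) * e) over [0, last failure time].
        mu, prev_t, prev_c = 0.0, 0.0, 0.0
        for u, c in zip(fail, cum):
            mu += math.exp(-prev_c * e) * (u - prev_t)
            prev_t, prev_c = u, c
        log_lm -= lam_a * e + math.log(mu)
    return log_lp + log_lm

# Toy data (hypothetical) and a grid search for the maximizer.
y_obs = [1.2, 2.5, 0.8, 3.1, 1.9, 2.2]
a_obs = [0.3, 1.0, 0.1, 0.5, 1.2, 0.4]
d_obs = [1, 1, 1, 0, 1, 1]
x_obs = [1, 0, 1, 0, 0, 1]
grid = [k / 100.0 for k in range(-300, 301)]
beta_hat = max(grid, key=lambda b: log_pseudo_profile(b, y_obs, a_obs, d_obs, x_obs))
```

No estimate of the censoring distribution appears anywhere in the computation. In practice the grid search would be replaced by a Newton-type or quasi-Newton optimizer, and the covariate would be a vector.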

Assume that T0, and hence T, has a finite maximal support τ, where τ = sup{t : pr(T0 ⩽ t) < 1} < ∞. Then τ is also the maximal support for the truncation time random variable A, as A given T has a uniform distribution on [0, T]. We further assume that C is not degenerate at 0, that is, pr(C > 0) > 0. Then it can be shown that max ΔiYi → τ as n → ∞. Thus, Λ(t) is estimable on the interval [0, τ]; as a result, the conditional mean of T0 given X is also estimable. Let Ni(t) = δiI(yi ⩽ t) be the counting process of observed failure events for subject i, and denote $\bar N(t)=n^{-1}\sum_{i=1}^nN_i(t)$ and Fu(t) = pr(Δ = 1, Y ⩽ t). Define the functions $S^{(k)}(u,\beta)=n^{-1}\sum_{i=1}^nx_i^{\otimes k}\exp(\beta'x_i)I(a_i\leqslant u\leqslant y_i)$ (k = 0, 1, 2), and let 𝒮(k)(u, β) = E{X⊗k exp(β′X)I(A ⩽ u ⩽ Y)} be their expectations. Assume that X is bounded. Then the two classes of functions {ΔI(Y ⩽ t) : t ∈ [0, τ]} and {X⊗k exp(β′X)I(A ⩽ t ⩽ Y) : t ∈ [0, τ], β ∈ Θ} are both Glivenko–Cantelli, as the class of indicator functions and the class of bounded monotone functions are Glivenko–Cantelli (van der Vaart & Wellner, 1996, Theorems 2.4.1 and 2.7.5). Moreover, because S(0)(t, β) is bounded away from zero, we can show that $\sup_{t\in[0,\tau],\,\beta\in\Theta}|\hat\Lambda_\beta(t)-\Lambda_\beta(t)|\to 0$ almost surely as n → ∞, where

$$\Lambda_\beta(t)=\int_0^t\frac{dF_u(u)}{\mathcal{S}^{(0)}(u,\beta)}.\tag{2}$$

The limit Λβ(t) of Λ̂β(t) defines a smooth mapping in β, and it passes through the true baseline cumulative hazard function Λ(t) when β equals the true parameter value. If we regard (2) as a known function of β, the function ℒ(β, Λβ) can be viewed as the full likelihood function derived under an induced parametric submodel λ(t | x) = λβ(t) exp(β′x).

Replacing Λ with Λ̂β in ℒ(β, Λ), we obtain a log pseudo-profile likelihood function ℓ(β) = ℓP(β) + ℓM(β), where

$$\ell_P(\beta)=\sum_{i=1}^n\int_0^\tau\left[\beta'x_i-\log\{S^{(0)}(u,\beta)\}\right]dN_i(u)$$

is the log partial likelihood obtained by profiling out Λ from the truncation likelihood ℒT, and

$$\ell_M(\beta)=-\sum_{i=1}^n\left\{\hat\Lambda_\beta(a_i)\exp(\beta'x_i)+\log\hat\mu_\beta(x_i)\right\},$$

with $\hat\mu_\beta(x_i)=\int_0^\infty\exp\{-\hat\Lambda_\beta(u)\exp(\beta'x_i)\}\,du$. We show in the Appendix that, in a compact neighbourhood of the true regression parameter, ℓ(β) can be approximated by ℓ̃(β) = ℓ̃P(β) + ℓ̃M(β), where $\tilde\ell_P(\beta)=\sum_{i=1}^n\int_0^\tau[\beta'x_i-\log\{\mathcal{S}^{(0)}(u,\beta)\}]\,dN_i(u)$, $\tilde\ell_M(\beta)=-\sum_{i=1}^n\{\Lambda_\beta(a_i)\exp(\beta'x_i)+\log\mu_\beta(x_i)\}$ and $\mu_\beta(x_i)=\int_0^\infty\exp\{-\Lambda_\beta(u)\exp(\beta'x_i)\}\,du$. Thus, ℓ and ℓ̃ have similar local behaviour in the compact neighbourhood, and the asymptotic properties of the maximum pseudo-profile likelihood estimator can be investigated through ℓ̃.

Define the limit function $\gamma(\beta)=\lim_{n\to\infty}n^{-1}\tilde\ell(\beta)=\lim_{n\to\infty}n^{-1}\{\tilde\ell_P(\beta)+\tilde\ell_M(\beta)\}$. We denote the true parameter values of the proportional hazards model by {β0, λ0(·)}, and define $\Lambda_0(t)=\int_0^t\lambda_0(u)\,du$. Theorem 1 summarizes the consistency and asymptotic normality of β̂ that maximizes the log pseudo-profile likelihood function ℓ(β), with proofs given in the Appendix.

Theorem 1. Assume the following conditions hold: (a) β0 lies in the interior of a known compact set Θ in $\mathbb{R}^p$; (b) X is bounded; (c) pr(Y ⩾ t) is a continuous function for t ∈ [0, τ]; and (d) ∂²γ(β0)/∂β′∂β is nonsingular. Then β̂ → β0 in probability as n → ∞. Moreover, $n^{1/2}(\hat\beta-\beta_0)$ converges in distribution to a zero mean multivariate normal distribution with variance-covariance matrix Σ(β0), where Σ(β0) is specified in the Appendix.

While the asymptotic variance Σ(β0) may be estimated by its empirical version, the computation is quite complicated. Since we have established the asymptotic normality, it is computationally more convenient to use the bootstrap method. The performance of the proposed estimator is evaluated in § 3 via simulations.
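A nonparametric bootstrap for the standard error resamples subjects with replacement and recomputes the estimator on each resample. The generic sketch below is an assumption about how this could be organized, not the authors' code; `bootstrap_se` is a hypothetical helper, and the sample mean stands in for the maximizer of the pseudo-profile likelihood so that the example stays self-contained.

```python
import random
import statistics

def bootstrap_se(data, estimator, n_boot=500, seed=0):
    """Standard deviation of the estimator over n_boot resamples of the subjects."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Resample whole subjects, e.g. tuples (y_i, a_i, delta_i, x_i).
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)

# Stand-in example: the bootstrap standard error of a sample mean.
values = [float(i) for i in range(1, 101)]
se_mean = bootstrap_se(values, lambda d: sum(d) / len(d))
```

For the proposed method, `data` would hold one record per subject and `estimator` would return β̂ from maximizing ℓ(β) on the resampled records.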

2.4. Efficiency considerations

To investigate the potential efficiency gains in the proposed pseudo-profile likelihood estimator, we first consider the case that Λ is parameterized by a vector of q × 1 parameters ν, that is, Λ(t) = Λ(t, ν). For model identifiability, we assume without loss of generality that E(X) = 0. Define the log truncation likelihood function ℓT = log(ℒT). The proposed method is equivalent to solving the system of estimating equations ∂ℓT/∂β + ∂ℓM/∂β = 0 and ∂ℓT/∂ν = 0. Let η = (∂ℓT/∂β, ∂ℓT/∂ν, ∂ℓM/∂β) be a vector of score functions. Define

$$\begin{pmatrix}a_{11}&a_{12}\\ a_{12}'&a_{22}\end{pmatrix}=-E\begin{pmatrix}\partial^2\ell_T/\partial\beta'\partial\beta&\partial^2\ell_T/\partial\beta'\partial\nu\\ \partial^2\ell_T/\partial\beta\,\partial\nu'&\partial^2\ell_T/\partial\nu'\partial\nu\end{pmatrix},$$

and let b1 = −E(∂²ℓM/∂β′∂β) and b2 = −E(∂²ℓM/∂β′∂ν). Denote ν* = (β, ν). Then the optimal linear combination of estimating functions is

$$-E\left(\frac{\partial\eta}{\partial\nu^*}\right)\operatorname{var}(\eta)^{-1}\eta=\begin{pmatrix}a_{11}&a_{12}&b_1\\ a_{12}'&a_{22}&b_2'\end{pmatrix}\begin{pmatrix}a_{11}&a_{12}&0\\ a_{12}'&a_{22}&0\\ 0&0&b_1\end{pmatrix}^{-1}\eta=\begin{pmatrix}I_p&0&I_p\\ 0&I_q&b_2'b_1^{-1}\end{pmatrix}\eta,\tag{3}$$

where, for convenience, 0 denotes a matrix of 0s of appropriate dimensions and Ip is a p × p identity matrix. It can be verified that, when evaluated at the true parameter values, b2 = n/2 × E(X exp(2β′X) × ∂/∂ν[Λ(A)² − E{Λ(A) | X}²]). Hence, if b2 = 0, the system of estimating equations ∂ℓT/∂β + ∂ℓM/∂β = 0 and ∂ℓT/∂ν = 0 is the optimal linear combination of estimating equations based on η. The partial likelihood method solves the system of estimating equations ∂ℓT/∂β = 0 and ∂ℓT/∂ν = 0, which also belongs to the class of linear combinations of estimating equations based on η. Thus, the proposed method is more efficient than the partial likelihood method when b2 = 0.
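The last equality in (3) can be checked by block multiplication. As a sketch, write A for the matrix with blocks a11, a12, a12′ and a22, write B for the stacked blocks b1 and b2′ (both symbols are shorthand introduced here), and take the block-diagonal form of var(η) displayed in (3) as given:

$$\begin{pmatrix}A&B\end{pmatrix}\begin{pmatrix}A&0\\ 0&b_1\end{pmatrix}^{-1}=\begin{pmatrix}AA^{-1}&Bb_1^{-1}\end{pmatrix}=\begin{pmatrix}I_p&0&I_p\\ 0&I_q&b_2'b_1^{-1}\end{pmatrix},\qquad B=\begin{pmatrix}b_1\\ b_2'\end{pmatrix}.$$

Applied to η, the first block row gives the sum of the β-scores from the truncation and marginal likelihoods, and the second gives the ν-score from the truncation likelihood plus b2′b1⁻¹ times the β-score from the marginal likelihood. The second row reduces to the ν-score alone when b2 = 0, which is the system solved by the proposed method.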

When the baseline hazard function λ is of infinite dimension, the proposed pseudo-profile likelihood method solves the system (van der Vaart, 1998, § 25.12)

$$\mathbb{P}_n\left(\partial\ell_T/\partial\beta+\partial\ell_M/\partial\beta\right)=0,\qquad\mathbb{P}_n\Psi h-P\Psi h=0\quad(h\in H),$$

where Ψ is the score operator (Begun et al., 1983) for Λ based on the log truncation likelihood ℓT = log ℒT and H is an infinite-dimensional class of directions h from which paths of one-dimensional submodels for Λ may approach the true parameter. We use ℙn to denote the empirical measure, and use P for the probability measure. Let L2(μ) denote the Hilbert space that contains square integrable functions with the inner product ⟨g, h⟩μ = ∫ g(u)h(u) dμ(u) for g, h ∈ L2(μ). It is easy to see that H ⊂ L2(Λ). Applying an argument similar to that in van der Vaart (1998, § 25.12.1), we can show that the score operator Ψ : L2(Λ) → L2(Pβ) for Λ is given by $\Psi(h)=\int_0^\tau h(u)\,dM(u)$. Let ℍ be a Hilbert space containing H. The adjoint operator Ψ* : L2(Pβ) → ℍ of Ψ, which satisfies $E\{\Psi(g)h\}=\int_0^\tau g(u)\Psi^*(h)(u)\,d\Lambda(u)$ for all g ∈ ℍ and h ∈ L2(Pβ), can be shown to be Ψ*(g)(t) = E{g dM(t)}/dΛ(t). It can be further shown that Ψ*Ψ(h)(t) = E{h(t) exp(β′X)I(Y ⩾ t ⩾ A)} and Ψ*(∂ℓT/∂β)(t) = nE{X exp(β′X)I(Y ⩾ t ⩾ A)} (Murphy & van der Vaart, 2000).

By an argument similar to the one above, we show in the Supplementary Material that the score operator Φ : L2(Λ) → L2(Pβ) for Λ based on the marginal likelihood ℒM is $\Phi(h)=-\int_0^\tau[h(u)\exp(\beta'X)\{I(A\geqslant u)-\mathrm{pr}(A\geqslant u\mid X)\}]\,d\Lambda(u)$. The adjoint operator Φ* : L2(Pβ) → ℍ of Φ can be shown to be Φ*(g)(t) = −E[g{I(A ⩾ t) − pr(A ⩾ t | X)} exp(β′X)]. Moreover, the adjoint operator Φ* satisfies $\Phi^*\Phi(h)(t)=E[\int_0^\tau h(u)\{I(A\geqslant u)-\mathrm{pr}(A\geqslant u\mid X)\}\,d\Lambda(u)\,\{I(A\geqslant t)-\mathrm{pr}(A\geqslant t\mid X)\}\exp(2\beta'X)]$ and Φ*(∂ℓM/∂β)(t) = nE[{I(A ⩾ t) − pr(A ⩾ t | X)}{Λ(A) − E(Λ(A) | X)}X exp(2β′X)].

Analogous to (3) for parametric models, the optimal combination of estimating functions based on the score operators ∂ℓT/∂β, ∂ℓM/∂β and Ψ is given by ℙn(∂ℓT/∂β + ∂ℓM/∂β) = 0 and $(\mathbb{P}_n-P)\Psi h+(b_2^*)^{T}b_1^{-1}\,\partial\ell_M/\partial\beta=0$, h ∈ H, where b1 = −E(∂²ℓM/∂β′∂β) and $b_2^*=\int_0^\tau\Phi^*(\partial\ell_M/\partial\beta)(t)\,d\Lambda(t)=nE([\Lambda(A)^2-E\{\Lambda(A)\mid X\}^2]X\exp(2\beta'X))$. Hence, if $b_2^*=0$, the proposed pseudo-profile likelihood method is the most efficient estimator in the class of linear combinations of estimating functions based on ∂ℓT/∂β, ∂ℓM/∂β and Ψ. The weight $(b_2^*)^{T}b_1^{-1}$ can be estimated by replacing Λ with Λ̂ in the corresponding empirical estimators. In general, solving the optimal combination of estimating equations is computationally intensive, and hence is impractical. Moreover, there is no guarantee that it works better than the proposed method for small samples.

3. Simulations and data analysis

3.1. Monte-Carlo simulations

We conducted simulations to assess the performance of the proposed methods. In each simulation, 2000 studies were generated, each with n = 400. The sampling time ξ was set to be 100, and the time of disease onset, W0, was simulated from a uniform distribution over [0, 100] to mimic the incidence of a stable disease. For each subject, we generated X10 from the Bernoulli distribution with pr(X10 = 1) = pr(X10 = 0) = 0.5 and generated X20 from the standard normal distribution. The survival time T0 was independently generated from one of the three models: (I) an exponential distribution with hazard function 2 exp(X10 + X20), (II) a Weibull distribution with hazard function 2t exp(X10 + X20) or (III) a Weibull distribution with hazard function 0.5(t − 2)² exp(X10 + X20). Thus, we simulated failure time distributions with constant, increasing and U-shaped hazards. To form a prevalent cohort of sample size n, realizations of (W0, T0, X10, X20) were generated repeatedly until there were n subjects satisfying the sampling constraint W0 + T0 ⩾ ξ. The time from enrolment ξ to loss to follow-up was generated from a uniform distribution so that the censoring rate was approximately 0, 30 and 50%.
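The data-generating mechanism for Scenario I can be sketched as follows. The function name `simulate_cohort` and the upper bound `cens_max` of the uniform censoring distribution are hypothetical: the text does not report the bounds that produce 30 and 50% censoring, so the value used below is a placeholder and would need tuning.

```python
import math
import random

def simulate_cohort(n=400, xi=100.0, cens_max=1.0, seed=1):
    """Generate one prevalent cohort under Scenario I (constant baseline hazard 2)."""
    rng = random.Random(seed)
    cohort = []
    while len(cohort) < n:
        w0 = rng.uniform(0.0, xi)                 # calendar time of disease onset
        x1 = 1 if rng.random() < 0.5 else 0       # Bernoulli(0.5) covariate
        x2 = rng.gauss(0.0, 1.0)                  # standard normal covariate
        t0 = rng.expovariate(2.0 * math.exp(x1 + x2))   # hazard 2 exp(x1 + x2)
        if w0 + t0 < xi:
            continue                              # failed before recruitment: unobserved
        a = xi - w0                               # truncation time
        c = rng.uniform(0.0, cens_max)            # placeholder censoring distribution
        y = min(t0, a + c)                        # follow-up time
        delta = 1 if t0 <= a + c else 0           # failure indicator
        cohort.append((y, a, delta, x1, x2))
    return cohort

cohort = simulate_cohort()
n_events = sum(rec[2] for rec in cohort)
print(f"n = {len(cohort)}, observed failures = {n_events}")
```

Only a small fraction of simulated subjects satisfy the sampling constraint, which is why the loop keeps drawing until n subjects have been retained.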

We compared the finite-sample performance of the proposed pseudo-profile likelihood method with those of the weighted estimating equation methods studied in Qin & Shen (2010) and of the popular partial likelihood method for truncated survival time data. By applying these methods to estimate the Cox model λ(t | X1, X2) = λ0(t) exp(β1X1 + β2X2), we evaluated the relative efficiency by comparing the bootstrap variance of the maximum partial likelihood estimator to that of the other methods. Table 1 summarizes the empirical bias, empirical standard error and the relative efficiency of these four estimation methods. All four estimators are close to their estimands. In the absence of censoring, the pseudo-profile likelihood method achieves an efficiency gain similar to that of the weighted estimating equation methods in Qin & Shen (2010). Overall, the relative efficiency of the proposed estimator increases with the censoring rate. When the censoring proportion reaches 50%, the pseudo-profile likelihood estimator yields a substantial improvement over the maximum partial likelihood estimator, with an efficiency gain greater than 50% in the exponential and Weibull cases, and an efficiency gain greater than 20% in the U-shaped hazard function scenario. In the presence of censoring, the proposed pseudo-profile method always outperforms its competitors. In some scenarios, weighted estimating equation methods fail to show improvement, as these methods only use covariate information from uncensored subjects.

Table 1.

Summary statistics for the estimated regression parameters under independent censoring

Proportion  Estimated       Partial       WEE-1       WEE-2      Profile
censored    coefficient    Bias    SE  Bias    SE  Bias    SE  Bias    SE     RE

Scenario I: λ0(t) = 2
0%          β̂1                6   133    −1    98    −1    98    −2    98   1.84
            β̂2                5    83     0    65     0    65    −2    65   1.60
30%         β̂1                6   151   −46   136     2   120     1   108   1.98
            β̂2                6    94   −57    90     2    82     4    77   1.50
50%         β̂1               10   171  −113   189    10   157    12   122   1.96
            β̂2                6   112  −125   115     2    98    10    90   1.53

Scenario II: λ0(t) = 2t
0%          β̂1                4   118     0    98     0    98    −1    97   1.45
            β̂2                2    74     0    63     0    63    −1    63   1.40
30%         β̂1                3   132   −21   137     9   121     3   105   1.58
            β̂2                5    89   −28    90     4    80     2    73   1.46
50%         β̂1                6   159  −101   206    10   154     5   118   1.82
            β̂2                8   100   −97   118     6    96     5    79   1.62

Scenario III: λ0(t) = 0.5(t − 2)²
0%          β̂1                6   112     5   104     5   104     5   104   1.17
            β̂2                6    69     5    65     5    65     5    64   1.14
30%         β̂1               10   134    10   143    10   130     9   122   1.21
            β̂2                7    82     0    85     3    80     4    77   1.14
50%         β̂1                9   151   −34   216     7   154     6   134   1.27
            β̂2                7    97   −35   123     7    98     5    88   1.20

Partial, the maximum partial likelihood estimator; WEE-1 and WEE-2, estimators derived by solving U1(β) = 0 and U2(β) = 0; Profile, the maximum pseudo-profile likelihood estimator; Bias and SE, empirical bias (×1000) and empirical standard deviation (×1000) of 2000 regression parameter estimates; RE, the empirical variance of the maximum partial likelihood estimator divided by that of the maximum pseudo-profile likelihood estimator.

In addition to better efficiency, another advantage of the proposed pseudo-profile likelihood method is that it does not involve estimation of the censoring distribution. When this distribution depends on the covariate, the estimating equation methods may yield biased estimation. For demonstration, we simulated survival time data under Model (II). The censoring times for subjects with observed covariates X1 = 1 and X2 < 0 were generated from an exponential distribution with mean 5 exp(−X2), while the censoring times for other subjects were generated from a uniform distribution. The overall censoring proportion was set at approximately 30 and 50%. As summarized in Table 2, the estimating equation-based methods yield biased estimators, while the bias of the pseudo-profile estimators remains small.

Table 2.

Empirical bias and standard error of the estimated regression parameters under covariate-dependent censoring

Proportion  Estimated       Partial       WEE-1       WEE-2      Profile
censored    coefficient    Bias    SE  Bias    SE  Bias    SE  Bias    SE     RE

30%         β̂1                7   132  −388   135  −127   119     2   105   1.56
            β̂2                8    87    48    84    30    79     4    73   1.44
50%         β̂1               10   166  −819   175  −252   161    −2   128   1.69
            β̂2                8   103    82   107    51    97     6    80   1.64

See Table 1 for abbreviations.

3.2. Analysis of Canadian Study of Health and Aging

In this section, we report the results of data analysis for a cohort of prevalent cases in one of the largest epidemiologic studies of dementia, the Canadian Study of Health and Aging. From February 1991 to May 1992, an extensive survey was carried out and a total of 1132 persons aged 65 and older with dementia were identified in this first phase of the study. For each study subject, a diagnosis of possible Alzheimer’s disease, probable Alzheimer’s disease, or vascular dementia was assigned, and the date of dementia onset was determined by interviewing care-givers. Information on mortality was collected between January 1996 and May 1997.

We considered a subset of the study data by excluding those with missing date of onset or missing dementia subtype classification. Moreover, as in Wolfson et al. (2001), those with observed survival times greater than or equal to 20 years were excluded because these subjects are considered unlikely to have Alzheimer’s disease or vascular dementia. As a result, a total of 807 dementia patients were included in our analysis. Among them 388 had a diagnosis of probable Alzheimer’s disease, 249 had possible Alzheimer’s disease and 170 had vascular dementia. In the second phase of the study, a total of 627 deaths were recorded, among whom 302 had a diagnosis of probable Alzheimer’s, 189 had possible Alzheimer’s and 136 had vascular dementia.

The stationarity assumption that the incidence of dementia is constant over time was found to be reasonably met for these data using the method suggested in Wang (1991). To compare the risk of death between different diagnoses, we fit a Cox proportional hazards model for the length-biased survival time data, with indicators of probable Alzheimer’s and vascular dementia as covariates. We applied the pseudo-profile likelihood method, the two weighted estimating equation methods in Qin & Shen (2010), and the partial likelihood method. The estimated regression coefficients are summarized in Table 3. The proposed method yields estimates of the regression parameters similar to those of the estimating equation methods, and the bootstrap standard errors of the proposed estimator are smaller than those of its competitors. The proposed pseudo-profile likelihood method estimates a significantly higher risk of death in patients with probable Alzheimer’s and those with vascular dementia. Specifically, as compared with patients with possible Alzheimer’s, the risk of death increased by 16% among those with probable Alzheimer’s and by 27% among those with vascular dementia. For β1, the variance ratio for the competitors to the proposed method is always at least 1.67. This suggests that if a competitor method were used in lieu of the proposed method, the study would need to recruit at least 760 more subjects to achieve the same precision.

Table 3.

Estimated regression coefficients for the Canadian study

            β1, probable Alzheimer’s                β2, vascular dementia
Method      Estimate     SE   95% CI                Estimate     SE   95% CI
Partial        0.030  0.089   (−0.142, 0.203)          0.113  0.109   (−0.103, 0.323)
EE-1           0.130  0.095   (−0.058, 0.312)          0.278  0.107   (0.070, 0.497)
EE-2           0.157  0.088   (−0.022, 0.328)          0.257  0.121   (0.038, 0.519)
Profile        0.150  0.068   (0.016, 0.278)           0.241  0.088   (0.066, 0.419)

Partial, maximum partial likelihood estimator; EE-1 and EE-2, estimators derived by solving U1(β) = 0 and U2(β) = 0; Profile, the pseudo-profile likelihood estimator; SE, the empirical standard deviation of 2000 regression parameter estimates.
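The percentage increases in risk and the variance ratio quoted in the text follow directly from the entries of Table 3. The short calculation below reproduces them from the rounded table values, so small rounding differences are to be expected; the variable names are illustrative.

```python
import math

# Profile estimates and standard errors from Table 3.
profile = {"probable": (0.150, 0.068), "vascular": (0.241, 0.088)}
# Standard errors of the competing estimators of beta_1 from Table 3.
competitor_se_beta1 = {"Partial": 0.089, "EE-1": 0.095, "EE-2": 0.088}

# Hazard ratio relative to possible Alzheimer's disease: exp(estimate).
hazard_ratio = {k: math.exp(est) for k, (est, _) in profile.items()}

# Variance ratio of each competitor to the profile estimator for beta_1.
se_profile_beta1 = profile["probable"][1]
variance_ratio = {m: (se / se_profile_beta1) ** 2
                  for m, se in competitor_se_beta1.items()}

for k, hr in hazard_ratio.items():
    print(f"{k}: hazard ratio {hr:.2f}")
for m, vr in variance_ratio.items():
    print(f"{m}: variance ratio {vr:.2f}")
```

The hazard ratios are exp(0.150) and exp(0.241), which correspond to the 16% and 27% increases, and the smallest variance ratio comes from EE-2, whose standard error for β1 is closest to that of the proposed estimator.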

4. Remark

The validity of the proposed method relies on the assumption of stable disease. When the stationarity assumption fails to hold, it is not uncommon that knowledge about the distribution of disease incidence can be obtained from other sources. If H denotes the distribution of the truncation time in the disease population, then the transformed survival time H(T0) is truncated by a uniformly distributed random variable. Thus, it follows from the fact that the Cox model is invariant under monotone transformation, that the regression coefficients in the Cox model can be consistently estimated by applying the proposed method to the transformed data {H(ai), H(yi), δi} (i = 1, …, n).
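The transformation step is mechanical once H is available. The sketch below assumes that H is known and increasing; the exponential form used for H and the toy records are hypothetical, and the covariate is simply carried through unchanged so that the transformed records can be passed to the estimation procedure.

```python
import math

def transform_records(records, H):
    """Map truncation and follow-up times through a known increasing function H.

    records: iterable of (a, y, delta, x); delta and x are passed through unchanged.
    """
    return [(H(a), H(y), delta, x) for a, y, delta, x in records]

# Hypothetical H for illustration only: an exponential distribution function.
def H_example(t):
    return 1.0 - math.exp(-0.2 * t)

records = [(0.5, 2.0, 1, 0), (1.5, 4.0, 0, 1), (0.2, 0.9, 1, 1)]
transformed = transform_records(records, H_example)
```

Because H is monotone, the ordering of the times and hence the risk sets are preserved, which is why the regression coefficients are unaffected by the transformation.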

Acknowledgments

The authors thank Professors Ian McDowell, Masoud Asgharian and Christina Wolfson for kindly sharing the Canadian Study of Health and Aging data. The core study was funded by the National Health Research and Development Program, Canada. Additional funding was provided by Pfizer Canada Incorporated through the Medical Research Council/Pharmaceutical Manufacturers Association of Canada Health Activity Program, Bayer Incorporated and the British Columbia Health Research Foundation. The authors also thank the referees, associate editor and editor for their comments which improved the presentation of this article.

Appendix.

Proofs

We begin by establishing the consistency of β̂. In view of the proof of van der Vaart (1998, Theorem 5.7), it suffices to show that, as n → ∞, $\sup_{\beta\in\Theta}|n^{-1}\ell(\beta)-\gamma(\beta)|\to 0$ almost surely and that β0 is the unique maximizer of γ(β) in a compact neighbourhood of β0.

We first show that, for sufficiently large n, ℓ(β) has similar local behaviour to ℓ̃(β) in the compact neighbourhood Θ. Because {exp(β′X)I(A ⩽ t ⩽ Y) : t ∈ [0, τ], β ∈ Θ} is Glivenko–Cantelli and the logarithmic transformation is monotone, log{S(0)(t, β)} − log{𝒮(0)(t, β)} converges to 0 uniformly over β ∈ Θ and t ∈ [0, τ]. Hence, $\sup_{\beta\in\Theta}|n^{-1}\ell_P(\beta)-n^{-1}\tilde\ell_P(\beta)|\to 0$ almost surely. Following the result that Λ̂β(t) converges to Λβ(t) uniformly over β ∈ Θ and t ∈ [0, τ], exp{−Λ̂β(t) exp(β′X)} converges to exp{−Λβ(t) exp(β′X)} uniformly over β ∈ Θ and t ∈ [0, τ]. Hence, μ̂β(x) converges to μβ(x) uniformly over β ∈ Θ. For a δn > 0 with δn → 0 as n → ∞, define the class ℱ = [f(t) = {g(t) − Λβ(t)} exp(β′X), where β ∈ Θ, g is nondecreasing and nonnegative and $\sup_{t\in[0,\tau]}|g(t)-\Lambda_\beta(t)|\leqslant\delta_n$]. Thus, by definition, $\sup_{f\in\mathcal{F}}|Pf|\leqslant\delta_n\times\sup_{\beta\in\Theta}|\exp(\beta'X)|$. Moreover, it follows from van der Vaart & Wellner (1996, Theorems 2.7.5 and 2.4.1) that ℱ is Glivenko–Cantelli. Hence, $\sup_{f\in\mathcal{F}}|\mathbb{P}_nf-Pf|\to 0$ almost surely. For a sufficiently large n, $|n^{-1}\sum_{i=1}^n\{\hat\Lambda_\beta(a_i)-\Lambda_\beta(a_i)\}\exp(\beta'x_i)|\leqslant\sup_{f\in\mathcal{F}}|Pf|+\sup_{f\in\mathcal{F}}|\mathbb{P}_nf-Pf|$. Thus, we show that $\sup_{\beta\in\Theta}|n^{-1}\sum_{i=1}^n\{\hat\Lambda_\beta(a_i)-\Lambda_\beta(a_i)\}\exp(\beta'x_i)|\to 0$ almost surely. By a similar argument, we can show that $\sup_{\beta\in\Theta}|n^{-1}\sum_{i=1}^n\{\hat\mu_\beta(x_i)-\mu_\beta(x_i)\}|\to 0$ almost surely, and hence $n^{-1}\ell_M(\beta)-n^{-1}\tilde\ell_M(\beta)\to 0$ uniformly over β ∈ Θ. Thus, ℓ(β) = ℓP(β) + ℓM(β) and ℓ̃(β) = ℓ̃P(β) + ℓ̃M(β) have similar local behaviour in Θ.

Next, because Θ is compact and the function $m(\beta)=\int_0^\tau[\beta'X-\log\{\mathcal{S}^{(0)}(u,\beta)\}]\,dN(u)-\Lambda_\beta(A)\exp(\beta'X)-\log\mu_\beta(X)$ is continuous and dominated by an integrable function, the class of functions {m(β) : β ∈ Θ} is Glivenko–Cantelli (van der Vaart, 1998, Example 19.8). It follows from a uniform law of large numbers (Pollard, 1990) that $\sup_{\beta\in\Theta}|n^{-1}\tilde\ell(\beta)-\gamma(\beta)|\to 0$ almost surely. Thus, $\sup_{\beta\in\Theta}|n^{-1}\ell(\beta)-\gamma(\beta)|\to 0$ almost surely as n → ∞.

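Below we prove that β0 is the unique maximizer in a neighbourhood of β0 by showing that ∂γ(β0)/∂β = 0 and ∂²γ(β0)/∂β′∂β is negative definite at β = β0. Following the fact that the partial score function has expectation zero when evaluated at β = β0 and that $E[\partial\{\Lambda_\beta(A)\exp(\beta'X)\}/\partial\beta\,|_{\beta=\beta_0}]=-E\{\mu_\beta(X)^{-1}\,\partial\mu_\beta(X)/\partial\beta\,|_{\beta=\beta_0}\}$ by double expectation, we can show that ∂γ(β)/∂β = 0 when β = β0. Write Sβ(u | x) = exp{−Λβ(u) exp(β′x)}. The second derivative of γ(β) is

$$\begin{aligned}
\frac{\partial^2\gamma(\beta)}{\partial\beta'\partial\beta}={}&E\left[\Delta\left\{\frac{\mathcal{S}^{(1)}(Y,\beta)^{\otimes2}}{\mathcal{S}^{(0)}(Y,\beta)^{2}}-\frac{\mathcal{S}^{(2)}(Y,\beta)}{\mathcal{S}^{(0)}(Y,\beta)}\right\}\right]&&\text{(A1)}\\
&-E\left[\frac{\partial^2}{\partial\beta'\partial\beta}\{\Lambda_\beta(A)\exp(\beta'X)\}\right]&&\text{(A2)}\\
&+E\left[\int_0^\tau\frac{S_\beta(u\mid X)}{\mu_\beta(X)}\frac{\partial^2}{\partial\beta'\partial\beta}\{\Lambda_\beta(u)\exp(\beta'X)\}\,du\right]&&\text{(A3)}\\
&-E\left(\int_0^\tau\frac{S_\beta(u\mid X)}{\mu_\beta(X)}\left[\frac{\partial}{\partial\beta}\{\Lambda_\beta(u)\exp(\beta'X)\}\right]^{\otimes2}du\right)&&\text{(A4)}\\
&+E\left(\left[\int_0^\tau\frac{S_\beta(u\mid X)}{\mu_\beta(X)}\frac{\partial}{\partial\beta}\{\Lambda_\beta(u)\exp(\beta'X)\}\,du\right]^{\otimes2}\right).&&\text{(A5)}
\end{aligned}$$

By applying the double expectation technique, it can be shown that (A2) + (A3) = 0 for β = β0. Moreover, by the Cauchy–Schwarz inequality, both (A1) and (A4) + (A5) are negative semidefinite. Hence, it follows from regularity condition (d) that ∂²γ(β)/∂β′∂β is negative definite at β = β0. Because the function γ(β) is continuous in β, there exists a compact neighbourhood Θ0 of β0 such that β0 is the unique maximizer of γ(β) in Θ0. This completes the proof of consistency.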
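The Cauchy–Schwarz step for (A4) + (A5) can be spelled out as follows. This is a sketch under the assumption, suggested by the limits of integration in (A3) to (A5), that $\mu_\beta(X)=\int_0^\tau S_\beta(u\mid X)\,du$, so that $w(u)=S_\beta(u\mid X)/\mu_\beta(X)$ is a probability density on [0, τ] for fixed X; the symbols w, g and c are shorthand introduced here. For any fixed vector c, put $g(u)=c'\,\partial\{\Lambda_\beta(u)\exp(\beta'X)\}/\partial\beta$. Then

$$c'\{(\mathrm{A4})+(\mathrm{A5})\}c=-E\left[\int_0^\tau w(u)g(u)^2\,du-\left\{\int_0^\tau w(u)g(u)\,du\right\}^2\right]\leqslant 0,$$

because the Cauchy–Schwarz inequality gives $\{\int_0^\tau wg\,du\}^2\leqslant\int_0^\tau w\,du\int_0^\tau wg^2\,du=\int_0^\tau wg^2\,du$. The bracketed term is the variance of g(U) when U has density w, which makes the sign immediate.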

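We now prove the asymptotic normality of the maximum pseudo-profile likelihood estimator. A Taylor series expansion yields $0=\partial\ell(\beta)/\partial\beta\,|_{\beta=\hat\beta}=\partial\ell(\beta)/\partial\beta\,|_{\beta=\beta_0}+\partial^2\ell(\beta)/\partial\beta'\partial\beta\,|_{\beta=\beta^*}(\hat\beta-\beta_0)$, where β* lies between β̂ and β0. Thus, by consistency of β̂, one has β* → β0 in probability and

$$n^{1/2}(\hat\beta-\beta_0)=\left\{-n^{-1}\frac{\partial^2\ell(\beta)}{\partial\beta'\partial\beta}\bigg|_{\beta=\beta_0}\right\}^{-1}\left\{n^{-1/2}\frac{\partial\ell(\beta)}{\partial\beta}\bigg|_{\beta=\beta_0}\right\}+o_p(1).$$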

In what follows, we show that

n1/2{M(β)β˜M(β)β}=n1/2i=1nβ[{Λ^β(ai)Λβ(ai)}exp(βxi)] (A6)
n1/2{i=1n1μ^β(xi)μ^β(xi)β1μβ(xi)μβ(xi)β} (A7)

has an asymptotic independent and identically distributed representation. Let H be the joint probability measure of (A, X) and let Ĥ be the corresponding empirical measure for H. Then the right-hand side of (A6) can be expressed as

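$$
\begin{aligned}
&-n^{1/2}\,\frac{\partial}{\partial\beta}\int\!\!\int_0^\tau\left[\left\{\frac{d\bar N(u)}{S^{(0)}(u,\beta)} - \frac{dF_u(u)}{\mathcal{S}^{(0)}(u,\beta)}\right\}\exp(\beta'x)\,I(u \le a \le \tau)\right]d\hat H(a,x)\\
&\quad= -n^{1/2}\,\frac{\partial}{\partial\beta}\int\!\!\int_0^\tau\left[\frac{d\bar N(u) - dF_u(u)}{\mathcal{S}^{(0)}(u,\beta)} - \frac{dF_u(u)}{\mathcal{S}^{(0)}(u,\beta)^2}\{S^{(0)}(u,\beta) - \mathcal{S}^{(0)}(u,\beta)\}\right]\exp(\beta'x)\,I(u \le a \le \tau)\,d\hat H(a,x) + o_p(1)\\
&\quad= -n^{-1/2}\sum_{i=1}^n \frac{\partial}{\partial\beta}\int\!\!\int_0^\tau\left[\left\{\frac{dN_i(u)}{\mathcal{S}^{(0)}(u,\beta)} - \frac{\exp(\beta'x_i)\,I(a_i \le u \le y_i)\,dF_u(u)}{\mathcal{S}^{(0)}(u,\beta)^2}\right\}\exp(\beta'x)\,I(u \le a \le \tau)\right]dH(a,x)\\
&\quad= n^{-1/2}\sum_{i=1}^n \phi_{1i}(\beta) + o_p(1).
\end{aligned}
$$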
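
The estimator $\hat\Lambda_\beta$ appearing in this expansion is a Breslow-type step function whose risk set at time $u$ contains the subjects with $a_i \le u \le y_i$. The following Python sketch illustrates that construction for a scalar covariate; the function name and the toy data are hypothetical and are not taken from the paper.

```python
import math

def breslow_left_truncated(a, y, delta, x, beta):
    """Breslow-type cumulative hazard with left-truncation risk sets.

    a: truncation times, y: observed times, delta: 1 if the failure was
    observed, x: scalar covariates, beta: scalar regression coefficient.
    Returns the sorted distinct failure times and the cumulative hazard
    just after each of them.
    """
    n = len(y)
    failure_times = sorted({y[i] for i in range(n) if delta[i] == 1})
    times, cum = [], []
    total = 0.0
    for u in failure_times:
        # Jump of the averaged counting process at u.
        d_nbar = sum(1 for i in range(n) if delta[i] == 1 and y[i] == u) / n
        # S^(0)(u, beta): average of exp(beta * x_i) over subjects at risk,
        # where "at risk at u" means a_i <= u <= y_i.
        s0 = sum(math.exp(beta * x[i]) for i in range(n) if a[i] <= u <= y[i]) / n
        total += d_nbar / s0
        times.append(u)
        cum.append(total)
    return times, cum

# Toy data (hypothetical): subject 2 enters the risk set only at time 1.5.
times, cum = breslow_left_truncated(
    a=[0.0, 1.5, 0.0], y=[1.0, 2.0, 3.0], delta=[1, 1, 1],
    x=[0.0, 0.0, 0.0], beta=0.0)
```

With `beta = 0` the jumps reduce to one over the size of the risk set, so the delayed entry of the second subject changes the first jump from 1/3 to 1/2.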

Next, applying the functional delta method, we have $n^{1/2}\{\hat\mu_\beta(x) - \mu_\beta(x)\} = n^{-1/2}\sum_{i=1}^n \psi_i(\beta,x) + o_p(1)$, where

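$$\psi_i(\beta,x) = -\int_0^\tau\!\!\int_0^u\left[S_\beta(u \mid x)\exp(\beta'x)\left\{\frac{dN_i(\upsilon)}{\mathcal{S}^{(0)}(\upsilon,\beta)} - \frac{dF_u(\upsilon)}{\mathcal{S}^{(0)}(\upsilon,\beta)^2}\exp(\beta'x_i)\,I(a_i \le \upsilon \le y_i)\right\}\right]du.$$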
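
For intuition about the functional being linearized, the sketch below evaluates a plug-in $\hat\mu_\beta(x)$ when the cumulative hazard is a step function. It assumes, consistently with (A3) but not stated in this appendix, that $\mu_\beta(x) = \int_0^\tau S_\beta(u \mid x)\,du$; the function name and inputs are hypothetical, and the covariate is taken to be scalar.

```python
import math

def mu_hat(jump_times, cum_hazard, beta, x, tau):
    """Integrate exp(-Lambda(u) * exp(beta * x)) over [0, tau].

    Lambda is a right-continuous step function: it equals 0 before the
    first jump and cum_hazard[k] from jump_times[k] onwards.
    jump_times must be increasing.
    """
    risk = math.exp(beta * x)
    total, left, level = 0.0, 0.0, 0.0
    for t, lam in zip(jump_times, cum_hazard):
        if t >= tau:
            break  # jumps at or beyond tau do not affect the integral
        # Survival is constant on [left, t), so add a rectangle.
        total += (t - left) * math.exp(-level * risk)
        left, level = t, lam
    # Final rectangle from the last jump before tau up to tau.
    total += (tau - left) * math.exp(-level * risk)
    return total

# Hypothetical step function with jumps at 1 and 2.
value = mu_hat([1.0, 2.0], [0.5, 1.0], beta=0.0, x=0.0, tau=3.0)
```

Because the integrand is piecewise constant, the integral is an exact finite sum, so $\hat\mu_\beta(x)$ is a smooth function of the jump sizes of $\hat\Lambda_\beta$, which is what makes a delta-method expansion natural.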

Thus, (A7) can be expressed as

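$$
\begin{aligned}
&-n^{1/2}\int_0^\tau\left[\frac{1}{\mu_\beta(x)}\left\{\frac{\partial\hat\mu_\beta(x)}{\partial\beta} - \frac{\partial\mu_\beta(x)}{\partial\beta}\right\} - \frac{1}{\mu_\beta(x)^2}\frac{\partial\mu_\beta(x)}{\partial\beta}\{\hat\mu_\beta(x) - \mu_\beta(x)\}\right]d\hat H(a,x) + o_p(1)\\
&\quad= -n^{-1/2}\sum_{i=1}^n\int_0^\tau\left[\frac{1}{\mu_\beta(x)}\left\{\frac{\partial\psi_i(\beta,x)}{\partial\beta} - \frac{\psi_i(\beta,x)}{\mu_\beta(x)}\frac{\partial\mu_\beta(x)}{\partial\beta}\right\}\right]dH(a,x) + o_p(1)\\
&\quad= n^{-1/2}\sum_{i=1}^n \phi_{2i}(\beta) + o_p(1).
\end{aligned}
$$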

Finally, applying the functional delta method, we can obtain the asymptotic representation for the partial score function: $n^{-1/2}\,\partial\ell_P(\beta_0)/\partial\beta = n^{-1/2}\sum_{i=1}^n \phi_{3i}(\beta_0) + o_p(1)$, where

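$$\phi_{3i}(\beta_0) = \int_0^\tau\left\{x_i - \frac{\mathcal{S}^{(1)}(u,\beta_0)}{\mathcal{S}^{(0)}(u,\beta_0)}\right\}\bigl\{dN_i(u) - \exp(\beta_0'x_i)\,I(a_i \le u \le y_i)\,d\Lambda_0(u)\bigr\}.$$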

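Let $\phi_i(\beta_0) = \phi_{1i}(\beta_0) + \phi_{2i}(\beta_0) + \phi_{3i}(\beta_0)$. We have $n^{-1/2}\,\partial\ell(\beta)/\partial\beta\,|_{\beta=\beta_0} = n^{-1/2}\sum_{i=1}^n \kappa_i(\beta_0) + o_p(1)$, where $\kappa_i(\beta_0) = \phi_i(\beta_0) - [\partial\{\Lambda_\beta(a_i)\exp(\beta'x_i)\}/\partial\beta + \mu_\beta(x_i)^{-1}\,\partial\mu_\beta(x_i)/\partial\beta]\,|_{\beta=\beta_0}$. Arguing as in the proof of consistency, we can show that, as $n \to \infty$, $n^{-1}\{\partial^2\ell(\beta)/\partial\beta'\partial\beta\}\,|_{\beta=\beta_0} \to \{\partial^2\gamma(\beta)/\partial\beta'\partial\beta\}\,|_{\beta=\beta_0}$ almost surely. Define $\Gamma(\beta_0) = E\{\kappa_i(\beta_0)^{\otimes 2}\}$. Hence, under the regularity conditions, as $n \to \infty$, $n^{1/2}(\hat\beta - \beta_0)$ converges weakly to a zero-mean multivariate normal distribution with variance-covariance matrix $\{\partial^2\gamma(\beta_0)/\partial\beta'\partial\beta\}^{-1}\,\Gamma(\beta_0)\,\{\partial^2\gamma(\beta_0)/\partial\beta'\partial\beta\}^{-1}$.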
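
The limiting variance has the familiar sandwich form. As a purely illustrative sketch, the code below computes a sample analogue from hypothetical inputs: a matrix whose rows stand in for estimates of $\kappa_i(\beta_0)$ and a matrix standing in for an estimate of $\partial^2\gamma(\beta_0)/\partial\beta'\partial\beta$. This plug-in recipe is an assumption made for illustration, not necessarily the variance estimator used in the paper.

```python
import numpy as np

def sandwich_variance(kappa, hessian):
    """Plug-in sandwich matrix H^{-1} Gamma H^{-1}.

    kappa:   (n, p) array; row i is a hypothetical estimate of kappa_i(beta_0).
    hessian: (p, p) array; a hypothetical estimate of the second derivative
             of gamma at beta_0.
    """
    kappa = np.asarray(kappa, dtype=float)
    hessian = np.asarray(hessian, dtype=float)
    n = kappa.shape[0]
    # Gamma is the uncentred second moment E{kappa_i^{(x)2}}.
    gamma = kappa.T @ kappa / n
    h_inv = np.linalg.inv(hessian)
    return h_inv @ gamma @ h_inv

# Hypothetical inputs with p = 2 and n = 4.
kappa = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
hessian = [[-2.0, 0.0], [0.0, -4.0]]
avar = sandwich_variance(kappa, hessian)
```

Dividing the returned matrix by $n$ would give an approximate variance for $\hat\beta$ itself, since the limit theorem is stated for $n^{1/2}(\hat\beta - \beta_0)$.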

Supplementary material

Supplementary material available at Biometrika online includes the derivation of the score operator Φ and the adjoint operator Φ* based on the marginal likelihood function M.

References

  1. Begun JM, Hall WJ, Huang W-M, Wellner JA. Information and asymptotic efficiency in parametric-nonparametric models. Ann Statist. 1983;11:432–52.
  2. Cox DR. Regression models and life-tables (with discussion). J R Statist Soc B. 1972;34:187–220.
  3. de Una-Alvarez J, Otero-Giraldez M, Alvarez-Llorente G. Estimation under length-bias and right-censoring: an application to unemployment duration analysis for married women. J Appl Statist. 2003;30:283–91.
  4. Ghosh D. Proportional hazards regression for cancer studies. Biometrics. 2008;64:141–8. doi: 10.1111/j.1541-0420.2007.00830.x.
  5. Gong G, Samaniego FJ. Pseudo maximum likelihood estimation: theory and applications. Ann Statist. 1981;9:861–9.
  6. Kalbfleisch JD, Lawless JF. Regression models for right truncated data with applications to AIDS incubation times and reporting lags. Statist Sinica. 1991;1:19–32.
  7. Lagakos S, Barraj L, De Gruttola V. Nonparametric analysis of truncated survival data, with application to AIDS. Biometrika. 1988;75:515–23.
  8. Lancaster T. Econometric methods for the duration of unemployment. Econometrica. 1979;47:939–56.
  9. Lancaster T. The Econometric Analysis of Transition Data. Cambridge, UK: Cambridge University Press; 1990.
  10. Murphy SA, van der Vaart AW. On profile likelihood. J Am Statist Assoc. 2000;95:449–65.
  11. Pepe MS, Fleming TR. A nonparametric method for dealing with mismeasured covariate data. J Am Statist Assoc. 1991;86:108–13.
  12. Pollard D. Empirical Processes: Theory and Applications. Hayward, CA: Institute of Mathematical Statistics; 1990.
  13. Qin J, Shen Y. Statistical methods for analyzing right-censored length-biased data under Cox model. Biometrics. 2010;66:382–92. doi: 10.1111/j.1541-0420.2009.01287.x.
  14. Severini TA, Wong WH. Profile likelihood and conditionally parametric models. Ann Statist. 1992;20:1768–802.
  15. Tsai W-Y. Pseudo-partial likelihood for proportional hazards models with biased-sampling data. Biometrika. 2009;96:601–15. doi: 10.1093/biomet/asp026.
  16. van der Vaart AW. Asymptotic Statistics. Cambridge: Cambridge University Press; 1998.
  17. van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. New York: Springer; 1996.
  18. Wang M-C. Nonparametric estimation from cross-sectional survival data. J Am Statist Assoc. 1991;86:130–43.
  19. Wang M-C, Brookmeyer R, Jewell N. Statistical models for prevalent cohort data. Biometrics. 1993;49:1–11.
  20. Wolfson C, Wolfson DB, Asgharian M, M’Lan CE. A reevaluation of the duration of survival after the onset of dementia. New Engl J Med. 2001;344:1111–6. doi: 10.1056/NEJM200104123441501.
  21. Zelen M. Forward and backward recurrence times and length biased sampling: age specific models. Lifetime Data Anal. 2004;10:325–34. doi: 10.1007/s10985-004-4770-1.
  22. Zelen M, Feinleib M. On the theory of screening for chronic diseases. Biometrika. 1969;56:601–14.
  23. Zucker DM. A pseudo-partial likelihood method for semiparametric survival regression with covariate errors. J Am Statist Assoc. 2005;100:1264–77.

Articles from Biometrika are provided here courtesy of Oxford University Press
