Abstract
This paper considers semiparametric estimation of the Cox proportional hazards model for right-censored and length-biased data arising from prevalent sampling. To exploit the special structure of length-biased sampling, we propose a maximum pseudo-profile likelihood estimator, which can handle time-dependent covariates and is consistent under covariate-dependent censoring. Simulation studies show that the proposed estimator is more efficient than its competitors. A data analysis illustrates the methods and theory.
Keywords: Approximate likelihood, Cross-sectional sampling, Product-limit estimator, Random truncation, Screening trials
1. Introduction
When studying the natural history of a disease, the time from disease onset to an event or failure is usually the focus. An incident cohort approach, which studies initially disease-free subjects from disease onset to failure, can be very inefficient, especially if the disease is uncommon. A prevalent sampling design, which only includes diseased subjects who have not experienced the failure event at the time of recruitment, can be much more efficient. However, the observed survival time is subject to left truncation: those who have experienced the failure event before the recruitment time are not observable. Thus, individuals in the prevalent cohort tend to have slower progression of the disease than those in a typical incident study. As a result, statistical methods such as the Kaplan–Meier estimator that fail to account for left truncation can lead to substantial overestimation of the survival time.
In the case of stable disease, that is, when the occurrence of disease onset follows a stationary Poisson process, the survival time in the prevalent cohort is a biased sample of that in the incident population, where the sampling weight is proportional to the length of the survival time. Similarly, the truncation time, from disease onset to recruitment, in the prevalent cohort is also a biased sample of the uniform truncation time in the incident population, and its distribution is related to the underlying survival distribution in a known fashion. We use the term length-biased sampling for left truncation under the assumption of stationary disease incidence. Examples of length-biased sampling include cancer screening trials (Zelen & Feinleib, 1969; Zelen, 2004), HIV prevalent cohort studies (Lagakos et al., 1988) and studies of unemployment duration (Lancaster, 1979; de Una-Alvarez et al., 2003).
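The effect of length-biased sampling can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings that are not taken from the paper: exponential survival times with rate 1, onset times uniform on [0, 100], a sampling time of 100, and a hypothetical helper name `sample_prevalent`. Under these assumptions the incident mean survival time is 1, whereas the length-biased mean E(T0²)/E(T0) is 2.

```python
import random

def sample_prevalent(n, xi=100.0, rate=1.0, seed=1):
    """Draw n (truncation time, survival time) pairs from a prevalent cohort.

    Assumed set-up (illustrative only): onset times W0 are uniform on [0, xi]
    and survival times T0 are exponential with the given rate. A subject is
    kept only if the failure has not occurred by the sampling time xi.
    """
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        w0 = rng.uniform(0.0, xi)
        t0 = rng.expovariate(rate)
        a = xi - w0              # time from onset to sampling
        if t0 >= a:              # still event-free at the sampling time
            kept.append((a, t0))
    return kept

cohort = sample_prevalent(4000)
mean_t = sum(t for _, t in cohort) / len(cohort)   # roughly 2 rather than 1
mean_a = sum(a for a, _ in cohort) / len(cohort)   # roughly 1
print(round(mean_t, 2), round(mean_a, 2))
```

The sample mean of the survival times in the prevalent cohort is close to twice the incident mean, which is the overestimation that a naive analysis would inherit.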
This paper focuses on semiparametric estimation of the Cox proportional hazards model for right-censored survival data under length-biased sampling. Intuitively, efficient estimation can be achieved by maximizing the full semiparametric likelihood with respect to the regression parameter and the baseline hazard function. The maximum likelihood approach, however, involves high-dimensional maximization, and hence may cause computational concerns for large sample sizes. Estimation of a finite-dimensional parameter in the presence of an infinite-dimensional nuisance parameter has been studied by a number of authors. In particular, Severini & Wong (1992) and Zucker (2005) generalized the profile likelihood method by replacing the nuisance parameters in the full likelihood or the partial likelihood with a consistent estimator that may depend on the parameter of interest. In this paper, we follow their idea to propose a semiparametric estimation procedure for the Cox model under length-biased sampling. Specifically, we replace the hazard function in the full likelihood with a Breslow-type estimator for the hazard function to obtain a pseudo-profile likelihood function. Thus, a consistent estimator of the regression parameters can be easily derived by maximizing the pseudo-profile likelihood. Unlike other bias-adjusted risk-set methods, including Ghosh (2008), Tsai (2009) and Qin & Shen (2010), the proposed estimation procedure does not involve estimation of the censoring distribution, so it is expected to be more stable when the censoring proportion is high.
2. Model and estimation methods
2.1. Data and model set-up
For subjects in the target disease population, let T0 denote the time from the disease incidence to the failure event of interest, W0 denote the calendar time of the disease incidence and X0 denote a p × 1 vector of covariates. Assume that the sampling time, ξ, is independent of (W0, T0, X0). An individual qualifies to be sampled at time ξ only if T0 + W0 ⩾ ξ ⩾ W0, that is, if the disease onset has occurred but the failure event has not. Denote by (W, T, X) the random variables from the prevalent population, and let A = ξ − W denote the truncation time, that is, the time from disease onset to recruitment. The probability distribution of (W, T, X) is the same as the probability distribution of (W0, T0, X0) conditional on T0 + W0 ⩾ ξ ⩾ W0.
In practice, the observation of failure time T in the prevalent cohort is subject to right censoring due to the study ending or premature dropout. The censoring time measured from recruitment, C, is usually assumed to be independent of (T, A) given X. However, the total censoring time A + C and the survival time T are correlated, as they share the same A. Let Y = min(T, A + C) denote the follow-up time until failure or censoring, and let Δ = I (T ⩽ A + C) be the indicator of failure. For subject i ∈ {1, …, n}, denote by xi the covariate vector, by yi and ai the observed survival time and truncation time, and by δi the indicator of an uncensored event time. The observed data (yi, ai, δi, xi) for i = 1, …, n are assumed to be independent and identically distributed realizations of (Y, A, Δ, X).
Denote by f(t | x) and S(t | x) the conditional density function and survival function of T0 given X0 = x, evaluated at t, and let $\mu(x) = \int_0^\infty t f(t \mid x)\,dt = \int_0^\infty S(t \mid x)\,dt$ be the conditional mean of T0 given X0 = x. We impose the following conditions on the incident population random variables.
Assumption 1. The variable (T0, X0) is independent of when the disease incidence occurs, W0.
Assumption 2. Disease incidence occurs over calendar time at a constant rate.
Under Assumptions 1 and 2, the joint density function of (A, T) given X = x evaluated at (a, t) is $f(t \mid x)\,\mu(x)^{-1} I(t > a > 0)$ (Lancaster, 1990, Ch. 3), and the survival time T given X = x has the length-biased density function $t f(t \mid x)\,\mu(x)^{-1}$.
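As a worked step, the marginal densities implied by this joint density can be obtained by integrating out one argument at a time. The subscripted density names below are introduced here for illustration only.

```latex
\begin{align*}
f_{T \mid X}(t \mid x) &= \int_0^t \frac{f(t \mid x)}{\mu(x)}\,da
  = \frac{t\, f(t \mid x)}{\mu(x)}, \\
f_{A \mid X}(a \mid x) &= \int_a^\infty \frac{f(t \mid x)}{\mu(x)}\,dt
  = \frac{S(a \mid x)}{\mu(x)}, \\
f_{A \mid T, X}(a \mid t, x) &= \frac{f(t \mid x)/\mu(x)}{t\, f(t \mid x)/\mu(x)}
  = \frac{1}{t}, \qquad 0 < a < t .
\end{align*}
```

The first line is the length-biased density, the second shows that the distribution of the truncation time depends on the survival function, and the third shows that A given T = t is uniform on (0, t).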
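We assume that the survival time T0 in the incident population follows the Cox (1972) proportional hazards model λ(t | x) = λ(t) exp(β′x), where λ(t) is an unspecified, continuous baseline hazard function and β is a p × 1 vector of regression parameters. Let $\Lambda(t) = \int_0^t \lambda(u)\,du$ be the cumulative baseline hazard function. Under Assumptions 1 and 2 and the independence of C and (T, A) given X, the full likelihood function is proportional to
$$\mathcal{L}(\beta, \Lambda) = \prod_{i=1}^{n} \frac{\{\lambda(y_i)\exp(\beta' x_i)\}^{\delta_i} \exp\{-\Lambda(y_i)\exp(\beta' x_i)\}}{\int_0^\infty \exp\{-\Lambda(u)\exp(\beta' x_i)\}\,du}. \tag{1}$$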
2.2. Brief review of existing methods
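The likelihood (1) can be re-expressed as the product of the truncation likelihood conditional on A and the marginal likelihood of A:
$$\mathcal{L}(\beta, \Lambda) = \mathcal{L}_T(\beta, \Lambda) \times \mathcal{L}_M(\beta, \Lambda), \qquad \mathcal{L}_T(\beta, \Lambda) = \prod_{i=1}^{n} \frac{f(y_i \mid x_i)^{\delta_i} S(y_i \mid x_i)^{1-\delta_i}}{S(a_i \mid x_i)}, \qquad \mathcal{L}_M(\beta, \Lambda) = \prod_{i=1}^{n} \frac{S(a_i \mid x_i)}{\mu(x_i)}.$$
Written in this way, we see that there is information about the regression parameter β in ℒM(β, Λ). The truncation likelihood ℒT can be further decomposed as the product of the partial likelihood (Kalbfleisch & Lawless, 1991)
$$\mathcal{L}_P(\beta) = \prod_{i=1}^{n} \left\{ \frac{\exp(\beta' x_i)}{\sum_{j=1}^{n} \exp(\beta' x_j)\, I(a_j \le y_i \le y_j)} \right\}^{\delta_i}$$
and the residual likelihood ℒR(β, Λ). Wang et al. (1993) showed that ℒP is fully efficient with respect to ℒT. However, under length-biased sampling the maximum partial likelihood estimator is expected to be inefficient, because it ignores information in ℒM(β, Λ).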
Various methods that better exploit the special structure of length-biased survival data have been proposed in the literature. Let G(t) be the survival function of the censoring time C, and let Ĝ (t) be the Kaplan–Meier estimator of G(t) based on {(yi − ai, 1 − δi) : i = 1, …, n}. Qin & Shen (2010) proposed to solve a weighted estimating equation, U1(β) = 0, in which the contribution of a subject in the risk set is inversely weighted by the probability of the subject being sampled and uncensored. This estimating method, however, might be unstable as the weight function involves estimation of the tail probability of the censoring distribution. As an alternative, Qin & Shen (2010) considered solving a second estimating equation, U2(β) = 0, with $\hat w_c(y) = \int_0^y \hat G(u)\,du$. The weight function ŵc(yj)−1 is the reciprocal of the integral of the censoring survival function, which is more stable than the weight function in U1. A major restriction of the two estimating equation-based methods is that the censoring time must not depend on the covariates. Moreover, the estimating equations only use covariate information from uncensored individuals, suggesting that there is still room for efficiency gains.
2.3. Maximum pseudo-profile likelihood estimator
The maximum likelihood estimator could be obtained by applying the semiparametric profile likelihood method (Murphy & van der Vaart, 2000) to deal with the nuisance parameter Λ. For length-biased sampling data, however, maximizing ℒ with respect to Λ for fixed β is computationally difficult because ℒ involves Λ in a complicated way. Instead of profiling out the nonparametric component Λ in ℒ, we propose to replace Λ(t) with a simple estimate that is consistent and has an $n^{1/2}$-convergence rate. This approach has been used in various contexts under various names, including pseudo- and estimated-likelihood estimation (Gong & Samaniego, 1981; Pepe & Fleming, 1991; Severini & Wong, 1992; Zucker, 2005).
Our simple estimate is based on profiling the truncation likelihood ℒT(β, Λ). Specifically, for fixed β, the truncation likelihood ℒT(β, Λ) is maximized by the Breslow-type estimator
$$\hat\Lambda_\beta(t) = \sum_{i=1}^{n} \frac{\delta_i\, I(y_i \le t)}{\sum_{j=1}^{n} \exp(\beta' x_j)\, I(a_j \le y_i \le y_j)}$$
in the class of nondecreasing right-continuous functions which jump only at uncensored failure times. Note that Λ̂β(t) can be generalized to handle time-varying covariates. Profiling out Λ from the truncation likelihood ℒT(β, Λ) yields the partial likelihood, that is, ℒT(β, Λ̂β) = ℒP(β). Replacing Λ with Λ̂β in the full likelihood ℒ, we obtain a pseudo-profile likelihood function,
$$\mathcal{L}(\beta, \hat\Lambda_\beta) = \mathcal{L}_P(\beta) \times \mathcal{L}_M(\beta, \hat\Lambda_\beta).$$
We propose to estimate the regression parameter β by maximizing the pseudo-profile likelihood.
Assume that T0, and hence T, has a finite maximal support τ, where τ = sup{t : pr(T0 ⩽ t) < 1} < ∞. Then τ is also the maximal support for the truncation time random variable A, as A given T has a uniform distribution on [0, T]. We further assume that C is not degenerate at 0, that is, pr(C > 0) > 0. Then it can be shown that max Δi Yi → τ as n → ∞. Thus, Λ(t) is estimable on the interval [0, τ]; as a result, the conditional mean of T0 given X is also estimable. Let Ni (t) = δiI (yi ⩽ t) be the counting process of observed failure events for subject i, and denote $\bar N(t) = n^{-1} \sum_{i=1}^{n} N_i(t)$ and Fu(t) = pr(Δ = 1, Y ⩽ t). Define the functions $S^{(k)}(u, \beta) = n^{-1} \sum_{i=1}^{n} x_i^{\otimes k} \exp(\beta' x_i)\, I(a_i \le u \le y_i)$, and let 𝒮(k)(u, β) = E{X⊗k exp(β′X)I (A ⩽ u ⩽ Y)} be their expectations. Assume that X is bounded. Then the two classes of functions {ΔI (Y ⩽ t) : t ∈ [0, τ]} and {X⊗k exp(β′ X)I (A ⩽ t ⩽ Y) : t ∈ [0, τ], β ∈ Θ} are both Glivenko–Cantelli, because the class of indicator functions and the class of bounded monotone functions are Glivenko–Cantelli (van der Vaart & Wellner, 1996, Theorems 2.4.1 and 2.7.5). Moreover, because S(0)(t, β) is bounded away from zero, we can show that supt∈[0,τ],β∈Θ | Λ̂β(t) − Λβ(t) |→ 0 almost surely as n → ∞, where
$$\Lambda_\beta(t) = \int_0^t \frac{dF_u(u)}{\mathcal{S}^{(0)}(u, \beta)}. \tag{2}$$
The limit Λβ(t) of Λ̂β(t) defines a smooth mapping in β, and it passes through the true baseline cumulative hazard function Λ(t) when β equals the true parameter value. If we regard (2) as a known function of β, the function ℒ(β, Λβ) can be viewed as the full likelihood function derived under an induced parametric submodel λ(t | x) = λβ(t) exp(β′ X).
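Replacing Λ with Λ̂β in ℒ(β, Λ), we obtain a log pseudo-profile likelihood function ℓ(β) = ℓP(β) + ℓM(β), where
$$\ell_P(\beta) = \sum_{i=1}^{n} \delta_i \left[ \beta' x_i - \log S^{(0)}(y_i, \beta) \right]$$
is, up to an additive constant, the log partial likelihood obtained by profiling out Λ from the truncation likelihood ℒT, and
$$\ell_M(\beta) = -\sum_{i=1}^{n} \left[ \hat\Lambda_\beta(a_i) \exp(\beta' x_i) + \log \hat\mu_\beta(x_i) \right]$$
with $\hat\mu_\beta(x) = \int_0^\tau \exp\{-\hat\Lambda_\beta(u) \exp(\beta' x)\}\,du$. We show in the Appendix that, in a compact neighbourhood of the true regression parameter, ℓ(β) can be approximated by ℓ̃(β) = ℓ̃P(β) + ℓ̃M(β), where $\tilde\ell_P(\beta) = \sum_{i=1}^{n} \delta_i [\beta' x_i - \log \mathcal{S}^{(0)}(y_i, \beta)]$, $\tilde\ell_M(\beta) = -\sum_{i=1}^{n} [\Lambda_\beta(a_i) \exp(\beta' x_i) + \log \mu_\beta(x_i)]$ and $\mu_\beta(x) = \int_0^\tau \exp\{-\Lambda_\beta(u) \exp(\beta' x)\}\,du$. Thus, ℓ and ℓ̃ have similar local behaviour in the compact neighbourhood, and the asymptotic properties of the maximum pseudo-profile likelihood estimator can be investigated through ℓ̃.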
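To make the construction concrete, the following is a minimal sketch of how the log pseudo-profile likelihood might be evaluated for time-independent covariates. It is not the authors' implementation. It assumes that ℓP is the log partial likelihood with left-truncated risk sets {j : aj ⩽ yi ⩽ yj}, that ℓM(β) = −Σ[Λ̂β(ai) exp(β′xi) + log μ̂β(xi)], and that τ is replaced by the largest uncensored failure time; the function name `pseudo_profile_loglik` is hypothetical. At least one uncensored observation is required.

```python
import bisect
import math

def pseudo_profile_loglik(beta, y, a, delta, x):
    """Evaluate l_P(beta) + l_M(beta), up to an additive constant.

    y, a, delta: follow-up times, truncation times and failure indicators.
    x: list of covariate vectors; beta: list of regression coefficients.
    """
    n = len(y)
    risk = [math.exp(sum(b * v for b, v in zip(beta, xi))) for xi in x]

    # Breslow-type estimator: one jump per uncensored time y_i, using the
    # left-truncated risk set {j : a_j <= y_i <= y_j}.
    event_order = sorted((i for i in range(n) if delta[i] == 1),
                         key=lambda i: y[i])
    times, cumhaz = [], []
    total, log_partial = 0.0, 0.0
    for i in event_order:
        s0 = sum(risk[j] for j in range(n) if a[j] <= y[i] <= y[j])
        total += 1.0 / s0
        times.append(y[i])
        cumhaz.append(total)
        log_partial += math.log(risk[i]) - math.log(s0)

    def cum_hazard(t):
        # Right-continuous step function evaluated at t.
        k = bisect.bisect_right(times, t)
        return cumhaz[k - 1] if k > 0 else 0.0

    def mean_survival(r):
        # Integral of exp{-cum_hazard(u) * r} over [0, largest event time]
        # (assumption: tau is taken to be the largest uncensored time).
        area, prev_t, prev_h = 0.0, 0.0, 0.0
        for t, h in zip(times, cumhaz):
            area += (t - prev_t) * math.exp(-prev_h * r)
            prev_t, prev_h = t, h
        return area

    log_marginal = -sum(
        cum_hazard(a[i]) * risk[i] + math.log(mean_survival(risk[i]))
        for i in range(n)
    )
    return log_partial + log_marginal

# Toy data (hypothetical): three uncensored subjects, no truncation delay.
toy_value = pseudo_profile_loglik(
    [0.0], [1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [1, 1, 1], [[0.0], [1.0], [0.0]]
)
print(toy_value)
```

In practice this function would be passed, with its sign reversed, to a numerical optimizer over β; the quadratic-time risk-set sums are kept for readability rather than speed.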
Define the limit function γ(β) = limn→∞ n−1ℓ̃(β) = limn→∞n−1 {ℓ̃P(β) + ℓ̃M(β)}. We denote the true parameter values of the proportional hazards model by {β0, λ0(·)}, and define . Theorem 1 summarizes the consistency and asymptotic normality of β̂ that maximizes the log pseudo-profile likelihood function ℓ(β), with proofs given in the Appendix.
Theorem 1. Assume the following conditions hold: (a) β0 lies in the interior of a known compact set Θ in ℝp; (b) X is bounded; (c) pr(Y ⩾ t) is a continuous function for t ∈ [0, τ]; and (d) $\partial^2\gamma(\beta_0)/\partial\beta'\partial\beta$ is nonsingular. Then β̂ → β0 in probability as n → ∞. Moreover, $n^{1/2}(\hat\beta - \beta_0)$ converges in distribution to a zero-mean multivariate normal distribution with variance-covariance matrix Σ(β0), where Σ(β0) is specified in the Appendix.
While the asymptotic variance ∑(β0) may be estimated by its empirical version, the computation is quite complicated. Since we have established the asymptotic normality, it is computationally more convenient to use the bootstrap method. The performance of the proposed estimator is evaluated in § 3 via simulations.
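A nonparametric bootstrap of this kind can be sketched as follows. This is a generic illustration rather than the authors' code: subjects are resampled with replacement as whole records (yi, ai, δi, xi), and `estimator` stands for any function, assumed here, that maps a data set to a scalar estimate. The sample mean is used below only so that the example runs on its own.

```python
import random
import statistics

def bootstrap_se(data, estimator, n_boot=200, seed=0):
    """Bootstrap standard error of a scalar estimator.

    data: list of per-subject records, resampled with replacement.
    estimator: callable taking a list of records and returning a float.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)

# Stand-in example: the standard error of a sample mean.
values = [float(i) for i in range(100)]
se_mean = bootstrap_se(values, statistics.mean, n_boot=500)
print(round(se_mean, 2))
```

For a vector-valued β̂ the same resampling loop can be applied to each component, or the empirical covariance matrix of the bootstrap replicates can be used.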
2.4. Efficiency considerations
To investigate the potential efficiency gains in the proposed pseudo-profile likelihood estimator, we first consider the case that Λ is parameterized by a vector of q × 1 parameters ν, that is, Λ(t) = Λ(t, ν). For model identifiability, we assume without loss of generality that E(X) = 0. Define the log truncation likelihood function ℓT = log(ℒT). The proposed method is equivalent to solving the system of estimating equations ∂ℓT/∂β + ∂ℓM/∂β = 0 and ∂ℓT/∂ν = 0. Let η = (∂ℓT/∂β, ∂ℓT/∂ν, ∂ℓM/∂β) be a vector of score functions. Define
and let b1 = −E(∂2ℓM/∂β′∂β) and b2 = −E(∂2ℓM/∂β′∂ν). Denote ν* = (β, ν). Then the optimal linear combination of estimating functions is
(3) |
where, for convenience, 0 denotes a matrix of 0s of appropriate dimensions and Ip is a p × p identity matrix. It can be verified that, when evaluated at the true parameter values, b2 = n/2 × E(X exp(2β′X) × ∂/∂ν′[Λ(A)² − E{Λ(A) | X}²]). Hence, if b2 = 0, the system of estimating equations ∂ℓT/∂β + ∂ℓM/∂β = 0 and ∂ℓT/∂ν = 0 is the optimal linear combination of estimating equations based on η. The partial likelihood method solves the system of estimating equations ∂ℓT/∂β = 0 and ∂ℓT/∂ν = 0, which also belongs to the class of linear combinations of estimating equations based on η. Thus, the proposed method is more efficient than the partial likelihood method when b2 = 0.
When the baseline hazard function λ is of infinite dimension, the proposed pseudo-profile likelihood method solves the system (van der Vaart, 1998, § 25.12)
where Ψ is the score operator (Begun et al., 1983) for Λ based on the truncation likelihood ℓT = log ℒT and ℋ is an infinite-dimensional class of directions h from which paths of one-dimensional submodels for Λ may approach the true parameter. We use ℙn to denote the empirical measure, and use P for the probability measure. Let L2(μ) denote the Hilbert space that contains square integrable functions with the inner product 〈g, h〉μ = ∫ g(u)h(u) dμ(u) for g, h ∈ L2 (μ). It is easy to see that ℋ ⊂ L2(Λ). Applying a similar argument as in van der Vaart (1998, § 25.12.1), we can show that the score operator Ψ : L2(Λ) → L2(Pβ,Λ) for Λ is given by $\Psi(h) = \int_0^\tau h(t)\,dM(t)$, where $M(t) = N(t) - \int_0^t I(A \le u \le Y) \exp(\beta' X)\,d\Lambda(u)$ and N(t) = ΔI (Y ⩽ t). Let H̄ be a Hilbert space containing ℋ. The adjoint operator Ψ* : L2(Pβ,Λ) → H̄ of Ψ, which satisfies $\langle \Psi g, h \rangle_{P_{\beta,\Lambda}} = \langle g, \Psi^* h \rangle_{\Lambda}$ for all g ∈ H̄ and h ∈ L2(Pβ,Λ), can be shown to be Ψ*(g)(t) = E{gd M(t)}/dΛ(t). It can be further shown that Ψ*Ψ(h)(t) = E{h(t) exp(β′X)I (Y ⩾ t ⩾ A)} and Ψ*(∂ℓT/∂β)(t) = nE{X exp(β′X)I (Y ⩾ t ⩾ A)} (Murphy & van der Vaart, 2000).
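By a similar argument as above, we show in the Supplementary Material that the score operator Φ : L2(Λ) → L2(Pβ,Λ) for Λ based on the marginal likelihood ℓM is $\Phi(h) = -\exp(\beta' X) \left[ \int_0^A h\,d\Lambda - E\left\{ \int_0^A h\,d\Lambda \mid X \right\} \right]$. The adjoint operator Φ* : L2(Pβ,Λ) → H̄ of Φ can be shown to be Φ*(g)(t) = −E[g{I (A ⩾ t) − pr(A ⩾ t | X)} exp(β′X)]. Moreover, the adjoint operator Φ* satisfies $\Phi^*\Phi(h)(t) = E\left[ \{I(A \ge t) - \mathrm{pr}(A \ge t \mid X)\} \left\{ \int_0^A h\,d\Lambda - E\left( \int_0^A h\,d\Lambda \mid X \right) \right\} \exp(2\beta' X) \right]$ and Φ*(∂ℓM/∂β)(t) = nE[{I (A ⩾ t) − pr(A ⩾ t | X)}{Λ(A) − E(Λ(A) | X)}X exp(2β′X)].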
Analogous to (3) for parametric models, the optimal combination of estimating functions based on the score operators ∂ℓT/∂β, ∂ℓM/∂β and Ψ is given by ℙn(∂ℓT/∂β + ∂ℓM/∂β) = 0 and , h ∈ ℋ, where b1 = − E(∂2ℓM/∂β′ ∂β) and . Hence, if , the proposed pseudo-profile likelihood method is the most efficient estimator in the class of linear combinations of estimating functions based on ∂ℓT/∂β, ∂ℓM/∂β and Ψ. The weight can be estimated by replacing Λ with Λ̂ in the corresponding empirical estimators. In general, solving the optimal combination of estimating equations is computationally intensive, and hence is impractical. Moreover, there is no guarantee that it works better than the proposed method for small samples.
3. Simulations and data analysis
3.1. Monte-Carlo simulations
We conducted simulations to assess the performance of the proposed methods. In each simulation, 2000 studies were generated, each with n = 400. The sampling time ξ was set to be 100, and the time of disease onset, W0, was simulated from a uniform distribution over [0, 100] to mimic the incidence of a stable disease. For each subject, we generated X1 from a Bernoulli distribution and X2 from the standard normal distribution. The survival time T0 was independently generated from one of three proportional hazards models: (I) an exponential distribution with baseline hazard function λ0(t) = 2, (II) a Weibull distribution with baseline hazard function λ0(t) = 2t or (III) a distribution with baseline hazard function λ0(t) = 0.5(t − 2)². Thus, we simulated failure time distributions with constant, increasing and U-shaped hazards. To form a prevalent cohort of sample size n, realizations of (W0, T0, X1, X2) were generated repeatedly until there were n subjects satisfying the sampling constraint W0 + T0 ⩾ ξ. The time from enrolment at ξ to loss to follow-up was generated from a uniform distribution so that the censoring rate was approximately 0, 30 and 50%.
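The data-generating mechanism for Scenario I can be sketched as below. Several settings are assumptions made only for illustration and are not stated in the text: the regression coefficients (1, 1), the Bernoulli probability 0.5, and the upper limit `c_max` of the uniform censoring distribution, which in practice would be tuned to reach the target censoring rate. The function name `simulate_cohort` is hypothetical.

```python
import math
import random

def simulate_cohort(n, beta=(1.0, 1.0), xi=100.0, p=0.5, c_max=1.0, seed=2024):
    """Generate n prevalent-cohort records (y, a, delta, (x1, x2)).

    Assumed values for illustration: beta, p and c_max.
    Scenario I is used: baseline hazard 2, so T0 is exponential with
    rate 2 * exp(beta1 * x1 + beta2 * x2).
    """
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        w0 = rng.uniform(0.0, xi)                 # calendar time of onset
        x1 = 1.0 if rng.random() < p else 0.0     # binary covariate
        x2 = rng.gauss(0.0, 1.0)                  # standard normal covariate
        rate = 2.0 * math.exp(beta[0] * x1 + beta[1] * x2)
        t0 = rng.expovariate(rate)
        a = xi - w0                               # truncation time
        if t0 < a:
            continue                              # failed before sampling
        c = rng.uniform(0.0, c_max)               # censoring time after enrolment
        y = min(t0, a + c)                        # follow-up time
        delta = 1 if t0 <= a + c else 0           # failure indicator
        rows.append((y, a, delta, (x1, x2)))
    return rows

cohort_rows = simulate_cohort(200)
censoring_rate = 1.0 - sum(d for _, _, d, _ in cohort_rows) / len(cohort_rows)
print(len(cohort_rows), round(censoring_rate, 2))
```

Rejection sampling mirrors the description in the text: incident subjects are generated until n of them satisfy the sampling constraint, so most draws are discarded when survival times are short relative to the onset window.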
We compared the finite-sample performance of the proposed pseudo-profile likelihood method with those of the weighted estimating equation methods studied in Qin & Shen (2010) and of the popular partial likelihood method for truncated survival time data. By applying these methods to estimate the Cox model λ(t | X1, X2) = λ0(t) exp(β1X1 + β2X2), we evaluated the relative efficiency by comparing the bootstrap variance of the maximum partial likelihood estimator to that of the other methods. Table 1 summarizes the empirical bias, empirical standard error and the relative efficiency of these four estimation methods. All four estimators are close to their estimands. In the absence of censoring, the pseudo-profile likelihood method has an efficiency gain similar to that of the weighted estimating equation methods in Qin & Shen (2010). Overall, the relative efficiency of the proposed estimator increases with censoring rate. When the censoring proportion reaches 50%, the pseudo-profile likelihood estimator yields a substantial improvement over the maximum partial likelihood estimator, with an efficiency gain greater than 50% in the exponential and Weibull cases, and an efficiency gain of at least 20% in the U-shaped hazard function scenario. In the presence of censoring, the proposed pseudo-profile method always outperforms its competitors. In some scenarios, weighted estimating equation methods fail to show improvement, as these methods only use covariate information from uncensored subjects.
Table 1.
| Proportion censored | Estimated coefficient | Partial Bias | Partial SE | WEE-1 Bias | WEE-1 SE | WEE-2 Bias | WEE-2 SE | Profile Bias | Profile SE | RE |
|---|---|---|---|---|---|---|---|---|---|---|
| Scenario I: λ0(t) = 2 | | | | | | | | | | |
| 0% | β̂1 | 6 | 133 | −1 | 98 | −1 | 98 | −2 | 98 | 1.84 |
| | β̂2 | 5 | 83 | 0 | 65 | 0 | 65 | −2 | 65 | 1.60 |
| 30% | β̂1 | 6 | 151 | −46 | 136 | 2 | 120 | 1 | 108 | 1.98 |
| | β̂2 | 6 | 94 | −57 | 90 | 2 | 82 | 4 | 77 | 1.50 |
| 50% | β̂1 | 10 | 171 | −113 | 189 | 10 | 157 | 12 | 122 | 1.96 |
| | β̂2 | 6 | 112 | −125 | 115 | 2 | 98 | 10 | 90 | 1.53 |
| Scenario II: λ0(t) = 2t | | | | | | | | | | |
| 0% | β̂1 | 4 | 118 | 0 | 98 | 0 | 98 | −1 | 97 | 1.45 |
| | β̂2 | 2 | 74 | 0 | 63 | 0 | 63 | −1 | 63 | 1.40 |
| 30% | β̂1 | 3 | 132 | −21 | 137 | 9 | 121 | 3 | 105 | 1.58 |
| | β̂2 | 5 | 89 | −28 | 90 | 4 | 80 | 2 | 73 | 1.46 |
| 50% | β̂1 | 6 | 159 | −101 | 206 | 10 | 154 | 5 | 118 | 1.82 |
| | β̂2 | 8 | 100 | −97 | 118 | 6 | 96 | 5 | 79 | 1.62 |
| Scenario III: λ0(t) = 0.5(t − 2)² | | | | | | | | | | |
| 0% | β̂1 | 6 | 112 | 5 | 104 | 5 | 104 | 5 | 104 | 1.17 |
| | β̂2 | 6 | 69 | 5 | 65 | 5 | 65 | 5 | 64 | 1.14 |
| 30% | β̂1 | 10 | 134 | 10 | 143 | 10 | 130 | 9 | 122 | 1.21 |
| | β̂2 | 7 | 82 | 0 | 85 | 3 | 80 | 4 | 77 | 1.14 |
| 50% | β̂1 | 9 | 151 | −34 | 216 | 7 | 154 | 6 | 134 | 1.27 |
| | β̂2 | 7 | 97 | −35 | 123 | 7 | 98 | 5 | 88 | 1.20 |

Partial, the maximum partial likelihood estimator; WEE-1 and WEE-2, estimators derived by solving U1(β) = 0 and U2(β) = 0; Profile, the maximum pseudo-profile likelihood estimator; Bias and SE, empirical bias (×1000) and empirical standard deviation (×1000) of 2000 regression parameter estimates; RE, the empirical variance of the maximum partial likelihood estimator divided by that of the maximum pseudo-profile likelihood estimator.
In addition to better efficiency, another advantage of the proposed pseudo-profile likelihood method is that it does not involve estimation of the censoring distribution. When this distribution depends on the covariate, the estimating equation methods may yield biased estimation. For demonstration, we simulated survival time data under Model (II). The censoring times for subjects with observed covariates X1 = 1 and X2 < 0 were generated from an exponential distribution with mean 5 exp(−X2), while the censoring times for other subjects were generated from a uniform distribution. The overall censoring proportion was set at approximately 30 and 50%. As summarized in Table 2, the estimating equation-based methods yield biased estimators, while the bias of the pseudo-profile estimators remains small.
Table 2.
| Proportion censored | Estimated coefficient | Partial Bias | Partial SE | WEE-1 Bias | WEE-1 SE | WEE-2 Bias | WEE-2 SE | Profile Bias | Profile SE | RE |
|---|---|---|---|---|---|---|---|---|---|---|
| 30% | β̂1 | 7 | 132 | −388 | 135 | −127 | 119 | 2 | 105 | 1.56 |
| | β̂2 | 8 | 87 | 48 | 84 | 30 | 79 | 4 | 73 | 1.44 |
| 50% | β̂1 | 10 | 166 | −819 | 175 | −252 | 161 | −2 | 128 | 1.69 |
| | β̂2 | 8 | 103 | 82 | 107 | 51 | 97 | 6 | 80 | 1.64 |

See Table 1 for abbreviations.
3.2. Analysis of Canadian Study of Health and Aging
In this section, we report the results of data analysis for a cohort of prevalent cases in one of the largest epidemiologic studies of dementia, the Canadian Study of Health and Aging. From February 1991 to May 1992, an extensive survey was carried out and a total of 1132 persons aged 65 and older with dementia were identified in this first phase of the study. For each study subject, a diagnosis of possible Alzheimer’s disease, probable Alzheimer’s disease, or vascular dementia was assigned, and the date of dementia onset was determined by interviewing care-givers. Information on mortality was collected between January 1996 and May 1997.
We considered a subset of the study data by excluding those with missing date of onset or missing dementia subtype classification. Moreover, as in Wolfson et al. (2001), those with observed survival times greater than or equal to 20 years were excluded because these subjects are considered unlikely to have Alzheimer’s disease or vascular dementia. As a result, a total of 807 dementia patients were included in our analysis. Among them 388 had a diagnosis of probable Alzheimer’s disease, 249 had possible Alzheimer’s disease and 170 had vascular dementia. In the second phase of the study, a total of 627 deaths were recorded, among whom 302 had a diagnosis of probable Alzheimer’s, 189 had possible Alzheimer’s and 136 had vascular dementia.
The stationarity assumption that the incidence of dementia is constant over time was found to be reasonably met for these data using the method suggested in Wang (1991). To compare the risk of death between different diagnoses, we fit a Cox proportional hazards model for the length-biased survival time data, with indicators of probable Alzheimer’s and vascular dementia as covariates. We applied the pseudo-profile likelihood method, the two weighted estimating equation methods in Qin & Shen (2010), and the partial likelihood method. The estimated regression coefficients are summarized in Table 3. The proposed method yields estimates of the regression parameters similar to those of the estimating equation methods, and the bootstrap standard errors of the proposed estimator are smaller than those of its competitors. The proposed pseudo-profile likelihood method estimates a significantly higher risk of death in patients with probable Alzheimer’s and those with vascular dementia. Specifically, as compared with patients with possible Alzheimer’s, the risk of death increased by 16% among those with probable Alzheimer’s and by 27% among those with vascular dementia. For β1, the variance ratio for the competitors to the proposed method is always at least 1.67. This suggests that if a competitor method were used in lieu of the proposed method, the study would need to recruit at least 760 more subjects to achieve the same precision.
Table 3.
| Method | β1 (probable Alzheimer’s): Estimate | β1: SE | β1: 95% CI | β2 (vascular dementia): Estimate | β2: SE | β2: 95% CI |
|---|---|---|---|---|---|---|
| Partial | 0.030 | 0.089 | (−0.142, 0.203) | 0.113 | 0.109 | (−0.103, 0.323) |
| EE-1 | 0.130 | 0.095 | (−0.058, 0.312) | 0.278 | 0.107 | (0.070, 0.497) |
| EE-2 | 0.157 | 0.088 | (−0.022, 0.328) | 0.257 | 0.121 | (0.038, 0.519) |
| Profile | 0.150 | 0.068 | (0.016, 0.278) | 0.241 | 0.088 | (0.066, 0.419) |

Partial, maximum partial likelihood estimator; EE-1 and EE-2, estimators derived by solving U1(β) = 0 and U2(β) = 0; Profile, the pseudo-profile likelihood estimator; SE, the empirical standard deviation of 2000 bootstrap regression parameter estimates.
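The percentage increases in risk and the variance ratio quoted in the text follow directly from the entries of Table 3. The short check below uses only the tabulated estimates and standard errors; the variable names are illustrative.

```python
import math

# Profile estimates and standard errors for beta_1 and beta_2, from Table 3.
profile = {"probable": (0.150, 0.068), "vascular": (0.241, 0.088)}
# Standard errors of the competing estimators of beta_1, from Table 3.
se_beta1_competitors = {"Partial": 0.089, "EE-1": 0.095, "EE-2": 0.088}

# Hazard ratio relative to possible Alzheimer's is exp(estimate).
hazard_ratio = {k: math.exp(est) for k, (est, _) in profile.items()}
pct_increase = {k: round(100 * (hr - 1)) for k, hr in hazard_ratio.items()}

# Variance ratio of each competitor to the profile estimator for beta_1.
se_profile = profile["probable"][1]
variance_ratio = {m: (se / se_profile) ** 2
                  for m, se in se_beta1_competitors.items()}
min_ratio = min(variance_ratio.values())

print(pct_increase, round(min_ratio, 2))
```

Because the tabulated standard errors are rounded to three decimals, the variance ratios computed this way are approximate.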
4. Remark
The validity of the proposed method relies on the assumption of stable disease. When the stationarity assumption fails to hold, it is not uncommon that knowledge about the distribution of disease incidence can be obtained from other sources. If H denotes the distribution of the truncation time in the disease population, then the transformed survival time H(T0) is truncated by a uniformly distributed random variable. Thus, it follows from the fact that the Cox model is invariant under monotone transformation, that the regression coefficients in the Cox model can be consistently estimated by applying the proposed method to the transformed data {H(ai), H(yi), δi} (i = 1, …, n).
Acknowledgments
The authors thank Professors Ian McDowell, Masoud Asgharian and Christina Wolfson for kindly sharing the Canadian Study of Health and Aging data. The core study was funded by the National Health Research and Development Program, Canada. Additional funding was provided by Pfizer Canada Incorporated through the Medical Research Council/Pharmaceutical Manufacturers Association of Canada Health Activity Program, Bayer Incorporated and the British Columbia Health Research Foundation. The authors also thank the referees, associate editor and editor for their comments which improved the presentation of this article.
Appendix.
Proofs
We begin by establishing the consistency of β̂. In view of the proof of van der Vaart (1998, Theorem 5.7), it suffices to show that, as n → ∞, supβ∈Θ|n−1ℓ(β) − γ(β) |→ 0 almost surely and that β0 is the unique maximizer of γ (β) in a compact neighbourhood of β0.
We first show that, for sufficiently large n, ℓ(β) has similar local behaviour to ℓ̃(β) in the compact neighbourhood Θ. Because {exp(β′X)I (A ⩽ t ⩽ Y) : t ∈ [0, τ], β ∈ Θ} is Glivenko–Cantelli and the logarithmic transformation is monotone, log{S(0)(t, β)} − log{𝒮(0)(t, β)} converges to 0 uniformly over β ∈ Θ and t ∈ [0, τ]. Hence, supβ∈Θ | n−1ℓP(β) − n−1ℓ̃P(β)| → 0 almost surely. Following the result that Λ̂β (t) converges to Λβ(t) uniformly over β ∈ Θ and t ∈ [0, τ], exp{Λ̂β(t) exp(β′X)} converges to exp{Λβ (t) exp(β′X)} uniformly over β ∈ Θ and t ∈ [0, τ]. Hence, μ̂β(x) converges to μβ (x) uniformly over β ∈ Θ. For ∊n > 0 with ∊n → 0 as n → ∞, define the class ℱ = [ f (t) = {g(t) − Λβ (t)} exp(β′ X), where β∈Θ, g is nondecreasing and nonnegative and supt∈[0,τ] | g(t) − Λβ (t) | ⩽ ∊n]. Thus, by definition, supf∈ℱ | P f | ⩽ ∊n × supβ∈Θ| exp(β′X) |. Moreover, it follows from van der Vaart & Wellner (1996, Theorems 2.7.5 and 2.4.1) that ℱ is Glivenko–Cantelli. Hence, supf∈ℱ | ℙn f − P f | → 0 almost surely. For a sufficiently large n, . Thus, we show that almost surely. By a similar argument, we can show that almost surely, and hence n−1ℓM(β) − n−1 ℓ̃M(β) → 0 uniformly over β ∈ Θ. Thus, ℓ(β) = ℓP(β) + ℓM(β) and ℓ̃(β) = ℓ̃P(β) + ℓ̃M(β) have similar local behaviour in Θ.
Next, because Θ is compact and the function is continuous and dominated by an integrable function, the class of functions {m(β) : β ∈ Θ} is Glivenko–Cantelli (van der Vaart, 1998, Example 19.8). It follows from a uniform law of large numbers (Pollard, 1990) that supβ∈Θ | n−1 ℓ̃(β) − γ (β) | → 0 almost surely. Thus, supβ∈Θ | n−1ℓ(β) − γ (β) |→ 0 almost surely as n → ∞.
Below we prove that β0 is the unique maximizer in a neighbourhood of β0 by showing that ∂γ (β0)/∂β = 0 and ∂2γ(β0)/∂β′∂β is negative definite at β = β0. Following the fact that the partial score function has expectation zero when evaluated at β = β0 and that E [∂ {Λβ (A) exp(β′X)}/∂β |β=β0] = − E{μβ (X)−1 ∂μβ (X)/∂β|β=β0} by double expectation, we can show that ∂γ (β)/∂β = 0 when β =β0. Write Sβ (u | x) = exp{−Λβ (u) exp(β′X)}. The second derivative of γ(β) is
(A1) |
(A2) |
(A3) |
(A4) |
(A5) |
By applying the double expectation technique, it can be shown that (A2) + (A3) = 0 for β = β0. Moreover, by the Cauchy–Schwarz inequality, both (A1) and (A4) + (A5) are negative semidefinite. Hence, it follows from regularity condition (d) that ∂2γ(β)/∂β′∂β is negative definite at β = β0. Because the function γ(β) is continuous in β, there exists a compact neighbourhood Θ0 of β0 such that β0 is the unique maximizer of γ(β) in Θ0. This completes the proof of consistency.
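We now prove the asymptotic normality of the maximum pseudo-profile likelihood estimator. A Taylor series expansion yields 0 = ∂ℓ(β)/∂β|β=β̂ = ∂ℓ(β)/∂β |β=β0 + ∂2ℓ(β)/∂β′∂β|β=β* (β̂ − β0), where β* lies between β̂ and β0. Thus, by consistency of β̂, one has β* → β0 in probability and
$$n^{1/2}(\hat\beta - \beta_0) = -\left\{ n^{-1} \frac{\partial^2 \ell(\beta)}{\partial\beta' \partial\beta} \bigg|_{\beta=\beta^*} \right\}^{-1} n^{-1/2} \frac{\partial \ell(\beta)}{\partial\beta} \bigg|_{\beta=\beta_0}.$$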
In what follows, we show that
(A6) |
(A7) |
has an asymptotic independent and identically distributed representation. Let H be the joint probability measure of (A, X) and let Ĥ be the corresponding empirical measure for H. Then the right-hand side of (A6) can be expressed as
Next, applying the functional delta method, we have , where
Thus, (A7) can be expressed as
Finally, applying the functional delta method, we can obtain the asymptotic representation for the partial score function: , where
Let ϕi (β0) = ϕ1i (β0) + ϕ2i (β0) + ϕ3i (β0). We have , where κi (β0) = ϕi (β0) − [∂{Λβ (ai) exp(β′xi)}/∂β + μβ (xi)−1∂μβ (xi)/∂β] |β=β0. Arguing as in the proof of consistency, we can show that, as n → ∞, n−1 {∂2ℓ(β)/∂β′∂β} |β=β0 → {∂2γ(β)/∂β′∂β} |β=β0 almost surely. Define Γ(β0) = E{κi (β0)⊗2}. Hence, under the regularity conditions, as n → ∞, n1/2 (β̂ − β0) converges weakly to a zero mean multivariate normal distribution with variance-covariance matrix ∑(β0) = {∂2γ (β0)/∂β′ ∂β}−1 Γ (β0){∂2γ (β0)/∂β′ ∂β}−1.
Supplementary material
References
- Begun JM, Hall WJ, Huang W-M, Wellner JA. Information and asymptotic efficiency in parametric-nonparametric models. Ann Statist. 1983;11:432–52.
- Cox DR. Regression models and life-tables (with discussion). J R Statist Soc B. 1972;34:187–220.
- de Una-Alvarez J, Otero-Giraldez M, Alvarez-Llorente G. Estimation under length-bias and right-censoring: an application to unemployment duration analysis for married women. J Appl Statist. 2003;30:283–91.
- Ghosh D. Proportional hazards regression for cancer studies. Biometrics. 2008;64:141–8. doi: 10.1111/j.1541-0420.2007.00830.x.
- Gong G, Samaniego FJ. Pseudo maximum likelihood estimation: theory and applications. Ann Statist. 1981;9:861–9.
- Kalbfleisch JD, Lawless JF. Regression models for right truncated data with applications to AIDS incubation times and reporting lags. Statist Sinica. 1991;1:19–32.
- Lagakos S, Barraj L, De Gruttola V. Nonparametric analysis of truncated survival data, with application to AIDS. Biometrika. 1988;75:515–23.
- Lancaster T. Econometric methods for the duration of unemployment. Econometrica. 1979;47:939–56.
- Lancaster T. The Econometric Analysis of Transition Data. Cambridge, UK: Cambridge University Press; 1990.
- Murphy SA, van der Vaart AW. On profile likelihood. J Am Statist Assoc. 2000;95:449–65.
- Pepe MS, Fleming TR. A nonparametric method for dealing with mismeasured covariate data. J Am Statist Assoc. 1991;86:108–13.
- Pollard D. Empirical Processes: Theory and Applications. Hayward, CA: Institute of Mathematical Statistics; 1990.
- Qin J, Shen Y. Statistical methods for analyzing right-censored length-biased data under Cox model. Biometrics. 2010;66:382–92. doi: 10.1111/j.1541-0420.2009.01287.x.
- Severini TA, Wong WH. Profile likelihood and conditionally parametric models. Ann Statist. 1992;20:1768–802.
- Tsai W-Y. Pseudo-partial likelihood for proportional hazards models with biased-sampling data. Biometrika. 2009;96:601–15. doi: 10.1093/biomet/asp026.
- van der Vaart AW. Asymptotic Statistics. Cambridge: Cambridge University Press; 1998.
- van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. New York: Springer; 1996.
- Wang M-C. Nonparametric estimation from cross-sectional survival data. J Am Statist Assoc. 1991;86:130–43.
- Wang M-C, Brookmeyer R, Jewell N. Statistical models for prevalent cohort data. Biometrics. 1993;49:1–11.
- Wolfson C, Wolfson DB, Asgharian M, M’Lan CE. A reevaluation of the duration of survival after the onset of dementia. New Engl J Med. 2001;344:1111–6. doi: 10.1056/NEJM200104123441501.
- Zelen M. Forward and backward recurrence times and length biased sampling: age specific models. Lifetime Data Anal. 2004;10:325–34. doi: 10.1007/s10985-004-4770-1.
- Zelen M, Feinleib M. On the theory of screening for chronic diseases. Biometrika. 1969;56:601–14.
- Zucker DM. A pseudo-partial likelihood method for semiparametric survival regression with covariate errors. J Am Statist Assoc. 2005;100:1264–77.