Abstract
The Canadian Study of Health and Aging (CSHA) employed a prevalent cohort design to study survival after onset of dementia, where patients with dementia were sampled and the onset time of dementia was determined retrospectively. The prevalent cohort sampling scheme favors individuals who survive longer. Thus, the observed survival times are subject to length bias. In recent years, there has been a rising interest in developing estimation procedures for prevalent cohort survival data that not only account for length bias but also exploit the incidence distribution of the disease to improve efficiency. This article considers semiparametric estimation of the Cox model for the time from dementia onset to death under a stationarity assumption with respect to the disease incidence. Under the stationarity condition, semiparametric maximum likelihood estimation is expected to be fully efficient yet difficult to perform for statistical practitioners, as the likelihood depends on the baseline hazard function in a complicated way. Moreover, the asymptotic properties of the semiparametric maximum likelihood estimator are not well studied. Motivated by the composite likelihood method (Besag 1974), we develop a composite partial likelihood method that retains the simplicity of the popular partial likelihood estimator and can be easily performed using standard statistical software. When applied to the CSHA data, the proposed method estimates a significant difference in survival between the vascular dementia group and the possible Alzheimer’s disease group, while the partial likelihood method for left-truncated and right-censored data yields a greater standard error and a 95% confidence interval covering 0, thus highlighting the practical value of employing a more efficient methodology. To check the assumption of stable disease for the CSHA data, we also present new graphical and numerical tests in the article. The R code used to obtain the maximum composite partial likelihood estimator for the CSHA data is available in the online Supplementary Material, posted on the journal web site.
Keywords: Backward and forward recurrence time, Cross-sectional sampling, Random truncation, Renewal processes
1. INTRODUCTION
Dementia is a major worldwide health problem among older people. It is an irreversible, progressive brain disease characterized by loss of memory, language, perception, complex motor skills, and other intellectual functions. Death from dementia is usually caused by secondary infections or other physical problems that result from the generally poorer health of dementia patients. In North America, the overall prevalence of dementia is estimated to be 6.4% in people older than 60 years. The disease becomes more common in older individuals, with the age-specific prevalence rate reaching 12.8% in people aged 80–84 and 30.1% in people older than 85 years (Ferri et al. 2005). As the population ages, the prevalence of dementia in the developed world is expected to increase considerably over the next 20 years. In 2001, the number of dementia cases in North America was estimated to be 3.4 million. This number is expected to increase to 5.1 million in 2020 and to 9.2 million in 2040.
With annual costs exceeding $100 billion, dementia is the third most costly illness in the United States, surpassed only by heart disease and cancer. Research is actively underway to develop effective interventions to reduce the prevalence and incidence of the disease, improve the quality of life for patients and their caregivers, and reduce the resources needed to provide adequate medical care. To better allocate health services, it is essential to evaluate the impact of dementia on life expectancy. In the United States, Medicare beneficiaries must have an estimated life expectancy of less than 6 months to be eligible for hospice care. However, information on the survival of patients with specific types of dementia is limited and is usually estimated with substantial uncertainty, particularly for vascular dementia.
Ideally, survival after the onset of dementia would be estimated from prospective long-term follow-up studies that continuously monitor the incidence of dementia in initially undiseased elderly people. In practice, such incident cohort studies are often infeasible, mainly due to cost and time constraints; thus, data on diseased individuals are often collected and analyzed instead. Between February 1991 and May 1992, the Canadian Study of Health and Aging (CSHA), one of the largest epidemiological studies of dementia (McDowell, Hill, and Lindsay 2005), randomly selected 14,026 persons aged 65 and older from 36 urban and surrounding rural areas in Canada. Among the 10,263 individuals who agreed to participate, a total of 1132 persons with dementia were identified by the Modified Mini-Mental State Examination (MMSE). For each case of dementia, a diagnosis of possible Alzheimer’s disease (referred to in this article as possible Alzheimer), probable Alzheimer’s disease (referred to as probable Alzheimer), or vascular dementia was assigned by the study staff, and the date of dementia onset was determined by interviewing caregivers. Based on the mortality data collected before May 1997, naively applying the Kaplan-Meier estimator yields an estimated median survival of 6.6 years from the onset of dementia (Wolfson et al. 2001), which falls within the relatively wide range of 5 to 9.3 years reported in previous studies. Moreover, the partial likelihood method for right-censored data estimates hazard ratios of 1.22 (P = 0.03) and 1.34 (P = 0.01) when comparing probable Alzheimer and vascular dementia to possible Alzheimer.
As pointed out by Wolfson et al. (2001), the CSHA had a prevalent cohort study design, because survival data were collected from a prevalent cohort of dementia patients who had not experienced the failure event (death) at the time of recruitment. Compared with the incident cohort approach, which follows initially undiseased individuals from disease onset to failure, the prevalent cohort approach can be more efficient, especially when used to study a rare disease, as it usually requires a shorter follow-up period to accumulate a sufficient number of failure events. Individuals in the prevalent cohort, however, are not a representative sample of the target disease population, because diseased individuals who died before the recruitment period are not eligible to be sampled. In other words, the survival time in a prevalent cohort study is subject to left truncation, where the truncation time is the duration from disease onset to enrollment. As a result, the sampling scheme favors individuals who survive longer; that is, the enrollees tend to have a slow disease progression and thus a longer duration between disease onset and failure. The Kaplan-Meier estimator, which fails to account for left truncation, can lead to substantial overestimation of the survival time.
To analyze right-censored survival data collected under prevalent sampling, a simple approach is to apply statistical methods for truncated data (see Andersen et al. 1993; Kalbfleisch and Prentice 2002; and references therein), among them the popular product-limit estimator (Wang 1991) for nonparametric estimation and the maximum partial likelihood estimator for semiparametric estimation of the Cox model (Kalbfleisch and Lawless 1991; Wang, Brookmeyer, and Jewell 1993). By applying the truncation product-limit estimator, the median survival after the onset of dementia is estimated to be 3.6 years, which is substantially shorter than the estimate given by the Kaplan-Meier curve. The partial likelihood method for left-truncated and right-censored data estimates hazard ratios of 1.03 (P = 0.75) and 1.12 (P = 0.32) when comparing probable Alzheimer and vascular dementia to possible Alzheimer. Thus, the significant difference between possible Alzheimer and the other two subtypes disappears after adjusting for left truncation. The aforementioned nonparametric and semiparametric estimators are fully efficient when the distribution of the random truncation time is completely unspecified (Wang, Brookmeyer, and Jewell 1993), but they can be very inefficient when the distribution of the truncation time can be parameterized.
In many applications, including the CSHA, it is reasonable to assume that the incidence of disease onset follows a stationary Poisson process (see Zelen and Feinleib 1969; Simon 1980; Winter and Foldes 1988; de Uña-Álvarez 2004); that is, the disease incidence is stable over time. Under the stable disease condition, the left-truncation variable is uniformly distributed, and the probability of a survival time being sampled is proportional to its length. In other words, the survival time in the prevalent cohort has a length-biased distribution. To emphasize this important property, the sampling of prevalent cases under the stable disease condition is termed length-biased sampling in this article. When applied to length-biased survival data, the truncation product-limit estimator and the maximum partial likelihood estimator for truncated data are consistent yet inefficient, as the estimation procedures do not exploit the special structure of length-biased sampling. On the other hand, the nonparametric and semiparametric maximum likelihood estimators under length-biased sampling (Vardi 1989; Asgharian, M’Lan, and Wolfson 2002) are efficient yet computationally intensive, as they do not have closed-form expressions. Further, their limiting distributions are intractable (Vardi and Zhang 1992; Asgharian and Wolfson 2005). Many authors, including Luo and Tsai (2009), Tsai (2009), and Qin and Shen (2010), have proposed alternative nonparametric and semiparametric methods that are formulated based on weighted risk sets. Specifically, the risk set at an event time considered by Tsai (2009) consists of all subjects who survive beyond the event time, and the weight function is proportional to the probability that a subject is in the risk set. On the other hand, the risk set considered by Qin and Shen (2010) consists of uncensored subjects who survive beyond the event time, and the weight function is proportional to the probability that a subject is uncensored and in the risk set. These methods are easier to implement and have relatively high efficiency; however, although they can be implemented in statistical software with additional programming, they cannot handle covariate-dependent censoring.
In this article, we introduce a simple and efficient method for estimating the Cox model with survival data collected under length-biased sampling. Our methodology is motivated by the composite likelihood method developed by Lindsay (1988) and Arnold and Strauss (1988), whose idea dates back to Besag (1974). As highlighted in a special issue of Statistica Sinica in January 2011, the composite likelihood method has drawn a good deal of attention and has been widely applied to many important areas of research, including longitudinal data analysis, spatial analysis, and statistical genetics, among others. The composite likelihood method is based on a pseudo-likelihood constructed by compounding low-dimensional marginal or conditional densities. This approach is especially useful for high-dimensional data, where a fully specified model may be unavailable or difficult to estimate because the full likelihood usually involves high-dimensional integrals. (Readers are referred to Varin (2008) and Varin, Reid, and Firth (2011) for a comprehensive review of composite likelihood methods.)
We explore the application of the composite likelihood method to length-biased survival data. The proposed estimator not only enjoys a structure as simple as that of the maximum partial likelihood estimator, but also allows for covariate-dependent censoring. Importantly, the estimation procedure can be easily performed using standard statistical software without additional programming effort. When applied to the data from the CSHA, the proposed method yields smaller standard errors in the estimation of the regression coefficients in the Cox model. It is also demonstrated through a series of simulations that the proposed estimator is at least as efficient as the alternative estimators considered. Owing to its computational simplicity and its gain in efficiency, the proposed composite partial likelihood estimator is expected to be an attractive method for statistical practitioners.
2. METHODOLOGY
2.1 Notation
For individuals in the susceptible population, let T0 denote the time from disease onset to the failure event of interest, and let X0 denote a p × 1 vector of covariates. For the CSHA, T0 is the survival time after the onset of dementia, and X0 is a vector of indicators for probable Alzheimer and vascular dementia. The conditional density function, survival function, and hazard function of the survival time T0 given X0 = x are denoted by f(t | x), S(t | x), and λ(t | x), respectively. We assume that the survival time in the target population follows the Cox proportional hazards model (Cox 1972)

λ(t | x) = λ(t) exp(β′x),

where λ(t) is an unspecified baseline hazard function and β is a p × 1 vector of parameters. Define Λ(t) = ∫₀ᵗ λ(u) du; thus, Λ is the baseline cumulative hazard function.
We denote by A0 the time between disease onset and study enrollment, and assume that A0 is independent of T0. In a prevalent cohort study, a diseased subject would be qualified to be sampled only if the failure event does not occur before the sampling time (see Figure 1A), that is, T0 ≥ A0. In other words, T0 is left truncated by A0. Denote by T, A, and X the survival time, truncation time, and the covariates for individuals in the prevalent cohort. Then the triplet (T,A, X) has the same joint distribution as (T0, A0, X0) | T0 ≥ A0.
Figure 1.
(A) The horizontal lines denote time from onset of dementia to death in the susceptible population, ◦ denotes censoring, and ⊙ indicates the sampling time of an individual. The prevalent population (bolded lines) consists of subjects with dementia at the time of recruitment; dementia patients who died before the sampling period (dashed lines) are not qualified to be sampled. (B) A patient has disease onset at calendar time W and, if still alive, is available for enrollment at calendar time ξ into the prevalent cohort. The time to death is T and the calendar time of death is W + T. A patient in the prevalent cohort is followed for a period of time C; thus, we observe Y = min{T, A + C}. For patients in the prevalent cohort, T must exceed A = ξ − W, the time from onset to enrollment, and thus T is a left-truncated random variable.
The observation of the survival time in the prevalent cohort is usually subject to right censoring due to study end or premature dropout. Figure 1B depicts the time variables of a censored subject in the prevalent cohort. Instead of observing the actual value of T, we observe the censored survival time Y = min(T, A + C). In many applications, it is reasonable to assume that the censoring time after enrollment, C, is independent of (T, A) given X. Note, however, that the survival time T and the total censoring time A + C are dependent, as they share the same A. In other words, the survival time T is subject to informative censoring. Let G(t | x) denote the survival function of C given X = x. We assume that the observed data {(Yi, Ai, Xi, Δi), i = 1, …, n} are independent and identically distributed copies of (Y, A, X, Δ), where Δ = I(T ≤ A + C).
2.2 The Partial Likelihood Method
Under length-biased sampling, the truncation variable A0 follows a uniform distribution and the joint density function of (T, A) given X = x evaluated at (t, a) is

f(t, a | x) = f(t | x) / μ(x), t ≥ a ≥ 0,  (1)

(Lancaster 1990, chap. 3), where μ(x) = ∫ u f(u | x) du is the conditional mean of T0 given X0 = x. It follows from Equation (1) and the Cox model that the full likelihood is proportional to

ℒF = ∏_{i=1}^n f(Yi | Xi)^{Δi} S(Yi | Xi)^{1−Δi} / μ(Xi).
Direct maximization of ℒF with respect to (β,Λ) is expected to yield fully efficient estimators; however, this approach is computationally cumbersome not only because ℒF involves the nonparametric component Λ in a complicated way but also because it encounters high-dimensional maximization when the sample size is large.
A simple alternative to estimate β is to apply the partial likelihood method for truncated data (Kalbfleisch and Lawless 1991; Wang, Brookmeyer, and Jewell 1993), as survival data arising from length-biased sampling are a special case of truncated data. Note that the full likelihood ℒF can be expressed as ℒF = ℒT × ℒM, where ℒT is the truncation likelihood of Y conditioning on the truncation time A,

ℒT = ∏_{i=1}^n f(Yi | Xi)^{Δi} S(Yi | Xi)^{1−Δi} / S(Ai | Xi),

and ℒM is the marginal likelihood of A,

ℒM = ∏_{i=1}^n S(Ai | Xi) / μ(Xi).
The truncation likelihood can be further reexpressed as ℒT = ℒP × ℒR, where ℒP is the partial likelihood

ℒP = ∏_{i=1}^n [exp(β′Xi) / Σ_{j=1}^n I(Aj ≤ Yi ≤ Yj) exp(β′Xj)]^{Δi},

which depends only on β, and ℒR is the residual likelihood that depends on both β and Λ. Wang, Brookmeyer, and Jewell (1993) proposed to estimate β by maximizing ℒP. Under arbitrary random truncation, the authors showed that the maximum partial likelihood estimator is as efficient as the maximizer of ℒT, as the residual likelihood ℒR is ancillary with respect to ℒT.
It is well known that, for the Cox model with right-censored data, profiling out the baseline hazard function from the full likelihood yields Cox’s partial likelihood (Johansen 1983; Murphy and van der Vaart 2000). Interestingly, we have a similar result for truncated survival data: profiling out Λ from the truncation likelihood ℒT also yields the partial likelihood ℒP.
2.3 The Composite Partial Likelihood Method: Complete Data
The maximum partial likelihood estimator β̂P that maximizes ℒP can be very inefficient because the information about β in the marginal likelihood ℒM is not used in the estimation procedure. To better exploit the special structure of length-biased sampling, various methods have been proposed in the literature. In the case where all study subjects are followed until the occurrence of the failure event, Wang (1996) considered weighting subjects by the inverse of the length of the survival time. Because the assigned weight is proportional to the probability of a subject being sampled, the weighted risk set has the same probability structure as that of a risk set formed from an incident cohort. In other words, the regression parameter can be estimated by maximizing a pseudo-likelihood function that is asymptotically equivalent to

∏_{i=1}^n [exp(β′Xi) / Σ_{j=1}^n I(Tj ≥ Ti) Tj^{−1} exp(β′Xj)].
This method, however, is not applicable in the presence of right censoring, because the weight for a censored survival time cannot be determined.
Write V = T − A; thus, V denotes the residual lifetime after enrollment. In the absence of censoring, it follows from Equation (1) that the pair of random variables (A, V) has the exchangeable joint density function f(a + v | x)/μ(x) for a ≥ 0 and v ≥ 0, and the common marginal density function is

S(a | x) / μ(x), a ≥ 0.
A simple way to exploit this exchangeability is to apply the composite conditional likelihood method (Arnold and Strauss 1988), as the conditional density function of V given A is identical to that of A given V. As a result, the truncation density of T = A + V conditioning on A is the same as the conditional density of T conditioning on V. Consider the composite conditional likelihood given by the product of the conditional likelihood of V given A and the conditional likelihood of A given V:

ℒC(β, Λ) = ∏_{i=1}^n [f(Ti | Xi) / S(Ai | Xi)] × [f(Ti | Xi) / S(Vi | Xi)],  (2)

where Vi = Ti − Ai. Hence, Equation (2) is equivalent to the product of the truncation likelihood of T conditioning on A and the conditional likelihood of T conditioning on V.
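As a quick numerical illustration of this exchangeability (our own sketch, with illustrative distributions and parameter values not taken from the article), one can simulate a large stable-incidence cohort without censoring and compare the two margins in R:

# Check that A and V have the same marginal distribution under length-biased
# sampling with no censoring; both margins should have density S(t)/mu.
set.seed(1)
w0 <- runif(2e5, 0, 100)          # onset times of a stable disease
t0 <- rexp(2e5, rate = 0.5)       # population survival times (illustrative)
keep <- w0 + t0 >= 100            # prevalent-cohort sampling at time 100
a <- (100 - w0)[keep]             # truncation time A, onset to sampling
v <- t0[keep] - a                 # residual lifetime V = T - A
ks.test(a, v)                     # two-sample comparison; only approximate,
                                  # since A and V are paired within subjects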
Following the profile likelihood argument for the partial likelihood ℒP discussed in the previous section, we profile out Λ from Equation (2) to obtain the composite partial likelihood, up to a constant,

ℒCP(β) = ∏_{i=1}^n (exp(β′Xi) / Σ_{j=1}^n [I(Aj ≤ Ti ≤ Tj) + I(Vj ≤ Ti ≤ Tj)] exp(β′Xj))².

Thus, β can be estimated by solving

U(β) = Σ_{i=1}^n {Xi − Σ_{j=1}^n [I(Aj ≤ Ti ≤ Tj) + I(Vj ≤ Ti ≤ Tj)] exp(β′Xj) Xj / Σ_{j=1}^n [I(Aj ≤ Ti ≤ Tj) + I(Vj ≤ Ti ≤ Tj)] exp(β′Xj)} = 0.
It is easy to see that, in the absence of censoring, the proposed maximum composite partial likelihood estimator is equivalent to the maximum partial likelihood estimator applied to the pooled left-truncated survival data {(Ti, Ai, Xi), i = 1, …, n} and {(Ti, Vi, Xi), i = 1, …, n}, where in the augmented dataset Vi is treated as the truncation time for Ti.
2.4 The Composite Partial Likelihood Method: Censored Data
In the presence of right censoring, the residual survival time V in the prevalent cohort is subject to right censoring. Let Ṽ = min{T − A, C} be the observed residual lifetime, that is, Ṽ = V for uncensored subjects and Ṽ = C = Y − A for censored subjects. Intuitively, the joint distribution of A and Ṽ is not exchangeable, because the truncation time is always observed while the residual lifetime can be censored. Hence, it is necessary to modify the composite conditional likelihood (Equation (2)) to accommodate censoring.
To construct a composite conditional likelihood under right censoring, we consider the bivariate failure time (A, Ṽ) for subjects with uncensored failure time. The joint density function of (A, Ṽ) conditional on Δ = 1 is

f(a, v | x, Δ = 1) = f(a + v | x) G(v | x) / {μ(x) pr(Δ = 1 | X = x)}, a ≥ 0, v ≥ 0.
Thus, it can be observed that A and Ṽ do not have an exchangeable joint distribution despite conditioning on Δ = 1. Moreover, straightforward algebra yields

pr(Δ = 1 | X = x) = ∫₀^∞ S(v | x) G(v | x) dv / μ(x).

Hence, the marginal density function of Ṽ given Δ = 1 is

S(v | x) G(v | x) / ∫₀^∞ S(u | x) G(u | x) du.
Interestingly, given that the survival time is uncensored, A conditional on Ṽ = v has the conditional density function

f(a + v | x) / S(v | x), a ≥ 0,

which is identical to the conditional density function of V = T − A given A in the prevalent population. Hence, for uncensored survival times, we can construct a composite likelihood based on the conditional density of A given Ṽ as well as that of Ṽ given A. On the other hand, for censored survival times, the conditional density of A given the censored residual lifetime Ṽ involves the distribution of the censoring time; hence, it is difficult to use directly in the construction of the composite likelihood.
We now construct a composite conditional likelihood for length-biased and right-censored survival data. Write m = Σ_{i=1}^n Δi; thus, m is the total number of subjects with an uncensored failure time. For convenience, we assume that the first m subjects are uncensored; that is, Δi = 1 for i = 1, …, m and Δi = 0 for i = m + 1, …, n. A composite conditional likelihood can be formulated as

ℒC(β, Λ) = ∏_{i=1}^n [{f(Yi | Xi)/S(Ai | Xi)}^{Δi} {S(Yi | Xi)/S(Ai | Xi)}^{1−Δi}] × ∏_{i=1}^m f(Yi | Xi)/S(Ṽi | Xi).  (3)
For any fixed β, the composite conditional likelihood is maximized by

Λ̂(t, β) = Σ_{i=1}^n ∫₀ᵗ 2 dNi(u) / (Σ_{j=1}^n [I(Aj ≤ u ≤ Yj) + Δj I(Ṽj ≤ u ≤ Yj)] exp(β′Xj)),

where Ni(u) = Δi I(Yi ≤ u), which is a Breslow-type estimator for Λ. Replacing Λ with Λ̂ in Equation (3) yields the composite partial likelihood, up to a constant,

ℒCP(β) = ∏_{i=1}^m (exp(β′Xi) / Σ_{j=1}^n [I(Aj ≤ Yi ≤ Yj) + Δj I(Ṽj ≤ Yi ≤ Yj)] exp(β′Xj))².

Note that the composite partial likelihood depends only on β. Hence, the maximum composite partial likelihood estimator β̂ can be obtained directly by solving the (normalized) estimating equation

U(β) = n^{−1} Σ_{i=1}^n Δi {Xi − Σ_{j=1}^n [I(Aj ≤ Yi ≤ Yj) + Δj I(Ṽj ≤ Yi ≤ Yj)] exp(β′Xj) Xj / Σ_{j=1}^n [I(Aj ≤ Yi ≤ Yj) + Δj I(Ṽj ≤ Yi ≤ Yj)] exp(β′Xj)} = 0.
It is easy to see that the proposed estimator is equivalent to the maximum partial likelihood estimator applied to the truncated survival data pooled from {(Yi, Ai, Xi, Δi), i = 1, … , n} and {(Yi, Ṽi, Xi, Δi = 1), i = 1, …, m}, where the augmented dataset is formed by the m uncensored subjects with Ṽi being treated as the truncation time for Yi.
In practice, we can treat the pooled truncated survival data as clustered survival data, and use the robust variance estimator, which is available in standard statistical software, to estimate the standard errors of the estimated regression coefficients. Thus, the proposed estimator can be easily obtained using standard software. The maximum composite partial likelihood estimator is consistent and asymptotically normal. Its large-sample properties are summarized in the Appendix.
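To illustrate the implementation, the following R sketch (our own illustration rather than the article's posted code; the data frame dat and its column names y, a, delta, x1, x2, and id are hypothetical) builds the augmented dataset and fits the left-truncated Cox model with cluster-robust standard errors using the survival package:

library(survival)

# Composite partial likelihood fit via data augmentation: each uncensored
# subject re-enters the data with the observed residual lifetime treated as
# the truncation time, and the two copies share a cluster id.
fit_cpl <- function(dat) {
  dat$vtilde <- dat$y - dat$a                 # observed residual lifetime
  dat$id <- seq_len(nrow(dat))                # subject id for clustering
  aug <- dat[dat$delta == 1, ]                # uncensored subjects only
  aug$a <- aug$vtilde                         # treat Vtilde as truncation time
  pooled <- rbind(dat, aug)
  # Counting-process form Surv(a, y, delta) encodes left truncation;
  # cluster(id) yields the robust (sandwich) variance estimator.
  coxph(Surv(a, y, delta) ~ x1 + x2 + cluster(id),
        data = pooled, ties = "breslow")
}

The ties = "breslow" option matches the tie-handling method used in the simulation studies of Section 3.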
Finally, when time-dependent covariates are involved in modeling the survival time distribution, the theoretical justification for the composite partial likelihood method remains valid. The construction of the augmented dataset, however, requires information about the covariate values over the entire time interval (Ṽi, Yi); in practice, it can be difficult to determine the covariate values in the subinterval (Ṽi, Ai) when Ṽi < Ai. A careful investigation of the extension of the proposed methodology to time-dependent covariates is warranted.
3. NUMERICAL STUDIES
We conducted simulation studies to examine the finite-sample performance of the proposed methodology. We set the sampling time ξ to be 100 and simulated the onset time of a stable disease, W0, from a uniform distribution over [0, 100]. The two covariates were independently generated from a Bernoulli distribution and the standard normal distribution. The failure time T0 was generated from three Cox models with different baseline hazard functions: (I) λ0(t) = 2, (II) an increasing baseline hazard, and (III) λ0(t) = 5(t − 2)², corresponding to constant, increasing, and U-shaped hazards, respectively. To form a prevalent cohort of sample size n, realizations of (W0, T0, X0) were generated repeatedly until there were n subjects satisfying the sampling constraint W0 + T0 ≥ ξ. The time from enrollment ξ to loss to follow-up was generated from a uniform distribution so that the censoring rate was approximately 0%, 30%, or 50%. In each simulation setting, we generated 2000 datasets, each with a sample size of n = 400.
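As a concrete illustration of this scheme, the R sketch below generates one prevalent cohort under scenario I; the Bernoulli success probability of 0.5 and the function name are our assumptions, and the follow-up censoring step is omitted for brevity:

# Generate one prevalent cohort of size n under scenario I (constant baseline
# hazard lambda0 = 2), following the sampling scheme described above.
sim_prevalent <- function(n, beta = c(1, 1), xi = 100) {
  out <- NULL
  while (is.null(out) || nrow(out) < n) {
    x1 <- rbinom(n, 1, 0.5)                 # Bernoulli covariate (p = 0.5 assumed)
    x2 <- rnorm(n)                          # standard normal covariate
    w0 <- runif(n, 0, xi)                   # onset time of a stable disease
    # constant hazard 2 * exp(beta'x) gives an exponential survival time
    t0 <- rexp(n, rate = 2 * exp(beta[1] * x1 + beta[2] * x2))
    keep <- w0 + t0 >= xi                   # prevalent-cohort sampling constraint
    out <- rbind(out, data.frame(a = xi - w0, t = t0,
                                 x1 = x1, x2 = x2)[keep, ])
  }
  out[seq_len(n), ]                         # a = truncation time, t = survival time
}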
To compare the performance of the proposed maximum composite partial likelihood estimator with that of existing methods, we applied the proposed estimator, the popular maximum partial likelihood method for truncated survival data, the estimating equation method (EE-2) studied by Qin and Shen (2010), and the maximum pseudo-partial likelihood estimator studied by Tsai (2009). Note that the maximum partial likelihood estimator solves the partial score equation

UP(β) = Σ_{i=1}^n Δi {Xi − Σ_{j=1}^n I(Aj ≤ Yi ≤ Yj) exp(β′Xj) Xj / Σ_{j=1}^n I(Aj ≤ Yi ≤ Yj) exp(β′Xj)} = 0,

while Qin and Shen (2010) proposed to solve the estimating equation

U1(β) = Σ_{i=1}^n Δi {Xi − Σ_{j=1}^n Δj I(Yj ≥ Yi) Ŵ(Yj)^{−1} exp(β′Xj) Xj / Σ_{j=1}^n Δj I(Yj ≥ Yi) Ŵ(Yj)^{−1} exp(β′Xj)} = 0,
where Ŵ(t) = ∫₀ᵗ Ĝ(u) du and Ĝ(u) is the Kaplan-Meier estimator of the survival function of the censoring time. Moreover, Tsai’s estimator is equivalent to the solution of a similarly weighted estimating equation in which the risk set at each event time consists of all subjects, censored or uncensored, who survive beyond that time, and the weight function involves the estimated survival function of the censoring time.
Table 1 summarizes the empirical bias and the empirical standard error of the four estimators under the different Cox models. The simulation results suggest that all four estimators have small empirical bias. As expected, the proposed estimator outperformed the maximum partial likelihood estimator in all scenarios, with the relative efficiency ranging from 1.08 to 1.75. The proposed estimator had smaller empirical standard errors than the estimating equation method studied by Qin and Shen (2010) under right censoring, and similar efficiency in the absence of censoring. Intuitively, this is because the estimating function U1(β) uses only uncensored subjects to form the weighted risk set, while the proposed estimator uses both censored and uncensored subjects. Overall, the proposed estimator was as efficient as, or slightly more efficient than, the maximum pseudo-partial likelihood estimator in all scenarios. The weight function employed in the maximum pseudo-partial likelihood estimator (Tsai 2009) involves the estimated survival function of the censoring time and hence can be unstable in later time intervals; our estimator does not involve estimation of the censoring distribution and is therefore generally more stable. The average of the 2000 robust standard error estimates was very close to the standard deviation of the 2000 estimated coefficients, suggesting satisfactory performance of the robust standard error estimator. Note that we applied Breslow’s method to handle ties in the failure times; in all simulation studies (results not shown), Breslow’s method always yielded smaller mean squared errors than the other methods for handling ties. In summary, the proposed methodology is easy to perform in standard software and enjoys efficiency gains that are better than or similar to those of its competitors.
Table 1.
Summary of simulation studies
| Proportion censored | Coef | PL Bias | PL SE | QS Bias | QS SE | PPL Bias | PPL SE | CPL Bias | CPL SE | CPL ASE | RE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Scenario I: λ0(t) = 2 | | | | | | | | | | | |
| 0% | β̂1 | 8 | 132 | 3 | 98 | 4 | 98 | 5 | 100 | 98 | 1.75 |
| | β̂2 | 4 | 83 | 1 | 66 | 1 | 66 | 2 | 69 | 66 | 1.48 |
| 30% | β̂1 | 7 | 157 | 13 | 141 | 0 | 126 | 5 | 120 | 120 | 1.70 |
| | β̂2 | 7 | 102 | 5 | 92 | 0 | 84 | 1 | 82 | 83 | 1.54 |
| 50% | β̂1 | 10 | 171 | 12 | 153 | −1 | 141 | 3 | 130 | 127 | 1.73 |
| | β̂2 | 5 | 108 | 4 | 100 | −2 | 92 | 0 | 89 | 88 | 1.48 |
| Scenario II: increasing baseline hazard | | | | | | | | | | | |
| 0% | β̂1 | 7 | 119 | 5 | 99 | 5 | 99 | 6 | 101 | 99 | 1.38 |
| | β̂2 | 6 | 77 | 4 | 65 | 4 | 65 | 5 | 67 | 65 | 1.29 |
| 30% | β̂1 | 3 | 137 | 2 | 125 | 1 | 115 | 2 | 116 | 113 | 1.41 |
| | β̂2 | 6 | 85 | 4 | 81 | 3 | 75 | 4 | 76 | 74 | 1.25 |
| 50% | β̂1 | 7 | 159 | 8 | 151 | 0 | 138 | 5 | 129 | 130 | 1.53 |
| | β̂2 | 6 | 102 | 8 | 95 | −1 | 89 | 4 | 87 | 85 | 1.40 |
| Scenario III: λ0(t) = 5(t − 2)² | | | | | | | | | | | |
| 0% | β̂1 | 5 | 111 | 3 | 104 | 3 | 104 | 5 | 106 | 106 | 1.09 |
| | β̂2 | 8 | 70 | 7 | 65 | 7 | 65 | 8 | 67 | 65 | 1.09 |
| 30% | β̂1 | 10 | 135 | 7 | 131 | 7 | 124 | 9 | 127 | 124 | 1.12 |
| | β̂2 | 7 | 83 | 5 | 81 | 7 | 78 | 7 | 80 | 76 | 1.08 |
| 50% | β̂1 | 9 | 156 | 10 | 157 | 7 | 144 | 9 | 146 | 143 | 1.16 |
| | β̂2 | 7 | 94 | 7 | 93 | 6 | 88 | 7 | 89 | 88 | 1.13 |
NOTE: β̂1 and β̂2 are the estimated regression coefficients; the true parameter values are (1, 1). PL, the maximum partial likelihood estimator; QS, the estimator studied by Qin and Shen (2010); PPL, the maximum pseudo-partial likelihood estimator studied by Tsai (2009); CPL, the proposed maximum composite partial likelihood estimator. Bias and SE are the empirical bias (×1000) and empirical standard deviation (×1000) of the 2000 regression parameter estimates; ASE is the average robust standard error estimate (×1000); RE is the empirical variance of the maximum partial likelihood estimator divided by that of the maximum composite partial likelihood estimator.
4. ANALYSIS OF THE CANADIAN STUDY OF HEALTH AND AGING
In this section, we analyze the prevalent cohort survival data from the CSHA by applying the proposed method as well as its competitors. After excluding those with a missing date of onset or missing dementia subtype classification and those who survived more than 20 years after dementia onset (hence considered unlikely to have dementia; see Asgharian, M’Lan, and Wolfson 2002), a total of 807 dementia patients were included in our analysis; among them, 249 (31%) had possible Alzheimer, 388 (48%) had probable Alzheimer, and 170 (21%) had vascular dementia. By May 1997, 627 deaths (78%) had been recorded; among them, 189 had a diagnosis of possible Alzheimer, 302 had probable Alzheimer, and 136 had vascular dementia at enrollment. The goal of our analysis was to evaluate the impact of different subtypes of dementia on life expectancy using the prevalent cases whose vital status was ascertained in May 1997.
4.1 Assessment of the Stable Disease Condition
To assess whether the incidence of each dementia subtype is constant over time, Asgharian, Wolfson, and Zhang (2006) proposed to compare the distribution of the truncation time A with that of the residual survival time V. The authors proved that, under mild conditions, A and V have identical distributions if and only if the incidence of disease is constant over time. Denote by SA and SV the survival functions of A and V. Let ŜA(t) be the empirical survival function obtained by using {A1, …, An}, and let ŜV(t) be the Kaplan-Meier curve obtained by using {(Ṽ1, Δ1), …, (Ṽn, Δn)}. If the stationarity assumption with respect to disease incidence holds, the two survival curves are expected to be close to each other. Visual inspection alone, however, can be quite subjective. In what follows, we describe a graphical assessment method and a corresponding numerical test based on the equality of SA and SV.
Step 1. Calculate the Kolmogorov-Smirnov test statistic D = sup_{t∈[0,τ]} |ŜA(t) − ŜV(t)| that compares the estimates of SA and SV.

Step 2. Apply the EM algorithm proposed by Vardi (1989) to obtain the nonparametric maximum likelihood estimate of the survival time distribution under the stationarity condition. Let f̂ and Ŝ be the estimated density and survival functions.

Step 3. Approximate the null distribution of D via simulation:

(a) For the ith individual, simulate a total survival time Ti* from the estimated conditional density of T given the observed truncation time Ai; that is, simulate from the discrete density function f̂(t) I(t ≥ Ai)/Ŝ(Ai). Next, generate a censoring time Ci* independently from the estimated censoring distribution given by the Kaplan-Meier estimator. The observed residual lifetime and censoring indicator simulated for the ith person are then Ṽi* = min(Ti* − Ai, Ci*) and Δi* = I(Ti* − Ai ≤ Ci*).

(b) Calculate the Kolmogorov-Smirnov test statistic D* = sup_{t∈[0,τ]} |ŜA(t) − ŜV*(t)|, where ŜV* is the Kaplan-Meier estimator based on the simulated data {(Ṽi*, Δi*), i = 1, …, n}.

Step 4. Repeat Step 3 B times to obtain D1*, …, DB*. An approximate p-value for the observed statistic D is then the proportion of D1*, …, DB* that exceed D. To assess how unusual the observed residual lifetime distribution is under the stationarity condition, one may also plot ŜV along with the Kaplan-Meier estimates ŜV* from a few, say 20, simulated datasets. (An R sketch of this procedure is given after these steps.)
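The following R sketch outlines Steps 1–4. It is our own illustration: vardi_npmle(), rcond_t(), and sample_km() are hypothetical helper functions standing in for Vardi's (1989) EM estimator, for sampling from the estimated conditional law of T given T ≥ Ai, and for sampling from a Kaplan-Meier fit, respectively; only the surrounding logic is spelled out.

library(survival)

# Assumes vectors a (truncation times), vt (observed residual lifetimes), and
# d (censoring indicators); vardi_npmle(), rcond_t(), and sample_km() are
# hypothetical helpers (see the lead-in text).
ks_stat <- function(a, vt, d) {
  grid <- sort(unique(c(a, vt)))
  SA <- function(t) 1 - ecdf(a)(t)          # empirical survival function of A
  km <- survfit(Surv(vt, d) ~ 1)            # Kaplan-Meier curve for V
  SV <- stepfun(km$time, c(1, km$surv))
  max(abs(SA(grid) - SV(grid)))             # D = sup_t |S_A(t) - S_V(t)|
}

D_obs <- ks_stat(a, vt, d)                  # Step 1
Shat  <- vardi_npmle(a + vt, d)             # Step 2: NPMLE under stationarity
Ghat  <- survfit(Surv(vt, 1 - d) ~ 1)       # Kaplan-Meier fit for censoring
B <- 2000
D_null <- replicate(B, {                    # Step 3, repeated B times (Step 4)
  t_star <- vapply(a, function(ai) rcond_t(Shat, ai), numeric(1))
  c_star <- sample_km(Ghat, length(a))
  ks_stat(a, pmin(t_star - a, c_star), as.numeric(t_star - a <= c_star))
})
mean(D_null >= D_obs)                       # approximate p-value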
We have applied the proposed graphical and numerical tests to data simulated from the models described in Section 3. Overall, the Type I error rate is very close to the predetermined nominal level (0.05) in all scenarios with 2000 repetitions, which supports the validity of the proposed test. We thus applied the proposed tests to data from each dementia subtype group. Figure 2 shows that, within each diagnostic group, the estimated curve of SV falls well inside the range determined by the simulated processes. The p-values for the Kolmogorov-Smirnov test are 0.20, 0.97, and 0.62, respectively, for possible Alzheimer, probable Alzheimer, and vascular dementia with 2000 repetitions. Thus, there is little evidence to dispute the stationarity condition required by the proposed composite partial likelihood method.
Figure 2.
Estimated survival functions for V. Plots of the estimated survival functions for the residual lifetime in the prevalent cohort. The bold curve corresponds to the Kaplan-Meier estimator using the observed data and the gray curves are estimated survival curves from 20 (out of 2000) datasets that were simulated under the null.
4.2 Nonparametric Estimation Within Subgroups
Ignoring the prevalent sampling design, the median survival times estimated by naive use of the Kaplan-Meier curve were 7.1, 6.6, and 5.8 years, respectively, for patients with possible Alzheimer, probable Alzheimer, and vascular dementia. In contrast, the unbiased but inefficient truncation product-limit estimator yielded estimated median survival times of 3.5, 3.6, and 3.3 years for the three dementia subtypes. The differences between the Kaplan-Meier estimates and the truncation product-limit estimates illustrate that failing to adjust for the sampling bias in the prevalent cohort leads to substantial overestimation of survival in patients with dementia.
We also applied the truncation product-limit estimator to the augmented dataset described in Section 2.4 to estimate the survival time distribution under the stationarity assumption with respect to disease incidence. Figure 3 shows the estimated survival curves for different dementia subtypes. It can be observed that individuals with vascular dementia had the worst survival, while the estimated survival curves for the other two subtypes were very close to each other. The estimated median survival times for the three dementia subtypes were 4.0 (possible Alzheimer; 95% bootstrap confidence interval, 3.3–4.7), 3.7 (probable Alzheimer; 95% bootstrap confidence interval, 2.8–4.5), and 3.1 years (vascular dementia; 95% bootstrap confidence interval, 2.3–4.2). These estimates were not significantly different from those given by the truncation product-limit estimator, as their corresponding 95% bootstrap confidence intervals contained the estimated median survival time obtained by applying the truncation product-limit estimator.
Figure 3.
Estimated survival curves for possible Alzheimer (dashed line), probable Alzheimer (solid line), and vascular dementia (dotted line) given by Vardi’s estimator.
For policy planners, it is of interest to know the life expectancy of a patient after being admitted into an assisted living facility. Based on the estimated survival curves, the median residual survival times for patients who survive longer than 5 years after onset of dementia are 2.7 years (possible Alzheimer), 2.8 years (probable Alzheimer), and 2.4 years (vascular dementia). Similarly, the estimated median residual survival times are 2.6 years (possible Alzheimer), 2.1 years (probable Alzheimer), and 1.1 years (vascular dementia) given that these patients survive longer than 8 years after onset of dementia.
4.3 Comparisons of Different Dementia Subtypes
To compare the effects of dementia subtypes on mortality, we fit a Cox proportional hazards model with indicators of probable Alzheimer and vascular dementia as covariates. The estimated regression coefficients obtained by applying the different methods are summarized in Table 2. By applying the proposed methodology, it was estimated that, compared with patients with possible Alzheimer, the risk of death increased by 12% among those with probable Alzheimer and by 24% among those with vascular dementia. Thus, vascular dementia had the worst prognosis of the three subtypes. Note that our results differ from those reported by Qin and Shen (2010), because their analysis included those who survived longer than 20 years after dementia onset. Both the partial likelihood method and the pseudo-partial likelihood method (Tsai 2009) estimated nonsignificant effects of probable Alzheimer and vascular dementia, as the corresponding 95% bootstrap confidence intervals contain 0. The proposed method yielded estimates similar to those of the estimating equation method proposed by Qin and Shen (2010), and both estimated a significantly higher risk of death in patients with vascular dementia. The proposed composite partial likelihood method had the smallest bootstrap standard errors among the estimators compared. For both β1 and β2, the variance ratio of each competitor to the proposed method is at least 1.45. This suggests that if a competing method were used in lieu of the proposed method, the CSHA would need to recruit at least 510 more subjects to achieve the same precision, thus highlighting the practical value of employing a more efficient estimator.
Table 2.
Estimated regression coefficients of the Cox model for the CSHA study
| Method | β1 Coef | β1 SE | β1 95% CI | β2 Coef | β2 SE | β2 95% CI |
|---|---|---|---|---|---|---|
| PL | 0.030 | 0.087 | (−0.149, 0.195) | 0.113 | 0.109 | (−0.108, 0.314) |
| QS | 0.159 | 0.093 | (−0.023, 0.348) | 0.260 | 0.119 | (0.044, 0.522) |
| PPL | 0.064 | 0.082 | (−0.093, 0.223) | 0.161 | 0.106 | (−0.035, 0.365) |
| CPL | 0.117 | 0.068 | (−0.011, 0.250) | 0.214 | 0.087 | (0.052, 0.381) |
NOTE: β1 and β2 are the regression coefficients for probable Alzheimer and vascular dementia, respectively. PL, the maximum partial likelihood estimator; QS, the estimator studied by Qin and Shen (2010); PPL, the maximum pseudo-partial likelihood estimator studied by Tsai (2009); CPL, the proposed maximum composite partial likelihood estimator. Coef and SE are the estimated coefficient and the empirical standard deviation of 2000 bootstrap estimates; 95% CI is the 95% bootstrap confidence interval given by the 2.5th and 97.5th percentiles of the 2000 estimates.
5. DISCUSSION
Statistical methods for left-truncated survival data have been widely applied to study important problems such as estimating the incubation period from human immunodeficiency virus (HIV) infection to the development of acquired immunodeficiency syndrome (AIDS; Brookmeyer and Gail 1987), as well as evaluating the impact of deferring initiation of highly active antiretroviral therapy (HAART) on AIDS and death (When To Start Consortium 2009). In these applications, the stationarity assumption with respect to disease incidence is likely to be violated, because the HIV infection rate increased rapidly in the 1980s and then plateaued in the early 1990s owing to extensive prevention and education efforts. Although the composite partial likelihood estimator yields invalid inferential results when the incidence of the disease is not constant over time, the proposed method can be applied to a transformed dataset to obtain unbiased estimates when information about the underlying truncation time distribution, denoted by H, is available from other sources. In fact, it is easy to verify that, after applying the monotone transformation H, the transformed survival time H(T0) is left truncated by a uniformly distributed random variable. Because the Cox model is invariant under monotone transformations of time, the regression coefficients can be consistently estimated by applying the proposed composite partial likelihood estimator to the transformed data {(H(Ai), H(Yi), Δi, Xi), i = 1, …, n}.
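If an external estimate of H is available as an R function, the adjustment reduces to transforming the time variables before refitting; in the sketch below, fit_cpl() refers to the illustration in Section 2.4 and all names remain hypothetical:

# Transformation adjustment for a nonstationary incidence: H is an externally
# estimated distribution function of the truncation time (assumed known here).
# The Cox regression coefficients are invariant to the monotone transformation.
fit_cpl_transformed <- function(dat, H) {
  dat$a <- H(dat$a)          # transformed truncation time H(A)
  dat$y <- H(dat$y)          # transformed observed survival time H(Y)
  fit_cpl(dat)               # reuse the augmented-data fit sketched earlier
}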
In the Appendix, we established the strong consistency of Λ̂(t) for t ∈ [0, τ], where τ is the maximal support of Y = min(T, A + C). In fact, even though the censoring time C may have a smaller maximal support than T0, under length-biased sampling τ equals the maximal support of T0, denoted by τ1. Intuitively, this is because the truncation variable A has the same maximal support as T0. Heuristically, suppose pr(C > 0) > 0; then for any ε > 0 we have

pr(ΔY > τ1 − ε) ≥ pr(T > τ1 − ε, T − A ≤ C) > 0,

and it follows that max_{1≤i≤n} ΔiYi → τ1 almost surely as n → ∞. Thus, we can show that τ = τ1. An important implication of this result is that, under length-biased sampling, the mean survival time μ(x) can be estimated consistently. This is in contrast to conventional survival analysis, where the mean survival time is usually not estimable due to right censoring.
The motivation for the use of composite likelihood is usually to avoid the computational challenges in evaluating and maximizing the likelihood for a possibly high-dimensional response vector. In this article, we applied the composite likelihood approach to improve the efficiency of the popular partial likelihood method by incorporating additional information in the distribution of the observed truncation time. Our simulation study showed that the proposed estimator performed as well as or better than the alternative methods for survival data under length-biased sampling. We also observed an efficiency gain of 57% or more for the composite partial likelihood estimator over the partial likelihood estimator in the data analysis of the CSHA.
Although we focus on semiparametric estimation of the Cox proportional hazards model in this article, a simple and efficient nonparametric estimator for the baseline hazard function can be easily derived by applying the truncation product-limit estimator (Wang, Jewell, and Tsai 1986) to the augmented dataset described above. Specifically, the new nonparametric estimator is obtained by replacing the total mass of the risk set at time t in the truncation product-limit estimator, Σ_{j=1}^n I(Aj ≤ t ≤ Yj), with Σ_{j=1}^n [I(Aj ≤ t ≤ Yj) + Δj I(Ṽj ≤ t ≤ Yj)]. Similar to the semiparametric maximum composite partial likelihood estimator, the nonparametric estimator enjoys an efficiency improvement over the product-limit estimator. In summary, the proposed methodology not only enjoys efficiency gains over its competitors but also can be easily performed using standard statistical software; hence, it is expected to be an attractive method for statistical practitioners.
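In R, this estimator can be computed by running the counting-process form of survfit on the same augmented dataset used for the regression fit; a minimal sketch under the same hypothetical data layout as before is:

library(survival)

# Nonparametric survival estimator under the stationarity condition: the
# truncation product-limit estimator applied to the augmented data, so that
# the risk set at time t has the enlarged total mass described above.
surv_stationary <- function(dat) {
  dat$vtilde <- dat$y - dat$a
  aug <- dat[dat$delta == 1, ]
  aug$a <- aug$vtilde                  # residual lifetime as truncation time
  pooled <- rbind(dat, aug)
  survfit(Surv(a, y, delta) ~ 1, data = pooled)
}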
Acknowledgments
The authors thank Professors Ian McDowell, Masoud Asgharian, and Christina Wolfson for kindly sharing the Canadian Study of Health and Aging data. The core study was funded by the National Health Research and Development Program (NHRDP) of Health Canada Project 6606-3954-MC(S). Additional funding was provided by Pfizer Canada Incorporated through the Medical Research Council/Pharmaceutical Manufacturers Association of Canada Health Activity Program, NHRDP Project 6603-1417-302(R), Bayer Incorporated, and the British Columbia Health Research Foundation Projects 38 (93-2) and 34 (96-1). The authors also thank the Associate Editor, the referee, Dr Dean Follmann, and Dr Michael Proschan for their comments that improved the presentation of this article.
APPENDIX: LARGE-SAMPLE PROPERTIES
To simplify the discussion, we impose the following regularity conditions. We assume that X is bounded, and the true regression parameter β0 lies in a compact set ℬ. Let S(t) and G(t) be the marginal survival functions of T0 and C, respectively. We assume that pr(C > 0) > 0 and that both S(t) and G(t) are absolutely continuous for t ∈ [0, τ], where τ = sup{t : pr(Y ≥ t) > 0} is the maximal support of Y. Let Fu(t) = pr(Y ≤ t, Δ = 1) be the distribution function of the uncensored failure times. Write Ni(u) = Δi I(Yi ≤ u) and Rj(u) = 2^{−1}{I(Aj ≤ u ≤ Yj) + Δj I(Ṽj ≤ u ≤ Yj)}. Define S^{(k)}(t, β) = n^{−1} Σ_{j=1}^n Rj(t) exp(β′Xj) Xj^{⊗k} for k = 0, 1, 2, and let s^{(k)}(t, β) = E{S^{(k)}(t, β)}. Thus, U can be reexpressed as

U(β) = n^{−1} Σ_{i=1}^n ∫₀^τ {Xi − S^{(1)}(u, β)/S^{(0)}(u, β)} dNi(u).

It is easy to see that, for each fixed β, U defines a functional of the empirical processes S^{(0)}, S^{(1)}, S^{(2)}, and n^{−1} Σ_{i=1}^n Ni. By the strong law of large numbers, the four empirical processes converge almost surely to their limits uniformly as n → ∞. Moreover, the functional defined by U is continuous with respect to the supremum norm under the regularity conditions. Hence, it follows from the extended strong law of large numbers given in Appendix III of Andersen and Gill (1982) that sup_{β∈ℬ} |U(β) − Ũ(β)| → 0 almost surely as n → ∞, with

Ũ(β) = E[∫₀^τ {X1 − s^{(1)}(u, β)/s^{(0)}(u, β)} dN1(u)].
Under length-biased sampling, we have E{dN1(t) | X1 = x} = f(t | x) μ(x)^{−1} wc(t | x) dt and

E{R1(t) | X1 = x} = S(t | x) μ(x)^{−1} wc(t | x),

where wc(t | x) = ∫₀ᵗ G(u | x) du. It can be verified that Ũ(β0) = 0 if β0 is the true regression parameter.
Define the stochastic process

Mi(t) = Ni(t) − ∫₀ᵗ exp(β0′Xi) Ri(u) dΛ(u)

and

φi(β) = ∫₀^τ {Xi − s^{(1)}(u, β)/s^{(0)}(u, β)} dMi(u).
Let Γ(β) = −∂U (β)/∂β and Γ̃ (β) = −∂Ũ(β)/∂β. In light of the functional form representation of U, the large sample properties are studied by applying the functional delta method (Andersen et al. 1993, chap. II.8). The results are summarized in the following theorem.
Theorem 1. Suppose that X is bounded and the true regression parameter β0 lies in a compact set ℬ. Then the maximum composite partial likelihood estimator β̂ converges to β0 almost surely as n → ∞. Provided Γ̃(β0) is nonsingular, n^{1/2}(β̂ − β0) is asymptotically normal with mean 0 and variance–covariance matrix Ω̃(β0) = Γ̃(β0)^{−1} Σ̃(β0) Γ̃(β0)^{−1}, where Σ̃(β0) = E{φ1(β0)′φ1(β0)}.
Proof. We establish the consistency of β̂ as follows. Consider the log pseudo-likelihood function

K(β) = n^{−1} Σ_{i=1}^n ∫₀^τ [(β − β0)′Xi − log{S^{(0)}(u, β)/S^{(0)}(u, β0)}] dNi(u)

and define the function

K̃(β) = E(∫₀^τ [(β − β0)′X1 − log{s^{(0)}(u, β)/s^{(0)}(u, β0)}] dN1(u)).

It is easy to verify that U(β) and Ũ(β) are the derivatives of K(β) and K̃(β), respectively, with K(β0) = K̃(β0) = 0. Applying a Taylor expansion, we have K(β) − K̃(β) = (β − β0)′{U(β*) − Ũ(β*)}, where β* lies on the line segment between β and β0. Hence, it follows from sup_{β∈ℬ} |U(β) − Ũ(β)| → 0 almost surely that sup_{β∈ℬ} |K(β) − K̃(β)| → 0 almost surely as n → ∞.
Recall that Γ(β) = −dU(β)/dβ and Γ̃(β) = −dŨ(β)/dβ; that is,

Γ(β) = n^{−1} Σ_{i=1}^n ∫₀^τ [S^{(2)}(u, β)/S^{(0)}(u, β) − {S^{(1)}(u, β)/S^{(0)}(u, β)}^{⊗2}] dNi(u)

and

Γ̃(β) = E(∫₀^τ [s^{(2)}(u, β)/s^{(0)}(u, β) − {s^{(1)}(u, β)/s^{(0)}(u, β)}^{⊗2}] dN1(u)).

Applying the Cauchy-Schwarz inequality, we can show that Γ and Γ̃ are both positive definite, so that K and K̃ are concave. Arguing as in the proof of Lenglart’s theorem (Andersen and Gill 1982, Appendix II), we can show that β̂, the unique maximizer of K, converges in probability to β0, the unique maximizer of K̃. Thus, we establish the consistency of β̂.
Next, we show the asymptotic normality of U(β) for fixed β. The regularity conditions imply that s^{(0)}(u, β) is bounded away from zero for u ∈ [0, τ] and β ∈ ℬ. Hence, the functional defined by U is compactly differentiable with respect to the supremum norm. The asymptotic normality of the four empirical processes in U can be easily established, as they are sums of independent and identically distributed random processes. Therefore, by the functional delta method (Andersen et al. 1993, chap. II.8), we can show that n^{1/2}U(β) = n^{−1/2} Σ_{i=1}^n φi(β) + op(1). It then follows from the classical central limit theorem that n^{1/2}U(β) is asymptotically normal with mean zero and variance–covariance matrix Σ̃(β) = E{φ1(β)′φ1(β)}.
Applying a Taylor series expansion and using U(β̂) = 0, we have n^{1/2}(β̂ − β0) = Γ(β*)^{−1} n^{1/2} U(β0), where β* lies on the line segment between β̂ and β0. Arguing as before, one can show that sup_{β∈ℬ} |Γ(β) − Γ̃(β)| → 0 almost surely. Because β̂ is consistent for β0 and Γ̃ is continuous at β0 under the regularity conditions, Γ(β*) converges to Γ̃(β0) almost surely. By Slutsky’s theorem, n^{1/2}(β̂ − β0) converges to a mean zero normal distribution with variance–covariance matrix Γ̃(β0)^{−1} Σ̃(β0) Γ̃(β0)^{−1}. Thus, we prove Theorem 1.
The Breslow-type estimator Λ̂(t, β) for the baseline cumulative hazard function can be reexpressed as Λ̂(t, β) = n^{−1} Σ_{i=1}^n ∫₀ᵗ dNi(u)/S^{(0)}(u, β). It is easy to see that Λ̂(t, β) is a continuous functional of the two empirical processes S^{(0)}(u, β) and n^{−1} Σ_{i=1}^n Ni(u) with respect to the supremum norm. The almost sure convergence of the two processes implies sup_{t∈[0,τ],β∈ℬ} |Λ̂(t, β) − Λ(t, β)| → 0, where Λ(t, β) = ∫₀ᵗ E{dN1(u)}/s^{(0)}(u, β). Then the strong consistency of Λ̂(t) ≡ Λ̂(t, β̂) for Λ(t) = Λ(t, β0) follows from the strong consistency of β̂ for β0.
Define M̂i(t) = Ni(t) − ∫₀ᵗ exp(β̂′Xi) Ri(u) dΛ̂(u) and let

φ̂i(β) = ∫₀^τ {Xi − S^{(1)}(u, β)/S^{(0)}(u, β)} dM̂i(u).

It can be shown that Σ̃(β0) is consistently estimated by Σ̂(β̂) = n^{−1} Σ_{i=1}^n φ̂i(β̂)′φ̂i(β̂). Thus, the asymptotic variance Ω̃(β0) in Theorem 1 can be consistently estimated by Ω̂(β̂) = Γ(β̂)^{−1} Σ̂(β̂) Γ(β̂)^{−1}.
Footnotes
SUPPLEMENTARY MATERIALS
R code: An R script to obtain the maximum composite partial likelihood estimator is provided in the file cmp_partial.R, available from the JASA web site.
Contributor Information
Chiung-Yu Huang, Mathematical Statistician, Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892 (huangchi@niaid.nih.gov).
Jing Qin, Mathematical Statistician, Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892 (jingqin@niaid.nih.gov).
REFERENCES
- Andersen PK, Borgan O, Gill RD, Keiding N. Statistical Models Based on Counting Processes. New York: Springer-Verlag; 1993.
- Andersen PK, Gill RD. Cox’s Regression Model for Counting Processes: A Large Sample Study. The Annals of Statistics. 1982;10:1100–1120.
- Arnold BC, Strauss D. Bivariate Distributions With Exponential Conditionals. Journal of the American Statistical Association. 1988;83:522–527.
- Asgharian M, M’Lan CE, Wolfson DB. Length-Biased Sampling With Right Censoring: An Unconditional Approach. Journal of the American Statistical Association. 2002;97:201–209.
- Asgharian M, Wolfson DB. Asymptotic Behavior of the Unconditional NPMLE of the Length-Biased Survivor Function From Right Censored Prevalent Cohort Data. The Annals of Statistics. 2005;33:2109–2131.
- Asgharian M, Wolfson DB, Zhang X. Checking Stationarity of the Incidence Rate Using Prevalent Cohort Survival Data. Statistics in Medicine. 2006;25:1751–1767. doi: 10.1002/sim.2326.
- Besag JE. Spatial Interaction and the Statistical Analysis of Lattice Systems. Journal of the Royal Statistical Society, Series B. 1974;36:192–236.
- Brookmeyer R, Gail MH. Biases in Prevalent Cohorts. Biometrics. 1987;43:739–749.
- Cox DR. Regression Models and Life-Tables (with discussion). Journal of the Royal Statistical Society, Series B. 1972;34:187–220.
- de Uña-Álvarez J. Nonparametric Estimation Under Length-Biased Sampling and Type I Censoring: A Moment Based Approach. The Annals of the Institute of Statistical Mathematics. 2004;56:667–681.
- Ferri CP, Prince M, Brayne C, Brodaty H, Fratiglioni L, Ganguli M, Hall KH, Hendrie H, Huang Y, Jorm A, Mathers C, Menezes PR, Rimmer E, Scazufca M, for Alzheimer’s Disease International. Global Prevalence of Dementia: A Delphi Consensus Study. Lancet. 2005;366:2112–2117. doi: 10.1016/S0140-6736(05)67889-0.
- Johansen S. An Extension of Cox’s Regression Model. International Statistical Review. 1983;51:165–174.
- Kalbfleisch JD, Lawless JF. Regression Models for Right Truncated Data With Applications to AIDS Incubation Times and Reporting Bias. Statistica Sinica. 1991;1:19–32.
- Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2nd ed. New York: Wiley; 2002.
- Lancaster T. The Econometric Analysis of Transition Data. Cambridge: Cambridge University Press; 1990.
- Lindsay BG. Composite Likelihood Methods. Contemporary Mathematics. 1988;80:221–239.
- Luo X, Tsai WY. Nonparametric Estimation for Right-Censored Length-Biased Data: A Pseudo-Partial Likelihood Approach. Biometrika. 2009;96:873–886.
- McDowell I, Hill G, Lindsay J. An Overview of the Canadian Study of Health and Aging. International Psychogeriatrics. 2005;13:7–18. doi: 10.1017/s1041610202007949.
- Murphy SA, van der Vaart AW. On Profile Likelihood. Journal of the American Statistical Association. 2000;95:449–465.
- Qin J, Shen Y. Statistical Methods for Analyzing Right-Censored Length-Biased Data Under Cox Model. Biometrics. 2010;66:382–392. doi: 10.1111/j.1541-0420.2009.01287.x.
- Simon R. Length Biased Sampling in Etiologic Studies. American Journal of Epidemiology. 1980;111:444–452. doi: 10.1093/oxfordjournals.aje.a112920.
- Tsai W-Y. Pseudo-Partial Likelihood for Proportional Hazards Models With Biased-Sampling Data. Biometrika. 2009;96:601–615. doi: 10.1093/biomet/asp026.
- Vardi Y. Multiplicative Censoring, Renewal Processes, Deconvolution and Decreasing Density: Nonparametric Estimation. Biometrika. 1989;76:751–761.
- Vardi Y, Zhang C-H. Large Sample Study of Empirical Distributions in a Random-Multiplicative Censoring Model. The Annals of Statistics. 1992;20:1022–1039.
- Varin C. On Composite Marginal Likelihoods. Advances in Statistical Analysis. 2008;92:1–28.
- Varin C, Reid N, Firth D. An Overview of Composite Likelihood Methods. Statistica Sinica. 2011;21:5–42.
- Wang M-C. Nonparametric Estimation From Cross-Sectional Survival Data. Journal of the American Statistical Association. 1991;86:130–143.
- Wang M-C. Hazards Regression Analysis for Length-Biased Data. Biometrika. 1996;83:343–354.
- Wang M-C, Brookmeyer R, Jewell N. Statistical Models for Prevalent Cohort Data. Biometrics. 1993;49:1–11.
- Wang M-C, Jewell N, Tsai W-Y. Asymptotic Properties of the Product Limit Estimate Under Random Truncation. The Annals of Statistics. 1986;14:1597–1605.
- When To Start Consortium. Timing of Initiation of Antiretroviral Therapy in AIDS-Free HIV-1-Infected Patients: A Collaborative Analysis of 18 HIV Cohort Studies. Lancet. 2009;373:1352–1363. doi: 10.1016/S0140-6736(09)60612-7.
- Winter BB, Foldes A. A Product-Limit Estimator for Use With Length-Biased Data. Canadian Journal of Statistics. 1988;16:337–355.
- Wolfson C, Wolfson DB, Asgharian M, M’Lan CE. A Reevaluation of the Duration of Survival After the Onset of Dementia. New England Journal of Medicine. 2001;344:1111–1116. doi: 10.1056/NEJM200104123441501.
- Zelen M, Feinleib M. On the Theory of Screening for Chronic Diseases. Biometrika. 1969;56:601–614.