Published in final edited form as: J Am Stat Assoc. 2011 Dec 1;106(496):1434–1449. doi: 10.1198/jasa.2011.tm10156

Maximum Likelihood Estimations and EM Algorithms with Length-biased Data

Jing Qin 1, Jing Ning 2, Hao Liu 3, Yu Shen 4

SUMMARY

Length-biased sampling is well recognized in economics, industrial reliability, etiology, epidemiology, genetics, and cancer screening studies. Length-biased right-censored data have a unique structure that differs from that of traditional survival data, and the nonparametric and semiparametric estimation and inference methods for traditional survival data are not directly applicable to them. We propose new expectation-maximization algorithms for estimation based on full likelihoods involving infinite-dimensional parameters under three settings for length-biased data: estimating the nonparametric distribution function, estimating the nonparametric hazard function under an increasing failure rate constraint, and jointly estimating the baseline hazard function and the covariate coefficients under the Cox proportional hazards model. Extensive empirical simulation studies show that the maximum likelihood estimators perform well with moderate sample sizes and are more efficient than the estimating equation approaches. The proposed estimators are also more robust to various right-censoring mechanisms. We prove the strong consistency of the estimators, and establish the asymptotic normality of the semiparametric maximum likelihood estimators under the Cox model using modern empirical process theory. We apply the proposed methods to a prevalent cohort medical study. Supplemental materials are available online.

Keywords: Cox regression model, EM algorithm, Increasing failure rate, Non-parametric likelihood, Profile likelihood, Right-censored data

1. INTRODUCTION

When the observed failure times are not randomly selected from the target population of interest but are sampled with probability proportional to their underlying length, we have length-biased time-to-event data. Length-biased data are naturally encountered in applications of renewal processes (Cox and Miller, 1977; Vardi, 1982; Dewanji and Kalbfleisch, 1987; Vardi, 1989), industrial applications (Kvam, 2008), etiologic studies (Simon, 1980), genome-wide linkage studies (Terwilliger et al., 1997), epidemiologic cohort studies (Keiding, 1991; Gail and Benichou, 2000; Gordis, 2000; Sansgiry and Akman, 2000; Scheike and Keiding, 2006), cancer prevention trials (Zelen and Feinleib, 1969; Zelen, 2004), and studies of labor economics (Lancaster, 1990; McClean and Devine, 1995; De Uña Álvarez et al., 2003). In observational studies, a prevalent cohort design that draws samples from individuals with a condition or disease at the time of enrollment is generally more efficient and practical. The recruited patients who have already experienced an initiating event are followed prospectively for the failure event (e.g., disease progression or death) or are right censored. Under this sampling design, individuals with longer survival times measured from the onset of the disease are more likely to be included in the cohort, whereas those with shorter survival times are selectively excluded. Length-biased sampling thereby manifests in the observations, because the "observed" time intervals from initiation to failure within the prevalent cohort tend to be longer than those arising from the underlying distribution of the general population. How to properly adjust for potential selection bias in analyzing length-biased data has been a longstanding statistical problem. Although we use a prevalent cohort study in medical applications here to illustrate length-biased data, it is apparent that the issues caused by biased sampling are common in many potential applications and sampling designs.

In a seminal paper, Vardi (1989) described the multiplicative censorship model, which connected four well-investigated statistical problems: A. Estimating a nonparametric distribution function under multiplicative censoring, B. Estimating the underlying distribution in renewal processes, C. Solving a nonparametric deconvolution problem, and D. Estimating a monotone decreasing density function. Vardi (1989) presented problems A and C, which have a natural connection with the measurement error problem and the inverse problem discussed by van der Vaart and Wellner (1992) and Bickel and Ritov (1994). Most importantly, Vardi (1989) and Wang (1991) showed that the nonparametric maximum likelihood estimation (NPMLE) of the survival distribution under multiplicative censoring (problem B) is equivalent to the nonparametric estimation of the survival distribution from the observed length-biased data. The large sample properties of the corresponding NPMLE are established in Asgharian et al. (2002), and the asymptotic efficiency of the NPMLE follows from Asgharian and Wolfson (2005) and van der Vaart (1998, Theorem 25.47). In this paper we explore the potential to extend the approach of Vardi (1989) to nonparametric estimation in more general settings and to semiparametric regression models.

The Cox proportional hazards model, the most popular semiparametric model for regression analysis of traditional survival data, assumes a nonparametric baseline hazard function and a regression function of the covariates (Cox, 1972, 1975). Only limited literature exists on modeling risk factors for the distribution of the underlying population when observed failure times are subject to length bias. Recently, Tsai (2009) generalized the pseudo-partial likelihood approach of Wang (1996) to model right-censored length-biased data. Qin and Shen (2010) proposed inverse weighted estimating equation approaches for right-censored length-biased data under the proportional hazards model. These approaches do not provide a straightforward way to analyze length-biased data if the censoring time depends on the covariates, and may not yield efficient estimators. For traditional survival data, Zeng and Lin (2007) demonstrated that estimating equation approaches under either the semiparametric Cox model or the transformation models are less efficient than the profile maximum likelihood estimation approach. (For related work on the profile likelihood for traditional survival data, see Nielsen et al. (1992); Klein (1992); Murphy (1994, 1995); Murphy and van der Vaart (2000); Zeng et al. (2005); Zeng and Lin (2007).) For right-censored length-biased data, we expect a similar efficiency advantage for the maximum likelihood estimation (MLE) method, along with robustness of the method to various assumptions on the censoring distribution.

Implementing the profile likelihood method is much more challenging when working with right-censored length-biased data than with traditional survival data. One significant difference is that for length-biased data the full profile likelihood has positive support at both censored and failure time points, in contrast to the MLEs for traditional survival data and the conditional likelihood estimates for length-biased data. We propose new expectation-maximization (EM) algorithms for the maximum likelihood estimation of the nonparametric and semiparametric Cox regression models for right-censored length-biased data. One new aspect of our method is that we derive the likelihood for the unobserved (i.e., left-truncated) subpopulation given the observed length-biased data in the full likelihood, which serves as the missing-data mechanism in the EM algorithm. In contrast to the EM algorithm of Vardi (1989), which estimates the underlying distribution function via estimation of the biased distribution, our EM algorithm directly estimates the target unbiased distribution function. As a result, any model and parameter constraints for the target distribution function can be directly imposed.

The rest of the paper is organized as follows. In Section 2, we introduce a new EM algorithm for the nonparametric estimation of the target distribution given length-biased data. In Section 3, we apply the new EM algorithm to estimate a distribution function with an increasing failure rate constraint. In Section 4, we propose the maximum semiparametric likelihood estimation under the Cox proportional hazards model and derive the large sample properties for length-biased data. We provide a convenient profile estimation approach based on the EM algorithm, with which the standard software for Cox regression can be adapted for right-censored length-biased data. We describe our simulation studies in Section 5 and the application of our method to a data example in Section 6. Section 7 contains some concluding remarks.

2. A NEW EM ALGORITHM FOR ESTIMATING NONPARAMETRIC SURVIVAL FUNCTION

Consider a prevalent cohort study in which the subjects are diagnosed with a disease and are at risk for a failure event. Let T̃ be the duration from the disease onset to failure, with the unbiased density function f(t) = dF(t)/dt and survival function S(t). The observed data include the backward recurrence time A (from disease onset to the study entry), the forward recurrence time V (from the study entry to failure), and the length-biased time T = A + V. Based on renewal theory (Vardi, 1982, 1989; Lancaster, 1990, Chapter 3), the joint distribution of (A, V) is

\[
\frac{f(a+v)}{\mu}, \quad a, v > 0, \qquad \text{where } \mu = \int_0^\infty t f(t)\, dt.
\]

When the prevalent cohort is followed prospectively, V is subject to right censoring. The censoring time, denoted by C, is measured from the study entry. Let δ = I(V < C) be the censoring indicator and assume that (A, V) is independent of C. Let X = min(A + V, A + C). Denote the observed data as (Xi, Ai, δi), i = 1, 2, …, n. The density function of the observed biased T is defined as g(y) = dG(y)/dy, where dG(y) = y dF(y)/μ, and the survival function of T̃ is

\[
S(t) = \int_t^\infty dF(u) = \mu \int_t^\infty u^{-1}\, dG(u).
\]

Therefore, the likelihood for the observed data (Xi, Ai, δi) is proportional to

\[
\prod_{i=1}^n \frac{f^{\delta_i}(X_i)\, S^{1-\delta_i}(X_i)}{\mu} \;\propto\; \prod_{i=1}^n \bigl[dG(X_i)\bigr]^{\delta_i} \prod_{i=1}^n \Bigl[\int_{x \ge X_i} x^{-1}\, dG(x)\Bigr]^{1-\delta_i}. \tag{1}
\]

Vardi (1989) proposed an EM algorithm for the NPMLE of G. Using the relationship between G and F, dF(t) = t^{-1} dG(t) / ∫ u^{-1} dG(u), the NPMLE for F can be derived. However, it is often difficult to impose constraints on F when F is estimated from the NPMLE of G, because constraints on F may not translate easily into constraints on G.
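To make this G-to-F conversion concrete, the following is a minimal R sketch of the renormalization; the function name and inputs (the support points t and the jump sizes dG of the NPMLE of G) are ours, not from the paper.

  ## Minimal sketch: recover the unbiased distribution F from an estimate
  ## of the length-biased distribution G via dF(t) = t^{-1} dG(t) / sum of
  ## u^{-1} dG(u). Inputs: `t`, the support points; `dG`, the jumps of G-hat.
  unbias_cdf <- function(t, dG) {
    dF <- (dG / t) / sum(dG / t)  # inverse-length weighting, renormalized
    data.frame(t = t, dF = dF, F = cumsum(dF))
  }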

As demonstrated in Vardi (1989), to maximize (1) it is sufficient to consider the discrete version of the distribution F, i.e., P(T̃ = t_i) = p_i, nonparametrically with point masses at

\[
t_1 < t_2 < \cdots < t_k,
\]

where t_1, …, t_k are the ordered unique failure and censoring times for {X_1, …, X_n}, k ≤ n. In principle, the length-biased observations (A, T) can be equivalently generated from a truncation model with

\[
A \sim U(0, \hat\tau), \qquad \tilde T \sim F \ \text{on } (0, \hat\tau), \tag{2}
\]

where τ̂ = t_k, A and T̃ are independent, dF(t_i) = p_i with Σ_{i=1}^k p_i = 1, and (A, T̃) is observed if and only if A ≤ T̃. The probability of observing a length-biased observation under this setting is π = P(A ≤ T̃) = E(T̃)/τ̂.

We propose an EM algorithm with a different missing-data mechanism to directly estimate the target distribution, F. For a cohort subject to left truncation, the biased samples on n subjects, denoted by O = {(X_1, δ_1, A_1), ···, (X_n, δ_n, A_n), A_i ≤ X_i, i = 1, ···, n}, are observed, whereas the data on m subjects are left truncated. Here the latent left-truncated data are denoted by O* = {(T*_1, A*_1), ···, (T*_m, A*_m), A*_i > T*_i, i = 1, 2, ···, m}. The random integer m then follows a negative binomial distribution with parameter π. The probability mass function of m is

\[
\binom{m+n-1}{m}(1-\pi)^m \pi^n, \quad m = 0, 1, 2, \ldots, \qquad \text{and} \quad E(m \mid O) = n(1-\pi)/\pi.
\]

Following the principle of the EM algorithm, we think of {O, O*} as the 'complete data', where the pseudo missing data, also referred to as 'ghost' data in Turnbull (1976), are O* = {(T*_1, A*_1), ···, (T*_m, A*_m), m} and the observed 'incomplete data' are O. We derive the full likelihood including the component of the truncated observations. The log-likelihood based on the complete data {O, O*} is

\[
\sum_{j=1}^k \Bigl[\sum_{i=1}^n I(T_i = t_j) + \sum_{i=1}^m I(T_i^* = t_j)\Bigr] \log p_j, \tag{3}
\]

where T_i ≥ A_i, i = 1, 2, ···, n and T*_l < A*_l, l = 1, ···, m. Then, conditional on the observed data,

\[
E\Bigl[\sum_{i=1}^n I(T_i = t_j) \,\Big|\, O\Bigr] = \sum_{i=1}^n \Bigl\{\delta_i I(T_i = t_j) + (1-\delta_i)\, P(T_i = t_j \mid T_i \ge A_i, T_i \ge X_i)\Bigr\} = \sum_{i=1}^n \Bigl[\delta_i I(X_i = t_j) + (1-\delta_i)\, I(X_i \le t_j)\, \frac{p_j}{\int_{s \ge X_i} f(s)\, ds}\Bigr],
\]

because (1 − δ_i)P(T_i = t_j | T_i ≥ A_i, T_i ≥ X_i) = (1 − δ_i)P(T̃ = t_j | T̃ ≥ X_i) = (1 − δ_i)I(X_i ≤ t_j) p_j/S(X_i). Conditional on the observed data O, the expectation for the missing left-truncated data can be expressed as

\[
E\Bigl\{E\Bigl[\sum_{i=1}^m I(T_i^* = t_j) \,\Big|\, m\Bigr] \,\Big|\, O, T^* < A^*\Bigr\}.
\]

Under the truncation model specified in (2),

\[
E\bigl[I(T^* = t_j) \mid O, T^* < A^*\bigr] = \frac{\Pr(\tilde T = t_j, A > t_j)}{\Pr(\tilde T < A)} = \frac{p_j\,(1 - t_j/\hat\tau)}{1 - \pi}.
\]

This, together with E(m | O) = n(1 − π)/π, yields

\[
E\Bigl\{E\Bigl[\sum_{i=1}^m I(T_i^* = t_j) \,\Big|\, m\Bigr] \,\Big|\, O, T^* < A^*\Bigr\} = \frac{n(1-\pi)}{\pi}\; \frac{(1 - t_j/\hat\tau)\, p_j}{1 - \pi} = \frac{n}{\pi}\,(1 - t_j/\hat\tau)\, p_j.
\]

Subject to Σ_{i=1}^k p_i = 1 and p_i ≥ 0, we maximize the expected complete-data log-likelihood conditional on the observed data via the EM algorithm,

\[
\ell_E(p) = \sum_{j=1}^k w_j \log p_j, \tag{4}
\]

where p = (p1, ···, pk), and

\[
w_j = \sum_{i=1}^n \Bigl[\delta_i I(X_i = t_j) + (1-\delta_i)\, \frac{p_j I(X_i \le t_j)}{\sum_{l=1}^k p_l I(X_i \le t_l)}\Bigr] + \frac{n}{\pi}\,(1 - t_j/\hat\tau)\, p_j.
\]

By simple algebra, Σ_{j=1}^k w_j = n + n(1 − π)/π = n/π. The following iterative EM algorithm can be used to solve for p̂_j, j = 1, ···, k.

  • Step 1

    Select an arbitrary p_j^{(0)} satisfying Σ_{j=1}^k p_j^{(0)} = 1, p_j^{(0)} ≥ 0.

  • Step 2
    Solve p_j^{(1)} by maximizing (4), so that we replace p_j^{(0)} with
\[
\hat p_j^{(1)} = \frac{\hat\pi^{(0)}}{n}\Biggl\{\sum_{i=1}^n \Bigl[\delta_i I(X_i = t_j) + (1-\delta_i)\, \frac{\hat p_j^{(0)} I(X_i \le t_j)}{\sum_{l=1}^k \hat p_l^{(0)} I(X_i \le t_l)}\Bigr] + \frac{n}{\hat\pi^{(0)}}\Bigl(1 - \frac{t_j}{\hat\tau}\Bigr)\hat p_j^{(0)}\Biggr\}, \tag{5}
\]

    where π̂^{(0)} = Σ_{j=1}^k t_j p̂_j^{(0)}/τ̂.

With a given convergence criterion, we can solve p_j iteratively. Let p̂_j denote the MLE of p_j, j = 1, ···, k; the NPMLE is F̂(t) = Σ_{j=1}^k p̂_j I(t_j ≤ t), with π̂ = ∫ t dF̂(t)/τ̂, and

\[
Q_{1n}(t) = \frac{1}{n_1}\sum_{i=1}^n \delta_i I(X_i \le t), \qquad Q_{0n}(t) = \frac{1}{n_0}\sum_{i=1}^n (1-\delta_i)\, I(X_i \le t),
\]

where n_1 = Σ_{i=1}^n δ_i and n_0 = Σ_{i=1}^n (1 − δ_i). Thus, the limiting form of (5) is

\[
d\hat F(t) = \hat\pi n_1 n^{-1}\, dQ_{1n}(t) + \hat\pi n_0 n^{-1}\, d\hat F(t) \int_0^t \frac{dQ_{0n}(s)}{1 - \hat F(s)} + (1 - t/\hat\tau)\, d\hat F(t). \tag{6}
\]
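For concreteness, the following is a minimal R sketch of this EM iteration; it implements the update (5) directly, with all function and variable names our own and a simple maximum-change stopping rule assumed.

  npmle_lb <- function(X, delta, tol = 1e-8, maxit = 5000) {
    t <- sort(unique(X))                  # jump points t_1 < ... < t_k
    k <- length(t); n <- length(X)
    tau <- max(t)                         # tau-hat = t_k
    p <- rep(1 / k, k)                    # Step 1: uniform starting masses
    d1 <- outer(X, t, "==") * delta       # delta_i I(X_i = t_j)
    atrisk <- outer(X, t, "<=")           # I(X_i <= t_j)
    for (it in seq_len(maxit)) {
      pi_hat <- sum(t * p) / tau          # pi-hat = sum_j t_j p_j / tau-hat
      S_X <- as.vector(atrisk %*% p)      # S-hat(X_i) = sum_j p_j I(X_i <= t_j)
      w <- colSums(d1 + (1 - delta) * atrisk * rep(p, each = n) / S_X) +
        (n / pi_hat) * (1 - t / tau) * p  # E-step weights w_j
      p_new <- (pi_hat / n) * w           # M-step, equation (5)
      if (max(abs(p_new - p)) < tol) { p <- p_new; break }
      p <- p_new
    }
    list(t = t, p = p, F = cumsum(p))
  }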

Remark 1

In contrast to the NPMLE for traditional survival analysis, which has jumps only at the observed failure time points, the proposed NPMLE for length-biased data has jumps at all distinct observed time points, including censored times, similar to that of Vardi (1989).

Remark 2

Equation (6) for the constructed EM algorithm with the unbiased distribution function F is equivalent to that for Vardi's EM algorithm based on a 'multiplicative-censorship' model with the biased distribution function G. Denoting dĜ(t) = t dF̂(t)/μ̂, where μ̂ = π̂τ̂, we re-express equation (6) as an equation in Ĝ,

\[
d\hat G(t) = n_1 n^{-1}\, dQ_{1n}(t) + n_0 n^{-1}\, d\hat G(t)\, t^{-1} \int_0^t \Bigl[\int_{r \ge s} r^{-1}\, d\hat G(r)\Bigr]^{-1} dQ_{0n}(s),
\]

which is the same equation derived by Vardi (1989) and Vardi and Zhang (1992). The advantage of the new EM algorithm is that it directly estimates the target distribution function of the unbiased data, which allows one to directly impose constraints on F. This advantage will be further elucidated in the next two sections.

Remark 3

The 'missing' data (i.e., left-truncated failure times) {T*_1, ···, T*_m} are assumed not to be subject to right censoring. It is clear that whether T* is subject to right censoring or not is irrelevant in the derivation of the above EM algorithm.

Remark 4

The development of the methods and large sample properties is focused on [0, τ] throughout the paper, where τ is a finite upper bound of the support of the population survival times, and Λ(τ) < ∞. In practice, τ can be estimated by t_k = max_{1≤i≤n} X_i. We prove (in Appendix A.2) the following lemma, that τ̂ ≡ t_k → τ in probability, and that the convergence rate is faster than n^{1/2}.

Lemma 1

Suppose that E(C) > 0 and τ < ∞. Then for 1 > η > 1/2, n^η(τ̂ − τ) = o_p(1).

3. NONPARAMETRIC MAXIMUM LIKELIHOOD ESTIMATION WITH INCREASING FAILURE RATE

In some applications, it is known or assumed that the survival function for the target population has an increasing failure rate (Barlow and Proschan, 1975; Padgett and Wei, 1980; Tsai, 1988). The maximum likelihood estimation of a distribution function with an increasing failure rate was derived for traditional right-censored data by Padgett and Wei (1980), and for left-truncated and right-censored data by Tsai (1988). Using the same notation as in Section 2, the observed right-censored length-biased data are denoted by (X, A, δ). Let λ(t) denote the hazard function for the target cumulative distribution function F. Let z_1 < ··· < z_{k*} denote the distinct ordered failure times among {X_1, ···, X_n}. Let the size of the risk set at time x be denoted by R(x) = Σ_{i=1}^n I(A_i ≤ x ≤ X_i) and the number of failures at time x be denoted by d(x) = Σ_{i=1}^n I(X_i = x, δ_i = 1). Under the increasing failure rate constraint, Tsai (1988) proposed a maximum conditional likelihood estimator of λ, conditional on the truncation time A,

\[
\hat\lambda(y) = \begin{cases} 0, & y < z_1, \\ \hat\lambda_j, & z_j \le y < z_{j+1},\ j = 1, 2, \ldots, k^*-1, \\ \hat\lambda_{k^*}, & z_{k^*} \le y, \end{cases}
\]

where

\[
\hat\lambda_j = \max_{1 \le r \le j}\ \min_{j \le s \le k^*-1} \Bigl\{\sum_{i=r}^s d(z_i) \Big/ \sum_{i=r}^s R(z_i)(z_{i+1} - z_i)\Bigr\}.
\]

By applying the new EM algorithm, we consider a full likelihood estimation of the hazard function for the target population. Define λ(t_j) = λ_j; p_j can then be expressed as λ_j exp{−∫_0^{t_j} λ(s) ds}; thus the expected complete-data log-likelihood function in (4) is

\[
\ell_E(\lambda) = \sum_{i=1}^k w_i \log \lambda_i - \sum_{i=1}^k w_i \sum_{j=1}^i \int_{t_{j-1}}^{t_j} \lambda(t)\, dt, \tag{7}
\]

where λ = (λ_1, ···, λ_k), and t_1 < ··· < t_k is defined in §2. Because the hazard function λ(·) increases with time,

\[
\ell_E(\lambda) \le \sum_{i=1}^k w_i \log \lambda_i - \sum_{j=1}^k \Bigl[\sum_{i=j}^k w_i (t_j - t_{j-1})\Bigr] \lambda_{j-1},
\]

where λ_0 = 0. Taking the partial derivative with respect to λ_j of the right side of the above inequality, we have

\[
\frac{w_j}{\lambda_j} - \sum_{i=j+1}^k w_i (t_{j+1} - t_j) = 0. \tag{8}
\]

Using arguments similar to those of Marshall and Proschan (1965) and Padgett and Wei (1980), the solution to equation (8) also maximizes the expected log-likelihood ℓE(λ) defined in (7),

\[
\lambda_j = \frac{w_j}{\sum_{i=j+1}^k w_i (t_{j+1} - t_j)}.
\]

Applying the pool-adjacent-violators algorithm, we can then achieve monotonicity for the NPMLE of λ(·),

\[
\hat\lambda(x) = \begin{cases} 0, & x < t_1, \\ \hat\lambda_j, & t_j \le x < t_{j+1},\ j = 1, 2, \ldots, k-1, \\ \hat\lambda_k, & x = t_k, \end{cases}
\]

where λ̂_j = max_{1≤r≤j} min_{j≤s≤k−1} {Σ_{i=r}^s w_i / Σ_{i=r}^s [Σ_{l=i+1}^k w_l](t_{i+1} − t_i)}. Although the formula for the proposed NPMLE of the monotone hazard function bears some similarity to that of Tsai (1988), the full likelihood approach is essentially different from the conditional likelihood approach of Tsai (1988), where the estimated function has jumps only at distinct failure time points. By using the information of the left-truncated data in the full likelihood function, the NPMLE is expected to be more efficient and smoother than the maximum conditional likelihood estimate. We will further compare the two approaches in empirical studies.
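As an illustration, here is a minimal R sketch of the pool-adjacent-violators step applied to the unconstrained ratios above; the last jump point, where the denominator vanishes, is left out, and the function and variable names are ours.

  pava_hazard <- function(w, t) {
    k <- length(w)
    gaps <- c(diff(t), 0)                    # t_{j+1} - t_j
    tailw <- rev(cumsum(rev(w))) - w         # sum_{i=j+1}^k w_i
    num <- w[-k]; den <- (tailw * gaps)[-k]  # drop t_k (handled separately)
    ## pool adjacent violators: merge blocks while ratios decrease
    bn <- num; bd <- den; sz <- rep(1, k - 1)
    j <- 1
    while (j < length(bn)) {
      if (bn[j] / bd[j] > bn[j + 1] / bd[j + 1]) {
        bn[j] <- bn[j] + bn[j + 1]; bd[j] <- bd[j] + bd[j + 1]
        sz[j] <- sz[j] + sz[j + 1]
        bn <- bn[-(j + 1)]; bd <- bd[-(j + 1)]; sz <- sz[-(j + 1)]
        j <- max(j - 1, 1)
      } else j <- j + 1
    }
    rep(bn / bd, times = sz)                 # increasing hazard estimates
  }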

As the hazard function λ(t) is increasing on t ∈ [0, τ], the corresponding cumulative hazard function Λ(t) is convex. Let Λ̂n(·) denote the estimator obtained by the EM algorithm together with the pool-adjacent-violators algorithm. Then Λ̂n(·) is the greatest convex minorant of Λn(·), where Λn(·) is the NPMLE of Λ(·) when there is no constraint on its shape. The strong consistency of Λn(·) uniformly on [0, τ] can be easily derived from the uniform consistency of the corresponding survival function estimator, established by Asgharian and Wolfson (2005). The consistency of Λ̂n can then be inferred, because the pool-adjacent-violators algorithm is a continuous map from Λn(·) to Λ̂n(·). The technical details are provided in Appendix A.3.

4. MLE UNDER COX REGRESSION MODEL

4.1 Full Likelihood and Score Functions

Since the cornerstone work of Cox (1972, 1975), the proportional hazards model has become the standard regression model for analyzing traditional right-censored survival data. Specifically, the covariate-specific hazard function is specified as

\[
\lambda_Z(t) = \lambda(t)\, \exp(\beta' Z),
\]

where Z is a covariate vector and the baseline hazard function λ(t) is not specified parametrically. Breslow (1972) showed that by inserting an estimator (Breslow's estimator) of the hazard function with a fixed β into the full likelihood, the profile likelihood for β reduces to Cox's partial likelihood for β. Later, Kalbfleisch and Prentice (1973) and Andersen et al. (1992) proved that the rank-based likelihood method is also equivalent to the partial likelihood method. When survival data are subject to biased sampling, neither Cox's partial likelihood approach nor Kalbfleisch and Prentice's rank-based likelihood method can be directly applied. This is because the observed biased data do not follow the proportional hazards model that is assumed for unbiased data from the target population, and because the rank-based likelihood method is not applicable, owing to the dependency of the length-biased data on the magnitude of the length.

The density function of an unbiased T̃ given Z is denoted by f(t | Z) and the corresponding survival function by S(t | Z). For random but length-biased samples of n subjects, the observed data consist of {O_i ≡ (A_i, X_i, δ_i, Z_i), i = 1, ···, n}, which are n i.i.d. copies of O ≡ (A, X, δ, Z). The full likelihood function of the observed data is proportional to

\[
L_n = \prod_{i=1}^n \frac{f^{\delta_i}(X_i \mid Z_i)\, S^{1-\delta_i}(X_i \mid Z_i)}{\mu_{\beta,\Lambda}(Z_i)}, \tag{9}
\]

where μ_{β,Λ}(Z_i) = ∫_0^∞ t f(t | Z_i) dt = ∫_0^∞ S(t | Z_i) dt. The identifiability of the model can be established similarly to the case of the Cox model for traditional survival data (Elbers and Ridder, 1982). By decomposing the full likelihood into the product of the conditional likelihood of X given A and the marginal likelihood of A, we have

\[
L_n = \Biggl[\prod_{i=1}^n \frac{f^{\delta_i}(X_i \mid Z_i)\, S^{1-\delta_i}(X_i \mid Z_i)}{S(A_i \mid Z_i)}\Biggr] \Biggl[\prod_{i=1}^n \frac{S(A_i \mid Z_i)}{\mu_{\beta,\Lambda}(Z_i)}\Biggr].
\]

Although the estimating equation derived from the likelihood conditional on A (the first component in L_n) shares the same advantage as Cox's partial likelihood by canceling the baseline hazard function (Wang et al., 1993; Kalbfleisch and Lawless, 1991), the conditional likelihood approach is generally less efficient than the full likelihood approach.

Using counting process notation, we denote N_i(t) = I(X_i ≤ t)δ_i and Y_i(t) = I(X_i ≥ t) for i = 1, ···, n. The log-likelihood function of (9) can be expressed as

\[
\ell_n(\beta, \Lambda) = n^{-1} \sum_{i=1}^n \Bigl\{\int_0^\tau \bigl(\beta' Z_i + \log \lambda(t)\bigr)\, dN_i(t) - \Lambda_{Z_i}(X_i) - \log \mu_{\beta,\Lambda}(Z_i)\Bigr\}, \tag{10}
\]

where Λ(t) = ∫_0^t λ(s) ds and Λ_{Z_i}(t) = ∫_0^t exp(β′Z_i) dΛ(s). The estimation of the MLE of β and the infinite-dimensional parameter Λ can be computationally intractable if one directly maximizes (10) or solves its score equations. We thereby propose an alternative computational approach, which is a generalization of the EM algorithm for the NPMLE discussed in Section 2 to the Cox proportional hazards model.

The semiparametric MLE for the baseline hazard function Λ is obtained by maximizing the likelihood over the set of piecewise constant functions. Of note, the estimator can have jumps at both censored and uncensored times, by observing that the likelihood function achieves its maximum for a hazard function with jumps on {t_1, ···, t_k}, where t_1 < ··· < t_k denotes the distinct failure and censored time points. Similar to the argument in Vardi (1989, page 754), for any estimator of Λ(t) that jumps outside of the time points {t_1, ···, t_k}, one can find a greater likelihood with jumps on {t_1, ···, t_k} only. A detailed explanation is in the Supplemental Materials available online.

4.2. MLE and EM Algorithm

For i = 1, ···, n, let T*_{ij}, j = 1, 2, …, m_i, be the truncated latent data corresponding to covariate Z_i. We develop the EM algorithm based on the discretized version of Λ(u) = Σ_{t_j ≤ u} λ_j, where λ_j is the positive jump at time t_j for j = 1, ···, k, and λ = (λ_1, ···, λ_k). For notational convenience, denote f_i(t) = dF(t | Z_i). The log-likelihood based on the complete data is then

\[
\sum_{j=1}^k \sum_{i=1}^n \Bigl[I(T_i = t_j) + \sum_{l=1}^{m_i} I(T_{il}^* = t_j)\Bigr] \log f_i(t_j).
\]

Conditional on the observed data for the ith subject, O_i = {X_i, A_i, δ_i, Z_i}, we obtain the expectation

\[
w_{ij} = E\Bigl[I(T_i = t_j) + \sum_{l=1}^{m_i} I(T_{il}^* = t_j) \,\Big|\, O_i\Bigr] = \delta_i I(X_i = t_j) + (1-\delta_i)\, \frac{p_{ij} I(X_i \le t_j)}{\sum_{l=1}^k p_{il} I(X_i \le t_l)} + \frac{\hat\tau}{\mu_i}\,\bigl(1 - t_j/\hat\tau\bigr)\, p_{ij}, \tag{11}
\]

where

\[
p_{ij} = \lambda_j \exp(\beta' Z_i)\, \exp\Bigl\{-\sum_{l=1}^j \lambda_l \exp(\beta' Z_i)\Bigr\}, \qquad \text{and} \qquad \mu_i = \sum_{j=1}^k t_j\, p_{ij}.
\]

Thus, the expected complete-data log-likelihood function conditional on the observed data is as follows:

\[
\ell_E(\beta, \lambda) = \sum_{i=1}^n \sum_{j=1}^k w_{ij} \log f_i(t_j) = \sum_{j=1}^k w_{+j} \log \lambda_j + \sum_{i=1}^n w_{i+}\, \beta' Z_i - \sum_{l=1}^k \sum_{j=l}^k \sum_{i=1}^n w_{ij} \exp(\beta' Z_i)\, \lambda_l,
\]

where w_{+j} = Σ_{i=1}^n w_{ij} and w_{i+} = Σ_{j=1}^k w_{ij}. In the M-step, we maximize the expected complete-data log-likelihood function conditional on the observed data with respect to the baseline hazard function at t_j, for j = 1, ···, k,

\[
\frac{\partial \ell_E(\beta, \lambda)}{\partial \lambda_j} = \frac{w_{+j}}{\lambda_j} - \sum_{l=j}^k \sum_{i=1}^n w_{il} \exp(\beta' Z_i) = 0,
\]

which leads to a closed form of λj as a function of β, denoted by

\[
\lambda_j(\beta) = \frac{w_{+j}}{\sum_{l=j}^k \sum_{i=1}^n w_{il} \exp(\beta' Z_i)}. \tag{12}
\]

Here, λj is the maximizer of the M-step. Next, we maximize the expected complete-data log-likelihood function with respect to β

\[
\frac{\partial \ell_E(\beta, \lambda)}{\partial \beta} = \sum_{i=1}^n w_{i+} Z_i - \sum_{l=1}^k \sum_{j=l}^k \sum_{i=1}^n w_{ij} Z_i \exp(\beta' Z_i)\, \lambda_l. \tag{13}
\]

By inserting λj(β) of (12) into the equation (13), β can be solved from the following equation,

\[
\sum_{i=1}^n w_{i+} Z_i - \sum_{l=1}^k w_{+l} \Biggl\{\frac{\sum_{i=1}^n \sum_{j=l}^k w_{ij} Z_i \exp(\beta' Z_i)}{\sum_{i=1}^n \sum_{j=l}^k w_{ij} \exp(\beta' Z_i)}\Biggr\} = 0, \tag{14}
\]

which is equivalent to maximizing the complete-data likelihood with λ profiled out. With the estimated λ_j (j = 1, ···, k) and β, one can update the expectation of the likelihood via w_{ij} in (11) and repeat the M-step until the estimators of β and λ_j (j = 1, ···, k) converge.
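For illustration, the E-step (11) can be coded directly from the closed forms of p_ij and μ_i; the following minimal R sketch (function and variable names are ours) computes the weight matrix w_ij given the current (β, λ).

  estep_weights <- function(beta, lambda, tpts, X, delta, Z) {
    n <- length(X); k <- length(tpts); tau <- max(tpts)
    eta <- exp(drop(Z %*% beta))                   # exp(beta'Z_i)
    W <- matrix(0, n, k)
    for (i in 1:n) {
      pij <- lambda * eta[i] * exp(-cumsum(lambda * eta[i]))  # p_ij
      pij <- pij / sum(pij)        # numerical guard: renormalize the masses
      mui <- sum(tpts * pij)       # mu_i = sum_j t_j p_ij
      Si <- sum(pij[tpts >= X[i]]) # S-hat(X_i | Z_i)
      W[i, ] <- delta[i] * (tpts == X[i]) +
        (1 - delta[i]) * (tpts >= X[i]) * pij / Si +
        (tau / mui) * (1 - tpts / tau) * pij
    }
    W
  }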

At the M-step, the estimating equation (14) reveals that we may use the existing software for conventional right-censored data to estimate the covariate coefficient β under the Cox proportional hazards model. To simplify the description, consider a model with one covariate Z. First we need to create a vector with a length of nk for the weight function defined by Wnk = (w11, ···, w1k, w21, ···, w2k, ···, wn1, ···, wnk), which is estimated at the E-step. The corresponding failure time data and covariate vectors are constructed with the same length as Wnk, Tnk = (t1, ···, tk, ···, t1, ···, tk) and Znk = (Z1, ···, Z1, ···, Zn, ···, Zn), respectively. By using the function “coxph” in S-PLUS (or R) with the “weights” option, we obtain the estimator of β at the M-step from

  > coxph(Surv(Tnk, Δ) ~ Znk, weights = Wnk),

where the censoring indicator, Δ = (1, ···, 1), is a vector of ones of length nk.
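The stacking itself is mechanical; a minimal R sketch of this M-step (our own function and variable names, with `tpts` denoting t_1 < ··· < t_k) is:

  library(survival)

  mstep_beta <- function(W, tpts, Z) {
    n <- nrow(W); k <- ncol(W)
    Tnk <- rep(tpts, times = n)            # t_1,...,t_k repeated n times
    Znk <- Z[rep(seq_len(n), each = k), , drop = FALSE]
    Wnk <- as.vector(t(W))                 # w_11,...,w_1k,...,w_n1,...,w_nk
    Delta <- rep(1, n * k)                 # all pseudo observations are events
    fit <- coxph(Surv(Tnk, Delta) ~ Znk, weights = Wnk)
    coef(fit)                              # updated beta for the next E-step
  }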

Note that the algorithm computes β and λ iteratively through the EM steps. The value of the expected complete-data log-likelihood ℓ_E(β, λ) increases with each EM step. More specifically, our EM algorithm falls within the general scheme of the ECM algorithm, a variation of EM methods proposed by Meng and Rubin (1993). The convergence of our EM algorithm to the local maximizer is guaranteed by the same conditions that ensure the convergence of the ECM algorithm, as proved in Meng and Rubin (1993). The uniqueness of the NPMLE is guaranteed by Assumption 5 in Appendix A.1.

4.3. Asymptotic Properties

In this section, we establish the strong consistency and asymptotic normality of the MLE under the regularity conditions listed in Appendix A.1. For the asymptotic proofs, we denote the MLE by (β̂n, Λ̂n), and let (β0, Λ0) be the true value. In Appendix A.4, we prove the strong consistency by the classical Kullback-Leibler information approach, which has been successfully applied to the NPMLE in traditional survival analysis (Murphy, 1994; Parner, 1998).

Theorem 1

Under the regularity conditions listed in Appendix A.1, the MLE (β̂n, Λ̂n) are consistent: β̂n converges to β0, and Λ̂n(t) converges to Λ0(t) almost surely and uniformly in t for t ∈ [0, τ] as n → ∞.

The computation of the MLE of Λ is based on the discretized version Λ̂n(t) = Σ_{t_j ≤ t} λ̂_j. The existence and uniqueness of the NPMLE can be proved based on the log-likelihood function ℓn(β, λ), in terms of {β, λ} where λ ≡ {λ_1, …, λ_k}, and

\[
\ell_n(\beta, \lambda) = \sum_{i=1}^n \Bigl[\int_0^\tau \beta' Z_i\, dN_i(t) + \sum_{l=1}^k \delta_{il} \log \lambda_l\Bigr] - \sum_{l=1}^k \lambda_l \sum_{i=1}^n e^{\beta' Z_i}\, 1(t_l \le X_i) - \sum_{i=1}^n \log\Bigl[\int_0^\tau \exp\Bigl(-e^{\beta' Z_i} \sum_{l=1}^k \lambda_l\, 1(t_l \le s)\Bigr)\, ds\Bigr], \tag{15}
\]
where δ_{il} = δ_i I(X_i = t_l).

Let λ̂(·, β) be the maximizer of ℓn(β, λ) for given β. The existence and uniqueness of the NPMLE are guaranteed by Assumption 5: the information matrix of the profile likelihood evaluated at the true value β0 is positive definite.

Next, we will apply the Z-theorem for the infinite-dimensional estimating equations to prove the weak convergence of the estimators (van der Vaart and Wellner, 1996, Theorem 3.3.1, p. 310). The score equation of β is

\[
U_{1n}(\beta, \Lambda) \equiv \frac{1}{n}\sum_{i=1}^n \Bigl\{\int_0^\tau Z_i\, dN_i(u) - \int_0^\tau Y_i(u)\, Z_i\, e^{\beta' Z_i}\, d\Lambda(u) + \int_0^\tau Z_i \Bigl(\int_u^\tau S(v \mid Z_i)\, dv\Bigr)\, e^{\beta' Z_i}\, d\Lambda(u)\Big/\mu_{\beta,\Lambda}(Z_i)\Bigr\}. \tag{16}
\]

To obtain the MLE of Λ(·), consider a submodel defined by dΛη = (1 + ηh)dΛ, where h is a bounded and integrable function. Taking the derivative of ℓn(β, Λη) with respect to η, evaluating it at η = 0, and setting h(·) = 1(· ≤ t), we have the score equation for Λ

\[
U_{2n}(t, \beta, \Lambda) \equiv \frac{1}{n}\sum_{i=1}^n \Bigl\{\int_0^t dN_i(u) - \int_0^t Y_i(u)\, e^{\beta' Z_i}\, d\Lambda(u) + \int_0^t \Bigl(\int_u^\tau S(v \mid Z_i)\, dv\Bigr)\, e^{\beta' Z_i}\, d\Lambda(u)\Big/\mu_{\beta,\Lambda}(Z_i)\Bigr\}. \tag{17}
\]

Denote the vector for score functions by Un(·, β, Λ) ≡ {U1n(β, Λ), U2n(t, β, Λ)}, and its expectation E0 under the true values (β0, Λ0) by

\[
U_0(\cdot, \beta, \Lambda) \equiv \{U_{10}(\beta, \Lambda),\; U_{20}(\cdot, \beta, \Lambda)\},
\]

where U10(β, Λ) = E0{U1n(β, Λ)} and U20(t, β, Λ) = E0{U2n(t, β, Λ)}. Both the score function Un and its expectation U0 are defined on the parameter set ℬ × 𝒜, where the set ℬ is assumed to be compact in ℝ^p, and the set 𝒜 consists of nondecreasing functions in the space of functions with bounded variation. Let ψ̂n = (β̂n, Λ̂n), ψ = (β, Λ) and ψ0 = (β0, Λ0).

By the definition of the MLE, Un(·, ψ̂n) = 0. As the true parameter ψ0 satisfies U0(·, ψ0) = 0, √n{U0(·, ψ̂n) − U0(·, ψ0)} = √n{U0(·, ψ̂n) − Un(·, ψ̂n)}. In Appendix A.5, we prove the stochastic approximation √n{U0(·, ψ̂n) − Un(·, ψ̂n)} + √n{Un(·, ψ0) − U0(·, ψ0)} = op(1). Let U̇ψ0 denote the Fréchet derivative of the map U0(·, ψ) evaluated at ψ0. By the definition of the Fréchet derivative, U̇ψ0{√n(ψ̂n − ψ0)} = −√n{Un(·, ψ0) − U0(·, ψ0)} + oP(1).

The estimating function evaluated at ψ0, √n Un(·, ψ0) = √n{Un(·, ψ0) − U0(·, ψ0)}, is a sum of i.i.d. random quantities. We will prove by empirical process theory that √n Un(·, ψ0) converges weakly to W = (W1, W2), where W1 is a Gaussian random vector and W2 is a tight Gaussian process. The covariance matrix of W1 is Σ11 = E0{U1n(β0, Λ0)⊗2}, and the covariance between W2(s) and W2(t) is Σ22(s, t) = E0{U2n(s, β0, Λ0)U2n(t, β0, Λ0)}. By the Z-theorem for infinite-dimensional estimating equations (van der Vaart and Wellner, 1996), we have

Theorem 2

Under the regularity conditions listed in Appendix A.1, √n(ψ̂n − ψ0) converges weakly to a tight mean zero Gaussian process U̇ψ0⁻¹(W).

Note that the asymptotic distribution of the sequence √n(ψ̂n − ψ0) is completely determined by the tightness of U̇ψ0⁻¹(W) and its marginal covariance function. We characterize the Fréchet derivative U̇ψ0, viewed as an operator on the parameter space ℬ × 𝒜. Define, for l = 0, 1, 2,

\[
K_1^{(l)}(u) = E_0\Bigl\{Z^l e^{\beta_0' Z}\, S_0(u \mid Z)\, \mu_0^{-1}(Z) \int_0^u S_C(s \mid Z)\, ds\Bigr\},
\]
\[
K_2^{(l)}(t, u) = \int_u^\tau E_0\Bigl\{Z^l e^{2\beta_0' Z}\, \mu_0^{-1}(Z)\, S_0(v \mid Z) \Bigl(\Lambda_0(t \wedge v) - \Phi_0(t \mid Z)\, \mu_0^{-1}(Z)\Bigr)\Bigr\}\, dv,
\]

where S_C(u | Z) = P(C ≥ u | Z), μ0(Z) = μ_{β0,Λ0}(Z), S0(t | Z) = exp{−exp(β0′Z)Λ0(t)}, and

\[
\Phi_0(t \mid Z) = \int_0^t \Bigl(\int_u^\tau S_0(v \mid Z)\, dv\Bigr)\, d\Lambda_0(u).
\]

By Assumption 5, the Fisher information of β for known Λ0 is positive definite

\[
J_0 \equiv \Bigl\{\int_0^\tau K_1^{(2)}(u)\, d\Lambda_0(u) + \int_0^\tau K_2^{(2)}(\tau, u)\, d\Lambda_0(u)\Bigr\}. \tag{18}
\]

Then the Fréchet derivative U̇ψ0 can be written in the following form:

\[
\dot U_{\psi_0}\binom{\beta}{\Lambda} \equiv \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}\binom{\beta}{\Lambda} \equiv \bigl(\sigma_{11}(\beta) + \sigma_{12}(\Lambda),\ \sigma_{21}(\beta)(\cdot) + \sigma_{22}(\Lambda)(\cdot)\bigr), \tag{19}
\]

where

\[
\begin{aligned}
\sigma_{11}(\beta) &= \Bigl\{\int_0^\tau K_1^{(2)}(u)\, d\Lambda_0(u) + \int_0^\tau K_2^{(2)}(\tau, u)\, d\Lambda_0(u)\Bigr\}\beta, \\
\sigma_{12}(\Lambda) &= \int_0^\tau K_1^{(1)}(u)\, d\Lambda(u) + \int_0^\tau K_2^{(1)}(\tau, u)\, d\Lambda(u), \\
\sigma_{21}(\beta)(t) &= \Bigl\{\int_0^t K_1^{(1)}(u)\, d\Lambda_0(u) + \int_0^\tau K_2^{(1)}(t, u)\, d\Lambda_0(u)\Bigr\}\beta, \\
\sigma_{22}(\Lambda)(t) &= \int_0^t K_1^{(0)}(u)\, d\Lambda(u) + \int_0^\tau K_2^{(0)}(t, u)\, d\Lambda(u).
\end{aligned}
\]

We show the invertibility of U̇ψ0 by translating the operator into a Fredholm integral equation of the second kind (Tricomi, 1985, Chapter 2). We prove in Appendix A.5 that the inverse U̇ψ0⁻¹ exists and is continuous, with the following form:

\[
\dot U_{\psi_0}^{-1}\binom{\beta}{\Lambda} \equiv \begin{pmatrix} \sigma_{11}^{-1} + \sigma_{11}^{-1}\sigma_{12}\Phi^{-1}\sigma_{21}\sigma_{11}^{-1} & -\sigma_{11}^{-1}\sigma_{12}\Phi^{-1} \\ -\Phi^{-1}\sigma_{21}\sigma_{11}^{-1} & \Phi^{-1} \end{pmatrix}\binom{\beta}{\Lambda}, \tag{20}
\]

where the functional Φ = σ22 − σ21σ11⁻¹σ12. We show in the appendix that Φ has an inverse Λ → Φ⁻¹(Λ), expressed in the following form as a function of t:

\[
\Phi^{-1}(\Lambda)(t) = \int_0^t \frac{d\Lambda(u)}{K_1^{(0)}(u)} + \int_0^\tau \Bigl(\int_0^t \frac{H(u, v)}{K_1^{(0)}(u)}\, du\Bigr)\, d\Lambda(v), \tag{21}
\]

where H(u, v) is the solution of the following integral equation

\[
H(u, v) = \frac{Q(u, v)}{K_1^{(0)}(v)} + \int_0^\tau H(u, s)\, \frac{Q(s, v)}{K_1^{(0)}(v)}\, ds, \tag{22}
\]
\[
Q(t, u) = \Bigl(K_1^{(1)}(t)\, \lambda_0(t) + \int_0^\tau \dot K_2^{(1)}(t, v)\, d\Lambda_0(v)\Bigr) J_0^{-1}\bigl(K_1^{(1)}(u) + K_2^{(1)}(\tau, u)\bigr) - \dot K_2^{(0)}(t, u), \tag{23}
\]

and with the notation K̇2^{(l)}(t, u) = ∂K2^{(l)}(t, u)/∂t for l = 0, 1,

\[
\dot K_2^{(l)}(t, u) = \int_u^\tau E_0\Bigl\{Z^l e^{2\beta_0' Z}\, S_0(v \mid Z)\, \mu_0^{-1}(Z)\Bigl(\lambda_0(t)\, I(t \le v) - \lambda_0(t) \int_t^\tau S_0(s \mid Z)\, \mu_0^{-1}(Z)\, ds\Bigr)\Bigr\}\, dv.
\]

By Theorem 2, √n(β̂n − β0) converges in distribution to a mean zero normal random vector characterized by

\[
\sigma_{11}^{-1}(W_1) + \sigma_{11}^{-1}\sigma_{12}\Phi^{-1}\sigma_{21}\sigma_{11}^{-1}(W_1) - \sigma_{11}^{-1}\sigma_{12}\Phi^{-1}(W_2), \tag{24}
\]

where the Gaussian process Φ⁻¹(W2) has the following form:

\[
\Phi^{-1}(W_2)(t) = \int_0^t \frac{dW_2(u)}{K_1^{(0)}(u)} + \int_0^\tau \Bigl(\int_0^t \frac{H(u, v)}{K_1^{(0)}(u)}\, du\Bigr)\, dW_2(v).
\]

Note that the stochastic integral is well defined via the integration-by-parts formula, because the functions u ↦ 1/K1^{(0)}(u) and v ↦ ∫_0^t H(u, v)/K1^{(0)}(u) du are of bounded variation on [0, τ].

Additionally, the process √n(Λ̂n − Λ0) converges weakly to a tight Gaussian process

\[
-\Phi^{-1}\sigma_{21}\sigma_{11}^{-1}(W_1) + \Phi^{-1}(W_2), \tag{25}
\]

where the process

\[
\sigma_{21}\sigma_{11}^{-1}(W_1)(t) = \Bigl\{\int_0^t K_1^{(1)}(u)\, d\Lambda_0(u) + \int_0^\tau K_2^{(1)}(t, u)\, d\Lambda_0(u)\Bigr\} J_0^{-1} W_1.
\]

If the baseline function Λ0 is known, then √n(β̂n − β0) converges in distribution to a Gaussian random variate σ11⁻¹(W1) with mean zero and the sandwich covariance matrix J0⁻¹Σ11J0⁻¹. Because of the variation associated with the profile likelihood estimator Λ̂n, the asymptotic variance of √n(β̂n − β0) is more complicated, with the extra terms indicated in (24). The variance-covariance matrix may be estimated by its empirical plug-in version, but the computation can be extremely complicated, as it requires solving the integral equation (22). We describe alternative methods for this computation in the following section.

4.4. Variance Estimation

Unlike the estimates of the regression coefficients, the variance of the MLE β̂n cannot be obtained directly from existing software such as R or SAS, because such software cannot incorporate the variation caused by the profile likelihood estimator. Instead, we can use bootstrapping techniques or the information matrix to estimate the variance of the estimators. When working with the observed full likelihood with unknown parameters (β, Λ), the total number of parameters has the same order as the number of observed distinct times, which often yields an information matrix of high dimension. Murphy and van der Vaart (1999) showed that the inverse of the information matrix for the profile likelihood provides valid variance estimates for the finite-dimensional parameters of interest, i.e., β̂n under the semiparametric models. There is no general analytical formula for calculating the profile information matrix; thus we describe a numerical EM-aided differentiation approach (Murphy and van der Vaart, 1999; Chen and Little, 1999) to approximate it. Chen and Little (1999) also proved that the score function of the profile likelihood for the observed data is the same as the expected complete-data score function conditional on the observed data at the convergent point. Therefore, the second derivative of the log profile likelihood evaluated at the MLE β̂n can be approximated by ∂²ℓE(β, λ(β))/∂β²|β=β̂n, which is the first derivative of the expected complete-data score function conditional on the observed data and profiled over λ(β) = (λ_1(β), ···, λ_k(β)) given by (12). By perturbation around the profile MLE β̂n, the information matrix for β can then be calculated as follows:

  1. Perturb the lth component of β̂n = (β̂1, ···, β̂p) by a small value ε = 1/n in the neighborhood of β̂l (in both directions). The perturbed estimator is denoted by β̂ε,l = (β̂1, ···, β̂l ± ε, ···, β̂p), where l = 1, ···, p.

  2. Approximate the lth row of the information matrix of β by
\[
-\frac{1}{\varepsilon}\Bigl\{\frac{\partial \ell_E(\beta, \lambda(\beta))}{\partial \beta}\Big|_{\beta = \hat\beta_{\varepsilon,l}}\Bigr\},
\]

    where the hazard function λ (β) is obtained from (12) using the M-step described in Section 4.2.
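As a numerical illustration, a minimal R sketch of this perturbation scheme follows; `score_E` stands for the profiled score β ↦ ∂ℓE(β, λ(β))/∂β (assembled from the E- and M-steps above), and the central-difference form averaging the two perturbation directions is our own implementation choice.

  profile_info_var <- function(beta_hat, score_E, eps) {
    p <- length(beta_hat)
    info <- matrix(NA_real_, p, p)
    for (l in seq_len(p)) {
      up <- replace(beta_hat, l, beta_hat[l] + eps)  # beta-hat_{eps,l}, + direction
      dn <- replace(beta_hat, l, beta_hat[l] - eps)  # - direction
      info[l, ] <- -(score_E(up) - score_E(dn)) / (2 * eps)  # lth row
    }
    solve(info)  # inverse information approximates Var(beta-hat)
  }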

When estimating the variance of λ̂n, the bootstrap approach can be used. This resampling approach is valid, given our established asymptotic normality for the MLEs β̂n and λ̂n. In this case, we obtain the variances of both β̂n and λ̂n.

5. SIMULATIONS

We performed simulation studies to evaluate the proposed methods and the corresponding EM algorithms for two settings: nonparametric MLE with an increasing failure rate, and the profile likelihood estimators under the Cox proportional hazards model for length-biased data. We aimed to assess the small sample accuracy and precision of our estimators, and to compare their performance with those of the existing methods under each setting. Each study comprised 1000 repetitions. Sample sizes of 200 and 400 were used.

5.1. Estimating a Distribution Function with an Increasing Failure Rate

We generated independent pairs of (A, T̃), with failure times from a Weibull distribution (F(t) = 1 − exp{−(t/α2)^{α1}}) with α1 = 2 and α2 = 1, and truncation times from a uniform distribution to ensure the stationarity assumption. Here, the specified Weibull distribution has an increasing failure rate. The censoring variables, measured from the examination time, were independently generated from uniform distributions.
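As an illustration of this design, the following minimal R sketch generates one such length-biased, right-censored sample; the rejection step enforces A ≤ T̃, and the particular bounds `tau` and `cmax` are our own illustrative choices rather than the exact simulation settings.

  gen_lb_data <- function(n, shape = 2, scale = 1, cmax = 2, tau = 10) {
    X <- A <- delta <- numeric(n); got <- 0
    while (got < n) {
      Tt <- rweibull(1, shape = shape, scale = scale)  # unbiased failure time
      a  <- runif(1, 0, tau)                           # onset under stationarity
      if (a <= Tt) {                                   # length-biased selection
        got <- got + 1
        C <- runif(1, 0, cmax)                         # residual censoring time
        A[got] <- a
        X[got] <- a + min(Tt - a, C)                   # X = min(A + V, A + C)
        delta[got] <- as.numeric(Tt - a < C)           # delta = I(V < C)
      }
    }
    data.frame(X = X, A = A, delta = delta)
  }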

Table 1 compares the performance of our proposed estimator F̂p(t) and Tsai's estimator (Tsai, 1988), denoted by F̂c(t). When F(t) is greater than 0.5, both estimators were quite accurate. In contrast, when F(t) is small, both had downward biases, and the bias decreased with increasing sample size. As expected, the empirical standard deviations of our estimator were as much as 25% lower than those of Tsai's estimator.

Table 1.

Summary Statistics of Simulations for the Estimated Distribution Function with Increasing Failure Rate

Sample Size  C%   F(t)   F̂p(t)                      F̂c(t)
                         Est.    ESD    ESMSE        Est.    ESD    ESMSE
200          10%  0.10   0.060   0.024  0.047        0.066   0.032  0.047
                  0.25   0.211   0.044  0.059        0.211   0.045  0.060
                  0.50   0.486   0.045  0.047        0.469   0.046  0.056
                  0.75   0.745   0.030  0.030        0.732   0.034  0.038
                  0.90   0.898   0.018  0.018        0.892   0.020  0.021
200          30%  0.10   0.068   0.028  0.043        0.065   0.033  0.048
                  0.25   0.222   0.045  0.053        0.210   0.046  0.061
                  0.50   0.490   0.046  0.047        0.468   0.049  0.058
                  0.75   0.748   0.031  0.032        0.732   0.036  0.041
                  0.90   0.901   0.019  0.019        0.892   0.022  0.024
400          10%  0.10   0.065   0.019  0.040        0.075   0.025  0.035
                  0.25   0.217   0.031  0.045        0.221   0.033  0.043
                  0.50   0.495   0.033  0.033        0.478   0.033  0.039
                  0.75   0.748   0.021  0.021        0.738   0.023  0.026
                  0.90   0.899   0.012  0.012        0.895   0.014  0.015
400          30%  0.10   0.080   0.021  0.029        0.075   0.025  0.036
                  0.25   0.236   0.033  0.036        0.221   0.033  0.044
                  0.50   0.501   0.033  0.034        0.477   0.034  0.041
                  0.75   0.751   0.022  0.022        0.737   0.025  0.028
                  0.90   0.901   0.013  0.013        0.895   0.016  0.016

Note: F̂p(t) is the proposed estimator; F̂c(t) is Tsai's conditional estimator; C% = censoring percentage; Est. = average of estimates; ESD = empirical standard deviation; ESMSE = empirical square root of the mean squared error = (bias² + ESD²)^{1/2}.

5.2. Estimating Regression Coefficients Under the Cox Model

We generated unbiased failure times (T̃) from the proportional hazards model with two covariates, where β = (β1, β2) = (0.5, 1), the binary covariate Z1 ~ Bernoulli(0.5), the continuous covariate Z2 ~ Uniform(−0.5, 0.5), and the baseline hazard function is λ0(t) = t. The censoring times C were independently generated either from uniform distributions or from the specified covariate-dependent distributions (see Table 2).

Table 2.

Summary Statistics of Simulations for Estimating Regression Coefficients under Cox Model with β = (β1, β2) = (0.5, 1). Mean SE is the mean of the estimated standard errors

Cohort Size  C%   Proposed approach                          EE-I                  EE-II                 Tsai's Method
                  Est.       ESD       Mean SE   95% CP      Est.       ESD        Est.       ESD        Est.       ESD
200          15%  (.49,.98)  (.11,.20) (.11,.19) (.96,.95)   (.50,1.01) (.14,.25)  (.51,1.04) (.13,.24)  (.51,1.01) (.12,.22)
             30%  (.48,.94)  (.11,.21) (.11,.20) (.94,.93)   (.47,.91)  (.19,.33)  (.51,1.01) (.16,.28)  (.50,1.01) (.14,.25)
             50%  (.46,.93)  (.12,.21) (.12,.20) (.93,.94)   (.42,.85)  (.23,.40)  (.51,1.02) (.19,.34)  (.49,1.01) (.18,.31)
400          15%  (.49,.98)  (.08,.14) (.08,.14) (.95,.95)   (.50,.99)  (.10,.18)  (.51,1.02) (.09,.17)  (.52,1.01) (.09,.15)
             30%  (.48,.97)  (.08,.15) (.08,.14) (.93,.93)   (.46,.94)  (.14,.24)  (.50,1.01) (.11,.21)  (.50,1.00) (.10,.17)
             50%  (.48,.94)  (.08,.15) (.08,.15) (.94,.92)   (.42,.84)  (.18,.31)  (.51,1.02) (.14,.24)  (.49,1.01) (.13,.22)
λc = t exp(0.5Z1 + 0.5Z2):
200          30%  (.48,.95)  (.11,.20) (.11,.20) (.94,.93)   (.39,.89)  (.18,.31)  (.46,.97)  (.15,.27)  (.58,1.07) (.13,.23)
400          30%  (.48,.97)  (.08,.15) (.08,.14) (.96,.96)   (.39,.91)  (.13,.23)  (.45,.97)  (.11,.20)  (.58,1.08) (.10,.17)
λc = t exp(Z2):
200          30%  (.48,.96)  (.11,.20) (.11,.20) (.95,.94)   (.52,.82)  (.18,.31)  (.53,.92)  (.15,.28)  (.49,1.15) (.14,.24)
400          30%  (.48,.96)  (.08,.15) (.08,.14) (.93,.93)   (.50,.80)  (.12,.22)  (.51,.91)  (.11,.19)  (.50,1.15) (.10,.17)
C ~ Z1 U(0, 1) + (1 − Z1) U(0, 1.8):
200          30%  (.49,.96)  (.12,.21) (.12,.20) (.95,.93)   (.82,.92)  (.20,.33)  (.71,1.00) (.17,.28)  (.41,1.02) (.15,.25)
400          30%  (.48,.96)  (.09,.14) (.08,.14) (.92,.95)   (.82,.89)  (.11,.22)  (.69,.97)  (.12,.19)  (.70,.96)  (.09,.17)

For a light (15%) or moderate (30%) censoring percentage, the mean estimates of the coefficients agreed well with the true parameters, whether the censoring distribution was dependent on or independent of the covariates. Even with heavy censoring (50%), the inferences associated with the proposed method were fairly accurate for all of the scenarios investigated: the means of the estimated standard errors were close to the empirical standard errors, and the coverage of the 95% confidence intervals was reasonable, ranging from 92% to 96%.

In Table 2, we also show a comparison of the performance between the proposed MLE and the existing estimation methods for length-biased data. The estimating equation approaches of Qin and Shen (2010) are as follows:

\[
\text{EE-I:}\quad \sum_{i=1}^n \delta_i \Biggl[Z_i - \frac{\sum_{j=1}^n I(X_j \ge X_i)\, \delta_j \{X_j S_C(X_j - A_j)\}^{-1} Z_j \exp(\beta' Z_j)}{\sum_{j=1}^n I(X_j \ge X_i)\, \delta_j \{X_j S_C(X_j - A_j)\}^{-1} \exp(\beta' Z_j)}\Biggr] = 0,
\]
\[
\text{EE-II:}\quad \sum_{i=1}^n \delta_i \Biggl[Z_i - \frac{\sum_{j=1}^n I(X_j \ge X_i)\, \delta_j \{W_C(X_j)\}^{-1} Z_j \exp(\beta' Z_j)}{\sum_{j=1}^n I(X_j \ge X_i)\, \delta_j \{W_C(X_j)\}^{-1} \exp(\beta' Z_j)}\Biggr] = 0,
\]

where S_C(·) is the survival function of the censoring time C, and W_C(t) = ∫_0^t S_C(s) ds. Tsai (2009) proposed the pseudo-partial likelihood method with the following estimating equation based on the score statistics,

\[
\text{EE-PL:}\quad \sum_{i=1}^n \int_0^\infty \Biggl\{Z_i - \frac{\sum_{j=1}^n Z_j Y_j(t) \exp(\beta' Z_j)\, W(t, X_j)}{\sum_{j=1}^n Y_j(t) \exp(\beta' Z_j)\, W(t, X_j)}\Biggr\}\, dN_i(t) = 0,
\]

where
\[
W(t, X_j) = \delta_j\, \frac{W_C(X_j) - W_C(X_j - t)}{W_C(X_j)} + (1 - \delta_j)\, \frac{S_C(X_j) - S_C(X_j - t)}{S_C(X_j)}.
\]

When the censoring was independent of the covariates, all four estimators had small biases under light censoring (15%). With moderate (30%) or heavy censoring (50%), the biases associated with the MLE were much smaller than those associated with EE-I. The MLE method always exhibited clearly superior efficiency, with smaller empirical standard errors. For instance, the standard errors associated with the estimating equations were 1.12 to 1.62 times greater, and the standard errors associated with the pseudo-partial likelihood method were 1.09 to 1.50 times greater, than those associated with the MLE method based on a sample size of 200. When the censoring distribution was dependent on the covariates, the estimators obtained by EE-I, EE-II and EE-PL were biased compared with those obtained by the MLE method. In summary, the MLE approach is the most efficient of the four methods, and is also the most robust to various censoring mechanisms.

6. A REAL DATA EXAMPLE

Dementia is a progressive degenerative medical condition and one of the leading causes of death in the United States and Canada. The Canadian Study of Health and Aging was a multicenter epidemiologic study of dementia, in which 14,026 subjects 65 years or older were randomly chosen throughout Canada to receive an invitation for a health survey. A total of 10,263 subjects agreed to participate in the study (Wolfson et al., 2001). The participants were then screened for dementia, and 1132 people were identified as having the disease. The individuals with dementia were followed until their deaths or last follow-up dates in 1996, and their dates of dementia onset were ascertained from their medical records.

After excluding subjects with missing data regarding the date of disease onset or classification of dementia subtype, a total of 818 patients remained (393 with probable Alzheimer's disease, 252 with possible Alzheimer's disease, and 173 with vascular dementia). Other study variables included the approximate date of dementia onset, date of screening for dementia, date of death or censoring, and the death indicator variable. Given the prevalent cases ascertained cross-sectionally, Asgharian et al. (2006) validated the stationarity assumption, namely that the incidence of dementia did not change over the period of the study.

At the end of the study, 638 out of 818 patients had died and the others were right censored. Within this elderly cohort, it seems reasonable to assume that the overall death rate increases with age. Applying the NPMLE approaches described in Sections 2 and 3, we estimated the hazard function for each subtype of dementia, and plotted the survival of patients with probable Alzheimer’s disease, possible Alzheimer’s disease and vascular dementia with and without the constraint of the increasing risk of death (see Figure 1). It is not surprising that the estimated survival curves with the constraint, i.e., additional information, have narrower confidence intervals than their corresponding survival curves without the constraint. As pointed out by one referee, a monotone hazards constraint may not hold for death due to dementia, particularly for patients with vascular dementia, because of the uncertainty associated with the cause of death.

Figure 1. Estimated survival curves according to subtypes of dementia, with and without the constraint of increasing risk of death with age.

Using the diagnosis subtype of possible Alzheimer's disease as the baseline cohort, we defined two indicator variables for the other two subtypes of dementia for the Cox proportional hazards model. Applying the proposed method in Section 4, the estimated covariate effects of the two subtypes of dementia and their standard errors are listed in Table 3. The results showed that the long-term survival distributions were statistically significantly different between the group with vascular dementia and the group with possible Alzheimer's dementia, and marginally different between the group with probable Alzheimer's dementia and the group with possible Alzheimer's dementia. We also analyzed the same data set using the estimating equation methods (EE-I and EE-II) of Qin and Shen (2010) and the pseudo-partial likelihood method of Tsai (2009). The estimated coefficients with the associated standard errors obtained by both EE-II and EE-PL indicated no statistically significant survival differences between the three subtypes of dementia. The results from EE-I suggested a statistically significant survival difference between the group with vascular dementia and the group with possible Alzheimer's dementia, but no statistically significant survival difference between the group with vascular dementia and the group with probable Alzheimer's dementia. The discrepancy in the inferences is most likely caused by the loss in efficiency when using the estimating equation method or the pseudo-partial likelihood method compared to using the MLE method.

Table 3.

Estimates (Standard Errors) of Regression Coefficients Using Length-biased Adjusted Methods for Dementia Data.

                     MLE            EE-I           EE-II          EE-PL
Probable Alzheimer   0.125 (0.062)  0.109 (0.092)  0.134 (0.091)  0.064 (0.081)
Vascular Dementia    0.185 (0.077)  0.245 (0.110)  0.208 (0.110)  0.164 (0.111)

7. CONCLUDING REMARKS

We have proposed new EM algorithms for length-biased data to obtain maximum likelihood estimators based on full likelihoods under three settings, where the missing-data mechanism in the EM algorithm is the left truncation of the length-biased data. In contrast to Vardi's (1989) EM algorithm for estimating nonparametric survival distributions, the advantage of the new EM algorithm is that one can directly estimate the nonparametric survival distribution or hazard function of the unbiased failure time T̃.

One major challenge to maximum likelihood estimation when it involves infinite-dimensional parameters is computational intractability. We have implemented the new EM algorithm together with the profile likelihood method for jointly estimating the baseline hazard function and the covariate coefficients under the Cox regression model for length-biased data. Commercially available statistical software for the Cox model can be adapted for easy computation. The EM algorithm is not computationally intensive even with continuous covariates, since p_ij and λ_j can be obtained easily from the closed-form expressions.

Similar to the NPMLE for traditional survival data, the method we have proposed requires the observation of at least one failure time to ensure that the large sample properties hold in the settings for length-biased data. As shown in our empirical studies, the proposed computational algorithms for solving the MLE perform well in terms of accuracy, and are more efficient than the existing estimating equation approaches, which are in turn more efficient than the conditional approach (Wang et al., 1993). Without assuming a known parametric distribution of Z as in Bergeron et al. (2008), maximizing the likelihood function (1) is equally efficient to maximizing the full likelihood including the marginal distribution of Z.

Parallel to the observation of Zeng and Lin (2007) for traditional survival data, estimators obtained from MLE are much more robust and efficient than those from the estimating equation approaches, and well suited for the proportional hazards regression model and other nonparametric estimations for length-biased right-censored data. The proposed EM algorithm may be further generalized to other semiparametric models, and the tools for model checking should be developed for length-biased right-censored data.

Supplementary Material


Acknowledgments

This research was partially supported by National Institutes of Health grant R01-CA079466.

We thank one Associate Editor and two Referees for their very constructive comments. We also thank Professor Masoud Asgharian and the investigators of the Canadian Study of Health and Aging (CSHA) for providing us the dementia data from the CSHA. The data reported in the example were collected as part of the CSHA. The core study was funded by the Seniors' Independence Research Program, through the National Health Research and Development Program of Health Canada (Project no. 6606-3954-MC(S)). Additional funding was provided by Pfizer Canada Incorporated through the Medical Research Council/Pharmaceutical Manufacturers Association of Canada Health Activity Program, NHRDP Project 6603-1417-302(R), Bayer Incorporated, and the British Columbia Health Research Foundation Projects 38 (93-2) and 34 (96-1). The study was coordinated through the University of Ottawa and the Division of Aging and Seniors, Health Canada.

A APPENDIX

A.1 Assumptions

Denote the Euclidean norm by |·|. For a vector z = (z_1, …, z_p)′, |z| ≡ (|z_1|² + ··· + |z_p|²)^{1/2}. To avoid measurability issues, the probability is understood as the outer probability (van der Vaart and Wellner, 1996). We adopt the convention that 0/0 = 0. We assume the following regularity assumptions:

  1. The true value of the hazard function λ0(·) is continuously differentiable. In addition, the upper bound τ of the support of the cumulative hazard function Λ0 is finite.

  2. The parameter β is in a compact set ℬ that contains β0. The parameter set 𝒜 for the baseline function contains all nondecreasing functions Λ satisfying Λ(0) = 0 and Λ(τ) < ∞.

  3. The residual censoring time C satisfies P(C > V) > 0. Its survival function SC(·) is continuous.

  4. For the covariate vector Z, the terms E0|Z|² and E0 e^{β′Z} are bounded.

  5. The information matrix −∂²E[ℓn(β, λ̂(·, β))]/∂β² evaluated at the true value β0 is positive definite.

  6. If P(b′Z = c0) = 1 for some constant c0, then b = 0.

Assumptions 1 and 3 imply that the bivariate function Q(t, u) defined in (23) is continuous on [0, τ] × [0, τ]. Assumption 3 implies that the censoring may occur after V. Assumption 5 is a classical condition that appears in the study of the Cox model for traditional survival data (Andersen et al., 1992, Condition VII.2.1.(e), page 497). It implies that the matrix J0 defined in (18) is positive definite, which is the information matrix for β when the baseline function Λ0 is known. The positive definiteness of J0 is also implied by the fact that the model is identifiable (Rothenberg, 1971). Assumption 6 means that there is no covariate collinearity, which ensures model identifiability.

A.2 Proof of the lemma on the convergence of τ̂ = tk to τ

Recall that X_i = min(A_i + V_i, A_i + C_i), and t_k = max{X_1, ···, X_n}. For any arbitrarily small ε > 0,

\[
P\Bigl(\Bigl|\max_{1 \le i \le n} X_i - \tau\Bigr| > \varepsilon\Bigr) = \bigl[P(X_1 < \tau - \varepsilon)\bigr]^n = \Bigl\{1 - E\bigl[I(A_1 + V_1 > \tau - \varepsilon)\, S_C(\tau - \varepsilon - A_1)\bigr]\Bigr\}^n = \Bigl\{1 - \int_{\tau-\varepsilon}^\tau w(y) f(y)\, dy \big/ \mu\Bigr\}^n = \bigl\{1 - w(\xi) f(\xi)\, \varepsilon/\mu\bigr\}^n,
\]

where w(y) = ∫_0^y S_C(τ − ε − a) da > 0, τ − ε ≤ ξ ≤ τ, f(ξ) > 0, and the last equality uses the mean value theorem. Given the above equation, for any η ∈ (0, 1),

\[
P\Bigl(n^\eta \Bigl|\max_{1 \le i \le n} X_i - \tau\Bigr| > \varepsilon\Bigr) = \bigl\{1 - w(\xi) f(\xi)\, \varepsilon n^{-\eta}/\mu\bigr\}^n,
\]

where τ − εn^{−η} ≤ ξ ≤ τ. Note that when n → ∞,

\[
w(\xi) = \int_0^\xi S_C(\tau - \varepsilon n^{-\eta} - a)\, da = \int_{\tau - \xi - \varepsilon n^{-\eta}}^{\tau - \varepsilon n^{-\eta}} S_C(s)\, ds \to \int_0^\tau S_C(s)\, ds, \qquad \int_0^\tau S_C(s)\, ds = E\Bigl[\int_0^\tau I(C > s)\, ds\Bigr] = E[\min(C, \tau)].
\]

Thus, as long as E[C] > 0, which is implied by Assumption 3, we have w(ξ) → w(τ) > 0 and f(ξ) → f(τ) > 0 as n → ∞, since τ is the upper bound for the support of the population time and Λ(τ) < ∞. Therefore, when n → ∞ we have

\[
P\bigl(n^\eta\, |\hat\tau - \tau| > \varepsilon\bigr) \le \exp\bigl(-w(\tau) f(\tau)\, \varepsilon n^{1-\eta}/\mu\bigr),
\]

which implies that τ̂ → τ at a rate higher than n^{1/2} but lower than n (i.e., when 1/2 < η < 1, the above probability converges to zero).

Note that by the Borel-Cantelli Lemma, τ̂n → τ in probability implies that for every subsequence, there is a further sub-subsequence {n′} such that τ̂n′ → τ almost surely (Ferguson, 1996, page 8). This, combined with the fact that τ can be consistently estimated by τ̂n, completes the proof of the strong consistency of Λ̂n. As τ̂n → τ, these facts also imply that the mean μ of the population failure time can be consistently estimated by ∫_0^{τ̂} exp{−Λ̂n(s)} ds.

A.3 Consistency of Λ̂n with increasing failure rate

Let ||·||τ denote the supremum norm over [0, τ]. We have the following strong consistency result for the NPMLE Λ̂n(·) proposed in Section 3:

Proposition 1

Suppose Λ is convex on its support [0, τ]. Under the regularity conditions,

\[
\|\hat\Lambda_n - \Lambda\|_\tau \le \|\Lambda_n - \Lambda\|_\tau \xrightarrow{\text{a.s.}} 0.
\]

We adapt the proof in Huang and Wellner (1995). Let εn = ||Λn − Λ||τ. As argued in Asgharian and Wolfson (2005), εn → 0 almost surely when n → ∞. Since Λ is convex on [0, τ], it must be continuous on [0, τ]. The function Λ − εn is convex and is a minorant of Λn, i.e., Λ(s) − εn ≤ Λn(s) for all 0 ≤ s ≤ τ. By the definition of Λ̂n, we have that for all 0 ≤ s ≤ τ,

\[
\Lambda(s) - \varepsilon_n \le \hat\Lambda_n(s) \le \Lambda_n(s).
\]

It follows that −εn ≤ Λ̂n(s) − Λ(s) ≤ Λn(s) − Λ(s) ≤ εn for all 0 ≤ s ≤ τ. The conclusion of the proposition follows as ||Λ̂n − Λ||τ ≤ εn goes to 0 almost surely.

A.4 Consistency: Proof of Theorem 1

Note that the log-likelihood function ℓn(β, λ) is strictly concave in λ, as each function of λ in ℓn(β, λ) is concave or strictly concave and the summation of concave functions is concave. Hence, for each β in a compact set ℬ, we can find a unique maximizer λ̂(·, β) of the likelihood function ℓn(β, λ). The existence of the NPMLE for {β, λ} follows from the compactness of ℬ for the profile likelihood ℓn(β, λ̂(·, β)), which is continuous in β. The uniqueness of the NPMLE is guaranteed by Assumption 5 for large samples.

The technical details of the consistency proof are similar to those of Murphy (1995) or Parner (1998). We provide only a sketch of the proof. As the MLE (β̂n, Λ̂n) maximizes the log-likelihood function ℓn, the method is to use the empirical Kullback-Leibler distance ℓn(β̂n, Λ̂n) − ℓn(β0, Λ0), which must always be nonnegative. If (β̂n, Λ̂n) converges at all, say, to (β*, Λ*), then ℓn(β̂n, Λ̂n) − ℓn(β0, Λ0) must converge to the negative Kullback-Leibler distance between Pβ*,Λ* and Pβ0,Λ0 by the strong law of large numbers, where Pβ,Λ is the probability measure under the parameter (β, Λ). The Kullback-Leibler distance between Pβ*,Λ* and Pβ0,Λ0 therefore must be zero, and we conclude that Pβ*,Λ* = Pβ0,Λ0 almost surely. It then follows by model identifiability that β* = β0 and Λ* = Λ0.

We need to find, for any subsequence of (β̂n, Λ̂n), a further convergent subsequence. The first step is to show that (β̂n, Λ̂n) stays bounded. As β̂n is in a compact set, it must stay bounded. Because (β̂n, Λ̂n) maximizes the likelihood function, ℓn(β̂n, Λ̂n) − ℓn(β̄, Λ̄) ≥ 0 for each (β̄, Λ̄) in the parameter set. Recall that τ̂ = t_k. We show that Λ̂n(τ̂) stays bounded, i.e., lim sup_n Λ̂n(τ̂) < ∞. We use the method of contradiction. Suppose that Λ̂n(τ̂) diverges. Then we can construct some sequence {β̄n, Λ̄n} such that the empirical Kullback-Leibler distance ℓn(β̂n, Λ̂n) − ℓn(β̄n, Λ̄n) would become negative infinity. This is a contradiction, as the Kullback-Leibler distance is always nonnegative. The construction of the contradiction is along the same lines as in Murphy (1994). Briefly, we choose β̄n = β0 and define Λ̄n to be

\[
\bar\Lambda_n(t) = \int_0^t \Bigl\{\sum_{i=1}^n W_i(u, \beta_0, \Lambda_0)\Bigr\}^{-1} d\Bigl\{\sum_{j=1}^n N_j(u)\Bigr\}, \tag{26}
\]

where W_i(u, β, Λ) = e^{β′Z_i}(Y_i(u) − ∫_u^τ S(v | Z_i) dv / ∫_0^τ S(v | Z_i) dv), and note that

\[
E\,W_i(u, \beta, \Lambda) = E\bigl[e^{\beta' Z_i}\bigl\{u - E_C (u - C)_+\bigr\}\, S(u \mid Z)/\mu(Z)\bigr] \ge 0.
\]

It can be shown easily that Λ̄n converges to Λ0 almost surely and uniformly in t. By a technical argument similar to that of Murphy (1995), we can show that ℓn(β̂n, Λ̂n) − ℓn(β̄n, Λ̄n) → −∞ as n → ∞. This is impossible so Λ̂n must stay bounded.

As Λ̂n stays bounded, we can apply Helly's selection principle to find a convergent subsequence of (β̂nk, Λ̂nk) for an arbitrary subsequence from the sequence indexed by {1, ···, n}. By the strong law of large numbers, such a convergent subsequence must converge to (β0, Λ0) by the classical Kullback-Leibler information approach. For any given subsequence {nk}, we can identify a further subsequence of (β̂nk, Λ̂nk) that converges to (β0, Λ0). Helly's selection theorem implies that the entire sequence (β̂n, Λ̂n(t)) must converge to (β0, Λ0(t)) for each t ∈ [0, τ]. By the assumption that Λ0(·) is monotone and continuous, the convergence of Λ̂n(t) at each t ∈ [0, τ] is also uniform in t. The convergence is also almost sure, as the proof is carried out for a fixed ω in the underlying probability space Ω, where we use the law of large numbers only countably many times.

A.5 Asymptotic Normality: Proof of Theorem 2

We prove the asymptotic normality by the Z-theorem for infinite-dimensional estimating equations (van der Vaart and Wellner, 1996, Theorem 3.3.1, page 310). This approach has been successfully applied by Murphy (1995, Theorem 1) and Parner (1998, Theorem 2), among many others. The proof requires verifying the three main conditions of the Z-theorem: Fréchet differentiability, weak convergence of $\sqrt{n}\,U_n(\beta_0,\Lambda_0)$, and a stochastic approximation of the estimating equations, which we outline below.

Fréchet Derivative and its Invertibility

Let $\Lambda_Z(t) = \int_0^t e^{\beta Z}\, d\Lambda(u)$. We first show that the population estimating equation U_0 is Fréchet differentiable and that its Fréchet derivative is continuously invertible. We have defined U_0(·, β, Λ) = (U_{10}(β, Λ), U_{20}(·, β, Λ)), where

$$U_{10}(\beta,\Lambda) = \int_0^\tau E_0\Big\{ Z\,dN(u) - Y(u)\, Z\, d\Lambda_Z(u) + Z\Big(\int_u^\tau S(v\mid Z)\,dv\Big)\, d\Lambda_Z(u)\big/\mu_{\beta,\Lambda}(Z) \Big\},$$

$$U_{20}(t,\beta,\Lambda) = \int_0^t E_0\Big\{ dN(u) - Y(u)\, d\Lambda_Z(u) + \Big(\int_u^\tau S(v\mid Z)\,dv\Big)\, d\Lambda_Z(u)\big/\mu_{\beta,\Lambda}(Z) \Big\}.$$

The Fréchet derivative can be calculated from the Gâteaux variations of U_0(β, Λ) at (β_0, Λ_0); that is, we differentiate U_0(β_η, Λ_η) with respect to η and evaluate the derivative at η = 0, where the submodels are β_η = β_0 + ηβ and Λ_η = Λ_0 + ηΛ.
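Because the submodels perturb β and Λ separately and linearly, the joint Gâteaux variation decomposes into the two partial variations computed below (our display, valid once U_0 is differentiable at (β_0, Λ_0), as verified in this subsection):

$$\frac{d}{d\eta}\,U_0(\beta_\eta,\Lambda_\eta)\Big|_{\eta=0} \;=\; \frac{d}{d\eta}\,U_0(\beta_\eta,\Lambda_0)\Big|_{\eta=0} \;+\; \frac{d}{d\eta}\,U_0(\beta_0,\Lambda_\eta)\Big|_{\eta=0}.$$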

The Gâteaux derivative of U_{20}(t, β, Λ) evaluated at (β_0, Λ_0) is

$$-\{\sigma_{21}(\beta) + \sigma_{22}(\Lambda)\},$$

where

$$\sigma_{21}(\beta) \equiv -\frac{\partial}{\partial\eta}\, U_{20}(t, \beta_\eta, \Lambda_0)\Big|_{\eta=0} = \beta\Big\{ \int_0^t K_1^{(1)}(u)\,d\Lambda_0(u) + \int_0^\tau K_2^{(1)}(t,u)\,d\Lambda_0(u) \Big\},$$

$$\sigma_{22}(\Lambda) \equiv -\frac{\partial}{\partial\eta}\, U_{20}(t, \beta_0, \Lambda_\eta)\Big|_{\eta=0} = \int_0^t K_1^{(0)}(u)\,d\Lambda(u) + \int_0^\tau K_2^{(0)}(t,u)\,d\Lambda(u).$$

The Gâteaux derivative of U_{10}(β, Λ) evaluated at (β_0, Λ_0) is

$$-\{\sigma_{11}(\beta) + \sigma_{12}(\Lambda)\},$$

where

$$\sigma_{11}(\beta) \equiv -\frac{\partial}{\partial\eta}\, U_{10}(\beta_\eta, \Lambda_0)\Big|_{\eta=0} = \Big\{ \int_0^\tau K_1^{(2)}(u)\,d\Lambda_0(u) + \int_0^\tau K_2^{(2)}(\tau,u)\,d\Lambda_0(u) \Big\}\beta,$$

$$\sigma_{12}(\Lambda) \equiv -\frac{\partial}{\partial\eta}\, U_{10}(\beta_0, \Lambda_\eta)\Big|_{\eta=0} = \int_0^\tau K_1^{(1)}(u)\,d\Lambda(u) + \int_0^\tau K_2^{(1)}(\tau,u)\,d\Lambda(u).$$

To obtain the weak convergence results, we need to strengthen the Gâteaux differentiability to Fréchet differentiability, essentially for the proof of tightness (van der Vaart and Wellner, 1996, page 310). The Fréchet differentiability of U_0(β, Λ) can be verified from the definition; its derivative ψ_0 has the form in (19). Note that ψ_0 is a continuous linear operator defined on the parameter space, a subset of the product of ℝ^p and the Banach space L_2[0, τ]. If the inverse operator ψ_0^{-1} exists, then it is continuous by Banach's continuous inverse theorem (Zeidler, 1995, page 179). Hence, to prove the continuous invertibility of ψ_0, we only need to show that the inverse operator ψ_0^{-1} exists.

To show the existence of the inverse operator ψ_0^{-1}, it suffices, by the formula in (20), to show that σ_{11} and $\Phi = \sigma_{22} - \sigma_{21}\sigma_{11}^{-1}\sigma_{12}$ have inverses. The operator σ_{11}(β) = J_0β is linear, where the matrix J_0 defined in (18) is the Fisher information for β when Λ_0 is known. By Assumption 5, J_0 is invertible, and hence so is σ_{11}. The operator Φ has the following form:

$$\Phi(\Lambda)(t) = \int_0^t K_1^{(0)}(u)\,d\Lambda(u) + \int_0^\tau \Big\{ K_2^{(0)}(t,u) - \Big( \int_0^t K_1^{(1)}(v)\,d\Lambda_0(v) + \int_0^\tau K_2^{(1)}(t,v)\,d\Lambda_0(v) \Big) J_0^{-1} \big( K_1^{(1)}(u) + K_2^{(1)}(\tau,u) \big) \Big\}\,d\Lambda(u).$$
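The reduction to σ_{11} and Φ rests on the standard block (Schur complement) inversion identity for a two-by-two operator matrix, presumably the content of (20):

$$\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}^{-1} = \begin{pmatrix} \sigma_{11}^{-1} + \sigma_{11}^{-1}\sigma_{12}\,\Phi^{-1}\sigma_{21}\sigma_{11}^{-1} & -\sigma_{11}^{-1}\sigma_{12}\,\Phi^{-1} \\ -\Phi^{-1}\sigma_{21}\sigma_{11}^{-1} & \Phi^{-1} \end{pmatrix}, \qquad \Phi = \sigma_{22} - \sigma_{21}\sigma_{11}^{-1}\sigma_{12},$$

which is well defined exactly when σ_{11} and Φ are invertible.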

The invertibility of Φ is equivalent to showing that there exists a unique solution Λ to the equation Φ(Λ) = Λ̃ for any function Λ̃ of bounded variation. Taking the derivative with respect to t on both sides of the equation gives

$$d\tilde\Lambda(t) = K_1^{(0)}(t)\,d\Lambda(t) - \int_0^\tau Q(t,u)\,d\Lambda(u), \tag{27}$$

where Q(t, u) is defined previously in (23). The integral equation (27) is a Fredholm equation of the second kind. By Assumptions 1 and 3, the bivariate function Q(t, u) defined in (23) is continuous on [0, τ] × [0, τ]. Also by Assumption 3, the function $1/K_1^{(0)}(t)$ is continuous and bounded away from 0 for t > 0. By the classical theory of integral equations (Tricomi, 1985, Chapter 2), there is a unique solution dΛ(t) to the Fredholm integral equation (27), characterized by

$$d\Lambda(t) = \frac{d\tilde\Lambda(t)}{K_1^{(0)}(t)} + \int_0^\tau \frac{H(t,u)}{K_1^{(0)}(t)}\,d\tilde\Lambda(u),$$

where the resolvent kernel H(t, u) satisfies equation (22). The invertibility of the operator Φ follows, and its inverse Φ^{-1}(Λ̃) has the form given in (21).

Weak Convergence of $\sqrt{n}\,U_n(\beta_0,\Lambda_0)$

As the true value (β0, Λ0) satisfies U0(β0, Λ0) = 0, we have

$$\sqrt{n}\,U_n(\beta_0,\Lambda_0) = \sqrt{n}\,\big\{ U_{1n}(\psi_0) - U_{10}(\psi_0),\; U_{2n}(t,\psi_0) - U_{20}(t,\psi_0) \big\}.$$

By the multivariate central limit theorem for sums of independent and identically distributed (i.i.d.) random vectors, $\sqrt{n}\{U_{1n}(\psi_0) - U_{10}(\psi_0)\}$ converges in law to a mean-zero Gaussian vector, provided that the second moment is finite. The process $\sqrt{n}\{U_{2n}(t,\psi_0) - U_{20}(t,\psi_0)\}$ is a sum of i.i.d. processes of bounded variation on [0, τ]. By a central limit theorem for processes of bounded variation (van der Vaart and Wellner, 1996, Example 2.11.16), it converges to a tight Gaussian process, again provided that the second moment is finite. The weak convergence of $\sqrt{n}\,U_n(\beta_0,\Lambda_0)$ then follows by the continuous mapping theorem.

Stochastic Approximation

To apply the Z-theorem for the infinite dimensional estimating equations (van der Vaart and Wellner, 1996, Theorem 3.3.1), we need to confirm the following stochastic approximation:

$$\sqrt{n}\big\{ (U_n - U_0)(\hat\psi_n) - (U_n - U_0)(\psi_0) \big\} \equiv \sqrt{n}\Big[ \big\{ U_n(\cdot,\hat\psi_n) - U_0(\cdot,\hat\psi_n) \big\} - \big\{ U_n(\cdot,\psi_0) - U_0(\cdot,\psi_0) \big\} \Big] = o_P(1).$$

The function is defined on ℬ × 𝒜, where by Assumption 2 the set ℬ is a compact set containing β_0, and 𝒜 is a set of nondecreasing functions such that each Λ ∈ 𝒜 satisfies Λ(0) = 0 and Λ(τ) < ∞; hence 𝒜 contains Λ_0. To apply the Z-theorem, we need Λ to lie in the closed linear subspace 𝒜̄ generated by the set 𝒜. The subspace 𝒜̄ is viewed within the space of functions of bounded variation on [0, τ], endowed with the variation norm || · ||_v defined by the total variation of Λ on [0, τ], that is,

$$\|\Lambda\|_v \equiv \sup \sum_{k=1}^m \big| \Lambda(s_k) - \Lambda(s_{k-1}) \big|,$$

where the supremum is taken over all finite partitions {0 = s_0 < s_1 < ··· < s_m = τ} of [0, τ].
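For intuition, the variation norm of a function tabulated on a finite grid reduces to a sum of absolute increments; a short sketch (our illustration, exact for monotone functions):

```python
import numpy as np

def variation_norm(vals: np.ndarray) -> float:
    # Total variation of the piecewise-linear interpolant through ordered
    # grid values; for a monotone Lambda it equals Lambda(tau) - Lambda(0).
    return float(np.sum(np.abs(np.diff(vals))))

t = np.linspace(0.0, 1.0, 1001)
print(variation_norm(t**2))        # ~1.0 for the nondecreasing Lambda(t) = t^2 on [0, 1]
```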

We first derive the score functions for β and Λ(·) from a single observation O; we keep O in the notation for the score functions to emphasize their dependence on the data. By straightforward calculation, the score function for β is

$$\dot\ell_1(O,\beta,\Lambda) = \int_0^\tau Z\,dN(u) - \int_0^\tau Y(u)\, Z e^{\beta Z}\,d\Lambda(u) + \int_0^\tau Z \Big( \int_u^\tau S(v\mid Z)\,dv \Big) e^{\beta Z}\,d\Lambda(u)\big/ \mu_{\beta,\Lambda}(Z). \tag{28}$$

For the infinite-dimensional parameter Λ(·), consider a submodel defined by dΛ_η = (1 + ηh)dΛ, where h is a bounded and integrable function. Taking the derivative of ℓ_n(β, Λ_η) with respect to η and evaluating it at η = 0 yields the score operator for Λ:

$$\dot\ell_2(O,\beta,\Lambda)(h) = \int_0^\tau h(v)\,dN(v) - \int_0^\tau Y(v) e^{\beta Z} h(v)\,d\Lambda(v) + \int_0^\tau \Big\{ \int_u^\tau S(v\mid Z)\,dv \Big\} h(u) e^{\beta Z}\,d\Lambda(u)\big/\mu_{\beta,\Lambda}(Z). \tag{29}$$

Taking h(·) = 1(· ≤ t) in (29), we obtain the score function for Λ:

$$\dot\ell_2(t; O,\beta,\Lambda) = \int_0^t dN(v) - \int_0^t Y(v) e^{\beta Z}\,d\Lambda(v) + \int_0^t \Big\{ \int_u^\tau S(v\mid Z)\,dv \Big\} e^{\beta Z}\,d\Lambda(u)\big/\mu_{\beta,\Lambda}(Z). \tag{30}$$

We can write $\dot\ell_1(O,\beta,\Lambda) = \int_0^\tau Z\,d\dot\ell_2(t; O,\beta,\Lambda)$.
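As a sanity check on (28), the following hedged Python sketch compares the score formula with a central finite difference of a single-observation log-likelihood of the form δ{βZ + log λ(y)} − e^{βZ}Λ(y) − log μ_{β,Λ}(Z), whose β-derivative matches (28); the observation, grid, and baseline hazard below are toy choices, and Z is taken scalar.

```python
import numpy as np

tau, m = 1.0, 4000
v = np.linspace(0.0, tau, m)
Lam = lambda t: 0.8 * t**1.5                 # toy baseline cumulative hazard
lam = lambda t: 1.2 * t**0.5                 # its derivative (baseline hazard)
y, delta, Z = 0.6, 1, 0.7                    # hypothetical single observation

S = lambda b: np.exp(-np.exp(b * Z) * Lam(v))          # S(v|Z) on the grid
mu = lambda b: np.trapz(S(b), v)                       # mu_{beta,Lambda}(Z)

def loglik(b):
    return delta * (b * Z + np.log(lam(y))) - np.exp(b * Z) * Lam(y) - np.log(mu(b))

def score(b):                                # formula (28), specialized to scalar Z
    term1 = delta * Z                                  # int_0^tau Z dN(u)
    term2 = -Z * np.exp(b * Z) * Lam(y)                # -int_0^tau Y(u) Z e^{bZ} dLam(u)
    # third term of (28); by Fubini, int_0^tau (int_u^tau S dv) dLam(u) = int_0^tau S(v) Lam(v) dv
    term3 = Z * np.exp(b * Z) * np.trapz(S(b) * Lam(v), v) / mu(b)
    return term1 + term2 + term3

b, h = 0.3, 1e-5
fd = (loglik(b + h) - loglik(b - h)) / (2.0 * h)       # central difference
print(f"score (28): {score(b):.6f}   finite difference: {fd:.6f}")   # the two agree closely
```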

Let ℓ̇(t, O, ψ) = {ℓ̇_1(O, ψ), ℓ̇_2(t, O, ψ)}, where ψ = (β, Λ). We first introduce some notation from empirical process theory. Let ℙ_n be the empirical probability measure. Then

$$U_n(\beta,\Lambda) = \mathbb{P}_n\,\dot\ell(\cdot, O, \psi) = n^{-1} \sum_{i=1}^n \dot\ell(\cdot, O_i, \psi).$$

Denote the empirical process by $\mathbb{G}_n f = \sqrt{n}(\mathbb{P}_n f - P_0 f)$, where P_0 is the expectation under ψ_0. Note that $\sqrt{n}(U_n - U_0)(\psi) = \mathbb{G}_n \dot\ell(\cdot, O, \psi)$ is the empirical process indexed by the class of functions

$$\big\{ \dot\ell(t; O, \psi) : \psi \in \mathcal{B} \times \bar{\mathcal{A}},\; t \in [0,\tau] \big\}.$$

Let the norm || · ||_H on H = ℬ × 𝒜̄ be defined as ||(β, Λ)||_H = |β| + ||Λ||_v. The stochastic condition that we want to confirm is then

$$\big\| \mathbb{G}_n \dot\ell(t, O, \hat\psi_n) - \mathbb{G}_n \dot\ell(t, O, \psi_0) \big\|_H = o_P(1).$$

To apply the functional central limit theorem, we show that the class of functions {ℓ̇(t, O, ψ) − ℓ̇(t, O, ψ_0) : ||ψ − ψ_0||_H < δ, t ∈ [0, τ]} is P_0-Donsker. As $\dot\ell_1(O,\psi) = \int_0^\tau Z\,d\dot\ell_2(t, O, \psi)$, we only need to show that {ℓ̇_2(t, O, ψ) − ℓ̇_2(t, O, ψ_0) : ||ψ − ψ_0||_H < δ, t ∈ [0, τ]} is P_0-Donsker, where

$$\dot\ell_2(t,O,\psi) - \dot\ell_2(t,O,\psi_0) = \int_0^t Y(u) e^{\beta_0 Z}\,d\Lambda_0(u) - \int_0^t Y(u) e^{\beta Z}\,d\Lambda(u) + \int_0^t \Big\{ \int_u^\tau S(v\mid Z)\,dv \Big\} e^{\beta Z}\,d\Lambda(u)\big/\mu_Z(\psi) - \int_0^t \Big\{ \int_u^\tau S_0(v\mid Z)\,dv \Big\} e^{\beta_0 Z}\,d\Lambda_0(u)\big/\mu_Z(\psi_0).$$

First, the class of functions exp(βZ) for β in the compact set ℬ is P_0-Donsker because it is finite-dimensional. The class of functions of bounded variation on [0, τ] is P_0-Donsker (van der Vaart, 1998, page 273). As Λ(t) is of bounded variation on [0, τ], the set of functions $\{\int_0^t Y(u) e^{\beta Z}\,d\Lambda(u) : t \in [0,\tau],\, \beta \in \mathcal{B},\, \Lambda \in \mathcal{A}\}$ is P_0-Donsker. Similarly, we can show that the sets of functions $\{\int_0^t [\int_u^\tau S(v\mid Z)\,dv]\, e^{\beta Z}\,d\Lambda(u) : t \in [0,\tau],\, \beta \in \mathcal{B},\, \Lambda \in \mathcal{A}\}$ and $\{\mu_Z(\beta,\Lambda) : \beta \in \mathcal{B},\, \Lambda \in \mathcal{A}\}$ are P_0-Donsker; their envelope functions are (τ − u)e^{βZ}Λ(τ) and τ < ∞, respectively. By Assumption 4, $(\tau - u)^2 \Lambda^2(\tau)\, E_0 e^{2\beta Z} < \infty$ for all β ∈ ℬ and Λ ∈ 𝒜. By the permanence property of Donsker classes (sums, products, and Lipschitz transformations of simple P_0-Donsker classes remain P_0-Donsker), the displayed class is P_0-Donsker.

Furthermore, as ||ψ − ψ_0||_H → 0, ℓ̇_2(t, O, ψ) converges to ℓ̇_2(t, O, ψ_0) for each t; the convergence also holds in square moment by the dominated convergence theorem. It follows that

$$\sup_{t\in[0,\tau]} E_0\, \big\| \dot\ell(t,O,\psi) - \dot\ell(t,O,\psi_0) \big\|^2 \to 0 \quad \text{as } \|\psi - \psi_0\|_H \to 0.$$

The stochastic approximation of $\sqrt{n}(U_n - U_0)(\hat\beta_n, \hat\Lambda_n)$ by $\sqrt{n}(U_n - U_0)(\beta_0, \Lambda_0)$ now follows from a technical lemma of van der Vaart and Wellner (1996, Lemma 3.3.5, page 311).

Footnotes

B Supplemental Materials

Jumps of the NPMLE of the baseline cumulative hazard function for length-biased data: a detailed description of why the NPMLE has jumps at both censored and uncensored times for length-biased data.

Contributor Information

Jing Qin, Email: jingqin@niaid.nih.gov, Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, NIH Bethesda, Maryland 20892, USA, Phone: 301-451-2436.

Jing Ning, Division of Biostatistics, The University of Texas, Health Science Center at Houston, School of Public Health, Houston, Texas 77030, USA.

Hao Liu, Division of Biostatistics, Dan L. Duncan Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA.

Yu Shen, Department of Biostatistics, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA.

References

  1. Andersen PK, Borgan O, Gill RD, Keiding N. Statistical Models Based on Counting Processes. New York: Springer; 1992.
  2. Asgharian M, M’Lan CE, Wolfson DB. Length-biased Sampling with Right Censoring: An Unconditional Approach. Journal of the American Statistical Association. 2002;97:201–209.
  3. Asgharian M, Wolfson DB. Asymptotic Behavior of the Unconditional NPMLE of the Length-biased Survivor Function From Right Censored Prevalent Cohort Data. The Annals of Statistics. 2005;33:2109–2131.
  4. Asgharian M, Wolfson DB, Zhang X. Checking Stationarity of the Incidence Rate Using Prevalent Cohort Survival Data. Statistics in Medicine. 2006;25:1751–1767. doi: 10.1002/sim.2326.
  5. Barlow RE, Proschan F. Statistical Theory of Reliability. New York: Holt, Rinehart & Winston; 1975.
  6. Bergeron P-J, Asgharian M, Wolfson DB. Covariate Bias Induced by Length-biased Sampling of Failure Times. Journal of the American Statistical Association. 2008;103:737–742.
  7. Bickel PJ, Ritov Y. Efficient Estimation Using Both Direct and Indirect Observations. Theory of Probability and Its Applications. 1994;38:194–213.
  8. Breslow N. Contribution to the Discussion of the Paper by D. R. Cox. Journal of the Royal Statistical Society, Series B. 1972;34:187–220.
  9. Chen HY, Little RJA. Proportional Hazards Regression with Missing Covariates. Journal of the American Statistical Association. 1999;94:896–908.
  10. Cox DR. Regression Models and Life Tables (with Discussion). Journal of the Royal Statistical Society, Series B. 1972;34:187–220.
  11. Cox DR. Partial Likelihood. Biometrika. 1975;62:269–276.
  12. Cox DR, Miller HD. The Theory of Stochastic Processes. London: Chapman and Hall; 1977.
  13. De Uña Álvarez J, Otero-Giraldez MS, Alvarez-Llorente G. Estimation Under Length-bias and Right-censoring: An Application to Unemployment Duration Analysis for Married Women. Journal of Applied Statistics. 2003;30:283–291.
  14. Dewanji A, Kalbfleisch JD. Estimation of Sojourn Time Distributions for Cyclic Semi-Markov Processes in Equilibrium. Biometrika. 1987;74:281–288.
  15. Elbers C, Ridder G. True and Spurious Duration Dependence: The Identifiability of the Proportional Hazard Model. The Review of Economic Studies. 1982;49:403–409.
  16. Ferguson TS. A Course in Large Sample Theory. London: Chapman & Hall; 1996.
  17. Gail MH, Benichou J. Encyclopedia of Epidemiologic Methods. Wiley; 2000.
  18. Gordis L. Epidemiology. Philadelphia, PA: W. B. Saunders Company; 2000.
  19. Huang J, Wellner JA. Estimation of a Monotone Density or Monotone Hazard Under Random Censoring. Scandinavian Journal of Statistics. 1995;22:3–33.
  20. Kalbfleisch JD, Lawless JF. Regression Models for Right Truncated Data with Applications to AIDS Incubation Times and Reporting Lags. Statistica Sinica. 1991;1:19–32.
  21. Kalbfleisch JD, Prentice RL. Marginal Likelihoods Based on Cox’s Regression and Life Model. Biometrika. 1973;60:267–278.
  22. Keiding N. Age-specific Incidence and Prevalence: A Statistical Perspective (with Discussion). Journal of the Royal Statistical Society, Series A. 1991;154:371–412.
  23. Klein J. Semiparametric Estimation of Random Effects Using the Cox Model Based on the EM Algorithm. Biometrics. 1992;48:795–806.
  24. Kvam P. Length Bias in the Measurements of Carbon Nanotubes. Technometrics. 2008;50:462–467.
  25. Lancaster T. The Econometric Analysis of Transition Data. Cambridge: Cambridge University Press; 1990.
  26. Marshall AW, Proschan F. Maximum Likelihood Estimation for Distributions with Monotone Failure Rate. The Annals of Mathematical Statistics. 1965;36:69–77.
  27. McClean S, Devine C. A Nonparametric Maximum Likelihood Estimator for Incomplete Renewal Data. Biometrika. 1995;82:791–803.
  28. Meng X-L, Rubin DB. Maximum Likelihood Estimation via the ECM Algorithm: A General Framework. Biometrika. 1993;80:267–278.
  29. Murphy SA. Consistency in a Proportional Hazards Model Incorporating a Random Effect. The Annals of Statistics. 1994;22:712–731.
  30. Murphy SA. Asymptotic Theory for the Frailty Model. The Annals of Statistics. 1995;23:182–198.
  31. Murphy SA, van der Vaart AW. Observed Information in Semiparametric Models. Bernoulli. 1999;5:381–412.
  32. Murphy SA, van der Vaart AW. On Profile Likelihood (with Discussion). Journal of the American Statistical Association. 2000;95:449–485.
  33. Nielsen GG, Gill RD, Andersen PK, Sorensen TIA. A Counting Process Approach to Maximum Likelihood Estimation in Frailty Models. Scandinavian Journal of Statistics. 1992;19:25–43.
  34. Padgett WJ, Wei LJ. Maximum Likelihood Estimation of a Distribution Function with Increasing Failure Rate Based on Censored Observations. Biometrika. 1980;67:470–474.
  35. Parner E. Asymptotic Theory for the Correlated Gamma-frailty Model. The Annals of Statistics. 1998;26:183–214.
  36. Qin J, Shen Y. Statistical Methods for Analyzing Right-censored Length-biased Data Under Cox Model. Biometrics. 2010;66:382–392. doi: 10.1111/j.1541-0420.2009.01287.x.
  37. Rothenberg TJ. Identification in Parametric Models. Econometrica. 1971;39:577–591.
  38. Sansgiry P, Akman O. Transformations of the Lognormal Distribution as a Selection Model. The American Statistician. 2000;54:307–309.
  39. Scheike TH, Keiding N. Design and Analysis of Time-to-pregnancy. Statistical Methods in Medical Research. 2006;15:127–140. doi: 10.1191/0962280206sm435oa.
  40. Simon R. Length-biased Sampling in Etiologic Studies. American Journal of Epidemiology. 1980;111:444–452. doi: 10.1093/oxfordjournals.aje.a112920.
  41. Terwilliger J, Shannon W, Lathrop G, Nolan J, Goldin L, Chase G, Weeks D. True and False Positive Peaks in Genomewide Scans: Applications of Length-biased Sampling to Linkage Mapping. American Journal of Human Genetics. 1997;61:430–438. doi: 10.1086/514855.
  42. Tricomi FG. Integral Equations. New York: Dover Publications; 1985.
  43. Tsai WY. Estimation of the Survival Function with Increasing Failure Rate Based on Left Truncated and Right Censored Data. Biometrika. 1988;75:319–324.
  44. Tsai WY. Pseudo-partial Likelihood for Proportional Hazards Models with Biased-sampling Data. Biometrika. 2009;96:601–615. doi: 10.1093/biomet/asp026.
  45. Turnbull BW. The Empirical Distribution Function with Arbitrarily Grouped, Censored and Truncated Data. Journal of the Royal Statistical Society, Series B. 1976;38:290–295.
  46. van der Vaart AW. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge, UK: Cambridge University Press; 1998.
  47. van der Vaart AW, Wellner JA. Existence and Consistency of Maximum Likelihood in Upgraded Mixture Models. Journal of Multivariate Analysis. 1992;43:133–146.
  48. van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes: With Applications to Statistics. New York: Springer-Verlag; 1996.
  49. Vardi Y. Nonparametric Estimation in the Presence of Length Bias. The Annals of Statistics. 1982;10:616–620.
  50. Vardi Y. Multiplicative Censoring, Renewal Processes, Deconvolution and Decreasing Density: Nonparametric Estimation. Biometrika. 1989;76:751–761.
  51. Vardi Y, Zhang CH. Large Sample Study of Empirical Distributions in a Random-multiplicative Censoring Model. The Annals of Statistics. 1992;20:1022–1039.
  52. Wang MC. Nonparametric Estimation From Cross-sectional Survival Data. Journal of the American Statistical Association. 1991;86:130–143.
  53. Wang MC. Hazards Regression Analysis for Length-biased Data. Biometrika. 1996;83:343–354.
  54. Wang MC, Brookmeyer R, Jewell NP. Statistical Models for Prevalent Cohort Data. Biometrics. 1993;49:1–11.
  55. Wolfson C, Wolfson DB, Asgharian M, M’Lan CE, Ostbye T, Rockwood K, Hogan DB, and the Clinical Progression of Dementia Study Group. A Reevaluation of the Duration of Survival After the Onset of Dementia. The New England Journal of Medicine. 2001;344:1111–1116. doi: 10.1056/NEJM200104123441501.
  56. Zeidler E. Applied Functional Analysis: Main Principles and Their Applications. Applied Mathematical Sciences, Vol. 109. New York: Springer-Verlag; 1995.
  57. Zelen M. Forward and Backward Recurrence Times and Length Biased Sampling: Age Specific Models. Lifetime Data Analysis. 2004;10:325–334. doi: 10.1007/s10985-004-4770-1.
  58. Zelen M, Feinleib M. On the Theory of Screening for Chronic Diseases. Biometrika. 1969;56:601–614.
  59. Zeng D, Lin DY. Maximum Likelihood Estimation in Semiparametric Regression Models with Censored Data. Journal of the Royal Statistical Society, Series B. 2007;69:507–564.
  60. Zeng D, Lin DY, Yin G. Maximum Likelihood Estimation for the Proportional Odds Model with Random Effects. Journal of the American Statistical Association. 2005;100:470–483.
