Pseudo-partial likelihood for proportional hazards models with biased-sampling data

WEI YANN TSAI

doi:10.1093/biomet/asp026

Biometrika. 2009 Jun 24;96(3):601–615.

PMCID: PMC3304552 PMID: 22422175

Abstract

We obtain a pseudo-partial likelihood for proportional hazards models with biased-sampling data by embedding the biased-sampling data into left-truncated data. The log pseudo-partial likelihood of the biased-sampling data is the expectation of the log partial likelihood of the left-truncated data conditioned on the observed data. In addition, asymptotic properties of the estimator that maximize the pseudo-partial likelihood are derived. Applications to length-biased data, biased samples with right censoring and proportional hazards models with missing covariates are discussed.

Some key words: EM algorithm, Left truncation, Length-biased data, Missing covariate, Right censoring

1. Introduction

The partial likelihood function of Cox (1975) has been mainly used for proportional hazards models with censored data (Cox 1972). For more complicated incomplete data, no unified method exists to find a partial likelihood for inference on the parameters of the proportional hazards models. Dempster et al. (1977) developed the EM algorithm to obtain maximum likelihood estimators for incomplete data. Originally used for fully parametric models, the EM algorithm was subsequently extended successfully to many nonparametric problems. In survival analysis, there is a substantial literature generalizing the EM algorithm to frailty models (Andersen et al. 1993, § 9), missing covariates (Paik & Tsai 1997; Qi et al. 2005) and interval-censored data (Betensky et al. 1999).

In semiparametric models, one usually obtains, through a conditioning argument, an objective function with finitely many parameters of interest to which the EM algorithm can be readily applied. The present paper gives an analogous pseudo-partial likelihood for proportional hazards models with biased-sampling data that can be used without intensive computation.

Under the proportional hazards models for biased-sampling data, the conditional probability density function of an observed nonnegative random variable T, given covariates z(t) and x, can be expressed as

$$h(t \mid x, z) = \frac{W(t, x)\, f(t \mid z)}{\alpha(x, z)}, \qquad (1)$$

where W(t, x) is a completely known nonnegative weight function, z(t) = { z₁(t), …, z_p(t)}^T is a p-dimensional time-dependent covariate, x = (x₁, …, x_q)^T is a q-dimensional time-independent covariate, f(t ∣ z) denotes a population conditional density function given z(s) for s ⩽ t and α(x, z) is a normalization constant making h(· ∣ x, z) a genuine probability density function. Furthermore, we will assume a proportional hazards model with λ(t ∣ z) = f(t ∣ z)/S(t ∣ z) = λ₀(t) exp{β^T z(t)}, where S(t ∣ z) is the conditional survival function.

Biased-sampling data arise naturally in complex surveys. For example, in large-scale population-based surveys with multi-stage sampling, the complex design results in a set of probability weights for each subject. The weight function W(T_i, x_i) represents the probability that the ith observation (T_i, x_i, z_i) was sampled from the population. Binder (1992) and Lin (2000) have proposed and studied a method for estimating the parameters of proportional hazards models from such survey data. For W(t, x) = t, the data are referred to as length-biased data. Wang (1996) proposed statistical inference for length-biased data based on Cox's model. For W(t, x) = I(x ⩽ t), density (1) becomes a conditional probability density for left-truncated data. This problem has been extensively studied in the literature. Wang et al. (1986) used a classical approach to study the properties of the nonparametric maximum likelihood estimator. Keiding & Gill (1990) used counting process techniques to study the properties of the same estimator.
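The two weight functions named above can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: the unit-rate exponential population, the truncation point 0.5 and the helper `biased_sample` are all hypothetical choices made for illustration.

```python
import random

def biased_sample(population, weight, n, rng):
    """Draw n values from `population` with probability proportional to weight(t)."""
    weights = [weight(t) for t in population]
    return rng.choices(population, weights=weights, k=n)

def mean(values):
    return sum(values) / len(values)

rng = random.Random(0)
# Hypothetical population: unit-rate exponential lifetimes (mean 1).
population = [rng.expovariate(1.0) for _ in range(20000)]

# Length-biased sampling: W(t, x) = t.
length_biased = biased_sample(population, lambda t: t, 5000, rng)

# Left truncation at a fixed x = 0.5: W(t, x) = I(x <= t).
left_truncated = biased_sample(
    population, lambda t: 1.0 if t >= 0.5 else 0.0, 5000, rng)

print(mean(population), mean(length_biased), mean(left_truncated))
```

Under these assumptions the length-biased sample should have a mean near 2 rather than 1, which is why an analysis that ignores W would be biased.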

The following four real datasets illustrate different types of biased-sampling data.

Example 1

Shrub data. Muttlak & McDonald (1990) presented widths of 46 shrubs. Wang (1996) assumed that the probability of observing a shrub is proportional to the shrub's width, so that the sampling is length-biased. Wang (1996) analyzed the data with a proportional hazards model.

Example 2

Channing House data. Channing House is a retirement centre in Palo Alto, California. Hyde (1980) reported ages at entry and at death of 462 retirees, 365 females and 97 males, who were in residence between January 1964 and July 1975. The individuals who left Channing House or were still in the centre at the end of the study were censored. The data can be viewed as left-truncated with right censoring since the individual's death age must be greater than the entry age. The entry age serves as the left-truncation time.

Example 3

Stanford heart transplant data. Crowley & Hu (1977) gave information on 103 potential heart transplant recipients who were enrolled in the Stanford heart transplant programme from October 1967 to April 1974. The data include age, waiting time to transplantation, survival or censoring time from acceptance to the programme, and three mismatch scores. Among the 103 potential heart transplant recipients, there were 69 patients who underwent the heart transplant operation. Later, Miller & Halpern (1982) updated the data by reporting the survival or censoring times and ages of 184 patients who were enrolled in the same programme and had received heart transplants from October 1967 to February 1980. If we are only interested in analyzing the transplant patients, then the data of Crowley & Hu are left-truncated and right-censored data, with transplant waiting time as a random left-truncation variable. However, because Miller & Halpern did not report the transplant waiting times, their data can be viewed as biased-sampling data with right censoring. The weight function is the distribution of the transplant waiting time random variable.

Example 4

Mouse leukaemia data. Kalbfleisch & Prentice (2002) reported the survival of 204 mice. The mice were followed up for two years for mortality due to thymic or nonthymic leukaemia. The two covariates of interest were the GPD1 phenotype and the level of endogenous murine leukaemia virus. There were 175 mice whose levels of endogenous murine leukaemia virus were recorded. The GPD1 phenotype was determined only on a subgroup of the 100 mice that survived 400 days; thus, the probability of missing covariates clearly depends on the follow-up time. The complete-case analysis, which uses only the mice with complete information, is clearly biased since the selection probability depends on the survival time outcome variable. In fact, the complete cases comprise biased-sampling data with the weight function equal to the probability of selecting complete cases.

2. Partial likelihood

2.1. General approach

Let χ be a sample space, and let x ∈ χ be a realization of the random vector X with density f_X(x;ϕ) depending on a vector parameter ϕ = (β, η), in which β is of interest and η is a nuisance parameter. In some applications, the dimension of η may increase with the sample size and the application of maximum likelihood estimation may lead to spurious results. However, suppose that x = (c₁, x₁, …, c_n, x_n) and that the full likelihood factorizes into

$$f_X(x; \phi) = \prod_{i=1}^{n} f_{\phi}(c_i \mid d_i) \prod_{i=1}^{n} f_{\beta}(x_i \mid e_i), \qquad (2)$$

where d_i = (c₁, x₁, …, c_{i−1}, x_{i−1}) and e_i = (c₁, x₁, …, c_{i−1}, x_{i−1}, c_i). The second product on the right-hand side of (2) is the partial likelihood of β based on (x₁, …, x_n) in the sequence (c_i, x_i) (i = 1, …, n). Cox (1975) argued that inference based only on the partial likelihood would be acceptable if the information about β contained in the first factor was small.

A complication that sometimes occurs is that one observes a function y = y(x), instead of observing x ∈ χ. Therefore, inferences about β must be based on y. The following algorithm is a simple generalization of the EM algorithm for partial likelihood. The EM algorithm applied to gamma frailty models discussed in Andersen et al. (1993, § 9) is a special case of this generalization.

For incomplete data, the EM algorithm finds maximum likelihood estimates for β and η through the iterative maximization of

$$E\{\log f_X(X; \beta, \eta) \mid y; \beta^{(c)}, \eta^{(c)}\},$$

where β^(c) and η^(c) denote the current estimates. Therefore,

$$l_p(\beta, \eta, y) = \sum_{i=1}^{n} E\{\log f_{\beta}(x_i \mid e_i) \mid y; \beta, \eta\}, \qquad U_p(\beta, \eta, y) = \sum_{i=1}^{n} E\{U_i(\beta) \mid y; \beta, \eta\}$$

are, respectively, a log pseudo-partial likelihood function and a pseudo-partial score function of β for the observed data y, where U_i(β) = ∂ log f_β(x_i ∣ e_i)/∂β is the score function of the partial likelihood for complete data. Unfortunately, U_p still involves the nuisance parameter η in many applications. For example, in frailty models U_p is also a function of the frailty parameters and, therefore, cannot be used directly. However, U_p is a function of β alone in some applications. In other situations, the maximization of l_p(β, η, y) with respect to β and η has a simple solution. In § 3, we show two pseudo-partial likelihood functions for Cox's models with biased-sampling data in these two situations.

2.2. Partial likelihood for left-truncated and right-censored data

Let the lifetime T⁰_i have distribution function F_i, and let the truncation time and censoring time (V_i, C_i) have joint distribution function G_i and joint probability density function g_i. It is assumed that T⁰_i and (V_i, C_i) are mutually independent. Moreover, we assume that there is a positive probability that T⁰_i ⩾ V_i and C_i ⩾ V_i. We do not sample from the joint distribution, but from the conditional distribution given the event { T⁰ ⩾ V, C ⩾ V}. Let (V_i, T⁰_i, C_i) (i = 1, …, n) be a sample of n independent triples from this conditional distribution. Then, our left-truncated and right-censored sample is (V₁, T₁, D₁), …, (V_n, T_n, D_n), where T_i = min (T⁰_i, C_i), D_i = I(T_i = T⁰_i) and I(·) is an indicator function. Note that T_i ⩾ V_i (i = 1, …, n). We let N_i(t) = I(T_i ⩽ t, D_i = 1) and Y^v_i(t) = I(V_i ⩽ t)Y_i(t) be, respectively, the indicator of whether or not the ith individual has failed by time t and the indicator of whether or not the ith individual is at risk just before time t, where Y_i(t) = I(T_i ⩾ t). Furthermore, for given time-dependent covariates z_i(t) = { z_1i(t), z_2i(t), …, z_pi(t)}^T, we assume that T⁰_i follows a proportional hazards model, i.e.

$$\lambda(t \mid z_i) = \lambda_0(t) \exp\{\beta^{T} z_i(t)\},$$

where λ(t ∣ z_i) is the conditional hazard function of T⁰_i given z_i(t), β is a p × 1 vector of unknown regression coefficients and λ₀(t) is the underlying or baseline hazard function. For convenience of notation, if it is unambiguous, the dependence of z_i on t will be suppressed. As shown by Andersen et al. (1993, §§ 3.3 and 3.4), (N₁, …, N_n) is a multivariate counting process that has an intensity process (λ(t ∣ z₁)Y^v₁(t), …, λ(t ∣ z_n)Y^v_n(t)) with respect to the filtration defined in Andersen et al. (1993, p. 153). Let Λ₀(t) = ∫₀ᵗ λ₀(s) ds be the cumulative underlying hazard function. The log partial likelihood, which is the conditional likelihood given V, can be written as

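$$l_{c1}(\beta, \Lambda_0) = \sum_{i=1}^{n} \int_0^\infty \left[\beta^{T} z_i(t) + \log\{\Delta\Lambda_0(t)\}\right] dN_i(t) - n \int_0^\infty S^{(0)}(\beta, t)\, d\Lambda_0(t); \qquad (3)$$

see equations (3.3.3) and (7.2.2) of Andersen et al. (1993), where

$$S^{(0)}(\beta, t) = n^{-1} \sum_{i=1}^{n} Y^{v}_i(t) \exp\{\beta^{T} z_i(t)\}$$

and ΔH(t) = H(t+) − H(t−) for any function H(t). The partial derivative of (3) with respect to ΔΛ₀(t) is {ΔN(t)/ΔΛ₀(t)} − nS⁽⁰⁾(β, t), where N(t) = Σ_i N_i(t) is the number of observed failures up to time t. Therefore, for a fixed value of β, we would estimate Λ₀(t) by the Nelson–Aalen estimator

$$\hat\Lambda_0(t, \beta) = \int_0^t \frac{dN(s)}{n S^{(0)}(\beta, s)}, \qquad (4)$$

where […]. Inserting (4) into (3), we obtain the log profile partial likelihood l_c1{β, Λ̂₀(·, β)}, which equals l_c2(β) up to a constant that does not depend on β. Here,

$$l_{c2}(\beta) = \sum_{i=1}^{n} \int_0^\infty \left[\beta^{T} z_i(t) - \log\{n S^{(0)}(\beta, t)\}\right] dN_i(t) \qquad (5)$$

is the generalized log partial likelihood, originally derived by Cox (1972, 1975) for the case of censored survival data. Thus, the score function of the generalized Cox partial likelihood is $\partial l_{c2}(\beta)/\partial\beta = \sum_{i=1}^{n} \int_0^\infty \{z_i(t) - S^{(1)}(\beta, t)/S^{(0)}(\beta, t)\}\, dN_i(t)$, where S⁽¹⁾(β, t) = ∂S⁽⁰⁾(β, t)/∂β.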

We will treat l_c1 of (3) and l_c2 of (5) as our working log partial likelihoods for the complete data. Two log pseudo-partial likelihoods for biased-sampling data will be derived, respectively, based on l_c1 and l_c2 in the next section. The notation N_i(t), N(t) and Y_i(t) will be used throughout with obvious adjustments for data with no censoring and/or no truncation. A parameter with subscript zero will denote the true parameter.
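The generalized log partial likelihood for left-truncated and right-censored data differs from the usual one only through the risk set, which at time t contains the subjects with V_i ⩽ t ⩽ T_i. The following is a minimal sketch for a single time-fixed covariate; the record layout (V, T, D, z), the function names and the four-record toy dataset are assumptions made for illustration.

```python
import math

def at_risk(v, t_obs, t):
    """Y^v(t) = I(V <= t <= T): entered before t and still under observation."""
    return v <= t <= t_obs

def risk_sum(beta, data, t):
    """n * S0(beta, t): sum of exp(beta * z) over the risk set at time t."""
    return sum(math.exp(beta * z) for (v, t_obs, _, z) in data
               if at_risk(v, t_obs, t))

def log_partial_likelihood(beta, data):
    """Generalized Cox log partial likelihood for records (V, T, D, z)."""
    return sum(beta * z_i - math.log(risk_sum(beta, data, t_i))
               for (_, t_i, d_i, z_i) in data if d_i == 1)

def hazard_increments(beta, data):
    """Jumps of the Nelson-Aalen-type baseline estimator at the failure times."""
    jumps = {}
    for (_, t_i, d_i, _) in data:
        if d_i == 1:
            jumps[t_i] = jumps.get(t_i, 0.0) + 1.0 / risk_sum(beta, data, t_i)
    return jumps

# Toy records: (truncation time V, observed time T, failure indicator D, covariate z)
toy = [(0.0, 2.0, 1, 0), (1.0, 3.0, 1, 1), (2.5, 4.0, 0, 0), (0.5, 5.0, 1, 1)]
print(log_partial_likelihood(0.0, toy))
print(hazard_increments(0.0, toy))
```

In the toy data the third subject enters at time 2.5, so it is absent from the risk set at the first failure time 2.0 even though its observed time is larger.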

3. Data from biased sampling

3.1. Embedding the data into the left-truncation model

First we assume that W(·, x) is a distribution function for every fixed x; this assumption will be relaxed later. Let V and T⁰ be nonnegative random variables with conditional distribution functions pr (V < t ∣ x) = W(t, x) and pr { T⁰ < t ∣ z(·)} = 1 − S(t ∣ z), respectively. We assume that, conditional on x and z, V and T⁰ are independent. We observe (V, x, T⁰, z) only if T⁰ ⩾ V. Therefore, the conditional density of observing (V, T⁰) given (x, z) is proportional to I(t ⩾ v)w(v, x)f(t ∣ z), where w(t, x) = ∂W(t, x)/∂t. Hence, the marginal density of observed T, given z and x, is proportional to $\int_0^t w(v, x)\, dv\, f(t \mid z) = W(t, x) f(t \mid z)$, which is proportional to the conditional probability density function given in (1). Consequently, we may treat (V, x, T⁰, z) as the complete data vector and (x, T⁰, z) as the incomplete data with the truncation time V completely missing.
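The embedding can be checked by simulation. The sketch below assumes, purely for illustration, a Uniform(0, 1) population for T⁰ and a Uniform(0, 1) truncation time V, so that W(t) = t on [0, 1]; discarding V after truncation should then leave values of T⁰ with density proportional to t, that is 2t on [0, 1].

```python
import random

def truncated_sample(n, rng):
    """Keep T0 only when T0 >= V, with V ~ Uniform(0, 1), so W(t) = t on [0, 1]."""
    kept = []
    while len(kept) < n:
        t0 = rng.random()   # hypothetical population: Uniform(0, 1), density f(t) = 1
        v = rng.random()    # latent truncation time with distribution function W(t) = t
        if t0 >= v:
            kept.append(t0)  # V is discarded: only T0 is treated as observed
    return kept

rng = random.Random(1)
sample = truncated_sample(20000, rng)
sample_mean = sum(sample) / len(sample)
print(sample_mean)  # should be close to 2/3, the mean of the density 2t on [0, 1]
```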

3.2. Pseudo-partial likelihoods

According to the argument in § 2.1 and from the Cox log partial likelihood l_c2 of equation (5) in § 2.2, the following function, which is the conditional expectation of l_c2 given the observed data, can be considered as a log pseudo-partial likelihood for the observed data (x_i, T⁰_i, z_i) (i = 1, …, n):

$$E\{l_{c2}(\beta) \mid \text{data}\} = \sum_{i=1}^{n} \int_0^\infty \beta^{T} z_i(t)\, dN_i(t) - \sum_{i=1}^{n} \int_0^\infty E\left[\log\{n S^{(0)}(\beta, t)\} \mid \text{data}\right] dN_i(t). \qquad (6)$$

If we condition on T⁰ = t, X = x, then the random variable V has conditional distribution W{min (v, t), x}/ W(t, x). Therefore, the second term of (6) is a function of (T⁰₁, x₁, z₁, …, T⁰_n, x_n, z_n) and β, which does not involve the nuisance parameter λ₀(t). We may, therefore, use the Monte Carlo method to compute it. For example, let V_jk (j = 1, …, n, k = 1, …, m) be random samples from the distribution W{min (v, T⁰_j), x_j}/ W(T⁰_j, x_j). Then the expectation in the second term of (6) can be approximated by

$$\frac{1}{m} \sum_{k=1}^{m} \sum_{i=1}^{n} \int_0^\infty \log\left[\sum_{j=1}^{n} I(V_{jk} \le t)\, Y_j(t) \exp\{\beta^{T} z_j(t)\}\right] dN_i(t). \qquad (7)$$

We substitute (7) into the second term of (6) and obtain an approximate loglikelihood

$$\sum_{i=1}^{n} \int_0^\infty \beta^{T} z_i(t)\, dN_i(t) - \frac{1}{m} \sum_{k=1}^{m} \sum_{i=1}^{n} \int_0^\infty \log\left[\sum_{j=1}^{n} I(V_{jk} \le t)\, Y_j(t) \exp\{\beta^{T} z_j(t)\}\right] dN_i(t). \qquad (8)$$

In particular, for length-biased data, i.e. W(t, x) = t, the maximum approximate loglikelihood estimator based on (8) is asymptotically equivalent to the improved estimator proposed by Wang (1996). In addition, for m = 1, the approximate loglikelihood (8) is identical to the log-pseudolikelihood described by Wang (1996).
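The Monte Carlo step can be sketched for length-biased data without censoring. With W(t) = t, the conditional distribution W{min(v, T⁰_j)}/W(T⁰_j) of V_j is uniform on (0, T⁰_j). The function names, the single time-fixed covariate and the toy data are assumptions for illustration; a general weight function would need its own sampler for V_jk.

```python
import math
import random

def draw_truncation_times(times, m, rng):
    """m draws of V_j per subject; for W(t) = t, V_j given T_j is Uniform(0, T_j)."""
    return [[rng.uniform(0.0, t) for t in times] for _ in range(m)]

def approx_loglik(beta, times, z, draws):
    """Monte Carlo approximation to the conditional expectation of l_c2."""
    n = len(times)
    first = sum(beta * z[i] for i in range(n))
    second = 0.0
    for v in draws:                      # v[j] plays the role of V_jk
        for i in range(n):
            # Risk set at T_i: subjects that have entered (V_j <= T_i) and not yet failed.
            risk = sum(math.exp(beta * z[j]) for j in range(n)
                       if v[j] <= times[i] <= times[j])
            second += math.log(risk)
    return first - second / len(draws)

rng = random.Random(3)
times = [1.0, 2.0, 3.0]
z = [0, 1, 0]
draws = draw_truncation_times(times, 50, rng)
print(approx_loglik(0.5, times, z, draws))
```

Subject i always belongs to its own risk set because V_i ⩽ T_i, so the logarithm is well defined. Setting every draw to zero removes the truncation and recovers the ordinary Cox log partial likelihood.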

The disadvantages of using l_c2 as the working log partial likelihood for complete data are as follows: we must assume that W(t, x) is a nondecreasing function in t for every fixed x; the underlying cumulative hazard function must be estimated by other methods; intensive computation is required to obtain the log pseudo-partial likelihood (6); and the loglikelihood l_c2 contains less information about the parameters than the loglikelihood l_c1. Therefore, we may take l_c1 as our working log partial likelihood and apply the same procedure to (3). The resulting log pseudo-partial likelihood can be written as

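$$E\{l_{c1}(\beta, \Lambda_0) \mid \text{data}\} = \sum_{i=1}^{n} \int_0^\infty \left[\beta^{T} z_i(t) + \log\{\Delta\Lambda_0(t)\}\right] dN_i(t) - n \int_0^\infty S_W^{(0)}(\beta, t)\, d\Lambda_0(t), \qquad (9)$$

where

$$S_W^{(0)}(\beta, t) = E\{S^{(0)}(\beta, t) \mid \text{data}\} = n^{-1} \sum_{i=1}^{n} Y_i(t)\, \frac{W(t, x_i)}{W(T^0_i, x_i)} \exp\{\beta^{T} z_i(t)\}.$$

For a fixed value of β, maximization of (9) with respect to ΔΛ₀(t) leads to ΔΛ₀(t) = ΔN(t)/{nS_W⁽⁰⁾(β, t)}. Therefore, for a fixed value of β, we would estimate Λ₀(t) by the Nelson–Aalen estimator

$$\hat\Lambda_0(t, \beta) = \int_0^t \frac{dN(s)}{n S_W^{(0)}(\beta, s)}, \qquad (10)$$

where […]. Inserting (10) into (9), we obtain the following log profile pseudo-partial likelihood depending only on β:

$$l(\beta) = \sum_{i=1}^{n} \int_0^\infty \left[\beta^{T} z_i(t) - \log\{n S_W^{(0)}(\beta, t)\}\right] dN_i(t), \qquad (11)$$

which is also a generalized Cox log partial likelihood. The vector of score statistics is given as

$$U(\beta) = \sum_{i=1}^{n} \int_0^\infty \left\{z_i(t) - \frac{S_W^{(1)}(\beta, t)}{S_W^{(0)}(\beta, t)}\right\} dN_i(t), \qquad (12)$$

where S_W⁽¹⁾(β, t) = ∂S_W⁽⁰⁾(β, t)/∂β. We shall base our estimator of β on (12), and the value of β which maximizes (11) will be denoted by β̂. As pr { W(T⁰_i, x_i) = 0} = 0, W(t, x_i)/ W(T⁰_i, x_i) is well defined.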
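The ratio W(t, x_i)/W(T⁰_i, x_i) can be traced back to the conditional distribution of the missing truncation time stated in § 3.2. The short derivation below is a sketch of that step, using only the fact that V_i given T⁰_i and x_i has distribution function W{min(v, T⁰_i), x_i}/W(T⁰_i, x_i):

```latex
\begin{aligned}
E\{I(V_i \le t)\, Y_i(t) \mid T^0_i, x_i\}
  &= Y_i(t)\, \mathrm{pr}(V_i \le t \mid T^0_i, x_i) \\
  &= Y_i(t)\, \frac{W\{\min(t, T^0_i), x_i\}}{W(T^0_i, x_i)} \\
  &= Y_i(t)\, \frac{W(t, x_i)}{W(T^0_i, x_i)} ,
\end{aligned}
```

where the last equality holds because Y_i(t) = 1 requires t ⩽ T⁰_i, so that min(t, T⁰_i) = t whenever the term is nonzero.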

The log pseudo-partial likelihood (11) is identical to the usual log partial likelihood of the models λ(t ∣ z_i, z^*_i) = λ₀(t) exp {β^T z_i + z^*_i(t)}, where z^*_i(t) = log { W(t, x_i)/ W(T⁰_i, x_i)}. In particular, when W(t, x) = W(t), z^*_i(t) can be simplified to − log { W(T⁰_i)}, because the common factor W(t) cancels within each risk set. Standard statistical software, such as SAS, can be used to obtain the estimate β̂ of β by setting the regression coefficient of z^*_i to be 1, using the option offset = z^* in procedure PHREG. The robust sandwich covariance matrix estimator from SAS is consistent whereas the model-based covariance matrix is inconsistent because the score (12) is not a martingale. For computing estimates of parameters and variances for general weight functions, readers can use the author's R subroutine downloadable from www.columbia.edu/∼wt5/.
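The offset formulation amounts to an ordinary Cox score in which subject j enters the risk set at time t with multiplier W(t, x_j)/W(T_j, x_j). The following is a minimal sketch for one time-fixed covariate, not the author's R routine or the SAS procedure; the callback `weight(t, j)`, standing for W(t, x_j), the bisection bounds and the toy data are assumptions for illustration.

```python
import math

def pseudo_score(beta, times, events, z, weight):
    """Score of the log pseudo-partial likelihood for one time-fixed covariate.

    weight(t, j) should return W(t, x_j); subject j enters the risk set at a
    failure time t with multiplier W(t, x_j) / W(T_j, x_j).
    """
    n = len(times)
    w_own = [weight(times[j], j) for j in range(n)]
    score = 0.0
    for i in range(n):
        if events[i] != 1:
            continue
        s0 = s1 = 0.0
        for j in range(n):
            if times[j] >= times[i]:
                r = weight(times[i], j) / w_own[j] * math.exp(beta * z[j])
                s0 += r
                s1 += r * z[j]
        score += z[i] - s1 / s0
    return score

def solve_beta(times, events, z, weight, lo=-10.0, hi=10.0, tol=1e-8):
    """Bisection on the score, which is decreasing in beta."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pseudo_score(mid, times, events, z, weight) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
z = [1, 0, 1, 0]
length_biased = lambda t, j: t          # W(t, x) = t
beta_hat = solve_beta(times, events, z, length_biased)
print(beta_hat)
```

With a constant weight function the multipliers equal one and the score reduces to the ordinary Cox partial likelihood score, which gives a simple check of the implementation. Variance estimation is not covered by this sketch.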

3.3. Censoring

We may assume that the data are subject not only to biased sampling, but also to right censoring. Let V_i, T⁰_i and C_i be, respectively, truncation, survival time and censoring time. Recall that when we define the left-truncated and right-censored data, we assume that (V_i, C_i) and T⁰_i are mutually independent. Hence, the joint probability density function of (T⁰_i, V_i, C_i) can be expressed as f_i(t)g_i(v, c), where f_i is the probability density function of T⁰_i and g_i is the joint probability density function of (V_i, C_i). We only identify two types of censoring mechanism based on different censoring and truncation mechanisms and assumptions about g_i. However, there are possible applications to other types of censoring.

The first type of censoring assumes that the given covariates (x_i, z_i), original truncated time, survival time and censoring time are mutually independent. However, we observe the data (V_i, T_i, D_i) only if V_i ⩽ T_i, where T_i = min (T⁰_i, C_i) and D_i = I(T⁰_i ⩽ C_i). The biased censored data (T_i, D_i) are obtained by applying the censoring mechanism to the data before the data are sampled with bias. In the embedding, we first apply the censoring mechanism to the survival time and then apply the truncation mechanism to the observed censored data. This type of censoring is equivalent to assuming that

where g_2i(t) is the probability density function of the censoring time C_i. We may also say that V_i and C_i are quasi-independent in the region {(v, c) ∣ v ⩽ c}. The observed data (V_i, T_i, D_i) comprise a special case of the standard left-truncated and right-censored data as defined in § 2.2. Hence, in the first type of censoring, the conditional probability density function of the observed data (V_i, T_i, D_i), given (x_i, z_i) = (x, z), is

where Inline graphic is the survival function of the censoring time. Therefore, . The procedures proposed in §§ 3.2 and 3.3 are still valid because E{ I(V_i ⩽ t)Y_i(t) ∣ T_i, D_i, x_i, z_i} still equals Y_i(t)W(t, x_i)/ W(T_i, x_i). The formulae of the log pseudo-partial likelihood will be the same for this type of censoring with T⁰_i replaced by T_i and N_i(t) defined by N_i(t) = I(T_i ⩽ t, D_i = 1).

The second type of censoring comprises the censoring of residual lifetime after the data are sampled with bias. Let R⁰_i = T⁰_i − V_i and R_ci = C_i − V_i be, respectively, the residual lifetime and the residual censoring time of the ith individual. Given covariates (x_i, z_i) and C_i ⩾ V_i, we assume that R_ci and (R⁰_i, V_i) are independent. We observe (V_i, T_i, D_i) only if V_i ⩽ T_i, where T_i = V_i + R_i, R_i = min (R⁰_i, R_ci) and D_i = I(R⁰_i ⩽ R_ci). The censoring time C_i and truncation time V_i are not independent in this type of censoring. The second type of censoring is equivalent to assuming that

where g_3i(t) is the probability density function of the residual censoring time R_ci. The conditional density function of the observed data (V_i, T_i, D_i) given (x_i, z_i) = (x, z) is

where Inline graphic is the survival function of the residual censoring time. Hence,

Unfortunately, under the second type of censoring, the conditional expectation E{ I(V_i ⩽ t)Y_i(t) ∣ data} is a function of the censoring distribution. If the truncation time V is observable, then we may use a Kaplan–Meier-type estimator of Inline graphic . Consider the one-sample problem such that β = 0 and w(v, x) = w(v). The nonparametric maximum likelihood estimator of is the Kaplan–Meier estimator based on the censored residual lifetime (R_i, D_i); that is

Hence, the maximum pseudo-partial likelihood estimator of S is

where

The estimator Inline graphic is not the nonparametric maximum conditional likelihood estimator proposed and studied by Tsai et al. (1987), nor is it the nonparametric maximum likelihood estimator. However, if there is no censoring, is the nonparametric maximum likelihood estimator. It is straightforward to prove that Inline graphic will converge to and will converge to . As a result, under some regularity conditions, will converge to the survival function S(t) of the survival time.

If the censoring time depends on the covariates, we need to use a smooth type of Kaplan–Meier estimator of Inline graphic . If the truncation time V_i cannot be observed, we may use the nonparametric maximum likelihood estimator of . More research is needed in order to understand the properties of the proposed method.

The Stanford heart transplant dataset of Miller & Halpern (1982) is a prospective cohort study. The event time of interest is the survival time after entry. The censoring time is the duration between calendar entry date of the patients and February 1980. If we assume that there is no loss to follow-up, then C = February 1980 − E and V = transplant calendar date − E = transplant waiting time, where C, V and E, respectively, are the censoring time, truncation time and calendar entry date of the patient. If V, the transplant waiting time, is independent of E, the calendar entry date, then C and V are independent for all patients in the cohort. However, the transplant waiting times of the patients who died or were censored before transplantation cannot be observed. Therefore, the censoring time C is quasi-independent of the truncation time V for the transplant patients in the Stanford heart transplant data. Consequently, the censoring is of the first type. The Channing House dataset is a retrospective cohort study. The event time of interest is the death age. The censoring time C for patients who did not leave the centre before July 1975 is the time between the birth date B and July 1975. The truncation time, i.e. entry age, V is E − B, where E is the calendar entry date. The residual censoring time R_c is C − V = July 1975 − E. If entry age is independent of calendar entry date, i.e. V and E are independent, then the residual censoring time R_c is independent of the truncation time V. This censoring is of the second type.

Most applications can be classified as either the first or second type of censoring. For example, the censoring mechanism of the proportional hazards model with missing covariates, see § 4.3, and the Stanford heart transplant data are of the first type; the censoring mechanisms of the renewal process (Vardi 1989), cross-sectional survival data (Wang 1991) and the Channing House data are of the second type.

The asymptotic properties of the maximum pseudo-partial likelihood estimators for censored biased-sampling data with general nonnegative weight functions are established and discussed in the Appendix. We assume that the censoring is of the first type for the rest of the paper.

4. Application

4.1. Length-biased data

The techniques developed in the previous sections can be applied to length-biased data by using W(t) = t. We illustrate the method with a simulation study and an analysis of the shrub dataset. Consider a two-sample proportional hazards model with covariate z = 0 representing Group 0 and z = 1 representing Group 1. We generate 100 length-biased observations from Group 0 with population density f₀(t) = t exp (− t)I(t > 0) and 100 length-biased observations from Group 1 with population density f₁(t) = t exp (β − e^β t)I(t > 0) for β = 0, 1, 2 in three different scenarios. The relative hazard between Group 0 and Group 1 is equal to e^β and, thus, the log hazard ratio is β = 0, 1, 2. We also calculate the estimator β̃ obtained by maximizing the approximate loglikelihood (8), which was also proposed by Wang (1996). The variance estimator proposed by Wang (1996) is identical to the estimator from SAS. We use equation (A1) in Theorem A2 provided in the Appendix to obtain the variance estimator for the estimator β̂ that maximizes (11). For comparison we also include the inverse probability weighted estimator β̂_IPW of Binder (1992) and Lin (2000); see also Horvitz & Thompson (1952) and Qi et al. (2005). For the definition of the inverse probability weighted estimator, see equation (13) in § 4.3. Table 1, based on 1000 replicates, shows that β̂ is more efficient than β̃ and β̂_IPW and that the variance estimator of β̂_IPW underestimates the true sample variance of β̂_IPW.

Table 1.

Monte Carlo simulation for length-biased data. One hundred length-biased observations were generated from Group 0 with population density f₀(t) = t exp (− t)I(t > 0) and 100 length-biased observations were generated from Group 1 with population density f₁(t) = t exp (β − e^β t)I(t > 0) for β = 0, 1, 2. Here β̂ maximizes the loglikelihood l(β) of (11), β̃ maximizes the approximate loglikelihood (8) and β̂_IPW was proposed by Binder (1992) and Lin (2000). Estimates are based on 1000 replications

β	Bias			Sample variance			Mean of estimated variance
	β̂	β̃	β̂_IPW	β̂	β̃	β̂_IPW	β̂	β̃	β̂_IPW
0	0.002	0.003	−0.001	0.010	0.020	0.049	0.010	0.020	0.036
1	0.009	0.012	0.014	0.018	0.030	0.070	0.017	0.030	0.050
2	0.032	0.033	0.060	0.058	0.080	0.139	0.053	0.075	0.103
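One replicate of a two-sample simulation of this kind can be sketched as follows. The sketch assumes that the populations are exponential with hazards 1 and e^β, so that the length-biased observations have densities proportional to t exp(−t) and t exp(−e^β t), i.e. gamma with shape 2; this reading of the design, the seed and the function names are assumptions, and only the estimator maximizing the pseudo-partial likelihood with W(t) = t is computed.

```python
import math
import random

def length_biased_exponential(rate, rng):
    """Length-biased draw from an exponential population: Gamma(shape 2, rate)."""
    return rng.expovariate(rate) + rng.expovariate(rate)

def score(beta, times, z):
    """Pseudo-partial score for W(t) = t without censoring (offset -log T_j)."""
    order = sorted(range(len(times)), key=lambda i: times[i], reverse=True)
    s0 = s1 = total = 0.0
    for i in order:  # risk-set sums accumulate from the largest time downwards
        r = math.exp(beta * z[i]) / times[i]
        s0 += r
        s1 += r * z[i]
        total += z[i] - s1 / s0
    return total

def estimate(times, z, lo=-5.0, hi=5.0):
    """Bisection on the decreasing score function."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if score(mid, times, z) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = random.Random(2009)
true_beta = 1.0
z = [0] * 100 + [1] * 100
times = ([length_biased_exponential(1.0, rng) for _ in range(100)]
         + [length_biased_exponential(math.exp(true_beta), rng) for _ in range(100)])
beta_hat = estimate(times, z)
print(beta_hat)
```

Repeating the replicate with different seeds and collecting the estimates would give Monte Carlo summaries of bias and sample variance comparable in kind to those in Table 1, although this sketch makes no attempt to reproduce the reported figures.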


We denote the width of an observed shrub from the shrub dataset by T_i. Wang (1996) assumed that the probability of including T_i in the dataset is proportional to T_i itself. We use the proportional hazards model,

$$\lambda(t \mid z_1, z_2) = \lambda_0(t) \exp(\beta_1 z_1 + \beta_2 z_2),$$

where z₁ = I(T belongs to transect I) and z₂ = I(T belongs to transect II) are two indicator covariates. Use of SAS with offset − log (T_i) provides β̂ = […] with model-based standard error estimates {se(β̂₁), se(β̂₂)} = […] and corr(β̂₁, β̂₂) = […]. The estimates are very similar to the results of Wang (1996), but the model-based standard errors overestimate the true standard errors of β̂. Use of equation (A1) in the Appendix, which is identical to the robust sandwich covariance estimate from SAS, gives {se(β̂₁), se(β̂₂)} = […] and corr(β̂₁, β̂₂) = […]. For comparison, […] with estimated standard errors (0.33, 0.31).

4.2. Biased samples with right censoring

Miller & Halpern (1982) compared four regression techniques on the updated Stanford heart transplant data without acknowledging that the transplant patient's survival time was sampled with bias. As mentioned in § 1, the survival times can be treated as a biased sample with a weight function equal to the distribution of the transplant waiting time. Miller & Halpern (1982) did not provide the transplant waiting times but Crowley & Hu (1977) did. The Weibull distribution fits the transplant waiting times very well. The R² of the fit of log[−log{Ŝ_V(t)}] to log(t) is 0.97, where Ŝ_V(t) is the product-limit estimate of the transplant waiting time survival function based on Crowley & Hu's (1977) 103 patients. The conditional maximum likelihood estimate for the Weibull survival function of transplant waiting time is exp(−0.027t^0.925). Hence, the weight function, W(t) = 1 − exp (−0.027t^0.925), will be used to estimate the parameters of the proportional hazards model. We assume that the hazard rate of the transplant patient's survival is proportional to exp […]. Miller & Halpern (1982) deleted 27 patients lacking the T5 mismatch score and 5 patients with survival times less than 10 days from the total of 184 patients in one of their data analyses. Based on the remaining 152 Stanford heart transplant patients, the pseudo-partial likelihood estimate is β̂ = (−0.13, 0.0021) with {se(β̂₁), se(β̂₂)} = […], and […] = (−0.17, 0.0026) with se […]. In calculating the variances of the estimates, we assume that the weight function is known without error. Both methods show a strong relationship between survival time and age.
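Because this weight function does not depend on x, each patient's contribution reduces to the offset −log W(T_i) described in § 3.2. The sketch below evaluates the fitted Weibull weight and the corresponding offset; the function names are hypothetical, and measuring time in days is an assumption suggested by the 10-day exclusion rule rather than something stated for the fitted curve.

```python
import math

def weibull_weight(t, scale=0.027, shape=0.925):
    """W(t) = 1 - exp(-scale * t**shape), the fitted waiting-time distribution function."""
    return 1.0 - math.exp(-scale * t ** shape)

def offset(t_obs):
    """z*_i = -log W(T_i), the offset used when W does not depend on x."""
    return -math.log(weibull_weight(t_obs))

# Time unit assumed to be days.
for days in (10, 100, 1000):
    print(days, round(weibull_weight(days), 3), round(offset(days), 3))
```

Short survival times receive large offsets, reflecting that patients who die early are less likely to have received a transplant and therefore less likely to appear in the sample.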

4.3. Missing covariates

It is assumed that, for each i, (T_i, D_i, z_i, R_i) are independent and identically distributed random vectors, where R_i = 1 if z_i is fully observed and is zero otherwise. Let W(t, z) = pr (R = 1 ∣ T = t, Z = z) be the conditional probability of observing the full covariates data given the covariates z and follow-up time T = t. We assume that W(t, z) is either completely known or can be estimated from other methods; see Qi et al. (2005) and an unpublished Harvard School of Public Health technical report by M. Pugh, J. Robins, S. Lipsitz and D. Harrington. Copas & Farewell (2001), Qi et al. (2005) and Pugh et al.'s report proposed the following weighted complete-case pseudolikelihood score function for inference of the proportional hazards models with missing covariates:

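$$U_{IPW}(\beta) = \sum_{i=1}^{n} \frac{R_i}{W_i} \int_0^\infty \left\{z_i(t) - \frac{S^{(1)}_{IPW}(\beta, t)}{S^{(0)}_{IPW}(\beta, t)}\right\} dN_i(t), \qquad (13)$$

where $S^{(0)}_{IPW}(\beta, t) = n^{-1} \sum_{j=1}^{n} (R_j/W_j)\, Y_j(t) \exp\{\beta^{T} z_j(t)\}$, $S^{(1)}_{IPW}(\beta, t) = \partial S^{(0)}_{IPW}(\beta, t)/\partial\beta$ and W_i = W{ T_i, z_i(T_i)}. The inverse probability weighted estimator is the solution of U_IPW(β) = 0. This pseudo-score U_IPW was also proposed and studied by Binder (1992) and Lin (2000) in the survey-sampling literature. We may treat the complete case, with R_i = 1, as a biased sample from the population with selection probability proportional to W(t, z). The conditional density of (T, D) given the covariates z and R = 1 is proportional to W(t, z)f^D (t ∣ z)S^{(1− D)} (t ∣ z). Since the censoring was applied to the data before the cases with missing covariates were dropped, the censoring is of the first type considered in § 3.3. The pseudo-partial score function becomes $U_{mc}(\beta) = \sum_{i=1}^{n} U_{mci}(\beta)$, where $U_{mci}(\beta) = R_i \int_0^\infty \{z_i(t) - S^{(1)}_{mc}(\beta, t)/S^{(0)}_{mc}(\beta, t)\}\, dN_i(t)$, $S^{(1)}_{mc}(\beta, t) = \partial S^{(0)}_{mc}(\beta, t)/\partial\beta$ and

$$S^{(0)}_{mc}(\beta, t) = n^{-1} \sum_{j=1}^{n} R_j\, Y_j(t)\, \frac{W\{t, z_j(t)\}}{W_j} \exp\{\beta^{T} z_j(t)\}.$$

Let β̂_mc denote the solution of U_mc(β) = 0, and let β̂_IPW denote the inverse probability weighted estimator. If W(t, z) = W(t), then $U_{IPW}(\beta) = \sum_{i=1}^{n} U_{mci}(\beta)/W_i$. Hence, U_IPW is a weighted mean of U_mc1, …, U_mcn, with weight proportional to 1/ W_i, while U_mc is the unweighted mean. When some of the W_i are close to zero, the equation U_IPW = 0 becomes unstable. Therefore, in order to prove the asymptotic normality of the estimator β̂_IPW, Pugh et al.'s report and Qi et al. (2005) had to assume that W_i > ε > 0 for some positive ε. We need a weaker assumption to prove the asymptotic properties of the estimator β̂_mc; see the Appendix.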
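The relation between the two scores when W depends on t alone can be checked numerically. The sketch below assumes the usual forms of the two scores for a single time-fixed covariate: the inverse probability weighted score gives complete case j the risk-set weight 1/W_j, and the pseudo-partial score gives it W(T_i)/W(T_j) at the failure time T_i. These forms, the function names and the toy data are assumptions for illustration, with `w[i]` standing for W(T_i).

```python
import math

def risk_ratio(i, beta, times, z, observed, risk_weight):
    """S1/S0 at time T_i over complete cases, with per-subject risk weights."""
    s0 = s1 = 0.0
    for j in range(len(times)):
        if observed[j] and times[j] >= times[i]:
            r = risk_weight(i, j) * math.exp(beta * z[j])
            s0 += r
            s1 += r * z[j]
    return s1 / s0

def u_ipw(beta, times, events, z, observed, w):
    """Inverse probability weighted score: complete case j carries weight 1/W_j."""
    total = 0.0
    for i in range(len(times)):
        if observed[i] and events[i]:
            ratio = risk_ratio(i, beta, times, z, observed, lambda i_, j: 1.0 / w[j])
            total += (z[i] - ratio) / w[i]
    return total

def u_mc_terms(beta, times, events, z, observed, w):
    """Per-subject pseudo-partial score terms with risk weights W(T_i)/W(T_j)."""
    terms = []
    for i in range(len(times)):
        if observed[i] and events[i]:
            ratio = risk_ratio(i, beta, times, z, observed, lambda i_, j: w[i_] / w[j])
            terms.append(z[i] - ratio)
        else:
            terms.append(0.0)
    return terms

times = [1.0, 2.0, 3.0, 4.0, 5.0]
events = [1, 1, 0, 1, 1]
observed = [1, 0, 1, 1, 1]          # R_i: the second subject has a missing covariate
z = [1.0, None, 0.0, 1.0, 0.0]
w = [0.2, 0.4, 0.5, 0.8, 0.9]       # hypothetical W(T_i), depending on time only
beta = 0.3
lhs = u_ipw(beta, times, events, z, observed, w)
rhs = sum(term / w_i for term, w_i in
          zip(u_mc_terms(beta, times, events, z, observed, w), w))
print(lhs, rhs)
```

The two printed values should agree up to rounding error, since the factor W(T_i) cancels between the numerator and denominator of the risk-set ratio. The division by w[i] in the weighted score also shows why values of W_i near zero destabilize U_IPW.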

We now analyze the mouse leukaemia data of Kalbfleisch & Prentice (2002) by the pseudo-partial likelihood method and the inverse probability weighted method. As in Kalbfleisch & Prentice (2002), we dichotomized virus level into a binary variable with zero representing values below 10⁴ and one otherwise. There are two analyses, corresponding to the endpoints death by thymic leukaemia and death by thymic or nonthymic leukaemia. Logistic regression was used to estimate the observation probabilities, i.e. weight function, based on 156 mice that survived for at least 400 days; the observation probabilities were set to zero for the remaining 48 mice that died or were censored before 400 days. In order to make the analysis comparable with that of Wang & Chen (2001), we included only the survival time and its quadratic term as two predictors in our logistic regression model. Both analyses used 204 mice and treated both virus level and GPD1 phenotype as the missing covariates. Table 2 shows the results from applying the pseudo-partial likelihood method and the inverse probability weighted method. The pseudo-partial likelihood method shows that the virus level has a significant relationship with both endpoints, while GPD1 has a significant relationship with death from thymic or nonthymic leukaemia and a moderately significant relationship with death from thymic leukaemia. The inverse probability weighted method shows that GPD1 has a significant relationship with both endpoints and virus level has a moderately significant relationship with both endpoints. The inclusion of death by nonthymic leukaemia changes the estimates for GPD1 phenotype slightly, but moderately reduces the estimates for virus level. A similar phenomenon was also found by Qi et al. (2005). As in the Stanford heart transplant data analysis, the weight function is treated as known. If the weight function were treated as unknown, then the variance estimator in Theorem A2 would overestimate the true sample variance of the pseudo-partial likelihood estimator.

Table 2.

Analysis of mouse leukaemia data using the Cox regression model with various methods. Entries are coefficient estimates with estimated standard errors (se) in parentheses. The second and third rows give the pseudo-partial likelihood estimator and the inverse probability weighted estimator, respectively.

Approach	Thymic leukaemia: GPD1	Thymic leukaemia: Virus	Thymic and nonthymic leukaemia: GPD1	Thymic and nonthymic leukaemia: Virus
Complete-case	−1.44 (0.60)	1.44 (0.72)	−1.46 (0.57)	1.22 (0.65)
Pseudo-partial likelihood	−1.15 (0.64)	1.51 (0.71)	−1.19 (0.59)	1.28 (0.62)
Inverse probability weighted	−1.47 (0.65)	1.48 (0.75)	−1.41 (0.63)	1.27 (0.67)

se: estimated standard error.
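To relate the estimates reported in Table 2 to the significance statements in the text, one can form the ratio of each estimate to its standard error. The sketch below assumes a two-sided Wald test with a normal reference distribution, which the paper does not state explicitly; the row labels follow the order given in the table caption, and the helper name `wald` is hypothetical. Because the table entries are rounded to two decimals, the resulting p-values are only approximate.

```python
import math

# (estimate, se) pairs copied from Table 2; keys are (approach, endpoint, covariate).
table2 = {
    ("complete-case", "thymic", "GPD1"): (-1.44, 0.60),
    ("complete-case", "thymic", "virus"): (1.44, 0.72),
    ("complete-case", "both", "GPD1"): (-1.46, 0.57),
    ("complete-case", "both", "virus"): (1.22, 0.65),
    ("pseudo-partial", "thymic", "GPD1"): (-1.15, 0.64),
    ("pseudo-partial", "thymic", "virus"): (1.51, 0.71),
    ("pseudo-partial", "both", "GPD1"): (-1.19, 0.59),
    ("pseudo-partial", "both", "virus"): (1.28, 0.62),
    ("ipw", "thymic", "GPD1"): (-1.47, 0.65),
    ("ipw", "thymic", "virus"): (1.48, 0.75),
    ("ipw", "both", "GPD1"): (-1.41, 0.63),
    ("ipw", "both", "virus"): (1.27, 0.67),
}

def wald(estimate, se):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

results = {key: wald(est, se) for key, (est, se) in table2.items()}

for key, (z, p) in results.items():
    print(key, round(z, 2), round(p, 3))
```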

If W(.) is a function of D, then Inline graphic is still a consistent estimator. Generally, however, under the same conditions, is not consistent, since E{ U_mc(β₀)} ≠ 0. For generalization to the Cox model with missing covariates and a detailed simulation comparison of and , see Luo et al. (2009).

5. Discussion

The procedures developed in this paper can be easily extended to other types of incomplete data. The basis of our method is the conditional expectation of the partial score function given the observed data. For the proportional hazards model, if N(t) is completely known, the conditional expectation involves two terms; see equation (3). One is the conditional expectation of z, while the other is the conditional expectation of S⁽⁰⁾(β, t). If z is completely known, we only have to consider the conditional expectation of S⁽⁰⁾. For other types of incomplete data, we may also have to compute the conditional expectation of z or ΔN(t) or both. For example, in proportional hazards models with missing covariates or with covariates subject to measurement error, the covariates z are partially missing. Paik & Tsai (1997) applied a similar idea and proposed two estimators of β for proportional hazards models with missing covariates.
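For orientation, the complete-data quantity whose conditional expectation is taken can be computed directly. The sketch below evaluates the ordinary Cox partial score for a scalar covariate with right-censored data and no tied event times; it is not the pseudo-partial score of this paper, and the function name `partial_score` and the toy data are illustrative assumptions. The sums `s0` and `s1` are left unnormalized, which does not affect their ratio.

```python
import math

def partial_score(beta, times, events, z):
    """Ordinary Cox partial score for one covariate (right censoring, no ties)."""
    score = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        s0 = sum(math.exp(beta * z[j]) for j in risk)          # sum of exp(beta z)
        s1 = sum(z[j] * math.exp(beta * z[j]) for j in risk)   # sum of z exp(beta z)
        score += z[i] - s1 / s0
    return score

# Toy data: three subjects, the last one censored.
times = [1.0, 2.0, 3.0]
events = [1, 1, 0]
z = [1.0, 0.0, 1.0]
score_at_zero = partial_score(0.0, times, events, z)  # (1 - 2/3) + (0 - 1/2)
```

With incomplete data, the terms z[i] and s0 above are the pieces that would be replaced by their conditional expectations given the observed data.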

The estimators Inline graphic and proposed in § 3, are not nonparametric maximum likelihood estimators and, therefore, we generally do not expect these estimators to be optimal. However, as a special case, when W(t, x) = W(t) and β₀ = 0, the estimator is the nonparametric maximum likelihood estimator and is the most efficient estimator. Since Inline graphic and maximize the pseudo-partial likelihood, we expect the efficiency of these two estimators to be quite high. Empirical evidence from our limited simulation and real-data experiments suggests that is more efficient than .

Another special case is given by W(t, x) = t^x,f(t ∣ z) = f₀(t) and p = pr (x = 0) = 1 − pr (x = 1). The nonparametric maximum likelihood estimator Inline graphic of was studied by Vardi (1982). We computed the asymptotic relative efficiency of with respect to when F₀ is a uniform distribution on [0,1], for t and p ∈ {0.2, 0.4, 0.6, 0.8}. When p = 0 or 1, the product limit estimator based on is identical to the nonparametric maximum likelihood estimator, so that the asymptotic relative efficiency equals 1 for p = 0 and p = 1. The lowest asymptotic relative efficiency we obtained was 0.985. Since the estimator Inline graphic is much easier to calculate than the nonparametric maximum likelihood estimator, is a preferred estimator.
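When every subject has x = 1, so that W(t, x) = t and the sample is purely length-biased without censoring, the nonparametric maximum likelihood estimator of the underlying distribution function has a well-known closed form that weights each observation by the reciprocal of its length. The sketch below implements only that standard special case, not the general estimator of Vardi (1982) for mixed samples; the function name `length_biased_cdf` and the toy sample are illustrative assumptions.

```python
def length_biased_cdf(sample, t):
    """Inverse-length weighted estimate of F0(t) from a purely length-biased sample."""
    if any(v <= 0 for v in sample):
        raise ValueError("length-biased observations must be positive")
    total = sum(1.0 / v for v in sample)             # normalizing constant
    below = sum(1.0 / v for v in sample if v <= t)   # weighted mass at or below t
    return below / total

# Toy sample: weights are 1, 1/2 and 1/4, so F0(2) is estimated by 1.5 / 1.75.
toy = [1.0, 2.0, 4.0]
estimate_at_2 = length_biased_cdf(toy, 2.0)
```

Long observations are over-represented under length-biased sampling, so the 1/t weights shift mass back toward short durations.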

Acknowledgments

The author wishes to thank an associate editor and reviewers for suggestions leading to an overall improvement. He also appreciates Dr. Bruce Levin's comments and inputs. The paper was partially completed during the author's visit to the Department of Statistics, National Cheng Kung University, Taiwan.

Appendix

Large sample properties

In deriving equation (12), we explicitly assume that the weight function W(t, x) is a distribution function for any given x. However, after the likelihood (12) is obtained, we only need a weaker assumption, that the weight be nonnegative, to prove the asymptotic properties. Here, we assume that the weight function W(t,x) satisfies Assumption 1.

Assumption 1

For every fixed x, there exists a constant a(x) such that { t ∣ W(t, x) > 0} = (a(x), ∞) or [a(x), ∞).

Assumption 1 does not require that W(·, x) be a nondecreasing function for any fixed x. The following four theorems describe the asymptotic properties of the estimators and .

Theorem A1

If Assumption 1 holds and the matrix I(β) is positive definite, converges to β₀ in probability as n → ∞, where and .

Proof

If Assumption 1 holds, we have

where . Then converges in probability to

Hence, U(β₀) → 0 in probability as n → ∞ and, therefore, Theorem A1 holds by a standard argument.

For asymptotic normality, we need to introduce more notation. Define

where , and F₁(t) = pr { T_i < t, I(D_i = 1)}. Furthermore, we define

Note that is obtained by substituting s⁽⁰⁾, s⁽¹⁾, F₁(t) and β₀ by and , respectively, in ξ_i.

Theorem A2

Under the same conditions as in Theorem A1, converges weakly to a normal distribution with zero-mean and covariance matrix , where Ξ = E(ξ^⊗2).

Proof

The score function n^−1/2 U(β₀) can be expressed as

By Taylor series expansion, the second term in the above equation can be written as

Hence, . By the multivariate central limit theorem and its corollary, n^−1/2 U(β₀) converges to a multivariate normal distribution, yielding Theorem A2.

Note that I(β₀) can be consistently estimated by Inline graphic and Ξ can be consistently estimated by . Thus, the covariance matrix can be consistently estimated by

(A1)
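The notation ξ^⊗2 in Theorem A2 denotes the outer product ξξᵀ, so Ξ = E(ξ^⊗2) is a p × p matrix. A natural plug-in estimate is the sample average of the outer products of the estimated influence terms. The sketch below assumes that form and takes the estimated vectors as given; the function name `outer_product_average` and the input values are hypothetical, and the computation of the ξ̂_i themselves is not shown.

```python
def outer_product_average(xi):
    """Average of the outer products xi_i xi_i^T over i = 1, ..., n."""
    n = len(xi)
    p = len(xi[0])
    return [[sum(v[j] * v[k] for v in xi) / n for k in range(p)] for j in range(p)]

# Hypothetical estimated influence vectors for n = 2 subjects and p = 2 covariates.
xi_hat = [[1.0, 0.0], [0.0, 2.0]]
Xi_hat = outer_product_average(xi_hat)
```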

We now turn to a study of the large sample properties of the estimated baseline integrated hazard function Inline graphic .

Theorem A3

Let M be a large positive number such that pr (T ⩾ M) is strictly positive. Under the same conditions as in Theorem A1, for t < M the process converges weakly to a Gaussian process with zero-mean and covariance function E{ξ_0i(s)ξ_0i(t)}, which can be consistently estimated by , where

Theorem A4

Under the conditions of Theorem A1, the asymptotic covariance of and can be consistently estimated by .

Proofs of Theorem A3 and Theorem A4. We may write Inline graphic as

Here, the last term can be shown to converge to zero in probability. By Theorem A2, Inline graphic , and by Taylor series expansion around β₀, the first term can be approximated by

The second term can be expressed as

Therefore, a simple application of the multivariate central limit theorem implies that the finite-dimensional distribution of { A_n(t), B_n(t), C_n(t)} is a multivariate normal. As in the proof in Tsiatis (1981), the sequence of distributions induced by A_n, B_n and C_n is tight.

References

Andersen P. K., Borgan O., Gill R. D., Keiding N. Statistical Models Based on Counting Processes. New York: Springer; 1993.
Betensky R. A., Lindsey J. C., Ryan L. M., Wand M. P. Local EM estimation of the hazard function for interval-censored data. Biometrics. 1999;55:238–45. doi: 10.1111/j.0006-341x.1999.00238.x.
Binder D. A. Fitting Cox's proportional hazards models from survey data. Biometrika. 1992;79:139–47.
Copas A. J., Farewell V. T. Incorporating retrospective data into an analysis of time to illness. Biostatistics. 2001;2:1–12. doi: 10.1093/biostatistics/2.1.1.
Cox D. R. Regression models and life tables (with Discussion). J. R. Statist. Soc. B. 1972;34:187–220.
Cox D. R. Partial likelihood. Biometrika. 1975;62:269–76.
Crowley J., Hu M. Covariance analysis of heart transplant survival data. J. Am. Statist. Assoc. 1977;72:27–36.
Dempster A. P., Laird N. M., Rubin D. B. Maximum likelihood estimation from incomplete data via the EM algorithm (with Discussion). J. R. Statist. Soc. B. 1977;39:1–38.
Horvitz D. G., Thompson D. J. A generalization of sampling without replacement from a finite universe. J. Am. Statist. Assoc. 1952;47:663–85.
Hyde J. Survival analysis with incomplete observations. In: Miller R. G., Efron B., Brown B. W., Moses L. E., editors. Biostatistics Casebook. New York: John Wiley & Sons; 1980. pp. 31–46.
Kalbfleisch J. D., Prentice R. L. The Statistical Analysis of Failure Time Data. 2nd ed. New York: Wiley; 2002.
Keiding N., Gill R. D. Random truncation models and Markov processes. Ann. Statist. 1990;18:582–602.
Lin D. Y. On fitting Cox's proportional hazards models to survey data. Biometrika. 2000;87:37–47.
Luo X., Tsai W.-Y., Xu Q. Pseudo partial likelihood estimators for Cox regression with missing covariates. Biometrika. 2009;96. doi: 10.1093/biomet/asp027. (forthcoming)
Miller R., Halpern J. Regression with censored data. Biometrika. 1982;69:521–31.
Muttlak H. A., McDonald L. L. Ranked set sampling with size-biased probability of selection. Biometrics. 1990;46:435–46.
Paik M. C., Tsai W.-Y. On using Cox proportional hazard models with missing covariate. Biometrika. 1997;84:579–93.
Qi L., Wang C. Y., Prentice R. L. Weighted estimators for proportional hazards regression with missing covariates. J. Am. Statist. Assoc. 2005;100:1250–63.
Tsai W.-Y., Jewell N. P., Wang M.-C. A note on the product limit estimate of a survival curve under right-censoring and left-truncation. Biometrika. 1987;74:883–6.
Tsiatis A. A. A large sample study of Cox's regression model. Ann. Statist. 1981;9:91–108.
Vardi Y. Nonparametric estimation in the presence of length bias. Ann. Statist. 1982;10:616–20.
Vardi Y. Multiplicative censoring, renewal processes, deconvolution and decreasing density. Biometrika. 1989;76:751–61.
Wang C. Y., Chen H. Y. Augmented inverse probability weighted estimator for Cox missing covariate regression. Biometrics. 2001;57:414–9. doi: 10.1111/j.0006-341x.2001.00414.x.
Wang M.-C. Nonparametric estimation from cross-sectional survival data. J. Am. Statist. Assoc. 1991;86:130–43.
Wang M.-C. Hazards regression analysis for length-biased data. Biometrika. 1996;83:343–54.
Wang M.-C., Jewell N. P., Tsai W.-Y. Asymptotic properties of the product limit estimate under random truncation. Ann. Statist. 1986;14:1597–605.

