Author manuscript; available in PMC: 2021 Aug 22.
Published in final edited form as: Stat Methods Med Res. 2013 Jul 30;25(4):1718–1735. doi: 10.1177/0962280213498325

A flexible semiparametric modeling approach for doubly censored data with an application to prostate cancer

Seungbong Han 1, Adin-Cristian Andrei 2, Kam-Wah Tsui 3
PMCID: PMC8380435  NIHMSID: NIHMS1731412  PMID: 23907782

Abstract

Doubly censored data often arise in medical studies of disease progression involving two related events for which both an originating and a terminating event are interval-censored. Although regression modeling for such doubly censored data may be complicated, we propose a simple semiparametric regression modeling strategy based on jackknife pseudo-observations obtained using nonparametric estimators of the survival function. Inference is carried out via generalized estimating equations. Simulation studies show that the proposed method produces virtually unbiased covariate effect estimates, even for moderate sample sizes. A prostate cancer study example illustrates the practical advantages of the proposed approach.

Keywords: doubly censored data, pseudo-observations, regression, semiparametric, survival analysis

1. Introduction

Doubly censored data arise in numerous medical studies involving two related events, an originating one and a terminating one. For instance, in HIV/AIDS clinical trials, main interest is often in the latency time, the period elapsed between human immunodeficiency virus infection X and disease onset S.1,2 Due to limited resources, neither X nor S can be observed at the precise time of occurrence; both are assessed via periodic patient screening. Therefore, the infection and onset times are recorded in interval-censored format as (L, R] and (P, Q], respectively. Our motivating example is a randomized phase II trial3 designed to evaluate docetaxel with or without doxercalciferol in 69 patients with metastatic, androgen-independent prostate cancer. One endpoint, time-to-cancer progression X, was assessed by periodic patient screening. Thus, X occurs between the most recent progression-free visit L and the first visit R at which progression was established. A secondary endpoint, overall survival S, is subject to right-censoring, thus belonging to an interval (P, Q = ∞). However, our methodology does not require that Q be infinite. Although separate analyses of time-to-cancer progression X and overall survival S are important and informative, a remaining critical task is to identify factors impacting survivorship after cancer progression, T = S − X. Scientific questions of this nature are frequent in many other medical areas, such as osteosarcoma or breast cancer. For additional examples and details, see Bacci et al.4 and others.5-8 In the medical oncology literature, landmark analyses are sometimes employed to analyze T, despite their limitations.9,10 Since, in our example, both the originating event X and the terminating event S are known only up to an interval, analyses of the doubly censored T = S − X require specialized statistical methodology.
For estimating survivorship after disease progression, nonparametric estimators of univariate survival functions have been considered.11-13 Regression methods for doubly censored data have been developed under the proportional hazards model (PHM).14,15 Cox regression with the onset time interval-censored and the event time right-censored has been considered by Goggins et al.16 Martingale-based methods under the additive hazards model (AHM) have been developed as well.17,18 A Monte Carlo imputation approach has also been proposed,19 together with a goodness-of-fit test under the PHM, and Zhang et al.20 proposed another imputation method for doubly censored data under the PHM. Assuming that the terminating event S is right-censored, Sun et al.15 and Zhang et al.21 have developed new modeling strategies. In the Bayesian framework, Komarek et al.22,23 considered frailty versions of the PHM and of accelerated failure time (AFT) models, respectively. In the latter case, they considered a penalized Gaussian mixture with a number of mixture components. The R package bayesSurv is available for implementing this model.24 However, these methods are restrictive in that they focus on a conditional interpretation of the covariate effects via frailty models and rely on the AFT or PH assumptions. Jara et al.25 considered a Bayesian nonparametric approach that can be used for the analysis of doubly censored data; their method is included in the R package DPpackage.26 Recently, Yu27 proposed another Bayesian modeling approach that assumes the terminating endpoint is right-censored, which restricts its use. Furthermore, Bayesian regression methods require specifying hyper-parameter values for the prior distributions, and their results should be interpreted cautiously.
Many of these existing methods are not used routinely, perhaps because of limited software availability or, in the Bayesian models, ambiguity in the choice of hyper-parameters. For additional details on these and other methods, one may consult Sun.1 We propose a simple, computationally efficient approach based on jackknife pseudo-observations (POs). POs are rapidly becoming a convenient modeling tool in survival analysis and have been used successfully in right-censored data problems.28-33 A comprehensive review of existing methods is provided by Andersen and Perme.34 Recently, Han et al.35 extended the use of POs to interval-censored data.

We first present an introduction to POs, followed by the current methodological developments in Section 2. Simulation results are shown in Section 3 and the analysis of the motivating example constitutes Section 4. Further remarks and related discussions are part of Section 5.

2. Regression analysis of doubly censored failure times

We briefly describe the generic definition of the pseudo-observations and then extend their use to models for doubly censored survival outcomes.

2.1. Pseudo-observations: an introduction

In general, assume that the data consist of i.i.d. random vectors $\{(T_i, Z_i), i = 1, \ldots, n\}$ and that one would like to estimate $\theta_i(Z_i) = E\{f(T_i) \mid Z_i\}$ for a deterministic function $f$. Here, $T_i$ is an outcome of interest, such as $S_i - X_i$ in our example, while $Z_i$ represents a $p$-dimensional covariate vector for the $i$th individual. Unbiased (or approximately unbiased) nonparametric estimators of $\theta = E\{f(T_i)\}$ are, in general, mixtures of conditional quantities of interest. This is true because $E\{\theta_i(Z_i)\} = E[E\{f(T_i) \mid Z_i\}] = E\{f(T_i)\} = \theta$. The average over the empirical distribution for $Z$ yields

$$\tilde{\theta} = \frac{1}{n} \sum_{i=1}^{n} \theta_i(Z_i)$$

and

$$\tilde{\theta}^{-i} = \frac{1}{n-1} \sum_{j \neq i} \theta_j(Z_j),$$

both representing unbiased estimators of $\theta$. The definition of the $i$th pseudo-observation, $\eta_i = n\tilde{\theta} - (n-1)\tilde{\theta}^{-i} = \theta_i(Z_i)$, $i = 1, \ldots, n$, suggests a way of constructing regression models for $\theta_i(Z_i)$. One may start out with (approximately) unbiased estimators $\tilde{\theta}$ and $\tilde{\theta}^{-i}$, $i = 1, \ldots, n$, of $\theta$ and define the pseudo-observations $\eta_i$. Then, one regresses $\eta_i$, instead of $\theta_i(Z_i)$, on the covariate $Z_i$.
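The jackknife construction can be checked numerically in the simplest setting. The sketch below is only an illustration under the assumption of fully observed times, where the estimator of θ is the sample mean of f(T) = I(T > 2); the function name `pseudo_observations` and the data are hypothetical. In that case each pseudo-observation reduces to the individual's own indicator.

```python
def pseudo_observations(values):
    """Return eta_i = n*theta - (n-1)*theta_minus_i when theta is the sample mean."""
    n = len(values)
    total = sum(values)
    theta = total / n
    etas = []
    for v in values:
        theta_minus_i = (total - v) / (n - 1)  # leave-one-out mean
        etas.append(n * theta - (n - 1) * theta_minus_i)
    return etas

# Hypothetical complete (uncensored) data and f(T) = I(T > 2)
times = [0.8, 1.5, 2.3, 3.1, 4.7]
indicators = [1.0 if t > 2.0 else 0.0 for t in times]
etas = pseudo_observations(indicators)
```

With censored data the sample mean is replaced by a nonparametric survival estimator, and the pseudo-observations are no longer exactly equal to the unobserved indicators.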

2.2. Pseudo-observations with doubly censored data

For doubly censored data, one observes $\{(L_i, R_i], (P_i, Q_i], Z_i; i = 1, \ldots, n\}$, where $(L_i, R_i]$ and $(P_i, Q_i]$ are the intervals corresponding to the initiating and terminating events $X_i$ and $S_i$, respectively. Independence is assumed across the individuals in the sample, and the censoring intervals are assumed to be non-informative. Suppose that $\hat{S}(t)$ is a consistent estimator of the marginal survival function $S(t) = P(T_i > t)$ based on $\{(L_i, R_i], (P_i, Q_i], i = 1, \ldots, n\}$. Similarly, $\hat{S}^{-i}(t)$ denotes the leave-$i$th-observation-out version based on $\{(L_j, R_j], (P_j, Q_j], j \neq i\}$, where $i = 1, \ldots, n$.

In order to construct regression models for $T_i = S_i - X_i$, define $S(t \mid Z_i) = P(T_i > t \mid Z_i)$ to be the conditional survival function of $T_i$ given $Z_i$, at a fixed time $t > 0$. If $A(t)$ is a deterministic function of $t$, then a transformation of $S(t \mid Z_i)$ takes the form

$$g\{S(t \mid Z_i)\} = A(t) + \beta^T Z_i, \tag{1}$$

where $\beta$ is the $p$-dimensional vector of covariate effects and $g(\cdot)$ is a smooth continuous link function. For instance, the link function $g(t) = \log(-\log(t))$ leads to the proportional hazards model, while $g(t) = \log\left(\frac{t}{1-t}\right)$ yields the proportional odds model. In this context, the $i$th pseudo-observation $\nu_{i,t}$ is defined as

$$\nu_{i,t} = n\, g(\hat{S}(t)) - (n-1)\, g(\hat{S}^{-i}(t)), \tag{2}$$

where $t > 0$ and $0 < \min\{\hat{S}(t), \hat{S}^{-i}(t); i = 1, \ldots, n\} < 1$.

This definition differs slightly from that of Andersen et al.28 in that it incorporates the link function g(·) into the PO construction, which prevents the occurrence of out-of-range probability estimates. We subsequently compare the newly defined PO approach with the original PO approach proposed by Andersen et al.28 As pointed out by Tukey36 and Andersen et al.,29 the POs are nearly independent, thus their use in the context presented above is justified. In summary, instead of regressing g{S(t | Zi)} on Zi, one regresses νi,t on Zi, regarding νi,t as a valid substitute for g{S(t | Zi)}. Thus, the proposed method requires an estimator of the marginal survival function S(t) of T; ways to compute the POs and to estimate the covariate effects β are presented in the next sections.
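As a rough illustration of definition (2), the sketch below computes link-based pseudo-observations with the complementary log-log link g(s) = log(−log(s)). It is not the estimator used in the paper: as a stand-in for the survival estimator it assumes fully observed times and uses the empirical survival function, and all function names and data are hypothetical.

```python
import math

def cloglog(s):
    """Link g(s) = log(-log(s)), the link associated with proportional hazards."""
    return math.log(-math.log(s))

def empirical_survival(times, t):
    """Stand-in estimator of S(t) for fully observed times (an assumption)."""
    return sum(1 for x in times if x > t) / len(times)

def link_pseudo_observations(times, t, g=cloglog):
    """nu_{i,t} = n*g(S_hat(t)) - (n-1)*g(S_hat^{-i}(t)), following definition (2)."""
    n = len(times)
    g_full = g(empirical_survival(times, t))
    nus = []
    for i in range(n):
        leave_one_out = times[:i] + times[i + 1:]
        nus.append(n * g_full - (n - 1) * g(empirical_survival(leave_one_out, t)))
    return nus

# Hypothetical data: four times below t = 2 and four above it
times = [0.4, 0.9, 1.3, 1.8, 2.2, 2.9, 3.5, 4.1]
nus = link_pseudo_observations(times, t=2.0)
```

Because this link is decreasing in s, individuals still event-free at t receive smaller pseudo-observations than those whose event occurred before t. The requirement that all estimates lie strictly between 0 and 1 is what keeps g finite here.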

2.3. Pseudo-observations at multiple time points

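To fit model (1), one may compute POs at $J\,(\geq 1)$ time points $t_1 < t_2 < \cdots < t_J$; using several time points rather than a single time point $t$ can markedly improve the efficiency of the parameter estimates. Here, $t_1 > \inf\{t : \min\{\hat{S}(t), \hat{S}^{-i}(t); i = 1, \ldots, n\} > 0\}$ and $t_J < \sup\{t : \max\{\hat{S}(t), \hat{S}^{-i}(t); i = 1, \ldots, n\} < 1\}$. Let $\nu_i = (\nu_{i,t_1}, \nu_{i,t_2}, \ldots, \nu_{i,t_J})$ be the vector thus obtained. Define $\gamma = (A(t_1), \ldots, A(t_J), \beta)^T$ and $\mu_i = (A(t_1) + \beta^T Z_i, \ldots, A(t_J) + \beta^T Z_i)$. Let $U_i(\gamma)$ be $\left(\frac{\partial \mu_i}{\partial \gamma}\right)^T M_i^{-1} (\nu_i - \mu_i)$. Given the correlations among the entries of $\nu_i$, consistent estimates $\hat{\beta}$ can be obtained using generalized estimating equations.37 Estimates $\hat{\gamma}$ of $\gamma$ are obtained from the following generalized estimating equations: if $U_i(\gamma) = Z_i M_i^{-1} \{\nu_i - \mu_i\}$, so that $Z_i$ here stands for $(\partial \mu_i / \partial \gamma)^T$, and $M_i$ is the working covariance matrix for $\nu_i$, then the score equation for $\gamma$ is

$$U(\gamma) = \sum_{i=1}^{n} U_i(\gamma) = 0. \tag{3}$$

Under regularity conditions, $\sqrt{n}(\hat{\gamma} - \gamma)$ is asymptotically normal with mean zero, and its covariance matrix can be consistently estimated by the following sandwich estimator. If

$$I(\hat{\gamma}) = \sum_{i=1}^{n} Z_i M_i^{-1} Z_i^T, \tag{4}$$

$$\widehat{\operatorname{var}}\{U(\hat{\gamma})\} = \sum_{i=1}^{n} U_i(\hat{\gamma}) U_i(\hat{\gamma})^T, \tag{5}$$

then

$$\widehat{\operatorname{Var}}(\hat{\gamma}) = \{I(\hat{\gamma})^{-1}\}\, \widehat{\operatorname{var}}\{U(\hat{\gamma})\}\, \{I(\hat{\gamma})^{-1}\}. \tag{6}$$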

Alternatively, one may use jackknife variance estimators such as the one-step or the approximate procedures.38 If the POs are computed for a single time-point t, the working covariance Mi is a scalar equal to 1. See Andersen and Klein,39 Scheike and Zhang40 and Han et al.35 for further examples of generalized estimating equations usage in PO-based modeling. For GEE implementation in practice, one may use the R function geese from the package geepack41 or the function gee from the package gee.42 Otherwise, one can use the PROC GENMOD procedure in SAS.
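The estimating equation (3) and the sandwich formula (4) to (6) can be sketched in a few lines for the special case of an independence working covariance (M_i equal to the identity) and a single scalar covariate. This is an illustrative assumption, not the geese or PROC GENMOD implementation; the names `mat_inverse` and `gee_independence` and the noise-free data are hypothetical. Because the mean μ_i is linear in γ, the solution is available in closed form.

```python
def mat_inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def gee_independence(nu, z):
    """Solve sum_i D_i^T (nu_i - D_i gamma) = 0 for mu_ij = A_j + beta * z_i.

    nu: n lists of J pseudo-observations; z: n scalar covariates.
    Returns (gamma_hat, sandwich covariance), gamma = (A_1, ..., A_J, beta).
    """
    J = len(nu[0])
    q = J + 1

    def design(zi):
        # Row j of D_i = d mu_ij / d gamma = (e_j, z_i)
        return [[1.0 if k == j else 0.0 for k in range(J)] + [zi] for j in range(J)]

    info = [[0.0] * q for _ in range(q)]   # I(gamma) with M_i = identity
    rhs = [0.0] * q
    for nu_i, zi in zip(nu, z):
        D = design(zi)
        for j in range(J):
            for a in range(q):
                rhs[a] += D[j][a] * nu_i[j]
                for b in range(q):
                    info[a][b] += D[j][a] * D[j][b]
    info_inv = mat_inverse(info)
    gamma = [sum(info_inv[a][b] * rhs[b] for b in range(q)) for a in range(q)]

    meat = [[0.0] * q for _ in range(q)]   # sum_i U_i U_i^T
    for nu_i, zi in zip(nu, z):
        D = design(zi)
        resid = [nu_i[j] - sum(D[j][a] * gamma[a] for a in range(q)) for j in range(J)]
        u = [sum(D[j][a] * resid[j] for j in range(J)) for a in range(q)]
        for a in range(q):
            for b in range(q):
                meat[a][b] += u[a] * u[b]
    half = [[sum(info_inv[a][c] * meat[c][b] for c in range(q)) for b in range(q)]
            for a in range(q)]
    cov = [[sum(half[a][c] * info_inv[c][b] for c in range(q)) for b in range(q)]
           for a in range(q)]
    return gamma, cov

# Noise-free hypothetical pseudo-observations at J = 2 time points
A_true, beta_true = [-1.0, 0.2], 0.5
z = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
nu = [[a + beta_true * zi for a in A_true] for zi in z]
gamma_hat, cov_hat = gee_independence(nu, z)
```

With a non-identity working covariance such as AR1, M_i^{-1} would enter both the information matrix and the score contributions; that extension is omitted here for brevity.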

2.4. A conditional likelihood method for estimating S(t)

To obtain the POs, a consistent estimator of the survival function $S(t)$ is required. Several methods to obtain a nonparametric maximum likelihood estimator (NPMLE) of $S(t)$ have been proposed in the literature. For instance, use of the self-consistency equation was suggested.2 However, this estimator may lead to a saddle point or local maximum, as well as solution nonuniqueness issues. Therefore, Gómez and Lagakos43 introduced a two-step algorithm to replace doubly censored data by two separate sets of interval-censored data. Although avoiding the previously mentioned issues, this method might incur a loss in efficiency. Along the same line, Sun44 proposed an efficient, conditional likelihood (CL) approach, which can also accommodate truncation. To obtain the POs, we have used this latter approach for obtaining an NPMLE of $S(t)$. For completeness, a description of the CL method is given. Recall that $T_i = S_i - X_i$ denotes the survival time of interest and let $u_1 < \cdots < u_r$ be the possible mass points of the $X_i$'s and $v_1 < \cdots < v_s$ be those of the $T_i$'s. Also let $w_j = \Pr(X_i = u_j)$, $j = 1, \ldots, r$ and $f_k = \Pr(T_i = v_k)$, $k = 1, \ldots, s$ and note that $\sum_{j=1}^{r} w_j = 1$ and $\sum_{k=1}^{s} f_k = 1$.

Define the corresponding censoring indicators $\alpha^i_{jk} = I(L_i < u_j \leq R_i, P_i < u_j + v_k \leq Q_i)$ and $\gamma^i_{jk} = I(L_i < u_j \leq R_i)$, with $j = 1, \ldots, r$ and $k = 1, \ldots, s$. The conditional likelihood function $L_c(w, f)$, given $X_i \in (L_i, R_i]$, takes the form

$$L_c(w, f) = \prod_{i=1}^{n} \frac{\sum_{j=1}^{r} \sum_{k=1}^{s} \alpha^i_{jk} w_j f_k}{\sum_{j=1}^{r} \sum_{k=1}^{s} \gamma^i_{jk} w_j f_k}. \tag{7}$$

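Define $\phi_{ij} = I(L_i < u_j \leq R_i)$, $\alpha_{ik} = I(P_i - R_i < v_k \leq Q_i - L_i)$,

$$\phi^*_{ij} = \begin{cases} \sum_{k:\, P_i - u_j < v_k \leq Q_i - u_j} f_k, & \text{if } u_j \in (L_i, R_i] \\ 1, & \text{otherwise} \end{cases} \tag{8}$$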

and

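$$\alpha^*_{ik} = \begin{cases} \dfrac{\sum_{l:\, L_i < u_l \leq R_i,\; P_i < u_l + v_k \leq Q_i} w_l}{\sum_{j=1}^{r} \phi_{ij} w_j}, & \text{if } v_k \in (P_i - R_i, Q_i - L_i] \\ 1, & \text{otherwise,} \end{cases} \tag{9}$$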

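where $j = 1, \ldots, r$, $k = 1, \ldots, s$ and $i = 1, \ldots, n$. With the starred quantities denoting the sums defined in (8) and (9), $L_c(w, f)$ can be rewritten as44

$$L_c(w, f) = \prod_{i=1}^{n} \frac{\sum_{j=1}^{r} \phi_{ij} \phi^*_{ij} w_j}{\sum_{j=1}^{r} \phi_{ij} w_j} \tag{10}$$

$$= \prod_{i=1}^{n} \sum_{k=1}^{s} \alpha_{ik} \alpha^*_{ik} f_k. \tag{11}$$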

To estimate w and f, one can maximize Lc(w, f) using a two-step self-consistency algorithm. To this end, let

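$$\mu_{ik}(f, w) = \frac{\alpha_{ik} \alpha^*_{ik} f_k}{\sum_{j=1}^{s} \alpha_{ij} \alpha^*_{ij} f_j}, \qquad \mu_{ij}(w, f) = \frac{\phi_{ij} \phi^*_{ij} w_j}{\sum_{k=1}^{r} \phi_{ik} \phi^*_{ik} w_k},$$

and

$$\gamma_{ij}(w, f) = \frac{(1 - \phi_{ij}) w_j}{\sum_{k=1}^{r} \phi_{ik} w_k}.$$

Then, $w$ and $f$ could be estimated by iterating the following equations until convergence,

$$w_j^{(l)} = \frac{\sum_{i=1}^{n} \left[\mu_{ij}(w^{(l-1)}, f^{(l-1)}) + \gamma_{ij}(w^{(l-1)}, f^{(l-1)})\right]}{M(w^{(l-1)}, f^{(l-1)})}, \tag{12}$$

$$f_k^{(l)} = \frac{\sum_{i=1}^{n} \mu_{ik}(f^{(l-1)}, w^{(l-1)})}{M(f^{(l-1)}, w^{(l-1)})}, \tag{13}$$

where the starred quantities are the sums defined in (8) and (9),

$$M(w, f) = \sum_{i=1}^{n} \sum_{j=1}^{r} \{\mu_{ij}(w, f) + \gamma_{ij}(w, f)\} \quad \text{and} \quad M(f, w) = \sum_{i=1}^{n} \sum_{k=1}^{s} \mu_{ik}(f, w).$$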
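To convey the flavor of the iteration, the sketch below implements a simplified, one-variable self-consistency update. It is not the two-step CL algorithm: it assumes the originating time is known exactly, so that the elapsed time T is simply interval-censored and the weights multiplying f_k reduce to the interval indicators. The function name `self_consistency`, the support points and the intervals are hypothetical.

```python
def self_consistency(intervals, support, tol=1e-10, max_iter=10000):
    """Estimate masses f_k on `support` from intervals (lo, hi] known to contain T_i.

    Each pass redistributes one unit of mass per subject over the support points
    inside that subject's interval, proportionally to the current f.
    """
    s = len(support)
    n = len(intervals)
    alpha = [[1.0 if lo < v <= hi else 0.0 for v in support] for lo, hi in intervals]
    f = [1.0 / s] * s
    for _ in range(max_iter):
        new_f = [0.0] * s
        for a_i in alpha:
            denom = sum(a * fk for a, fk in zip(a_i, f))
            for k in range(s):
                new_f[k] += a_i[k] * f[k] / denom
        new_f = [x / n for x in new_f]
        if max(abs(x - y) for x, y in zip(new_f, f)) < tol:
            return new_f
        f = new_f
    return f

# Hypothetical support points and censoring intervals (each contains a support point)
support = [1.0, 2.0, 3.0]
intervals = [(0.0, 1.0), (0.0, 2.0), (1.0, 3.0), (2.0, float("inf")), (1.0, 2.0)]
f_hat = self_consistency(intervals, support)
survival_at_1 = sum(fk for v, fk in zip(support, f_hat) if v > 1.0)
```

In the doubly censored setting, the same kind of update alternates between the masses w of the originating time and the masses f of the elapsed time, as in (12) and (13).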

In the following simulation section, we compare the proposed method with an existing approach and investigate the impact of incorporating the link function into the PO construction (denoted by PO-CL: νi) by comparing it with the original PO approach (denoted by PO-CL: ηi).

3. Simulation studies

To assess the performance of the proposed method, we apply the PO-CL method to simulated data generated from proportional hazards, proportional odds and a class of accelerated failure time models. Samples of size n = 50 and n = 100 are generated under each scenario, which is replicated 1,000 times. Suppose that $X_i$ and $S_i$ are the originating and the terminating events for subject $i$, respectively, and recall that the survival time of interest is $T_i = S_i - X_i$. Mimicking the example, for each individual $i = 1, \ldots, n$, we create censoring intervals $(L_i, R_i]$ and $(P_i, Q_i]$, allowing $Q_i = \infty$ for some individuals. For every $i$, 10 potential inter-visit times $x_{i1}, x_{i2}, x_{i3}, \ldots, x_{i10}$ are drawn from a uniform distribution, Unif(1, 2). Then, potential visit times are defined as $H_{i,0} = 0$, $H_{i,1} = x_{i1}$, $H_{i,2} = \sum_{k=1}^{2} x_{ik}, \ldots, H_{i,10} = \sum_{k=1}^{10} x_{ik}$. Independently of the visit-generating process, $X_i$ and $S_i$ are generated for individual $i$. Details on how this is accomplished in the three classes of models described earlier are given in the subsequent paragraphs. If $X_i$ is included in an interval $(H_{i,l}, H_{i,l+1}]$ for a certain value of $l = 0, 1, \ldots, 9$, the corresponding censoring interval $(L_i, R_i]$ for $X_i$ is defined as $(H_{i,l}, H_{i,l+1}]$. Similarly, if $S_i$ is included in an interval $(H_{i,l'}, H_{i,l'+1}]$ for a certain value of $l' = 0, 1, \ldots, 9$, the corresponding censoring interval $(P_i, Q_i]$ for $S_i$ is defined as $(H_{i,l'}, H_{i,l'+1}]$. Because the 10 potential visit times were chosen so that every $X_i$ and $S_i$ is less than or equal to $H_{i,10}$, the constructed intervals $(L_i, R_i]$ and $(P_i, Q_i]$ are all finite. In the IBCSG Trial VI example, approximately 10% of the cases are subject to right-censoring. Among all terminating event censoring intervals $(P_i, Q_i]$ thus generated, a proportion equal to the desired right-censoring percentage is randomly selected and $Q_i$ is set to $\infty$. Consequently, in all our simulations except the proportional odds model, we have considered a 10% right-censoring proportion for the terminating event $S_i$.
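The visit-time mechanism described above can be sketched as follows. The event times in the demonstration are drawn from placeholder uniform distributions purely so the example runs; in the simulations they come from the models of Sections 3.1 to 3.3. All function names are hypothetical, and the fallback to an infinite right endpoint beyond the last visit is an added assumption.

```python
import math
import random

def visit_times(rng, n_visits=10):
    """Cumulative sums of Unif(1, 2) inter-visit times, starting from H_0 = 0."""
    h = [0.0]
    for _ in range(n_visits):
        h.append(h[-1] + rng.uniform(1.0, 2.0))
    return h

def censor_interval(event_time, h):
    """Return the visit interval (H_l, H_{l+1}] that contains event_time."""
    for lo, hi in zip(h, h[1:]):
        if lo < event_time <= hi:
            return lo, hi
    return h[-1], float("inf")  # assumption: events beyond the last visit

def apply_right_censoring(intervals, proportion, rng):
    """Set Q = infinity for a randomly selected proportion of the (P, Q] intervals."""
    n_cens = round(proportion * len(intervals))
    chosen = set(rng.sample(range(len(intervals)), n_cens))
    return [(p, float("inf")) if i in chosen else (p, q)
            for i, (p, q) in enumerate(intervals)]

rng = random.Random(2013)
x_times, lr, pq = [], [], []
for _ in range(20):
    h = visit_times(rng)
    x = rng.uniform(0.5, 4.0)        # placeholder originating time
    s = x + rng.uniform(0.5, 5.0)    # placeholder terminating time, below H_10
    x_times.append(x)
    lr.append(censor_interval(x, h))
    pq.append(censor_interval(s, h))
pq_censored = apply_right_censoring(pq, 0.10, rng)
```

Since each inter-visit time is at least 1, the tenth visit always falls at or after time 10, which is why the placeholder event times above always land in a finite interval before right-censoring is applied.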

In all scenarios, POs are obtained at 1, 5, 10 or 25 equally-spaced time points between the 20th and the 80th percentiles of the ordered unique elements of the set {0, vk : k = 1, … , s}. For example, the median time was used in the case of J = 1 under the PH model. We assumed a first-order autoregressive (AR1) working correlation structure for the GEE fitting and explored other structures as well. The Bayesian methods referred to in Section 1 conduct inference for the elapsed time by modeling both the originating and the terminating event times explicitly. The methods proposed by Komarek24 or Jara et al.25 currently cannot handle right-censoring of the terminating event (Qi = ∞), whereas the method of Yu27 only works with exact observations or right-censoring of the terminating event. Across the simulation scenarios, we compared the proposed PO-CL method with a Bayesian approach using the LDPDdoublyint routine from the R package DPpackage, running 10,000 MCMC iterations with default options. Observations whose terminating event times are right-censored were removed from the analysis set.

3.1. Proportional hazards models

In the proportional hazards model scenario, X and T are generated from different proportional hazard models. The underlying model is

$$\log[-\log\{S(t \mid Z)\}] = \log\left\{\int_0^t h_0(u)\, du\right\} + \beta^T Z, \tag{14}$$

where h0(·) is the baseline hazard function.

For a constant baseline hazard equal to 1, the hazard function for X has the form h(x | Zi) = exp(−Zi), where Zi is drawn from the normal distribution N(−2.5, 0.5). On the other hand, T is generated using two-dimensional covariates Zi = (Zi1, Zi2), meaning that the hazard function of T takes the form h{t | Zi = (Zi1, Zi2)} = exp(β1Zi1 + β2Zi2). Here, Zi1 and Zi2 are independently drawn from a Bernoulli(0.5) distribution. Coefficients β1 and β2 are fixed at 0.5 and 0.25. If Xi and Si fall in the same visit interval, we may observe Ri > Pi, which is contradictory. Should this occur, the right end point for Xi and the left end point for Si are modified to Xi + 0.01 and Si − 0.01, respectively. To estimate the covariate effects on T, we use the link g(t) = log{−log(t)} in the underlying model (1), that is, we model log[−log{S(t | Zi)}].

We compare the proposed PO-CL method based on νi with the PO method based on ηi. The latter, PO-CL (ηi), uses the complementary log-log link function g(t) = log(−log(1 − t)) in the geese routine. In addition, we compare PO-CL with a Cox proportional hazards model that uses middle point imputation (COX-MPI). Namely, finite censoring intervals (L, R] and (P, Q] are replaced by their middle points (L + R)/2 and (P + Q)/2, and T̃ = (P + Q)/2 − (L + R)/2. If Q = ∞, we compute the right-censoring time as T̃ = P − (L + R)/2 and treat the corresponding observation as right-censored. Then, a Cox proportional hazards model is fitted to T̃. Given the lack of publicly available software for doubly censored data methods, COX-MPI is often used in practice. We employ the R function coxph from the package survival45 for the COX-MPI fitting. Simulation results are presented in Table 1. Tabulated are the empirical mean of the β̂ estimates, the mean of the estimated standard errors for β̂ (SE), the empirical standard error for β̂ (ESE), the coverage probability (CP) of the true β by the nominal 95% confidence intervals and the computation time in seconds (Time). These summary results indicate that the PO-CL (νi) method performs very well in all scenarios, producing good quality estimators for β, with CPs close to the 95% nominal level. As the number of time points (J) increases, the efficiency of β̂ improves: SE and ESE become smaller. However, when the original PO method based on the ηi (PO-CL: ηi) is used with the complementary log-log link function in the geese routine, severely biased and unstable estimates are obtained regardless of the number of time points (J). Part of the reason for the poor performance is that the employed link function, log(−log(1 − t)), does not result in a linear form in A(t) and βᵀZ. On the other hand, COX-MPI produces substantially biased β̂1 and β̂2 estimates, although their efficiency is superior to that of the PO-based methods.
The Bayesian method using the DPpackage does not attain satisfactory results. The estimated regression coefficients are quite biased, and the empirical means of the estimated standard errors for β̂ are very different from the empirical standard errors for β̂. This may be due to either the hyper-parameter value selection or the right-censoring issue for the terminating events.
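The middle point imputation used by the comparison methods amounts to the following small computation. This is a minimal sketch of the rule stated above; the function name `midpoint_impute` and the numerical intervals are hypothetical.

```python
import math

def midpoint_impute(L, R, P, Q):
    """Return (imputed time, event indicator) for T = S - X under middle point imputation."""
    x_mid = (L + R) / 2.0
    if math.isinf(Q):
        # Terminating event right-censored: censoring time is P - (L + R)/2
        return P - x_mid, 0
    return (P + Q) / 2.0 - x_mid, 1

print(midpoint_impute(2.0, 4.0, 6.0, 8.0))           # (4.0, 1)
print(midpoint_impute(2.0, 4.0, 6.0, float("inf")))  # (3.0, 0)
```

The imputed pairs can then be passed to any standard right-censored regression routine, which is what makes this approach convenient despite the bias reported in the simulations.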

Table 1.

Proportional Hazards Model.

| n | Method | J | β̂1 (β1 = 0.5) | SE | ESE | 95% CP | β̂2 (β2 = 0.25) | SE | ESE | 95% CP | Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | PO-CL: νi | 25 | 0.48 | 0.50 | 0.47 | 0.95 | 0.25 | 0.50 | 0.51 | 0.93 | 81 |
| | PO-CL: νi | 10 | 0.48 | 0.50 | 0.48 | 0.95 | 0.25 | 0.50 | 0.51 | 0.95 | 50 |
| | PO-CL: νi | 5 | 0.48 | 0.52 | 0.50 | 0.95 | 0.25 | 0.52 | 0.53 | 0.95 | 40 |
| | PO-CL: νi | 1 | 0.47 | 0.60 | 0.60 | 0.95 | 0.25 | 0.59 | 0.59 | 0.95 | 26 |
| | PO-CL: ηi | 25 | 5.1E+16 | 1.1E+13 | 1.9E+18 | 0.57 | −5.1E+16 | 1.2E+13 | 1.9E+18 | 0.74 | 79 |
| | PO-CL: ηi | 1 | −2.1E+32 | 4.6E+15 | 2.9E+30 | 0.60 | 1.5E+34 | 6.5E+18 | 4.4E+35 | 0.89 | 28 |
| | COX-MPI | | 0.39 | 0.32 | 0.33 | 0.93 | 0.19 | 0.32 | 0.33 | 0.95 | 1 |
| | Bayesian | | −0.23 | 1.64 | 0.32 | 0.73 | 0.15 | 1.65 | 0.43 | 0.97 | 246 |
| 100 | PO-CL: νi | 25 | 0.49 | 0.35 | 0.31 | 0.95 | 0.26 | 0.35 | 0.35 | 0.96 | 665 |
| | PO-CL: νi | 10 | 0.48 | 0.36 | 0.34 | 0.94 | 0.24 | 0.36 | 0.34 | 0.97 | 500 |
| | PO-CL: νi | 5 | 0.48 | 0.37 | 0.37 | 0.94 | 0.23 | 0.37 | 0.36 | 0.97 | 311 |
| | PO-CL: νi | 1 | 0.47 | 0.42 | 0.42 | 0.94 | 0.24 | 0.40 | 0.41 | 0.96 | 256 |
| | PO-CL: ηi | 25 | −5.7E+11 | 3.5E+11 | 1.5E+13 | 0.35 | 0.68E+11 | 3.4E+12 | 1.8E+13 | 0.75 | 660 |
| | PO-CL: ηi | 1 | 2.4E+16 | 1.2E+20 | 5.6E+14 | 0.56 | 3.8E+13 | 4.5E+12 | 4.8E+13 | 0.60 | 250 |
| | COX-MPI | | 0.40 | 0.22 | 0.23 | 0.92 | 0.21 | 0.22 | 0.21 | 0.96 | 1 |
| | Bayesian | | −0.17 | 1.70 | 0.24 | 0.83 | 0.18 | 1.81 | 0.39 | 0.98 | 396 |

Ten percent right-censoring proportion for the terminating event S; Zi1 ~ Bernoulli(0.5), Zi2 ~ Bernoulli(0.5), β1 = 0.5 and β2 = 0.25. β̂i = empirical mean of the estimated βi values, SE = empirical mean of the estimated standard errors for β̂i, ESE = empirical standard error for β̂i, 95% CP = coverage probability of the true βi by the 95% confidence intervals and Time = CPU time in seconds. νi and ηi denote the PO incorporating the link function in its construction and the original ith PO proposed by Andersen et al. (2003),28 respectively. Some numbers are written in scientific notation (e.g. 4.6E+14 = 4.6 × 10^14).

3.2. Proportional odds models

The second simulation study devised is for proportional odds models in which the logit-transformed survival function is of the form

$$\operatorname{logit}[S(t \mid Z_i)] = Z_i \beta + \operatorname{logit}[S_0(t)], \tag{15}$$

where Zi is the covariate vector of the ith individual, S0(t) is the baseline survival function and logit(t) = log(t/(1 − t)). Event times X and T are generated from two different proportional odds models. To generate X, we assume S0(x) = exp(−x), Zi ~ N(−2.5, 0.5) and β = 1. Similarly, to generate T, we use two independent covariates, a Bernoulli(0.5) and a N(−4.5, 0.5), while letting (β1, β2) = (0.5, 0.75) and S0(t) = exp(−t). Again, to prevent situations in which Ri > Pi, the right end point for Xi and the left end point for Si are modified to be Xi + 0.01 and Si − 0.01, respectively. For comparison purposes, the MS-MPI regression method46 is also implemented. As in the proportional hazards model scenario, finite intervals are replaced by their middle points, and infinite intervals for the terminating event are treated as right-censored observations. We use the function prop.odds from the R package timereg47 for model fitting. Simulation results shown in Tables 2 and 3 indicate that the proposed PO-CL (νi) performs very well in terms of bias, efficiency and coverage. When the right-censoring proportion is 30% for the terminating event time (Si) and 25 time points are used (J = 25), the original PO-based regression method (PO-CL: ηi) produces non-convergent coefficient estimates (βs). When the right-censoring proportion of Si is decreased to 10%, these results do not change. The poor performance of PO-CL (ηi) may be due to the many POs falling outside the [0, 1] range (about 40%). If J is reduced to 5, the results are less unstable but the coefficient estimates are still quite biased (about 25% of the POs are out of range). On the other hand, MS-MPI appears to underestimate the covariate effects, and its bias persists even as the sample size increases. The Bayesian method still gives quite biased results regardless of the sample size.
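One way to draw event times from model (15) with S0(t) = exp(−t) is by inverting the conditional survival function. The sketch below is an illustrative assumption about how such draws could be made, not a description of the authors' code; the function names and the covariate values are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def survival_po(t, lp):
    """S(t | Z) under model (15) with S0(t) = exp(-t) and linear predictor lp = Z*beta."""
    return expit(lp + logit(math.exp(-t)))

def draw_po_time(lp, u):
    """Solve S(T | Z) = u for T, where u is a Uniform(0, 1) draw."""
    # logit S0(T) = logit(u) - lp, and S0(T) = exp(-T)
    return -math.log(expit(logit(u) - lp))

# Hypothetical covariate values Z1 = 1, Z2 = -4.5 with (beta1, beta2) = (0.5, 0.75)
lp = 0.5 * 1.0 + 0.75 * (-4.5)
t_draw = draw_po_time(lp, u=0.3)
check = survival_po(t_draw, lp)   # should recover u
```

Feeding independent uniform draws through `draw_po_time` yields event times whose logit-transformed survival function is linear in the covariates, as the model requires.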

Table 2.

Proportional Odds Model.

| Right cens. prop. | PO-CL | J | β̂1 (β1 = 0.5) | SE | ESE | 95% CP | β̂2 (β2 = 0.75) | SE | ESE | 95% CP | Time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 30% | ηi | 25 | 6.7E+12 | 5.8E+12 | 8.0E+13 | 0.828 | 7.2E+12 | 5.6E+12 | 1.2E+14 | 0.805 | 80 |
| | νi | 25 | 0.496 | 0.654 | 0.687 | 0.937 | 0.761 | 0.646 | 0.681 | 0.937 | 81 |
| | ηi | 10 | 0.633 | 0.675 | 0.929 | 0.915 | 0.828 | 0.672 | 0.942 | 0.915 | 51 |
| | νi | 10 | 0.533 | 0.647 | 0.672 | 0.945 | 0.793 | 0.646 | 0.663 | 0.945 | 49 |
| 20% | ηi | 25 | 5.1E+11 | 1.1E+11 | 1.2E+13 | 0.826 | 6.9E+12 | 5.6E+12 | 8.5E+13 | 0.602 | 79 |
| | νi | 25 | 0.518 | 0.669 | 0.662 | 0.955 | 0.786 | 0.594 | 0.602 | 0.962 | 84 |
| | ηi | 10 | 0.603 | 0.735 | 0.846 | 0.936 | 0.989 | 0.653 | 0.843 | 0.918 | 52 |
| | νi | 10 | 0.522 | 0.675 | 0.692 | 0.958 | 0.762 | 0.669 | 0.700 | 0.950 | 49 |
| 10% | ηi | 25 | 7.0E+11 | 2.3E+12 | 1.0E+13 | 0.843 | 3.8E+12 | 2.7E+12 | 5.7E+13 | 0.802 | 82 |
| | νi | 25 | 0.508 | 0.669 | 0.692 | 0.945 | 0.751 | 0.664 | 0.662 | 0.952 | 79 |
| | ηi | 10 | 0.593 | 0.635 | 0.824 | 0.918 | 0.889 | 0.643 | 0.873 | 0.918 | 51 |
| | νi | 10 | 0.530 | 0.665 | 0.701 | 0.948 | 0.752 | 0.669 | 0.690 | 0.940 | 48 |

Sample size n = 50, Zi1 ~ Bernoulli(0.5), Zi2 ~ N(−4.5, 0.5), β1 = 0.5 and β2 = 0.75. Right-censoring proportions for S = 0.3, 0.2 or 0.1. β̂i = empirical mean of the estimated βi values, SE = empirical mean of the estimated standard errors for β̂i, ESE = empirical standard error for β̂i, 95% CP = coverage probability of the true βi by the 95% confidence intervals. νi and ηi denote the PO incorporating the link function in its construction and the original ith PO proposed by Andersen et al. (2003), respectively. Some numbers are written in scientific notation (e.g. 4.6E+14 = 4.6 × 10^14).

Table 3.

Proportional Odds Model.

| n | Method | J | β̂1 (β1 = 0.5) | SE | ESE | 95% CP | β̂2 (β2 = 0.75) | SE | ESE | 95% CP | Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | PO-CL: νi | 25 | 0.51 | 0.67 | 0.69 | 0.95 | 0.75 | 0.66 | 0.66 | 0.95 | 79 |
| | PO-CL: νi | 10 | 0.53 | 0.67 | 0.70 | 0.95 | 0.75 | 0.67 | 0.69 | 0.94 | 48 |
| | PO-CL: νi | 5 | 0.53 | 0.70 | 0.73 | 0.94 | 0.79 | 0.69 | 0.71 | 0.94 | 39 |
| | PO-CL: ηi | 25 | 7.0E+11 | 2.3E+12 | 1.0E+13 | 0.843 | 3.8E+12 | 2.7E+12 | 5.7E+13 | 0.80 | 82 |
| | PO-CL: ηi | 10 | 0.59 | 0.64 | 0.82 | 0.92 | 0.89 | 0.64 | 0.87 | 0.92 | 51 |
| | PO-CL: ηi | 5 | 0.59 | 0.66 | 0.82 | 0.93 | 0.93 | 0.69 | 0.97 | 0.92 | 42 |
| | MS-MPI | | 0.39 | 0.51 | 0.53 | 0.95 | 0.63 | 0.51 | 0.53 | 0.95 | 1 |
| | Bayesian | | 0.62 | 0.55 | 0.51 | 1.00 | −0.08 | 0.35 | 0.34 | 0.28 | 220 |
| 100 | PO-CL: νi | 25 | 0.49 | 0.42 | 0.40 | 0.95 | 0.75 | 0.40 | 0.41 | 0.95 | 650 |
| | PO-CL: νi | 10 | 0.49 | 0.43 | 0.41 | 0.95 | 0.74 | 0.43 | 0.41 | 0.94 | 540 |
| | PO-CL: νi | 5 | 0.51 | 0.44 | 0.43 | 0.96 | 0.76 | 0.44 | 0.42 | 0.93 | 311 |
| | PO-CL: ηi | 25 | 4.2E+12 | 1.9E+12 | 5.9E+13 | 0.89 | 5.1E+12 | 2.3E+12 | 8.3E+13 | 0.86 | 655 |
| | PO-CL: ηi | 10 | 0.44 | 0.39 | 0.38 | 0.93 | 0.85 | 0.40 | 0.50 | 0.93 | 510 |
| | PO-CL: ηi | 5 | 0.42 | 0.41 | 0.45 | 0.94 | 0.89 | 0.42 | 0.55 | 0.92 | 302 |
| | MS-MPI | | 0.39 | 0.36 | 0.34 | 0.95 | 0.66 | 0.37 | 0.37 | 0.92 | 1 |
| | Bayesian | | 0.52 | 0.53 | 0.26 | 1.00 | −0.03 | 0.71 | 0.27 | 0.53 | 412 |

Ten percent right-censoring proportion for the terminating event S; Zi1 ~ Bernoulli(0.5), Zi2 ~ N(−4.5, 0.5), β1 = 0.5 and β2 = 0.75. β̂i = empirical mean of the estimated βi values, SE = empirical mean of the estimated standard errors for β̂i, ESE = empirical standard error for β̂i, 95% CP = coverage probability of the true βi by the 95% confidence intervals, Time = CPU time in seconds. νi and ηi denote the PO incorporating the link function in its construction and the original ith PO proposed by Andersen et al. (2003), respectively. Some numbers are written in scientific notation (e.g. 4.6E+14 = 4.6 × 10^14).

3.3. Accelerated failure time models

The third simulation study is based on an accelerated failure time model. We generate T = exp(Zi1β1 + Zi2β2 + G), where exp(G) follows the Gompertz(1, 1) distribution. Therefore, the conditional survival function, given covariate vector Zi = {Zi1, Zi2} is equal to

$$S(t \mid Z_i) = \exp\left[1 - \exp\left\{t\, e^{-(Z_{i1}\beta_1 + Z_{i2}\beta_2)}\right\}\right]. \tag{16}$$

In addition, Zi1 and Zi2 are generated independently from Bernoulli(0.5) and Unif(1, 1.5) distributions, respectively. Also, β1 is fixed at 0.5 and β2 at 0.75. Similarly, X is generated from an accelerated failure time model, assuming that exp(G) is Gompertz(1, 1)-distributed, β = 1, and covariate Z is distributed as N(2.5, 0.5). As in the proportional hazards/odds scenarios, the right end point for Xi and the left end point for Si are modified to Xi + 0.01 and Si − 0.01, respectively, whenever Ri > Pi. The link function used is g(t) = log[log{1 − log(t)}]. For comparison, we use a parametric accelerated failure time model, again with middle point imputation (AFT-MPI). We use the R function survreg from the package survival45 for the AFT-MPI fitting. Since there is no option for the Gompertz distribution, the default Weibull distribution is assumed instead. Simulation results displayed in Table 4 indicate good quality PO-CL (νi) estimates β̂1 and β̂2. The 95% CPs are very close to the nominal level. On the other hand, PO-CL (ηi) with the log link function in the geese routine produces quite biased estimates and 95% CPs below the nominal level. Similarly, AFT-MPI leads to biased β̂ estimates in all cases, with coverage probabilities markedly below the nominal level. The Bayesian method still produces quite biased coefficient estimates as well.
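The role of the link g(t) = log[log{1 − log(t)}] can be verified numerically. The sketch below assumes that Gompertz(1, 1) refers to the survival function exp(1 − e^w), which is the form implied by equation (16); the function names and covariate values are hypothetical. Applying the link to (16) gives log t minus the linear predictor, that is, a form that is linear in the covariates.

```python
import math

def draw_aft_time(lp, u):
    """T = exp(lp) * W, where W has survival exp(1 - e^w) (assumed Gompertz(1, 1) form)."""
    w = math.log(1.0 - math.log(u))  # solves exp(1 - e^w) = u for a Uniform(0, 1) draw u
    return math.exp(lp) * w

def survival_aft(t, lp):
    """Conditional survival function of equation (16) with linear predictor lp."""
    return math.exp(1.0 - math.exp(t * math.exp(-lp)))

def link(s):
    """g(s) = log[log{1 - log(s)}]."""
    return math.log(math.log(1.0 - math.log(s)))

# Hypothetical covariates Z1 = 1, Z2 = 1.2 with (beta1, beta2) = (0.5, 0.75)
lp = 0.5 * 1.0 + 0.75 * 1.2
transformed = link(survival_aft(2.0, lp))      # equals log(2) - lp
recovered_u = survival_aft(draw_aft_time(lp, 0.4), lp)
```

Under this parametrization the covariates enter the transformed survival function with a negative sign, which is worth keeping in mind when comparing fitted coefficients with the generating values.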

In summary, the PO-based regression method (νi) with an appropriate link function produces reliable regression coefficient estimates, and its efficiency can be improved by using more time points.

4. A randomized phase II prostate cancer trial example

Prostate cancer is one of the most prevalent cancers and has a complex pathogenesis. Ultimately, one in six men develops prostate cancer.48 A randomized, placebo-controlled, double-blinded phase II trial3 was conducted to investigate how docetaxel with or without doxercalciferol is associated with time to prostate-specific antigen response in androgen-independent prostate cancer patients. Disease-free and overall survival times served as secondary endpoints. Between October 2002 and July 2005, 69 patients were randomized in 1:1 fashion to the trial arms, without additional stratification. Each patient was evaluated at baseline and every four weeks (one cycle) thereafter, by means of the Eastern Cooperative Oncology Group (ECOG) performance status, urine and blood tests and a physical exam. Disease progression was measured via serial prostate-specific antigen values; hence progression was deemed to have occurred between the most recent progression-free visit and the first visit at which progression was established. Thus, time-to-cancer progression is subject to interval-censoring. On the other hand, survivorship after cancer progression is subject to both right-censoring and interval-censoring. As per the study protocol, Attia et al.3 conducted separate analyses for disease-free survival (DFS) and overall survival (OS) time, assuming that both DFS and OS are right-censored. First, we estimate the DFS and OS using the expectation maximization iterative convex minorant (EMICM) estimator.49 The EMICM estimator produces the NPMLE for interval-censored survival times. Figures 1 and 2 present EMICM estimators of the DFS and OS by study arm. Judging from both the DFS and OS functions, the effect of the new treatment does not appear to differ from that of the placebo treatment.

Figure 1.

Estimates of the disease-free survival (DFS) probability.

Figure 2.

Estimates of the overall survival (OS) probability.

However, modeling survivorship after cancer progression is also of high interest.5-8 Although methods such as landmark analyses have been used in similar analyses, their limitations have been noted.9,10 It is worth noting that popular options, such as the Kaplan-Meier estimator or the Cox proportional hazards regression model, are subject to induced dependent censoring issues and hence are not valid for graphically representing and analyzing survivorship post cancer progression. In the current example, two patients did not experience disease progression while on study and have been excluded from our analyses. Therefore, our results are based on a total of 35 combination chemotherapy (docetaxel with doxercalciferol) patients and 32 docetaxel without doxercalciferol patients. Figure 3 shows the estimated survivorship after cancer progression in each trial arm. To model this time, we apply the proposed PO-CL (νi) method, assuming a proportional odds model. Under this model, logit[S(t | Zi)] = −Ziβ + logit[S0(t)], where Z is the covariate vector presented below, S0(t) is the baseline survival function and logit(t) = log(t/(1 − t)). Covariates considered include trial arm, age, baseline values for prostate-specific antigen levels, ECOG performance status, hemoglobin, Gleason Score, alkaline phosphatase level and lactate dehydrogenase. For better interpretability, the original prostate-specific antigen levels and lactate dehydrogenase values were rescaled via division by 100. Since randomized phase II trials are not designed to formally compare treatment arms, we do not attempt to do so in the present example. Instead, our focus is on identifying factors associated with mortality after cancer progression.

Figure 3.

Plot of the estimated time to death after prostate cancer progression.

To fit the PO-CL (νi) model, we calculated POs at five time points chosen in the same manner as in the simulations and assumed a first-order autoregressive correlation structure. We first conducted univariate and multivariable analyses to test the doxercalciferol effect. Based on our clinical judgement, the final multivariable model retains both doxercalciferol and lactate dehydrogenase as risk factors, although neither is statistically significant at the 5% level. Results shown in Table 5 include the estimated regression coefficient (β̂), a 95% confidence interval for β, and the corresponding p-value based on Wald's test. The PO-CL (νi) model indicates that both alkaline phosphatase level and hemoglobin are significantly associated (at the 5% significance level) with mortality after cancer progression, findings not revealed by the analyses of the original investigators. Specifically, low hemoglobin levels and high alkaline phosphatase levels are associated with an increase in the odds of death once cancer progression has been established (p-values = 0.043 and 0.046, respectively).
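As a rough consistency check on the Wald-based inference described above, the sketch below recovers a standard error, Wald statistic and two-sided p-value from a point estimate and its 95% confidence interval. It assumes a symmetric normal-based interval; the inputs are the rounded alkaline phosphatase and hemoglobin values reported for the prostate cancer example (Table 5), and the function name `wald_from_ci` is hypothetical.

```python
import math

def wald_from_ci(beta_hat, ci_lower, ci_upper, z_crit=1.959964):
    """Recover the standard error, Wald statistic and two-sided p-value
    from a point estimate and a symmetric normal-based 95% interval."""
    se = (ci_upper - ci_lower) / (2.0 * z_crit)
    z = beta_hat / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_value

# Rounded values as reported for the example (assumed inputs, not raw data).
alk_se, alk_z, alk_p = wald_from_ci(0.263, 0.005, 0.522)
hgb_se, hgb_z, hgb_p = wald_from_ci(-0.069, -0.136, -0.002)

# Odds ratio of death per unit increase in alkaline phosphatase,
# using the proportional odds interpretation of the coefficient.
alk_odds_ratio = math.exp(0.263)

print(f"alkaline phosphatase: SE={alk_se:.3f}, z={alk_z:.2f}, p={alk_p:.3f}")
print(f"hemoglobin:           SE={hgb_se:.3f}, z={hgb_z:.2f}, p={hgb_p:.3f}")
```

Because the inputs are rounded to three decimals, the recovered p-values can differ slightly in the last digit from those reported.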

Table 4.

Accelerated Failure Time Model.

| n | Method | J | β̂1 (β1 = 0.5) | SE (β1) | ESE (β1) | 95.CP (β1) | β̂2 (β2 = 0.75) | SE (β2) | ESE (β2) | 95.CP (β2) | Time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | PO-CL: νi | 25 | 0.50 | 0.28 | 0.28 | 0.95 | 0.78 | 1.01 | 0.98 | 0.94 | 74 |
| 50 | PO-CL: νi | 10 | 0.49 | 0.29 | 0.29 | 0.94 | 0.77 | 1.00 | 0.97 | 0.94 | 48 |
| 50 | PO-CL: νi | 5 | 0.49 | 0.30 | 0.30 | 0.96 | 0.77 | 1.03 | 1.01 | 0.94 | 37 |
| 50 | PO-CL: ηi | 25 | 1.1E+4 | 1.9E+6 | 1.2E+5 | 0.61 | 0.56 | 1.36 | 2.89 | 0.66 | 74 |
| 50 | PO-CL: ηi | 5 | 0.24 | 0.28 | 0.35 | 0.76 | 0.43 | 0.92 | 1.11 | 0.87 | 35 |
| 50 | AFT-MPI | | 0.39 | 0.20 | 0.18 | 0.88 | 0.62 | 0.15 | 0.59 | 0.39 | 1 |
| 50 | Bayesian | | 0.46 | 0.87 | 0.33 | 1.00 | 0.08 | 1.47 | 0.73 | 0.82 | 251 |
| 100 | PO-CL: νi | 25 | 0.51 | 0.18 | 0.18 | 0.96 | 0.75 | 0.66 | 0.71 | 0.93 | 645 |
| 100 | PO-CL: νi | 10 | 0.49 | 0.19 | 0.18 | 0.96 | 0.74 | 0.66 | 0.69 | 0.94 | 536 |
| 100 | PO-CL: νi | 5 | 0.50 | 0.20 | 0.18 | 0.96 | 0.74 | 0.68 | 0.71 | 0.93 | 310 |
| 100 | PO-CL: ηi | 25 | −0.54 | 0.21 | 0.24 | 0.002 | −0.69 | 0.68 | 0.78 | 0.41 | 650 |
| 100 | PO-CL: ηi | 5 | −0.43 | 0.19 | 0.22 | 0.005 | −0.59 | 0.63 | 0.70 | 0.40 | 309 |
| 100 | AFT-MPI | | 0.40 | 0.15 | 0.16 | 0.88 | 0.64 | 0.11 | 0.42 | 0.40 | 1 |
| 100 | Bayesian | | 0.52 | 0.68 | 0.33 | 1.00 | 0.39 | 1.46 | 1.16 | 0.70 | 400 |

Ten percent right-censoring proportion for the terminating event S, Zi1 ~ Bernoulli(0.5), Zi2 ~ Unif(1, 1.5), β1 = 0.5 and β2 = 0.75. β̂i = empirical mean of estimated βi values, SE = empirical mean of estimated standard errors for β̂i, ESE = empirical standard error for β̂i, 95.CP = coverage probability of true βi by the 95% confidence intervals, Time = CPU time in seconds. νi and ηi denote the PO constructed with the link function incorporated and the original ith PO proposed by Andersen et al. (2003), respectively. Some numbers are written in scientific notation (e.g. 4.6E+14 = 4.6 × 10^14).

Recently, Perme and Andersen50 suggested a graphical method for assessing goodness-of-fit based on pseudo-observations. Following their approach, a raw residual at each time point can be defined as $\nu_{i,t} - [\hat{\alpha}(t) + \hat{\beta}^T Z_i]$. The corresponding standardized pseudo-residual is $\{\nu_{i,t} - [\hat{\alpha}(t) + \hat{\beta}^T Z_i]\}/\Psi$, where $\Psi$ is an empirical standard error for the raw residuals. Using these residuals, we evaluated the fit of the final model in the example by plotting the standardized pseudo-residuals against the linear predictor $\hat{\alpha}(t) + \hat{\beta}^T Z_i$. Although the residuals can be computed at every time point, we selected three time points corresponding to the three quartiles of the estimated survival function for T. Figure 4 plots the pseudo-residuals against the linear predictor at time points 4.72, 9.05 and 13.38. The smoother shows no trend that would indicate a violation of the proportional odds assumption.
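The residual computation described above can be sketched in a few lines. This is an illustration only: it assumes that Ψ is the sample standard deviation of the raw residuals at the given time point (one reading of "empirical standard error"), and the pseudo-observations and linear predictors below are hypothetical numbers, not values from the trial.

```python
import math

def standardized_pseudo_residuals(pseudo_obs, linear_predictors):
    """Raw residual = pseudo-observation minus fitted linear predictor;
    standardized residual = raw residual / empirical SD of raw residuals."""
    raw = [v - lp for v, lp in zip(pseudo_obs, linear_predictors)]
    mean_raw = sum(raw) / len(raw)
    # Assumed definition of Psi: sample standard deviation (n - 1 denominator).
    psi = math.sqrt(sum((r - mean_raw) ** 2 for r in raw) / (len(raw) - 1))
    return [r / psi for r in raw]

# Hypothetical values at a single time point.
nu = [0.8, -0.3, 1.5, 0.1, -1.1]        # link-scale pseudo-observations
lin_pred = [0.5, 0.0, 1.0, 0.4, -0.9]   # alpha_hat(t) + beta_hat' Z_i
std_resid = standardized_pseudo_residuals(nu, lin_pred)
```

Plotting `std_resid` against `lin_pred` with a smoother, separately for each selected time point, gives the kind of diagnostic display described in the text.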

Figure 4.

Goodness-of-fit for the proportional odds (PO) model based on pseudo-residuals. The model is fitted under the proportional odds assumption. Each row corresponds to one of the three time points chosen at the three quartiles of the estimated survival function for T.

5. Discussion

We introduce a flexible regression method for modeling doubly censored failure time data. The stepping stone is the set of pseudo-observations obtained using the NPMLE of the survival function. Generalized estimating equations are used to obtain covariate effect estimates and their standard errors. From a computational standpoint, the programming effort is minimal, so the proposed approach has potential for immediate use in biomedical data analyses. By incorporating the link function into the PO-construction, we obtain more stable and less biased covariate effect estimates. The pseudo-observation-based regression method relies on GEE, whose performance depends on the sample size and on the response value (Y). For binary data, Y should lie in [0, 1]; for Poisson count data, Y should be non-negative. When Y values fall outside the expected range, the performance of GEE degrades. Table 6 shows how out-of-range Y values influence GEE performance.

Table 6.

Simulation results for GEE performance.

| Degree of deviation (ζ) | Sample size (n) | β̂1 | SE (β̂1) | β̂2 | SE (β̂2) |
|---|---|---|---|---|---|
| 1 | 100 | 0.912 | 0.169 | 0.904 | 0.168 |
| 1 | 300 | 0.965 | 0.056 | 0.959 | 0.056 |
| 1 | 500 | 0.979 | 0.034 | 0.977 | 0.033 |
| 2 | 100 | 4.34E+15 | 2.09E+14 | 4.15E+15 | 2.06E+14 |
| 2 | 300 | 1.188 | 0.300 | 1.146 | 0.290 |
| 2 | 500 | 1.086 | 0.153 | 1.080 | 0.152 |
| 3 | 100 | 1.46E+16 | 7.46E+14 | 1.32E+16 | 7.37E+14 |
| 3 | 300 | 1.12E+14 | 1.20E+13 | 7.22E+13 | 8.13E+12 |
| 3 | 500 | 1.246 | 0.338 | 1.231 | 0.330 |
| 4 | 100 | 1.81E+16 | 1.19E+15 | 1.55E+16 | 1.18E+15 |
| 4 | 300 | 2.79E+15 | 1.52E+14 | 2.08E+15 | 1.26E+14 |
| 4 | 500 | 2.65E+13 | 2.69E+12 | 1.45E+12 | 1.32E+12 |

A total of n values of Y were generated (Y1, Y2, … , Yn). The first 80% of the Y values come from the inverse-logit-transformed linear predictors, Yi = exp(β1Zi1 + β2Zi2)/{1 + exp(β1Zi1 + β2Zi2)}, and the remaining 20% of the Y values were fixed at a constant (ζ), where Zi1 ~ Bernoulli(0.5), Zi2 ~ Bernoulli(0.5) and β1 = β2 = 1. Three repeated observations were assumed and the logit-link function was employed. Each scenario was replicated 1,000 times. β̂i = empirical mean of estimated βi values and SE = empirical mean of the estimated standard errors for β̂i. Some numbers are written in scientific notation (e.g. 4.34E+15 = 4.34 × 10^15).
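The data-generating design in the table note can be sketched as follows. This is a simplified illustration under stated assumptions: it generates a single response per subject (the three repeated observations and the GEE fit itself are omitted), and the function name `generate_responses` and the seed are hypothetical.

```python
import math
import random

def generate_responses(n, zeta, beta1=1.0, beta2=1.0, seed=0):
    """Generate n responses: the first 80% follow the inverse-logit of the
    linear predictor, the last 20% are fixed at the constant zeta."""
    rng = random.Random(seed)
    n_model = (4 * n) // 5  # first 80% of the subjects
    rows = []
    for i in range(n):
        z1 = 1 if rng.random() < 0.5 else 0  # Z_i1 ~ Bernoulli(0.5)
        z2 = 1 if rng.random() < 0.5 else 0  # Z_i2 ~ Bernoulli(0.5)
        if i < n_model:
            lin = beta1 * z1 + beta2 * z2
            y = math.exp(lin) / (1.0 + math.exp(lin))
        else:
            y = float(zeta)  # deviated response
        rows.append((z1, z2, y))
    return rows

data = generate_responses(n=100, zeta=2)
# Count responses outside the [0, 1] range implied by the logit link.
out_of_range = sum(1 for _, _, y in data if not 0.0 <= y <= 1.0)
```

With ζ = 1 the fixed responses sit exactly on the boundary of [0, 1], whereas for ζ ≥ 2 a fifth of the responses lie outside the range that a logit-link mean can reach, which is the situation in which the tabulated estimates diverge for smaller n.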

The GEE method works well when the degree of deviation is small (ζ = 1) or the sample size is large (n = 500). However, performance deteriorates when the deviation is too large (ζ = 4), and even a large sample size (n = 500) does not overcome this deviation. We observed similar results when we applied the original PO-based regression method to the proportional odds model. Under the PH model, the log of the cumulative hazard function can be expressed as

$$\log \Lambda(t \mid Z) = \log \Lambda_0(t) + \beta^T Z. \tag{17}$$

If we define the ith pseudo-observation for the cumulative hazard function as $\xi_i = n\hat{\Lambda}(t) - (n - 1)\hat{\Lambda}^{-i}(t)$, where $\hat{\Lambda}^{-i}(t)$ is the estimate computed without the ith observation, we can use the log-link function in the GEE to estimate β. Note that this representation still carries a range restriction because Λ(t) is non-negative. A further simulation study was devised for the newly constructed ξi under the PH model.
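The jackknife construction can be illustrated with a minimal sketch. Two simplifying assumptions are made here that differ from the setting of the paper: the data are complete (uncensored) event times, and the estimators are the empirical survival function and its plug-in cumulative hazard, −log Ŝ(t), rather than the NPMLE for doubly censored data. All names and data values are hypothetical.

```python
import math

def empirical_survival(times, t):
    """Empirical survival function for complete (uncensored) data."""
    return sum(1 for x in times if x > t) / len(times)

def cumulative_hazard(times, t):
    """Plug-in cumulative hazard, -log S(t); assumes S(t) > 0."""
    return -math.log(empirical_survival(times, t))

def pseudo_observations(times, t, estimator):
    """Jackknife pseudo-observations: n * theta_hat - (n - 1) * theta_hat^{-i}."""
    n = len(times)
    full = estimator(times, t)
    values = []
    for i in range(n):
        leave_one_out = times[:i] + times[i + 1:]
        values.append(n * full - (n - 1) * estimator(leave_one_out, t))
    return values

times = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical complete event times
eta_po = pseudo_observations(times, 2.5, empirical_survival)  # PO for S(t)
xi_po = pseudo_observations(times, 2.5, cumulative_hazard)    # PO for Lambda(t)
```

In this toy setting the pseudo-observations for S(t) reduce to the indicators I(Ti > t), while some pseudo-observations for Λ(t) are negative even though Λ(t) itself is non-negative. That is one way to see why a log link applied to such pseudo-observations can run into the range restriction mentioned above.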

As shown in Table 7, the PO-CL (ξi) method still produces quite unstable coefficient estimates, but the medians of its estimated βi values are much closer to the true values than those of PO-CL (ηi). Again, these results underscore that it is better to remove the range restriction when constructing the POs. The choice of functional used for the POs does not affect the performance of the proposed method. In addition, this approach is more robust to changes in the number of time points than most earlier work on pseudo-observations. As the number of time points increases, the efficiency of the covariate effect estimates seems to increase. However, we selected five time points in the example in view of computation time. For further guidelines on how to choose such points, one may refer to Andersen et al.23 For the working correlation matrix, the first-order autoregressive structure is used. Choosing another correlation structure, such as independence, seems to have no important effect on the parameter estimates.
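For readers less familiar with working correlation structures, the two mentioned above can be written out explicitly. The sketch below builds them for five time points, as in the example; the value ρ = 0.5 is an arbitrary illustrative choice, since in a GEE fit the correlation parameter is estimated from the data.

```python
def ar1_working_correlation(n_times, rho):
    """First-order autoregressive working correlation: R[j][k] = rho ** |j - k|."""
    return [[rho ** abs(j - k) for k in range(n_times)] for j in range(n_times)]

def independence_working_correlation(n_times):
    """Independence working correlation: the identity matrix."""
    return [[1.0 if j == k else 0.0 for k in range(n_times)] for j in range(n_times)]

# Five time points; rho = 0.5 is a hypothetical value for illustration.
R_ar1 = ar1_working_correlation(5, 0.5)
R_ind = independence_working_correlation(5)

for row in R_ar1:
    print(" ".join(f"{value:.4f}" for value in row))
```

Under the AR(1) structure, pseudo-observations at adjacent time points are treated as most strongly correlated, with the correlation decaying geometrically as the time points move apart.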

Table 7.

Simulation results for the PH model.

| PO-CL | β̂1 (β1 = 0.5) | β̄1 | SE (β1) | ESE (β1) | 95.CP (β1) | β̂2 (β2 = 0.25) | β̄2 | SE (β2) | ESE (β2) | 95.CP (β2) |
|---|---|---|---|---|---|---|---|---|---|---|
| ηi | −5E+12 | −0.39 | 8E+11 | 8E+13 | 0.34 | −1E+13 | −0.25 | 1E+12 | 1E+14 | 0.72 |
| ξi | −2E+14 | 0.62 | 3E+13 | 4E+15 | 0.84 | −1E+13 | 0.22 | 4E+13 | 2E+14 | 0.84 |
| νi | 0.42 | 0.48 | 0.37 | 0.35 | 0.83 | 0.24 | 0.24 | 0.37 | 0.37 | 0.86 |

Sample size (n) = 100, number of time points (J) = 5, β̂i = empirical mean of estimated βi values, β̄i = empirical median of estimated βi values, SE = empirical mean of estimated standard errors for β̂i, ESE = empirical standard error for β̂i, 95.CP = coverage probability of true βi by the 95% confidence intervals. νi and ηi denote the PO constructed with the link function incorporated and the original ith PO, respectively. Furthermore, ξi denotes the PO-construction for Λ(t), for which the log-link function was employed in the GEE.

Model misspecification is another important aspect that must be considered carefully in practice. Incorrect choices of the link function or the omission of important covariates may lead to biased inference. An additional advantage of this approach is that it easily permits one to assess the model goodness-of-fit. As such, residuals analogous to those in generalized linear models can be used to assess the correct covariate functional form. Assuming the correct link function g(·), the standardized raw residuals should be randomly scattered when plotted against the respective covariate. Given the relative paucity of such tools in the doubly censored data literature, this PO-based modeling approach could also be considered a useful addition to statistical practice.

In the example, we selected the proportional odds model instead of the proportional hazards model on the basis of the pseudo-residual goodness-of-fit assessment. The conditional likelihood method for estimating the univariate survival function implicitly assumes that the infection time and the onset time of disease are independent. Although this assumption seems reasonable in most situations, dependence between the two event times may arise.25 The dependence structure of the two event times can be formulated with the gamma frailty copula model, for which pseudo self-consistency equations have been derived.51 POs that account for the two dependent event times could therefore be obtained using the gamma frailty copula model. We intend to extend the PO-based regression method to dependent event times in future work. An R program for the PO-based regression method can be found at http://hanseungbong.wordpress.com/.

Table 5.

Prostate cancer trial example results under the proportional odds model (PO-CL: νi).

| Analysis | Variable | β̂ | 95% CI for β̂ | P-value |
|---|---|---|---|---|
| Univariate | Docetaxel without doxercalciferol | reference | reference | reference |
| Univariate | Docetaxel with doxercalciferol | 0.626 | (−0.494, 1.748) | 0.273 |
| Multivariable | Docetaxel without doxercalciferol | reference | reference | reference |
| Multivariable | Docetaxel with doxercalciferol | 0.816 | (−0.357, 1.991) | 0.173 |
| Multivariable | Alkaline phosphatase level | 0.263 | (0.005, 0.522) | 0.046 |
| Multivariable | Hemoglobin | −0.069 | (−0.136, −0.002) | 0.043 |
| Multivariable | Lactate dehydrogenase | −0.160 | (−0.334, 0.013) | 0.071 |

Tabulated results are the estimated regression coefficients (β̂), 95% confidence intervals for β̂ and the corresponding P-values.

Acknowledgements

The authors would like to thank two anonymous referees for their valuable comments and suggestions.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Adin-Cristian Andrei’s research is supported in part by the following grants: P30 CA014520-36, UL1 RR025011-03, R21 CA132267-02 and W81XWH-08-1-0341. Kam-Wah Tsui’s research is supported in part by the NSF grant DMS-0604931.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

  • 1. Sun J. The statistical analysis of interval-censored failure time data. New York: Springer-Verlag, 2006.
  • 2. De Gruttola V and Lagakos SW. Analysis of doubly-censored survival data, with application to AIDS. Biometrics 1989; 45: 1–12.
  • 3. Attia S, Eickhoff J, Wilding G, et al. Randomized, double-blinded phase II evaluation of docetaxel with or without doxercalciferol in patients with metastatic, androgen-independent prostate cancer. Clin Cancer Res 2008; 14: 2437–2443.
  • 4. Bacci G, Ferrari S, Longhi A, et al. Therapy and survival after recurrence of Ewing’s tumors: the Rizzoli experience in 195 patients treated with adjuvant and neoadjuvant chemotherapy from 1979 to 1997. Ann Oncol 2003; 14: 1654–1659.
  • 5. Barker LM, Pendergrass TW, Sanders JE, et al. Survival after recurrence of Ewing’s sarcoma family of tumors. J Clin Oncol 2005; 23: 4354–4362.
  • 6. Crompton BD, Goldsby RE, Weinberg VK, et al. Survival after recurrence of osteosarcoma: a 20-year experience at a single institution. Pediatr Blood Cancer 2006; 47: 255–259.
  • 7. Yamashita H, Toyama T, Nishio M, et al. p53 protein accumulation predicts resistance to endocrine therapy and decreased post-relapse survival in metastatic breast cancer. Breast Cancer Res 2006; 8: R48.
  • 8. Longhi A, Mariani E and Kuehn J. A randomized study with adjuvant mistletoe versus oral etoposide on post relapse disease-free survival in osteosarcoma patients. Eur J Integr Med 2009; 1: 27–33.
  • 9. Anderson JR, Cain KC and Gelber RD. Analysis of survival by tumor response. J Clin Oncol 1983; 1: 710–718.
  • 10. Anderson JR, Cain KC and Gelber RD. Analysis of survival by tumor response and other comparisons of time-to-event by outcome variables. J Clin Oncol 2008; 26: 3913–3915.
  • 11. Bacchetti P and Jewell NP. Nonparametric estimation of the incubation period of AIDS based on a prevalent cohort with unknown infection times. Biometrics 1991; 47: 947–960.
  • 12. Sun J. Empirical estimation of a distribution function with truncated and doubly interval-censored data and its application to AIDS studies. Biometrics 1995; 51: 1096–1104.
  • 13. Gómez G and Calle ML. Non-parametric estimation with doubly censored data. J Appl Stat 1999; 26: 45–58.
  • 14. Kim MY, De Gruttola V and Lagakos SW. Analyzing doubly censored data with covariates, with application to AIDS. Biometrics 1993; 49: 13–22.
  • 15. Sun J, Liao Q and Pagano M. Regression analysis of doubly censored failure time data with application to AIDS studies. Biometrics 1999; 55: 909–914.
  • 16. Goggins WB, Finkelstein DM and Zaslavsky AM. Applying the Cox proportional hazards model for analysis of latency data with interval censoring. Stat Med 1999; 18: 2737–2747.
  • 17. Lin DY and Ying Z. Semiparametric analysis of the additive risk model. Biometrika 1994; 81: 61–71.
  • 18. Sun L, Kim Y and Sun J. Regression analysis of doubly censored failure time data using the additive hazards model. Biometrics 2004; 60: 637–643.
  • 19. Pan W. A multiple imputation approach to regression analysis for doubly censored data with application to AIDS studies. Biometrics 2001; 57: 1245–1250.
  • 20. Zhang W, Zhang Y, Chaloner K, et al. Imputation methods for doubly censored HIV data. J Stat Comput Simul 2009; 79: 1245–1257.
  • 21. Zhang W, Chaloner K, Cowles MK, et al. A Bayesian analysis of doubly censored data using a hierarchical Cox model. Stat Med 2008; 27: 529–542.
  • 22. Komarek A, Lesaffre E, Harkanen T, et al. A Bayesian analysis of multivariate doubly-interval-censored dental data. Biostatistics 2005; 6: 145–155.
  • 23. Komarek A and Lesaffre E. Bayesian accelerated failure time model with multivariate doubly interval-censored data and flexible distributional assumptions. J Am Stat Assoc 2008; 103: 523–533.
  • 24. Komarek A. bayesSurv: Bayesian survival regression with flexible error and random effects distributions. R package version 0.6-2, http://CRAN.R-project.org/package=bayesSurv (2010, accessed 4 July 2013).
  • 25. Jara A, Lesaffre E, Iorio MD, et al. Bayesian semiparametric inference for multivariate doubly-interval-censored data. Ann Appl Stat 2010; 4: 2126–2149.
  • 26. Jara A, Hanson T, Quintana F, et al. DPpackage: Bayesian semi- and nonparametric modeling in R. J Stat Softw 2010; 40: 1–30.
  • 27. Yu B. A Bayesian MCMC approach to survival analysis with doubly-censored data. Comput Stat Data Anal 2010; 54: 1921–1929.
  • 28. Andersen PK, Klein JP and Rosthøj S. Generalized linear models for correlated pseudo-observations with applications to multi-state models. Biometrika 2003; 90: 15–27.
  • 29. Andersen PK, Hansen MG and Klein JP. Regression analysis of restricted mean survival time based on pseudo-observations. Lifetime Data Anal 2004; 10: 335–350.
  • 30. Klein JP and Andersen PK. Regression modeling for competing risks data based on pseudo-values of the cumulative incidence function. Biometrics 2005; 61: 223–229.
  • 31. Andrei AC and Murray S. Regression models for the mean of quality-of-life-adjusted restricted survival time using pseudo-observations. Biometrics 2007; 63: 398–404.
  • 32. Logan BR, Klein JP and Zhang MJ. Comparing treatments in the presence of crossing survival curves: an application to bone marrow transplantation. Biometrics 2008; 64: 733–740.
  • 33. Logan BR, Zhang MJ and Klein JP. Marginal models for clustered time-to-event data with competing risks using pseudovalues. Biometrics 2011; 67: 1–7.
  • 34. Andersen PK and Perme MP. Pseudo-observations in survival analysis. Stat Methods Med Res 2010; 19: 71–99.
  • 35. Han S, Andrei A-C and Tsui K-W. A semiparametric regression method for interval-censored data. Commun Stat-Simul Comput 2013; doi: 10.1080/03610918.2012.697962, in press.
  • 36. Tukey JW. Bias and confidence in not quite large samples. Ann Math Stat 1958; 29: 614.
  • 37. Liang KY and Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986; 73: 13–22.
  • 38. Yan J and Fine JP. Estimating equations for association structures. Stat Med 2004; 23: 859–880.
  • 39. Andersen PK and Klein JP. Regression analysis for multistate models based on a pseudo-value approach with applications to bone marrow transplantation studies. Scand J Stat 2007; 34: 3–16.
  • 40. Scheike T and Zhang MJ. Direct modelling of regression effects for transition probabilities in multistate models. Scand J Stat 2007; 34: 17–32.
  • 41. Yan J. geepack: yet another package for generalized estimating equations. R-News 2002; 12–14.
  • 42. Vincent JC. gee: generalized estimation equation solver. R package version 4.13–17, http://CRAN.R-project.org/package=gee (2011, accessed 4 July 2013).
  • 43. Gómez G and Lagakos SW. Estimation of the infection time and latency distribution of AIDS with doubly censored data. Biometrics 1994; 50: 204–212.
  • 44. Sun J. Self-consistency estimation of distributions based on truncated and doubly censored data with applications to AIDS cohort studies. Lifetime Data Anal 1997; 3: 305–313.
  • 45. Therneau T and Lumley T. Survival: survival analysis, including penalised likelihood. R package version 2.35–7, http://CRAN.R-project.org/package=survival (2009, accessed 4 July 2013).
  • 46. Martinussen T and Scheike T. Dynamic regression models for survival data. New York: Springer-Verlag, 2006.
  • 47. Scheike T, Martinussen T and Silver J. timereg: timereg package for flexible regression models for survival data. R package version 1.2–5, http://CRAN.R-project.org/package=timereg (2009, accessed 4 July 2013).
  • 48. American Cancer Society. Cancer facts & figures 2012. Atlanta, GA: American Cancer Society, 2012.
  • 49. Wellner JA and Zhan Y. A hybrid algorithm for computation of the nonparametric maximum likelihood estimator from censored data. J Am Stat Assoc 1997; 92: 945–959.
  • 50. Perme MP and Andersen PK. Checking hazard regression models using pseudo-observations. Stat Med 2008; 27: 5309–5328.
  • 51. Jiang H, Fine JP, Kosorok MR, et al. Pseudo self-consistent estimation of a copula model with informative censoring. Scand J Stat 2005; 32: 1–20.
