Author manuscript; available in PMC: 2023 Mar 14.
Published in final edited form as: Stat Med. 2022 Mar 28;41(16):3003–3021. doi: 10.1002/sim.9399

Regression modeling of restricted mean survival time for left-truncated right-censored data

Rong Rong 1,2, Jing Ning 3, Hong Zhu 2
PMCID: PMC10014036  NIHMSID: NIHMS1876336  PMID: 35708238

Abstract

The restricted mean survival time (RMST) is a clinically meaningful summary measure in studies with survival outcomes. Statistical methods have been developed for regression analysis of RMST to investigate impacts of covariates on RMST, which is a useful alternative to the Cox regression analysis. However, existing methods for regression modeling of RMST are not applicable to left-truncated right-censored data that arise frequently in prevalent cohort studies, for which the sampling bias due to left truncation and informative censoring induced by the prevalent sampling scheme must be properly addressed. The pseudo-observation (PO) approach has been used in regression modeling of RMST for right-censored data and competing-risks data. For left-truncated right-censored data, we propose to directly model RMST as a function of baseline covariates based on POs under general censoring mechanisms. We adjust for the potential covariate-dependent censoring or dependent censoring by the inverse probability of censoring weighting method. We establish large sample properties of the proposed estimators and assess their finite sample performances by simulation studies under various scenarios. We apply the proposed methods to a prevalent cohort of women diagnosed with stage IV breast cancer identified from surveillance, epidemiology, and end results-medicare linked database.

Keywords: general censoring mechanisms, inverse probability of censoring weighting, left-truncated right-censored data, pseudo-observations, restricted mean survival time

1 |. INTRODUCTION

The restricted mean survival time (RMST) is a clinically relevant summary measure in studies with survival outcomes. Unlike with uncensored data, the mean survival time may not be estimable in the presence of censoring. As an alternative to the mean or median survival time, the RMST, defined as the expected survival time up to a fixed time point τ, has been suggested:1–3 $\mu(\tau)=E(T\wedge\tau)=\int_0^{\tau}S(t)\,dt$, which is the area under the survival curve over the time interval [0,τ]. The RMST can be consistently estimated even when the largest observed time is censored as long as τ is no larger than the largest failure time. The difference or ratio of RMST characterizes the absolute magnitude of risk or benefit in survival and is a useful alternative measure to the hazard ratio from the Cox regression analysis. Although the Cox proportional hazards model is commonly used for exploring the relationship between survival and covariates, the validity of the proportional hazards assumption is often questionable and can be hard to check analytically for certain types of survival data, such as left-truncated right-censored data. In contrast, RMST is an easily interpretable measure of average survival time over a fixed follow-up period and does not require any such model assumption. Therefore, analysis based on RMST is more desirable in clinical settings, especially when the proportional hazards assumption is violated. Existing methods for estimating RMST fall into indirect and direct approaches. The indirect methods estimate RMST through the Cox proportional hazards model.2,4–6 Such indirect RMST estimation is inconvenient and requires the proportional hazards assumption to some extent. Hence, directly modeling RMST itself is more appealing. For right-censored data, Andersen et al3 proposed a regression analysis of RMST given baseline covariates using pseudo-observations (POs). 
Tian et al7 modeled the relationship between RMST and baseline covariates through a link function under covariate-independent censoring. Wang and Schaubel8 developed generalized estimating equation methods to model RMST as a function of baseline covariates under general censoring mechanisms.

Left-truncated right-censored data are frequently encountered in prevalent cohort studies, in which diseased patients who have not yet experienced the disease-related failure event (eg, death) are sampled and prospectively followed for the subsequent failure event.9 One motivating example of such data is a prevalent cohort of late-stage breast cancer patients identified from the Surveillance, Epidemiology, and End Results (SEER)-Medicare linked database. The study cohort consists of patients who were diagnosed with Stage IV breast cancer before the sampling time and were still alive at the sampling time, and the goal is to investigate the impact of covariates on RMST among patients with Stage IV breast cancer. In addition to right censoring, survival data from a prevalent cohort are subject to left truncation because patients who died before the sampling time are not included in the study cohort. Statistical methods must account for the sampling bias due to left truncation and informative censoring induced by the prevalent sampling scheme. Although much research has been conducted into both regression analysis of left-truncated right-censored data10–13 and direct regression analysis of RMST for right-censored data,3,7,8 relatively little work is available on direct regression modeling of RMST for left-truncated right-censored data. To our knowledge, only one paper by Lee et al14 studied direct regression analysis of RMST for length-biased right-censored data, a special type of left-truncated right-censored data that assumes a constant disease incidence rate. That paper was mostly concerned with covariate-independent censoring and constructed unbiased estimating equations to obtain consistent estimators of covariate effects on RMST. In observational cohort studies, covariate-dependent censoring or dependent censoring occurs frequently when the censoring time and failure time are correlated through common baseline covariates or possibly time-varying covariates, respectively. 
For example, in AIDS studies, patients who have low CD4 counts (an indicator of immune function in patients living with HIV) are more likely to drop out of the study, resulting in overestimation of the overall survival if covariate-independent censoring is assumed. Methods for regression modeling of RMST need to take into account covariate-dependent censoring and dependent censoring. The inverse probability of censoring weighting (IPCW) method, discussed by Robins and Rotnitzky,15 Robins,16 and Robins and Finkelstein17 among others, can be used to correct for the bias due to covariate-dependent censoring and dependent censoring. For right-censored data, Xiang and Murray18 developed a model for the log of RMST given baseline covariates by using POs that account for dependent censoring. In this article, we will consider general censoring mechanisms in regression modeling of RMST for left truncated right-censored data.

POs are jackknife estimates that represent the contribution of each subject to the estimator of the parameter of interest.19 POs are usually used to study the bias and precision of the parameter estimator. Andersen et al introduced an approach of using POs in the regression analysis of right-censored data.3,19 The PO approach has also been used in regression modeling of competing risks data20,21 and the Cox regression analysis of left-truncated right-censored data.22 In this article, we propose to extend the PO approach in Andersen et al3 to left truncated right-censored data and directly model RMST as a function of baseline covariates based on POs under general censoring mechanisms. The PO approach has the advantage of handling complex issues related to left truncation and right censoring in the first step of generating POs and then using POs as responses in a generalized linear model for uncensored data. The remainder of this article is organized as follows. In Section 2, we introduce the left-truncated right-censored data structure with notations and describe the regression model of RMST. In Section 3, we first present the proposed method for regression modeling of RMST given covariates using POs under covariate-independent censoring. Then, we relax the covariate-independent censoring assumption to incorporate covariate-dependent censoring and dependent censoring. We investigate the finite sample performances of proposed estimators by simulation studies under various scenarios in Section 4. As an illustration, we apply the proposed methods to a prevalent cohort of women diagnosed with late-stage breast cancer identified from SEER-medicare linked database in Section 5. We provide concluding remarks in Section 6. Technical details can be found in the Appendix.

2 |. DATA, NOTATIONS, AND REGRESSION MODEL

In a prevalent cohort study, patients with a certain disease are sampled or enrolled and then followed prospectively until the occurrence of a failure event or censoring. We are interested in studying the underlying relationship between the RMST and baseline covariates through regression modeling based on the PO approach. Let $\tilde{T}$ be the time from the disease onset to the failure event (unbiased failure time). Let $\tilde{A}$ be the time from disease onset to study enrollment. Under the prevalent sampling, the failure time $\tilde{T}$ is not randomly sampled from the target population because patients who experienced the failure event prior to the enrollment are not included. Hence, patients in the prevalent cohort all have $\tilde{A}<\tilde{T}$. Let $T$ be the sampled failure time from the disease onset (biased failure time) and $A$ be the corresponding truncation time. For the sampled patients, let $V$ be the time from enrollment to the failure event, and we have $T=A+V$, where $V$ is subject to right censoring. Let $C$ be the residual censoring time from enrollment. Let $Y=\min(T,A+C)$ be the follow-up time until the failure event or censoring and $\delta=I(V<C)$ be the failure indicator. Let $X$ be a $p\times 1$ vector of baseline covariates. The observed data are $(Y_i,A_i,\delta_i,X_i)$, $i=1,2,\ldots,n$. Let τ be a prespecified time point of interest from the disease onset and $\tilde{T}\wedge\tau=\min(\tilde{T},\tau)$ be the restricted survival time for a fixed τ. The RMST is then defined as $\mu(\tau)=E[\tilde{T}\wedge\tau]$. Throughout this article, we use these notations and assume that $\tilde{T}$ and $\tilde{A}$ are conditionally independent given covariates $X$, which is a standard assumption for left-truncated right-censored data. Note that the biased failure time $T$ is correlated with the censoring time from the disease onset, $A+C$, through the common variable $A$, which is referred to as informative censoring induced by prevalent sampling. Our goal is to directly model the relationship between RMST and covariates through a generalized linear model:

$g[\mu(\tau\mid Z)]=Z\beta_{\tau}, \quad (1)$

where $g(\cdot)$ is a differentiable, strictly increasing link function, $Z=(1,X)$, and $\beta_{\tau}$ is a $(p+1)\times 1$ coefficient vector specific to τ. Examples of common link functions include the linear link $g(m)=m$ and the log link $g(m)=\log(m)$. The linear link function leads to a simple linear regression of RMST, where the covariate effects can be interpreted as differences in the RMST. However, since the linear model may produce negative fitted values that are not meaningful for RMST, the log-linear model under the log link function is a natural alternative, where the covariate effects can be interpreted as ratios of the RMST.23

3 |. REGRESSION MODELING OF RMST BASED ON PSEUDO-OBSERVATIONS

POs for regression analysis of RMST can be defined by using a consistent estimator $\hat{\mu}(\tau)$ for the parameter of interest $\mu(\tau)$.3 For conventional right-censored data, a consistent estimator of $\mu(\tau)$ is $\hat{\mu}(\tau)=\hat{E}(\tilde{T}\wedge\tau)=\int_0^{\tau}\hat{S}(t)\,dt$, where $\hat{S}(t)$ is the Kaplan-Meier estimator for the survival function $P(\tilde{T}>t)$. Then, the $i$th PO is computed as

$\hat{\mu}_i(\tau)=n\hat{\mu}(\tau)-(n-1)\hat{\mu}^{-i}(\tau), \quad (2)$

where $\hat{\mu}^{-i}(\tau)$ is the jackknife leave-one-out estimator for $\mu(\tau)$ based on data leaving out subject $i$. The rationale behind the PO approach is that any estimator of $\mu(\tau)=E(\tilde{T}\wedge\tau)$ is also implicitly an estimator of $E_Z\{E(\tilde{T}\wedge\tau\mid Z)\}$, where the inner expectation is the quantity of interest in the regression model (1) and the outermost expectation is taken with respect to the empirical distribution of $Z$. Let $\tilde{\mu}(\tau)=\frac{1}{n}\sum_{j=1}^{n}E(\tilde{T}\wedge\tau\mid Z_j)$ be a consistent estimator of $E_Z\{E(\tilde{T}\wedge\tau\mid Z)\}$. Then, the corresponding $i$th PO is $n\tilde{\mu}(\tau)-(n-1)\tilde{\mu}^{-i}(\tau)=n\cdot\frac{1}{n}\sum_{j=1}^{n}E(\tilde{T}\wedge\tau\mid Z_j)-(n-1)\cdot\frac{1}{n-1}\sum_{j=1,\,j\neq i}^{n}E(\tilde{T}\wedge\tau\mid Z_j)=E(\tilde{T}\wedge\tau\mid Z_i)$, which is the quantity of interest in regression modeling. As described in Andersen et al,3 since both $\hat{\mu}(\tau)$ and $\tilde{\mu}(\tau)$ are consistent estimators for $\mu(\tau)$ and they will be approximately equal when $n$ is large, $\hat{\mu}(\tau)$, which is estimable from censored survival data, can be used to replace $\tilde{\mu}(\tau)$, and formula (2) can be used to generate POs that have the same conditional mean of interest for regression modeling as the original individual-level data. In other words, models based on POs generated by (2) will have regression parameters similar to a model fit to the values of $\tilde{T}\wedge\tau$ if all these values were uncensored. Thus, the POs, $\mathrm{PO}=\{\hat{\mu}_1(\tau),\hat{\mu}_2(\tau),\ldots,\hat{\mu}_n(\tau)\}$, obtained from (2) can be used as responses of the regression model (1) to estimate $\beta_{\tau}$ under a generalized estimating equation framework.3 Graw et al24 and Overgaard et al25,26 provide the formal theoretical justification of the PO approach and the asymptotic properties of the parameter estimators. We extend the method in Andersen et al3 to left-truncated right-censored data where only the biased failure times are observable, and propose a modified PO approach to estimate and analyze RMST.
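The cancellation argument above can be checked numerically in the simplest setting. The following is a minimal sketch in Python with NumPy, assuming no censoring and no truncation, so that the RMST estimator is just the sample mean of min(T, τ). The function name `pseudo_observations` and the toy failure times are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def pseudo_observations(estimator, data):
    """Jackknife pseudo-observations: n * theta_hat - (n - 1) * theta_hat^{-i}."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    full = estimator(data)
    # Leave-one-out estimates, one per subject.
    loo = np.array([estimator(np.delete(data, i)) for i in range(n)])
    return n * full - (n - 1) * loo

tau = 2.0
times = np.array([0.5, 1.2, 2.7, 3.1, 0.9])  # toy, fully observed failure times

def rmst_complete(t):
    # With complete data the RMST estimate is the mean of min(T, tau).
    return np.mean(np.minimum(t, tau))

po = pseudo_observations(rmst_complete, times)
print(po)
```

In this complete-data case each PO equals the subject's own restricted survival time min(T_i, τ), which is why POs can stand in for the unobserved responses once censoring and truncation are handled by the estimator.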

First, we consider covariate-independent censoring, that is, the residual censoring time $C$ is independent of $(A,V)$. The Kaplan-Meier estimator would result in overestimation of the survival function for left-truncated right-censored data,27 and thus, overestimation of the RMST. The survival function $S(t)$ for such data can be consistently estimated by a product-limit estimator $\hat{S}_{PL}(t)$ with risk set $R(t)=\{i: A_i\le t\le Y_i\}$,28 and

$\hat{S}_{PL}(t)=\prod_{j:\,t_{(j)}\le t}\left(1-\frac{d_j}{r_j}\right),$

where $t_{(1)},\ldots,t_{(K)}$ denotes the set of $K$ distinct ordered failure times from uncensored $Y_i$ in the sample, $d_j=\sum_{i=1}^{n}I(Y_i=t_{(j)},\delta_i=1)$ is the number of failures at $t_{(j)}$, and $r_j=\sum_{i=1}^{n}I(A_i<t_{(j)}\le Y_i)$ is the number of subjects “at risk” right before the $j$th failure time. The risk set $R(t)$ at any time $t$ consists of subjects who have entered the study and have not failed or been censored by that time. Note that the difference between the Kaplan-Meier estimator for right-censored data and the product-limit estimator $\hat{S}_{PL}(t)$ is the definition of the risk set. For left-truncated right-censored data, $\hat{S}_{PL}(t)$ is similar to the Kaplan-Meier estimator, after replacing the risk set $R_{KM}(t)=\{i: t\le Y_i\}$ with $R(t)=\{i: A_i\le t\le Y_i\}$. The product-limit estimator $\hat{S}_{PL}(t)$ is the nonparametric maximum likelihood estimator of $S(t)$.29 A consistent estimator $\hat{\mu}(\tau)$ can be obtained by integrating the product-limit estimator of the survival function over the time interval [0,τ], $\hat{\mu}(\tau)=\int_0^{\tau}\hat{S}_{PL}(t)\,dt$, and is used to construct POs, $\mathrm{PO}=\{\hat{\mu}_1(\tau),\hat{\mu}_2(\tau),\ldots,\hat{\mu}_n(\tau)\}$, based on (2). The POs are then used as responses in the generalized linear model (1) with a suitable link function to estimate the regression parameters $\beta_{\tau}$ and predict $\mu(\tau\mid Z_i)$. The regression coefficients, $\beta_{\tau}$, can be estimated by the generalized estimating equations

$U(\beta_{\tau})=\sum_{i=1}^{n}U_i(\beta_{\tau})=\sum_{i=1}^{n}\left\{\frac{\partial}{\partial\beta_{\tau}}g^{-1}(Z_i\beta_{\tau})\right\}\mathcal{V}_i^{-1}\left\{\hat{\mu}_i(\tau)-g^{-1}(Z_i\beta_{\tau})\right\}=0, \quad (3)$

where $\mathcal{V}_i$ is a working variance of $\hat{\mu}_i(\tau)$30,31 with a simple choice of $\mathcal{V}_i=1$. Andersen et al32 showed that the estimates obtained from generalized estimating equations using POs are consistent for right-censored data. Since the nonparametric estimator $\hat{\mu}(\tau)$ based on the product-limit estimator is consistent for left-truncated right-censored data, we can also use (3) to obtain consistent estimates of the regression coefficients in model (1). Let $\hat{\beta}_{\tau}$ be the solution to (3) and $\beta_{\tau 0}$ be the true value of $\beta_{\tau}$. The asymptotic properties of $\hat{\beta}_{\tau}$ are summarized in Theorem 1 with the proof and regularity conditions provided in the Appendix.
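The first step of the procedure, integrating the product-limit curve with the delayed-entry risk set and jackknifing it, can be sketched as follows. This is a minimal illustration in Python with NumPy under covariate-independent censoring, not the authors' implementation. The function names and the three-subject toy data are assumptions, and the risk set is counted as A_i < t ≤ Y_i, following the definition of r_j.

```python
import numpy as np

def rmst_product_limit(A, Y, delta, tau):
    """Integrate the left-truncation product-limit survival curve over [0, tau]."""
    A, Y, delta = (np.asarray(v, dtype=float) for v in (A, Y, delta))
    fail_times = np.unique(Y[(delta == 1) & (Y <= tau)])
    surv, prev_t, area = 1.0, 0.0, 0.0
    for t in fail_times:
        area += surv * (t - prev_t)           # curve is flat on [prev_t, t)
        d = np.sum((Y == t) & (delta == 1))   # failures at t
        r = np.sum((A < t) & (t <= Y))        # entered before t, still followed at t
        if r > 0:
            surv *= 1.0 - d / r
        prev_t = t
    return area + surv * (tau - prev_t)

def rmst_pseudo_observations(A, Y, delta, tau):
    """Pseudo-observations n*mu_hat - (n-1)*mu_hat^{-i} for the RMST."""
    A, Y, delta = (np.asarray(v, dtype=float) for v in (A, Y, delta))
    n = len(Y)
    full = rmst_product_limit(A, Y, delta, tau)
    keep = ~np.eye(n, dtype=bool)             # row i drops subject i
    loo = np.array([rmst_product_limit(A[k], Y[k], delta[k], tau) for k in keep])
    return n * full - (n - 1) * loo

# Toy data: the third subject enters late (A = 2), so it is not at risk at t = 1.
A = [0.0, 0.0, 2.0]
Y = [1.0, 3.0, 4.0]
delta = [1, 1, 1]
mu_hat = rmst_product_limit(A, Y, delta, tau=5.0)
po = rmst_pseudo_observations(A, Y, delta, tau=5.0)
print(mu_hat, po)
```

With very small risk sets, as in this toy example, individual POs can fall below zero or above τ; the simulation section discusses how such extreme values can affect the subsequent regression.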

Theorem 1.

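Under some regularity conditions, $\hat{\beta}_{\tau}$ is consistent for $\beta_{\tau 0}$, and $\sqrt{n}(\hat{\beta}_{\tau}-\beta_{\tau 0})$ is asymptotically normal with mean zero and a covariance matrix that can be consistently estimated using a standard “sandwich” estimator, which takes the form

$\hat{\Sigma}=\mathcal{I}(\hat{\beta}_{\tau})^{-1}\,\widehat{\mathrm{var}}\{U(\beta_{\tau})\}\,\mathcal{I}(\hat{\beta}_{\tau})^{-1},$

where

$\mathcal{I}(\hat{\beta}_{\tau})=\sum_{i}\left\{\frac{\partial g^{-1}(Z_i\hat{\beta}_{\tau})}{\partial\hat{\beta}_{\tau}}\right\}\mathcal{V}_i^{-1}\left\{\frac{\partial g^{-1}(Z_i\hat{\beta}_{\tau})}{\partial\hat{\beta}_{\tau}}\right\}^{T},\qquad \widehat{\mathrm{var}}\{U(\beta_{\tau})\}=\sum_{i}U_i(\hat{\beta}_{\tau})U_i(\hat{\beta}_{\tau})^{T}.$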
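For the linear link g(m) = m with working variance 𝒱_i = 1, the estimating equation (3) reduces to least squares of the POs on Z, and the sandwich form can be written in closed form. The sketch below (Python with NumPy) illustrates this special case only. The function name and toy POs are assumptions, and the sandwich built from sums rather than averages is read here as an estimate of the covariance of the coefficient estimate itself, without the √n scaling.

```python
import numpy as np

def fit_identity_gee(Z, po):
    """Solve sum_i Z_i (po_i - Z_i beta) = 0 and return (beta_hat, sandwich covariance)."""
    Z = np.asarray(Z, dtype=float)
    po = np.asarray(po, dtype=float)
    bread = Z.T @ Z                           # I(beta): sum_i Z_i^T Z_i for the identity link
    beta = np.linalg.solve(bread, Z.T @ po)
    resid = po - Z @ beta
    U = Z * resid[:, None]                    # row i holds U_i(beta_hat)
    meat = U.T @ U                            # sum_i U_i U_i^T
    bread_inv = np.linalg.inv(bread)
    return beta, bread_inv @ meat @ bread_inv

# Toy design: intercept plus one binary covariate, with made-up POs.
Z = [[1, 0], [1, 0], [1, 1], [1, 1]]
po = [1.0, 3.0, 4.0, 6.0]
beta_hat, cov_hat = fit_identity_gee(Z, po)
se_hat = np.sqrt(np.diag(cov_hat))
print(beta_hat, se_hat)
```

For a log link the same structure applies, but the estimating equation has no closed-form solution and would need an iterative solver such as Newton-Raphson.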

Second, covariate-independent censoring may be implausible in practice and covariate-dependent censoring often occurs in observational cohort studies, where the censoring time and failure time are only conditionally independent given baseline covariates. Furthermore, the censoring time may be correlated with the failure time through a mutual association with possibly time-varying covariates, which is referred to as dependent censoring. We relax the covariate-independent censoring assumption and model RMST under more general censoring mechanisms. The product-limit estimator is a consistent estimator of the survival function for left-truncated right-censored data only under covariate-independent censoring, so it is crucial to account for covariate-dependent censoring or dependent censoring to consistently estimate the survival function. The IPCW approach can be used to adjust for covariate-dependent censoring or dependent censoring by assigning extra weight to subjects who are not censored or who are observed.15–17,33 Each subject is assigned a weight inversely proportional to the estimated probability of remaining uncensored until time $t$ given covariates. The Cox proportional hazards model for censoring is frequently used to model the relationship between censoring time and covariates and to estimate this probability. For simplicity of discussion, we assume that the residual censoring time $C$ is conditionally independent of $(A,V)$, given baseline covariates $X$, although the covariate-dependent censoring assumption can be easily relaxed to dependent censoring by incorporating time-varying covariates $M(t)$ into the Cox model.17 The Cox model for the residual censoring time $C$ given covariates $X$ is:

$\lambda_C(t\mid X)=\lambda_{C0}(t)\exp(\alpha X),$

where $\lambda_{C0}$ is the baseline hazard function for censoring and $\alpha$ is the vector of model parameters. Let $\hat{\alpha}$ be the partial likelihood estimate of $\alpha$ and let the $\rho_{(j)}$ denote the distinct ordered residual censoring times. A consistent estimator of the conditional probability that subject $i$ remains uncensored through time $t$ given $X$ is provided by:

$\hat{K}_i(t)=\prod_{\{j:\,\rho_{(j)}<t,\ \delta_j=0\}}\left[1-\hat{\lambda}_{C0}(\rho_{(j)})\exp\{\hat{\alpha}X_i\}\right],$

where

$\hat{\lambda}_{C0}(\rho_{(j)})=\frac{1-\delta_j}{\sum_{i=1}^{n}\exp\{\hat{\alpha}X_i\}I(\rho_{(j)}\le Y_i)}$

is the Cox estimator of the baseline hazard function for censoring, $\lambda_{C0}$, with $I(\rho_{(j)}\le Y_i)$ being the at-risk indicator and $\delta$ being the failure indicator.17 The subject-specific IPCW weight is $\hat{W}_i(t)=1/\hat{K}_i(t)$. The contribution of subject $i$ at risk at any time $t_{(j)}$ is weighted by the subject-specific weight $\hat{W}_i(t_{(j)})$. The IPCW version of the product-limit estimator for $S(t)$ for left-truncated right-censored data is then given by

$\hat{S}_{IPCW}(t)=\begin{cases}1 & \text{if } t<t_{(1)},\\ \prod_{j:\,t_{(j)}\le t}\left[1-\dfrac{\sum_{i=1}^{n}I\{Y_i=t_{(j)},\delta_i=1\}\hat{W}_i(t_{(j)})}{\sum_{i=1}^{n}I\{(A_i,Y_i)\in R(t_{(j)})\}\hat{W}_i(t_{(j)})}\right] & \text{if } t\ge t_{(1)}.\end{cases}$

In the presence of covariate-dependent censoring, we can use SˆIPCW(t) to consistently estimate the survival function S(t) and further to obtain estimated RMST and corresponding POs. Then, the POs can be used in the generalized linear model (1) to estimate βτ, similar to the case under covariate-independent censoring. For right-censored data, Robins and Finkelstein17 provided proof of the consistency of SˆIPCW(t) for S(t) under dependent censoring. The consistency of the IPCW estimator also holds for the left-truncated right-censored data when the risk set is properly adjusted. Therefore, the resulting estimator of βτ is consistent and asymptotically normal, which can be proved similarly to Theorem 1 under covariate-independent censoring.
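The weighted product-limit form can be sketched as below, taking the IPCW weights as given. This is a minimal illustration in Python with NumPy and is not the authors' code. The callable `weight(i, s)`, assumed to return the estimated weight of subject i at time s, is hypothetical; fitting the Cox model for censoring and choosing the time scale on which the weight is evaluated are left outside the sketch.

```python
import numpy as np

def ipcw_product_limit(A, Y, delta, weight, t):
    """IPCW-weighted product-limit estimate of S(t) under left truncation.

    `weight(i, s)` is a hypothetical callable returning W_i(s) = 1 / K_i(s).
    """
    A, Y, delta = (np.asarray(v, dtype=float) for v in (A, Y, delta))
    n = len(Y)
    surv = 1.0
    for s in np.unique(Y[(delta == 1) & (Y <= t)]):
        w = np.array([weight(i, s) for i in range(n)])
        failed = (Y == s) & (delta == 1)      # weighted failures at s
        at_risk = (A < s) & (s <= Y)          # delayed-entry risk set at s
        surv *= 1.0 - np.sum(w[failed]) / np.sum(w[at_risk])
    return surv

A = [0.0, 0.0, 2.0]
Y = [1.0, 3.0, 4.0]
delta = [1, 1, 1]

# Unit weights give back the unweighted product-limit estimator.
s_unit = ipcw_product_limit(A, Y, delta, lambda i, s: 1.0, t=3.0)

# Made-up constant weights that up-weight the second subject.
toy_w = [1.0, 2.0, 1.0]
s_weighted = ipcw_product_limit(A, Y, delta, lambda i, s: toy_w[i], t=3.0)
print(s_unit, s_weighted)
```

Integrating this weighted curve over [0, τ] and applying formula (2) would then give IPCW-adjusted POs in the same way as in the unweighted case.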

4 |. SIMULATIONS

We conduct a series of simulations to assess the performance of the proposed methods for left-truncated right-censored data, under various scenarios. The failure time data are generated both under proportional hazards and under nonproportional hazards. For each scenario, the simulation was repeated 1000 times with a sample size of n=350 or 500.

4.1 |. Simulations under left truncation and covariate-independent censoring

First, we evaluate the performance of the proposed method under proportional hazards and covariate-independent censoring. We randomly assign each subject to one of two groups, A and B, with equal probability. Group A is treated as the reference. The assumed model for RMST is $\mu(\tau\mid x_1)=E(\tilde{T}\wedge\tau\mid X_1=x_1)=\beta_{\tau 0}+\beta_{\tau 1}x_1$ with the linear link function. The failure time $\tilde{T}$ follows the Cox proportional hazards model and is generated from a distribution with hazard function $\lambda(t\mid X_1=x_1)=\exp(\gamma x_1)$, where $\gamma=0.5$. The covariate $X_1$ is binary and equals 1 for subjects in group B and 0 for subjects in group A. The residual censoring time $C$ is generated from an exponential distribution with parameter $\lambda_C$, allowing for various levels of censoring (ie, censoring rates of 30% and 45%). The truncation time $\tilde{A}$ follows a Weibull distribution with scale parameter $\lambda_l$ and shape parameter $\alpha_l$, where $\lambda_l$ is such that the truncation rate is 30% when $\alpha_l=1$. Regression parameters in the RMST model are estimated at two values of τ, 0.69 and 1.39, which are approximately the 60th and 80th percentiles of the failure time $\tilde{T}$, respectively. The upper panel of Table 1 summarizes the simulation results. Next, we investigate how the proposed method performs under nonproportional hazards and covariate-independent censoring. Let $Z=(1,X_1)$. The failure time is generated from a distribution with hazard function $\lambda(t\mid Z=z)=\exp\{-z\gamma+z\zeta\log(8t)\}$, where $\gamma=(0.5,1)$ and $\zeta=(1,-0.3)$. Clearly, the proportional hazards assumption is not valid because the hazard ratio of the two groups varies over time. The rest of the data generating and estimation procedures are similar to the case under proportional hazards. The values of τ are set to be 0.69 and 1.39, which are approximately the 50th and 80th percentiles of the failure time, respectively. The lower panel of Table 1 summarizes the simulation results. 
Moreover, we carry out simulations with the log link function and the details of simulations are described in the Appendix with results summarized in Table B1.
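The proportional hazards design with the linear link can be sketched as follows, together with a closed-form check of the true coefficients. This is a minimal sketch in Python with NumPy, not the authors' simulation code. It assumes that 0.69 and 1.39 stand for log 2 and log 4, and the censoring rate `cens_rate` and truncation scale `trunc_scale` are placeholder values that are not calibrated to the 30% or 45% rates used in the paper.

```python
import numpy as np

def rmst_exponential(rate, tau):
    """RMST of an exponential failure time: integral of exp(-rate * t) over [0, tau]."""
    return (1.0 - np.exp(-rate * tau)) / rate

def true_coefficients(tau, gamma=0.5):
    """True (beta_0, beta_1) in the linear RMST model when the hazard is exp(gamma * x1)."""
    beta0 = rmst_exponential(1.0, tau)                    # group A: hazard exp(0) = 1
    beta1 = rmst_exponential(np.exp(gamma), tau) - beta0  # group B minus group A
    return beta0, beta1

def simulate_ltrc(n, gamma=0.5, cens_rate=0.5, trunc_scale=0.5, trunc_shape=1.0, seed=0):
    """Draw n left-truncated right-censored records with columns (Y, A, delta, X1)."""
    rng = np.random.default_rng(seed)
    rows = []
    while len(rows) < n:
        x1 = int(rng.integers(0, 2))                      # group B indicator
        t = rng.exponential(1.0 / np.exp(gamma * x1))     # onset-to-failure time
        a = trunc_scale * rng.weibull(trunc_shape)        # onset-to-enrollment time
        if a >= t:
            continue                                      # failed before enrollment: not sampled
        c = rng.exponential(1.0 / cens_rate)              # residual censoring time
        rows.append((min(t, a + c), a, int(t - a < c), x1))
    return np.array(rows)

beta_ln2 = true_coefficients(np.log(2.0))
beta_ln4 = true_coefficients(np.log(4.0))
data = simulate_ltrc(200)
print(beta_ln2, beta_ln4, data[:, 2].mean())
```

Under these assumptions the closed-form values agree, to three decimals, with the "True" column in the proportional hazards panel of Table 1.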

TABLE 1.

Simulation results under covariate-independent censoring and with linear link function

Proportional hazards
n τ Parameter True | 30% censoring rate: RB SD SE CP MSE | 45% censoring rate: RB SD SE CP MSE
350 0.69 β 0 0.500 −0.004 0.041 0.033 0.900 0.001 −0.008 0.048 0.034 0.893 0.001
β 1 −0.087 −0.023 0.067 0.057 0.956 0.003 −0.034 0.077 0.059 0.950 0.003
1.39 β 0 0.750 −0.005 0.074 0.056 0.917 0.003 −0.017 0.119 0.058 0.920 0.004
β 1 −0.205 0.005 0.116 0.092 0.954 0.008 −0.015 0.127 0.099 0.960 0.010
500 0.69 β 0 0.500 −0.008 0.040 0.028 0.927 0.001 −0.008 0.039 0.028 0.922 0.001
β 1 −0.087 −0.011 0.072 0.049 0.960 0.002 −0.003 0.065 0.050 0.958 0.003
1.39 β 0 0.750 −0.005 0.056 0.045 0.927 0.002 −0.011 0.071 0.047 0.912 0.002
β 1 −0.205 0.029 0.094 0.079 0.954 0.006 0.020 0.106 0.082 0.959 0.007
Nonproportional hazards
n τ Parameter True | 30% censoring rate: RB SD SE CP MSE | 45% censoring rate: RB SD SE CP MSE
350 0.69 β 0 0.497 −0.022 0.028 0.030 0.974 0.001 −0.022 0.030 0.031 0.967 0.001
β 1 0.126 0.024 0.032 0.034 0.979 0.001 0.032 0.037 0.036 0.971 0.001
1.39 β 0 0.568 −0.065 0.043 0.048 0.945 0.004 −0.063 0.045 0.051 0.956 0.004
β 1 0.437 0.021 0.060 0.060 0.950 0.004 0.016 0.064 0.064 0.953 0.004
500 0.69 β 0 0.497 −0.022 0.024 0.025 0.957 0.001 −0.022 0.024 0.026 0.970 0.001
β 1 0.126 0.024 0.028 0.029 0.970 0.001 0.024 0.028 0.030 0.965 0.001
1.39 β 0 0.568 −0.063 0.035 0.041 0.918 0.003 −0.069 0.038 0.043 0.903 0.003
β 1 0.437 0.021 0.048 0.050 0.954 0.003 0.021 0.052 0.053 0.947 0.003

Abbreviations: CP, empirical coverage probability; MSE, mean squared error, defined as bias² + SE²; RB, relative bias, defined as bias/true; SD, empirical standard deviation of 1000 parameter estimates; SE, average of estimated standard errors across 1000 iterations.

From Table 1, the estimation procedure performs well with generally very small relative biases and the estimated model-based standard errors (SEs) computed as the GEE sandwich estimator being close to the empirical standard deviations (SDs) in all scenarios. Increasing the censoring rate from 30% to 45% does not affect the relative bias much but tends to increase SEs and SDs slightly. Additionally, the estimated SEs and SDs decrease as the sample size increases. The coverage probabilities are generally close to the nominal level of 95%, with some slight undercoverage in estimating the intercept, β0. Nevertheless, the estimation of the regression coefficient, which is often the main focus, is reliable. It is noted that relative biases are larger in the nonproportional hazards scenario than those in the proportional hazards scenario, however, SDs and SEs under nonproportional hazards are considerably smaller than those under proportional hazards. Overall, there is no obvious directional trend when comparing the mean squared errors under nonproportional hazards and under proportional hazards, and coverage probabilities under these two scenarios are comparable. The results in Table B1 also suggest a good performance of the proposed method under various scenarios with the log link function.

In both Tables 1 and B1, the relative bias of the parameter estimates increases as τ increases. This effect is more pronounced for the estimation of the intercept, and the bias can be relatively large in some cases. For example, the relative bias for β0 is as large as 0.069 in absolute value at τ=1.39 in the lower panel of Table 1. This is possibly caused by the presence of extremely small negative-valued POs that behave as outliers in the subsequent GEE analysis. Similar problems were observed in the simulations for regression analysis of RMST with right-censored data using POs in Andersen et al,3 where the bias increases considerably as τ increases, especially under a high censoring rate, and in the simulations for estimating regression parameters in the Cox model with left-truncated right-censored data using POs in Grand et al.22 The bias observed at larger τ may be due to the low precision of the product-limit estimator in the tail of the survival function, which is amplified for the RMST estimator by the integration. Another important explanation is that under left truncation, there can be very few subjects at risk at the beginning so that the information in the data is too sparse. Grand et al22 suggested selecting a set of time points where the information is less sparse to improve the estimation procedure. Here, we use a “conditional survival function” approach to address this issue. This approach includes only subjects whose failure times are greater than $k$, where $k$ is the failure time corresponding to the cutoff value of the POs that separates out the outliers. Therefore, this approach is essentially modeling the RMST based on the conditional survival function given surviving beyond $k$, that is, $E(\tilde{T}\wedge\tau\mid\tilde{T}\wedge\tau>k)=\int_k^{\tau}P(\tilde{T}>t\mid\tilde{T}>k)\,dt$. The extremely small negative POs are identified by using a cutoff value obtained by subtracting 2 times the interquartile range from the first quartile. 
Table B3 in the Appendix includes the estimation results at τ=1.39 obtained by using the “conditional survival function” approach with linear and log link functions. It shows that the estimation performance is improved markedly compared with that in the lower panel of Table 1 and the lower panel of Table B1, respectively. The absolute value of the relative bias after using the “conditional survival function” approach is no more than 0.012 across different simulation settings. In addition, the coverage probability is closer to the nominal level of 95% after the adjustment, especially for β0.
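The outlier rule described above, first quartile minus 2 times the interquartile range, can be sketched in a few lines. This is a minimal illustration in Python with NumPy. The function name and toy POs are assumptions, and the quartiles use NumPy's default linear interpolation, which may differ from the quantile convention used in the paper. Mapping the cutoff to the failure time k is not shown because the text does not spell out that step.

```python
import numpy as np

def po_outlier_cutoff(po, multiplier=2.0):
    """Lower cutoff for pseudo-observations: Q1 - multiplier * IQR."""
    q1, q3 = np.percentile(po, [25, 75])
    return q1 - multiplier * (q3 - q1)

# Toy pseudo-observations with one extreme negative value.
po = np.array([1.0, 2.0, 3.0, 4.0, 5.0, -20.0])
cutoff = po_outlier_cutoff(po)
is_outlier = po < cutoff
print(cutoff, is_outlier)
```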

4.2 |. Simulations under left truncation and covariate-dependent censoring

We conduct the following two simulation studies to evaluate the proposed methods under covariate-dependent censoring and compare the parameter estimates obtained by using the traditional PO approach and the IPCW-adjusted PO approach. We first generate the survival data under a proportional hazards model. We randomly assign each subject to one of three groups, A, B, and C, with unequal proportions of 40%, 30%, and 30%. Group A is treated as the reference. We set the two covariates $X_1$ and $X_2$ as dummy variables indicating groups B and C, respectively. The assumed model of RMST is $\mu(\tau\mid x)=E(\tilde{T}\wedge\tau\mid X=x)=\beta_{\tau 0}+\beta_{\tau 1}x_1+\beta_{\tau 2}x_2$. The failure time is generated from a distribution with hazard function $\lambda(t\mid X)=\exp(\gamma X)$, where $\gamma=(0.5,1)$ and $X=(X_1,X_2)$. The residual censoring time is generated from an exponential distribution with parameter $\lambda_C=\lambda_{C0}\exp(4X_1+5X_2)$, which depends on the two covariates. Varying $\lambda_{C0}$ allows for various levels of censoring (ie, censoring rates of 30% and 45%). The truncation variable follows the same Weibull distribution as in Section 4.1, with a truncation rate of 30%. After the data are generated, we compute the IPCW-adjusted POs for the RMST at two values of τ, 0.69 and 1.39, which are approximately the 65th and 85th percentiles of the failure time, respectively. Table 2 summarizes the simulation results. Next, we generate the survival data under nonproportional hazards. The assumed model for RMST is $\mu(\tau\mid x)=E(\tilde{T}\wedge\tau\mid X=x)=\beta_{\tau 0}+\beta_{\tau 1}x_1+\beta_{\tau 2}x_2$. Let $Z=(1,X_1,X_2)$. The failure time is generated from a distribution with hazard function $\lambda(t\mid Z=z)=\exp\{-z\gamma+z\zeta\log(8t)\}$, where $\gamma=(0.5,1,1)$ and $\zeta=(1,-0.3,0)$. The residual censoring time is generated from an exponential distribution with parameter $\lambda_C=\lambda_{C0}\exp(X_1+2X_2)$, where $\lambda_{C0}$ is such that the censoring rate is 30% or 45%. The truncation variable remains the same with a truncation rate of 30%. As before, the values of τ are set to be 0.69 and 1.39, which are approximately the 45th and 80th percentiles of the failure time, respectively. 
Table 3 summarizes the simulation results. Likewise, we conduct simulations with the log link function and under covariate-dependent censoring. The details are provided in the Appendix with results summarized in Table B2.

TABLE 2.

Simulation results under proportional hazards and covariate-dependent censoring, and with linear link function

n cen% τ Parameter True | Unadjusted PO method: RB SD SE CP MSE | IPCW-adjusted PO method: RB SD SE CP MSE
350 30% 0.69 β 0 0.500 0.001 0.043 0.033 0.908 0.001 −0.008 0.037 0.030 0.910 0.001
β 1 −0.087 0.023 0.083 0.064 0.950 0.004 −0.080 0.061 0.056 0.933 0.003
β 2 −0.188 −0.059 0.086 0.077 0.931 0.006 0.048 0.086 0.074 0.959 0.006
1.39 β 0 0.750 0.012 0.072 0.061 0.927 0.004 −0.019 0.064 0.052 0.926 0.003
β 1 −0.205 −0.020 0.116 0.102 0.961 0.010 −0.083 0.108 0.093 0.945 0.009
β 2 −0.391 −0.118 0.161 0.122 0.903 0.017 0.046 0.130 0.125 0.980 0.016
350 45% 0.69 β 0 0.500 0.016 0.042 0.037 0.905 0.001 −0.006 0.048 0.034 0.900 0.001
β 1 −0.087 0.011 0.076 0.065 0.958 0.004 0.023 0.090 0.068 0.957 0.005
β 2 −0.188 −0.213 0.087 0.078 0.839 0.008 −0.027 0.110 0.087 0.942 0.008
1.39 β 0 0.750 0.028 0.106 0.072 0.941 0.006 −0.004 0.074 0.058 0.914 0.003
β 1 −0.205 −0.078 0.152 0.116 0.951 0.014 −0.001 0.136 0.115 0.957 0.013
β 2 −0.391 −0.312 0.142 0.129 0.785 0.032 −0.020 0.177 0.168 0.929 0.028
500 30% 0.69 β 0 0.500 −0.001 0.047 0.028 0.911 0.001 −0.012 0.039 0.027 0.926 0.001
β 1 −0.087 −0.003 0.070 0.052 0.952 0.003 −0.069 0.068 0.052 0.935 0.003
β 2 −0.188 −0.048 0.106 0.067 0.933 0.005 0.043 0.090 0.068 0.956 0.005
1.39 β 0 0.750 0.015 0.060 0.049 0.914 0.003 −0.016 0.056 0.044 0.906 0.002
β 1 −0.205 −0.024 0.091 0.084 0.956 0.007 −0.073 0.083 0.079 0.945 0.006
β 2 −0.391 −0.113 0.123 0.102 0.891 0.012 0.054 0.103 0.105 0.965 0.011
500 45% 0.69 β 0 0.500 0.014 0.045 0.032 0.909 0.001 0.002 0.038 0.028 0.916 0.001
β 1 −0.087 0.023 0.086 0.058 0.950 0.003 0.003 0.059 0.053 0.955 0.003
β 2 −0.188 −0.197 0.079 0.070 0.850 0.006 −0.021 0.086 0.071 0.950 0.005
1.39 β 0 0.750 0.025 0.091 0.062 0.929 0.004 −0.005 0.061 0.047 0.911 0.002
β 1 −0.205 −0.127 0.111 0.095 0.946 0.010 0.010 0.109 0.099 0.946 0.010
β 2 −0.391 −0.315 0.136 0.112 0.712 0.028 −0.023 0.161 0.149 0.928 0.022

Note: Estimates obtained by using the traditional pseudo-observation (PO) approach and the inverse probability of censoring weighting (IPCW)-adjusted PO approach are compared.

TABLE 3.

Simulation results under nonproportional hazards and covariate-dependent censoring, and with linear link function

n cen% τ Parameter True [Unadjusted PO method: RB SD SE CP MSE] [IPCW-adjusted PO method: RB SD SE CP MSE]
350 30% 0.69 β 0 0.497 −0.038 0.035 0.036 0.962 0.002 −0.028 0.031 0.034 0.976 0.001
β 1 0.126 0.103 0.041 0.042 0.955 0.002 0.056 0.038 0.040 0.967 0.002
β 2 0.109 0.110 0.040 0.042 0.963 0.002 0.073 0.039 0.041 0.966 0.002
1.39 β 0 0.568 −0.125 0.051 0.055 0.808 0.008 −0.086 0.047 0.054 0.922 0.005
β 1 0.437 0.117 0.076 0.072 0.895 0.008 0.055 0.077 0.071 0.927 0.006
β 2 0.310 0.165 0.075 0.071 0.887 0.008 0.135 0.074 0.074 0.921 0.007
350 45% 0.69 β 0 0.497 −0.048 0.036 0.037 0.960 0.002 −0.038 0.033 0.036 0.971 0.002
β 1 0.126 0.151 0.044 0.043 0.952 0.002 0.095 0.042 0.042 0.958 0.002
β 2 0.109 0.119 0.043 0.043 0.961 0.002 0.101 0.041 0.043 0.960 0.002
1.39 β 0 0.568 −0.171 0.054 0.059 0.665 0.013 −0.104 0.053 0.057 0.894 0.007
β 1 0.437 0.185 0.076 0.077 0.829 0.012 0.073 0.078 0.076 0.937 0.007
β 2 0.310 0.206 0.077 0.073 0.858 0.009 0.152 0.081 0.079 0.905 0.008
500 30% 0.69 β 0 0.497 −0.034 0.030 0.029 0.954 0.001 −0.032 0.028 0.029 0.961 0.001
β 1 0.126 0.087 0.036 0.035 0.963 0.001 0.063 0.036 0.035 0.961 0.001
β 2 0.109 0.101 0.035 0.035 0.955 0.001 0.110 0.034 0.035 0.957 0.001
1.39 β 0 0.568 −0.127 0.043 0.046 0.701 0.007 −0.090 0.042 0.045 0.856 0.005
β 1 0.437 0.112 0.062 0.061 0.872 0.006 0.050 0.062 0.060 0.936 0.004
β 2 0.310 0.181 0.062 0.059 0.837 0.007 0.139 0.061 0.061 0.906 0.005
500 45% 0.69 β 0 0.497 −0.046 0.029 0.031 0.952 0.001 −0.018 −0.036 0.030 0.969 0.001
β 1 0.126 0.135 0.036 0.036 0.957 0.002 0.087 0.033 0.035 0.964 0.001
β 2 0.109 0.101 0.035 0.036 0.959 0.001 0.101 0.033 0.036 0.962 0.001
1.39 β 0 0.568 −0.169 0.045 0.049 0.495 0.012 −0.104 0.042 0.049 0.858 0.006
β 1 0.437 0.176 0.066 0.064 0.767 0.010 0.073 0.062 0.063 0.931 0.005
β 2 0.310 0.206 0.064 0.060 0.800 0.008 0.132 0.065 0.067 0.912 0.006

Note: Estimates obtained by using the traditional pseudo-observation (PO) approach and the inverse probability of censoring weighting (IPCW)-adjusted PO approach are compared.

Tables 2 and 3 show that the IPCW-adjusted PO approach can substantially reduce the bias after accounting for covariate-dependent censoring, especially when the censoring rate is high or when the RMST is computed at a larger τ. For example, in Table 3, under the scenario of n=500, a censoring rate of 45%, and τ=1.39, the absolute relative bias is reduced from 0.169 to 0.104 for the estimate of the intercept β0, from 0.176 to 0.073 for the estimate of β1, and from 0.206 to 0.132 for the estimate of β2. Overall, the estimated standard errors (SEs) are close to the empirical standard deviations (SDs) in all scenarios. As the sample size increases from 350 to 500, SEs and SDs decrease. Moreover, the coverage probabilities under the IPCW-adjusted PO approach are closer to the nominal level of 95%, with some undercoverage in the estimation of the intercept, β0. The estimation of the regression coefficients is generally reliable. Similarly, Table B2 shows that the IPCW-adjusted PO approach greatly improves the estimation under various scenarios with the log link function. The substantial bias reduction achieved by the IPCW-adjusted PO approach suggests that when the censoring mechanism is more complicated than covariate-independent censoring (eg, covariate-dependent censoring), which is often the case in applications, the proposed method with IPCW adjustment outperforms the unadjusted PO approach. Although the IPCW-adjusted PO approach substantially reduces the bias, the bias is still relatively large at τ=1.39 in Tables 3 and B2, especially for the intercept β0, similar to that observed in the lower panels of Tables 1 and B1. This is probably due to the low precision of the RMST estimator at the tail and the sparse data at early event times under left truncation.
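The summary columns in these tables can be computed from the simulation replicates along the following lines. This is a minimal sketch under assumed definitions, not the authors' code: RB is taken as (mean estimate − true value)/true value, SD as the empirical standard deviation of the estimates, SE as the average estimated standard error, CP as the proportion of Wald-type 95% intervals that contain the true value, and MSE as the average squared error. The function name `summarize_replicates` is hypothetical.

```python
import numpy as np

def summarize_replicates(estimates, std_errors, truth, z=1.96):
    """Summarize simulation replicates for one regression parameter.

    The definitions below are assumptions made for illustration; the
    paper's exact definitions are given outside this section.
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    lower, upper = est - z * se, est + z * se
    return {
        "RB": (est.mean() - truth) / truth,    # relative bias
        "SD": est.std(ddof=1),                 # empirical standard deviation
        "SE": se.mean(),                       # average estimated standard error
        "CP": np.mean((lower <= truth) & (truth <= upper)),  # coverage
        "MSE": np.mean((est - truth) ** 2),    # mean squared error
    }

# Toy replicates (made-up numbers, for illustration only).
summary = summarize_replicates([0.9, 1.1, 1.0, 1.2], [0.1] * 4, truth=1.0)
```

Under these definitions a large relative bias combined with a small SE lowers the coverage probability, which is the pattern seen for the unadjusted PO method at τ=1.39.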

5 |. APPLICATION

The surveillance, epidemiology, and end results (SEER)-Medicare linked database is a population-based cancer registry that provides data for prevalent cohorts, which include patients who have already been diagnosed with cancers. We identified a prevalent cohort from the SEER-Medicare linked database that consists of patients who were diagnosed with stage IV breast cancer from 2002 to 2006 and survived beyond 2006, with a last follow-up date of December 31, 2010.13 This prevalent cohort included 933 patients with complete information on receptors for either estrogen (ER) or progesterone (PR) in the tumor, receipt of chemotherapy, age at diagnosis, vital status, and death/last contact dates. The truncation time was the time from the breast cancer diagnosis to study enrollment, and the failure time was the overall survival after the breast cancer diagnosis. We apply the proposed method to directly model RMST and investigate the impact of chemotherapy, ER/PR status, and age at diagnosis on RMST among patients with stage IV breast cancer.

Among the 933 patients, 707 (75.8%) experienced failure events and 226 (24.2%) were censored by the end of the study, 465 (49.8%) received chemotherapy, and 791 (84.8%) patients were ER/PR positive. Figure 1 presents the survival function estimated by the product-limit estimator for left-truncated right-censored data as well as the corresponding nonparametric RMST estimator, obtained by integrating the survival curve at varying values of τ, among patients with and without chemotherapy. Figure 2 presents the estimated survival curve and RMST curve by ER/PR status. In summary, these figures show that the receipt of chemotherapy and positive ER/PR status tend to result in a longer RMST. Since there is no formal analytical tool for testing the proportional hazards assumption for left-truncated right-censored data in the literature, the validity of the proportional hazards assumption cannot be rigorously checked and the analysis based on direct modeling of RMST is more applicable. Moreover, a preliminary Cox regression analysis of the residual censoring time suggests that the residual censoring time is independent of covariates, and thus, the proposed method that assumes covariate-independent censoring is used in the analysis.

FIGURE 1.

Product-limit estimator of survival function (left panel) and the nonparametric RMST estimator (right panel) by receipt of chemotherapy (chemo = 1, receiving chemotherapy; chemo = 0, not receiving chemotherapy)

FIGURE 2.

Product-limit estimator of survival function (left panel) and the nonparametric RMST estimator (right panel) for patients by ER/PR status (ER/PR = 1, positive; ER/PR = 0, negative)

Regression models of RMST at τ=2, 5, and 8 years postdiagnosis are considered, which are reasonable time points for this study of stage IV breast cancer. We use both the linear and log link functions for the regression model and include two binary variables for the receipt of chemotherapy and ER/PR status and one continuous variable for age at diagnosis as covariates. Table 4 summarizes the regression analysis results. Overall, the covariate effects demonstrate similar trends between the two link functions. In the model with the linear link, the receipt of chemotherapy and positive ER/PR status are significantly associated with a longer average postdiagnosis survival time for all the values of τ. Older age at diagnosis tends to be associated with a shorter survival time and such association becomes significant at a later time point (τ=8 years). In the model with the log link function, chemotherapy is marginally associated with an increase of the average postdiagnosis survival time, positive ER/PR status is significantly associated with an increase of the survival time, and older age at diagnosis is significantly associated with a decrease of the survival time during the next 8 years postdiagnosis. Specifically, it is estimated that the receipt of chemotherapy is associated with an increase of the survival time by 0.92 years (95% CI: 0.22–1.61) on average during the next 5 years postdiagnosis, using the linear link, and with an increase of the survival time by a factor of 1.31 (95% CI: 0.96–1.79), using the log link. The positive ER/PR status is estimated to be associated with an increase of the average survival time by 1.78 years (95% CI: 0.84–2.72) during the next 5 years postdiagnosis using the linear link and with an increase of the survival time by a factor of 2.04 (95% CI: 1.21–3.44), using the log link. 
During the next 5 years postdiagnosis, it is estimated that every 1-year increase in the age at diagnosis is associated with a decrease of the survival time by 0.05 years (95% CI: 0.00–0.10) on average with the linear link and with a change of the survival time by a factor of 0.98 (95% CI: 0.96–1.00) with the log link. Additionally, we can estimate the average postdiagnosis survival time based on these models. For a patient without chemotherapy, with negative ER/PR status, and with an age at diagnosis of 50, the average postdiagnosis survival time out of the next 5 years is estimated to be approximately 1.42 years (17.0 months) and 1.62 years (19.5 months), using the linear link and the log link, respectively. For another patient receiving chemotherapy, with positive ER/PR status, and with the same age at diagnosis, the average postdiagnosis survival time out of the next 5 years is estimated as 4.11 years (49.3 months) and 4.33 years (52.0 months), using the linear link and the log link, respectively.
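The two predictions for the second patient follow from the first patient's estimates and the reported covariate effects at τ=5 years: effects add under the linear link and multiply under the log link. The sketch below only illustrates that arithmetic with the rounded values quoted in the text; the function name `predict_rmst` is hypothetical, and small differences from the reported 4.11 and 4.33 years come from rounding of the published coefficients.

```python
def predict_rmst(reference_rmst, effects, link):
    """Combine a reference patient's RMST with covariate effects.

    link="linear": effects are additive differences in years.
    link="log":    effects are multiplicative factors exp(beta).
    """
    if link == "linear":
        return reference_rmst + sum(effects)
    if link == "log":
        result = reference_rmst
        for factor in effects:
            result *= factor
        return result
    raise ValueError("link must be 'linear' or 'log'")

# Reference patient: no chemotherapy, ER/PR negative, age 50 (values from the text).
linear_pred = predict_rmst(1.42, [0.92, 1.78], "linear")  # chemo + ER/PR effects
log_pred = predict_rmst(1.62, [1.31, 2.04], "log")        # chemo x ER/PR factors

print(f"linear link: {linear_pred:.2f} years ({12 * linear_pred:.1f} months)")
print(f"log link:    {log_pred:.2f} years ({12 * log_pred:.1f} months)")
```

Because age is held fixed at 50 for both patients, the age coefficient does not enter the comparison.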

TABLE 4.

Estimated covariate effects with 95% confidence intervals (CIs) and P-values at various values of τ (years) for the prevalent cohort from the SEER-Medicare data

Linear link
τ [Chemo: β̂ CI P] [ER/PR: β̂ CI P] [Age: β̂ CI P]
2 0.49 (0.11, 0.87) 0.01 0.80 (0.29, 1.32) <0.01 −0.02 (−0.04, 0.01) 0.20
5 0.92 (0.22, 1.61) 0.01 1.78 (0.84, 2.72) <0.01 −0.05 (−0.10, 0.00) 0.05
8 1.13 (0.28, 1.97) 0.01 2.16 (1.02, 3.31) <0.01 −0.07 (−0.13, −0.01) 0.02
Log link
τ [Chemo: exp(β̂) CI P] [ER/PR: exp(β̂) CI P] [Age: exp(β̂) CI P]
2 1.31 (0.98, 1.75) 0.07 1.64 (1.09, 2.48) 0.02 0.99 (0.97, 1.01) 0.25
5 1.31 (0.96, 1.79) 0.09 2.04 (1.21, 3.44) 0.01 0.98 (0.96, 1.00) 0.07
8 1.32 (0.96, 1.82) 0.09 2.04 (1.20, 3.48) 0.01 0.98 (0.96, 1.00) 0.04

Note: The linear link function (estimates are additive effects on RMST) and the log link function (estimates are multiplicative effects on RMST) are used.

β̂, regression parameter estimate; CI, 95% confidence interval; P, P-value.

To compare the estimation results between the two link functions visually, Figure 3 presents the estimated RMST at τ=5 years against the age at diagnosis, using the two link functions. For both link functions, RMST decreases as age at diagnosis increases. The discrepancy in the estimated RMST between the two link functions is small overall. Figure 4 presents the estimated RMST curves by the combination of receipt of chemotherapy, ER/PR status, and age at diagnosis, using the nonparametric method, the multivariable regression model of RMST with the linear link, the multivariable regression model of RMST with the log link, and integration of the survival curve estimated from the multivariable Cox model. To facilitate the comparison with the nonparametric method, age at diagnosis is dichotomized into a binary covariate (<70 and ≥70) in the multivariable models. Left truncation is adjusted for in all the methods. The number of patients without chemotherapy and with negative ER/PR status is very small: only four patients with age at diagnosis <70 and 24 patients with age at diagnosis ≥70. The nonparametric method would not be reliable in these cases, and thus, they are not included in Figure 4 for the comparison. Figure 4 shows that the direct modeling of RMST with the linear link and log link functions gives similar results in general. When the number of patients is relatively large, RMST estimated by the direct regression model is in better agreement with the nonparametric estimate than RMST estimated by the Cox model. Between the two link functions, although the linear link may be more appealing due to its straightforward interpretation, it does not always lead to estimated RMST values within the admissible range (0,τ],8 as shown in Figure 3, where negative values of estimated RMST appear as age at diagnosis increases for patients without chemotherapy and with negative ER/PR status. 
This suggests that the regression model with the log link function may be a better fit for the observed data in this study.

FIGURE 3.

Estimated RMST during the next five years post diagnosis (τ= 5 years) against the age at diagnosis using two link functions. “Ref” represents the reference patients without chemotherapy and negative ER/PR status, and “chemo&ER/PR” represents patients with chemotherapy and positive ER/PR status

FIGURE 4.

Estimated RMST by the combination of receipt of chemotherapy, ER/PR status, and age at diagnosis, using the nonparametric method, multivariable regression model of RMST with the linear link, multivariable regression model of RMST with the log link, and integrated multivariable Cox model survival curve. n is the number of patients in each combination

6 |. DISCUSSION

The RMST is an appealing summary measure for survival data due to its simple and clinically meaningful interpretation; therefore, the analysis of RMST has attracted growing research interest. However, little work is available on regression analysis of RMST for left-truncated right-censored data. As discussed in Lee et al,14 the generalization of existing methods based on weighted estimating equations to left-truncated right-censored data is more challenging and complex. The estimation of weight functions would involve estimating the survival function of the failure time, the distribution of the truncation time, and the survival function of the residual censoring time. The PO approach has been used in regression analysis of RMST for right-censored data and competing risks data, but it has not been extended to the analysis of RMST under left truncation and right censoring, probably because left truncation and induced informative censoring further complicate such analysis. In this article, we fill the methodological gap by proposing direct regression modeling of RMST under left truncation and general censoring mechanisms and using the PO approach to develop estimating equations for model parameters. The proposed methods have several attractive features. First, we directly model RMST as a function of baseline covariates through a generalized linear model with a link function, rather than imposing a restrictive structural assumption such as proportional hazards. This provides a flexible and robust way to investigate the association between RMST and covariates and to predict a patient's expected survival time in the next τ years. Second, by using the PO approach, left truncation and right censoring are handled in the first step of generating the POs, and standard statistical software can be used in the subsequent GEE analysis once the POs are obtained. Thus, the proposed methods can be readily implemented in practice. 
Third, we consider various censoring mechanisms and use the IPCW method to properly adjust for potential covariate-dependent censoring or dependent censoring. Lastly, we establish the asymptotic properties of the proposed estimators, whereas such theoretical justification is often lacking in existing work using POs.3,18,22

For the method under covariate-dependent censoring or dependent censoring, the Cox proportional hazards model is used as a working model for the residual censoring time, while other semiparametric models, such as generalized transformation models or the accelerated failure time model, may be used to compute the censoring distribution given covariates and the subject-specific weight function. Extreme weights may arise when using IPCW, and such a problem can be handled by weight truncation,34 as pointed out by a reviewer. The bias of a parameter estimate will increase and its variance will decrease as the weights are progressively truncated.35 The range of weights in our simulations is (1, 45.0) under proportional hazards and (1, 40.5) under nonproportional hazards. Although we did not encounter highly extreme values of weights, we have conducted additional simulations with weights truncated at the 1st and 99th percentiles and evaluated the bias-variance tradeoff. The results show a modest decrease in variance estimates and a relatively large increase in relative bias. For example, under proportional hazards and covariate-dependent censoring with a censoring rate of 45%, the relative bias changes from −0.020 to −0.122, SD changes from 0.177 to 0.150, and SE changes from 0.168 to 0.134, for the estimation of β2 at τ=1.39 under the linear link. The small variance reduction due to weight truncation appears to be outweighed by the relatively large bias induced. Nevertheless, with extreme weights, we recommend using weight truncation and exploring the bias-variance tradeoff. The values of τ are typically prespecified based on clinical relevance. In the simulations, considerable bias is observed in estimating the intercept in some cases, likely due to the presence of extremely small negative-valued POs that behave as outliers in the subsequent GEE analysis. 
Similar problems were observed in the simulations in Andersen et al3 and Grand et al.22 The bias problem may be due to the low precision of the product-limit estimator at the tail. Moreover, there can be very few subjects at risk at the beginning under left truncation. The sparse information at early event times could cause problems in estimating the survival function, which further affects the estimation of RMST by integration. Specifically, having fewer subjects at the beginning would result in a big drop of the complete-sample estimates of the survival function S^PL(t) at early event times. Each corresponding leave-one-out estimate, S^PL1(t), on the other hand, is much larger than S^PL(t) after excluding the subject with a small risk set right before his/her event time. These contrasts generate negative POs of RMST with potentially large absolute values, leading to larger bias in regression parameter estimates. We use a “conditional survival function” approach to adjust for such bias and our simulation results show that this approach can substantially reduce the bias.
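The weight truncation at the 1st and 99th percentiles mentioned above can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function name `truncate_weights` and the simulated weights are hypothetical, and the weights are assumed to be already-computed IPCW weights stored in a one-dimensional array.

```python
import numpy as np

def truncate_weights(weights, lower_pct=1.0, upper_pct=99.0):
    """Clip weights to the [lower_pct, upper_pct] percentile range."""
    w = np.asarray(weights, dtype=float)
    lo, hi = np.percentile(w, [lower_pct, upper_pct])
    # Values below lo are set to lo, values above hi are set to hi.
    return np.clip(w, lo, hi)

# Hypothetical right-skewed weights that are at least 1, for illustration.
rng = np.random.default_rng(2022)
ipcw = 1.0 + rng.exponential(scale=3.0, size=500)
ipcw_truncated = truncate_weights(ipcw)
```

Clipping never increases the spread of the weights, which is the source of the variance reduction; the bias arises because the truncated weights no longer fully correct for the censoring mechanism.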

In the data application, regression models with linear link and log link functions are considered and compared with the nonparametric method for estimating RMST by graphical displays. Other link functions, such as the logistic link, may be considered. The choice of link function would depend on the scientific question of interest and actual data in a specific application. For example, the linear link or log link may be selected if the difference or ratio of RMST is of interest, respectively. Besides the comparison with the nonparametric method, we can also use the Akaike information criterion, the Bayesian information criterion, and the cumulative sum of residuals36 to assess the performances of models with different link functions. In particular, POs are defined for each subject and can therefore be used to construct a cumulative sum of residuals analogous to that in a general linear model.36 For right-censored data, Perme and Andersen37 proposed methods for checking hazard regression models using POs, where pseudo-residuals are defined and used for checking the goodness-of-fit of a chosen model. Model diagnostics are crucial in survival analysis and applications. It is our intention in future research to develop rigorous residual-based goodness-of-fit tests for selecting an appropriate link function in regression modeling of RMST for left-truncated right-censored data. The proposed methods are also applicable to regression analysis of RMST for other types of complex survival data, where few alternative methods are available. For example, for regression modeling of the survival function or RMST with clustered survival data, we can compute leave-one-out POs and then use them as outcomes in generalized estimating equations to obtain consistent estimators of model parameters. The GEE sandwich variance estimator can be used to properly adjust for the within-cluster correlation. 
Logan et al proposed a method for modeling the marginal cumulative incidence function for clustered competing risks data using the PO approach.38 In future research, we plan to extend the proposed methods to clustered survival data, such as clustered left-truncated right-censored data, clustered competing risks data under left truncation, and recurrent event data. Moreover, causal inference methods such as the propensity score method could be incorporated into the proposed methods to adjust for confounding. Left truncation and right censoring are handled in generating the POs, and the POs can then be used as a replacement for the possibly incompletely observed outcomes when applying standard causal inference methods, such as the propensity score method. Andersen et al proposed to use POs for estimating the average causal effect with right-censored data.39 Incorporating causal inference methods (eg, inverse probability of treatment weighting with propensity scores) into regression modeling of RMST with left-truncated right-censored data based on POs is another interesting direction for our future research.

ACKNOWLEDGEMENTS

This work was supported in part by grants from the National Institutes of Health (2P30CA142543 and R01CA193878). The authors thank the associate editor and three reviewers for their constructive comments that have greatly improved the initial version of this article.

APPENDIX A. PROOF OF THEOREM 1

The following regularity conditions are introduced to establish the asymptotic properties of $\hat{\beta}_\tau$:

C1: For a prespecified time $\tau$, $P(A + C \ge \tau) > 0$.

C2: The baseline covariates $X$ are bounded almost surely.

C3: The matrix $\mathcal{I}(\beta_{\tau 0})$ is positive definite.

In order to use the general theorem for GEE30 to develop the asymptotic properties of $\hat{\beta}_\tau$, Equation (3) needs to be unbiased at the true parameter value $\beta_{\tau 0}$, that is, $E\{U(\beta_{\tau 0})\} = 0$, in addition to the above regularity conditions. This requires the following “asymptotic unbiasedness” of the POs:24

$$E\{\hat{\mu}_i(\tau) \mid Z_i\} = g^{-1}(Z_i \beta_\tau) + o_p(1).$$

It holds without the remainder term if all failure times are uncensored and untruncated because the POs, $\hat{\mu}_i(\tau)$ $(i = 1, \ldots, n)$, are exactly equal to $\tilde{T}_i \wedge \tau$ when the data are complete. In the presence of left truncation and right censoring, we show the “asymptotic unbiasedness” of the POs and asymptotic properties of $\hat{\beta}_\tau$, using techniques similar to those in Graw et al.24
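The complete-data property stated above can be checked numerically. The sketch below is a simplified illustration, not the authors' implementation: it computes leave-one-out POs as n times the full-sample RMST estimate minus (n − 1) times the estimate without subject i, where the RMST estimate integrates a product-limit survival curve whose risk set at time t contains subjects with entry time ≤ t ≤ observed time. The function names are hypothetical, the data are simulated, and neither the “conditional survival function” adjustment nor IPCW is included.

```python
import numpy as np

def rmst_product_limit(entry, time, event, tau):
    """RMST on [0, tau] from a product-limit curve under left truncation."""
    event_times = np.unique(time[(event == 1) & (time <= tau)])
    surv, last_t, area = 1.0, 0.0, 0.0
    for t in event_times:
        area += surv * (t - last_t)                    # area under the step
        at_risk = np.sum((entry <= t) & (time >= t))   # entered and still followed
        deaths = np.sum((time == t) & (event == 1))
        surv *= 1.0 - deaths / at_risk
        last_t = t
    return area + surv * (tau - last_t)

def pseudo_observations(entry, time, event, tau):
    """Leave-one-out pseudo-observations of the RMST at tau."""
    entry = np.asarray(entry, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    n = len(time)
    full = rmst_product_limit(entry, time, event, tau)
    keep = np.ones(n, dtype=bool)
    po = np.empty(n)
    for i in range(n):
        keep[i] = False                                # drop subject i
        loo = rmst_product_limit(entry[keep], time[keep], event[keep], tau)
        po[i] = n * full - (n - 1) * loo
        keep[i] = True
    return po

# Complete data: no truncation (entry = 0) and no censoring (event = 1).
rng = np.random.default_rng(0)
t = rng.exponential(size=30)
po = pseudo_observations(np.zeros(30), t, np.ones(30, dtype=int), tau=1.0)
print(np.allclose(po, np.minimum(t, 1.0)))
```

With truncation or censoring the POs are no longer equal to the restricted failure times and can fall outside [0, τ], which is why the asymptotic argument in the proof is needed.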

Proof.

Let $P$ denote the probability law of the vector of observed data $Y_i$, $i = 1, \ldots, n$, and let $P_n(\cdot) = n^{-1}\sum_{i=1}^{n}\delta_{Y_i}(\cdot)$, with $\delta_{Y_i}$ the point mass at $Y_i$, denote the empirical law corresponding to the sample of left-truncated right-censored observations $Y_1, \ldots, Y_n$. Further denote by $P_n^{(i)}$ the empirical distribution of the reduced sample $Y_1, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_n$. The RMST functional $\phi$ operates on a set $\mathcal{P}$ of probability measures for $Y_i$ that includes $P$ and the empirical measures.40,41 It is defined such that $\phi(P) = \mu(\tau)$ is the parameter of interest and $\phi(P_n) = \hat{\mu}(\tau)$ the RMST estimate corresponding to the observed data $Y_1, \ldots, Y_n$. Thus, the POs can be expressed as $\hat{\mu}_i(\tau) = n\phi(P_n) - (n-1)\phi(P_n^{(i)})$. We use the von Mises expansion on a smooth statistical functional $\phi$:41

$$\phi(P_n) = \phi(P) + n^{-1}\sum_{k=1}^{n}\dot{\phi}(Y_k) + \frac{1}{2}n^{-2}\sum_{k=1}^{n}\sum_{j=1}^{n}\ddot{\phi}(Y_k, Y_j) + O_P(n^{-3/2}),$$ (A1)

where $\dot{\phi}$ and $\ddot{\phi}$ are the first and second order influence functions42 of the functional $\phi$. The first order influence function is centered, $E\{\dot{\phi}(Y_i)\} = 0$.43 The second order influence function is symmetric, $\ddot{\phi}(Y_i, Y_j) = \ddot{\phi}(Y_j, Y_i)$, and should satisfy for every $t$44

$$E\{\ddot{\phi}(Y_i, t)\} = \int \ddot{\phi}(y, t)\,dP(y) = 0.$$ (A2)

From Equation (A1), we have:

$$n\phi(P_n) - (n-1)\phi(P_n^{(i)}) = \phi(P) + \dot{\phi}(Y_i) + \frac{1}{n-1}\sum_{k=1}^{n}\ddot{\phi}(Y_k, Y_i) + O_P(1),$$

as shown in Graw et al.24 By the law of large numbers, $\frac{1}{n-1}\sum_{k=1}^{n}\ddot{\phi}(Y_k, Y_i)$ converges to $E\{\ddot{\phi}(Y_i, t)\}$, which equals 0 by Equation (A2). Thus, for the smooth statistical functional $\phi$ with a second order von Mises expansion as in (A1) such that Equation (A2) holds, the POs can be represented by:

$$n\phi(P_n) - (n-1)\phi(P_n^{(i)}) = \phi(P) + \dot{\phi}(Y_i) + O_P(1).$$

For left-truncated right-censored data, the RMST estimate obtained by integrating the product-limit estimator of the survival function is consistent, under the assumption that the residual censoring time $C$ is independent of $(\tilde{T}, A, X)$. James40 discussed the property of the second order influence function of the Kaplan-Meier functional and, by similar arguments, we can show that the RMST functional $\phi$ also has the second order von Mises expansion as in (A1). Essentially, we have shown that, under the regularity conditions and the covariate-independent censoring assumption, the POs of RMST for left-truncated right-censored data can be represented as

$$\hat{\mu}_i(\tau) = \dot{\phi}(Y_i) + \mu(\tau) + O_P(1).$$ (A3)

This leads to important properties of POs as follows:

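  1. $\hat{\mu}_i(\tau)$ $(i = 1, \ldots, n)$ can be approximated by independent and identically distributed variables.

  2. $E\{\hat{\mu}_i(\tau)\} = \mu(\tau) + O_P(1)$, for all $i = 1, \ldots, n$.

Since any estimator of $\mu(\tau) = E(\tilde{T} \wedge \tau)$ is also implicitly an estimator of $E_Z\{E(\tilde{T} \wedge \tau \mid Z)\}$, similarly, we have

$$E\{\hat{\mu}_i(\tau) \mid Z_i\} = \mu(\tau \mid Z_i) + O_P(1), \quad \text{for all } i = 1, \ldots, n.$$

Therefore, $U(\beta_{\tau 0})$ is an asymptotically unbiased estimating function. Based on Equation (A3), $U(\beta_\tau)$ can be approximated by a sum of independent and identically distributed random variables. Following the arguments in Graw et al24 and by Liang and Zeger,30 $\hat{\beta}_\tau$ is consistent for $\beta_{\tau 0}$, and $\sqrt{n}(\hat{\beta}_\tau - \beta_{\tau 0})$ is asymptotically normal with mean zero and a covariance matrix that can be estimated using a standard “sandwich” estimator, which takes the form

$$\hat{\Sigma} = \mathcal{I}(\hat{\beta}_\tau)^{-1}\,\widehat{\mathrm{var}}\{U(\beta_\tau)\}\,\mathcal{I}(\hat{\beta}_\tau)^{-1},$$

with

$$\mathcal{I}(\hat{\beta}_\tau) = \sum_i \left\{\frac{\partial g^{-1}(Z_i\hat{\beta}_\tau)}{\partial \hat{\beta}_\tau}\right\}^{T} \mathcal{V}_i^{-1} \left\{\frac{\partial g^{-1}(Z_i\hat{\beta}_\tau)}{\partial \hat{\beta}_\tau}\right\}, \qquad \widehat{\mathrm{var}}\{U(\beta_\tau)\} = \sum_i U_i(\hat{\beta}_\tau)\,U_i(\hat{\beta}_\tau)^{T}.$$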
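As an illustration of how the estimating equation and the sandwich form above fit together, the following sketch solves the equation by Gauss-Newton iterations and then assembles the bread and meat matrices. It is a simplified sketch under stated assumptions, not the authors' software: the working variance is taken as $\mathcal{V}_i = 1$ for every subject, only the identity and log links are handled, and the function names `fit_po_gee` and `_mean_and_derivative` are hypothetical. In practice a standard GEE routine would be used once the POs are available.

```python
import numpy as np

def _mean_and_derivative(Z, beta, link):
    """Return mu_i = g^{-1}(Z_i beta) and d mu_i / d beta (one row per subject)."""
    eta = Z @ beta
    if link == "log":
        mu = np.exp(eta)
        return mu, Z * mu[:, None]
    return eta, Z  # identity link

def fit_po_gee(Z, po, link="identity", max_iter=100, tol=1e-10):
    """Regress pseudo-observations on covariates; return (beta_hat, sandwich)."""
    Z = np.asarray(Z, dtype=float)
    po = np.asarray(po, dtype=float)
    beta = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        mu, D = _mean_and_derivative(Z, beta, link)
        score = D.T @ (po - mu)          # U(beta) with V_i = 1 (assumption)
        info = D.T @ D                   # bread matrix at the current beta
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu, D = _mean_and_derivative(Z, beta, link)
    U = D * (po - mu)[:, None]           # per-subject contributions U_i
    bread = np.linalg.inv(D.T @ D)
    meat = U.T @ U                       # sum of U_i U_i^T
    return beta, bread @ meat @ bread

# Simulated stand-ins for POs (illustration only, not real pseudo-observations).
rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=200)
Z = np.column_stack([np.ones(200), x])
po = 0.5 - 0.1 * x + rng.normal(scale=0.2, size=200)
beta_hat, cov_hat = fit_po_gee(Z, po, link="identity")
```

With the identity link and unit working variance the point estimate coincides with ordinary least squares, so the sandwich matrix is what distinguishes the GEE analysis from a naive linear model fit.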

APPENDIX B. ADDITIONAL SIMULATIONS AND SUPPLEMENTARY TABLES

We conduct additional simulations to assess the performance of the proposed methods for the RMST model with the log link function. We randomly assign each subject to one of two groups, A and B, with equal probability. Group A is treated as the reference. The covariate $X_1$ is binary and equals 1 for subjects in group B and 0 for subjects in group A. The assumed model for RMST is $\log \mu(\tau \mid x_1) = \log E(\tilde{T} \wedge \tau \mid X_1 = x_1) = \beta_{\tau 0} + \beta_{\tau 1} x_1$ with the log link function. The failure time data are generated both under proportional hazards and under nonproportional hazards. For each scenario, the simulation was repeated 1000 times. First, we evaluate the performance of the proposed method under covariate-independent censoring and with the log link function. The data generating process is similar to the case with the linear link in Section 4.1. Table B1 summarizes the simulation results. Second, we evaluate the proposed methods under covariate-dependent censoring and with the log link function. Under proportional hazards, the failure time $\tilde{T}$ is generated from a distribution with hazard function $\lambda(t \mid X_1 = x_1) = \exp(\gamma x_1)$, where $\gamma = 0.5$. The residual censoring time is generated from an exponential distribution with parameter $\lambda_C = \lambda_{C0}\exp(4X_1)$. Varying $\lambda_{C0}$ allows for various levels of censoring (ie, censoring rates of 30% and 45%). Under nonproportional hazards, the failure time is generated from a distribution with hazard function $\lambda(t \mid Z = z) = \exp\{-z\gamma + z\zeta\log(8t)\}$, where $Z = (1, X_1)$, $\gamma = (0.5, 1)$ and $\zeta = (1, -0.3)$. The residual censoring time is generated from an exponential distribution with parameter $\lambda_C = \lambda_{C0}\exp(X_1)$, where $\lambda_{C0}$ is such that the censoring rate is 30% or 45%. The truncation variable follows the same Weibull distribution as in Section 4.1, with a truncation rate of 30%. Table B2 summarizes the simulation results.
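The "True" column for the proportional hazards scenarios can be reproduced from the hazard specified above, because a constant hazard gives an exponential failure time whose RMST has the closed form (1 − exp(−λτ))/λ. The sketch below is a check under one assumption that is not stated in this section: the reported τ values of 0.69 and 1.39 are taken to be log 2 and log 4 rounded to two decimals. The function name `rmst_exponential` is hypothetical.

```python
import math

def rmst_exponential(rate, tau):
    """RMST on [0, tau] for an exponential failure time with the given hazard."""
    return (1.0 - math.exp(-rate * tau)) / rate

gamma = 0.5  # log hazard ratio for group B versus group A (from the text)
true_values = []
for tau in (math.log(2), math.log(4)):  # assumed exact values behind 0.69 and 1.39
    mu_a = rmst_exponential(math.exp(gamma * 0), tau)  # reference group, x1 = 0
    mu_b = rmst_exponential(math.exp(gamma * 1), tau)  # group B, x1 = 1
    beta0 = math.log(mu_a)               # intercept on the log-RMST scale
    beta1 = math.log(mu_b) - beta0       # log ratio of RMSTs
    true_values.append((beta0, beta1))
    print(f"tau={tau:.2f}: beta0={beta0:.3f}, beta1={beta1:.3f}")
```

Under this assumption the computed values agree with the true values listed in Tables B1 and B2 for proportional hazards (−0.693 and −0.191 at τ=0.69; −0.288 and −0.320 at τ=1.39).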

Table B3 presents the simulation results where the “conditional survival function” approach is used to adjust for the bias observed at a larger τ=1.39.

TABLE B1.

Simulation results under covariate-independent censoring and with log link function

Proportional hazards
n τ Parameter True [30% censoring rate: RB SD SE CP MSE] [45% censoring rate: RB SD SE CP MSE]
350 0.69 β0 −0.693 0.015 0.089 0.068 0.900 0.005 0.009 0.085 0.068 0.904 0.005
β1 −0.191 −0.015 0.147 0.144 0.967 0.021 0.019 0.154 0.144 0.957 0.021
1.39 β0 −0.288 0.059 0.102 0.079 0.928 0.007 0.042 0.099 0.078 0.917 0.006
β1 −0.320 0.021 0.213 0.184 0.967 0.034 0.031 0.207 0.201 0.963 0.040
500 0.69 β0 −0.693 0.012 0.074 0.058 0.921 0.003 0.010 0.074 0.058 0.916 0.003
β1 −0.191 0.042 0.162 0.136 0.962 0.019 −0.001 0.120 0.120 0.964 0.014
1.39 β0 −0.288 0.043 0.082 0.066 0.924 0.005 0.045 0.082 0.065 0.928 0.004
β1 −0.320 0.034 0.168 0.146 0.964 0.021 0.013 0.161 0.146 0.955 0.021
Nonproportional hazards
n τ Parameter True [30% censoring rate: RB SD SE CP MSE] [45% censoring rate: RB SD SE CP MSE]
350 0.69 β0 −0.700 0.032 0.059 0.062 0.965 0.004 0.033 0.062 0.064 0.969 0.005
β1 0.225 0.040 0.064 0.068 0.972 0.005 0.040 0.070 0.071 0.956 0.005
1.39 β0 −0.566 0.122 0.081 0.095 0.982 0.014 0.120 0.083 0.098 0.979 0.014
β1 0.571 0.075 0.092 0.103 0.984 0.012 0.071 0.093 0.106 0.980 0.013
500 0.69 β0 −0.700 0.037 0.052 0.053 0.974 0.003 0.033 0.054 0.055 0.975 0.004
β1 0.225 0.059 0.057 0.059 0.975 0.004 0.048 0.060 0.060 0.974 0.004
1.39 β0 −0.566 0.125 0.067 0.079 0.950 0.011 0.131 0.095 0.088 0.950 0.013
β1 0.571 0.074 0.074 0.085 0.969 0.009 0.078 0.104 0.094 0.969 0.011

TABLE B2.

Simulation results under covariate-dependent censoring and with log link function (n = 500)

Proportional hazards
cen% τ Parameter True [Unadjusted PO method: RB SD SE CP MSE] [IPCW-adjusted PO method: RB SD SE CP MSE]
30% 0.69 β0 −0.693 −0.004 0.069 0.059 0.907 0.003 0.012 0.073 0.059 0.900 0.004
β1 −0.191 −0.022 0.128 0.120 0.957 0.014 −0.003 0.118 0.123 0.963 0.015
1.39 β0 −0.288 −0.022 0.081 0.070 0.916 0.005 0.014 0.077 0.064 0.903 0.004
β1 −0.320 −0.113 0.161 0.144 0.928 0.022 0.023 0.139 0.146 0.957 0.021
45% 0.69 β0 −0.693 −0.014 0.076 0.065 0.909 0.004 −0.012 0.063 0.059 0.891 0.004
β1 −0.191 −0.139 0.129 0.121 0.932 0.015 0.001 0.118 0.122 0.959 0.015
1.39 β0 −0.288 −0.056 0.083 0.081 0.932 0.007 −0.022 0.092 0.071 0.889 0.005
β1 −0.320 −0.331 0.144 0.146 0.811 0.033 0.003 0.192 0.201 0.946 0.040
Nonproportional hazards
cen% τ Parameter True [Unadjusted PO method: RB SD SE CP MSE] [IPCW-adjusted PO method: RB SD SE CP MSE]
30% 0.69 β0 −0.700 0.042 0.052 0.053 0.967 0.004 0.033 0.051 0.052 0.958 0.003
β1 0.225 0.053 0.056 0.058 0.968 0.004 0.034 0.058 0.057 0.955 0.003
1.39 β0 −0.566 0.185 0.069 0.079 0.839 0.017 0.148 0.067 0.078 0.942 0.013
β1 0.571 0.108 0.078 0.085 0.944 0.011 0.084 0.074 0.085 0.976 0.010
45% 0.69 β0 −0.700 0.046 0.051 0.054 0.969 0.004 0.042 0.050 0.054 0.982 0.004
β1 0.225 0.045 0.057 0.059 0.976 0.004 0.057 0.056 0.059 0.974 0.004
1.39 β0 −0.566 0.233 0.076 0.084 0.751 0.024 0.158 0.068 0.083 0.925 0.015
β1 0.571 0.128 0.087 0.090 0.922 0.013 0.081 0.078 0.090 0.974 0.010

Note: Estimates obtained by using the traditional pseudo-observation (PO) approach and the inverse probability of censoring weighting (IPCW)-adjusted PO approach are compared.

TABLE B3.

Simulation results under nonproportional hazards and covariate-independent censoring, with adjustment by the “conditional survival function” approach

Linear link
n τ Parameter True [30% censoring rate: RB SD SE CP MSE] [45% censoring rate: RB SD SE CP MSE]
350 1.39 β0 0.568 0.003 0.036 0.039 0.960 0.002 0.004 0.039 0.042 0.957 0.002
β1 0.437 0.007 0.048 0.050 0.967 0.003 0.008 0.051 0.053 0.961 0.003
500 1.39 β0 0.568 0.007 0.033 0.032 0.945 0.001 0.006 0.034 0.035 0.961 0.001
β1 0.437 <0.001 0.042 0.041 0.942 0.002 0.001 0.043 0.045 0.956 0.002
Log link
n τ Parameter True [30% censoring rate: RB SD SE CP MSE] [45% censoring rate: RB SD SE CP MSE]
350 1.39 β0 −0.566 −0.003 0.067 0.068 0.958 0.005 −0.012 0.070 0.074 0.956 0.006
β1 0.571 0.003 0.072 0.075 0.960 0.006 −0.004 0.077 0.081 0.966 0.007
500 1.39 β0 −0.566 −0.007 0.054 0.057 0.957 0.003 −0.002 0.059 0.062 0.959 0.004
β1 0.571 −0.001 0.058 0.063 0.969 0.004 0.004 0.062 0.068 0.965 0.005
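
The summary columns reported in the simulation tables (RB, SD, SE, CP, MSE) can be computed from replicate-level output roughly as follows. This is a minimal sketch, assuming the conventional readings of the abbreviations: relative bias, empirical standard deviation, mean estimated standard error, coverage probability of a 95% Wald interval, and mean squared error. The function name `summarize_simulation`, the normally distributed toy estimates, and the constant toy standard error are hypothetical illustrations, not the authors' code, and the paper's exact definitions may differ in detail.

```python
import random
import statistics


def summarize_simulation(estimates, std_errors, true_value, z=1.96):
    """Summarize replicate estimates of a single regression coefficient.

    Assumed (conventional) definitions, not taken from the paper:
      RB  = (mean estimate - truth) / truth
      SD  = empirical standard deviation of the estimates
      SE  = mean of the estimated standard errors
      CP  = share of Wald intervals (estimate +/- z * SE) covering the truth
      MSE = mean squared deviation of the estimates from the truth
    """
    mean_est = statistics.fmean(estimates)
    covered = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    squared_errors = [(est - true_value) ** 2 for est in estimates]
    return {
        "RB": (mean_est - true_value) / true_value,
        "SD": statistics.stdev(estimates),
        "SE": statistics.fmean(std_errors),
        "CP": covered / len(estimates),
        "MSE": statistics.fmean(squared_errors),
    }


# Toy replicates: unbiased normal estimates with a known, constant standard error.
rng = random.Random(2022)
true_beta = 0.568  # illustrative value only
toy_se = 0.036     # illustrative value only
estimates = [rng.gauss(true_beta, toy_se) for _ in range(2000)]
std_errors = [toy_se] * len(estimates)

summary = summarize_simulation(estimates, std_errors, true_beta)
print({name: round(value, 3) for name, value in summary.items()})
```

Under these definitions, SD close to SE together with CP near 0.95 indicates that the variance estimator is well calibrated, while a large RB paired with low CP points to bias in the point estimator itself.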

Footnotes

CONFLICT OF INTEREST

The authors declare no potential conflicts of interest.

DATA AVAILABILITY STATEMENT

The data used in our application study are available from the National Cancer Institute. We are not able to share the SEER-Medicare breast cancer data because of the data sharing policy of the National Cancer Institute. Data are available at https://healthcaredelivery.cancer.gov/seermedicare/ with the permission of the National Cancer Institute.

