Pseudo-observations for competing risks with covariate dependent censoring

Nadine Binder; Thomas A Gerds; Per Kragh Andersen

doi:10.1007/s10985-013-9247-7

Author manuscript; available in PMC: 2015 Sep 18.

Published in final edited form as: Lifetime Data Anal. 2013 Feb 22;20(2):303–315. doi: 10.1007/s10985-013-9247-7


PMCID: PMC4573528 NIHMSID: NIHMS678823 PMID: 23430270

Abstract

Regression analysis for competing risks data can be based on generalized estimating equations. For the case with right censored data, pseudo-values were proposed to solve the estimating equations. In this article we investigate robustness of the pseudo-values against violation of the assumption that the probability of not being lost to follow-up (un-censored) is independent of the covariates. Modified pseudo-values are proposed which rely on a correctly specified regression model for the censoring times. Bias and efficiency of these methods are compared in a simulation study. Further illustration of the differences is obtained in an application to bone marrow transplantation data and a corresponding sensitivity analysis.

Keywords: Competing risks, Covariate-dependent censoring, Cumulative incidence, Pseudo-observations

1 Introduction

Survival analysis (and more generally: event history analysis, including competing risks) is characterized by incomplete observation, in particular right-censoring. This has led to the development of special methods for this field which are capable of dealing with censored data. Such methods are summarized in a number of text books including Andersen et al. (1993) and Kalbfleisch and Prentice (2002).

Without censoring the observed survival time, T_i, (or some suitable transformation, such as log(T_i)) could be used as the outcome variable in an ordinary linear regression model, or the indicator I(T_i ≤ t) could be used as outcome in a binary logistic regression model for any fixed time point, t. Furthermore, once a regression model has been set up, its fit can be assessed using standard residual plots and scatter plots.

A step towards carrying out the same kinds of analysis with censored observations is provided by pseudo-observations [e.g., Andersen et al. (2003); Andersen and Pohar Perme (2010)]. These are defined as follows. Suppose interest focuses on some function f(·) of the survival time. For example, for a fixed time point t one could consider f(T) = I(T > t) or f(T) = T ∧ t. (Note that, due to right-censoring, in most applications of survival analysis it is not possible to construct a pseudo-value for the event time itself, f(T) = T.)

Let θ̂ be an approximately unbiased estimator of the expectation θ = E(f(T)) based on observing a censored sample [(T̃_i, Δ_i), i = 1,…, n]. Thus, T̃_i = T_i ∧ U_i for some potential right-censoring times U₁, …, U_n and Δ_i = I(T_i ≤ U_i). For f(T) = I (T > t), we have θ = P(T > t) = S(t) and Ŝ(t) would typically be the Kaplan–Meier estimator. The pseudo-observation for subject i is now

{\hat{θ}}_{i} = n \hat{θ} - (n - 1) {\hat{θ}}^{(- i)}, i = 1, \dots, n

with θ̂⁽⁻ⁱ⁾ the estimator applied to the sample of size n − 1 obtained by eliminating subject i from the data.

Note that without censoring, the expectation θ could be estimated by the simple average $(1 / n) \sum_{i = 1}^{n} f (T_{i})$ and the ith pseudo-observation would be simply f(T_i). In that respect, the pseudo-observation θ̂_i is a natural replacement for the incompletely observed random variable f(T_i). This intuition was further strengthened by Graw et al. (2009) who showed equivalence of the uncensored observations and the pseudo-observations with respect to their conditional expectations given covariates, see equation (2) below.
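The jackknife-type definition above can be sketched in a few lines of Python. This is only an illustration for f(T) = I(T > t) with the Kaplan–Meier estimator; the function names `kaplan_meier` and `pseudo_observations` are hypothetical and not taken from any package.

```python
import numpy as np

def kaplan_meier(time, event, t):
    """Kaplan-Meier estimate of S(t) = P(T > t) from right-censored data."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv = 1.0
    # Loop over the distinct observed failure times up to t.
    for s in np.unique(time[event & (time <= t)]):
        at_risk = np.sum(time >= s)
        failures = np.sum(event & (time == s))
        surv *= 1.0 - failures / at_risk
    return surv

def pseudo_observations(time, event, t):
    """Pseudo-observations n*theta_hat - (n-1)*theta_hat^(-i) for theta = S(t)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    n = len(time)
    full = kaplan_meier(time, event, t)
    keep = np.ones(n, dtype=bool)
    pseudo = np.empty(n)
    for i in range(n):
        keep[i] = False  # eliminate subject i
        pseudo[i] = n * full - (n - 1) * kaplan_meier(time[keep], event[keep], t)
        keep[i] = True
    return pseudo
```

With uncensored data the Kaplan–Meier estimator reduces to the empirical survival function, so the pseudo-observations returned by this sketch coincide with the indicators I(T_i > t), in line with the remark above.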

Suppose that failure can be due to any of k causes, D = j, j = 1, …, k. The cause j cumulative incidence is then

F_{j} (t) = P (T \leq t, D = j) = E (I (T \leq t, D = j))

and the cumulative incidence can be estimated non-parametrically by the Aalen–Johansen estimator

{\hat{F}}_{j} (t) = \int_{0}^{t} \hat{S} (u -) d {\hat{A}}_{j} (u) .

Here,

{\hat{A}}_{j} (t) = \int_{0}^{t} \frac{\sum_{i = 1}^{n} d N_{i j} (u)}{Y (u)}

is the Nelson–Aalen estimator for the integrated cause-j specific hazard and Ŝ(t) the Kaplan–Meier estimator using all failures. Counting process notation has been introduced: N_ij(t) = I(T̃_i ≤ t, D_i = j, Δ_i = 1) counts observed cause j events for subject i and $Y (t) = \sum_{i = 1}^{n} I ({\tilde{T}}_{i} \geq t)$ is the observed number of subjects still at risk at time t−. Define also the process $N (t) = \sum_{i = 1}^{n} \sum_{j = 1}^{k} N_{i j} (t)$ counting observed events of any cause across the sample.
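As an illustration of the two displays above, the Aalen–Johansen estimator can be accumulated over the distinct observed failure times. The sketch below assumes that the cause is coded 0 for a censored observation and 1, …, k for the failure causes; this coding and the function name `aalen_johansen` are choices made for the example only.

```python
import numpy as np

def aalen_johansen(time, cause, t, j):
    """Aalen-Johansen estimate of F_j(t) = P(T <= t, D = j).

    cause: 0 = censored, 1..k = observed failure cause (assumed coding).
    """
    time = np.asarray(time, dtype=float)
    cause = np.asarray(cause, dtype=int)
    surv_left = 1.0  # Kaplan-Meier S_hat(u-) using failures of any cause
    cif = 0.0
    for u in np.unique(time[(cause > 0) & (time <= t)]):
        at_risk = np.sum(time >= u)                    # Y(u)
        d_all = np.sum((time == u) & (cause > 0))      # failures of any cause at u
        d_j = np.sum((time == u) & (cause == j))       # cause j failures at u
        cif += surv_left * d_j / at_risk               # S_hat(u-) dA_hat_j(u)
        surv_left *= 1.0 - d_all / at_risk
    return cif
```

Without censoring the result equals the observed proportion of subjects with a cause j failure by time t.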

Klein and Andersen (2005) proposed to use the pseudo-values

{\hat{F}}_{i j} (t) = n {\hat{F}}_{j} (t) - (n - 1) {\hat{F}}_{j}^{(- i)} (t)

to estimate a regression parameter β which quantifies the effect of covariates Z on the cumulative incidence. Specifically, generalized estimating equations (GEE) of the form

U_{n} (β) = \sum_{i = 1}^{n} V (t) \{ {\hat{F}}_{i j} (t) - g (t, β, Z_{i}) \} = 0

(1)

were considered, where g is a suitable model, e.g. given via some link function, and V represents weights including a working covariance matrix. In (1) and in what follows a fixed cause j is studied and the fact that β, V etc. may depend on j has been suppressed from the notation.
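To make the role of g and V concrete, the following sketch solves an estimating equation of the form (1) in a deliberately simplified setting: a single time point, one covariate, the complementary log-log link g = 1 − exp(−exp(α + βZ)) and an identity working covariance. These simplifications, the Gauss-Newton iteration and the function names are assumptions for illustration, not the implementation used in the paper.

```python
import numpy as np

def inverse_cloglog(eta):
    """Inverse complementary log-log link: 1 - exp(-exp(eta))."""
    return 1.0 - np.exp(-np.exp(eta))

def solve_gee(pseudo, z, n_iter=50):
    """Solve sum_i D_i' (pseudo_i - g(alpha + beta * z_i)) = 0 by Gauss-Newton.

    D_i is the derivative of g with respect to (alpha, beta); the working
    covariance is the identity (single time point).
    """
    pseudo = np.asarray(pseudo, dtype=float)
    z = np.asarray(z, dtype=float)
    design = np.column_stack([np.ones_like(z), z])
    start = np.clip(pseudo.mean(), 0.01, 0.99)
    theta = np.array([np.log(-np.log(1.0 - start)), 0.0])
    for _ in range(n_iter):
        eta = design @ theta
        mu = inverse_cloglog(eta)
        dmu = np.exp(eta - np.exp(eta))          # d mu / d eta
        jac = design * dmu[:, None]
        theta = theta + np.linalg.solve(jac.T @ jac, jac.T @ (pseudo - mu))
    return theta  # (alpha_hat, beta_hat)
```

For a binary covariate this model is saturated, so the solution simply matches the fitted values to the two group means of the pseudo-values; with several time points one would add one intercept per time point and a non-trivial working covariance.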

Graw et al. (2009) then showed that the pseudo-observations have the property

E ({\hat{F}}_{i j} (t) | Z_{i}) = E (I (T_{i} \leq t, D_{i} = j) | Z_{i}) + o_{p} (1) .

(2)

This is a key property for the consistency of the solution to (1). It, however, relies on two assumptions:

P (U > t | Z = z) = P (U > t) = C_{0} (t) \quad \text{(covariate-independent censoring)}

(3)

for a survival distribution function C₀ and

P (U > t) > 0 \quad \text{(positivity)}

(4)

In this article we are concerned with situations in which (3) is violated and the Aalen–Johansen estimator may be biased for F_j(t). Gill (1980, p. 36) noted the relation

\frac{Y (t +)}{n} = \hat{S} (t) \prod_{s \leq t} (1 - \frac{N^{c} (d s)}{Y (s) - N (d s)}) = \hat{S} (t) {\hat{C}}_{0} (t)

where $N^{c} (t) = \sum_{i = 1}^{n} I ({\tilde{T}}_{i} \leq t, Δ_{i} = 0)$ is the counting process for censoring and Ĉ₀ is the (unconditional) product limit estimator of C₀(t). This means that the Aalen–Johansen estimator can be represented as a sum:

{\hat{F}}_{j} (t) = \sum_{i = 1}^{n} \int_{0}^{t} \hat{S} (u -) \frac{N_{i j} (d u)}{Y (u)} = \sum_{i = 1}^{n} \int_{0}^{t} \hat{S} (u -) \frac{N_{i j} (d u)}{n \hat{S} (u -) {\hat{C}}_{0} (u -)} = \frac{1}{n} \sum_{i = 1}^{n} \frac{N_{i j} (t)}{{\hat{C}}_{0} ({\tilde{T}}_{i} -)} .

(5)

The last equality in (5) follows since we have $\int_{0}^{t} ϕ (u) N_{i j} (d u) = ϕ ({\tilde{T}}_{i}) N_{i j} (t)$ for any integrable function ϕ, because N_ij can only jump at the observed time T̃_i.
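The representation (5) can be checked numerically. The sketch below computes the Aalen–Johansen estimate and the inverse-probability-of-censoring weighted sum separately and compares them; it assumes untied observation times and codes censored observations as cause 0. All function names are illustrative.

```python
import numpy as np

def aalen_johansen(time, cause, t, j):
    """Aalen-Johansen estimate of F_j(t); cause 0 = censored (assumed coding)."""
    surv_left, cif = 1.0, 0.0
    for u in np.unique(time[(cause > 0) & (time <= t)]):
        at_risk = np.sum(time >= u)
        cif += surv_left * np.sum((time == u) & (cause == j)) / at_risk
        surv_left *= 1.0 - np.sum((time == u) & (cause > 0)) / at_risk
    return cif

def censoring_km_left(time, cause, s):
    """Product-limit estimate C_0_hat(s-), using censorings strictly before s."""
    c = 1.0
    for u in np.unique(time[(cause == 0) & (time < s)]):
        at_risk = np.sum(time >= u)
        d_event = np.sum((time == u) & (cause > 0))
        d_cens = np.sum((time == u) & (cause == 0))
        c *= 1.0 - d_cens / (at_risk - d_event)
    return c

def ipcw_cif(time, cause, t, j):
    """Right-hand side of (5): (1/n) sum_i N_ij(t) / C_0_hat(T_i-)."""
    total = 0.0
    for i in range(len(time)):
        if cause[i] == j and time[i] <= t:
            total += 1.0 / censoring_km_left(time, cause, time[i])
    return total / len(time)

# Small simulated example with continuous (hence untied) times.
rng = np.random.default_rng(1)
event_time = rng.exponential(1.0, size=50)
cens_time = rng.exponential(1.5, size=50)
event_cause = rng.integers(1, 3, size=50)
time = np.minimum(event_time, cens_time)
cause = np.where(event_time <= cens_time, event_cause, 0)
difference = aalen_johansen(time, cause, 0.8, 1) - ipcw_cif(time, cause, 0.8, 1)
```

In this example `difference` is zero up to floating-point error, as implied by Gill's relation.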

If censoring is independent of covariates Z then the Aalen–Johansen estimator is consistent for F_j(t) (Aalen 1978). Now, suppose that U is only conditionally independent of (T, D) given Z and denote F̃_j(t|z) = P(T ≤ t, D = j|Z = z) for the conditional cumulative incidence function and also C(t|z) = P(U > t | Z = z) for the conditional censoring survival distribution. Under conditional independence, we have E(Δ|T = t, Z = z) = C(t − |z) (Begun et al. 1983) which implies E(N_ij(t)|T = t, Z = z) = C(t − |z) F̃_j(t|z). Therefore, in this situation the Aalen–Johansen estimator converges to

E_{Z} \left\{ \int_{0}^{t} \frac{C (s - | Z)}{C_{0} (s -)} {\tilde{F}}_{j} (d s | Z) \right\} .

(6)

Thus, if censoring depends on covariates the Aalen–Johansen estimator has a large-sample bias for F_j(t) which is expressed by the difference between F_j(t) and (6). The magnitude of the bias for estimating the marginal survival function was explored in a small simulation study by Andersen and Pohar Perme (2010).

The purpose of the present paper is to study this problem in a competing risks setting. We will examine the bias for estimating regression coefficients and the cumulative incidence when censoring depends on covariates and pseudo-observations are based on the Aalen–Johansen estimator. We will also study to what extent this potential bias can be reduced (or eliminated) when basing pseudo-observations on alternative estimators which are marginally approximately unbiased even in the presence of covariate-dependent censoring. The idea behind such alternative estimators is to use a regression model to estimate C(t | Z). As indicated above, when censoring does depend on covariates and Ĉ is consistent for C then the limit of the modified estimator is

E_{Z} \left\{ \int_{0}^{t} \frac{C (s - | Z)}{C (s - | Z)} {\tilde{F}}_{j} (d s | Z) \right\} = E_{Z} \left\{ {\tilde{F}}_{j} (t | Z) \right\} = F_{j} (t) .

The structure of the paper is as follows. In Sect. 2 alternative, inverse probability of censoring weighted estimators for F_j(t) are introduced and the resulting bias reduction when basing pseudo-observations on such estimators is studied via Monte Carlo simulations in Sect. 3. In Sect. 4 we present a case study, while the final Sect. 5 contains a brief concluding discussion.

2 Alternative estimators

Suppose we have a model for C(t | Z) and a corresponding estimate Ĉ(t | Z) which is consistent if the model is correctly specified. Based on this estimate and motivated by Eqs. (5) and (6) the modified estimator for F_j(t) is defined by:

{\tilde{F}}_{j} (t) = \frac{1}{n} \sum_{i = 1}^{n} \frac{N_{i j} (t)}{{\hat{C}}_{i} ({\tilde{T}}_{i} - | Z_{i})} .

Given such an estimate, pseudo-observations are calculated in the usual way:

{\tilde{F}}_{i j} (t) = n {\tilde{F}}_{j} (t) - (n - 1) {\tilde{F}}_{j}^{(- i)} (t) .

We will study three ways of computing the leave-i-out estimator ${\tilde{F}}_{j}^{(- i)} (t)$ when a standard Cox model

C_{i} (t | Z_{i}) = exp (- B (t) exp (γ Z_{i}))

is used for the censoring distribution, i.e., B(t) is the cumulative censoring baseline hazard and γ is the censoring regression coefficient. The first one is simply to re-fit the censoring model n times by eliminating each subject i = 1, …, n, in turn, to obtain estimators γ̂⁽⁻ⁱ⁾ and B̂⁽⁻ⁱ⁾(·) for the model parameters. Thereby we get

{\tilde{F}}_{j}^{(- i)} (t) = \frac{1}{n - 1} \sum_{k \neq i} \frac{N_{k j} (t)}{exp (- {\hat{B}}^{(- i)} ({\tilde{T}}_{k} -) exp ({\hat{γ}}^{(- i)} Z_{k}))} .

(7)

This approach is computationally expensive as it requires fitting as many Cox models for the censoring times as there are subjects in the data set. Another extreme would be to use the same censoring model for all subjects i, that is, to fit the censoring model once to obtain estimators γ̂ and B̂(·) and compute the pseudo-value using:

{\tilde{F}}_{j}^{(- i)} (t) = \frac{1}{n - 1} \sum_{k \neq i} \frac{N_{k j} (t)}{exp (- \hat{B} ({\tilde{T}}_{k} -) exp (\hat{γ} Z_{k}))} .

(8)

This leads to F̃_ij(t) = 0 for every subject i without an observed cause j event up to time t, and this choice is therefore equivalent to the “direct binomial regression” approach of Scheike et al. (2008).
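The reason is a one-line calculation. Writing Ĉ(s | z) = exp(−B̂(s) exp(γ̂ z)) for the single fitted censoring model (a shorthand introduced here only for this display), the sums in the full-sample estimator and in (8) share all terms except the ith:

```latex
n \tilde{F}_{j}(t) - (n - 1) \tilde{F}_{j}^{(-i)}(t)
  = \sum_{k = 1}^{n} \frac{N_{kj}(t)}{\hat{C}(\tilde{T}_{k}- \mid Z_{k})}
  - \sum_{k \neq i} \frac{N_{kj}(t)}{\hat{C}(\tilde{T}_{k}- \mid Z_{k})}
  = \frac{N_{ij}(t)}{\hat{C}(\tilde{T}_{i}- \mid Z_{i})} .
```

The pseudo-value is thus the inverse-probability-of-censoring weighted response of subject i, which vanishes whenever N_ij(t) = 0.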

A compromise between (7) and (8) is to re-use the estimates of the regression coefficients γ̂ from the full data analysis, and only to re-estimate the cumulative baseline hazard for the data excluding subject i:

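{\hat{B}}^{(- i)} (t) = \int_{0}^{t} \frac{\sum_{k \neq i} d N_{k}^{c} (u)}{\sum_{k \neq i} Y_{k} (u) exp (\hat{γ} Z_{k})}

for each i = 1, …, n, that is, n times, each time leaving out one observation. Here N_k^c(t) = I(T̃_k ≤ t, Δ_k = 0) and Y_k(t) = I(T̃_k ≥ t) denote the censoring counting process and the at-risk indicator of subject k. This leads to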

{\tilde{F}}_{j}^{(- i)} (t) = \frac{1}{n - 1} \sum_{k \neq i} \frac{N_{k j} (t)}{exp (- {\hat{B}}^{(- i)} ({\tilde{T}}_{k} -) exp (\hat{γ} Z_{k}))} .

(9)

This is computationally less expensive because for given γ̂ the Breslow estimator has an explicit formula.
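The explicit formula makes the leave-one-out step cheap to code. The following sketch evaluates the Breslow estimator of the cumulative censoring baseline hazard for a given coefficient, optionally excluding one subject; it assumes a single covariate and takes `gamma` as already estimated from the full data (the Cox fit itself is not shown). The function name and arguments are hypothetical.

```python
import numpy as np

def breslow_censoring_cumhaz(time, censored, z, gamma, t, leave_out=None):
    """Breslow estimate of B(t) for the censoring Cox model with fixed gamma.

    Adds 1 / sum_k Y_k(u) exp(gamma * z_k) for every censoring time u <= t,
    computed on the sample without subject `leave_out` if that is given.
    """
    time = np.asarray(time, dtype=float)
    censored = np.asarray(censored, dtype=bool)
    z = np.asarray(z, dtype=float)
    keep = np.ones(len(time), dtype=bool)
    if leave_out is not None:
        keep[leave_out] = False
    time, censored, z = time[keep], censored[keep], z[keep]
    risk_score = np.exp(gamma * z)
    cumhaz = 0.0
    for u in time[censored & (time <= t)]:
        cumhaz += 1.0 / np.sum(risk_score[time >= u])
    return cumhaz
```

For the weights in (9) the estimate is needed just before T̃_k, so in practice one would sum over censoring times strictly smaller than T̃_k; with gamma = 0 the function reduces to the Nelson–Aalen estimator of the censoring hazard.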

The results by Graw et al. (2009) on the asymptotic distribution of the solution β̂ to (1) rely on the assumption (3). A necessary condition for consistency of β̂ when using the alternative pseudo-value definitions (7–9) is that the Cox model for the censoring distribution is correctly specified. Under this assumption and under the usual regularity assumptions, Scheike et al. (2008) showed consistency and asymptotic normality of the estimated regression coefficients β̂ when the Cox model was fitted only once and the pseudo-values (8) were used. However, the latter pseudo-values are exactly equal to zero for subjects that are censored before time t. This is not the case for the pseudo-values (7). We thus anticipate that the estimator of the regression coefficients β̂ is more efficient when the Cox model for the censoring times is refitted for all subjects. It is beyond the scope of the present article to derive the asymptotic distribution of the estimator of the regression coefficients in this case. However, it is worth noting that the functional delta method can be applied to show that the estimate (7) is asymptotically linear:

\sqrt{n} ({\tilde{F}}_{j} (t) - F_{j} (t)) = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \mathrm{IC}_{F_{j}} (t) + o_{P} (1),

where IC_{F_j} is the influence function. Based on this representation it seems that the methods of Graw et al. (2009) can be extended to the present situation.

3 Simulation study of bias and efficiency

In this section we will study bias and efficiency for the choices (7–9). Competing risks data were generated according to the Fine-Gray model (Fine and Gray 1999) for the event, “1”, of interest with cumulative incidence function

{\tilde{F}}_{1} (t | Z) = 1 - exp (- Λ_{0} (t) exp (β Z))

(10)

where $Λ_{0} (t) = \int_{0}^{t} λ_{0} (u) d u$ and $λ_{0} (t) = p e^{- t} / (1 - p (1 - e^{- t}))$, with p the probability of a cause 1 event for a subject with Z = 0.

The following scenarios were considered

– p is 0.5 or 0.8
– a single binary covariate Z with P(Z = 1) = 0.25 or 0.5
– covariate effect β ∈ {0, 0.75, 1.25} on the cumulative incidence
– independent exponential censoring with rate = 0.75 (corresponding to approximately 38 % censoring, i.e. moderate) or rate = 1.25 for comparison (approximately 50 % censoring, i.e. high),
– dependent censoring following a Cox model with baseline hazard = 0.5 and covariate effect γ = 0.75 (approximately 38 % censoring) or baseline hazard = 1 and covariate effect γ = 0.5 (approximately 50 % censoring)
– 500 repetitions of datasets with n = 200 (or n = 500) individuals each.
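A data-generating step of this kind might look as follows. The cause 1 times are drawn by inverting (10); the distribution of the cause 2 times is not stated above, so the sketch assumes unit exponential times given a cause 2 event (as in the design of Fine and Gray 1999 without a covariate effect). Function and argument names are hypothetical.

```python
import numpy as np

def simulate_fine_gray(n, p, beta, prob_z, cens_rate, gamma=0.0, seed=0):
    """Simulate (time, cause, z); cause 0 = censored, 1 = event of interest, 2 = competing."""
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, prob_z, size=n)
    hr = np.exp(beta * z)
    # Limit of (10) as t -> infinity: probability that the failure is of cause 1.
    prob_cause1 = 1.0 - (1.0 - p) ** hr
    is_cause1 = rng.random(n) < prob_cause1
    # Cause 1 times: invert F_1(t | Z) at a uniform draw on (0, prob_cause1).
    u = rng.random(n) * prob_cause1
    t1 = -np.log(1.0 - (1.0 - (1.0 - u) ** (1.0 / hr)) / p)
    # Cause 2 times: ASSUMPTION, unit exponential given a cause 2 event.
    t2 = rng.exponential(1.0, size=n)
    event_time = np.where(is_cause1, t1, t2)
    event_cause = np.where(is_cause1, 1, 2)
    # Censoring hazard cens_rate * exp(gamma * z); gamma = 0 gives independent censoring.
    cens_time = rng.exponential(1.0 / (cens_rate * np.exp(gamma * z)))
    time = np.minimum(event_time, cens_time)
    cause = np.where(event_time <= cens_time, event_cause, 0)
    return time, cause, z
```

For example, `simulate_fine_gray(200, p=0.5, beta=0.75, prob_z=0.25, cens_rate=0.5, gamma=0.75)` would correspond to one data set of the first dependent-censoring scenario, up to the assumption about the competing event.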

Pseudo-values were calculated at 6 time points equally spaced on the event scale, i.e., time points were set after 14, 28, 42, 56, 70 and 84 % of the collected event status information (i.e., censored observations are accounted for as well).

We first briefly examined the bias of the marginal mean E(F̂_i₁(t)). Table 1 shows the results where the average of pseudo-values at 6 time points using different definitions are compared to the true values which are 0.25–0.75 mixtures of F̃₁(t | Z = 1) and F̃₁(t | Z = 0).

Table 1. Estimated marginal mean from the simulation study based on four definitions of pseudo-values; AJ: Aalen–Johansen estimator; SameCens: (8); KeepCox: (9); RefitCox: (7).

	P(Z = 1) = 0.25; n = 200, β = 0.75, p = 0.5

Independent cens. t =	0.08	0.18	0.3	0.45	0.68	1.05
True value	0.04874	0.10337	0.16084	0.22215	0.29778	0.38500
AJ	0.04951	0.10250	0.16077	0.22290	0.29636	0.38337
SameCens	0.04950	0.10250	0.16079	0.22283	0.29617	0.38263
KeepCox	0.04951	0.10252	0.16084	0.22295	0.29643	0.38325
RefitCox	0.04950	0.10250	0.16081	0.22290	0.29636	0.38319

Dependent cens. t =	0.09	0.19	0.32	0.49	0.74	1.16

True value	0.05451	0.10848	0.16965	0.23677	0.31447	0.40479
AJ	0.05235	0.10822	0.16888	0.23201	0.30589	0.39237
SameCens	0.05256	0.10911	0.17101	0.23586	0.31216	0.40238
KeepCox	0.05256	0.10914	0.17109	0.23603	0.31253	0.40327
RefitCox	0.05256	0.10913	0.17106	0.23598	0.31246	0.40327


As expected, bias is negligible when censoring does not depend on covariates. However, for covariate-dependent censoring we do see a bias when using pseudo-values based on the Aalen–Johansen estimator, at least for larger values of t, and the bias tends to be eliminated using either of the alternative estimators (7–9). The same tendencies were seen (not shown) for n = 500, for p = 0.8, for β = 1.25 and for larger fractions of censored observations. However, for a symmetrically distributed covariate [P(Z = 1) = 0.5] no bias was seen in any case.

Next, and more importantly, we studied estimation of regression coefficients, β when the cumulative incidence followed the Fine-Gray model (Eq. 10), both when C does and does not depend on covariates (“dependent” and “independent” censoring, respectively). Table 2 shows the results.

Table 2. Means, $\bar{\hat{β}}$ , and SD of estimated regression coefficients from the simulation study and corresponding mean squared errors (MSE) based on four definitions of pseudo-values; AJ: Aalen–Johansen estimator; SameCens: (8); KeepCox: (9); RefitCox: (7).

	Independent censoring			Dependent censoring		
	$\bar{\hat{β}}$	SD	MSE	$\bar{\hat{β}}$	SD	MSE
p = 0.5, β = 0.75, moderate censoring
P(Z = 1) = 0.25, n = 200
AJ	0.749	0.281	0.080	0.684	0.269	0.068
SameCens	0.751	0.295	0.088	0.746	0.306	0.094
KeepCox	0.749	0.282	0.080	0.745	0.285	0.082
RefitCox	0.747	0.281	0.079	0.742	0.284	0.081
P(Z = 1) = 0.25, n = 500
AJ	0.749	0.281	0.031	0.691	0.168	0.025
SameCens	0.751	0.295	0.034	0.753	0.190	0.036
KeepCox	0.749	0.282	0.031	0.752	0.177	0.032
RefitCox	0.747	0.281	0.031	0.751	0.176	0.031
P(Z = 1) = 0.5, n = 200
AJ	0.750	0.263	0.069	0.750	0.270	0.073
SameCens	0.750	0.272	0.074	0.750	0.272	0.074
KeepCox	0.750	0.262	0.069	0.749	0.261	0.068
RefitCox	0.748	0.261	0.069	0.747	0.259	0.067
p = 0.8, β = 0.75, moderate censoring
P(Z = 1) = 0.25, n = 200
AJ	0.757	0.234	0.055	0.693	0.222	0.046
SameCens	0.758	0.256	0.066	0.754	0.271	0.074
KeepCox	0.756	0.235	0.055	0.752	0.239	0.057
RefitCox	0.755	0.233	0.055	0.749	0.236	0.056
p = 0.5, β = 0.75, higher censoring
P(Z = 1) = 0.25, n = 200
AJ	0.749	0.317	0.101	0.687	0.305	0.090
SameCens	0.754	0.333	0.111	0.744	0.346	0.120
KeepCox	0.750	0.317	0.101	0.742	0.325	0.106
RefitCox	0.747	0.316	0.101	0.738	0.323	0.105
p = 0.5, β = 1.25, moderate censoring
P(Z = 1) = 0.25, n = 200
AJ	1.251	0.262	0.069	1.159	0.251	0.055
SameCens	1.253	0.279	0.078	1.249	0.291	0.085
KeepCox	1.251	0.262	0.069	1.249	0.265	0.071
RefitCox	1.248	0.259	0.067	1.244	0.260	0.068


We see that estimates are unbiased when censoring is independent [and when P(Z = 1) = 0.5]. The estimator based on (8) (“SameCens”), as expected, has a larger variability as a consequence of the fact that observations which do not correspond to observed cause 1 failures do not contribute to the GEE (1). However, when censoring depends on Z, estimates using the Aalen–Johansen estimator are (downward) biased. This bias disappears when replacing the Aalen–Johansen pseudo-values by either (7), (8) or (9). In terms of averages these three estimates perform similarly whereas, as with independent censoring, estimates based on (8) were more variable. The considerably heavier computational burden for using (7) compared to (9) is noteworthy.

4 Case study

4.1 BMT data

In this section we illustrate the impact of choosing different pseudo-values for estimating regression coefficients in real data. We provide a re-analysis of data from a study on leukemia patients (Szydlo et al. 1997). As part of this study, the effects of bone marrow transplantation (BMT) on the risk of relapse were analysed using the data collected for 1715 patients. We analyse the risk of relapse which was observed for 311 patients. The competing risk of relapse-unrelated death was observed for 557 patients. The remaining 847 patients were right-censored at the end of their follow-up time.

Table 3 shows estimated regression coefficients from a Fine-Gray model of the effects on the risk of relapse of the variables: donor type [sibling: 1224 or not sibling (matched and mismatched combined): 491], disease type (ALL:340; AML:537; CML:838), disease stage (early: 1026; intermediate: 410; advanced: 279) and Karnofsky score (negative: 333; positive: 1382). A Cox regression model shows that the hazard of the censoring times significantly depends on the covariates donor type, disease stage and Karnofsky score (Table 4). Despite this dependence we find comparable results with the different pseudo-values (Table 3).

Table 3. Analysis of Fine-Gray regression models for the cumulative incidence of relapse in BMT data: four definitions of pseudo-values; AJ: Aalen–Johansen estimator; SameCens: (8); KeepCox: (9); RefitCox: (7).

	AJ	SameCens	KeepCox	RefitCox
Donor type
Not sibling	−0.569	−0.591	−0.573	−0.574
Disease
AML	−0.176	−0.187	−0.178	−0.182
CML	−0.630	−0.645	−0.628	−0.624
Stage of disease
Intermediate	0.555	0.529	0.548	0.546
Advanced	1.560	1.545	1.564	1.565
Karnofsky score
>90	0.215	0.218	0.208	0.218


Table 4. Results from Cox regression of the observed censoring times in the BMT study.

	Coef	95% CI	p value
Donor type
Not sibling	0.432	[0.253; 0.612]	<0.0001
Disease
AML	−0.208	[−0.420; 0.004]	0.054
CML	−0.042	[−0.234; 0.150]	0.6694
Stage of disease
Intermediate	−0.284	[−0.466; −0.101]	0.0023
Advanced	0.214	[−0.085; 0.513]	0.1614
Karnofsky score
>90	−0.312	[−0.528; −0.096]	0.0046


The same data have been used to illustrate clinically useful measures in competing risks regression (Gerds et al. 2012) where also the dependence of the censoring times on the covariates was demonstrated.

4.2 Sensitivity analysis

We performed a sensitivity analysis to further investigate if the observed similarities between different pseudo-values for estimating regression coefficients (Table 3) are specific to the current sample or could be expected in this situation. BMT-like data were simulated as follows. Categorical variables were drawn independently from a multinomial distribution with class probabilities equal to the observed class frequencies of the corresponding variable in the original BMT data set. Subsequently, individual competing risks data were generated by first flipping a coin to decide which event occurred. The probability of this binomial experiment was obtained as 1 – (1 − p)^exp(LP) where LP is the linear predictor obtained by multiplying the regression coefficients of the Fine-Gray analysis of relapse in the original BMT data (Table 5) with the simulated covariate matrix. Event times were then drawn from the distribution P(T ≤ t | D = j, Z = z), j ∈ {relapse, death} conditional on the event type. Event times were right-censored if they exceeded the corresponding censoring times which were drawn from the exponential distribution with rates according to the linear predictor obtained with the current covariates and the regression coefficients from the Cox model for the censoring times (Table 4).
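The first steps of this simulation scheme can be sketched as below. The class counts and coefficients are copied from Sect. 4.1, the "True values" column of Table 5 and Table 4; the baseline probability `p`, the baseline censoring rate `base_rate`, the mapping of the Karnofsky level with 1382 patients to ">90" and all function names are assumptions made for the example, since they are not given in the text. Drawing the event times conditional on the event type is omitted.

```python
import numpy as np

# Observed class counts among the 1715 patients (reference level first).
CLASS_COUNTS = {
    "donor": [1224, 491],         # sibling, not sibling
    "disease": [340, 537, 838],   # ALL, AML, CML
    "stage": [1026, 410, 279],    # early, intermediate, advanced
    "karnofsky": [333, 1382],     # ASSUMPTION: second level is ">90"
}
# Order: not sibling, AML, CML, intermediate, advanced, Karnofsky > 90.
BETA = np.array([-0.528, -0.16, -0.723, 0.497, 1.484, 0.145])     # Table 5, true values
GAMMA = np.array([0.432, -0.208, -0.042, -0.284, 0.214, -0.312])  # Table 4

def simulate_covariates(n, rng):
    """Draw each categorical variable independently and dummy-code it."""
    columns = []
    for counts in CLASS_COUNTS.values():
        probs = np.array(counts) / np.sum(counts)
        level = rng.choice(len(counts), size=n, p=probs)
        for k in range(1, len(counts)):
            columns.append((level == k).astype(float))
    return np.column_stack(columns)

def simulate_event_type_and_censoring(x, rng, p=0.2, base_rate=0.1):
    """Coin flip for relapse and exponential censoring times.

    p and base_rate are HYPOTHETICAL baseline values chosen for illustration.
    """
    prob_relapse = 1.0 - (1.0 - p) ** np.exp(x @ BETA)
    relapse = rng.random(len(x)) < prob_relapse
    cens_time = rng.exponential(1.0 / (base_rate * np.exp(x @ GAMMA)))
    return relapse, cens_time, prob_relapse
```

A data set of size 500 would then start from `x = simulate_covariates(500, np.random.default_rng(0))`.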

Table 5. Average of Fine-Gray regression analyses of the cumulative incidence of relapse in BMT-like simulated data and corresponding mean squared errors in parentheses.

	True values	FG	AJ	SameCens	KeepCox	RefitCox
Donor type
Not sibling	−0.528	−0.596 (0.045)	−0.567 (0.052)	−0.551 (0.069)	−0.552 (0.063)	−0.555 (0.062)
Disease
AML	−0.16	−0.122 (0.038)	−0.122 (0.053)	−0.148 (0.058)	−0.149 (0.053)	−0.153 (0.05)
CML	−0.723	−0.724 (0.039)	−0.732 (0.054)	−0.742 (0.063)	−0.743 (0.058)	−0.744 (0.055)
Stage of disease
Intermediate	0.497	0.525 (0.036)	0.51 (0.051)	0.495 (0.049)	0.493 (0.047)	0.492 (0.047)
Advanced	1.484	1.446 (0.033)	1.457 (0.041)	1.508 (0.046)	1.506 (0.042)	1.505 (0.04)
Karnofsky score
>90	0.145	0.179 (0.045)	0.172 (0.058)	0.146 (0.072)	0.144 (0.066)	0.142 (0.063)


Data sets of size 500 were simulated in this way. The various pseudo-values were applied to each data set and regression coefficients were estimated. Table 5 shows average results of 1000 simulation runs. These simulation results show a bias of the standard Fine-Gray analysis and the Aalen–Johansen pseudo-value approach. The pseudo-value approaches which used the (correctly specified) Cox regression model to estimate the conditional censoring distribution reduced the bias for most of the regression parameters; however, they had larger variances, as indicated by larger mean squared errors.

5 Discussion

We have examined the potential bias when basing inference on the simple Aalen–Johansen pseudo-observations in situations where censoring may depend on covariates. Overall, the bias tended to be small for the scenarios studied and the bias could be eliminated by basing the calculation of pseudo-values on the alternative cumulative incidence estimators (7–9), all of which take the dependent censoring into account.

In terms of bias reduction, the three estimators studied showed similar behavior. However, the estimator (8) (labeled “SameCens” in Tables 1, 2) provided estimators with a larger standard deviation due to the fact that observations in which the failure in question is not observed provide no contributions to the GEE (1). The two remaining estimators (labeled “RefitCox” and “KeepCox” in Tables 1, 2) show variability of similar magnitude and since both are unbiased, this calls for a general recommendation of using the less computationally demanding (9). However, all our results rely on a correct specification of the model for C(t | z) and we did not study robustness to deviations from this assumption. Therefore, a sensitivity analysis comparing results using either Aalen–Johansen pseudo-values or those based on (9) will provide further insight.

In the simulations mimicking the BMT data, the three approaches were able to reduce the bias of the Fine-Gray approach and the Aalen–Johansen pseudo-value approach, both of which assume that the censoring mechanism is independent of the predictors. However, this came at the cost of a larger variance. The loss of efficiency can partly be explained by the data reduction induced by selecting a limited number of time points for the pseudo-value approach.

Finally, one may ask why pseudo-values should be used at all when the Fine-Gray model is available, also in versions capable of dealing with covariate-dependent censoring. Pseudo-observations may have their place in competing risks regression because they allow flexible choices of link functions and because, as mentioned in the introduction, they provide the user with an outcome variable which may be used for graphical goodness of fit assessment. As mentioned above, all methods rely on a correct specification of the nuisance model for censoring, so, if at all possible, studies should be planned in such a way that covariates do not affect the distribution of censoring.

Acknowledgments

The research was supported by the Danish Natural Science Research Council [grant number 272-06-0442 “Point process modeling and statistical inference”]. We are grateful to CIBMTR for providing us with the example data [Public Health Service Grant/Cooperative Agreement No. U24-CA76518 from the National Cancer Institute (NCI), the National Heart, Lung and Blood Institute (NHLBI), and the National Institute of Allergy and Infectious Diseases (NIAID)].

Contributor Information

Nadine Binder, Institute of Medical Biometry and Medical Informatics, University Medical Center Freiburg, Stefan-Meier-Str. 26, 79104 Freiburg, Germany.

Thomas A. Gerds, Department of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, 1014 Copenhagen K, Denmark.

Per Kragh Andersen, Department of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, 1014 Copenhagen K, Denmark.

References

Aalen O. Nonparametric estimation of partial transition probabilities in multiple decrement models. Ann Stat. 1978;6(3):534–545.
Andersen PK, Borgan Ø, Gill RD, Keiding N. Statistical models based on counting processes. Springer; New York: 1993.
Andersen PK, Klein JP, Rosthøj S. Generalised linear models for correlated pseudo-observations, with applications to multi-state models. Biometrika. 2003;90(1):15–27.
Andersen PK, Pohar Perme M. Pseudo-observations in survival analysis. Stat Methods Med Res. 2010;19(1):71–99. doi: 10.1177/0962280209105020.
Begun JM, Hall WJ, Huang WM, Wellner JA. Information and asymptotic efficiency in parametric–nonparametric models. Ann Stat. 1983;11(2):432–452.
Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999;94(446):496–509.
Gerds TA, Scheike TH, Andersen PK. Absolute risk regression for competing risks: interpretation, link functions, and prediction. Stat Med. 2012;31(29):3921–3930. doi: 10.1002/sim.5459.
Gill RD. Censoring and stochastic integrals. PhD thesis. Math. Centre Tracts 124. Mathematical Centre; Amsterdam: 1980.
Graw F, Gerds TA, Schumacher M. On pseudo-values for regression analysis in competing risks models. Lifetime Data Anal. 2009;15(2):241–255. doi: 10.1007/s10985-008-9107-z.
Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. 2nd ed. Wiley; New York: 2002.
Klein JP, Andersen PK. Regression modeling of competing risks data based on pseudovalues of the cumulative incidence function. Biometrics. 2005;61(1):223–229. doi: 10.1111/j.0006-341X.2005.031209.x.
Scheike TH, Zhang MJ, Gerds TA. Predicting cumulative incidence probability by direct binomial regression. Biometrika. 2008;95:205–220.
Szydlo R, Goldman JM, Klein JP, Gale RP, Ash RC, Bach FH, Bradley BA, Casper JT, Flomenberg N, Gajewski JL, et al. Results of allogeneic bone marrow transplants for leukemia using donors other than HLA-identical siblings. J Clin Oncol. 1997;15(5):1767–1777. doi: 10.1200/JCO.1997.15.5.1767.

