Analyzing Competing Risk Data Using the R timereg Package

Thomas H Scheike; Mei-Jie Zhang

Author manuscript; available in PMC: 2012 Jun 14.

Published in final edited form as: J Stat Softw. 2011 Jan;38(2):i02.

PMCID: PMC3375021 NIHMSID: NIHMS377756 PMID: 22707920

Abstract

In this paper we describe flexible competing risks regression models using the comp.risk() function available in the timereg package for R, based on Scheike et al. (2008). Regression models are specified for the transition probabilities, that is, the cumulative incidence in the competing risks setting. The model contains the Fine and Gray (1999) model as a special case. This can be used to carry out a goodness-of-fit test of the proportionality assumption for the subdistribution hazards (Scheike and Zhang 2008). The program can also construct confidence bands for predicted cumulative incidence curves.

We apply the methods to data on follicular cell lymphoma from Pintilie (2007), where the competing risks are disease relapse and death without relapse. There is important non-proportionality present in the data, and it is demonstrated how one can analyze these data using the flexible regression models.

Keywords: binomial modelling, competing risks, goodness of fit, inverse-censoring probability weighting, nonparametric effects, non-proportionality, R, regression effects, timereg

1. Introduction

Competing risks data often arise in biomedical research when subjects are at risk of failure from K different causes. When one event occurs, it precludes the occurrence of any other event. In cancer studies, one common example of competing risks involves disease relapse and death in remission. The cumulative incidence curve, i.e., the probability of failure of a specific type, is a useful summary curve when analyzing competing risks data. Unfortunately this is not widely known in the biomedical world, and a very common error is to report one minus the Kaplan-Meier estimate for each competing cause (with failures from the other causes treated as censored) as the probability of failing from that cause. This is not a correct procedure: the estimator overestimates the incidence of a particular cause in the presence of all other competing causes (see Klein et al. 2001 for details).
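The overestimation can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes two independent exponential latent failure times with rate 0.5 each, no censoring, and a hand-written Kaplan-Meier helper (`km_surv`, a hypothetical name) that is only valid when there are no tied times.

```r
# Minimal sketch (assumed setup, not the authors' code): compare the empirical
# cumulative incidence of cause 1 with the naive "one minus Kaplan-Meier"
# estimate that treats cause 2 failures as censored.
set.seed(1)
n  <- 5000
t1 <- rexp(n, rate = 0.5)            # latent time to cause 1 (assumed rate)
t2 <- rexp(n, rate = 0.5)            # latent time to cause 2 (assumed rate)
time  <- pmin(t1, t2)                # observed failure time, no censoring
cause <- ifelse(t1 <= t2, 1, 2)      # observed cause of failure

# Kaplan-Meier survival estimate at time t; assumes no tied times
km_surv <- function(time, event, t) {
  ord     <- order(time)
  time    <- time[ord]
  event   <- event[ord]
  at_risk <- length(time):1
  factors <- ifelse(event == 1, 1 - 1 / at_risk, 1)
  prod(factors[time <= t])
}

t0       <- 2
cif_emp  <- mean(time <= t0 & cause == 1)                   # empirical P(T <= t0, cause 1)
naive_km <- 1 - km_surv(time, as.numeric(cause == 1), t0)   # naive estimate
cif_true <- 0.5 * (1 - exp(-1 * t0))                        # true value under the assumed rates

print(c(empirical = cif_emp, naive = naive_km, truth = cif_true))
```

With these assumed rates the true cumulative incidence of cause 1 at t = 2 is 0.5(1 − e^{−2}) ≈ 0.43, whereas the naive estimator targets 1 − e^{−1} ≈ 0.63.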

The aim of this work is to estimate and model the cumulative incidence probability of a specific cause of failure. Estimating and modelling the cause-specific hazards has been considered as a standard approach for analyzing competing risks data. Assuming two types of failures k = 1, 2, the cumulative incidence function for cause 1 given a set of covariates x is given by

P_{1} (t; x) = P (T \leq t, ε = 1 ∣ x) = \int_{0}^{t} λ_{1} (s; x) exp [- \int_{0}^{s} {λ_{1} (u; x) + λ_{2} (u; x)} d u] d s,

(1)

where T is the failure time, ε indicates the cause of failure and λ_k(t; x) is the hazard of the kth cause failure conditional on x, which is defined as

λ_{k} (t; x) = lim_{Δ t \to 0} \frac{1}{Δ t} P {t \leq T \leq t + Δ t, ε = k ∣ T \geq t} .
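As a numerical check of formula (1), the following sketch evaluates the integral for two constant cause-specific hazards and compares it with the closed form that holds in that special case. The hazard values and function names are illustrative assumptions, not quantities from the paper.

```r
# Minimal sketch: evaluate (1) numerically for constant cause-specific hazards.
# lambda1 and lambda2 are assumed values chosen only for illustration.
lambda1 <- 0.3
lambda2 <- 0.2

# Numerical version of (1): integrate lambda1 * exp(-(lambda1 + lambda2) * s)
cif1_numeric <- function(t) {
  integrand <- function(s) lambda1 * exp(-(lambda1 + lambda2) * s)
  integrate(integrand, lower = 0, upper = t)$value
}

# Closed form available when both hazards are constant
cif1_closed <- function(t) {
  lambda1 / (lambda1 + lambda2) * (1 - exp(-(lambda1 + lambda2) * t))
}

t0 <- 2
print(c(numeric = cif1_numeric(t0), closed = cif1_closed(t0)))
```

The example also shows why both hazards matter: the cumulative incidence of cause 1 depends on λ₂ through the overall survival term, so it cannot exceed λ₁/(λ₁ + λ₂) even as t grows.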

To use (1), the cause-specific hazards for all causes need to be properly modeled. Cox’s proportional hazards model is the most popular regression model in survival analysis; under this model the hazard function is given by

λ_{k} (t; x) = λ_{k 0} (t) exp {x^{⊤} β},

where λ_{k0}(t) is a cause-specific baseline and β are regression coefficients. Using Cox’s regression model to model the cause-specific hazards with the purpose of estimating the cumulative incidence function (1) was considered by Lunn and McNeil (1995) and Cheng et al. (1998). Shen and Cheng (1999) considered Lin and Ying’s special additive model for the cause-specific hazards and Scheike and Zhang (2002, 2003) considered a flexible Cox-Aalen model. The latter model allows some covariates to have time-varying effects. Modelling of the cause-specific hazards gives a complex nonlinear modelling relationship for the cumulative incidence curves. It is therefore hard to summarize the covariate effect and hard to identify the time-varying effect on the cumulative incidence function for a specific covariate. Recently, it has been suggested to directly model the cumulative incidence function. Fine and Gray (1999, FG) developed a direct Cox regression approach to model the subdistribution hazard function of a specific cause. The cumulative incidence function based on the FG model is given by

P_{1} (t; x) = 1 - exp {- Λ_{1} (t) exp (x^{⊤} β)},

where Λ₁(t) is an unknown increasing function and β is a vector of regression coefficients. FG proposed using an inverse probability of censoring weighting technique to estimate β and Λ₁(t). This approach is implemented in the crr() function in the cmprsk package (Gray 2010) for R (R Development Core Team 2010).

Recently, we considered a class of flexible models of the form

h {P_{1} (t; x, z)} = x^{⊤} α (t) + g (z, γ, t)

(2)

where h and g are known link functions and α(t) and γ are unknown regression coefficients (see Scheike et al. 2008, SZG). FG’s proportional regression model, Lin and Ying’s special additive model and Aalen’s full additive regression model are special sub-models of our model. Any link function can be considered and used here. In this study we focus on two classes of flexible models: proportional models

cloglog {1 - P_{1} (t; x, z)} = x^{⊤} α (t) + z^{⊤} γ

(3)

and additive models

- log {1 - P_{1} (t; x, z)} = x^{⊤} α (t) + (z^{⊤} γ) t .

(4)
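The two link functions and their inverses can be written down directly. The sketch below assumes that cloglog(u) = log(−log u), which is the convention under which (3) agrees with the Fine-Gray form given earlier; all coefficient values and function names are hypothetical and serve only to show how a linear predictor is mapped to a probability.

```r
# Minimal sketch of the links in (3) and (4); names and numbers are illustrative.
h_prop     <- function(p)   log(-log(1 - p))      # proportional model link
h_prop_inv <- function(eta) 1 - exp(-exp(eta))
h_add      <- function(p)   -log(1 - p)           # additive model link
h_add_inv  <- function(eta) 1 - exp(-eta)

# Hypothetical coefficient values at a single time point t0
t0 <- 2
x  <- c(1, 1)            # intercept and one covariate with a time-varying effect
z  <- 10                 # one covariate with a constant effect

alpha_prop_t <- c(-1.5, 0.4)   # assumed alpha(t0) for model (3)
gamma_prop   <- 0.02           # assumed gamma for model (3)
p_prop <- h_prop_inv(sum(x * alpha_prop_t) + z * gamma_prop)

alpha_add_t <- c(0.10, 0.05)   # assumed alpha(t0) for model (4)
gamma_add   <- 0.002           # assumed gamma for model (4)
p_add <- h_add_inv(sum(x * alpha_add_t) + z * gamma_add * t0)

print(c(proportional = p_prop, additive = p_add))
```

Note that the additive link only yields a valid probability when the linear predictor is non-negative, whereas the proportional link maps any real value into (0, 1).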

The regression coefficients α(t) and γ are estimated by a simple direct binomial regression approach. We have developed a function, comp.risk(), available in the R package timereg, that implements this approach. In addition we have proposed a useful goodness-of-fit test to identify whether a time-varying effect is present for a specific covariate.

In medical studies physicians often wish to estimate the predicted cumulative incidence probability for a given set of values of covariates. The predict() function of timereg computes the predicted cumulative incidence probability and an estimate of its variance at each fixed time point, and constructs (1 − α)100% simultaneous confidence bands over a given time interval. One further advantage is that the software can deal with cluster structure, see Scheike et al. (2010).

The estimation procedure and goodness-of-fit test will be presented in Section 2. In Section 3 we will show how the comp.risk() function in the R package timereg can be used to fit our newly proposed flexible models (3) and (4) through a worked example. The package is available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=timereg.

2. Estimation and goodness-of-fit test

2.1. Estimation

Let T_i and C_i be the event time and right censoring time for the ith individual, respectively, and let ε_i ∈ {1, …, K} indicate the cause of failure. Let T̃_i = min(T_i, C_i) and Δ_i = I(T_i ≤ C_i), where I(·) denotes the indicator function. We observe n independent identically distributed (i.i.d.) realizations of {T̃_i, Δ_i, Δ_iε_i, X_i, Z_i} for i = 1, …, n, where X_i = (1, X_{i1}, …, X_{ip})^⊤ and Z_i = (Z_{i1}, …, Z_{iq})^⊤ are associated covariates. We assume that (T_i, ε_i) are independent of C_i given covariates. Let N_i(t) = I(T_i ≤ t, ε_i = 1) be the underlying counting processes associated with cause 1, which are not observable for all t. However, Δ_iN_i(t) are computable for all t and we can show that E{Δ_iN_i(t)/G(T_i ∧ t|X_i, Z_i)} = P₁(t; X_i, Z_i), where G(t|X, Z) is the survival distribution for the censoring time given the covariates. We therefore considered the inverse probability censoring weighted response Δ_iN_i(t)/G(T_i ∧ t|X_i, Z_i) in the estimating equations and proposed to estimate the regression coefficients α(t) and γ by solving the estimating equations simultaneously. We denote the estimates as α̂(t) and γ̂. Under regularity conditions we showed that $\sqrt{n} (\hat{γ} - γ)$ and $\sqrt{n} \{\hat{α} (t) - α (t)\}$ are jointly asymptotically Gaussian and have the same limit distribution as

\sqrt{n} {\hat{C}}_{γ}^{- 1} \sum_{i} {{\hat{W}}_{1 i} (τ) G_{i}} and \sqrt{n} {\hat{I}}_{α}^{- 1} (t) \sum_{i} {{\hat{W}}_{2 i} (t) G_{i}},

respectively, where (G₁, …, G_n) are i.i.d. standard normals, τ is the end of study time point, and explicit expressions for Ĉ_γ, Î_α(t), Ŵ_{1i}(t) and Ŵ_{2i}(t) are given in Scheike and Zhang (2008). For a given set of covariate values (x, z), the predicted cumulative incidence function can be estimated by P̂₁(t; x, z) = h⁻¹{x^⊤α̂(t) + g(z, γ̂, t)}, and we showed in SZG that $\sqrt{n} \{{\hat{P}}_{1} (t; x, z) - P_{1} (t; x, z)\}$ has the same limit as

\sqrt{n} \frac{\partial h^{- 1} {\hat{P}}_{1} (t; x, z)}{\partial t} \sum_{i} {{\hat{W}}_{3 i} (t; x, z) G_{i}},

where (G₁, …, G_n) are standard normals and Ŵ_{3i}(t; x, z) is a residual that can be estimated based on the data (see Scheike and Zhang 2008 for details).
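The inverse probability of censoring weighted response introduced at the start of this section can be checked by simulation. The sketch below is an illustration under assumed conditions, not the estimating equations solved by comp.risk(): there are no covariates, the cause-specific hazards are constant, and the censoring survival function G is taken as known rather than estimated.

```r
# Minimal sketch: the mean of the weighted response Delta * N(t) / G(T ^ t)
# approximates P_1(t). All rates are assumed values for illustration.
set.seed(1)
n         <- 20000
lambda1   <- 0.3                      # cause 1 hazard
lambda2   <- 0.2                      # cause 2 hazard
cens_rate <- 0.25                     # exponential censoring rate

T_fail  <- rexp(n, lambda1 + lambda2)                       # failure time
eps     <- ifelse(runif(n) < lambda1 / (lambda1 + lambda2), 1, 2)
C       <- rexp(n, cens_rate)                               # censoring time
T_tilde <- pmin(T_fail, C)                                  # observed time
Delta   <- as.numeric(T_fail <= C)                          # 1 if failure observed

G  <- function(s) exp(-cens_rate * s)   # known censoring survival function
t0 <- 2

# Delta * N(t0) uses only observed quantities
obs_event1 <- as.numeric(Delta == 1 & eps == 1 & T_tilde <= t0)
ipcw_resp  <- obs_event1 / G(pmin(T_tilde, t0))

p1_true <- lambda1 / (lambda1 + lambda2) * (1 - exp(-(lambda1 + lambda2) * t0))
print(c(weighted = mean(ipcw_resp), unweighted = mean(obs_event1), truth = p1_true))
```

The unweighted mean is too small because censored subjects who would later have failed from cause 1 are counted as non-events; dividing by G restores the lost mass.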

Resampling techniques can be applied to construct (1−α)100% confidence bands for α_j(t), j = 1, …, p, and P₁(t; x, z), and to compute the p value of testing H₀ : α_j(t) = 0 for all t ∈ [0, τ] based on sup_{t∈[0,τ]} |α̂_j(t)/σ̂_j(t)|, where σ̂_j(t) is an estimated standard error of α̂_j(t).
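The resampling idea can be sketched in a simpler setting than the one treated by comp.risk(). The example below assumes uncensored data and takes α(t) to be the distribution function of a single failure time, so that the i.i.d. residuals are simply W_i(t) = I(T_i ≤ t) − α̂(t); this choice of residuals, the grid and all object names are assumptions made for illustration. The multiplier step, in which the residuals are perturbed by standard normals G_i, follows the description above.

```r
# Minimal sketch of multiplier resampling for a simultaneous confidence band.
set.seed(42)
n     <- 200
T_obs <- rexp(n)                                   # assumed uncensored failure times
grid  <- quantile(T_obs, probs = seq(0.1, 0.9, by = 0.05))

Y         <- outer(T_obs, grid, "<=") * 1          # n x m matrix of I(T_i <= t)
alpha_hat <- colMeans(Y)                           # estimate of alpha(t) on the grid
W         <- sweep(Y, 2, alpha_hat)                # residuals W_i(t)
sigma_hat <- sqrt(colMeans(W^2))                   # estimated sd of sqrt(n) * (alpha_hat - alpha)

n_sim    <- 2000
sup_stat <- replicate(n_sim, {
  G       <- rnorm(n)                              # i.i.d. standard normal multipliers
  process <- colSums(W * G) / sqrt(n)              # resampled process on the grid
  max(abs(process / sigma_hat))                    # standardized supremum
})

c_band <- unname(quantile(sup_stat, 0.95))         # critical value for a 95% band
lower  <- alpha_hat - c_band * sigma_hat / sqrt(n)
upper  <- alpha_hat + c_band * sigma_hat / sqrt(n)

# p value for H0: alpha(t) = 0 on the grid, based on the observed supremum
obs_sup <- max(abs(sqrt(n) * alpha_hat / sigma_hat))
p_value <- mean(sup_stat >= obs_sup)
print(c(critical_value = c_band, pointwise = qnorm(0.975), p_value = p_value))
```

The critical value for the band exceeds the pointwise normal quantile, which is why confidence bands are wider than pointwise confidence intervals.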

2.2. Goodness-of-fit test

The FG model is commonly used for analyzing competing risks data and this model assumes that all covariates have constant effects over time (Beyersmann et al. 2009). Recently, we developed a goodness-of-fit test (Scheike and Zhang 2008) for testing whether or not this is a reasonable assumption. We consider an extended version of the FG model

P_{1} (t; x, z) = 1 - exp {- exp (x^{⊤} α (t) + z^{⊤} γ)},

where some effects are proportional as in the FG model (γ) and some effects are allowed to change their effects on the cumulative incidence function over time (α(t)). Therefore testing for example H₀ : α_j(t) = β_j, for all t ∈ [0, τ] will determine whether the effect of x_j is constant. Further, plotting the estimated α̂_j(t) with its confidence band will give a good idea about whether or not the proportionality assumption is satisfied or violated.

There are many possible ways to test H₀; a simple test that relies only on α̂_j(t) is to look at

T_{j} (t, {\hat{α}}_{j}) = {\hat{α}}_{j} (t) - \frac{1}{τ} \int_{0}^{τ} {\hat{α}}_{j} (s) d s,

(5)

for j = 1, …, p. We derived the asymptotic distribution of this test process and proposed to compute the p value of the test based on a Kolmogorov-Smirnov type test statistic sup_{t∈[0,τ]} |T_j(t, α̂_j)| or on a Cramer von Mises type test statistic $\int_{0}^{τ} \{T_{j} (s, {\hat{α}}_{j})\}^{2} d s$. The Cramer von Mises test is an alternative to the Kolmogorov-Smirnov test. Anderson (1962) showed that in the case of the two sample test, the Cramer von Mises test is more powerful than the Kolmogorov-Smirnov test. We compute both tests in the comp.risk() function. In addition, we can plot the observed test process (5) and simulated test processes under the null hypothesis to visually examine whether a specific covariate has a time-varying effect. All large sample properties and resampling techniques used for the test statistics are given in SZG and Scheike and Zhang (2008).
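The test process (5) and its two summaries are straightforward to compute once an estimate α̂_j(t) is available on a grid. The sketch below uses a hypothetical helper, `test_process`, with trapezoidal integration; it assumes a sorted grid that starts at 0 and ends at τ, and it only computes the statistics, not the resampling-based p values reported by comp.risk().

```r
# Minimal sketch: test process (5) with Kolmogorov-Smirnov and
# Cramer von Mises type summaries on a time grid.
test_process <- function(time, alpha_hat) {
  tau   <- max(time)                    # grid is assumed to run from 0 to tau
  dt    <- diff(time)
  trapz <- function(y) sum((head(y, -1) + tail(y, -1)) / 2 * dt)
  t_proc <- alpha_hat - trapz(alpha_hat) / tau   # alpha_hat(t) minus its time average
  list(process = t_proc,
       ks  = max(abs(t_proc)),          # sup_t |T_j(t)|
       cvm = trapz(t_proc^2))           # integral of T_j(s)^2
}

time <- seq(0, 1, by = 0.01)

# A constant effect gives a test process that is zero everywhere
const_fit <- test_process(time, rep(0.7, length(time)))

# A linearly increasing effect alpha(t) = t gives T(t) = t - 1/2
linear_fit <- test_process(time, time)

print(c(ks_const = const_fit$ks, ks_linear = linear_fit$ks, cvm_linear = linear_fit$cvm))
```

For the linear effect on [0, 1] the supremum statistic equals 1/2 and the integrated squared process is close to 1/12, while both statistics vanish for the constant effect.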

3. Worked example: Follicular cell lymphoma study

We consider the follicular cell lymphoma data from Pintilie (2007), where additional details can also be found. The data set can be downloaded from http://www.uhnres.utoronto.ca/labs/hill/datasets/Pintilie/datasets/follic.txt, and consists of 541 patients with early disease stage follicular cell lymphoma (I or II) treated with radiation alone (chemo = 0) or a combination treatment of radiation and chemotherapy (chemo = 1). Disease relapse or no response and death in remission are the two competing risks. The patients’ ages (age: mean = 57 and sd = 14) and haemoglobin levels (hgb: mean = 138 and sd = 15) were also recorded. The median follow-up time was 5.5 years.

First we read the data, compute the cause of failure indicator and code the covariates:


R> fol <- read.table("follic.txt", sep = ",", header = TRUE)
R> evcens <- as.numeric(fol$resp == "NR" | fol$relsite != "")
R> crcens <- as.numeric(fol$resp == "CR" & fol$relsite == "" & fol$stat == 1)
R> cause <- ifelse(evcens == 1, 1, ifelse(crcens == 1, 2, 0))
R> table(cause)

cause
  0   1  2
193 272 76

R> stage <- as.numeric(fol$clinstg == 2)
R> chemo <- as.numeric(fol$ch == "Y")
R> times1 <- sort(unique(fol$dftime[cause == 1]))

There are 272 events due to the disease (no treatment response or relapse), 76 competing risk events (death without relapse) and 193 censored individuals. The event times are denoted as dftime. The variable times1 gives the distinct event times for cause 1.

We first estimate the nonparametric cumulative incidence curve using the timereg package and, for comparison, the cmprsk package. We specify the event time and the censoring variable in timereg’s comp.risk() function as Surv(dftime, cause == 0). The regression model contains only an intercept term (+ 1). The cause variable gives the causes associated with the different events. The argument causeS = 1 specifies that we consider type 1 events, and the censoring code is given by the cens.code argument. The times at which the estimates are computed can be given by the argument times = times1; the default is to use all cause 1 time points that are numerically stable.

The cumulative incidence curve estimations based on the cmprsk’s cuminc() function and the timereg’s comp.risk() function are both identical to the product-limit estimator in the case without covariates (Figure 1 a and b). Figure 1 (a) shows the cumulative incidence curves for the two causes estimated by the cmprsk package. In Figure 1 (b) we show that the comp.risk() function can also be used to construct 95% confidence intervals (dotted lines) and 95% confidence bands (broken lines) based on resampling which is not available in the cuminc() function. The R packages etm (Allignol et al. 2011) and mstate (de Wreede et al. 2010, 2011) can also be used to compute the cumulative incidence curve with 95% confidence intervals, but they do not provide confidence bands.

Figure 1: (a) Cumulative incidence curves based on the cuminc() function for the two causes and (b) cumulative incidence curve based on the comp.risk() function for relapse (solid line) with 95% confidence intervals (dotted lines) and 95% confidence bands (broken lines) based on resampling.


R> library("timereg")
R> library("cmprsk")
R> out1 <- comp.risk(Surv(dftime, cause == 0) ~ + 1, data = fol,
+   cause, causeS = 1, n.sim = 5000, cens.code = 0, model = "additive")
R> pout1 <- predict(out1, X = 1)
R> group <- rep(1, nrow(fol))
R> fit <- cuminc(fol$dftime, cause, group, cencode = 0)
R> par(mfrow = c(1, 2))
R> plot(fit, main = "cmprsk", xlab = "Years (a)")
R> plot(pout1, xlim = c(0, 30), xlab = "Years (b)", main = "timereg",
+   uniform = 3, se = 2)

Both the subdistribution hazard approach and the direct binomial modelling approach are based on an inverse probability of censoring weighting technique. When applying such weights it is crucial that the censoring weights are estimated without bias; otherwise the estimates of the cumulative incidence curve may also be biased. In this example, we find that the censoring distribution depends significantly on the covariates hgb, stage and chemo and is well described by Cox’s regression model. The fit of the Cox model was validated by cumulative residuals; see Martinussen and Scheike (2006) for further details. As a consequence, using a simple Kaplan-Meier estimate for the censoring weights may lead to severely biased estimates. We therefore add the option cens.model = "cox" in the function call; this uses all the covariates present in the competing risks model in the Cox model for the censoring weights. More generally it has been established that regression modelling for the inverse probability censoring weights can be used to improve the efficiency (Scheike et al. 2008).
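The effect of ignoring covariate-dependent censoring can be seen in a small simulation. The sketch below is an assumed toy setup, not the lymphoma data: a binary covariate x changes only the censoring rate, the true censoring survival functions are used in place of estimates, and the marginal function `G_marg` stands in for what a covariate-free Kaplan-Meier estimate of the censoring distribution would target.

```r
# Minimal sketch: IPCW estimate of P_1(t0 | x = 1) with covariate-specific
# weights versus weights that ignore the covariate. All rates are assumed.
set.seed(2)
n       <- 40000
x       <- rbinom(n, 1, 0.5)
lambda1 <- 0.3
lambda2 <- 0.2
c_rate  <- ifelse(x == 1, 0.5, 0.05)       # censoring is heavier when x = 1

T_fail  <- rexp(n, lambda1 + lambda2)
eps     <- ifelse(runif(n) < lambda1 / (lambda1 + lambda2), 1, 2)
C       <- rexp(n, c_rate)
T_tilde <- pmin(T_fail, C)
Delta   <- as.numeric(T_fail <= C)

t0         <- 3
obs_event1 <- as.numeric(Delta == 1 & eps == 1 & T_tilde <= t0)
s          <- pmin(T_tilde, t0)

G_cond <- function(s) exp(-0.5 * s)                              # censoring survival given x = 1
G_marg <- function(s) 0.5 * exp(-0.5 * s) + 0.5 * exp(-0.05 * s) # ignores x

sub      <- x == 1
est_cond <- mean(obs_event1[sub] / G_cond(s[sub]))   # correct weights
est_marg <- mean(obs_event1[sub] / G_marg(s[sub]))   # misspecified weights
truth    <- lambda1 / (lambda1 + lambda2) * (1 - exp(-(lambda1 + lambda2) * t0))

print(c(conditional = est_cond, marginal = est_marg, truth = truth))
```

In this setup the covariate-specific weights recover the true value of about 0.47, while the weights that ignore x give a clearly smaller estimate, because they understate how much censoring the x = 1 group experiences.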

We now use prop in the model option to fit the model

P_{1} (t; x, z) = 1 - exp {- exp (x^{⊤} α (t) + z^{⊤} γ)} .

(6)

We first fit a general proportional model allowing all covariates to have time-varying effects. Only the covariates x in model (6) are defined in the function call below. The covariates z in model (6) are specified by a const operator.


R> outf <- comp.risk(Surv(dftime, cause == 0) ~ stage + age + chemo + hgb,
+   data = fol, cause, causeS = 1, n.sim = 5000, cens.code = 0,
+   model = "prop", cens.model = "cox")
R> summary(outf)

OUTPUT:
Competing risks Model
Test for nonparametric terms
Test for non-significant effects
            Supremum-test of significance p-value H_0: B(t)=0
(Intercept)                          3.29              0.0150
stage                                5.08              0.0000
age                                  4.12              0.0002
chemo                                2.79              0.0558
hgb                                  1.16              0.8890
Test for time invariant effects
            Kolmogorov-Smirnov test p-value H_0:constant effect
(Intercept)                  8.6200                      0.0100
stage                        1.0400                      0.0682
age                          0.0900                      0.0068
chemo                        1.7200                      0.0004
hgb                          0.0127                      0.5040
            Cramer von Mises test p-value H_0:constant effect
(Intercept)              3.69e+01                      0.0170
stage                    2.52e+00                      0.0010
age                      4.26e-03                      0.0014
chemo                    1.50e+00                      0.0900
hgb                      2.64e-04                      0.4220
Call:
comp.risk(Surv(dftime, cause == 0) ~ stage + age + chemo + hgb,
    data = fol, cause, causeS = 1, n.sim = 5000, cens.code = 0,
    model = "prop", cens.model = "cox")

The tests of significance based on the nonparametric tests show that stage and age are clearly significant, chemo is borderline significant (p = 0.056) and hgb is not significant (p = 0.889) in the fully nonparametric model.

The plot options sim.ci and score can be used to plot the estimated regression coefficients α̂_j(t) with their 95% confidence bands, and to plot the observed test processes for constant effects together with simulated test processes under the null, respectively.


R> plot(outf, sim.ci = 2)
R> plot(outf, score = 1)

Figure 2 shows the time-varying covariate effects (α(t) of model (6)). It is evident that these effects are not constant over time; the effects are considerably more pronounced in the early time period. The plots show 95% pointwise confidence intervals as well as 95% confidence bands (sim.ci = 2 in the plot call, where 2 selects broken lines).

Figure 2: Estimates of time-varying effects in the proportional model (solid lines) with 95% confidence intervals (dotted lines) and 95% confidence bands (broken lines).

Figure 3 shows the related test processes for deciding whether the effects are significantly time-varying or whether H₀ : α_j(t) = β_j can be accepted. A summary of these graphs is given in the output above, and we see that stage, age and chemo are clearly time-varying, and thus not consistent with the Fine-Gray model. The Kolmogorov-Smirnov (supremum) test leads to p values of 0.068, 0.007 and 0.000 for stage, age and chemo, respectively. Similarly, the Cramer von Mises test based on the same test processes leads to p values of 0.001, 0.001 and 0.090, respectively. These test statistics are described in detail in Section 2. Note that the two different summaries of the test processes by the Kolmogorov-Smirnov and Cramer von Mises test statistics are consistent with the figures, and the overall conclusion is that none of the three variables have proportional Cox type effects. The command plot(outf, score = 1) that produces Figure 3 also produces a similar plot for the baseline, but we have only shown the covariate components of the model. To plot, for example, only the second component of the model (stage, counting the intercept as the first component), we give the command plot(outf, score = 1, specific.comps = 2).

Figure 3: Observed test process (black line) and simulated test processes under the null (gray lines).

We see that hgb is well described by a constant and we therefore consider the model with hgb having a constant effect and the remaining covariates having time-varying effects.

This final model is fitted with the call


R> outf1 <- comp.risk(Surv(dftime, cause == 0) ~ stage + age + chemo +
+   const(hgb), data = fol, cause, causeS = 1, n.sim = 5000, cens.code = 0,
+   model = "prop", cens.model = "cox")
R> summary(outf1)

OUTPUT:
Competing risks Model
Test for nonparametric terms
Test for non-significant effects
            Supremum-test of significance p-value H_0: B(t)=0
(Intercept)                          5.46                   0
stage                                5.18                   0
age                                  4.20                   0
chemo                                3.89                   0
Test for time invariant effects
            Kolmogorov-Smirnov test p-value H_0:constant effect
(Intercept)                  10.100                       0.000
stage                         1.190                       0.048
age                           0.101                       0.004
chemo                         1.860                       0.000
            Cramer von Mises test p-value H_0:constant effect
(Intercept)              79.90000                       0.000
stage                     1.84000                       0.006
age                       0.00583                       0.000
chemo                     2.53000                       0.000
Parametric terms :
             Coef.      SE Robust SE     z P-val
const(hgb) 0.00195 0.00401   0.00401 0.486 0.627
Call:
comp.risk(Surv(dftime, cause == 0) ~ stage + age + const(hgb) +
    chemo, fol, cause, times = times1, model = "prop")

The covariate hgb has a constant effect over time with β̂ = 0.00195. Note that hgb is non-significant (p = 0.627), as in the nonparametric model (p = 0.889) as well as in the FG model (p = 0.534) where all effects are constant over time (see below). The covariates stage, age and chemo all have significantly time-varying effects, and the estimates of the effects of stage, age and chemo are very similar to those of the fully non-parametric model shown in Figure 2. To make a comparison of the predictions based on the FG model we also fit this model:


R> outfg <- comp.risk(Surv(dftime, cause == 0) ~ const(stage) + const(age) +
+   const(chemo) + const(hgb), data = fol, cause, causeS = 1,
+   n.sim = 5000, cens.code = 0, model = "prop", cens.model = "cox")
R> summary(outfg)

Competing risks Model
Test for nonparametric terms
Test for non-significant effects
            Supremum-test of significance p-value H_0: B(t)=0
(Intercept)                          6.32                   0
Test for time invariant effects
            Kolmogorov-Smirnov test p-value H_0:constant effect
(Intercept)                    1.93                           0
            Cramer von Mises test p-value H_0:constant effect
(Intercept)                  14.3                           0
Parametric terms :
                Coef.      SE Robust SE      z    P-val
const(stage)  0.45200 0.13500   0.13500  3.340 0.000838
const(age)    0.01450 0.00459   0.00459  3.150 0.001610
const(chemo) -0.37600 0.18800   0.18800 -2.000 0.045800
const(hgb)    0.00249 0.00401   0.00401  0.622 0.534000
Call:
comp.risk(Surv(dftime, cause == 0) ~ const(stage) + const(age) +
    const(chemo) + const(hgb), data = fol, cause, causeS = 1,
    n.sim = 5000, cens.code = 0, model = "prop", cens.model = "cox")

We note that the estimated effect of hgb is almost identical to that based on the more appropriate model (shown above). However, the estimate could be severely biased due to lack of fit of the other covariates in the model, and could thus misrepresent important features of the data.

Finally, we compare the prediction for the FG model with that of the semiparametric model that gives a more detailed description of the effects. We consider predictions for two different patients defined by the newdata assignment below. Patient type I: disease stage I (stage = 0), 40 years old and without chemotherapy treatment (chemo = 0), and patient type II: disease stage II (stage = 1), 60 years old and the radiation plus chemotherapy combination treatment (chemo = 1).


R> newdata <- data.frame(stage = c(0, 1), age = c(40, 60), chemo = c(0, 1),
+ hgb = c(138, 138))
R> poutf1 <- predict(outf1, newdata)
R> poutfg <- predict(outfg, newdata)
R> par(mfrow = c(1, 2))
R> plot(poutf1, multiple = 1, se = 0, uniform = 0, col = 1:2, lty = 1:2)
R> title(main = "Flexible model predictions")
R> plot(poutfg, multiple = 1, se = 0, uniform = 0, col = 1:2, lty = 1:2)
R> title(main = "Fine-Gray model predictions")

To specify the data at which the predictions are computed, one can either supply a newdata argument or, more robustly, give the specific forms of X and Z (the const terms). In the above situation, one could thus equivalently do the following:


R> poutf1 <- predict(outf1, X = cbind(1, c(0, 1), c(40, 60), c(0, 1)),
+   Z = cbind(c(138, 138)))

The predictions based on the model may not be monotone. The plot() function therefore plots a pool-adjacent-violators estimate (Robertson et al. 1988) computed from the simple direct estimate based on α̂(t) and γ̂.
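The pool-adjacent-violators step can be illustrated with base R's isoreg() function, which fits a non-decreasing step function by that algorithm. This is only a sketch of the idea on made-up numbers; it is not a description of how plot() in timereg computes the adjusted curve internally.

```r
# Minimal sketch: make a non-monotone predicted cumulative incidence curve
# non-decreasing with the pool-adjacent-violators algorithm (stats::isoreg).
time  <- c(1, 2, 3, 4, 5, 6)
p_raw <- c(0.10, 0.18, 0.16, 0.25, 0.24, 0.30)   # hypothetical direct predictions

p_mono <- isoreg(time, p_raw)$yf                 # isotonic (non-decreasing) fit

print(rbind(raw = p_raw, monotone = p_mono))
```

Each pair of adjacent violators is replaced by its average, so the second and third values both become 0.17 and the fourth and fifth both become 0.245, while the remaining values are unchanged.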

We plot the predictions without pointwise confidence intervals (se = 0) and without confidence bands (uniform = 0). The predictions shown in Figure 4 (a) are based on the flexible model, and the predictions in Figure 4 (b) are based on the FG model. The cumulative incidence curves of relapse for a type I and a type II patient are plotted in solid and dotted lines, respectively.

Figure 4: Predictions of the cumulative incidence curves based on (a) the flexible model and (b) the model assuming constant effects.

Figure 5 (a) compares the predictions for a type I patient based on the flexible model and on the FG model. Similarly, Figure 5 (b) compares the predictions for a type II patient. The broken lines around the two predictions represent the confidence band based on the flexible model. Figure 5 is produced by the following code.

Figure 5: Predictions of the cumulative incidence curves based on the flexible model and the FG model for a given type I patient (a) and type II patient (b).


R> par(mfrow = c(1, 2))
R> plot(poutf1, se = 0, uniform = 1, col = 1, lty = 1, specific.comps = 1)
R> plot(poutfg, new = 0, se = 0, uniform = 0, col = 2, lty = 2,
+   specific.comps = 1)
R> title(main = "Type I patients")
R> legend(1, 1.0, c("Flexible model", "Fine-Gray model"), lty = 1:2,
+   col = 1:2)
R> plot(poutf1, se = 0, uniform = 1, col = 1, lty = 1, specific.comps = 2)
R> plot(poutfg, new = 0, se = 0, uniform = 0, col = 2, lty = 2,
+   specific.comps = 2)
R> title(main = "Type II patients")
R> legend(1, 1.0, c("Flexible model", "Fine-Gray model"), lty = 1:2,
+   col = 1:2)

Higher disease stage, increased age and combination treatment lead to higher cumulative incidence, and these effects are more pronounced in the early part of the time period (Figure 4 (a) and Figure 2). Chemo, on the other hand, increases the cumulative incidence in the initial part of the time period and subsequently lowers the incidence (Figure 4 (a) and Figure 2). Figure 5 shows that the FG model does not model the time-varying effects accurately. Despite these differences the overall predictions are in this case somewhat similar, especially when the uncertainty of the estimates is taken into account. However, this does not change the fact that the time-varying behavior of the covariates is clearly significant and that knowledge of this structure in the data is preferable.

4. Discussion

The flexible competing risks regression models for the cumulative incidence curves are implemented in the comp.risk() function in the timereg package for R. These models are useful for a detailed analysis of how covariates predict the cumulative incidence, and allow for time-varying effects of the covariates. This is particularly useful for examining the fit of simpler models where covariate effects are assumed constant. The goodness-of-fit procedure leads to an asymptotically justified p value. Another nice feature is that comp.risk() can deal with cluster structure.

The predict() function yields predictions with confidence intervals as well as confidence bands which are useful for the researchers.

Acknowledgments

The research was supported by National Cancer Institute grant 2 R01 CA54706-13 and a grant from the Danish Research Council on “Point process modelling and statistical inference”.

Contributor Information

Thomas H. Scheike, Email: ts@biostat.ku.dk, Department of Biostatistics, University of Copenhagen, Øster Farimagsgade 5 B, P.O.B. 2099, DK-1014 Copenhagen K, Denmark, URL: http://staff.pubhealth.ku.dk/~ts/

Mei-Jie Zhang, Medical College of Wisconsin.

References

Allignol A, Schumacher M, Beyersmann J. Empirical Transition Matrix of Multistate Models: The etm Package. Journal of Statistical Software. 2011;38(4):1–15. URL http://www.jstatsoft.org/v38/i04/ [Google Scholar]
Anderson TW. On the Distribution of the Two-Sample Cramer-von Mises Criterion. The Annals of Mathematical Statistics. 1962;33:1148–1159. [Google Scholar]
Beyersmann J, Latouche A, Buchholz A, Schumacher M. Simulating Competing Risks Data in Survival Analysis. Statistics in Medicine. 2009;28:956–971. doi: 10.1002/sim.3516. [DOI] [PubMed] [Google Scholar]
Cheng SC, Fine JP, Wei LJ. Prediction of Cumulative Incidence Function under the Proportional Hazards Model. Biometrics. 1998;54:219–228. [PubMed] [Google Scholar]
de Wreede LC, Fiocco M, Putter H. The mstate Package for Estimation and Prediction in Non- and Semi-Parametric Multi-State and Competing Risks Models. Computer Methods and Programs in Biomedicine. 2010;99:261–274. doi: 10.1016/j.cmpb.2010.01.001. [DOI] [PubMed] [Google Scholar]
de Wreede LC, Fiocco M, Putter H. “mstate: an R Package for the Analysis of Competing Risks and Multi-State Models. Journal of Statistical Software. 2011;38(7):1–30. URL http://www.jstatsoft.org/v38/i07/ [Google Scholar]
Fine JP, Gray RJ. A Proportional Hazards Model for the Subdistribution of a Competing Risk. Journal of the American Statistical Association. 1999;94:496–509. [Google Scholar]
Gray RJ. cmprsk: Subdistribution Analysis of Competing Risks. R package version 2.2-1. 2010 URL http://CRAN.R-project.org/package=cmprsk.
Klein JP, Rizzo JD, Zhang MJ, Keiding N. Statistical Methods for the Analysis and Presentation of the Results of Bone Marrow Transplants: Part I: Unadjusted Analysis. Bone Marrow Transplantation. 2001;28:909–915. doi: 10.1038/sj.bmt.1703260. [DOI] [PubMed] [Google Scholar]
Lunn M, McNeil D. Applying Cox Regression to Competing Risks. Biometrics. 1995;51:524–532. [PubMed] [Google Scholar]
Martinussen T, Scheike TH. Statistics for Biology and Health. Springer-Verlag; New York: 2006. Dynamic Regression Models for Survival Data. [Google Scholar]
Pintilie M. Competing Risks: A Practical Perspective. John Wiley & Sons; New York: 2007. [Google Scholar]
R Development Core Team. R Foundation for Statistical Computing; Vienna, Austria: 2010. R: A Language and Environment for Statistical Computing. URL http://www.R-project.org/ [Google Scholar]
Robertson T, Wright F, Dykstra RL. Order Restricted Statistical Inference. John Wiley & Sons; New York: 1988. [Google Scholar]
Scheike TH, Sun Y, Zhang MJ, Jensen TK. A Semiparametric Random Effects Model for Multivariate Competing Risks Data. Biometrika. 2010;97:133–145. doi: 10.1093/biomet/asp082. [DOI] [PMC free article] [PubMed] [Google Scholar]
Scheike TH, Zhang MJ. An Additive-Multiplicative Cox-Aalen Model. Scandinavian Journal of Statistics in Medicine. 2002;28:75–88. [Google Scholar]
Scheike TH, Zhang MJ. Extensions and Applications of the Cox-Aalen Survival Models. Biometrics. 2003;59:1033–1045. doi: 10.1111/j.0006-341x.2003.00119.x.
Scheike TH, Zhang MJ. Flexible Competing Risks Regression Modelling and Goodness-of-Fit. Lifetime Data Analysis. 2008;14:464–483. doi: 10.1007/s10985-008-9094-0.
Scheike TH, Zhang MJ, Gerds TA. Predicting Cumulative Incidence Probability by Direct Binomial Regression. Biometrika. 2008;95:205–220.
Shen Y, Cheng SC. Confidence Bands for Cumulative Incidence Curves under the Additive Risk Model. Biometrics. 1999;55:1093–1100. doi: 10.1111/j.0006-341x.1999.01093.x.