Biostatistics (Oxford, England). 2014 Feb 2;15(3):526–539. doi: 10.1093/biostatistics/kxt059

Concordance for prognostic models with competing risks

Marcel Wolbers 1,*, Paul Blanche 2, Michael T Koller 3, Jacqueline C M Witteman 4, Thomas A Gerds 5
PMCID: PMC4059461  PMID: 24493091

Abstract

The concordance probability is a widely used measure to assess discrimination of prognostic models with binary and survival endpoints. We formally define the concordance probability for a prognostic model of the absolute risk of an event of interest in the presence of competing risks and relate it to recently proposed time-dependent area under the receiver operating characteristic curve measures. For right-censored data, we investigate inverse probability of censoring weighted (IPCW) estimates of a truncated concordance index based on a working model for the censoring distribution. We demonstrate consistency and asymptotic normality of the IPCW estimate if the working model is correctly specified and derive an explicit formula for the asymptotic variance under independent censoring. The small sample properties of the estimator are assessed in a simulation study, including under misspecification of the working model. We further illustrate the methods by computing the concordance probability for a prognostic model of coronary heart disease (CHD) events in the presence of the competing risk of non-CHD death.

Keywords: C index, Competing risks, Concordance probability, Coronary heart disease, Prognostic models, Time-dependent AUC

1. Introduction

Clinical decision-making and cost-effectiveness analyses often rely on prognostic models that quantify a subject's absolute risk of a disease event of interest over time. However, study populations increasingly consist of elderly individuals with varying degrees of co-morbidity who are likely to experience one of several disease endpoints other than the endpoint of main interest (Koller, Raatz and others, 2012). As an example, prediction of coronary heart disease (CHD) events in elderly subjects is complicated by the fact that subjects may die from other causes prior to the observation of the disease event of interest (Wolbers and others, 2009; Koller, Leening and others, 2012).

It is well known that the naive application of standard survival analysis leads to bias and risk over-estimation if competing risks are present and that specialized methods are needed (Grunkemeier and others, 2007; Putter and others, 2007). A key quantity for medical decision-making in the presence of competing risks is the absolute risk of the event of interest over time as quantified by its (covariate-dependent) cumulative incidence function (Gail and Pfeiffer, 2005; Wolbers and others, 2009). Thus, regression models are particularly attractive when they provide subject-specific estimates of the absolute risks based on a set of covariates (Fine and Gray, 1999; Gerds and others, 2012).

Several measures for quantifying the accuracy of prognostic models have been adapted from the standard survival setting with only one failure cause to competing risks. Measures include prediction error curves (Schoop and others, 2011), time-dependent sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC) (Saha and Heagerty, 2010), and reclassification methods (Wolbers and others, 2009; Koller, Leening and others, 2012). For survival data, the concordance index (Harrell and others, 1982) is a frequently reported measure of discrimination and we have previously presented a simple adaptation of Harrell's concordance estimator to the competing risks setting (Wolbers and others, 2009).

In the present paper, we motivate and formally define a cause-specific concordance index in the presence of competing risks. Notably, the proposed concordance index depends only on the cumulative incidence function of the event of interest. We clarify the relation of the concordance to time-dependent AUC measures and discuss a possible alternative definition. We then study estimation of a truncated concordance index in the presence of right-censoring. We introduce an inverse probability of censoring weighted (IPCW) estimator and demonstrate its consistency and asymptotic normality if the working model for the censoring distribution is correctly specified. The empirical bias and mean-square error as well as coverage of asymptotic and bootstrap confidence intervals are examined in a simulation study. Finally, we illustrate the methods for an example of coronary risk prediction in older women using data from the Rotterdam Study (Hofman and others, 2011).

2. Definition of concordance

2.1. Definition for a simple prognostic score without censoring

Competing risks data without censoring are given by pairs $(T, D)$ of data where $T$ is the time to the event and $D$ is the event type. For the purpose of discussing the definition and estimation of the cause-specific concordance index it is sufficient to assume that there are only two competing events. Thus, for simplicity of presentation we let $D = 1$ denote the event of interest and $D = 2$ the occurrence of any competing event. In applications, it may be important to model all competing events separately.

The concordance index is defined for any prognostic score $M(X)$ depending on baseline variables $X$ which can be used to order subjects with respect to the risk of an event of type 1. For example, $M(X)$ could be a single baseline marker or the linear predictor of a regression model for the event of interest derived on a training data set. We assume that higher values of $M(X)$ are associated with higher risks of the event of interest.

To motivate our definition with an example, assume that $T$ is the time to death and that a specific treatment were available which prevented death due to the event of interest ($D = 1$) but would not affect death from other causes ($D = 2$). The immediate benefit from such a treatment would be greatest for subjects with a high risk of dying from the event of interest early, less for individuals dying from the event of interest late, and negligible for subjects with a low risk of experiencing the event of interest at all (i.e. those likely to die from competing causes). Consequently, for a random pair of subjects $i$ and $j$, the first subject would be in greater need of treatment than the second subject if the first subject experienced the event of interest ($D_i = 1$) and the second subject experienced the event of interest later ($D_j = 1$ and $T_i < T_j$) or not at all ($D_j = 2$). In these cases, the ranking of the risk marker for the pair of subjects is concordant if $M(X_i) > M(X_j)$. Pairs of individuals where both experience the competing event are not comparable as neither of them would be in need of treatment.

To formally define the concordance probability for the event of interest, we assume an independent test set of i.i.d. realizations of Inline graphic from the joint distribution of the marker and the competing risks outcome and define

$$C_1 = P\bigl(M(X_i) > M(X_j) \mid D_i = 1 \text{ and } (T_i < T_j \text{ or } D_j = 2)\bigr) \tag{2.1}$$

for any randomly chosen pair of subjects Inline graphic from this distribution. The concordance probability for the competing event, Inline graphic, is defined analogously.
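As a rough illustration of the definition, the following Python sketch computes the empirical proportion of concordant pairs among comparable pairs for uncensored competing risks data. It follows the verbal description above (subject i has the event of interest, subject j has it later or has a competing event); the function and argument names are illustrative assumptions, not notation from the paper, and ties in the score are simply counted as non-concordant.

```python
import numpy as np

def concordance_uncensored(score, time, event):
    """Empirical cause-specific concordance for event type 1 (no censoring).

    A pair (i, j) is comparable when subject i has the event of interest
    (event == 1) and subject j either has a later event time or has a
    competing event (event == 2). The pair is concordant when subject i
    also has the strictly higher score.
    """
    score = np.asarray(score, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)

    case_i = (event == 1)[:, None]            # D_i = 1
    later_j = time[:, None] < time[None, :]   # T_i < T_j
    competing_j = (event == 2)[None, :]       # D_j = 2
    comparable = case_i & (later_j | competing_j)

    concordant = comparable & (score[:, None] > score[None, :])
    n_comparable = comparable.sum()
    return concordant.sum() / n_comparable if n_comparable else float("nan")
```

Note that pairs in which both subjects have a competing event never enter the denominator, in line with the remark that such pairs are not comparable.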

Define the cumulative incidence function for the event of interest as Inline graphic and the improper random variable Inline graphic as Inline graphic. Inline graphic has a distribution function equal to Inline graphic for Inline graphic and a point mass of Inline graphic at Inline graphic (Fine and Gray, 1999). As an associate editor pointed out, Inline graphic can be written in terms of Inline graphic leading essentially to the standard definition of concordance for survival data: Inline graphic.

We also note that Inline graphic. Thus,

2.1.

and we can rewrite Inline graphic as

2.1. (2.2)

According to (2.2), the cause-specific concordance for event 1 depends on Inline graphic and the marginal distribution of the marker but not on the cumulative incidence function Inline graphic of the competing event. This feature is not obvious in formula (2.1) but desirable when the aim is to assess the discriminative ability of a marker for Inline graphic.

In Appendix B of supplementary material available at Biostatistics online, we illustrate the properties of the concordance probability for a single marker and competing risks outcomes simulated according to cause-specific proportional hazards models with constant baseline hazards. The illustrations suggest that to achieve a high concordance, the marker needs to be strongly associated with an increased cause-specific hazard of the event of interest but only weakly or, even better, reversely associated with the cause-specific hazard of the competing event. This can be explained by the fact that the overall effect of a covariate on the cumulative incidence function of the event of interest depends on both cause-specific baseline hazards and both cause-specific hazard ratios (Beyersmann and others, 2007; Koller, Raatz and others, 2012).

Finally, it is important to discuss modifications of definition (2.1) for tied data (Yan and Greene, 2008). For example, it may happen that Inline graphic. Depending on the application, it may then be sensible to count such pairs with a weight of Inline graphic:

2.1.

To simplify notation, we use definition (2.1) as the basis for our further developments.
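For orientation only, a tie-adjusted variant of this kind could be written as below. This is a sketch under two assumptions that the text above does not confirm: that the score is written $M(X)$ and that tied pairs receive a weight of $1/2$. The superscript label is hypothetical.

```latex
% Sketch of a tie-adjusted concordance; E_{ij} is the comparability event of (2.1).
% The weight 1/2 and the label "ties" are assumptions made for illustration.
\[
  C_1^{\mathrm{ties}}
  = P\bigl(M(X_i) > M(X_j) \mid E_{ij}\bigr)
  + \tfrac{1}{2}\, P\bigl(M(X_i) = M(X_j) \mid E_{ij}\bigr),
  \qquad
  E_{ij} = \{ D_i = 1 \text{ and } (T_i < T_j \text{ or } D_j = 2) \}.
\]
```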

2.2. An alternative definition of concordance and relation to time-dependent AUC measures

We motivated our definition of concordance with a specific treatment for the event of interest which does not affect the competing event. In this situation, a case subject Inline graphic with Inline graphic has a larger immediate benefit from treatment than a control subject Inline graphic with Inline graphic or Inline graphic as subjects experiencing a competing event have no benefit from treatment at all. However, in other situations, the treatment may affect both event types and then subjects Inline graphic with Inline graphic and Inline graphic with Inline graphic and Inline graphic would not be comparable. Here, it would be more relevant to distinguish cases Inline graphic with Inline graphic from those who have not had any event up to that time point, i.e. those with Inline graphic. This leads to an alternative definition of concordance:

2.2.

Of note, Inline graphic also depends on the cumulative incidence function of the competing event Inline graphic. Thus, it might be less suitable if the main goal is to assess the relevance of a marker or a prognostic model for predicting the absolute risk of the event of interest alone, and we will not pursue it further. However, Inline graphic could be valuable for assessing joint models for the cumulative incidence of both competing events.

The proposed concordance measures are closely related to measures of the time-dependent AUC which have been proposed to assess discrimination for competing risks data at a fixed time point Inline graphic (Saha and Heagerty, 2010; Zheng and others, 2012; Blanche and others, 2013). We review these measures in Appendix C of supplementary material available at Biostatistics online, and show that Inline graphic can be written as a weighted average of one proposed time-dependent AUC measure over time. This is in analogy with a similar result for survival analysis without competing risks (Heagerty and Zheng, 2005) and supports the use of Inline graphic as a global summary measure of performance.

2.3. Assessing prediction models in right-censored data

We now generalize the concordance index defined in Section 2.1 in two ways. First, we replace the simple prognostic score Inline graphic by a more general prediction model Inline graphic for the risk of event Inline graphic until time Inline graphic, i.e. estimates of Inline graphic, which can be obtained by combining cause-specific hazards models, by fitting a Fine and Gray regression model or by direct binomial regression (Fine and Gray, 1999; Scheike and others, 2008). Secondly, to include the typical application where individuals have a limited duration of follow-up we define a truncated version of the concordance index. Following Uno and others (2011) and Gerds and others (2013), we define

2.3. (2.3)

The parameter Inline graphic quantifies the ability of the model to correctly rank events of interest up to time Inline graphic and to discriminate them from competing events. The truncation is necessary to enable estimation of Inline graphic from right-censored data with a limited follow-up duration.

As before, we can write the truncated concordance (2.3) as a functional of Inline graphic and of the marginal distribution of the predictor values of a pair of individuals Inline graphic. If we introduce notation for the order of the predicted risks at time Inline graphic for a pair of individuals,

2.3.

then

2.3. (2.4)

3. Estimation of concordance in the presence of right-censoring

3.1. Right-censored data

To indicate the end of follow-up for subject Inline graphic we introduce a subject-specific censoring time Inline graphic. Thus, we observe only Inline graphic Inline graphic where Inline graphic, Inline graphic and Inline graphic. We also use the following notation:

3.1.

The event-free survival probability conditional on the covariate Inline graphic is then given by

3.1.

We allow the censoring distribution to depend on the covariates Inline graphic but assume throughout that Inline graphic is conditionally independent of Inline graphic given Inline graphic. This implies

3.1. (3.1)

where Inline graphic is the conditional probability of being uncensored at time Inline graphic. Noting Inline graphic we also have for Inline graphic:

3.1. (3.2)

3.2. Ignoring non-evaluable pairs

An asymptotically biased estimate of Inline graphic is given by

3.2.

where Inline graphic is an indicator for the order of predicted risks at time Inline graphic. This can be interpreted as the proportion of definitely concordant pairs amongst evaluable pairs, i.e. pairs for which one individual experiences the event of interest and concordance can be decided based on the observed (potentially censored) data.

This estimate evaluated at the time Inline graphic corresponding to the maximum follow-up duration is a direct adaptation of Harrell's $C$ for survival data (Harrell and others, 1982) to the competing risks context and has been previously defined in Wolbers and others (2009). While simple, a major problem of this estimator is that by ignoring non-evaluable pairs without any correction, bias is introduced. It is well known that Harrell's $C$ depends on the censoring distribution (Uno and others, 2011; Gerds and others, 2013) and Inline graphic shares this limitation.
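The naive estimator can be sketched in Python as follows. Because the display formula is not legible in this text, the set of evaluable pairs is an assumption based on the verbal description: subject i has an observed event of interest by the truncation time, and subject j is either still under observation after that time or has an observed competing event no later than it. The status coding (0 = censored, 1 = event of interest, 2 = competing event) and all names are illustrative.

```python
import numpy as np

def naive_concordance(score, time_obs, status, horizon):
    """Proportion of concordant pairs among evaluable pairs (no censoring correction).

    status: 0 = censored, 1 = event of interest, 2 = competing event.
    Pairs whose ordering cannot be decided from the observed data are ignored.
    """
    score = np.asarray(score, dtype=float)
    time_obs = np.asarray(time_obs, dtype=float)
    status = np.asarray(status, dtype=int)

    # Subject i: observed event of interest up to the truncation time.
    case_i = ((status == 1) & (time_obs <= horizon))[:, None]
    # Subject j: observed beyond subject i's event time (any status).
    later_j = time_obs[:, None] < time_obs[None, :]
    # Subject j: observed competing event at or before subject i's event time.
    competing_j = (status == 2)[None, :] & ~later_j

    evaluable = case_i & (later_j | competing_j)
    concordant = evaluable & (score[:, None] > score[None, :])
    n_evaluable = evaluable.sum()
    return concordant.sum() / n_evaluable if n_evaluable else float("nan")
```

A subject censored before subject i's event time contributes no evaluable pair with i, which is exactly where the dependence on the censoring distribution enters.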

3.3. IPCW estimate

We derive an IPCW estimate for Inline graphic based on a working model Inline graphic for Inline graphic. Let Inline graphic be a time point where Inline graphic. We assume that the model Inline graphic is correctly specified and that for all Inline graphic there exists a uniformly consistent, weakly asymptotically linear estimator sequence Inline graphic with influence function Inline graphic. This implies (Bickel and others, 1993):

3.3. (3.3)

For example, we can specify a Cox regression model and use the estimate

3.3.

where Inline graphic is the Breslow estimator of the baseline hazard function and Inline graphic the maximum partial likelihood estimator of the regression coefficients. If the censoring is conditionally independent of the competing risks outcome given the predictors and the Cox model correctly specified, then condition (3.3) is satisfied (see, e.g. Cheng and others, 1998). As an alternative, we could assume that the censoring is independent of the competing risks outcome and the predictors. If this assumption is correct, then the Kaplan–Meier estimate for the censoring distribution satisfies (3.3).

Based on Inline graphic we construct the weights

3.3.

and define an IPCW estimate of Inline graphic:

3.3. (3.4)

Lemma 3.1 —

If the working model is correctly specified and Inline graphic is a consistent estimator of Inline graphic then Inline graphic is a consistent estimator of Inline graphic for all Inline graphic. Furthermore, if (3.3) is satisfied, then Inline graphic is asymptotically linear and Inline graphic converges in distribution to a normal random variable with mean 0.

A proof of the lemma is given in Appendix D of supplementary material available at Biostatistics online. For the case of independent censoring, supplementary material available at Biostatistics online also presents an explicit formula for the influence function and a consistent estimator of the asymptotic variance. The proposed concordance estimator has been implemented with the function cindex of the R package pec (Mogensen and others, 2012). Example code is provided in Appendix A of supplementary material available at Biostatistics online.
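The following Python sketch illustrates the idea of an IPCW concordance estimate with a marginal Kaplan-Meier working model for the censoring distribution. It is not the pec implementation and does not reproduce the paper's display formulas, which are not legible in this text. The pair weights are an assumption: pairs in which subject j is observed after subject i's event time are weighted by 1 / (G(T_i-) G(T_i)), and pairs in which subject j has an earlier observed competing event by 1 / (G(T_i-) G(T_j-)), where G estimates the probability of being uncensored. All names and the status coding (0 = censored, 1, 2) are illustrative.

```python
import numpy as np

def km_censoring(time_obs, status):
    """Kaplan-Meier estimate of G(t) = P(C > t), treating censoring as the event.

    Returns a function G(t, left=False); left=True gives the left limit G(t-).
    Assumes no ties between event times and censoring times.
    """
    time_obs = np.asarray(time_obs, dtype=float)
    status = np.asarray(status, dtype=int)
    cens_times = np.unique(time_obs[status == 0])
    at_risk = np.array([(time_obs >= u).sum() for u in cens_times], dtype=float)
    n_cens = np.array([((time_obs == u) & (status == 0)).sum() for u in cens_times],
                      dtype=float)
    # steps[k] is the estimate after the k-th distinct censoring time (steps[0] = 1).
    steps = np.concatenate(([1.0], np.cumprod(1.0 - n_cens / at_risk)))

    def G(t, left=False):
        side = "left" if left else "right"
        return steps[np.searchsorted(cens_times, t, side=side)]

    return G

def ipcw_concordance(score, time_obs, status, horizon):
    """IPCW-weighted proportion of concordant pairs up to the truncation time."""
    score = np.asarray(score, dtype=float)
    time_obs = np.asarray(time_obs, dtype=float)
    status = np.asarray(status, dtype=int)
    n = len(score)

    G = km_censoring(time_obs, status)
    g_left = G(time_obs, left=True)   # G(T_i-)
    g_at = G(time_obs)                # G(T_i)

    case_i = ((status == 1) & (time_obs <= horizon))[:, None]
    later_j = time_obs[:, None] < time_obs[None, :]
    competing_j = (status == 2)[None, :] & ~later_j
    pair_a = case_i & later_j         # j observed after i's event time
    pair_b = case_i & competing_j     # j had an earlier observed competing event

    w_a = np.broadcast_to((g_left * g_at)[:, None], (n, n))
    w_b = g_left[:, None] * g_left[None, :]
    weight = np.zeros((n, n))
    weight[pair_a] = 1.0 / w_a[pair_a]
    weight[pair_b] = 1.0 / w_b[pair_b]

    concordant = score[:, None] > score[None, :]
    total = weight.sum()
    return (weight * concordant).sum() / total if total > 0 else float("nan")
```

Without any censored observations all weights equal one and the sketch reduces to the unweighted proportion of concordant pairs. A covariate-dependent working model, such as the Cox model mentioned above, would replace `km_censoring` by subject-specific estimates.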

4. Simulation study

A simulation study was performed to assess bias and root mean square error (RMSE) of the proposed IPCW estimator and coverage of asymptotic and bootstrap confidence intervals. Simulations were for a single prognostic marker Inline graphic and a parameter-free time-independent model Inline graphic. The covariate Inline graphic was simulated to follow a standard normal distribution.

Conditional on Inline graphic, uncensored competing risks data Inline graphic were assumed to follow cause-specific Cox-exponential models (Bender and others, 2006):

4. (4.1)

This was implemented by simulating latent exponentially distributed event times Inline graphic and Inline graphic and then setting Inline graphic and Inline graphic for Inline graphic and Inline graphic for Inline graphic. We consider two competing risks scenarios. In scenario CR1, we set Inline graphic and Inline graphic. In scenario CR2, we set Inline graphic and Inline graphic. As truncation time points, we used the median and the 75% quantile Inline graphic of the marginal distribution of Inline graphic. The truncation time points and corresponding true values of Inline graphic were determined by simulation based on a large uncensored data set of size Inline graphic.

Censoring times Inline graphic were drawn under a third Cox-exponential model:

4.

The observed time Inline graphic was obtained as the minimum of Inline graphic and Inline graphic and considered as right-censored if Inline graphic. We repeated the simulations for independent censoring (Inline graphic) and covariate-dependent censoring (Inline graphic). For each truncation time point Inline graphic the values of Inline graphic were found by simulation such that the expected proportion of right-censored event times amongst observations with Inline graphic was 25%, 50%, or 75%, respectively. For each of the scenarios, we report results for sample sizes 250 and 1000 averaged across 1000 simulated data sets.
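The data-generating steps described in this section can be sketched in Python as below. The sketch assumes that each cause-specific hazard and the censoring hazard have the form (baseline rate) × exp(coefficient × covariate), which is the usual Cox-exponential form; the parameter names and any values passed to the function are hypothetical, since the settings of scenarios CR1 and CR2 are not legible in this text.

```python
import numpy as np

def simulate_competing_risks(n, lam1, beta1, lam2, beta2, lam_c, beta_c, seed=0):
    """Simulate right-censored competing risks data from exponential hazards.

    Returns the covariate x, the observed time, and a status code
    (0 = censored, 1 = event of interest, 2 = competing event).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)  # standard normal marker

    # Latent exponential times; NumPy parameterizes by scale = 1 / rate.
    t1 = rng.exponential(1.0 / (lam1 * np.exp(beta1 * x)))
    t2 = rng.exponential(1.0 / (lam2 * np.exp(beta2 * x)))
    c = rng.exponential(1.0 / (lam_c * np.exp(beta_c * x)))

    t = np.minimum(t1, t2)            # time to the first event
    d = np.where(t1 < t2, 1, 2)       # type of the first event
    time_obs = np.minimum(t, c)       # observed, possibly censored, time
    status = np.where(c < t, 0, d)    # censored if censoring comes first
    return x, time_obs, status
```

Setting `beta_c = 0` gives censoring that is independent of the covariate, and a non-zero `beta_c` gives covariate-dependent censoring; `lam_c` would then be tuned to reach a target censoring proportion.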

In each simulated data set, we computed three different estimators of Inline graphic: the naive estimator Inline graphic, the IPCW estimator based on the marginal Kaplan–Meier estimate of the censoring distribution Inline graphic and the IPCW estimator based on a Cox regression model for the censoring distribution Inline graphic.

Bias and root mean squared errors for Inline graphic chosen as the Inline graphic-quantile of Inline graphic are shown in Table 1. Table 2 shows the associated coverage of percentile bootstrap (all estimators) and asymptotic Wald-type confidence intervals (Inline graphic only) and contrasts empirical standard errors with average bootstrap and asymptotic standard errors for the estimate Inline graphic. Corresponding results for Inline graphic chosen as the median are shown in Appendix E of supplementary material available at Biostatistics online.

Table 1.

Average bias and RMSE for three different estimators of the truncated concordance, averaged over 1000 data sets simulated under the two scenarios CR1 and CR2, for varying sample size, independent or covariate-dependent censoring, and varying censoring rates

n     Censoring  Censored before t (%)  Naive        IPCW (KM)    IPCW (Cox)
CR1: Inline graphic, Inline graphic
250   0          25                     1.5 (4.0)    0.1 (3.7)    0.1 (3.7)
250   0          50                     3.9 (5.9)    0.3 (4.8)    0.3 (4.8)
250   0          75                     8.1 (10.0)   2.6 (10.1)   2.8 (9.5)
250   1          25                     1.0 (4.0)    −0.1 (3.9)   0.1 (3.9)
250   1          50                     2.6 (5.4)    −0.3 (4.9)   −0.2 (5.8)
250   1          75                     5.1 (8.4)    0.2 (8.2)    −1.9 (10.3)
1000  0          25                     1.4 (2.3)    0.0 (1.9)    0.0 (1.9)
1000  0          50                     3.8 (4.3)    0.0 (2.3)    0.0 (2.3)
1000  0          75                     7.9 (8.4)    1.0 (6.4)    1.2 (6.0)
1000  1          25                     0.9 (2.1)    −0.3 (1.9)   0.0 (2.0)
1000  1          50                     2.5 (3.3)    −0.5 (2.3)   −0.1 (3.4)
1000  1          75                     5.0 (5.9)    −0.4 (3.8)   −1.8 (7.5)
CR2: Inline graphic, Inline graphic
250   0          25                     0.7 (1.8)    0.1 (1.7)    0.1 (1.7)
250   0          50                     1.5 (2.4)    0.2 (2.1)    0.2 (2.1)
250   0          75                     2.8 (3.7)    0.7 (4.1)    0.8 (3.7)
250   1          25                     0.6 (1.9)    0.0 (1.8)    0.1 (1.7)
250   1          50                     1.3 (2.5)    −0.1 (2.3)   0.1 (2.3)
250   1          75                     2.2 (3.8)    −0.2 (4.4)   −0.3 (5.6)
1000  0          25                     0.7 (1.1)    0.0 (0.8)    0.0 (0.8)
1000  0          50                     1.4 (1.7)    0.0 (1.0)    0.0 (1.0)
1000  0          75                     2.7 (3.0)    0.2 (2.8)    0.3 (2.5)
1000  1          25                     0.6 (1.0)    0.0 (0.9)    0.1 (0.8)
1000  1          50                     1.3 (1.6)    −0.2 (1.2)   0.1 (1.1)
1000  1          75                     2.1 (2.6)    −0.7 (2.4)   −0.7 (4.3)

n is the sample size. Censoring: 0 = independent censoring, 1 = covariate-dependent censoring. The truncation time t was chosen as the 75% quantile of the marginal time-to-event distribution. The column "Censored before t (%)" shows the expected proportion of right-censored event times amongst observations with Inline graphic. The last three columns show average bias (RMSE) for the naive estimator and the IPCW estimators based on the Kaplan–Meier and Cox models for the censoring distribution (multiplied by 100 for easier readability).

Table 2.

Coverage of confidence intervals for the same simulation scenarios as in Table 1

n     Censoring  Censored before t (%)  SE empirical  SE avg. asymptotic  SE avg. bootstrap  Cov. asymptotic KM  Cov. bootstrap naive  Cov. bootstrap KM  Cov. bootstrap Cox
CR1: Inline graphic, Inline graphic
250   0          25                     0.0372        0.0384              0.0384             94.5                91.6                  94.2               94.3
250   0          50                     0.048         0.0473              0.0478             93.4                83.6                  93.6               93.2
250   0          75                     0.0974        0.0707              0.077              77.8                67.3                  81.2               82.1
250   1          25                     0.0388        0.0392              0.0395             94.2                93.4                  94.5               95.1
250   1          50                     0.0486        0.0479              0.0486             93.8                90.1                  94.1               93.5
250   1          75                     0.0823        0.0696              0.0733             86.4                81.4                  88.6               88.4
1000  0          25                     0.0188        0.0192              0.019              95.4                88.9                  95.1               95.2
1000  0          50                     0.0234        0.0238              0.0238             95.1                59.6                  95.3               95.1
1000  0          75                     0.0635        0.0475              0.0499             80.7                23.2                  81.9               82.6
1000  1          25                     0.0193        0.0195              0.0195             94.9                91.6                  94.6               94.3
1000  1          50                     0.0229        0.0239              0.0239             95.4                81.6                  95.5               93.4
1000  1          75                     0.0376        0.0368              0.037              93.9                64.6                  93.8               89.2
CR2: Inline graphic, Inline graphic
250   0          25                     0.0168        0.0173              0.0174             95.1                91.4                  95.0               95.0
250   0          50                     0.0207        0.0208              0.021              94.3                84.2                  94.5               94.4
250   0          75                     0.0404        0.0343              0.0359             85.4                72.5                  88.2               87.9
250   1          25                     0.0181        0.0183              0.0186             95.0                92.2                  95.3               95.2
250   1          50                     0.0232        0.0235              0.024              94.6                87.9                  95.2               95.9
250   1          75                     0.0439        0.0384              0.0399             87.4                81.2                  88.8               93.3
1000  0          25                     0.00831       0.00859             0.00856            95.3                86.7                  95.5               95.6
1000  0          50                     0.0101        0.0103              0.0103             95.6                66.7                  95.4               95.7
1000  0          75                     0.0277        0.0218              0.0224             87.4                41.4                  88.8               89.8
1000  1          25                     0.00878       0.00905             0.00914            95.2                88.7                  95.4               95.9
1000  1          50                     0.0116        0.0118              0.0119             95.8                76.5                  95.9               95.3
1000  1          75                     0.0234        0.022               0.0221             93.7                67.6                  94.2               94.0

n is the sample size. Censoring: 0 = independent censoring, 1 = covariate-dependent censoring. SE = standard error of the IPCW estimator based on the Kaplan–Meier estimate of the censoring distribution; Cov. = coverage (%). The three bootstrap coverage columns display observed coverage of 95% percentile bootstrap confidence intervals for all three estimators; the column "Cov. asymptotic KM" shows coverage of asymptotic Wald-type confidence intervals for the Kaplan–Meier-based IPCW estimator. The three SE columns show the empirical standard error for the 1000 estimates and the average asymptotic and bootstrap standard errors of the same estimator.

Based on Tables 1 and 2, we draw the following conclusions.

  1. The naive estimator Inline graphic can be biased and this can lead to insufficient coverage.

  2. The IPCW estimator Inline graphic can also be biased if the censoring depends on the covariate. In some cases, Inline graphic has a smaller bias but for high rates of censoring it can do worse than Inline graphic even though the censoring depends on the covariate.

  3. Coverage of confidence intervals for Inline graphic and Inline graphic was generally close to the nominal 95% except for some scenarios with high censoring rates of 75%. Both average asymptotic and bootstrap standard errors closely resembled the empirical standard errors of Inline graphic.
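The percentile bootstrap intervals whose coverage is summarized above can be sketched generically in Python: subjects are resampled with replacement, the estimator is recomputed on each resample, and empirical quantiles of the replicates form the interval. The function name, the default number of replicates, and the interface (an estimator taking one array per variable) are assumptions for illustration; the number of bootstrap replicates used in the simulation study is not stated in this text.

```python
import numpy as np

def percentile_bootstrap_ci(estimator, arrays, n_boot=200, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic of paired arrays.

    estimator: callable taking the arrays (e.g. score, time, status) and
               returning a scalar.
    arrays:    sequence of equal-length arrays, resampled jointly by subject.
    """
    rng = np.random.default_rng(seed)
    arrays = [np.asarray(a) for a in arrays]
    n = len(arrays[0])

    replicates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        replicates.append(estimator(*(a[idx] for a in arrays)))

    alpha = 1.0 - level
    # nanpercentile skips resamples where the statistic is undefined.
    lower, upper = np.nanpercentile(replicates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper
```

For a concordance estimator that also needs a truncation time, the extra argument can be bound beforehand, for example with `functools.partial` or a lambda.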

5. Application to coronary risk prediction

Specialist medical societies recommend initiation of preventive treatment for CHD based on a subject's predicted 10-year risk for CHD (NCEP, 2002). To accurately predict the absolute risk of CHD in older people, prognostic models for CHD need to account for the competing risk of non-CHD death (Koller, Leening and others, 2012). In this section, we revisit the example of Wolbers and others (2009) on coronary risk prediction based on data of elderly women from the Rotterdam Study, a prospective, population-based cohort of elderly subjects living in a suburb area of Rotterdam, the Netherlands (Hofman and others, 2011).

We analyzed data from 10 years of follow-up of 4144 women aged between 55 and 90 years who were free of CHD at baseline. During that follow-up period, 389 women experienced a CHD event and 921 women died without a prior CHD event. Only 41 of the women who remained event-free had less than 10 years of follow-up. We randomly split the data set into a training data set (2763 women with 249 CHD events) and a validation data set (1381 women with 140 CHD events).

Using the training set, we estimated the parameters of a Fine–Gray regression model which included the “traditional” baseline risk factors for CHD: age, treatment for high blood pressure (yes versus no), systolic blood pressure (separate slopes depending on whether the subject was on blood pressure treatment or not), diabetes mellitus, log-transformed total cholesterol to HDL cholesterol ratio, and smoking status (current versus never or former smoker). All these risk factors were associated with an increased CHD risk and, except for diabetes, all reached conventional significance (Inline graphic). We also investigated the role of age, the strongest predictor variable, as a simple marker for CHD.

Concordance estimates were obtained for these models in the validation set. The dependence of the censoring distribution on the covariates was investigated with a Cox regression model which yielded no trends and non-significant Wald tests for all variables. Thus, all IPCW estimates of concordance were based on the marginal Kaplan–Meier estimator for the censoring distribution from the validation set.

The left panel of Figure 1 shows the discrimination ability of the two Fine–Gray models for varying time horizons between 1 and 10 years. The model including all risk factors shows higher discriminative ability compared with the model based on age alone. For both models, concordance estimates stabilize after about 2.5 years of follow-up and remain fairly stable though slightly decreasing. The decrease may occur because earlier events are easier to predict than later events.

Fig. 1.


Left panel: IPCW estimates of the truncated concordance for the multiple Fine and Gray model (solid line) and the model with age as the only covariate (dashed line) for a follow-up duration of 1–10 years in the validation data. Error bars at 2.5, 5, 7.5, and 10 years of follow-up correspond to bootstrap standard errors. Right panel: time-dependent receiver operating characteristic curve at time $t = 10$ years for the multiple Fine and Gray model (solid line) and the model with age as the only covariate (dashed line) in the validation data. Cases were defined as subjects with $T \le t$ and $D = 1$, controls as subjects with $T > t$ or $D = 2$.

The right panel of Figure 1 shows time-dependent ROC curves at time $t = 10$ years for the two models. For this graph, cases were defined as subjects with $T \le t$ and $D = 1$, controls as subjects with $T > t$ or $D = 2$. Estimation was also based on IPCW-weighting as implemented in the R package timeROC (Blanche and others, 2013).

Table 3 shows the estimated concordance for predicting CHD and non-CHD death, respectively, during the 10 years follow-up in the validation data. The Fine and Gray model for non-CHD death used the same covariates as the model for CHD. Age alone is a strong predictor for non-CHD death in this elderly population but the multiple Fine and Gray model did not substantially improve concordance. This is not surprising as most additional covariates are established CHD-specific risk factors which would not be expected to strongly affect non-CHD death (except for other deaths related to the cardiovascular system).

Table 3.

Estimated concordance and AUC measures in the validation data of the CHD study for both competing risks (in %)

Outcome        Model      Concordance  AUC (event-free controls)  AUC (event-free or other-event controls)
CHD            Age only   63.7         72.3                       64.3
CHD            Fine–Gray  71.6         79.4                       72.4
Non-CHD death  Age only   75.7         80.8                       78.0
Non-CHD death  Fine–Gray  76.2         81.3                       78.6

Truncation point for concordance is $t = 10$ years. AUC measures are also reported at $t = 10$ years. Cumulative cases were defined for both AUC measures as subjects with $T \le t$ and D = “event type studied” (CHD or non-CHD death, respectively). Controls were defined as subjects with $T > t$ for the first AUC measure and as subjects with $T > t$ or D = “other event type” for the second.

Table 3 also displays time-dependent AUC measures at $t = 10$ years in the validation data using two different definitions of controls (Blanche and others, 2013). AUCs with controls defined as subjects with $T > t$ or $D = 2$ (consistent with our concordance definition) were slightly higher but comparable with the concordance whereas AUCs using only subjects with $T > t$ as controls were substantially higher. This could be explained by the fact that age is a strong predictor of both CHD and non-CHD death which hampers discrimination of CHD events from non-CHD deaths.
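The difference between the two control definitions can be illustrated with a small Python sketch for uncensored data. This is not the IPCW-based estimation used for Table 3; it only computes the empirical proportion of case-control pairs in which the case has the higher score, with ties counted as one half (a common convention, assumed here). The function name and the `controls` labels are hypothetical.

```python
import numpy as np

def cumulative_auc(score, time, event, horizon, controls="event_free"):
    """Empirical time-dependent AUC at a fixed horizon (no censoring).

    Cases:    event of interest (event == 1) at or before the horizon.
    Controls: "event_free"              -> no event by the horizon;
              "event_free_or_competing" -> no event by the horizon,
                                           or a competing event (event == 2).
    """
    score = np.asarray(score, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)

    case = (time <= horizon) & (event == 1)
    if controls == "event_free":
        control = time > horizon
    elif controls == "event_free_or_competing":
        control = (time > horizon) | (event == 2)
    else:
        raise ValueError("unknown control definition")

    if not case.any() or not control.any():
        return float("nan")
    diff = score[case][:, None] - score[control][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
```

When a marker such as age also raises the risk of the competing event, subjects with a competing event tend to have high scores, so adding them to the controls lowers the AUC. That pattern matches the comparison of the two AUC columns described above.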

6. Discussion

We have presented a formal definition of the concordance probability for prognostic models in the presence of competing risks. Like the concordance probability for survival or binary data, it provides a simple overall numeric measure of discrimination. To deal with right-censored data, we derived an IPCW estimator of the truncated concordance probability and established consistency and asymptotic normality under mild assumptions. Asymptotic properties of the proposed estimator rely on the assumption that the censoring distribution is correctly specified and conditionally independent of the competing risks process given covariates. In many applications, it will be reasonable to assume that the censoring mechanism does not depend on covariates and then the marginal Kaplan–Meier estimate of the censoring distribution can be used. However, if for some reason the design or conduct of a clinical study introduced a dependence between the follow-up duration and covariates which also affect the competing risks process (any component, including the cause-specific hazards of competing events), then it is recommended to use a working regression model for the censoring distribution in order to avoid biased conclusions.

The fact that we estimate a truncated version of concordance rather than the unconstrained concordance probability should not be seen as a limitation of our approach. Indeed it is impossible to assess the performance of prognostic models beyond the maximum follow-up duration without strong and untestable assumptions. If we assume independent censoring, our estimator is defined if we truncate at any time before or at the largest observed censoring or event time. As for the Kaplan–Meier estimator, the effect of censoring on the variability of the IPCW estimator is increasing with increasing truncation time. However, it is difficult to recommend a general purpose truncation time, in particular because the truncation time point influences the interpretation of the concordance probability. To avoid unstable results in practical applications, we recommend that the analyst develops an appropriate model for the censoring distribution, e.g. the Kaplan–Meier estimator or a Cox model, and then investigates the predicted probabilities of being uncensored at the candidate truncation times. Multiple truncation time points can be evaluated and it can be useful to compare discrimination ability at different truncation time points. A model which is good at discriminating patients with an early failure time (e.g. after surgery) from others may not be good at discriminating subsequent failure times amongst patients who survive a first high risk period.

As discussed in Appendix C of supplementary material available at Biostatistics online, our approach is related to time-dependent sensitivity, specificity, and ROC curves for competing risks (Saha and Heagerty, 2010) and the concordance can be written as a weighted average of the time-dependent $\mathrm{AUC}(t)$ for incident cases ($T_i = t$, $D_i = 1$) and controls defined as observations with $T_j > t$ or $D_j = 2$. Thus, the concordance serves as an overall summary of discrimination whereas the time-dependent AUC measures discrimination of the event status at one specific time point.
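One way to see the weighted-average representation is sketched below. The notation is assumed for illustration and need not match Appendix C: $T$ is a continuous event time, $D \in \{1, 2\}$ the event type (1 = event of interest, 2 = competing event), $M$ the predicted risk, $F_1(t) = P(T \le t, D = 1)$ the cumulative incidence of the event of interest with density $f_1$, and $(T_i, D_i, M_i)$, $(T_j, D_j, M_j)$ are independent copies. Ties are ignored.

```latex
\begin{align*}
\mathrm{AUC}(t) &= P\bigl(M_i > M_j \mid T_i = t,\ D_i = 1,\ (T_j > t \text{ or } D_j = 2)\bigr), \\
P(T_j > t \text{ or } D_j = 2) &= P(T_j > t) + P(T_j \le t,\ D_j = 2) = 1 - F_1(t), \\
C_1 &= P\bigl(M_i > M_j \mid D_i = 1,\ (T_i < T_j \text{ or } D_j = 2)\bigr)
     = \int_0^\infty \mathrm{AUC}(t)\, w(t)\, dt, \\
w(t) &= \frac{f_1(t)\,\{1 - F_1(t)\}}{\int_0^\infty f_1(s)\,\{1 - F_1(s)\}\, ds}.
\end{align*}
```

The third line follows by conditioning on $T_i = t$ and using the independence of the two subjects. Under the same assumptions, a version truncated at time $\tau$ restricts both integrals to $(0, \tau]$.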

We assumed that the prognostic model was derived on an independent training data set and only in this setting are asymptotic or bootstrap confidence intervals for the truncated concordance readily available. Clearly, independent training data are not always available, and even if they are, a joint analysis of all data will be more efficient. However, some form of internal cross-validation is needed to develop and assess a prognostic model with a single data set (Efron and Tibshirani, 1997; Gerds and others, 2008; Hastie and others, 2009).

It is important to emphasize that our definition of concordance assesses a prognostic model for the absolute risk of the event of interest in the presence of competing risks. In line with earlier work (Gail and Pfeiffer, 2005; Wolbers and others, 2009), we regard this risk as crucial for medical decision-making in the competing risks setting. However, in many instances explicit consideration of competing events will also be important and modeling the entire competing risks multi-state process will provide further insights (Beyersmann and others, 2007). As an example, our illustration of concordance for a single marker (presented in supplementary material available at Biostatistics online) shows that discrimination of prognostic models for the event of interest is hampered if covariates affect both cause-specific hazards with regression coefficients of the same sign, especially if there is a strong association with the competing risk or if the baseline competing hazard is high. This indicates that to achieve high discrimination ability one needs predictors which are only weakly or, even better, inversely associated with the cause-specific hazard of the competing event. Moreover, in settings where all competing events are of similar importance, joint accuracy criteria for the entire competing risks multi-state process are needed and their development is an important area for future research.

Supplementary material

Supplementary material is available at http://biostatistics.oxfordjournals.org.

Funding

M.W. was supported by the Wellcome Trust and the Li Ka Shing Foundation—University of Oxford Global Health Programme. Funding to pay the Open Access publication charges for this article was provided by the Wellcome Trust.

Acknowledgments

Computer time for this study was provided by the computing facilities MCIA (Mésocentre de Calcul Intensif Aquitain) of the Université de Bordeaux and of the Université de Pau et des Pays de l’Adour.

Conflict of Interest: None declared.

References

  1. Bender R., Augustin T., Blettner M. Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine. 2005;24:1713–1723. doi: 10.1002/sim.2059.
  2. Beyersmann J., Dettenkofer M., Bertz H., Schumacher M. A competing risks analysis of bloodstream infection after stem-cell transplantation using subdistribution hazards and cause-specific hazards. Statistics in Medicine. 2007;26:5360–5369. doi: 10.1002/sim.3006.
  3. Bickel P. J., Klaassen C. A., Ritov Y., Wellner J. A. Efficient and Adaptive Estimation for Semiparametric Models. Baltimore: Johns Hopkins University Press; 1993.
  4. Blanche P., Dartigues J.-F., Jacqmin-Gadda H. Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Statistics in Medicine. 2013;32:5381–5397. doi: 10.1002/sim.5958.
  5. Cheng S. C., Fine J. P., Wei L. J. Prediction of cumulative incidence function under the proportional hazards model. Biometrics. 1998;54:219–228.
  6. Efron B., Tibshirani R. Improvements on cross-validation: the .632+ bootstrap method. Journal of the American Statistical Association. 1997;92:548–560.
  7. Fine J. P., Gray R. J. A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association. 1999;94:496–509.
  8. Gail M. H., Pfeiffer R. M. On criteria for evaluating models of absolute risk. Biostatistics. 2005;6:227–239. doi: 10.1093/biostatistics/kxi005.
  9. Gerds T. A., Cai T., Schumacher M. The performance of risk prediction models. Biometrical Journal. 2008;50:457–479. doi: 10.1002/bimj.200810443.
  10. Gerds T. A., Kattan M. W., Schumacher M., Yu C. Estimating a time-dependent concordance index for survival prediction models with covariate dependent censoring. Statistics in Medicine. 2013;32:2173–2184. doi: 10.1002/sim.5681.
  11. Gerds T. A., Scheike T. H., Andersen P. K. Absolute risk regression for competing risks: interpretation, link functions, and prediction. Statistics in Medicine. 2012;31:3921–3930. doi: 10.1002/sim.5459.
  12. Grunkemeier G. L., Jin R., Eijkemans M. J., Takkenberg J. J. Actual and actuarial probabilities of competing risks: apples and lemons. Annals of Thoracic Surgery. 2007;83:1586–1592. doi: 10.1016/j.athoracsur.2006.11.044.
  13. Harrell F. E., Califf R. M., Pryor D. B., Lee K. L., Rosati R. A. Evaluating the yield of medical tests. Journal of the American Medical Association. 1982;247:2543–2546.
  14. Hastie T., Tibshirani R., Friedman J. H. The Elements of Statistical Learning. 2nd edition. New York: Springer; 2009. Springer Series in Statistics.
  15. Heagerty P. J., Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics. 2005;61:92–105. doi: 10.1111/j.0006-341X.2005.030814.x.
  16. Hofman A., van Duijn C. M., Franco O. H., Ikram M. A., Janssen H. L., Klaver C. C., Kuipers E. J., Nijsten T. E., Stricker B. H., Tiemeier H. and others. The Rotterdam Study: 2012 objectives and design update. European Journal of Epidemiology. 2011;26:657–686. doi: 10.1007/s10654-011-9610-5.
  17. Koller M. T., Leening M. J., Wolbers M., Steyerberg E. W., Hunink M. G., Schoop R., Hofman A., Bucher H. C., Psaty B. M., Lloyd-Jones D. M. and others. Development and validation of a coronary risk prediction model for older U.S. and European persons in the cardiovascular health study and the Rotterdam Study. Annals of Internal Medicine. 2012;157:389–397. doi: 10.7326/0003-4819-157-6-201209180-00002.
  18. Koller M. T., Raatz H., Steyerberg E. W., Wolbers M. Competing risks and the clinical community: irrelevance or ignorance? Statistics in Medicine. 2012;31:1089–1097. doi: 10.1002/sim.4384.
  19. Mogensen U. B., Ishwaran H., Gerds T. A. Evaluating random forests for survival analysis using prediction error curves. Journal of Statistical Software. 2012;50:1–23. doi: 10.18637/jss.v050.i11.
  20. NCEP. Third Report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III) final report. Circulation. 2002;106:3143–3421.
  21. Putter H., Fiocco M., Geskus R. B. Tutorial in biostatistics: competing risks and multi-state models. Statistics in Medicine. 2007;26:2389–2430. doi: 10.1002/sim.2712.
  22. Saha P., Heagerty P. J. Time-dependent predictive accuracy in the presence of competing risks. Biometrics. 2010;66:999–1011. doi: 10.1111/j.1541-0420.2009.01375.x.
  23. Scheike T., Zhang M. J., Gerds T. A. Predicting cumulative incidence probability by direct binomial regression. Biometrika. 2008;95:205–220.
  24. Schoop R., Beyersmann J., Schumacher M., Binder H. Quantifying the predictive accuracy of time-to-event models in the presence of competing risks. Biometrical Journal. 2011;53:88–112. doi: 10.1002/bimj.201000073.
  25. Uno H., Cai T., Pencina M. J., D'Agostino R. B., Wei L. J. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine. 2011;30:1105–1117. doi: 10.1002/sim.4154.
  26. Wolbers M., Koller M. T., Witteman J. C., Steyerberg E. W. Prognostic models with competing risks: methods and application to coronary risk prediction. Epidemiology. 2009;20:555–561. doi: 10.1097/EDE.0b013e3181a39056.
  27. Yan G., Greene T. Investigating the effects of ties on measures of concordance. Statistics in Medicine. 2008;27:4190–4206. doi: 10.1002/sim.3257.
  28. Zheng Y., Cai T., Jin Y., Feng Z. Evaluating prognostic accuracy of biomarkers under competing risk. Biometrics. 2012;68:388–396. doi: 10.1111/j.1541-0420.2011.01671.x.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press