Abstract
The concordance probability is a widely used measure to assess discrimination of prognostic models with binary and survival endpoints. We formally define the concordance probability for a prognostic model of the absolute risk of an event of interest in the presence of competing risks and relate it to recently proposed time-dependent area under the receiver operating characteristic curve measures. For right-censored data, we investigate inverse probability of censoring weighted (IPCW) estimates of a truncated concordance index based on a working model for the censoring distribution. We demonstrate consistency and asymptotic normality of the IPCW estimate if the working model is correctly specified and derive an explicit formula for the asymptotic variance under independent censoring. The small sample properties of the estimator are assessed in a simulation study also against misspecification of the working model. We further illustrate the methods by computing the concordance probability for a prognostic model of coronary heart disease (CHD) events in the presence of the competing risk of non-CHD death.
Keywords: C index, Competing risks, Concordance probability, Coronary heart disease, Prognostic models, Time-dependent AUC
1. Introduction
Clinical decision-making and cost-effectiveness analyses often rely on prognostic models that quantify a subject's absolute risk of a disease event of interest over time. However, study populations increasingly consist of elderly individuals with varying degrees of co-morbidity who are likely to experience one of several disease endpoints other than the endpoint of main interest (Koller, Raatz and others, 2012). As an example, prediction of coronary heart disease (CHD) events in elderly subjects is complicated by the fact that subjects may die from other causes prior to the observation of the disease event of interest (Wolbers and others, 2009; Koller, Leening and others, 2012).
It is well known that the naive application of standard survival analysis leads to bias and risk over-estimation if competing risks are present and that specialized methods are needed (Grunkemeier and others, 2007; Putter and others, 2007). A key quantity for medical decision-making in the presence of competing risks is the absolute risk of the event of interest over time as quantified by its (covariate-dependent) cumulative incidence function (Gail and Pfeiffer, 2005; Wolbers and others, 2009). Thus, regression models are particularly attractive when they provide subject-specific estimates of the absolute risks based on a set of covariates (Fine and Gray, 1999; Gerds and others, 2012).
Several measures for quantifying the accuracy of prognostic models have been adapted from the standard survival setting with only one failure cause to competing risks. Measures include prediction error curves (Schoop and others, 2011), time-dependent sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC) (Saha and Heagerty, 2010), and reclassification methods (Wolbers and others, 2009; Koller, Leening and others, 2012). For survival data, the concordance index (Harrell and others, 1982) is a frequently reported measure of discrimination and we have previously presented a simple adaptation of Harrell's concordance estimator to the competing risks setting (Wolbers and others, 2009).
In the present paper, we motivate and formally define a cause-specific concordance index in the presence of competing risks. Notably, the proposed concordance index depends only on the cumulative incidence function of the event of interest. We clarify the relation of the concordance to time-dependent AUC measures and discuss a possible alternative definition. We then study estimation of a truncated concordance index in the presence of right-censoring. We introduce an inverse probability of censoring weighted (IPCW) estimator and demonstrate its consistency and asymptotic normality if the working model for the censoring distribution is correctly specified. The empirical bias and mean-square error as well as coverage of asymptotic and bootstrap confidence intervals are examined in a simulation study. Finally, we illustrate the methods for an example of coronary risk prediction in older women using data from the Rotterdam Study (Hofman and others, 2011).
2. Definition of concordance
2.1. Definition for a simple prognostic score without censoring
Competing risks data without censoring are given by pairs $(T_i, D_i)$ of data where $T_i$ is the time to the event and $D_i$ is the event type. For the purpose of discussing the definition and estimation of the cause-specific concordance index it is sufficient to assume that there are only two competing events. Thus, for simplicity of presentation we let $D_i = 1$ denote the event of interest and $D_i = 2$ the occurrence of any competing event. In applications, it may be important to model all competing events separately.
The concordance index is defined for any prognostic score $M(X_i)$ depending on baseline variables $X_i$ which can be used to order subjects with respect to the risk of an event of type 1. For example, $M(X_i)$ could be a single baseline marker or the linear predictor of a regression model for the event of interest derived on a training data set. We assume that higher values of $M(X_i)$ are associated with higher risks of the event of interest.
To motivate our definition with an example, assume that $T$ is the time to death and that a specific treatment were available which prevented death due to the event of interest ($D = 1$) but would not affect death from other causes ($D = 2$). The immediate benefit from such a treatment would be greatest for subjects with a high risk of dying from the event of interest early, less for individuals dying from the event of interest late, and negligible for subjects with a low risk of experiencing the event of interest at all (i.e. those likely to die from competing causes). Consequently, for a random pair of subjects $i$ and $j$, the first subject would be in greater need of treatment than the second subject if they experienced the event of interest ($D_i = 1$) and the second subject experienced the event of interest later ($D_j = 1$ and $T_i < T_j$) or not at all ($D_j = 2$). In these cases, the ranking of the risk marker for the pair of subjects is concordant if $M(X_i) > M(X_j)$. Pairs of individuals where both experience the competing event are not comparable as neither of them would be in need of treatment.
To formally define the concordance probability for the event of interest, we assume an independent test set of i.i.d. realizations of $(X_i, T_i, D_i)$ from the joint distribution of the marker and the competing risks outcome and define

$$C_1 = P\big(M(X_i) > M(X_j) \mid D_i = 1 \text{ and } (T_i < T_j \text{ or } D_j = 2)\big) \tag{2.1}$$

for any randomly chosen pair of subjects $(i, j)$ from this distribution. The concordance probability for the competing event, $C_2$, is defined analogously.
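As a minimal illustration of the pair rule described above, the following Python sketch computes the empirical proportion of concordant pairs in uncensored competing risks data. The function name `concordance_uncensored`, its arguments, and the optional `tie_weight` are hypothetical choices made for this sketch and are not taken from the paper or from any package. A pair is counted as comparable when the first subject experiences the event of interest and the second subject either has a later event time or experiences the competing event.

```python
def concordance_uncensored(marker, time, event, tie_weight=0.0):
    """Empirical concordance for event type 1 from uncensored data.

    marker: risk scores (higher = higher risk of the event of interest)
    time:   event times
    event:  event types (1 = event of interest, 2 = competing event)
    tie_weight: credit for pairs with tied markers (assumption; 0.5 is common)
    """
    comparable = 0.0
    concordant = 0.0
    n = len(time)
    for i in range(n):
        if event[i] != 1:
            continue  # the first subject must experience the event of interest
        for j in range(n):
            if i == j:
                continue
            # second subject: later event time, or a competing event at any time
            if time[i] < time[j] or event[j] == 2:
                comparable += 1.0
                if marker[i] > marker[j]:
                    concordant += 1.0
                elif marker[i] == marker[j]:
                    concordant += tie_weight
    return concordant / comparable if comparable > 0 else float("nan")


# Toy data: two events of interest, two competing events.
print(concordance_uncensored([3.0, 2.0, 1.0, 0.5], [1.0, 2.0, 3.0, 0.5], [1, 1, 2, 2]))
```

The double loop is quadratic in the sample size, which is acceptable for a sketch but would need vectorization for large data sets.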
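Define the cumulative incidence function for the event of interest as $F_1(t \mid X) = P(T \le t, D = 1 \mid X)$ and the improper random variable $T^{*}$ as $T^{*} = T$ if $D = 1$ and $T^{*} = \infty$ if $D = 2$. $T^{*}$ has a distribution function equal to $F_1(t \mid X)$ for $t < \infty$ and a point mass of $1 - F_1(\infty \mid X)$ at $t = \infty$ (Fine and Gray, 1999). As an associate editor pointed out, $C_1$ can be written in terms of $T^{*}$ leading essentially to the standard definition of concordance for survival data: $C_1 = P(M(X_i) > M(X_j) \mid T^{*}_i < T^{*}_j)$. We also note that $P(T^{*}_j > t \mid X_j) = 1 - F_1(t \mid X_j)$. Thus,

$$P(T^{*}_i < T^{*}_j \mid X_i, X_j) = \int_0^{\infty} \{1 - F_1(t \mid X_j)\}\, F_1(\mathrm{d}t \mid X_i)$$

and we can rewrite $C_1$ as

$$C_1 = \frac{E\left[1\{M(X_i) > M(X_j)\} \int_0^{\infty} \{1 - F_1(t \mid X_j)\}\, F_1(\mathrm{d}t \mid X_i)\right]}{E\left[\int_0^{\infty} \{1 - F_1(t \mid X_j)\}\, F_1(\mathrm{d}t \mid X_i)\right]} \tag{2.2}$$

where the expectations are taken over the independent covariate pair $(X_i, X_j)$.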
According to (2.2), the cause-specific concordance for event 1 depends on $F_1$ and the marginal distribution of the marker but not on the cumulative incidence function $F_2$ of the competing event. This feature is not obvious in formula (2.1) but desirable when the aim is to assess the discriminative ability of a marker for $F_1$.
In Appendix B of supplementary material available at Biostatistics online, we illustrate the properties of the concordance probability for a single marker and competing risks outcomes simulated according to cause-specific proportional hazards models with constant baseline hazards. The illustrations suggest that to achieve a high concordance, the marker needs to be strongly associated with an increased cause-specific hazard of the event of interest but only weakly or, even better, reversely associated with the cause-specific hazard of the competing event. This can be explained by the fact that the overall effect of a covariate on the cumulative incidence function of the event of interest depends on both cause-specific baseline hazards and both cause-specific hazard ratios (Beyersmann and others, 2007; Koller, Raatz and others, 2012).
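Finally, it is important to discuss modifications of definition (2.1) for tied data (Yan and Greene, 2008). For example, it may happen that $M(X_i) = M(X_j)$. Depending on the application, it may then be sensible to count such pairs with a weight of $1/2$:

$$C_1 = P\big(M(X_i) > M(X_j) \mid D_i = 1 \text{ and } (T_i < T_j \text{ or } D_j = 2)\big) + \tfrac{1}{2}\, P\big(M(X_i) = M(X_j) \mid D_i = 1 \text{ and } (T_i < T_j \text{ or } D_j = 2)\big)$$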
To simplify notation, we use definition (2.1) as the basis for our further developments.
2.2. An alternative definition of concordance and relation to time-dependent AUC measures
We motivated our definition of concordance with a specific treatment for the event of interest which does not affect the competing event. In this situation, a case subject $i$ with $D_i = 1$ has a larger immediate benefit from treatment than a control subject $j$ with $T_j > T_i$ or $D_j = 2$ as subjects experiencing a competing event have no benefit from treatment at all. However, in other situations, the treatment may affect both event types and then subjects $i$ with $D_i = 1$ and $j$ with $D_j = 2$ and $T_j < T_i$ would not be comparable. Here, it would be more relevant to distinguish cases $i$ with $D_i = 1$ from those who have not had any event up to that time point, i.e. those with $T_j > T_i$. This leads to an alternative definition of concordance:

$$C_1^{\mathrm{alt}} = P\big(M(X_i) > M(X_j) \mid D_i = 1 \text{ and } T_i < T_j\big)$$

Of note, $C_1^{\mathrm{alt}}$ also depends on the cumulative incidence function of the competing event $F_2$. Thus, it might be less suitable if the main goal is to assess the relevance of a marker or a prognostic model for predicting the absolute risk of the event of interest alone, and we will not pursue it further. However, $C_1^{\mathrm{alt}}$ could be valuable for assessing joint models for the cumulative incidence of both competing events.
The proposed concordance measures are closely related to measures of the time-dependent AUC which have been proposed to assess discrimination for competing risks data at a fixed time point $t$ (Saha and Heagerty, 2010; Zheng and others, 2012; Blanche and others, 2013). We review these measures in Appendix C of supplementary material available at Biostatistics online, and show that $C_1$ can be written as a weighted average of one proposed time-dependent AUC measure over time. This is in analogy with a similar result for survival analysis without competing risks (Heagerty and Zheng, 2005) and supports the use of $C_1$ as a global summary measure of performance.
2.3. Assessing prediction models in right-censored data
We now generalize the concordance index defined in Section 2.1 in two ways. First, we replace the simple prognostic score $M(X)$ by a more general prediction model $M(t, X)$ for the risk of event $D = 1$ until time $t$, i.e. estimates of $F_1(t \mid X)$, which can be obtained by combining cause-specific hazards models, by fitting a Fine and Gray regression model or by direct binomial regression (Fine and Gray, 1999; Scheike and others, 2008). Secondly, to include the typical application where individuals have a limited duration of follow-up we define a truncated version of the concordance index. Following Uno and others (2011) and Gerds and others (2013), we define

$$C_1(t) = P\big(M(t, X_i) > M(t, X_j) \mid D_i = 1 \text{ and } T_i \le t \text{ and } (T_i < T_j \text{ or } D_j = 2)\big) \tag{2.3}$$

The parameter $C_1(t)$ quantifies the ability of the model to correctly rank events of interest up to time $t$ and to discriminate them from competing events. The truncation is necessary to enable estimation of $C_1(t)$ from right-censored data with a limited follow-up duration.
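As before, we can write the truncated concordance (2.3) as a functional of $F_1$ and of the marginal distribution of the predictor values of a pair of individuals $(X_i, X_j)$: If we introduce notation for the order of the predicted risks at time $t$ for a pair of individuals,

$$Q_{ij}(t) = 1\{M(t, X_i) > M(t, X_j)\},$$

then

$$C_1(t) = \frac{E\left[Q_{ij}(t) \int_0^{t} \{1 - F_1(s \mid X_j)\}\, F_1(\mathrm{d}s \mid X_i)\right]}{E\left[\int_0^{t} \{1 - F_1(s \mid X_j)\}\, F_1(\mathrm{d}s \mid X_i)\right]} \tag{2.4}$$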
3. Estimation of concordance in the presence of right-censoring
3.1. Right-censored data
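To indicate the end of follow-up for subject $i$ we introduce a subject-specific censoring time $C_i$. Thus, we observe only $(X_i, \tilde T_i, \tilde D_i)$ where $\tilde T_i = \min(T_i, C_i)$, $\Delta_i = 1\{T_i \le C_i\}$ and $\tilde D_i = \Delta_i D_i$. We also use the following notation: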
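
$$\tilde N_i^1(t) = 1\{\tilde T_i \le t, \tilde D_i = 1\}, \qquad A_{ij} = 1\{\tilde T_i < \tilde T_j\}, \qquad B_{ij} = 1\{\tilde T_i \ge \tilde T_j, \tilde D_j = 2\}$$
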
The event-free survival probability conditional on the covariate $X$ is then given by

$$S(t \mid X) = P(T > t \mid X) = 1 - F_1(t \mid X) - F_2(t \mid X)$$

We allow the censoring distribution to depend on the covariates $X_i$ but assume throughout that $C_i$ is conditionally independent of $(T_i, D_i)$ given $X_i$. This implies
![]() |
(3.1) |
where $G(t \mid X_i) = P(C_i > t \mid X_i)$ is the conditional probability of being uncensored at time $t$. Noting
we also have for
:
![]() |
(3.2) |
3.2. Ignoring non-evaluable pairs
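An asymptotically biased estimate of $C_1(t)$ is given by

$$\hat C_1^{\mathrm{naive}}(t) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} (A_{ij} + B_{ij})\, Q_{ij}(t)\, \tilde N_i^1(t)}{\sum_{i=1}^{n} \sum_{j=1}^{n} (A_{ij} + B_{ij})\, \tilde N_i^1(t)}$$

where $Q_{ij}(t) = 1\{M(t, X_i) > M(t, X_j)\}$ is an indicator for the order of predicted risks at time $t$. This can be interpreted as the proportion of definitely concordant pairs amongst evaluable pairs, i.e. pairs for which one individual experiences the event of interest and concordance can be decided based on the observed (potentially censored) data.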
This estimate evaluated at the time $t$ corresponding to the maximum follow-up duration is a direct adaptation of Harrell's $C$ for survival data (Harrell and others, 1982) to the competing risks context and has been previously defined in Wolbers and others (2009). While simple, a major problem of this estimator is that by ignoring non-evaluable pairs without any correction, bias is introduced. It is well known that Harrell's $C$ depends on the censoring distribution (Uno and others, 2011; Gerds and others, 2013) and $\hat C_1^{\mathrm{naive}}(t)$ shares this limitation.
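The proportion of definitely concordant pairs amongst evaluable pairs can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name `naive_concordance` and the status coding (0 = censored, 1 = event of interest, 2 = competing event) are assumptions, and a pair is treated as evaluable when the first subject has an observed event of interest by the truncation time and the second subject is either still under observation after that time or has an observed competing event no later than it.

```python
def naive_concordance(risk, obs_time, status, t):
    """Share of concordant pairs among evaluable pairs, ignoring censoring.

    risk:     predicted risks of the event of interest at time t
    obs_time: observed times (minimum of event and censoring time)
    status:   0 = censored, 1 = event of interest, 2 = competing event
    t:        truncation time
    """
    evaluable = 0
    concordant = 0
    n = len(obs_time)
    for i in range(n):
        # first subject: observed event of interest no later than t
        if status[i] != 1 or obs_time[i] > t:
            continue
        for j in range(n):
            if i == j:
                continue
            observed_later = obs_time[i] < obs_time[j]
            competing_not_later = obs_time[j] <= obs_time[i] and status[j] == 2
            if observed_later or competing_not_later:
                evaluable += 1
                if risk[i] > risk[j]:
                    concordant += 1
    return concordant / evaluable if evaluable > 0 else float("nan")


# Subject 2 is censored at time 2 and therefore never acts as a case.
print(naive_concordance([0.3, 0.5, 0.2, 0.1], [1.0, 2.0, 3.0, 0.5], [1, 0, 2, 2], t=2.5))
```

Pairs in which the second subject is censored before the first subject's event time are silently dropped, which is the source of the dependence on the censoring distribution discussed above.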
3.3. IPCW estimate
We derive an IPCW estimate for $C_1(t)$ based on a working model $\mathcal{G}$ for $G$. Let $\tau$ be a time point where
We assume that the model $\mathcal{G}$ is correctly specified and that for all
there exists a uniformly consistent, weakly asymptotically linear estimator sequence $\hat G_n$
with influence function
. This implies (Bickel and others, 1993):
![]() |
(3.3) |
For example, we can specify a Cox regression model and use the estimate

$$\hat G_n(t \mid X) = \exp\{-\hat\Lambda_0(t) \exp(\hat\gamma^{\top} X)\}$$

where $\hat\Lambda_0$ is the Breslow estimator of the cumulative baseline hazard function and $\hat\gamma$ the maximum partial likelihood estimator of the regression coefficients. If the censoring is conditionally independent of the competing risks outcome given the predictors and the Cox model correctly specified, then condition (3.3) is satisfied (see, e.g. Cheng and others, 1998). As an alternative, we could assume that the censoring is independent of the competing risks outcome and the predictors. If this assumption is correct, then the Kaplan–Meier estimate for the censoring distribution satisfies (3.3).
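Based on $\hat G_n$ we construct the weights

$$\hat W_{ij,1} = \hat G_n(\tilde T_i- \mid X_i)\, \hat G_n(\tilde T_i \mid X_j), \qquad \hat W_{ij,2} = \hat G_n(\tilde T_i- \mid X_i)\, \hat G_n(\tilde T_j- \mid X_j)$$

and define an IPCW estimate of $C_1(t)$:

$$\hat C_1(t) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \big(A_{ij} \hat W_{ij,1}^{-1} + B_{ij} \hat W_{ij,2}^{-1}\big)\, Q_{ij}(t)\, \tilde N_i^1(t)}{\sum_{i=1}^{n} \sum_{j=1}^{n} \big(A_{ij} \hat W_{ij,1}^{-1} + B_{ij} \hat W_{ij,2}^{-1}\big)\, \tilde N_i^1(t)} \tag{3.4}$$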
Lemma 3.1 —
If the working model is correctly specified and $\hat G_n$ is a consistent estimator of $G$ then $\hat C_1(t)$ is a consistent estimator of $C_1(t)$ for all $t \le \tau$. Furthermore, if (3.3) is satisfied, then $\hat C_1(t)$ is asymptotically linear and $\sqrt{n}\,\{\hat C_1(t) - C_1(t)\}$ converges in distribution to a normal random variable with mean 0.
A proof of the lemma is given in Appendix D of supplementary material available at Biostatistics online. For the case of independent censoring, supplementary material available at Biostatistics online also presents an explicit formula for the influence function and a consistent estimator of the asymptotic variance. The proposed concordance estimator has been implemented with the function cindex of the R package pec (Mogensen and others, 2012). Example code is provided in Appendix A of supplementary material available at Biostatistics online.
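To make the weighting idea concrete, here is a hedged Python sketch of an IPCW concordance estimate that uses the marginal Kaplan–Meier estimate of the censoring distribution. It is not the `cindex` implementation in `pec`; the function names, the status coding (0 = censored, 1 = event of interest, 2 = competing event), and the exact weight form are assumptions of this sketch. Each evaluable pair is weighted by the inverse of the estimated probability that both subjects remained uncensored long enough for the pair to be evaluable, and the sketch assumes no ties between event and censoring times.

```python
def km_censoring_survival(obs_time, status):
    """Kaplan-Meier estimate of G(u) = P(C > u); censoring (status 0) is the event."""
    censor_times = sorted(set(u for u, s in zip(obs_time, status) if s == 0))
    steps = []  # pairs (time, value of G just after that time)
    g = 1.0
    for u in censor_times:
        at_risk = sum(1 for v in obs_time if v >= u)
        censored = sum(1 for v, s in zip(obs_time, status) if v == u and s == 0)
        g *= 1.0 - censored / at_risk
        steps.append((u, g))

    def G(u, left_limit=False):
        value = 1.0
        for time, g_after in steps:
            if time < u or (time == u and not left_limit):
                value = g_after
            else:
                break
        return value

    return G


def ipcw_concordance(risk, obs_time, status, t):
    """IPCW concordance truncated at t with Kaplan-Meier censoring weights."""
    G = km_censoring_survival(obs_time, status)
    numerator = 0.0
    denominator = 0.0
    n = len(obs_time)
    for i in range(n):
        if status[i] != 1 or obs_time[i] > t:
            continue
        g_i = G(obs_time[i], left_limit=True)  # case uncensored just before its event
        for j in range(n):
            if i == j:
                continue
            if obs_time[i] < obs_time[j]:
                # second subject still under observation at the case's event time
                weight = 1.0 / (g_i * G(obs_time[i]))
            elif status[j] == 2:
                # second subject had an observed competing event no later than the case
                weight = 1.0 / (g_i * G(obs_time[j], left_limit=True))
            else:
                continue
            denominator += weight
            if risk[i] > risk[j]:
                numerator += weight
    return numerator / denominator if denominator > 0 else float("nan")


print(ipcw_concordance([0.9, 0.1, 0.2, 0.5], [1.0, 2.0, 3.0, 4.0], [1, 0, 1, 2], t=5.0))
```

Without any censored observations all weights equal 1 and the estimate reduces to the unweighted proportion of concordant pairs among evaluable pairs. A covariate-dependent working model would replace `km_censoring_survival` with subject-specific predictions.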
4. Simulation study
A simulation study was performed to assess bias and root mean square error (RMSE) of the proposed IPCW estimator and coverage of asymptotic and bootstrap confidence intervals. Simulations were for a single prognostic marker $X$ and a parameter-free time-independent model $M(t, X) = X$. The covariate $X$ was simulated to follow a standard normal distribution.
Conditional on $X$, uncensored competing risks data $(T, D)$ were assumed to follow cause-specific Cox-exponential models (Bender and others, 2005):

$$\lambda_k(t \mid X) = \lambda_{0k} \exp(\beta_k X), \qquad k = 1, 2 \tag{4.1}$$
This was implemented by simulating latent exponentially distributed event times $T_1$ and $T_2$ and then setting $T = \min(T_1, T_2)$ and $D = 1$ for $T_1 \le T_2$ and $D = 2$ for $T_1 > T_2$
. We consider two competing risks scenarios. In scenario CR1, we set
and
. In scenario CR2, we set
and
. As truncation time points, we used the median and the 75% quantile
of the marginal distribution of $T$. The truncation time points and corresponding true values of $C_1(t)$
were determined by simulation based on a large uncensored data set of size
.
Censoring times $C$ were drawn under a third Cox-exponential model:

$$\lambda_C(t \mid X) = \lambda_{0C} \exp(\beta_C X)$$

The observed time $\tilde T$ was obtained as the minimum of $T$ and $C$ and considered as right-censored if $C < T$. We repeated the simulations for independent censoring ($\beta_C = 0$) and covariate-dependent censoring ($\beta_C = 1$). For each truncation time point $t$ the values of $\lambda_{0C}$
were found by simulation such that the expected proportion of right-censored event times amongst observations with
was 25%, 50%, or 75%, respectively. For each of the scenarios, we report results for sample sizes 250 and 1000 averaged across 1000 simulated data sets.
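The data-generating mechanism described above (latent exponential event times for the two causes plus an exponential censoring time, all with hazards depending on a standard normal covariate) can be sketched as follows. The function name, argument names, and the parameter values in the usage line are hypothetical; they are not the scenario values used in the paper.

```python
import math
import random


def simulate_competing_risks(n, lam1, beta1, lam2, beta2, lam_c, beta_c, seed=0):
    """Simulate right-censored competing risks data from exponential hazards.

    Cause k has hazard lam_k * exp(beta_k * x); censoring has hazard
    lam_c * exp(beta_c * x). Returns a list of (x, observed_time, status)
    with status 0 = censored, 1 = event of interest, 2 = competing event.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                               # standard normal marker
        t1 = rng.expovariate(lam1 * math.exp(beta1 * x))      # latent time, cause 1
        t2 = rng.expovariate(lam2 * math.exp(beta2 * x))      # latent time, cause 2
        c = rng.expovariate(lam_c * math.exp(beta_c * x))     # censoring time
        t = min(t1, t2)
        d = 1 if t1 <= t2 else 2
        observed = min(t, c)
        status = d if t <= c else 0
        rows.append((x, observed, status))
    return rows


# Hypothetical parameter values, chosen only to exercise the function.
sample = simulate_competing_risks(500, 1.0, 1.0, 1.0, 0.0, 0.5, 0.0, seed=1)
print(sum(1 for _, _, s in sample if s == 0) / len(sample))  # share of censored rows
```

Setting `beta_c` to 0 gives censoring that is independent of the marker, and a non-zero value gives covariate-dependent censoring.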
In each simulated data set, we computed three different estimators of $C_1(t)$: the naive estimator $\hat C_1^{\mathrm{naive}}(t)$, the IPCW estimator based on the marginal Kaplan–Meier estimate of the censoring distribution $\hat C_1^{\mathrm{KM}}(t)$ and the IPCW estimator based on a Cox regression model for the censoring distribution $\hat C_1^{\mathrm{Cox}}(t)$.
Bias and root mean squared errors for $t$ chosen as the $0.75$-quantile of $T$ are shown in Table 1. Table 2 shows the associated coverage of percentile bootstrap (all estimators) and asymptotic Wald-type confidence intervals ($\hat C_1^{\mathrm{KM}}(t)$ only) and contrasts empirical standard errors with average bootstrap and asymptotic standard errors for the estimate $\hat C_1^{\mathrm{KM}}(t)$. Corresponding results for $t$ chosen as the median are shown in Appendix E of supplementary material available at Biostatistics online.
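A percentile bootstrap interval of the kind evaluated in the simulation study can be sketched generically. The function below is an assumption-laden illustration: its name, the default number of resamples, and the index rule used to pick the percentiles are choices made for this sketch and are not specified in the text. Subjects are resampled with replacement and the statistic is recomputed on each resample.

```python
import math
import random


def percentile_bootstrap_ci(rows, statistic, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for statistic(rows).

    rows:      list of per-subject records (resampled with replacement)
    statistic: function mapping a list of records to a number
    """
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_boot):
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    alpha = (1.0 - level) / 2.0
    lower = estimates[int(math.floor(alpha * (n_boot - 1)))]
    upper = estimates[int(math.ceil((1.0 - alpha) * (n_boot - 1)))]
    return lower, upper


# Usage with a simple statistic; a concordance estimator would be passed the same way.
values = list(range(100))
print(percentile_bootstrap_ci(values, lambda s: sum(s) / len(s), n_boot=500, seed=1))
```

When the statistic is a concordance estimate, resamples without any evaluable pair would need separate handling, which this sketch does not attempt.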
Table 1.
Average bias and RMSE for 3 different estimators of $C_1(t)$ averaged over 1000 data sets simulated under the 2 scenarios CR1 and CR2 for varying sample size ($n$), independent ($\beta_C = 0$) or covariate-dependent censoring ($\beta_C = 1$), respectively, and varying censoring rates

| $n$ | $\beta_C$ | Censored before $t$ (%) | $\hat C_1^{\mathrm{naive}}(t)$ | $\hat C_1^{\mathrm{KM}}(t)$ | $\hat C_1^{\mathrm{Cox}}(t)$ |
|---|---|---|---|---|---|
| CR1 | | | | | |
| 250 | 0 | 25 | 1.5 (4.0) | 0.1 (3.7) | 0.1 (3.7) |
| 250 | 0 | 50 | 3.9 (5.9) | 0.3 (4.8) | 0.3 (4.8) |
| 250 | 0 | 75 | 8.1 (10.0) | 2.6 (10.1) | 2.8 (9.5) |
| 250 | 1 | 25 | 1.0 (4.0) | −0.1 (3.9) | −0.1 (3.9) |
| 250 | 1 | 50 | 2.6 (5.4) | −0.3 (4.9) | −0.2 (5.8) |
| 250 | 1 | 75 | 5.1 (8.4) | 0.2 (8.2) | −1.9 (10.3) |
| 1000 | 0 | 25 | 1.4 (2.3) | 0.0 (1.9) | 0.0 (1.9) |
| 1000 | 0 | 50 | 3.8 (4.3) | 0.0 (2.3) | 0.0 (2.3) |
| 1000 | 0 | 75 | 7.9 (8.4) | 1.0 (6.4) | 1.2 (6.0) |
| 1000 | 1 | 25 | 0.9 (2.1) | −0.3 (1.9) | −0.0 (2.0) |
| 1000 | 1 | 50 | 2.5 (3.3) | −0.5 (2.3) | −0.1 (3.4) |
| 1000 | 1 | 75 | 5.0 (5.9) | −0.4 (3.8) | −1.8 (7.5) |
| CR2 | | | | | |
| 250 | 0 | 25 | 0.7 (1.8) | 0.1 (1.7) | 0.1 (1.7) |
| 250 | 0 | 50 | 1.5 (2.4) | 0.2 (2.1) | 0.2 (2.1) |
| 250 | 0 | 75 | 2.8 (3.7) | 0.7 (4.1) | 0.8 (3.7) |
| 250 | 1 | 25 | 0.6 (1.9) | 0.0 (1.8) | 0.1 (1.7) |
| 250 | 1 | 50 | 1.3 (2.5) | −0.1 (2.3) | −0.1 (2.3) |
| 250 | 1 | 75 | 2.2 (3.8) | −0.2 (4.4) | −0.3 (5.6) |
| 1000 | 0 | 25 | 0.7 (1.1) | 0.0 (0.8) | 0.0 (0.8) |
| 1000 | 0 | 50 | 1.4 (1.7) | 0.0 (1.0) | 0.0 (1.0) |
| 1000 | 0 | 75 | 2.7 (3.0) | 0.2 (2.8) | 0.3 (2.5) |
| 1000 | 1 | 25 | 0.6 (1.0) | 0.0 (0.9) | 0.1 (0.8) |
| 1000 | 1 | 50 | 1.3 (1.6) | −0.2 (1.2) | −0.1 (1.1) |
| 1000 | 1 | 75 | 2.1 (2.6) | −0.7 (2.4) | −0.7 (4.3) |

$t$ was chosen as the $0.75$-quantile of the marginal time-to-event distribution. Column 3
shows the expected proportion of right-censored event times amongst observations with
. Columns 4–6 show average bias (RMSE) for the three estimators (multiplied by 100 for easier readability).
Table 2.
Coverage of confidence intervals for the same simulation scenarios as in Table 1

| $n$ | $\beta_C$ | Censored before $t$ (%) | Standard error KM: empirical | Standard error KM: average asymptotic | Standard error KM: average bootstrap | Coverage: asymptotic KM | Coverage: bootstrap naive | Coverage: bootstrap KM | Coverage: bootstrap Cox |
|---|---|---|---|---|---|---|---|---|---|
| CR1 | | | | | | | | | |
| 250 | 0 | 25 | 0.0372 | 0.0384 | 0.0384 | 94.5 | 91.6 | 94.2 | 94.3 |
| 250 | 0 | 50 | 0.048 | 0.0473 | 0.0478 | 93.4 | 83.6 | 93.6 | 93.2 |
| 250 | 0 | 75 | 0.0974 | 0.0707 | 0.077 | 77.8 | 67.3 | 81.2 | 82.1 |
| 250 | 1 | 25 | 0.0388 | 0.0392 | 0.0395 | 94.2 | 93.4 | 94.5 | 95.1 |
| 250 | 1 | 50 | 0.0486 | 0.0479 | 0.0486 | 93.8 | 90.1 | 94.1 | 93.5 |
| 250 | 1 | 75 | 0.0823 | 0.0696 | 0.0733 | 86.4 | 81.4 | 88.6 | 88.4 |
| 1000 | 0 | 25 | 0.0188 | 0.0192 | 0.019 | 95.4 | 88.9 | 95.1 | 95.2 |
| 1000 | 0 | 50 | 0.0234 | 0.0238 | 0.0238 | 95.1 | 59.6 | 95.3 | 95.1 |
| 1000 | 0 | 75 | 0.0635 | 0.0475 | 0.0499 | 80.7 | 23.2 | 81.9 | 82.6 |
| 1000 | 1 | 25 | 0.0193 | 0.0195 | 0.0195 | 94.9 | 91.6 | 94.6 | 94.3 |
| 1000 | 1 | 50 | 0.0229 | 0.0239 | 0.0239 | 95.4 | 81.6 | 95.5 | 93.4 |
| 1000 | 1 | 75 | 0.0376 | 0.0368 | 0.037 | 93.9 | 64.6 | 93.8 | 89.2 |
| CR2 | | | | | | | | | |
| 250 | 0 | 25 | 0.0168 | 0.0173 | 0.0174 | 95.1 | 91.4 | 95.0 | 95.0 |
| 250 | 0 | 50 | 0.0207 | 0.0208 | 0.021 | 94.3 | 84.2 | 94.5 | 94.4 |
| 250 | 0 | 75 | 0.0404 | 0.0343 | 0.0359 | 85.4 | 72.5 | 88.2 | 87.9 |
| 250 | 1 | 25 | 0.0181 | 0.0183 | 0.0186 | 95.0 | 92.2 | 95.3 | 95.2 |
| 250 | 1 | 50 | 0.0232 | 0.0235 | 0.024 | 94.6 | 87.9 | 95.2 | 95.9 |
| 250 | 1 | 75 | 0.0439 | 0.0384 | 0.0399 | 87.4 | 81.2 | 88.8 | 93.3 |
| 1000 | 0 | 25 | 0.00831 | 0.00859 | 0.00856 | 95.3 | 86.7 | 95.5 | 95.6 |
| 1000 | 0 | 50 | 0.0101 | 0.0103 | 0.0103 | 95.6 | 66.7 | 95.4 | 95.7 |
| 1000 | 0 | 75 | 0.0277 | 0.0218 | 0.0224 | 87.4 | 41.4 | 88.8 | 89.8 |
| 1000 | 1 | 25 | 0.00878 | 0.00905 | 0.00914 | 95.2 | 88.7 | 95.4 | 95.9 |
| 1000 | 1 | 50 | 0.0116 | 0.0118 | 0.0119 | 95.8 | 76.5 | 95.9 | 95.3 |
| 1000 | 1 | 75 | 0.0234 | 0.022 | 0.0221 | 93.7 | 67.6 | 94.2 | 94.0 |

Columns 8–10 display observed coverage of 95% percentile bootstrap confidence intervals for all three estimators; column 7 shows coverage of asymptotic Wald-type confidence intervals for $\hat C_1^{\mathrm{KM}}(t)$. Columns 4–6 show the empirical standard error for the $\hat C_1^{\mathrm{KM}}(t)$ estimates and the average asymptotic and bootstrap standard errors of $\hat C_1^{\mathrm{KM}}(t)$.
Based on Tables 1 and 2, we draw the following conclusions.
- The naive estimator $\hat C_1^{\mathrm{naive}}(t)$ can be biased and this can lead to insufficient coverage.
- The IPCW estimator $\hat C_1^{\mathrm{KM}}(t)$ can also be biased if the censoring depends on the covariate. In some cases, $\hat C_1^{\mathrm{Cox}}(t)$ has a smaller bias but for high rates of censoring it can do worse than $\hat C_1^{\mathrm{KM}}(t)$ even though the censoring depends on the covariate.
- Coverage of confidence intervals for $\hat C_1^{\mathrm{KM}}(t)$ and $\hat C_1^{\mathrm{Cox}}(t)$ was generally close to the nominal 95% except for some scenarios with high censoring rates of 75%. Both average asymptotic and bootstrap standard errors closely resembled the empirical standard errors of $\hat C_1^{\mathrm{KM}}(t)$.
5. Application to coronary risk prediction
Specialist medical societies recommend initiation of preventive treatment for CHD based on a subject's predicted 10-year risk for CHD (NCEP, 2002). To accurately predict the absolute risk of CHD in older people, prognostic models for CHD need to account for the competing risk of non-CHD death (Koller, Leening and others, 2012). In this section, we revisit the example of Wolbers and others (2009) on coronary risk prediction based on data of elderly women from the Rotterdam Study, a prospective, population-based cohort of elderly subjects living in a suburban area of Rotterdam, the Netherlands (Hofman and others, 2011).
We analyzed data from 10 years of follow-up of 4144 women aged between 55 and 90 years who were free of CHD at baseline. During that follow-up period, 389 women experienced a CHD event and 921 women died without prior CHD event. Only 41 women of those event-free had less than 10 years of follow-up. We randomly split the data set into a training data set (2763 women with 249 CHD events) and a validation data set (1381 women with 140 CHD events).
Using the training set, we estimated the parameters of a Fine–Gray regression model which included the “traditional” baseline risk factors for CHD: age, treatment for high blood pressure (yes versus no), systolic blood pressure (separate slopes depending on whether the subject was on blood pressure treatment or not), diabetes mellitus, log-transformed total cholesterol to HDL cholesterol ratio, and smoking status (current versus never or former smoker). All these risk factors were associated with an increased CHD risk and, except for diabetes, all reached conventional significance ($p < 0.05$). We also investigated the role of age, the strongest predictor variable, as a simple marker for CHD.
Concordance estimates were obtained for these models in the validation set. The dependence of the censoring distribution on the covariates was investigated with a Cox regression model which yielded no trends and non-significant Wald tests for all variables. Thus, all IPCW estimates of concordance were based on the marginal Kaplan–Meier estimator for the censoring distribution from the validation set.
The left panel of Figure 1 shows the discrimination ability of the two Fine–Gray models for varying time horizons between 1 and 10 years. The model including all risk factors shows higher discriminative ability compared with the model based on age alone. For both models, concordance estimates stabilize after about 2.5 years of follow-up and remain fairly stable though slightly decreasing. The decrease may occur because earlier events are easier to predict than later events.
Fig. 1.
Left panel: IPCW estimates of $C_1(t)$ for the multiple Fine and Gray model (solid line) and the model with age as the only covariate (dashed line) for a follow-up duration of 1–10 years in the validation data. Error bars at 2.5, 5, 7.5, and 10 years of follow-up correspond to bootstrap standard errors. Right panel: time-dependent receiver operating characteristic curve at time $t = 10$ years for the multiple Fine and Gray model (solid line) and the model with age as the only covariate (dashed line) in the validation data. Cases were defined as subjects with $T \le t$ and $D = 1$, controls as subjects with $T > t$ or $D = 2$.
The right panel of Figure 1 shows time-dependent ROC curves at time $t = 10$ years for the two models. For this graph, cases were defined as subjects with $T \le t$ and $D = 1$, controls as subjects with $T > t$ or $D = 2$
. Estimation was also based on IPCW-weighting as implemented in the R package
(Blanche and others, 2013).
Table 3 shows the estimated concordance for predicting CHD and non-CHD death, respectively, during the 10 years follow-up in the validation data. The Fine and Gray model for non-CHD death used the same covariates as the model for CHD. Age alone is a strong predictor for non-CHD death in this elderly population but the multiple Fine and Gray model did not substantially improve concordance. This is not surprising as most additional covariates are established CHD-specific risk factors which would not be expected to strongly affect non-CHD death (except for other deaths related to the cardiovascular system).
Table 3.
Estimated concordance and AUC measures in the validation data of the CHD study for both competing risks (in %)

| Model | CHD: concordance | CHD: AUC, controls event-free at $t$ | CHD: AUC, controls event-free at $t$ or other event type | Non-CHD death: concordance | Non-CHD death: AUC, controls event-free at $t$ | Non-CHD death: AUC, controls event-free at $t$ or other event type |
|---|---|---|---|---|---|---|
| Age only | 63.7 | 72.3 | 64.3 | 75.7 | 80.8 | 78.0 |
| Fine–Gray | 71.6 | 79.4 | 72.4 | 76.2 | 81.3 | 78.6 |

Truncation point for concordance is $t = 10$ years. AUC measures are also reported at $t = 10$ years. Cumulative cases were defined for both AUC measures as subjects with $T \le t$ and D = “event type studied” (CHD or non-CHD death, respectively), controls as subjects with $T > t$ for the first AUC measure and as subjects with $T > t$ or D = “other event type” for the second AUC measure.
Table 3 also displays time-dependent AUC measures at $t = 10$ years in the validation data using two different definitions of controls (Blanche and others, 2013). AUCs with controls defined as subjects with $T > t$ or $D = 2$ (consistently with our concordance definition) were slightly higher but comparable with the concordance whereas AUCs using only subjects with $T > t$ as controls were substantially higher. This could be explained by the fact that age is a strong predictor of both CHD and non-CHD death which hampers discrimination of CHD events from non-CHD deaths.
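The effect of the two control definitions can be illustrated on uncensored toy data. The sketch below is not the IPCW-based estimation used for Table 3; the function name `cumulative_auc`, its `competing_as_controls` flag, and the half credit for tied predictions are assumptions of this illustration. Cases are subjects with the event of interest by time t, and controls are either event-free subjects only or event-free subjects together with those who had the competing event.

```python
def cumulative_auc(risk, time, event, t, competing_as_controls):
    """Time-dependent AUC at t from uncensored competing risks data.

    Cases: event of interest (event == 1) no later than t.
    Controls: event-free at t; optionally also subjects with a competing event.
    """
    cases = [r for r, u, d in zip(risk, time, event) if u <= t and d == 1]
    if competing_as_controls:
        controls = [r for r, u, d in zip(risk, time, event) if u > t or d == 2]
    else:
        controls = [r for r, u, d in zip(risk, time, event) if u > t]
    if not cases or not controls:
        return float("nan")
    score = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                score += 1.0
            elif case_risk == control_risk:
                score += 0.5  # conventional half credit for ties
    return score / (len(cases) * len(controls))


risk = [0.9, 0.8, 0.3, 0.1]
time = [1.0, 2.0, 3.0, 12.0]
event = [1, 2, 1, 2]
print(cumulative_auc(risk, time, event, t=10.0, competing_as_controls=False))
print(cumulative_auc(risk, time, event, t=10.0, competing_as_controls=True))
```

In this toy example the subject with an early competing event has a high predicted risk, so adding that subject to the controls lowers the AUC, which mirrors the pattern described for the CHD data.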
6. Discussion
We have presented a formal definition of the concordance probability for prognostic models in the presence of competing risks. Like the concordance probability for survival or binary data, it provides a simple overall numeric measure of discrimination. To deal with right-censored data, we derived an IPCW estimator of the truncated concordance probability and established consistency and asymptotic normality under mild assumptions. Asymptotic properties of the proposed estimator rely on the assumption that the censoring distribution is correctly specified and conditionally independent of the competing risks process given covariates. In many applications, it will be reasonable to assume that the censoring mechanism does not depend on covariates and then the marginal Kaplan–Meier estimate of the censoring distribution can be used. However, if for some reason the design or conduct of a clinical study introduced a dependence between the follow-up duration and covariates which also affect the competing risks process (any component, including the cause-specific hazards of competing events), then it is recommended to use a working regression model for the censoring distribution in order to avoid biased conclusions.
The fact that we estimate a truncated version of concordance rather than the unconstrained concordance probability should not be seen as a limitation of our approach. Indeed it is impossible to assess the performance of prognostic models beyond the maximum follow-up duration without strong and untestable assumptions. If we assume independent censoring, our estimator is defined if we truncate at any time before or at the largest observed censoring or event time. As for the Kaplan–Meier estimator, the effect of censoring on the variability of the IPCW estimator is increasing with increasing truncation time. However, it is difficult to recommend a general purpose truncation time, in particular because the truncation time point influences the interpretation of the concordance probability. To avoid unstable results in practical applications, we recommend that the analyst develops an appropriate model for the censoring distribution, e.g. the Kaplan–Meier estimator or a Cox model, and then investigates the predicted probabilities of being uncensored at the candidate truncation times. Multiple truncation time points can be evaluated and it can be useful to compare discrimination ability at different truncation time points. A model which is good at discriminating patients with an early failure time (e.g. after surgery) from others may not be good at discriminating subsequent failure times amongst patients who survive a first high risk period.
As discussed in Appendix C of supplementary material available at Biostatistics online, our approach is related to time-dependent sensitivity, specificity, and ROC curves for competing risks (Saha and Heagerty, 2010) and the concordance can be written as a weighted average of the time-dependent $\mathrm{AUC}(t)$ for incident cases ($T = t$, $D = 1$) and controls defined as observations with $T > t$ or $D = 2$. Thus, the concordance serves as an overall summary of discrimination whereas the time-dependent AUC measures discrimination of the event status at one specific time point.
We assumed that the prognostic model was derived on an independent training data set and only in this setting are asymptotic or bootstrap confidence intervals for the truncated concordance readily available. Clearly, independent training data are not always available, and even if they are, a joint analysis of all data will be more efficient. However, some form of internal cross-validation is needed to develop and assess a prognostic model with a single data set (Efron and Tibshirani, 1997; Gerds and others, 2008; Hastie and others, 2009).
It is important to emphasize that our definition of concordance assesses a prognostic model for the absolute risk of the event of interest in the presence of competing risks. In line with earlier work (Gail and Pfeiffer, 2005; Wolbers and others, 2009), we regard this risk as crucial for medical decision-making in the competing risks setting. However, in many instances explicit consideration of competing events will also be important and modeling the entire competing risks multi-state process will provide further insights (Beyersmann and others, 2007). As an example, our illustration of concordance for a single marker (presented in supplementary material available at Biostatistics online) shows that discrimination of prognostic models for the event of interest is hampered if covariates affect both cause-specific hazards with regression coefficients of the same sign, especially if there is a strong association with the competing risk or if the baseline competing hazard is high. This indicates that to achieve high discrimination ability one needs predictors which are only weakly or, even better, reversely associated with the cause-specific hazard of the competing event. Moreover, in settings where all competing events are of similar importance, joint accuracy criteria for the entire competing risks multi-state process are needed and their development is an important area for future research.
Supplementary material
Supplementary material is available at http://biostatistics.oxfordjournals.org.
Funding
M.W. was supported by the Wellcome Trust and the Li Ka Shing Foundation—University of Oxford Global Health Programme. Funding to pay the Open Access publication charges for this article was provided by the Wellcome Trust.
Acknowledgments
Computer time for this study was provided by the computing facilities MCIA (Mésocentre de Calcul Intensif Aquitain) of the Université de Bordeaux and of the Université de Pau et des Pays de l’Adour. Conflict of Interest: None declared.
References
- Bender R., Augustin T., Blettner M. Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine. 2005;24:1713–1723. doi: 10.1002/sim.2059.
- Beyersmann J., Dettenkofer M., Bertz H., Schumacher M. A competing risks analysis of bloodstream infection after stem-cell transplantation using subdistribution hazards and cause-specific hazards. Statistics in Medicine. 2007;26:5360–5369. doi: 10.1002/sim.3006.
- Bickel P. J., Klaassen C. A., Ritov Y., Wellner J. A. Efficient and Adaptive Estimation for Semiparametric Models. Baltimore: Johns Hopkins University Press; 1993.
- Blanche P., Dartigues J.-F., Jacqmin-Gadda H. Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Statistics in Medicine. 2013;32:5381–5397. doi: 10.1002/sim.5958.
- Cheng S. C., Fine J. P., Wei L. J. Prediction of cumulative incidence function under the proportional hazards model. Biometrics. 1998;54:219–228.
- Efron B., Tibshirani R. Improvements on cross-validation: the .632+ bootstrap method. Journal of the American Statistical Association. 1997;92:548–560.
- Fine J. P., Gray R. J. A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association. 1999;94:496–509.
- Gail M. H., Pfeiffer R. M. On criteria for evaluating models of absolute risk. Biostatistics. 2005;6:227–239. doi: 10.1093/biostatistics/kxi005.
- Gerds T. A., Cai T., Schumacher M. The performance of risk prediction models. Biometrical Journal. 2008;50:457–479. doi: 10.1002/bimj.200810443.
- Gerds T. A., Kattan M. W., Schumacher M., Yu C. Estimating a time-dependent concordance index for survival prediction models with covariate dependent censoring. Statistics in Medicine. 2013;32:2173–2184. doi: 10.1002/sim.5681.
- Gerds T. A., Scheike T. H., Andersen P. K. Absolute risk regression for competing risks: interpretation, link functions, and prediction. Statistics in Medicine. 2012;31:3921–3930. doi: 10.1002/sim.5459.
- Grunkemeier G. L., Jin R., Eijkemans M. J., Takkenberg J. J. Actual and actuarial probabilities of competing risks: apples and lemons. Annals of Thoracic Surgery. 2007;83:1586–1592. doi: 10.1016/j.athoracsur.2006.11.044.
- Harrell F. E., Califf R. M., Pryor D. B., Lee K. L., Rosati R. A. Evaluating the yield of medical tests. Journal of the American Medical Association. 1982;247:2543–2546.
- Hastie T., Tibshirani R., Friedman J. H. The Elements of Statistical Learning. 2nd edition. New York: Springer; 2009. Springer Series in Statistics.
- Heagerty P. J., Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics. 2005;61:92–105. doi: 10.1111/j.0006-341X.2005.030814.x.
- Hofman A., van Duijn C. M., Franco O. H., Ikram M. A., Janssen H. L., Klaver C. C., Kuipers E. J., Nijsten T. E., Stricker B. H., Tiemeier H., and others. The Rotterdam Study: 2012 objectives and design update. European Journal of Epidemiology. 2011;26:657–686. doi: 10.1007/s10654-011-9610-5.
- Koller M. T., Leening M. J., Wolbers M., Steyerberg E. W., Hunink M. G., Schoop R., Hofman A., Bucher H. C., Psaty B. M., Lloyd-Jones D. M., and others. Development and validation of a coronary risk prediction model for older U.S. and European persons in the cardiovascular health study and the Rotterdam Study. Annals of Internal Medicine. 2012;157:389–397. doi: 10.7326/0003-4819-157-6-201209180-00002.
- Koller M. T., Raatz H., Steyerberg E. W., Wolbers M. Competing risks and the clinical community: irrelevance or ignorance? Statistics in Medicine. 2012;31:1089–1097. doi: 10.1002/sim.4384.
- Mogensen U. B., Ishwaran H., Gerds T. A. Evaluating random forests for survival analysis using prediction error curves. Journal of Statistical Software. 2012;50:1–23. doi: 10.18637/jss.v050.i11.
- NCEP. Third Report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III) final report. Circulation. 2002;106:3143–3421.
- Putter H., Fiocco M., Geskus R. B. Tutorial in biostatistics: competing risks and multi-state models. Statistics in Medicine. 2007;26:2389–2430. doi: 10.1002/sim.2712.
- Saha P., Heagerty P. J. Time-dependent predictive accuracy in the presence of competing risks. Biometrics. 2010;66:999–1011. doi: 10.1111/j.1541-0420.2009.01375.x.
- Scheike T., Zhang M. J., Gerds T. A. Predicting cumulative incidence probability by direct binomial regression. Biometrika. 2008;95:205–220.
- Schoop R., Beyersmann J., Schumacher M., Binder H. Quantifying the predictive accuracy of time-to-event models in the presence of competing risks. Biometrical Journal. 2011;53:88–112. doi: 10.1002/bimj.201000073.
- Uno H., Cai T., Pencina M. J., D'Agostino R. B., Wei L. J. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine. 2011;30:1105–1117. doi: 10.1002/sim.4154.
- Wolbers M., Koller M. T., Witteman J. C., Steyerberg E. W. Prognostic models with competing risks: methods and application to coronary risk prediction. Epidemiology. 2009;20:555–561. doi: 10.1097/EDE.0b013e3181a39056.
- Yan G., Greene T. Investigating the effects of ties on measures of concordance. Statistics in Medicine. 2008;27:4190–4206. doi: 10.1002/sim.3257.
- Zheng Y., Cai T., Jin Y., Feng Z. Evaluating prognostic accuracy of biomarkers under competing risk. Biometrics. 2012;68:388–396. doi: 10.1111/j.1541-0420.2011.01671.x.