Summary
The net reclassification improvement (NRI) and the integrated discrimination improvement (IDI) were originally proposed to characterize accuracy improvement in predicting a binary outcome, when new biomarkers are added to regression models. These two indices have been extended from binary outcomes to multi-categorical and survival outcomes. Working on an AIDS study where the onset of cognitive impairment is competing risk censored by death, we extend the NRI and the IDI to competing risk outcomes, by using cumulative incidence functions to quantify cumulative risks of competing events, and adopting the definitions of the two indices for multi-category outcomes. The “missing” category due to independent censoring is handled through inverse probability weighting. Various competing risk models are considered, such as the Fine and Gray, multistate, and multinomial logistic models. Estimation methods for the NRI and the IDI from competing risk data are presented. The inference for the NRI is constructed based on asymptotic normality of its estimator, and the bias-corrected and accelerated bootstrap procedure is used for the IDI. Simulations demonstrate that the proposed inferential procedures perform very well. The Multicenter AIDS Cohort Study is used to illustrate the practical utility of the extended NRI and IDI for competing risk outcomes.
Keywords: Cumulative incidence function, Fine and Gray’s model, Integrated discrimination improvement, Multinomial logistic model, Multistate model, Net reclassification improvement
1. Introduction
For clinicians, introducing a new biomarker into a statistical model may change the risks associated with various outcomes of interest and subsequently may influence treatment decisions. Risk prediction algorithms using statistical modeling are among the most popular tools to evaluate the significance of biomarkers. Although effect size and statistical significance are important, they do not provide direct information on the contribution of new biomarkers to diagnostic accuracy. For the latter, we are interested in evaluating the improvement in correctly “classifying” patients into several outcome categories, such as dementia, death, and “nonevent,” with the additional information from new biomarkers. In contrast, risk prediction algorithms typically attempt to predict the risks associated with each outcome in the course of time. To investigate accuracy improvement over the course of variable additions for binary outcomes, the commonly used receiver operating characteristic curve and its corresponding area under the curve (AUC) were shown to be insensitive in detecting the added value of new markers (Greenland and O’Malley, 2005; Pepe and others, 2004; Ware, 2006), and novel indicators were developed to complement the AUC measure (Pencina and others, 2008), such as the net reclassification improvement (NRI) and the integrated discrimination improvement (IDI). The NRI is the improvement in classification rates of disease categories by the “new” model which incorporates additional markers over those by the “old” model without the additional markers. On the other hand, the IDI quantifies the change in sensitivity minus the change in one minus specificity, each integrated over all possible cutoff values, from the model without new biomarkers to the model with new biomarkers. Both indices have become popular in medical fields and have been extended from categorical outcomes to survival outcomes (Pencina and others, 2011; Uno and others, 2013).
However, there are few works in quantifying accuracy improvement for competing risk outcomes. Shi and others (2014) were among the first to consider accuracy improvement for competing risks, where the population is divided into two groups at a fixed time point—the “disease” group including participants who have developed the event of interest, and the “healthy” group including those who have not had any event and those who have experienced competing events. Such a definition of the “healthy” group, which is in line with the augmented “at-risk” set in a popular regression model by Fine and Gray (1999) for competing risk data, is reasonable if competing events are not of interest, and those who have developed competing events are more or less similar to those who have not failed yet. However, there are many situations where we would like to separate participants with competing events from those without any events. As an example, consider an analysis of data from the Multicenter AIDS Cohort Study (MACS) which involves two endpoints, death and dementia, where dementia onset may be competing risk censored by death. When the dementia onset is of concern, it does not seem appropriate to group those participants who died with those who were alive and stayed healthy. Ideally they could be treated as separate categories in evaluation of accuracy improvement.
Li and others (2013b) proposed reclassification statistics for assessing improvements in diagnostic accuracy for multi-level outcomes with discussion (Janes, 2013; Li and others, 2013a). Here, we specifically consider how the definitions of the NRI and the IDI for multi-category outcomes can be extended to the competing risk setting where one event prevents others from occurring. The detailed definitions are given in Sections 2.2 and 2.3 for two competing risk outcomes. One issue with estimating the adapted NRI and IDI is that independent censoring often occurs in addition to competing risk censoring, and a participant’s disease status may not be known if this participant was censored before the time of interest. As detailed in Sections 2.2 and 2.3, the “missingness” due to censoring can be overcome by using the method of inverse probability of censoring weighting.
Though the focus of this project is to evaluate diagnostic accuracy, it remains crucial to select a proper regression model to distinguish all survival outcomes and identify covariate effects on each outcome at different time points. In this work, we adopted three models, Cox’s regression, Fine–Gray’s model (Fine and Gray, 1999), and the multinomial logistic risk regression model (Gerds and others, 2012). Three simulation designs were considered in Section 3. For each of these three models, two data designs were examined, with and without an added covariate improving diagnostic accuracy. In Section 4, we applied both NRI and IDI estimators to the MACS data for assessing whether including a new biomarker, e.g., CD4+ cell count, would improve predictive ability over the old model. Some discussion is given in Section 5.
2. Methods
2.1. Notation
In a competing risk setting, there are two or more types of events. To simplify the notation, only two types are considered here, denoted as $\epsilon \in \{1, 2\}$, although the proposed methods can be naturally extended to more than two competing events. Let $T$ be the time to the first event of either type. With two competing events, we can define three categories of participants according to disease status at a fixed time point $t_0$. For the $i$th participant, if $T_i \le t_0$ and $\epsilon_i = 1$, the participant belongs to the first category; if $T_i \le t_0$ and $\epsilon_i = 2$, the participant belongs to the second category; otherwise the participant is in the third category of being “healthy.” In practice, independent censoring $C$ often exists. Hence, $X = \min(T, C)$ and the combined cause indicator $\eta = \epsilon I(T \le C)$ are observed. Let $Z_1$, a $p$-dimensional vector, denote conventional predictors and let $Z_2$, a $q$-dimensional vector, denote new biomarkers. The data consist of $\{(X_i, \eta_i, Z_{1i}, Z_{2i}),\ i = 1, \ldots, n\}$. In the sequel, we denote the “old” model with conventional markers as $M_1$ and the “new” model with both conventional and new markers as $M_2$.
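As an illustration of the three categories, and of the status that censoring leaves unknown, the following minimal Python sketch assigns a category from an observed follow-up time and cause indicator. The function name `category_at`, the argument names, the coding of the cause indicator (0 for censored, 1 or 2 for the event type), and the toy observations are all assumptions made for this example; they are not part of the original analysis.

```python
from typing import Optional


def category_at(x: float, eta: int, t0: float) -> Optional[int]:
    """Disease category at time t0 from the observed time x and cause indicator eta.

    Returns 1 or 2 if an event of that type was observed by t0, 3 ("healthy")
    if the participant was still under observation and event-free after t0,
    and None if censoring at or before t0 leaves the status unknown.
    """
    if x <= t0 and eta in (1, 2):
        return eta
    if x > t0:
        return 3
    return None  # censored (eta == 0) at or before t0


# Hypothetical observations: (observed time, cause indicator)
observed = [(2.0, 1), (3.5, 2), (6.0, 0), (1.0, 0), (7.2, 1)]
categories = [category_at(x, eta, t0=5.0) for x, eta in observed]
```

Participants for whom the function returns `None` are those whose status is “missing” because of censoring; a participant whose event occurs after the chosen time point still counts as “healthy” at that time.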
An extension of the NRI in Li and others (2013b) for a K-level categorical outcome D is
where
is a weight function for the
th category of the outcome and
, and
is the estimated
probability of the outcome from the
th category based on
the model
for
. Let
denote the vector of the estimated probabilities. When there are only two categories,
, and the weights are
for
, then the
is equivalent to the NRI given in Pencina and others (2008). Li and others (2013b) also proposed an
extension of the IDI based on the relationship between the IDI and the increase in the
coefficient of determination
from the “old” multinomial logistic
model to the “new” one with additional markers. That is,
where
is again a weight function for
the
-th category of the outcome, and
is the coefficient of
determination from
,
. Again when
and
, the multi-category IDI
reduces to the original IDI in Pencina and
others (2008).
2.2. Net reclassification improvement for competing risk outcomes
Without loss of generality, we consider competing risk outcomes with three categories as
defined in Section 2.1. For model
, define
,
for
, and
.
A well-calibrated regression model such as the multi-state (Cheng and others, 1998), Fine
and Gray (1999), and Gerds and
others (2012) models can be used. For each participant
, we obtain the estimators
,
,
, and
.
The NRI for multi-category outcomes can thus be extended to the competing risk setting at
any
:
![]() |
(2.1) |
One complication in estimating the NRI with censored competing risk data is that the disease status is not available for every participant. For example, some participants may have been censored before the time point of interest, so their disease status at that time cannot be determined. Therefore, those participants whose disease status can be determined from their observed follow-up time and cause indicator should be properly weighted to account for those with “missing” disease status due to censoring.
Thus, we propose the following estimator of the NRI at any time point
as:
where
, are weight functions
for the three disease categories and can be simply set to be
if
there is no prior on the categories, and
is the
Kaplan–Meier estimator of the censoring survival function. For each category,
,
is an indicator function
whether the “old” model
makes a wrong prediction on
Category
for the
th
participant while the “new”
correctly identifies it.
Conversely,
indicates whether the
“new” model changes a right prediction from the “old” model.
The consistency and asymptotic normality of the proposed NRI estimator are established in Section A of the Appendix of the Supplementary material available at Biostatistics online, and the variance estimate can be obtained from the influence function given in (A.1) of the Supplementary material.
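The weighting idea behind the NRI estimator can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation of the estimator: it assumes that each participant is classified into the category with the largest predicted probability, that participants with a known status receive a weight of one over the Kaplan–Meier estimate of the censoring survival function (evaluated just before the event time, or at the time point of interest for those still event-free), that participants censored earlier receive weight zero, and that the three category weights are equal. The function names, argument layout, and toy data are hypothetical.

```python
def censoring_km(xs, etas):
    """Kaplan-Meier estimate of G(t) = P(C > t); censoring (eta == 0) is the 'event'."""
    steps, surv = [], 1.0
    for t in sorted({x for x, e in zip(xs, etas) if e == 0}):
        at_risk = sum(1 for x in xs if x >= t)
        censored = sum(1 for x, e in zip(xs, etas) if x == t and e == 0)
        surv *= 1.0 - censored / at_risk
        steps.append((t, surv))

    def G(t, left_limit=False):
        value = 1.0
        for s, g in steps:
            if s < t or (s == t and not left_limit):
                value = g
            else:
                break
        return value

    return G


def argmax(p):
    """Index of the largest predicted probability (first one on ties)."""
    return max(range(len(p)), key=lambda j: p[j])


def ipcw_nri(xs, etas, p_old, p_new, t0, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum over categories of IPCW-estimated net correct reclassification."""
    G = censoring_km(xs, etas)
    num, den = [0.0] * 3, [0.0] * 3
    for x, eta, po, pn in zip(xs, etas, p_old, p_new):
        if x <= t0 and eta in (1, 2):      # event of type eta observed by t0
            k, w = eta - 1, 1.0 / G(x, left_limit=True)
        elif x > t0:                        # known to be event-free at t0
            k, w = 2, 1.0 / G(t0)
        else:                               # censored before t0: status unknown
            continue
        # +1 if only the new model is right, -1 if only the old model is right
        gain = int(argmax(pn) == k) - int(argmax(po) == k)
        num[k] += w * gain
        den[k] += w
    return sum(wk * n / d for wk, n, d in zip(weights, num, den) if d > 0)


# Toy data: probabilities are (cause 1, cause 2, healthy) for each participant.
xs = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0]
etas = [1, 0, 2, 1, 0, 2]
p_old = [(0.2, 0.3, 0.5), (0.3, 0.3, 0.4), (0.3, 0.4, 0.3),
         (0.5, 0.2, 0.3), (0.4, 0.3, 0.3), (0.1, 0.2, 0.7)]
p_new = [(0.6, 0.2, 0.2), (0.3, 0.3, 0.4), (0.2, 0.5, 0.3),
         (0.3, 0.3, 0.4), (0.2, 0.2, 0.6), (0.1, 0.1, 0.8)]
nri = ipcw_nri(xs, etas, p_old, p_new, t0=5.0)
```

The sketch ignores ties between event and censoring times, which would need an explicit convention in real data.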
2.3. Integrated discrimination improvement for competing risk outcomes
We first define the time-dependent IDI for competing risk outcomes by adapting the
definition for multi-category outcomes proposed by Li
and others (2013b). The IDI for multi-category outcomes is
defined as a weighted sum of the increased variabilities that are explained by the “new”
model over the “old” model. The explained variabilities are captured by the coefficients
of determination
for cause
,
, from the “new” model when
and from the “old” model when
.
are closely connected
to the probabilities of each category. Thus, we extend the IDI for competing risk outcomes
at time
as:
![]() |
(2.2) |
where
are again some
weight functions. The estimation of the IDI at time
involves the evaluation of
, which is the
proportion of variability in the
-th category that is
explained by model
, for
and
.
Without any covariates, we estimate the probability of falling into the
th category by
, where
,
with
and
.
Hence, the variance without any model is
. With
model
, the variance can be
estimated by
,
where
.
Therefore, we propose the following estimator of the IDI at time
:
Section B of the Appendix of the Supplementary material available at Biostatistics online shows the consistency and asymptotic normality of the proposed IDI estimator. Based on equation (A.2) of the Supplementary material, we can estimate the influence function and use it to compute a sample-based variance estimate. However, the IDI estimator relies on the estimated probabilities from a particular competing risk model, and the asymptotic variance changes if another model is used. Some competing risk models have well-defined influence functions, while others do not have explicit forms. As a result, it is difficult to obtain an explicit variance estimator for the IDI that covers the various competing risk models of choice. Bootstrap procedures provide an alternative method of inference. Confidence intervals can be constructed based on asymptotic normality with the bootstrap standard error, or by selecting percentiles of the statistics computed from bootstrapped samples. However, skewness and bias in the bootstrap distribution may lead to misleading results. Thus, we propose to use a bias-corrected and accelerated (BCa) bootstrap procedure (Efron, 1987; Efron and Tibshirani, 1993) to obtain confidence intervals for the IDI, which has been shown by Shi and others (2014) to perform better than the former two approaches for dichotomized outcomes. More specifically, we first bootstrap the original sample multiple times to obtain the difference (“bias”) between the median of the estimators from all bootstrapped samples and the initial estimator from the original sample. Then, we use the jackknife approach to calculate the acceleration factor, which measures skewness. Finally, percentiles are calculated based on the bias-correction factor and the acceleration parameter to obtain the confidence intervals of the IDI estimator (Efron and Tibshirani, 1993).
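The three steps of the BCa procedure can be sketched as follows. The sketch follows the standard formulation in Efron and Tibshirani (1993), in which the bias-correction factor is the normal quantile of the proportion of bootstrap replicates falling below the original estimate. The function name `bca_interval`, its default settings, and the toy data are assumptions for illustration, and the sample mean stands in for the IDI estimator.

```python
import random
from statistics import NormalDist, mean


def bca_interval(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """Bias-corrected and accelerated bootstrap interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = stat(data)
    nd = NormalDist()

    # Step 1: bootstrap replicates and the bias-correction factor z0.
    boots = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    prop = sum(b < theta_hat for b in boots) / n_boot
    prop = min(max(prop, 1 / (n_boot + 1)), n_boot / (n_boot + 1))  # keep z0 finite
    z0 = nd.inv_cdf(prop)

    # Step 2: acceleration factor from leave-one-out (jackknife) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jbar = mean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6.0 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    # Step 3: adjusted percentiles of the bootstrap distribution.
    def adjusted(q):
        z = nd.inv_cdf(q)
        return nd.cdf(z0 + (z0 + z) / (1.0 - accel * (z0 + z)))

    def pick(q):
        return boots[min(n_boot - 1, max(0, int(q * n_boot)))]

    return pick(adjusted(alpha / 2)), pick(adjusted(1 - alpha / 2))


# Toy usage with a hypothetical sample; the mean replaces the IDI estimator.
gen = random.Random(1)
sample = [gen.gauss(0.0, 1.0) for _ in range(40)]
lower, upper = bca_interval(sample, mean)
```

For the IDI, `stat` would refit both competing risk models on each resampled data set, which is why the bootstrap is far more expensive than a formula-based variance.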
As suggested by Blanche and others (2013), the independent censoring assumption could be restrictive in practice, and conditional independence of censoring given biomarkers would make the extended NRI and IDI more general by allowing the risks of censoring to be correlated with the biomarkers. Consequently, the Kaplan–Meier estimator of the censoring distribution used in inverse probability of censoring weighting in equations (2.2) and (2.3) can be replaced with a conditional survival estimator (Blanche and others, 2013). In that case, however, the asymptotic theory becomes much more complex, and the BCa bootstrap can be used for inference instead.
Though the main purpose of the proposed NRI and IDI is to evaluate the effect of new biomarkers on diagnostic accuracy rather than on a competing risk model itself, the performance of the NRI and IDI relies on estimated probabilities from the competing risk model. As a result, model diagnostics are important before applying the NRI and IDI. In our data analysis, we used the extended Brier score for model calibration (Schoop and others, 2011). Pencina and others (2011) also suggested applying cross-validation (CV) to account for over-optimism in evaluating diagnostic measures based on a fitted model. The probabilities of each outcome are computed from a cross-validated sample, which can then be used to calculate the NRI and IDI. In the simulation studies, we examine how cross-validation impacts these two measures. Due to the complexity of asymptotic theory with CV, we propose to use BCa bootstrap for obtaining confidence intervals for the proposed estimators.
3. Simulation studies
In practice, we usually do not know the “right” model, and there is a chance that we could
pick a reasonable yet incorrect model for our data. Thus, we need to evaluate the impact of
model choices on the performance of accuracy improvement evaluation with new biomarkers
included. Here, we first designed three different sets of data with respect to three popular
competing risk models, including multi-state, Fine and Gray, and multinomial logistic
models, to examine the proposed estimators for the extended NRI and IDI in competing risk
settings. Three covariates were used in all three designs, where
and
were generated from the standard normal
distribution, truncated at
to prevent extreme values, and
was generated from a Bernoulli (0.7)
distribution. The three cases of data were simulated as follows:
Case 1. We simulated the event time from a Weibull model with three covariates,
![]() |
where
was generated from the standard extreme
value distribution. This error distribution gives the proportional hazard interpretations
for all covariates. We set
,
,
,
and
. Since the coefficient for the new
marker
is three times the size of the
coefficients
and
for two conventional predictors, we expect that the “new” model including
,
and
would have improved predictive ability
over the “old” model that only uses
and
. The cause indicators,
, were generated with equal
probability. The censoring time was simulated from uniform
and uniform
distribution for 30% and 50%
censoring.
Case 2. We used a simulation design similar to the one proposed by Fine and Gray (1999) in this case. The subdistribution for cause 1 is defined by
![]() |
with a mass of
when
is at
and
all covariates are zeros. When a uniform random number exceeds
, participants are
assumed to experience the cause 2 event with the conditional probability
![]() |
We set
,
,
,
,
,
, and
.
Including
in the model, in addition to
and
, is
expected to improve prediction over the one not including
. The
censoring distribution follows uniform
and uniform
with 30% and 50% censoring.
Case 3. We considered a multinomial logistic regression model (Gerds and others, 2012). Define
,
. For cause
,
logistic-transformed probabilities were set as
![]() |
where
was set to be
,
,
and
. Since
is twice the size of
and
,
we expect the new model including
to have a better
predictive ability than the old one using only
and
. The event time was simulated by
inverting the survival probability, and cause indicators were assigned with equal
probabilities. Independent censoring time was simulated from a uniform
distribution for 30% censoring and
from a uniform
distribution for 50%
censoring.
According to Demler and others (2017), if the models are under alternatives (Cases 1–3), both the NRI and the IDI are non-degenerate and variance estimators based on the U-statistics theory should work, though some adjustments are needed for the IDI. The bootstrap technique is also valid under this situation. On the other hand, if the null model is true and nested within the alternative model, both the NRI and the IDI are degenerate and the theoretical formulas for variance do not apply. This raises special concerns in practice, because, in evaluating the accuracy improvement associated with new biomarkers, we are comparing the “new” model with the additional variables and the “old” model without them. In light of Demler and others (2017), we also want to examine the robustness of the inferential procedures for the NRI and the IDI under the null, where adding the new biomarker into the “old” model does not improve the predictive ability. Thus, we consider the following three scenarios:
Case 4. Similar to Case 1, we set
,
,
,
and
. Censoring time was simulated from
uniform [3, 32] for 30% censoring, and from uniform [1, 21] for 50% censoring.
Case 5. Similar to Case 2, we set
,
,
,
,
,
, and
.
The censoring distributions followed a uniform
distribution for
30% censoring and from uniform
for 50% censoring.
Case 6. Similar to Case 3, we set
,
and
. Independent censoring time was
simulated from uniform [0, 32] for 30% censoring and from uniform [0, 29.2] for 50%
censoring.
The cumulative incidence functions (CIFs) under different covariate configurations and the six cases are illustrated in Figure S1 of the Supplementary material available at Biostatistics online. Because the probabilities depend on covariate values, the true NRIs and IDIs are difficult to obtain analytically for Cases 1, 2, and 3. Thus, 1000 samples of size 1000 without censoring were used to calculate the true values. Under Cases 4, 5, and 6, we expect that the predictive ability of the “new” model would not be improved, so the true NRI and IDI are zero. For each case, we generated 1000 samples of size 400 and applied all three models (i.e., Cox’s proportional hazard model, Fine–Gray’s subdistribution hazard model, and Gerds’ multinomial logistic risk regression) without CV. The predicted probabilities of each cause and the survival probability at the chosen time points were obtained from each model and then used for the NRI and IDI calculation. Cox regression and Fine–Gray’s models estimate the CIF for each cause separately, while Gerds’ model estimates CIFs for both causes simultaneously. We built confidence intervals (CIs) for the NRI based on (A.1) of the Supplementary material available at Biostatistics online and compared them with the CIs obtained by BCa bootstrapping. For the IDI, we also calculated the CIs using BCa bootstrapping. The simulations were run in R, where packages survival, lme4, and cmprsk were used for competing risk modeling. The simulation results for the NRI and the IDI from Cases 1, 2, and 3 under 30% censoring, in which model predictive ability should improve with the “new” marker, are shown in Table 1. Table 3 summarizes the simulation results for the NRI and the IDI under Cases 4, 5, and 6 with 30% censoring, when the added covariate does not improve prediction accuracy.
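The summary measures used to judge each estimator across simulated samples (bias, empirical standard error, and coverage rate) are simple to compute once the estimates and intervals are available. A minimal sketch, assuming normal-theory intervals of the form estimate ± 1.96 × estimated standard deviation; the function names and the four toy replicates are hypothetical and do not reproduce any reported result.

```python
from statistics import mean, stdev


def wald_interval(estimate, sd, z=1.96):
    """Normal-theory interval: estimate +/- z * estimated standard deviation."""
    return estimate - z * sd, estimate + z * sd


def summarize(estimates, intervals, truth):
    """Bias, empirical standard error, and coverage rate over replicates."""
    bias = mean(estimates) - truth
    ese = stdev(estimates)  # spread of the estimates across replicates
    coverage = sum(lo <= truth <= hi for lo, hi in intervals) / len(intervals)
    return bias, ese, coverage


# Hypothetical replicate-level output: one estimate and one estimated SD each.
estimates = [0.10, 0.12, 0.14, 0.08]
sds = [0.01, 0.01, 0.01, 0.01]
intervals = [wald_interval(e, s) for e, s in zip(estimates, sds)]
bias, ese, coverage = summarize(estimates, intervals, truth=0.11)
```

Comparing the empirical standard error with the average of the estimated standard deviations shows whether a variance formula is well calibrated, and the coverage rate shows whether the resulting intervals attain their nominal level.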
Table 1.
NRI and IDI simulation results when the added covariate improves predictability (30% censoring). Results from correct models are given in bold. Within each case, the three columns under each model correspond to the three evaluation time points. True values of the NRI and the IDI (row “True”) were calculated using 1000 samples of size 1000. A total of 1000 samples of size 400 each were used to calculate the sample means and the empirical standard errors (row “ESE”). Bias is the difference between the sample mean and the true value. Row “ASD” gives the mean of the estimated standard deviations for the NRI, and row “BSD” gives the mean of the bootstrap standard deviations (with 1000 bootstrap samples) for the IDI. Row “CR” is the coverage rate of the 95% intervals based on asymptotic normality and the estimated standard deviation, computed as (count of intervals containing the true NRI)/1000. Row “CR (BCa)” is the coverage rate of the 95% BCa bootstrap intervals, computed as (count of intervals containing the true value)/1000.
| NRI | Cox regression | | | Fine–Gray | | | Gerds | | |
|---|---|---|---|---|---|---|---|---|---|
| Weibull (Case 1) | | | | | | | | | |
| True | 0.114 | 0.100 | 0.126 | 0.114 | 0.100 | 0.126 | 0.114 | 0.100 | 0.126 |
| Bias | **0.002** | **0.005** | **0.001** | -0.056 | 0.005 | -0.026 | -0.006 | 0.004 | -0.005 |
| ESE | 0.027 | 0.030 | 0.031 | 0.027 | 0.029 | 0.033 | 0.025 | 0.030 | 0.031 |
| ASD | 0.024 | 0.030 | 0.030 | 0.016 | 0.028 | 0.026 | 0.024 | 0.030 | 0.030 |
| CR | 0.935 | 0.944 | 0.943 | 0.346 | 0.926 | 0.754 | 0.930 | 0.945 | 0.937 |
| CR (BCa) | 0.918 | 0.898 | 0.899 | 0.621 | 0.869 | 0.855 | 0.910 | 0.896 | 0.904 |
| Fine–Gray (Case 2) | | | | | | | | | |
| True | 0.109 | 0.127 | 0.123 | 0.109 | 0.127 | 0.123 | 0.109 | 0.127 | 0.123 |
| Bias | 0.005 | 0.004 | 0.004 | 0.006 | 0.001 | 0.003 | 0.012 | 0.005 | 0.003 |
| ESE | 0.035 | 0.033 | 0.028 | 0.038 | 0.035 | 0.030 | 0.035 | 0.033 | 0.030 |
| ASD | 0.032 | 0.029 | 0.025 | 0.033 | 0.032 | 0.026 | 0.032 | 0.028 | 0.025 |
| CR | 0.919 | 0.911 | 0.902 | 0.911 | 0.918 | 0.909 | 0.907 | 0.886 | 0.873 |
| CR (BCa) | 0.908 | 0.900 | 0.905 | 0.910 | 0.898 | 0.914 | 0.882 | 0.914 | 0.912 |
| Gerds (Case 3) | | | | | | | | | |
| True | 0.209 | 0.189 | 0.169 | 0.209 | 0.189 | 0.169 | 0.209 | 0.189 | 0.169 |
| Bias | -0.001 | 0.004 | 0.007 | -0.144 | -0.046 | -0.004 | **0.004** | **0.003** | **0.003** |
| ESE | 0.027 | 0.031 | 0.031 | 0.029 | 0.028 | 0.029 | 0.028 | 0.031 | 0.031 |
| ASD | 0.025 | 0.027 | 0.030 | 0.015 | 0.020 | 0.027 | 0.026 | 0.027 | 0.030 |
| CR | 0.924 | 0.897 | 0.933 | 0 | 0.435 | 0.933 | 0.929 | 0.920 | 0.937 |
| CR (BCa) | 0.914 | 0.918 | 0.897 | 0.019 | 0.757 | 0.894 | 0.912 | 0.898 | 0.889 |
| IDI | Cox regression | | | Fine–Gray | | | Gerds | | |
| Weibull (Case 1) | | | | | | | | | |
| True | 0.109 | 0.106 | 0.099 | 0.109 | 0.106 | 0.099 | 0.109 | 0.106 | 0.099 |
| Bias | 0.001 | 0.002 | 0.001 | -0.091 | -0.081 | -0.065 | -0.008 | -0.014 | -0.020 |
| ESE | 0.016 | 0.016 | 0.015 | 0.005 | 0.006 | 0.008 | 0.017 | 0.016 | 0.016 |
| BSD | 0.017 | 0.016 | 0.016 | 0.007 | 0.008 | 0.010 | 0.018 | 0.017 | 0.016 |
| CR (BCa) | 0.953 | 0.952 | 0.946 | 0 | 0 | 0.008 | 0.935 | 0.873 | 0.810 |
| Fine–Gray (Case 2) | | | | | | | | | |
| True | 0.151 | 0.158 | 0.165 | 0.151 | 0.158 | 0.165 | 0.151 | 0.158 | 0.165 |
| Bias | -0.023 | -0.021 | -0.007 | **0.003** | **0.003** | **0.003** | -0.116 | -0.113 | -0.110 |
| ESE | 0.020 | 0.021 | 0.022 | 0.020 | 0.021 | 0.021 | 0.014 | 0.015 | 0.016 |
| BSD | 0.021 | 0.022 | 0.023 | 0.021 | 0.021 | 0.022 | 0.015 | 0.017 | 0.017 |
| CR (BCa) | 0.661 | 0.730 | 0.781 | 0.946 | 0.946 | 0.950 | 0.012 | 0.002 | 0.002 |
| Gerds (Case 3) | | | | | | | | | |
| True | 0.288 | 0.271 | 0.251 | 0.288 | 0.271 | 0.251 | 0.288 | 0.271 | 0.251 |
| Bias | -0.022 | -0.020 | -0.017 | -0.254 | -0.226 | -0.193 | 0.001 | 0.001 | 0.001 |
| ESE | 0.026 | 0.024 | 0.023 | 0.007 | 0.008 | 0.010 | 0.020 | 0.018 | 0.018 |
| BSD | 0.027 | 0.024 | 0.023 | 0.010 | 0.011 | 0.012 | 0.023 | 0.020 | 0.019 |
| CR (BCa) | 0.512 | 0.606 | 0.732 | 0 | 0 | 0 | 0.947 | 0.958 | 0.949 |
Table 2.
Simulation details for the IDI and IAUC from Shi and others (2014) when the added covariate improves predictability (30% censoring). Results for each case were obtained with correct models specified. Within each case, the three columns correspond to the three evaluation time points. A total of 1000 samples of size 400 each were used to calculate the sample means (rows “IDI”, “IAUC”, and “NRI”) and the empirical standard errors (rows “ESE”, each referring to the row directly above it).
| | Weibull (Case 1) | | | Fine–Gray (Case 2) | | | Gerds (Case 3) | | |
|---|---|---|---|---|---|---|---|---|---|
| Cause 1, Shi and others (2014) | | | | | | | | | |
| IDI | 0.097 | 0.099 | 0.096 | -0.0001 | -0.0000 | 0.0001 | 0.267 | 0.253 | 0.236 |
| ESE | 0.032 | 0.033 | 0.034 | 0.002 | 0.001 | 0.002 | 0.041 | 0.039 | 0.038 |
| IAUC | 0.359 | 0.382 | 0.411 | 0.0007 | -0.0002 | 0.0005 | 0.688 | 0.679 | 0.674 |
| ESE | 0.080 | 0.084 | 0.099 | 0.017 | 0.012 | 0.015 | 0.053 | 0.058 | 0.067 |
| Cause 2, Shi and others (2014) | | | | | | | | | |
| IDI | 0.098 | 0.101 | 0.097 | -0.0001 | 0.0000 | 0.0000 | 0.269 | 0.253 | 0.235 |
| ESE | 0.031 | 0.032 | 0.034 | 0.001 | 0.001 | 0.002 | 0.042 | 0.041 | 0.041 |
| IAUC | 0.362 | 0.387 | 0.413 | -0.0001 | 0.0004 | -0.001 | 0.690 | 0.682 | 0.671 |
| ESE | 0.077 | 0.085 | 0.098 | 0.009 | 0.011 | 0.016 | 0.053 | 0.059 | 0.069 |
| Both causes, our methods | | | | | | | | | |
| NRI | 0.112 | 0.102 | 0.125 | 0.115 | 0.128 | 0.126 | 0.205 | 0.186 | 0.172 |
| ESE | 0.027 | 0.030 | 0.031 | 0.038 | 0.035 | 0.030 | 0.028 | 0.031 | 0.031 |
| IDI | 0.110 | 0.108 | 0.100 | 0.148 | 0.155 | 0.162 | 0.289 | 0.272 | 0.252 |
| ESE | 0.016 | 0.016 | 0.015 | 0.020 | 0.021 | 0.021 | 0.020 | 0.018 | 0.018 |
Table 3.
NRI and IDI simulation results when the added covariate does not improve predictability (30% censoring). Results from correct models are given in bold. Within each case, the three columns under each model correspond to the three evaluation time points. A total of 1000 samples of size 400 each were used to calculate the sample means and the empirical standard errors (row “ESE”). Bias is the difference between the sample mean and the true mean (0). Row “ASD” gives the mean of the estimated standard deviations for the NRI, and row “BSD” gives the mean of the bootstrap standard deviations (with 1000 bootstrap samples) for the IDI. Row “CR” is the coverage rate of the 95% intervals based on asymptotic normality and the estimated standard deviation, computed as (count of intervals containing the true NRI)/1000, and row “CR (un-nested)” is the same coverage rate after un-nesting the models. Row “CR (BCa)” is the coverage rate of the 95% BCa bootstrap intervals, computed as (count of intervals containing the true value)/1000.
| NRI | Cox regression | | | Fine–Gray | | | Gerds | | |
|---|---|---|---|---|---|---|---|---|---|
| Weibull (Case 4) | | | | | | | | | |
| Bias | 0.005 | 0.005 | 0.005 | 0.005 | 0.004 | 0.003 | 0.004 | 0.004 | 0.004 |
| ESE | 0.015 | 0.015 | 0.015 | 0.013 | 0.015 | 0.015 | 0.015 | 0.015 | 0.016 |
| ASD | 0.013 | 0.014 | 0.014 | 0.010 | 0.013 | 0.015 | 0.014 | 0.014 | 0.015 |
| CR | 0.915 | 0.919 | 0.939 | 0.845 | 0.918 | 0.940 | 0.931 | 0.949 | 0.935 |
| CR (un-nested) | 0.949 | 0.941 | 0.945 | 0.913 | 0.933 | 0.957 | 0.940 | 0.927 | 0.933 |
| CR (BCa) | 0.855 | 0.845 | 0.834 | 0.835 | 0.860 | 0.868 | 0.842 | 0.817 | 0.859 |
| Fine–Gray (Case 5) | | | | | | | | | |
| Bias | 0.005 | 0.004 | 0.002 | 0.005 | 0.004 | 0.002 | 0.006 | 0.004 | 0.002 |
| ESE | 0.016 | 0.015 | 0.012 | 0.015 | 0.015 | 0.011 | 0.016 | 0.016 | 0.011 |
| ASD | 0.014 | 0.014 | 0.010 | 0.013 | 0.013 | 0.010 | 0.014 | 0.014 | 0.010 |
| CR | 0.915 | 0.934 | 0.891 | 0.913 | 0.913 | 0.897 | 0.916 | 0.919 | 0.918 |
| CR (un-nested) | 0.939 | 0.946 | 0.951 | 0.946 | 0.955 | 0.952 | 0.937 | 0.942 | 0.959 |
| CR (BCa) | 0.902 | 0.908 | 0.882 | 0.903 | 0.917 | 0.898 | 0.889 | 0.872 | 0.901 |
| Gerds (Case 6) | | | | | | | | | |
| Bias | 0.005 | 0.005 | 0.005 | 0.004 | 0.007 | 0.005 | 0.005 | 0.006 | 0.006 |
| ESE | 0.016 | 0.016 | 0.016 | 0.010 | 0.015 | 0.016 | 0.018 | 0.017 | 0.016 |
| ASD | 0.015 | 0.015 | 0.015 | 0.006 | 0.011 | 0.014 | 0.015 | 0.015 | 0.015 |
| CR | 0.915 | 0.922 | 0.933 | 0.709 | 0.856 | 0.907 | 0.769 | 0.783 | 0.786 |
| CR (un-nested) | 0.938 | 0.945 | 0.946 | 0.856 | 0.889 | 0.935 | 0.945 | 0.947 | 0.945 |
| CR (BCa) | 0.818 | 0.836 | 0.827 | 0.732 | 0.831 | 0.868 | 0.819 | 0.836 | 0.857 |
| IDI | Cox regression | | | Fine–Gray | | | Gerds | | |
| Weibull (Case 4) | | | | | | | | | |
| Bias | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 |
| ESE | 0.002 | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 | 0.002 | 0.002 | 0.002 |
| BSD | 0.004 | 0.004 | 0.005 | 0.004 | 0.005 | 0.005 | 0.004 | 0.004 | 0.004 |
| CR (BCa) | 0.962 | 0.947 | 0.953 | 0.897 | 0.903 | 0.909 | 0.955 | 0.957 | 0.955 |
| Fine–Gray (Case 5) | | | | | | | | | |
| Bias | 0.002 | 0.003 | 0.003 | 0.002 | 0.002 | 0.002 | 0.0004 | 0.0007 | 0.0009 |
| ESE | 0.003 | 0.003 | 0.003 | 0.002 | 0.002 | 0.003 | 0.0006 | 0.0008 | 0.001 |
| BSD | 0.004 | 0.005 | 0.005 | 0.004 | 0.004 | 0.004 | 0.001 | 0.001 | 0.002 |
| CR (BCa) | 0.864 | 0.857 | 0.842 | 0.776 | 0.773 | 0.755 | 0.865 | 0.872 | 0.870 |
| Gerds (Case 6) | | | | | | | | | |
| Bias | 0.003 | 0.003 | 0.003 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 |
| ESE | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 | 0.004 | 0.002 | 0.002 | 0.003 |
| BSD | 0.005 | 0.005 | 0.005 | 0.004 | 0.005 | 0.005 | 0.004 | 0.004 | 0.004 |
| CR (BCa) | 0.953 | 0.927 | 0.932 | 0.910 | 0.907 | 0.905 | 0.960 | 0.943 | 0.935 |
Table 4.
NRI and IDI results for the MACS data at times 10 and 12 years, along with the IDI and IAUC from Shi and others (2014). Competing risk censoring by death occurred when participants died without cognitive impairment (CogImp). Intervals are given in brackets beneath the corresponding estimates.
| Model | Time | NRI | IDI | IDI (Shi and others): CogImp | IDI (Shi and others): Death | IAUC: CogImp | IAUC: Death | Brier score: CogImp | Brier score: Death |
|---|---|---|---|---|---|---|---|---|---|
| Cox regression | t = 10 | 0.050 | 0.061 | 0.009 | 0.091 | 0.215 | 0.347 | 0.100 | 0.173 |
| | | [0.031, 0.063] | [0.047, 0.073] | [0.003, 0.019] | [0.069, 0.114] | [0.089, 0.318] | [0.292, 0.413] | | |
| | t = 12 | 0.081 | 0.070 | 0.010 | 0.105 | 0.180 | 0.327 | 0.113 | 0.212 |
| | | [0.061, 0.102] | [0.055, 0.083] | [0.003, 0.021] | [0.079, 0.132] | [0.045, 0.319] | [0.271, 0.392] | | |
| Gerds | t = 10 | 0.041 | 0.062 | 0.024 | 0.087 | 0.214 | 0.146 | 0.100 | 0.173 |
| | | [0.021, 0.054] | [0.046, 0.080] | [0.015, 0.036] | [0.056, 0.117] | [0.130, 0.274] | [0.074, 0.211] | | |
| | t = 12 | 0.076 | 0.064 | 0.035 | 0.087 | 0.219 | 0.126 | 0.114 | 0.208 |
| | | [0.055, 0.097] | [0.048, 0.081] | [0.025, 0.050] | [0.058, 0.120] | [0.133, 0.273] | [0.073, 0.200] | | |
| Fine and Gray | t = 10 | 0.045 | 0.037 | 0.0004 | 0.087 | 0.011 | 0.370 | 0.097 | 0.155 |
| | | [0.024, 0.063] | [0.027, 0.047] | [-0.003, 0.006] | [0.066, 0.113] | [-0.164, 0.238] | [0.314, 0.437] | | |
| | t = 12 | 0.070 | 0.045 | 0.0001 | 0.101 | 0.002 | 0.339 | 0.109 | 0.196 |
| | | [0.051, 0.085] | [0.034, 0.057] | [-0.004, 0.006] | [0.075, 0.127] | [-0.179, 0.283] | [0.281, 0.393] | | |
From Table 1, we first notice that, for both NRI and
IDI, estimated
and
on average are very close to true values
and
, with the correct model for a specific
data design. The average standard deviations of the estimated NRIs based on formula (A.1) of
the Supplementary material available at Biostatistics online
approximate the empirical standard errors closely. The 95% CIs based on asymptotic normality
and estimated standard deviation cover the true values about 95% of the time, though the
coverage rates are a bit lower than 95% in some cases. One possible reason is the use of
approximation from Taylor’s expansion, and our formula-based asymptotic variance could
underestimate the true variance of the proposed NRI estimators in this situation.
Nevertheless, when models are specified correctly, the results are very good in general.
Similar to the NRI, IDI estimators are close to their true values when models are correctly
specified, average bootstrap standard deviations are comparable to empirical standard
errors, and coverage rates are around 95% using BCa bootstrap CIs. As the censoring rate
increases from 30% to 50% and sample size decreases from 400 to 200 (results shown in
Tables S1–S6 of the Supplementary material available at Biostatistics online),
standard errors of both NRI and IDI estimators increase and the coverage rates are still
satisfactory except when
and the censoring rate is 50%. However,
despite the appealing interpretation of covariate effects on CIFs, Fine and Gray’s model
does not guarantee that the sum of all cause probabilities is equal to one. Thus, the
proposed standard deviation estimation for the NRI often underestimates the true
variability. The underestimation is worse for the IDI estimators when Fine and Gray is
misused in predicting event probabilities.
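The remark that Fine and Gray's model does not force the cause probabilities to sum to at most one can be illustrated numerically. The sketch below uses the standard form of the cumulative incidence function under a proportional subdistribution hazards model, 1 − (1 − F0(t))^exp(βx). The baseline values, coefficients, and the function name `fine_gray_cif` are hypothetical and chosen only for illustration; they are not taken from the simulations.

```python
import math


def fine_gray_cif(baseline_cif, beta, x):
    """CIF at a fixed time under a Fine and Gray model.

    Uses F(t | x) = 1 - (1 - F0(t)) ** exp(beta * x), where F0(t) is the
    baseline CIF at that time (hypothetical inputs, for illustration).
    """
    return 1.0 - (1.0 - baseline_cif) ** math.exp(beta * x)


# Two causes fitted separately, as when each event gets its own model.
# Baseline CIFs of 0.40 and coefficients of 1.5 are made-up values.
f1 = fine_gray_cif(baseline_cif=0.40, beta=1.5, x=1.0)
f2 = fine_gray_cif(baseline_cif=0.40, beta=1.5, x=1.0)

# For this covariate value the two predicted CIFs add up to more than one,
# so 1 - f1 - f2 is not a valid event-free probability.
total = f1 + f2
print(f1, f2, total)
```

At the baseline covariate value (x = 0) the two probabilities sum to 0.80, so the problem appears only for some covariate patterns, which is one way such a model can look reasonable on average and still distort the predicted "nonevent" category.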
Table 2 presents the results from the IDI and the incremental AUC (IAUC) that were proposed by Shi and others (2014) under the same settings as in Table 1. To facilitate comparisons, we also list our proposed estimates of the IDI and the NRI at the bottom of Table 2. The IDI and the IAUC proposed by Shi and others (2014) consider two causes separately, and thus they are not directly comparable to our methods which evaluate the two causes simultaneously. However, listing the estimates from the two methods in the same table can help appreciate the effects relative to their errors for each cause and for both causes. The methods from Shi and others (2014) lump the competing events with healthy controls together and can lead to the wrong conclusion of no effect of the added covariate on competing risk outcomes as shown in Case 2, despite the fact that the new marker is clearly related to competing risk outcomes. In contrast, our proposed IDI and NRI estimates both properly demonstrate the usefulness of the added covariate.
Table 3 summarizes the results from our proposed methods under Cases 4, 5, and 6. Although the true underlying data are from the null and both NRI and IDI are degenerate, the probabilities of covering zero are high for both the NRI’s formula-based CIs and the IDI’s BCa bootstrap CIs. Demler and others (2017) suggested “un-nesting” the models by including independent weak predictors in both models so that they are no longer nested. Accordingly, we added independent and non-informative noise variables from the standard normal distribution as additional covariates into both models. The coverage rates are improved for the NRI estimation by un-nesting the models, except for Fine and Gray’s model, probably because all cause probabilities do not sum up to one. However, un-nesting the models would introduce bias into the IDI estimation, which might lead to lower coverage rates of CIs. Thus, for the IDI, we chose to simply use the original BCa bootstrapping procedure instead. Results for the null under 50% censoring and with a sample size of 200 are summarized in Tables S2, S4, and S6 of the Supplementary material available at Biostatistics online. The same patterns are observed.
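The un-nesting step can be sketched as follows, under one reading of the description: each model receives its own independent standard normal noise covariate, so that neither covariate set contains the other. The function name `add_noise_covariates` and the list-of-rows data layout are assumptions made for this illustration.

```python
import random


def add_noise_covariates(old_rows, new_rows, seed=0):
    """Append one independent N(0, 1) noise covariate to each design.

    old_rows / new_rows: lists of per-subject covariate lists for the
    "old" and "new" models. Separate draws are used for the two models
    (an assumption), so the augmented models are no longer nested.
    """
    rng = random.Random(seed)
    old_aug = [row + [rng.gauss(0.0, 1.0)] for row in old_rows]
    new_aug = [row + [rng.gauss(0.0, 1.0)] for row in new_rows]
    return old_aug, new_aug


# Toy example: the new model adds one marker to a single old covariate.
old_x = [[0.2], [1.1], [-0.4]]
new_x = [[0.2, 3.0], [1.1, 2.5], [-0.4, 4.1]]
old_aug, new_aug = add_noise_covariates(old_x, new_x)
```

Both models would then be refitted on the augmented covariates before the NRI and its formula-based CI are recomputed.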
Results from cross-validation (CV) when models are correctly specified are shown in Tables S7 and S8 of the Supplementary material available at Biostatistics online. Under the alternatives, the true NRI values from the Cox regression and the Gerds model are slightly smaller than the ones without CV, which is consistent with what has been generally observed in Pencina and others (2011). However, the true values from the Fine and Gray model become slightly larger after CV. This may be due to the improper probabilistic design of Fine and Gray. Alternatively, as pointed out by Pencina and others (2011), though probabilities from both models tend to be smaller in the validation sample, their difference may not change or even go up, resulting in a larger NRI value. The coverage rates of cross-validated NRI with BCa bootstrap are relatively low. A better bootstrap procedure or explicit asymptotic theory can be further explored. The performance of BCa bootstrap of the IDI with CV is satisfactory, except for Gerds’ model, where the B-spline technique is used for model approximation, and which may not be accurate at later time points with more censoring. As for Cox regression and Fine and Gray’s models, the IDIs are close to the ones without CV. The different impact of CV on the IDI and NRI estimation suggests that probabilities estimated from CV did not change significantly, while the small changes could make a difference in how the outcomes are recategorized. This is particularly relevant for our simulations, as cause 1 and cause 2 events were set to occur with equal probability under cases 1 and 3. When the added covariate did not have any effect on competing risk outcomes, as represented by Table S8 of the Supplementary material available at Biostatistics online, the CV NRIs and IDIs are all very close to the true value of zero. However, the CV bootstrap method tends to overestimate the variability, and as a result the coverage rates are overly conservative.
As a comparison, we also implemented the 0.632 bootstrap method proposed by Zheng and others (2013) and tried both percentile bootstrap and normal approximation bootstrap. Although this method is efficient to run and accounts for “over-optimism” of survival models, the coverage rates for the NRI under all six cases are much higher than the nominal 95% level (sometimes 100%) when models are correctly specified.
4. Application to the Multicenter AIDS Cohort Study
We applied the NRI and IDI methods to data obtained by the MACS, an ongoing study of men who have sex with men and who are at risk for or infected with HIV, recruited from four institutions in Baltimore, Chicago, Pittsburgh, and Los Angeles (Kingsley and others, 1987; Kaslow and others, 1987). The data used for this analysis were gathered between April 2, 1984 and April 8, 2017. Each participant underwent a clinical examination semi-annually and neuropsychological testing approximately every 2 years (see Miller and others (1990) and Becker and others (2014) for details) until they dropped out of the study voluntarily or died. The current analysis utilizes the data from a substudy of the legacy effect of HIV on cognitive impairment among 2783 HIV seropositive men.
Individuals with HIV disease have historically been at risk for cognitive impairment. The MACS measured cognitive functions over time with a battery of neuropsychological (NP) tests, which were summarized by T scores in six cognitive domains: working memory and attention, learning, motor speed and coordination, executive functioning, speed of information processing, and memory. We adopted the multivariate normative comparison (MNC) method to define abnormality in cognition as in Huizenga and others (2007) and Wang and others (2019). Time to impairment was defined as the interval between study entry and the first visit where the six domain scores were deemed abnormal by the MNC method. Those participants who were impaired at their first visit were excluded from the current analysis. Although cognitive impairment and death could be thought of as semi-competing risks where death may censor impairment but not vice versa, we treated them as competing risk data by defining the events as cognitive impairment and death without impairment. If a participant died after the last complete NP visit and no cognitive impairment was detected, his time to impairment was competing risk censored by death. Otherwise, participants were censored at their last visit.
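The outcome coding described above can be written as a small helper. This is a minimal sketch: the status labels (1 for cognitive impairment, 2 for death without impairment, 0 for censoring), the use of `None` for an event that was not observed, and the function name `code_outcome` are conventions assumed here, not taken from the study's software.

```python
def code_outcome(impairment_time, death_time, last_visit_time):
    """Return (time, status) for one participant.

    status 1: first visit with abnormal cognition (event of interest)
    status 2: death without detected impairment (competing event)
    status 0: censored at the last visit
    Times are measured from study entry; None means "not observed".
    """
    if impairment_time is not None:
        return impairment_time, 1
    if death_time is not None:
        return death_time, 2
    return last_visit_time, 0


# Toy participants (times in years, made-up values).
print(code_outcome(6.5, 12.0, 6.5))    # impaired first, later death ignored
print(code_outcome(None, 9.0, 8.0))    # died without detected impairment
print(code_outcome(None, None, 20.0))  # alive and unimpaired at last visit
```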
In the presence of competing risk censoring, techniques such as Cox regression, Fine–Gray’s model, and Gerds’ model can be used to identify potential risk factors affecting cognition after the onset of HIV infection. However, these methods do not directly quantify the relative importance of a factor in predicting who might develop impairment, who might die, or who might be alive and disease free after a fixed time interval. Here we apply the NRI and the IDI treating CD4+ cell count as the “new” biomarker (with both linear and quadratic terms to account for nonlinearity) to examine whether the inclusion of this variable will yield a better prediction. In the Legacy substudy, three other predictors—age, center for epidemiologic studies depression scale, and recruitment cohort (before or after 2001)— were found to be significantly related to cognitive impairment and were treated here as conventional predictors. All four predictors were measured at study entry. The final dataset for the analysis included 1972 seropositive men who had at least one visit with complete cognitive tests and the information on four predictors.
Within this subsample, 553 men were classified with cognitive impairment using the MNC
method (28.0%), 597 died during follow-up without any cognitive impairment (30.3%), and 822
were censored by the “end” (at the data freeze) of the study (41.7%). Time to event or time
in the study ranged from 5 months to 33 years. We examined the performance of CD4+ cell
count and its quadratic transformation as the “new” biomarkers in predicting health status
at 10 and 12 years since the start of the study with a proportional hazard model,
Fine–Gray’s model, and Gerds’ model, using both NRI and IDI. The two events, cognitive
impairment and death without cognitive impairment, were again modeled separately with Cox’s
model or Fine–Gray’s model, and they were modeled simultaneously in the Gerds model.
Five-fold cross-validation was used to compute the probabilities of both events and survival
at selected times. Based on the predicted probabilities of both events and the predicted
survival that were calculated from the
three models at 10 and 12 years, we computed the values of the NRI and IDI. Ten thousand
bootstrap samples were used to produce 95% BCa CIs.
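For readers unfamiliar with the BCa interval (Efron, 1987), the following is a minimal sketch of the procedure for a generic statistic. In the application the statistic would be the NRI or IDI recomputed on each resample; here `stat` is simply the sample mean of toy data, and the function name `bca_interval`, the default number of replicates, and the seed are assumptions for illustration.

```python
import random
from statistics import NormalDist, mean


def bca_interval(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """Bias-corrected and accelerated bootstrap interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = stat(data)
    # Bootstrap replicates: resample subjects with replacement.
    boots = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    norm = NormalDist()
    # Bias correction z0: share of replicates below the point estimate,
    # clipped away from 0 and 1 so the normal quantile stays finite.
    prop = sum(b < theta_hat for b in boots) / n_boot
    prop = min(max(prop, 1.0 / (n_boot + 1)), n_boot / (n_boot + 1.0))
    z0 = norm.inv_cdf(prop)
    # Acceleration from jackknife (leave-one-out) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jbar = mean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6.0 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def endpoint(level):
        # Adjusted percentile level, then the matching order statistic.
        z = norm.inv_cdf(level)
        adj = norm.cdf(z0 + (z0 + z) / (1.0 - a * (z0 + z)))
        idx = min(max(int(adj * n_boot), 0), n_boot - 1)
        return boots[idx]

    return endpoint(alpha / 2), endpoint(1 - alpha / 2)


toy = [0.1 * i for i in range(30)]
lower, upper = bca_interval(toy, mean, n_boot=500)
```

When z0 and a are both zero the adjusted levels reduce to alpha/2 and 1 − alpha/2, so the interval coincides with the ordinary percentile bootstrap interval.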
To select the most suitable regression model, we also computed cause-specific Brier scores
(Schoop and others, 2011) with
5-fold cross-validation. At a selected time t, the competing events happening before t
contributed to the score and were weighted using the method of inverse probability of
censoring weighting, while right censored ones before t were omitted. As a comparison,
the cause-specific IDI and IAUC proposed by Shi and others (2014) were also computed.
The results are summarized in Table 4. We can see that the estimated NRI and IDI and
their 95% CIs are comparable across the three different models. Among the three models, the
Fine and Gray model has the lowest Brier scores for both events at 10 and 12 years,
suggesting that Fine and Gray is the most suitable competing risk model for our data.
Moreover, based on the goodness-of-fit testing procedures for the proportionality of
subdistribution hazards ratio, the linearity of covariates, and the log-log link function of
the Fine and Gray model (Li and others, 2015), the Fine and Gray model seems to provide a
good fit to the MACS data (P values no smaller than 0.36).
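The weighting scheme described for the cause-specific Brier score can be sketched as below. This is an illustrative implementation of a generic inverse probability of censoring weighted (IPCW) Brier score, not the authors' code: the censoring distribution is estimated with an unadjusted Kaplan-Meier estimator, ties between events and censorings are ignored, and the names `censoring_survival` and `brier_ipcw` are assumptions.

```python
def censoring_survival(times, status):
    """Kaplan-Meier estimate G of the censoring survival function.

    status 0 marks a censored time; any other value marks an event.
    Returns a function G(s, left=False); left=True gives G(s-).
    """
    steps = []
    g = 1.0
    for c in sorted({t for t, s in zip(times, status) if s == 0}):
        at_risk = sum(1 for t in times if t >= c)
        d = sum(1 for t, s in zip(times, status) if t == c and s == 0)
        g *= 1.0 - d / at_risk
        steps.append((c, g))

    def G(s, left=False):
        val = 1.0
        for c, g_c in steps:
            if c < s or (c == s and not left):
                val = g_c
            else:
                break
        return val

    return G


def brier_ipcw(times, status, pred, cause, t):
    """IPCW Brier score at time t for one cause.

    pred[i] is the predicted probability that subject i has an event of
    type `cause` by time t. Events before t get weight 1 / G(T-), subjects
    still at risk at t get weight 1 / G(t), and subjects censored before t
    get weight zero (they are skipped).
    """
    G = censoring_survival(times, status)
    total = 0.0
    for T, s, p in zip(times, status, pred):
        if T <= t and s != 0:
            weight, y = 1.0 / G(T, left=True), float(s == cause)
        elif T > t:
            weight, y = 1.0 / G(t), 0.0
        else:
            continue  # censored before t
        total += weight * (y - p) ** 2
    return total / len(times)


# Toy data: one subject censored at time 2, score evaluated at t = 2.5.
score = brier_ipcw([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 2], [0.5] * 4, cause=1, t=2.5)
```

Without censoring all weights equal one and the score reduces to the ordinary mean squared difference between the event indicator and the predicted probability.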
From the Fine and Gray model, the estimated NRIs at 10 and 12 years since the start of the
study are 0.045 and 0.070 with 95% BCa CIs [0.024, 0.063] and [0.051, 0.085], respectively.
The estimated IDIs are 0.037 and 0.045 with 95% BCa CIs [0.027, 0.047] and [0.034, 0.057].
Because the 95% CIs of both NRI and IDI do not include zero, we conclude that including the
CD4+ cell counts in competing risk models increases the accuracy of predicting cognitive
impairment and death after 10 and 12 years in the study. More specifically, the
probabilities of correctly predicting health status (impairment, death, or neither) for a
participant after 10 and 12 years of observation improve by 2.4% and 6.8% by simply
incorporating CD4+ cell counts with their quadratic transformation into the model. Also, the
variability explained by the predictive model is increased by 3.0% and 3.8% for events at 10
and 12 years with the addition of the CD4+ cell counts. We found that IDI estimators with CV
remain very close to the ones without CV (differing only at a late significant digit) because
estimated probabilities did not change significantly with CV. In comparison, the IDI and the
IAUC proposed by Shi and others
(2014) did not identify a significant increase in accuracy of predicting cognitive
impairment or death after including CD4+ cell counts, with their 95% CIs including zero.
This discrepancy is consistent with what we have observed in our simulation results
(Table 2), where Shi’s methods failed to detect the accuracy improvement when competing
events were improperly lumped with nonevents.
Finally, it is important to recognize that some participants withdrew from the legacy substudy and died many years afterwards. If a participant died more than four years after his last NP visit, he may have experienced cognitive impairment between his last NP visit and his death. As a sensitivity analysis, we censored such participants four years after their last NP visit, assuming cognition stayed relatively stable over two consecutive NP visits (about four years as scheduled). In this way, 553 men were classified with cognitive impairment using the MNC method (28.0%), 425 died within four years after the last NP visit without any cognitive impairment (21.6%), and 994 were censored either at their last study visit or four years following their last NP exam, whichever was first (50.4%). Using the Fine and Gray model, the estimated NRIs at 10 and 12 years since the start of the study are 0.097 and 0.098 with 95% BCa confidence intervals [0.077, 0.118] and [0.077, 0.119], respectively. The estimated IDIs are 0.101 and 0.101 with 95% BCa confidence intervals of [0.075, 0.118] and [0.076, 0.114]. Again, these findings suggest that including CD4+ cell counts in competing risk models can increase prediction accuracy of death and cognitive impairment after 10 and 12 years in the study.
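The recoding used in this sensitivity analysis can be sketched as a single rule applied to records that are already coded as (time, status). The status labels (2 for death without impairment, 0 for censoring), the function name `apply_four_year_rule`, and the assumption that all times are in years from study entry are conventions adopted for this illustration.

```python
def apply_four_year_rule(time, status, last_np_time, gap=4.0):
    """Censor deaths that occur more than `gap` years after the last NP visit.

    A death without impairment (status 2) that falls more than `gap` years
    after the last neuropsychological visit is replaced by censoring at
    last_np_time + gap. All other records are returned unchanged.
    """
    if status == 2 and time - last_np_time > gap:
        return last_np_time + gap, 0
    return time, status


# Toy records (made-up values, in years from study entry).
print(apply_four_year_rule(15.0, 2, last_np_time=9.0))   # death 6 years later
print(apply_four_year_rule(12.0, 2, last_np_time=9.0))   # death within 4 years
print(apply_four_year_rule(10.0, 1, last_np_time=10.0))  # impairment, unchanged
```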
5. Discussion
We have demonstrated here the reliable practical performance of the extended NRI and IDI in competing risk settings. Although a CI for the IDI can be efficiently constructed based on the asymptotic linear representation for a well-studied regression model, the BCa bootstrap method serves as a flexible alternative when a model is relatively new and its theoretical properties are less known. When the added variables have no effect on the events and the models to be compared are nested, Demler and others (2017) showed that the theory based on U-statistics fails. Still, the CIs for the NRI based on asymptotic normality and the BCa bootstrap CIs for the IDI seem to have satisfactory coverage as demonstrated by simulations. After un-nesting the models, the CI for the NRI is improved.
In this work, we have considered three reasonable competing risk models. However, one can use any other semiparametric or parametric models such as Scheike and others (2008) and Cheng (2009). The limitation of the extended NRI and IDI is that they are model dependent and are not robust against model mis-specification. As a result, it remains important to select the proper predictive model before examining diagnostic accuracy improvement over the course of the variable addition (Leening and others, 2014). Metrics, such as the Brier score (Schoop and others, 2011) or the C statistic proposed by Uno and others (2011), are useful in choosing the most appropriate model for the data.
Competing risks are common in biomedical research although they are often neglected in
analysis. The extended NRI and IDI for competing events provide alternative and
straightforward interpretations of the importance of new biomarkers on top of conventional
factors. They also serve as more unifying metrics than do model coefficients such as hazards
ratios or odds ratios, since the latter depend on the types and the scales of covariates.
Moreover, this is in line with recent debates about moving away from statistical
significance at the 0.05 level (Wasserstein and others, 2019). Instead of simply looking at
P values
for the added variables in a regression model, one can assess the contribution of additional
risk factors in prediction through interval estimates of the IDI and NRI. Thus, the extended
NRI and IDI for multiple competing endpoints might be useful in screening and selecting
covariates in high-dimensional settings.
Acknowledgments
We are grateful to the associate editor and two anonymous reviewers for their constructive comments and suggestions that led to an improved article. This research was supported in part by the University of Pittsburgh Center for Research Computing through the resources provided. Data in this manuscript were collected by the MACS. MACS (Principal Investigators): Johns Hopkins University Bloomberg School of Public Health (Joseph Margolick, Todd Brown), U01-AI35042; Northwestern University (Steven Wolinsky), U01-AI35039; University of California, Los Angeles (Roger Detels, Otoniel Martinez-Maza, Otto Yang), U01-AI35040; University of Pittsburgh (Charles Rinaldo, Lawrence Kingsley, Jeremy Martinson), U01-AI35041; the Center for Analysis and Management of MACS, Johns Hopkins University Bloomberg School of Public Health (Lisa Jacobson, Gypsyamber D’Souza), UM1-AI35043. The MACS is funded primarily by the National Institute of Allergy and Infectious Diseases (NIAID), with additional co-funding from the National Cancer Institute (NCI), the National Institute on Drug Abuse (NIDA), and the National Institute of Mental Health (NIMH). Targeted supplemental funding for specific projects was also provided by the National Heart, Lung, and Blood Institute (NHLBI), and the National Institute on Deafness and Communication Disorders (NIDCD). MACS data collection is also supported by UL1-TR001079 (JHU ICTR) from the National Center for Advancing Translational Sciences (NCATS) a component of the National Institutes of Health (NIH), and NIH Roadmap for Medical Research. Additional support was provided by the Johns Hopkins University Center for AIDS Research (P30AI094189). The contents of this publication are solely the responsibility of the authors and do not represent the official views of the National Institutes of Health (NIH), Johns Hopkins ICTR, or NCATS. The MACS website is located at http://aidscohortstudy.org/.
Conflict of Interest: None declared.
Contributor Information
Zheng Wang, Department of Statistics, University of Pittsburgh, Pittsburgh, PA 15260, USA.
Yu Cheng, Departments of Statistics and Biostatistics, University of Pittsburgh, Pittsburgh, PA 15260, USA yucheng@pitt.edu.
Eric C Seaberg, Department of Epidemiology, Johns Hopkins University, Baltimore, MD 21202, USA.
James T Becker, Departments of Psychiatry, Neurology, and Psychology, University of Pittsburgh, Pittsburgh, PA 15260, USA.
Software
Software in the form of R code and a sample dataset are available at https://github.com/WangandYu/NRIandIDI.
Supplementary Material
Supplementary material is available online at http://biostatistics.oxfordjournals.org.
Funding
This work was partially supported by the National Institute on Aging (R01 AG034852 to J.T.B.) and National Science Foundation Division of Mathematical Sciences (1916001 to Y.C.).
References
- Becker, J. T., Kingsley, L. A., Molsberry, S., Reynolds, S., Aronow, A., Levine, A. J., Martin, E., Miller, E. N., Munro, C. A., Ragin, A., Sacktor, N. and others. (2014). Cohort profile: recruitment cohorts in the neuropsychological substudy of the Multicenter AIDS Cohort Study. International Journal of Epidemiology 44, 1506–1516.
- Blanche, P., Dartigues, J. and Jacqmin-Gadda, H. (2013). Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Statistics in Medicine 32, 5381–5397.
- Cheng, S. C., Fine, J. P. and Wei, L. J. (1998). Prediction of cumulative incidence function under the proportional hazards model. Biometrics 54, 219–228.
- Cheng, Y. (2009). Modeling cumulative incidences of dementia and dementia-free death using a novel three-parameter logistic function. The International Journal of Biostatistics 5, 1557–4679.
- Demler, O. V., Pencina, M. J., Cook, N. R. and D’Agostino, R. B. (2017). Asymptotic distribution of δAUC, NRIs, and IDI based on theory of U-statistics. Statistics in Medicine 36, 3334–3360.
- Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association 82, 171–185.
- Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap. New York: Springer.
- Fine, J. P. and Gray, R. J. (1999). A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association 94, 496–509.
- Gerds, T. A., Scheike, T. H. and Andersen, P. K. (2012). Absolute risk regression for competing risks: interpretation, link functions, and prediction. Statistics in Medicine 31, 3921–3930.
- Greenland, P. and O’Malley, P. G. (2005). When is a new prediction marker useful? A consideration of lipoprotein-associated phospholipase a2 and c-reactive protein for stroke risk. Archives of Internal Medicine 165, 2454–2456.
- Huizenga, H. M., Smeding, H., Grasman, R. P. P. P. and Schmand, B. (2007). Multivariate normative comparisons. Neuropsychologia 45, 2534–2542.
- Janes, H. (2013). Letter to the editor on “Multicategory reclassification statistics for assessing improvements in diagnostic accuracy”. Biostatistics 14, 807–808.
- Kaslow, R. A., Ostrow, D. G., Detels, R., Phair, J. P., Polk, F. B. and Rinaldo, C. R. Jr. (1987). The multicenter AIDS cohort study: rationale, organization, and selected characteristics of the participants. American Journal of Epidemiology 126, 310–318.
- Kingsley, L., Kaslow, R., Rinaldo, C. J. R., Detre, K., Odaka, N. and Vanraden, M. (1987). Risk factors for seroconversion to human immunodeficiency virus among male homosexuals. The Lancet 329, 345–349.
- Leening, M. J. G., Steyerberg, E. W., Van Calster, B., D’Agostino Sr., R. B. and Pencina, M. J. (2014). Net reclassification improvement and integrated discrimination improvement require calibrated models: relevance from a marker and model perspective. Statistics in Medicine 33, 3415–3418.
- Li, J., Jiang, B. and Fine, J. P. (2013a). Authors’ response on “Multicategory reclassification statistics for assessing improvements in diagnostic accuracy”. Biostatistics 14, 809–810.
- Li, J., Jiang, B. and Fine, J. P. (2013b). Multicategory reclassification statistics for assessing improvements in diagnostic accuracy. Biostatistics 14, 382–394.
- Li, J., Scheike, T. H. and Zhang, M. (2015). Checking Fine and Gray subdistribution hazards model with cumulative sums of residuals. Lifetime Data Analysis 21, 197–217.
- Miller, E. N., Selnes, O. A., McArthur, J. C., Satz, P., Becker, J. T., Cohen, B. A., Sheridan, K., Machado, A. M., Van Gorp, W. G. and Visscher, B. (1990). Neuropsychological performance in HIV-1-infected homosexual men. Neurology 40, 197.
- Pencina, M. J., D’Agostino, R. B., D’Agostino, R. B. and Vasan, R. S. (2008). Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Statistics in Medicine 27, 157–172.
- Pencina, M. J., D’Agostino, R. B. and Steyerberg, E. W. (2011). Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Statistics in Medicine 30, 11–21.
- Pepe, M. S., Janes, H., Longton, G., Leisenring, W. and Newcomb, P. (2004). Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. American Journal of Epidemiology 159, 882–890.
- Scheike, T. H., Zhang, M. and Gerds, T. A. (2008). Predicting cumulative incidence probability by direct binomial regression. Biometrika 95, 205–220.
- Schoop, R., Beyersmann, J., Schumacher, M. and Binder, H. (2011). Quantifying the predictive accuracy of time-to-event models in the presence of competing risks. Biometrical Journal 53, 88–112.
- Shi, H., Cheng, Y. and Li, J. (2014). Assessing diagnostic accuracy improvement for survival or competing risk censored outcomes. Canadian Journal of Statistics 42, 109–125.
- Uno, H., Cai, T., Pencina, M. J., D’Agostino, R. B. and Wei, L. J. (2011). On the c-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine 30, 1105–1117.
- Uno, H., Tian, L., Cai, T., Kohane, I. S. and Wei, L. J. (2013). A unified inference procedure for a class of measures to assess improvement in risk prediction systems with survival data. Statistics in Medicine 32, 2430–2442.
- Wang, Z., Molsberry, S. A., Cheng, Y., Kingsley, L., Levine, A. J., Martin, E., Munro, C. A., Ragin, A., Rubin, L. H., Sacktor, N., Seaberg, E. and others. (2019). Cross-sectional analysis of cognitive function using multivariate normative comparisons in men with HIV disease. AIDS 33, 2115–2124.
- Ware, J. H. (2006). The limitations of risk factors as prognostic tools. New England Journal of Medicine 355, 2615–2617.
- Wasserstein, R. L., Schirm, A. L. and Lazar, N. A. (2019). Moving to a world beyond “p < 0.05”. The American Statistician 73:sup1, 1–19.
- Zheng, Y., Parast, L., Cai, T. and Brown, M. (2013). Evaluating incremental values from new predictors with net reclassification improvement in survival analysis. Lifetime Data Analysis 19, 350–370.