Abstract
The continuous net reclassification improvement (NRI) statistic is a popular model change measure that was developed to assess the incremental value of new factors in a risk prediction model. Two prominent statistical issues identified in the literature call the utility of this measure into question: (1) it is not a proper scoring function and (2) it has a high false positive rate when testing whether new factors contribute to the risk model. For binary response regression models, these subjects are interrogated and a modification of the continuous NRI, guided by the likelihood-based score residual, is proposed to address these issues. Within a nested model framework, the modified NRI may be viewed as a distance measure between two risk models. An application of the modified NRI is illustrated using prostate cancer data.
Keywords: Binary response model, L1 distance, Nested models, Proper score, Valid test
1. INTRODUCTION
In the clinical setting, individual risk assessment is often derived through a regression model, which incorporates a combination of risk factors due to biological complexity. These risk models are used in forecasting future health outcomes of an individual such as treatment response or survival. The quality of the risk model, evaluated using statistical measures such as calibration, discrimination, explained variation, and likelihood-based criteria, reflects the level of confidence in the forecast (Gerds and Kattan 2021). When the objective is to incorporate a new set of factors into an existing risk model, assessing the impact of these new factors on the forecast is critical. For binary response regression, a discrimination measure, the net reclassification improvement (NRI), is one statistic used for this evaluative process. The NRI, also referred to as the net reclassification index, was developed to ascertain whether the introduction of new risk factors moves a model-derived forecast in a direction consonant with the binary response outcome (Pencina et al. 2008).
The NRI statistic has been criticized on numerous grounds with two prevailing points of contention. First, the NRI is not a proper scoring function. A statistic, or more generally an objective function, is called proper if it attains its minimum or maximum when the data generating process is properly specified. The use of a proper score statistic rewards the analyst for correctly identifying the data generation mechanism (Gneiting and Raftery, 2007). The second point of concern is that the NRI has a high false positive rate when testing whether the new factors contribute to the risk model, even in situations that include independent training and test datasets (Kerr et al. 2014 and Pepe et al. 2014, 2015). As a practical matter, measures with high false positive rates lead to the introduction of irrelevant factors into the model development process.
Despite these critiques, the NRI is a popular statistic, and in the three-year time period 2019–2021, it was cited in PubMed over 800 times. The purpose of this work is to elucidate the methodology underlying these two concerns and to propose a likelihood guided modification to the NRI to rectify these issues.
The NRI is defined through a series of nested regression models, where it is assumed that the existing factors alone , which includes a constant for the intercept term, or combined with new factors are modeled as
| (1) |
where is a binary outcome denoted as event or non-event , is a known monotone inverse link from a generalized linear model (McCullagh and Nelder, 1983), is the base model risk score, is the expanded model risk score, and is the constant model risk score. Throughout this work, random variables are represented with upper case, their observed copies are written in lower case, and vectors are indicated in bold.
The log-likelihood used to estimate the model parameters is
where are independent identically distributed copies of . The maximum likelihood estimates from the three models are represented as: , , and , the observed proportion of events.
Historically, the NRI was developed under the assumption that the base model risk score could be placed in risk classification categories. It was a measure of whether the expanded model risk score, due to the addition of new factors, would move into higher risk categories for subjects with an event and into lower risk categories for subjects without an event. This framework, however, requires a priori clinically meaningful risk categories, which are often not apparent at the time of analysis, particularly in the early stages of model development. As a result, the continuous NRI was developed (Pencina et al. 2011) and it is this measure that is the focus of this work.
The population NRI is defined as
where and . When multiplied by , the population NRI is estimated as
| (2) |
Assuming at least one component of is continuous, it can be asserted without loss of generality, that the indicator function can be extended as
| (3) |
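As a point of reference, the continuous NRI in its standard, unscaled form (Pencina et al. 2011) can be sketched in a few lines. The function name `continuous_nri` and its arguments are illustrative and not from the source. The estimator in (2) additionally carries a scale factor, which is not reproduced here, and ties between the two risk scores are simply counted as no movement.

```python
def continuous_nri(y, p_base, p_expanded):
    """Category-free NRI: net upward movement among events plus
    net downward movement among non-events (standard unscaled form)."""
    up_e = down_e = n_e = 0      # counts among events (y = 1)
    up_ne = down_ne = n_ne = 0   # counts among non-events (y = 0)
    for yi, pb, pe in zip(y, p_base, p_expanded):
        if yi == 1:
            n_e += 1
            up_e += pe > pb
            down_e += pe < pb
        else:
            n_ne += 1
            up_ne += pe > pb
            down_ne += pe < pb
    if n_e == 0 or n_ne == 0:
        raise ValueError("need at least one event and one non-event")
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne


# Toy data (hypothetical): both events move up, non-events split evenly.
y = [1, 1, 0, 0]
p_base = [0.5, 0.5, 0.5, 0.5]
p_expanded = [0.7, 0.6, 0.3, 0.6]
nri_toy = continuous_nri(y, p_base, p_expanded)
```

Only the direction of each subject's change enters the statistic, not its magnitude, which is why the NRI ranges from -2 to 2.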
Although the net reclassification improvement statistic is a frequently applied model change measure, its lack of propriety and high false positive rate are problematic. In Section 2, a modified NRI (mNRI) is developed that satisfies the concept of a proper change score, which adapts the proper scoring principle to model change measures (Pepe et al. 2015). Section 3 demonstrates that a smooth version of the mNRI provides a valid test procedure when the population NRI is zero. This result is established in the single sample and the independent training and test data case. Section 4 reports simulation studies that assess these results. In Section 5, a prostate cancer data example is used to illustrate these concepts, followed by a discussion.
2. THE mNRI IS A PROPER CHANGE SCORE
For a correctly specified parametric risk model, a performance measure is a proper score if its expected value is minimized/maximized at the true model parameter value (Gneiting and Raftery, 2007). For example, the expected value of the Brier score applied to the expanded model
is minimized at . If a performance measure is not a proper score, then the analyst may find inconsistent parameter estimates that make the measure look better. Population performance measures such as the expected value of the area under the curve (AUC), the Brier score (BS), and Kullback-Leibler divergence (KL), are maximized/minimized at their true parameter values and therefore are proper scores.
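The propriety of the Brier score can be checked numerically in the simplest case of a single Bernoulli outcome. The sketch below uses an assumed true event probability of 0.3 and a grid search, both chosen only for illustration. For Y with Pr(Y = 1) = p and a forecast q, the expected Brier score is p(1 - q)^2 + (1 - p)q^2.

```python
def expected_brier(q, p):
    """Expected squared error of forecast q when Y ~ Bernoulli(p)."""
    return p * (1.0 - q) ** 2 + (1.0 - p) * q ** 2


p_true = 0.3  # hypothetical true event probability
grid = [i / 1000 for i in range(1001)]
# The forecast that minimizes the expected Brier score over the grid.
best_q = min(grid, key=lambda q: expected_brier(q, p_true))
```

The minimizer coincides with the true probability, so an analyst cannot improve the expected score by reporting anything other than the true risk.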
Proper scoring is more difficult to achieve for model change measures. Consider the case where a performance measure is applied separately to the expanded model and the base model, and the change measure is
If the performance measure (M) is convex,
but the difference of two convex functions is not necessarily convex, and in general,
To adapt proper scoring to change measures, Pepe et al. (2015) orient the model parameter space so that the base model is evaluated at the true parameter . Based on their definition, a measure is a proper change score if its expected value is minimized/maximized at the true expanded model parameter value. In this setting, is termed a proper change score, since
recreating the single model evaluation. The term proper change score is used here to acknowledge the adaptation of the proper scoring principle to change measures. Under this definition, , and are proper change scores.
The NRI differs from other change measures because it is a statistic based on within subject change and not between model change as above. In addition, the statistic is composed of parameter estimates from three nested models. As a result, it is not covered under the previous argument. To satisfy the proper change score criterion, the NRI is modified
| (4) |
which is constructed by replacing the constant model score residual in (2) with the base model score residual , where
The modified NRI (mNRI) is closely akin to the maximum score statistic and the least absolute deviation statistic (Manski 1985, Horowitz 1998), which provide the framework for the derivation in Theorem 1.
Theorem 1.
Consider the mNRI scoring function derived from a single random variable, with the base and constant model parameters given
The mNRI scoring function is a proper change score,
The theorem is proved in the appendix.
An interpretation of the mNRI statistic is obtained by rewriting it as
where is the base model score residual vector and is a sign vector with subject components . The mNRI is a function of the propensity of the event outcome and a regression coefficient representing the association between the direction of the risk score due to adding and the event outcome after taking into account . This perspective is analogous to a partial residual plot, where a model covariate of interest is replaced by a between model directional covariate .
An alternative interpretation of the mNRI may be considered from the viewpoint of its limiting value
| (5) |
where the weight stems from the base model score residual,
Thus, the population mNRI is a weighted distance measure between the nested event probabilities. An important special case occurs when is logistic and
which results in an unweighted L1 distance measure. Here, the population mNRI is proportional to the mean absolute deviation (MAD) of the nested event probabilities. In addition to using the MAD as a summary measure, this result suggests that graphical insight into the mNRI may be obtained by plotting the base model event probability estimates by the expanded model event probability estimates.
Inference with the modified NRI (4) is complicated due to the presence of the indicator function. To proceed, a smooth version of the mNRI is
| (6) |
where Φ(u/h) is a normal distribution function with scale parameter (bandwidth) h that goes to zero as n gets large. The smooth NRI is employed in Theorem 2 below. This theorem provides the inferential framework through a two-step process. First, it demonstrates the asymptotic equivalence between the mNRI and its smooth counterpart. Second, via this asymptotic equivalence, the asymptotic distribution of the mNRI is derived.
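The role of the bandwidth can be seen in a short sketch of the smoothed indicator. The helper names and the bandwidth values below are illustrative assumptions. The rate at which h must shrink with n is governed by the conditions of Theorem 2 and is not encoded here.

```python
import math


def normal_cdf(u):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def smooth_indicator(u, h):
    """Smooth stand-in for the indicator 1{u > 0}, with bandwidth h > 0."""
    return normal_cdf(u / h)


# As h shrinks, the smooth version approaches the hard indicator for u != 0.
for h in (1.0, 0.1, 0.01):
    values = [round(smooth_indicator(u, h), 4) for u in (-0.1, 0.0, 0.1)]
    print(h, values)
```

At u = 0 the smooth function equals 0.5 for every bandwidth, so it differs from the hard indicator only in a shrinking neighborhood of zero.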
Theorem 2.
Consider , and defined in (4), (6), and (5), with . Assume the scale parameter in the smooth mNRI (6) is chosen so that as , and . Then
.
Theorem 2(a) was demonstrated in Heller (2007). Theorem 2(b) is proved in the appendix.
3. THE NRI FALSE POSITIVE RATE
Empirical research on the utility of the NRI has raised questions as to whether it has an unacceptably high false positive rate, signifying a larger than anticipated value when the new factors have no effect on the binary response (Kerr et al. 2014 and Pepe et al. 2014, 2015). In this section, this issue is investigated, and a valid test procedure is developed, both in the case of a single sample and when independent training and test samples are included.
Pencina et al. (2008) state that under the null , the asymptotic distribution of the estimated NRI in (2) is
| (7) |
where accounting for the multiplication by , the asymptotic variance is estimated as . Further work by Pencina et al. (2011, 2012) modified the asymptotic variance calculation. In a series of simulation experiments, Kerr et al. (2014) and Pepe et al. (2014, 2015) evaluated the adequacy of this result, using a conditional binormal model to produce nested logistic regression models. They found that on average, under the null, the NRI estimate was positive and that the type 1 error rate using the asymptotic normal reference distribution was as high as 0.63 (Pepe et al. 2014). Additional simulations that incorporated independent training and test datasets produced similar conclusions. Taken in total, these results represent a critical indictment against the test procedure in (7). A problem, recognized by these authors, and Demler et al. (2017), is that the asymptotic normal reference distribution is incorrect.
Consider a smooth NRI
| (8) |
where similar to Theorem 2, the extended indicator function is replaced by a continuous normal distribution function . The purpose of this smoothing is to facilitate the derivation of the asymptotic null reference distribution. Interestingly, in contrast to the smoothing result in Section 2, under the null , one can essentially set the scale parameter in the smooth NRI test statistic. This is due to the convergence properties of the maximum likelihood estimates under the null (Pepe et al. 2013), . By setting ,
creating an asymptotically valid test statistic at a faster rate than allowing with increasing .
Theorem 3.
Assume the binary response regression models in (1) are properly specified and the covariate vectors and have dimension and , respectively. If , then
Theorem 3 is derived in the appendix. The first term is the inner product of two positively correlated, q-dimensional, mean zero normal random vectors, and the second term is bilinear, where is a dimensional random vector with quadratic components, and is a dimensional constant vector. This result demonstrates that the null distribution of the NRI is not normal, the distribution is not symmetric about zero, and in general, does not have mean zero, which explains the anomalous findings in Kerr et al. (2014) and Pepe et al. (2014, 2015).
The reference distribution for the NRI test statistic is complex and difficult to apply. In contrast, the mNRI test statistic
has a straightforward null reference distribution.
Theorem 4.
Assume the binary regression models in (1) are properly specified and the covariate vectors and have dimension and , respectively. If , then
where is the standard normal density function evaluated at 0, and is a chi-square random variable with degrees of freedom. A proof of this result is found in the appendix.
Theorems 3 and 4 reorient one’s understanding of what constitutes meaningful NRI and mNRI statistics and Theorem 4 provides an uncomplicated metric to test the mNRI distance from zero. If the new clinical factors () are noise, then small positive values are simply random variation under the null, and only large positive values, as determined by the scaled chi-square reference distribution, are considered meaningful. A precursor to this result is found in Kerr et al. (2011).
Theorem 4 covers the single sample case. Alternatively, the test statistic may be constructed from two independent data sets from the same population, where the regression coefficients are estimated from the training data and the test data , and the data for the test statistic are drawn from the independent test data. Under these conditions, the reference distribution for the smooth mNRI test statistic
is provided in Theorem 5.
Theorem 5.
Assume the binary regression models for the training and test data have the same specification and are given in (1), where the covariate vector has dimension and the covariate vector has dimension . If ,
where is defined in Theorem 4, are independent chi-square random variables each with one degree of freedom, and represent eigenvalues determined from the product matrix (Baldessari 1967), where
and . The details are provided in the appendix.
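A reference distribution of this form, a weighted sum of independent one-degree-of-freedom chi-square variables, has no simple closed form, but a p-value can be approximated by simulation. The sketch below is a generic Monte Carlo routine, not the author's procedure. The function name, the number of draws, and the example weights are assumptions, and the eigenvalues and any scale constant from the theorem must be computed separately and applied before calling it.

```python
import random


def weighted_chisq_pvalue(stat, weights, n_draws=100_000, seed=0):
    """Monte Carlo estimate of Pr(sum_j w_j * chi2_1 >= stat)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_draws):
        # Each squared standard normal draw is a chi-square(1) variable.
        draw = sum(w * rng.gauss(0.0, 1.0) ** 2 for w in weights)
        exceed += draw >= stat
    return exceed / n_draws


# Sanity check: a single unit weight recovers the chi-square(1) tail.
p_single = weighted_chisq_pvalue(3.841, [1.0])
# Hypothetical eigenvalues, for illustration only.
p_mixed = weighted_chisq_pvalue(4.0, [1.5, 0.5])
```

With equal weights the routine reduces to a scaled chi-square tail probability, which corresponds to the form of the single sample reference distribution in Theorem 4.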
4. SIMULATION STUDIES
Simulation studies were performed to assess the adequacy of the asymptotic distribution of the mNRI derived in Theorem 2 and the validity of the reference distributions derived in Theorems 4 and 5. A conditional bivariate normal covariate distribution was used to generate nested logistic risk models. The conditioning variable was the event status with Pr(Y = 1) = {0.25, 0.50, 0.75}. The bivariate normal had a common variance-covariance matrix across event status, with correlation parameters 0 or 0.5. The mean of (X, Z) was 0 for Y = 0. The mean of X for Y = 1 was {0.25, 0.50, 0.75, 1.0}. The mean of Z for Y = 1 was chosen to produce specified true mNRI values. Simulations with 200 and 500 observations per replicate were conducted. Five thousand replicates were run for each simulation.
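One replicate of this design can be sketched as follows. The sketch assumes unit variances for X and Z and a Bernoulli draw for the event status, neither of which is stated explicitly above, and the function name and arguments are illustrative. With a common covariance matrix across event status, the induced model for Y given (X, Z) is logistic, which is what makes the base and expanded models nested logistic regressions.

```python
import math
import random


def simulate_replicate(n, pi0, mu_x, mu_z, rho, seed=None):
    """Draw n triples (y, x, z) with (X, Z) | Y bivariate normal.

    Assumptions: unit variances, correlation rho, mean (0, 0) when
    y = 0 and mean (mu_x, mu_z) when y = 1.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = 1 if rng.random() < pi0 else 0
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        x = e1
        z = rho * e1 + math.sqrt(1.0 - rho ** 2) * e2  # corr(x, z) = rho
        if y == 1:
            x += mu_x
            z += mu_z
        data.append((y, x, z))
    return data


# Null configuration: Z carries no information about Y (mu_z = 0, rho = 0).
replicate = simulate_replicate(n=200, pi0=0.25, mu_x=0.5, mu_z=0.0, rho=0.0, seed=1)
```

Setting mu_z = 0 with rho = 0 gives a null configuration for the size simulations, while nonzero mu_z values produce the alternatives used for the coverage and power simulations.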
Tables 1 and 2 estimate the mNRI, and use Theorem 2 to compute the asymptotic standard error of this estimate along with its coverage based on an asymptotic 95% confidence interval. Tables 3 and 4 compare the size estimates for the mNRI reference distributions in Theorems 4 and 5 with the NRI normal reference distribution in (7). The nominal type 1 error in all simulations was 0.05. Power simulations are summarized in Table 5, comparing the mNRI test from Theorem 4 with the Wald test for regression coefficients associated with the new factors.
TABLE 1.
Simulation results for the modified NRI (mNRI). The true mNRI is equal to 0.05 based on 50,000 replicates.
| n | π0 | μX | mNRI (ρ = 0) | Avg se (ρ = 0) | Sim se (ρ = 0) | Coverage (ρ = 0) | mNRI (ρ = 0.5) | Avg se (ρ = 0.5) | Sim se (ρ = 0.5) | Coverage (ρ = 0.5) |
|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 0.25 | 0.25 | 0.0663 | 0.0908 | 0.0613 | 0.9624 | 0.0649 | 0.1037 | 0.0607 | 0.9654 |
| 200 | 0.25 | 0.50 | 0.0642 | 0.0905 | 0.0600 | 0.9634 | 0.0625 | 0.1053 | 0.0586 | 0.9682 |
| 200 | 0.25 | 0.75 | 0.0608 | 0.0910 | 0.0573 | 0.9608 | 0.0632 | 0.1001 | 0.0582 | 0.9598 |
| 200 | 0.25 | 1.00 | 0.0637 | 0.0824 | 0.0570 | 0.9580 | 0.0653 | 0.0932 | 0.0566 | 0.9654 |
| 200 | 0.50 | 0.25 | 0.0577 | 0.0784 | 0.0531 | 0.9644 | 0.0563 | 0.0881 | 0.0516 | 0.9660 |
| 200 | 0.50 | 0.50 | 0.0559 | 0.0785 | 0.0508 | 0.9636 | 0.0579 | 0.0852 | 0.0524 | 0.9618 |
| 200 | 0.50 | 0.75 | 0.0569 | 0.0735 | 0.0508 | 0.9594 | 0.0600 | 0.0837 | 0.0517 | 0.9628 |
| 200 | 0.50 | 1.00 | 0.0553 | 0.0705 | 0.0487 | 0.9550 | 0.0630 | 0.0792 | 0.0507 | 0.9546 |
| 200 | 0.75 | 0.25 | 0.0643 | 0.0947 | 0.0590 | 0.9660 | 0.0680 | 0.1037 | 0.0601 | 0.9620 |
| 200 | 0.75 | 0.50 | 0.0674 | 0.0918 | 0.0591 | 0.9650 | 0.0640 | 0.1071 | 0.0597 | 0.9594 |
| 200 | 0.75 | 0.75 | 0.0628 | 0.0897 | 0.0585 | 0.9588 | 0.0645 | 0.1113 | 0.0583 | 0.9568 |
| 200 | 0.75 | 1.00 | 0.0627 | 0.0840 | 0.0560 | 0.9640 | 0.0654 | 0.1082 | 0.0576 | 0.9516 |
| 500 | 0.25 | 0.25 | 0.0562 | 0.0545 | 0.0434 | 0.9498 | 0.0518 | 0.0583 | 0.0428 | 0.9484 |
| 500 | 0.25 | 0.50 | 0.0518 | 0.0544 | 0.0422 | 0.9484 | 0.0512 | 0.0577 | 0.0412 | 0.9558 |
| 500 | 0.25 | 0.75 | 0.0499 | 0.0518 | 0.0403 | 0.9506 | 0.0524 | 0.0544 | 0.0407 | 0.9540 |
| 500 | 0.25 | 1.00 | 0.0556 | 0.0490 | 0.0407 | 0.9450 | 0.0548 | 0.0527 | 0.0409 | 0.9460 |
| 500 | 0.50 | 0.25 | 0.0496 | 0.0456 | 0.0383 | 0.9406 | 0.0444 | 0.0509 | 0.0363 | 0.9520 |
| 500 | 0.50 | 0.50 | 0.0446 | 0.0463 | 0.0358 | 0.9468 | 0.0497 | 0.0477 | 0.0371 | 0.9434 |
| 500 | 0.50 | 0.75 | 0.0494 | 0.0433 | 0.0362 | 0.9414 | 0.0523 | 0.0451 | 0.0366 | 0.9462 |
| 500 | 0.50 | 1.00 | 0.0487 | 0.0410 | 0.0353 | 0.9384 | 0.0576 | 0.0421 | 0.0359 | 0.9354 |
| 500 | 0.75 | 0.25 | 0.0520 | 0.0550 | 0.0422 | 0.9516 | 0.0556 | 0.0581 | 0.0430 | 0.9514 |
| 500 | 0.75 | 0.50 | 0.0550 | 0.0533 | 0.0424 | 0.9502 | 0.0520 | 0.0595 | 0.0423 | 0.9478 |
| 500 | 0.75 | 0.75 | 0.0519 | 0.0514 | 0.0414 | 0.9460 | 0.0527 | 0.0574 | 0.0417 | 0.9474 |
| 500 | 0.75 | 1.00 | 0.0535 | 0.0497 | 0.0398 | 0.9548 | 0.0571 | 0.0561 | 0.0405 | 0.9430 |

mNRI = smooth mNRI; Coverage = Simulation coverage from 95% confidence interval
Avg se = average standard error for smooth mNRI
Sim se = Simulation standard error for smooth mNRI
n = Sample size within each simulation; ρ = Correlation between covariates (X, Z); π0 = Pr(Y = 1); μX = Population mean for X when Y = 1
TABLE 2.
Simulation results for the modified NRI (mNRI). The true mNRI is equal to 0.10 based on 50,000 replicates.
| n | π0 | μX | mNRI (ρ = 0) | Avg se (ρ = 0) | Sim se (ρ = 0) | Coverage (ρ = 0) | mNRI (ρ = 0.5) | Avg se (ρ = 0.5) | Sim se (ρ = 0.5) | Coverage (ρ = 0.5) |
|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 0.25 | 0.25 | 0.1053 | 0.0834 | 0.0701 | 0.9404 | 0.1019 | 0.0890 | 0.0695 | 0.9464 |
| 200 | 0.25 | 0.50 | 0.1019 | 0.0823 | 0.0684 | 0.9444 | 0.0957 | 0.0896 | 0.0674 | 0.9462 |
| 200 | 0.25 | 0.75 | 0.0956 | 0.0797 | 0.0661 | 0.9372 | 0.1002 | 0.0859 | 0.0672 | 0.9436 |
| 200 | 0.25 | 1.00 | 0.1021 | 0.0754 | 0.0648 | 0.9394 | 0.1029 | 0.0801 | 0.0647 | 0.9464 |
| 200 | 0.50 | 0.25 | 0.0984 | 0.0698 | 0.0606 | 0.9470 | 0.0932 | 0.0754 | 0.0608 | 0.9370 |
| 200 | 0.50 | 0.50 | 0.0941 | 0.0691 | 0.0599 | 0.9418 | 0.0963 | 0.0722 | 0.0601 | 0.9388 |
| 200 | 0.50 | 0.75 | 0.0963 | 0.0649 | 0.0584 | 0.9388 | 0.1011 | 0.0696 | 0.0587 | 0.9400 |
| 200 | 0.50 | 1.00 | 0.0969 | 0.0614 | 0.0557 | 0.9414 | 0.1060 | 0.0645 | 0.0561 | 0.9356 |
| 200 | 0.75 | 0.25 | 0.1008 | 0.0829 | 0.0687 | 0.9446 | 0.1058 | 0.0881 | 0.0696 | 0.9418 |
| 200 | 0.75 | 0.50 | 0.1059 | 0.0822 | 0.0687 | 0.9440 | 0.1002 | 0.0896 | 0.0687 | 0.9360 |
| 200 | 0.75 | 0.75 | 0.1004 | 0.0775 | 0.0671 | 0.9346 | 0.1014 | 0.0942 | 0.0674 | 0.9390 |
| 200 | 0.75 | 1.00 | 0.1008 | 0.0749 | 0.0643 | 0.9464 | 0.1028 | 0.0887 | 0.0655 | 0.9334 |
| 500 | 0.25 | 0.25 | 0.1041 | 0.0490 | 0.0480 | 0.9442 | 0.0983 | 0.0500 | 0.0473 | 0.9380 |
| 500 | 0.25 | 0.50 | 0.0984 | 0.0483 | 0.0465 | 0.9444 | 0.0928 | 0.0497 | 0.0470 | 0.9324 |
| 500 | 0.25 | 0.75 | 0.0933 | 0.0468 | 0.0458 | 0.9428 | 0.0974 | 0.0476 | 0.0454 | 0.9476 |
| 500 | 0.25 | 1.00 | 0.1014 | 0.0449 | 0.0441 | 0.9458 | 0.1003 | 0.0455 | 0.0440 | 0.9372 |
| 500 | 0.50 | 0.25 | 0.0971 | 0.0423 | 0.0422 | 0.9436 | 0.0904 | 0.0431 | 0.0412 | 0.9358 |
| 500 | 0.50 | 0.50 | 0.0915 | 0.0414 | 0.0404 | 0.9436 | 0.0958 | 0.0419 | 0.0406 | 0.9400 |
| 500 | 0.50 | 0.75 | 0.0962 | 0.0397 | 0.0391 | 0.9492 | 0.0995 | 0.0401 | 0.0394 | 0.9426 |
| 500 | 0.50 | 1.00 | 0.0961 | 0.0379 | 0.0383 | 0.9384 | 0.1056 | 0.0381 | 0.0374 | 0.9386 |
| 500 | 0.75 | 0.25 | 0.0978 | 0.0495 | 0.0470 | 0.9448 | 0.1026 | 0.0501 | 0.0475 | 0.9358 |
| 500 | 0.75 | 0.50 | 0.1028 | 0.0484 | 0.0468 | 0.9460 | 0.0970 | 0.0501 | 0.0469 | 0.9346 |
| 500 | 0.75 | 0.75 | 0.0978 | 0.0469 | 0.0456 | 0.9456 | 0.0985 | 0.0488 | 0.0458 | 0.9356 |
| 500 | 0.75 | 1.00 | 0.0989 | 0.0448 | 0.0437 | 0.9470 | 0.1023 | 0.0471 | 0.0432 | 0.9316 |

mNRI = smooth mNRI; Coverage = Simulation coverage from 95% confidence interval
Avg se = average standard error for smooth mNRI
Sim se = Simulation standard error for smooth mNRI
n = Sample size within each simulation; ρ = Correlation between covariates (X, Z); π0 = Pr(Y = 1); μX = Population mean for X when Y = 1
TABLE 3.
Type 1 error for the NRI and the modified NRI test procedures using a single sample.
| n | π0 | μX | mNRI test (ρ = 0) | NRI test (ρ = 0) | mNRI test (ρ = 0.5) | NRI test (ρ = 0.5) |
|---|---|---|---|---|---|---|
| 200 | 0.25 | 0.25 | 0.0494 | 0.0468 | 0.0496 | 0.0504 |
| 200 | 0.25 | 0.50 | 0.0504 | 0.0578 | 0.0466 | 0.0582 |
| 200 | 0.25 | 0.75 | 0.0484 | 0.0776 | 0.0470 | 0.0724 |
| 200 | 0.25 | 1.00 | 0.0452 | 0.1046 | 0.0494 | 0.1034 |
| 200 | 0.50 | 0.25 | 0.0508 | 0.0574 | 0.0516 | 0.0678 |
| 200 | 0.50 | 0.50 | 0.0488 | 0.0820 | 0.0546 | 0.0804 |
| 200 | 0.50 | 0.75 | 0.0538 | 0.1028 | 0.0500 | 0.1008 |
| 200 | 0.50 | 1.00 | 0.0510 | 0.1242 | 0.0444 | 0.1276 |
| 200 | 0.75 | 0.25 | 0.0454 | 0.0466 | 0.0444 | 0.0466 |
| 200 | 0.75 | 0.50 | 0.0432 | 0.0588 | 0.0462 | 0.0568 |
| 200 | 0.75 | 0.75 | 0.0464 | 0.0756 | 0.0474 | 0.0832 |
| 200 | 0.75 | 1.00 | 0.0426 | 0.1032 | 0.0462 | 0.1036 |
| 500 | 0.25 | 0.25 | 0.0522 | 0.0630 | 0.0456 | 0.0604 |
| 500 | 0.25 | 0.50 | 0.0468 | 0.0926 | 0.0510 | 0.1040 |
| 500 | 0.25 | 0.75 | 0.0496 | 0.1468 | 0.0480 | 0.1392 |
| 500 | 0.25 | 1.00 | 0.0572 | 0.2012 | 0.0502 | 0.1910 |
| 500 | 0.50 | 0.25 | 0.0596 | 0.0726 | 0.0494 | 0.0646 |
| 500 | 0.50 | 0.50 | 0.0494 | 0.1128 | 0.0478 | 0.1116 |
| 500 | 0.50 | 0.75 | 0.0462 | 0.1532 | 0.0482 | 0.1590 |
| 500 | 0.50 | 1.00 | 0.0582 | 0.2152 | 0.0506 | 0.2076 |
| 500 | 0.75 | 0.25 | 0.0470 | 0.0624 | 0.0480 | 0.0650 |
| 500 | 0.75 | 0.50 | 0.0480 | 0.0976 | 0.0504 | 0.0940 |
| 500 | 0.75 | 0.75 | 0.0470 | 0.1488 | 0.0506 | 0.1472 |
| 500 | 0.75 | 1.00 | 0.0474 | 0.1946 | 0.0490 | 0.1864 |

mNRI test = Modified NRI test with Theorem 4 reference distribution
NRI test = NRI test with normal reference distribution
n = Sample size within each simulation
ρ = Correlation between covariates (X, Z); π0 = Pr(Y = 1); μX = Population mean for X when Y = 1
TABLE 4.
Type 1 error for the NRI and the modified NRI test procedures using a training and an independent test sample.
| n | π0 | μX | mNRI test (ρ = 0) | NRI test (ρ = 0) | mNRI test (ρ = 0.5) | NRI test (ρ = 0.5) |
|---|---|---|---|---|---|---|
| 200 | 0.25 | 0.25 | 0.0518 | 0.0492 | 0.0490 | 0.0454 |
| 200 | 0.25 | 0.50 | 0.0500 | 0.0586 | 0.0528 | 0.0590 |
| 200 | 0.25 | 0.75 | 0.0506 | 0.0750 | 0.0498 | 0.0718 |
| 200 | 0.25 | 1.00 | 0.0524 | 0.1128 | 0.0468 | 0.1058 |
| 200 | 0.50 | 0.25 | 0.0500 | 0.0602 | 0.0508 | 0.0556 |
| 200 | 0.50 | 0.50 | 0.0456 | 0.0722 | 0.0534 | 0.0818 |
| 200 | 0.50 | 0.75 | 0.0496 | 0.0966 | 0.0486 | 0.0984 |
| 200 | 0.50 | 1.00 | 0.0568 | 0.1210 | 0.0484 | 0.1288 |
| 200 | 0.75 | 0.25 | 0.0516 | 0.0486 | 0.0532 | 0.0520 |
| 200 | 0.75 | 0.50 | 0.0496 | 0.0582 | 0.0478 | 0.0640 |
| 200 | 0.75 | 0.75 | 0.0492 | 0.0834 | 0.0516 | 0.0834 |
| 200 | 0.75 | 1.00 | 0.0560 | 0.1076 | 0.0484 | 0.1070 |
| 500 | 0.25 | 0.25 | 0.0510 | 0.0620 | 0.0528 | 0.0634 |
| 500 | 0.25 | 0.50 | 0.0528 | 0.1060 | 0.0494 | 0.0932 |
| 500 | 0.25 | 0.75 | 0.0542 | 0.1340 | 0.0532 | 0.1448 |
| 500 | 0.25 | 1.00 | 0.0536 | 0.2012 | 0.0554 | 0.1862 |
| 500 | 0.50 | 0.25 | 0.0594 | 0.0684 | 0.0480 | 0.0654 |
| 500 | 0.50 | 0.50 | 0.0518 | 0.1068 | 0.0560 | 0.1140 |
| 500 | 0.50 | 0.75 | 0.0516 | 0.1510 | 0.0488 | 0.1528 |
| 500 | 0.50 | 1.00 | 0.0524 | 0.1924 | 0.0464 | 0.1922 |
| 500 | 0.75 | 0.25 | 0.0526 | 0.0628 | 0.0564 | 0.0610 |
| 500 | 0.75 | 0.50 | 0.0530 | 0.0980 | 0.0498 | 0.0998 |
| 500 | 0.75 | 0.75 | 0.0514 | 0.1476 | 0.0500 | 0.1302 |
| 500 | 0.75 | 1.00 | 0.0504 | 0.1894 | 0.0486 | 0.1862 |

mNRI test = Modified NRI test with Theorem 5 reference distribution
NRI test = NRI test with normal reference distribution
n = Sample size within each simulation; ρ = Correlation between covariates (X, Z); π0 = Pr(Y = 1); μX = Population mean for X when Y = 1
TABLE 5.
Power of the modified NRI and Wald test procedures.
| n | π0 | μX | μZ (ρ = 0) | mNRI test (ρ = 0) | Wald test (ρ = 0) | μZ (ρ = 0.5) | mNRI test (ρ = 0.5) | Wald test (ρ = 0.5) |
|---|---|---|---|---|---|---|---|---|
| 200 | 0.25 | 0.25 | 0.50 | 0.8448 | 0.8466 | 0.55 | 0.8400 | 0.8422 |
| 200 | 0.25 | 0.50 | 0.50 | 0.8408 | 0.8426 | 0.70 | 0.8592 | 0.8622 |
| 200 | 0.25 | 0.75 | 0.50 | 0.8038 | 0.8066 | 0.85 | 0.8818 | 0.8820 |
| 200 | 0.25 | 1.00 | 0.50 | 0.7678 | 0.7740 | 1.00 | 0.8834 | 0.8854 |
| 200 | 0.50 | 0.25 | 0.50 | 0.9286 | 0.9288 | 0.55 | 0.9198 | 0.9202 |
| 200 | 0.50 | 0.50 | 0.50 | 0.9184 | 0.9190 | 0.65 | 0.8690 | 0.8698 |
| 200 | 0.50 | 0.75 | 0.50 | 0.8990 | 0.8998 | 0.80 | 0.8940 | 0.8944 |
| 200 | 0.50 | 1.00 | 0.50 | 0.8708 | 0.8716 | 0.95 | 0.8978 | 0.9000 |
| 200 | 0.75 | 0.25 | 0.50 | 0.8438 | 0.8488 | 0.55 | 0.8386 | 0.8422 |
| 200 | 0.75 | 0.50 | 0.50 | 0.8358 | 0.8410 | 0.70 | 0.8658 | 0.8692 |
| 200 | 0.75 | 0.75 | 0.50 | 0.8118 | 0.8156 | 0.85 | 0.8790 | 0.8820 |
| 200 | 0.75 | 1.00 | 0.50 | 0.7796 | 0.7838 | 1.00 | 0.8862 | 0.8882 |
| 500 | 0.25 | 0.25 | 0.30 | 0.8190 | 0.8196 | 0.40 | 0.8594 | 0.8598 |
| 500 | 0.25 | 0.50 | 0.30 | 0.7996 | 0.8022 | 0.55 | 0.9060 | 0.9070 |
| 500 | 0.25 | 0.75 | 0.30 | 0.7854 | 0.7856 | 0.70 | 0.9318 | 0.9322 |
| 500 | 0.25 | 1.00 | 0.30 | 0.7506 | 0.7500 | 0.80 | 0.8540 | 0.8542 |
| 500 | 0.50 | 0.25 | 0.30 | 0.9120 | 0.9114 | 0.35 | 0.8114 | 0.8104 |
| 500 | 0.50 | 0.50 | 0.30 | 0.8960 | 0.8958 | 0.50 | 0.8818 | 0.8816 |
| 500 | 0.50 | 0.75 | 0.30 | 0.8848 | 0.8852 | 0.65 | 0.9094 | 0.9092 |
| 500 | 0.50 | 1.00 | 0.30 | 0.8396 | 0.8408 | 0.75 | 0.8218 | 0.8214 |
| 500 | 0.75 | 0.25 | 0.30 | 0.8206 | 0.8224 | 0.40 | 0.8576 | 0.8592 |
| 500 | 0.75 | 0.50 | 0.30 | 0.8078 | 0.8104 | 0.55 | 0.9106 | 0.9108 |
| 500 | 0.75 | 0.75 | 0.30 | 0.7832 | 0.7860 | 0.70 | 0.9312 | 0.9320 |
| 500 | 0.75 | 1.00 | 0.30 | 0.7482 | 0.7514 | 0.80 | 0.8604 | 0.8618 |

mNRI test = Modified NRI test with Theorem 4 reference distribution
Wald test = Test for γ = 0 in the extended model
n = Sample size within each simulation; ρ = Correlation between covariates (X, Z); μX = Population mean for X when Y = 1; μZ = Population mean for Z when Y = 1; π0 = Pr(Y = 1)
The results in Tables 1 and 2 were generated for true mNRI values equal to 0.05 and 0.10, respectively. The true mNRI values were determined from simulations with 50,000 replicates, under the simulation structure detailed above, but using the known regression coefficient parameters. For the estimated smooth mNRI, the bandwidth was set equal to , where is the estimated standard deviation of , and the exponent was chosen to satisfy . When the true mNRI value is 0.05, the estimate has a small bias for n = 200, which improves when the sample size increases to 500. The mNRI estimate is relatively unbiased for all simulations when the true mNRI is 0.10. The same pattern occurs when comparing the average standard error for mNRI to its simulation standard error and the coverage of the asymptotic confidence interval.
For the single sample simulations in Table 3, using Theorem 4, the average type 1 error was 0.048 (n=200) and 0.050 (n=500). In contrast, applying the normal reference distribution in (7) produced average type 1 errors equal to 0.079 (n=200) and 0.129 (n=500). Similar results were found for the independent training-test sample simulations in Table 4. From Theorem 5, the average type 1 error was 0.051 (n=200) and 0.052 (n=500), whereas when using the normal reference distribution it was 0.079 (n=200) and 0.124 (n=500). These simulation results confirm that the modified NRI test statistics, with their associated reference distributions, are valid test procedures, and they confirm the poor operating characteristics of the asymptotic normal reference distribution, with divergence increasing with sample size. Table 5 provides the power calculations and demonstrates that the power of the mNRI test statistic is comparable to the Wald test for regression coefficients associated with the new factors.
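The averages quoted above can be reproduced directly from the tabulated values, and the Monte Carlo error of a single table entry gives a sense of how much scatter around 0.05 is expected. The sketch below uses the n = 200 single sample columns of Table 3 and the stated 5,000 replicates. The variable names are illustrative.

```python
import math

# Table 3, n = 200: type 1 error of the mNRI test (rho = 0, then rho = 0.5).
mnri_n200 = [
    0.0494, 0.0504, 0.0484, 0.0452, 0.0508, 0.0488,
    0.0538, 0.0510, 0.0454, 0.0432, 0.0464, 0.0426,
    0.0496, 0.0466, 0.0470, 0.0494, 0.0516, 0.0546,
    0.0500, 0.0444, 0.0444, 0.0462, 0.0474, 0.0462,
]
# Table 3, n = 200: type 1 error of the NRI test with normal reference.
nri_n200 = [
    0.0468, 0.0578, 0.0776, 0.1046, 0.0574, 0.0820,
    0.1028, 0.1242, 0.0466, 0.0588, 0.0756, 0.1032,
    0.0504, 0.0582, 0.0724, 0.1034, 0.0678, 0.0804,
    0.1008, 0.1276, 0.0466, 0.0568, 0.0832, 0.1036,
]

avg_mnri = sum(mnri_n200) / len(mnri_n200)
avg_nri = sum(nri_n200) / len(nri_n200)

# Binomial standard error of one estimated rejection rate under a true
# size of 0.05 with 5,000 replicates.
mc_se = math.sqrt(0.05 * 0.95 / 5000)
```

The single-entry standard error is about 0.003, so individual mNRI entries between roughly 0.044 and 0.056 are within two standard errors of the nominal level, whereas most NRI entries fall well outside that range.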
5. PROSTATE CANCER DATA
Patients with metastatic prostate cancer are by definition high risk. Nevertheless, there is significant variability in the survival times of these patients (Sayegh, Swami, and Agarwal, 2021). Given this heterogeneity, there is a pressing need to identify new biomarkers that can accurately assess patient risk. Historically, the use of prostate specific antigen (PSA) and other blood-based biomarkers has produced risk models with only moderate calibration and discrimination in the metastatic prostate cancer setting (Gafita et al. 2021). As a result, the search for informative new biomarkers continues, with a recent focus on circulating tumor cells and serum testosterone (Cieslikowski et al. 2021; Ryan et al. 2019).
An application of the net reclassification improvement (NRI), based on the addition of circulating tumor cells and serum testosterone, was undertaken for metastatic prostate cancer patients treated on the control arm of a multicenter phase 3 randomized clinical trial (Saad et al. 2015). The control arm of the randomized trial, patients treated with steroids alone, is useful to assess the added prognostic utility of new biomarkers, because it approximates the natural history of the disease.
Four hundred and eighteen patients with a complete set of biomarkers and sufficient follow-up were used in the analysis. The binary endpoint was survival 24 months after the start of treatment. In this cohort, forty-seven percent of the patients survived longer than two years. In addition to circulating tumor cells and serum testosterone, traditional biomarkers for metastatic prostate cancer were incorporated into the risk model. The complete set of eight biomarkers included in the analysis was: albumin, alkaline phosphatase, circulating tumor cells, Gleason score, hemoglobin, lactate dehydrogenase, prostate specific antigen, and serum testosterone. Nested logistic regression models were fit for the binary 24-month survival endpoint; the expanded model incorporated all eight biomarkers and the base model represented a subset of seven biomarkers. All biomarkers except Gleason score were continuous. To create greater flexibility in the models, a restricted cubic spline with four knots was fit to each continuous biomarker. The knots were located at the {0.05, 0.35, 0.65, 0.95} quantiles of each covariate. Gleason score, an ordinal variable ranging from 2–10 that represents tumor complexity as determined by pathology, was dichotomized as 1–7 and 8–10.
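For readers unfamiliar with restricted cubic splines, the basis for one continuous biomarker can be sketched as below. This uses the standard truncated power construction, which is linear beyond the outer knots. The linear-interpolation quantile rule, the omission of any normalizing constant, and the function names are assumptions made for illustration, and the software used in the analysis may differ on these points.

```python
def quantile(sorted_x, q):
    """Sample quantile by linear interpolation (one of several conventions)."""
    pos = q * (len(sorted_x) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    frac = pos - lo
    return sorted_x[lo] * (1.0 - frac) + sorted_x[hi] * frac


def knots_from_quantiles(values, probs=(0.05, 0.35, 0.65, 0.95)):
    """Knot locations at the stated quantiles of a covariate."""
    ordered = sorted(values)
    return [quantile(ordered, q) for q in probs]


def rcs_basis(x, knots):
    """Restricted cubic spline basis at x: the linear term plus
    len(knots) - 2 nonlinear terms (unnormalized)."""
    t_last, t_prev = knots[-1], knots[-2]
    d = t_last - t_prev

    def cube(v):
        return max(v, 0.0) ** 3

    terms = [x]
    for t_j in knots[:-2]:
        terms.append(
            cube(x - t_j)
            - cube(x - t_prev) * (t_last - t_j) / d
            + cube(x - t_last) * (t_prev - t_j) / d
        )
    return terms


# Hypothetical covariate values, for illustration only.
values = [float(v) for v in range(101)]
knots = knots_from_quantiles(values)
row = rcs_basis(50.0, knots)  # three columns for a four-knot spline
```

With four knots each continuous biomarker contributes three columns to the design matrix, so omitting one continuous biomarker from the expanded model removes three regression coefficients rather than one.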
Table 6 summarizes the results of the NRI, mNRI, and the p-values generated from their respective test procedures described in Section 3. For the logistic models, the mNRI equates to a scaled mean absolute difference (MAD) between the event probabilities estimated from the base and expanded models.
For the prostate data, the observed proportion of events was 0.47, and so the mNRI ≈ 2 × MAD.
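As a small worked check of this relationship, the sketch below computes a mean absolute difference between two sets of fitted probabilities and applies the approximation mNRI ≈ 2 × MAD. The factor of 2 is the approximation stated for this cohort (observed event proportion 0.47), not a general constant; the probability vectors are hypothetical, and only the two mNRI values are taken from Table 6.

```python
def mean_absolute_difference(p_base, p_expanded):
    """Average absolute change in the fitted event probability per individual."""
    if len(p_base) != len(p_expanded):
        raise ValueError("probability vectors must have the same length")
    return sum(abs(b - e) for b, e in zip(p_base, p_expanded)) / len(p_base)


def approx_mnri_from_mad(mad):
    """Approximation quoted for the prostate data only: mNRI is about 2 x MAD."""
    return 2.0 * mad


# Hypothetical fitted probabilities for four individuals (not study data).
p_base = [0.20, 0.55, 0.70, 0.35]
p_expanded = [0.25, 0.50, 0.80, 0.30]
mad = mean_absolute_difference(p_base, p_expanded)
mnri_approx = approx_mnri_from_mad(mad)

# Reversing the approximation for two mNRI values reported in Table 6.
table6_mnri = {"Serum testosterone": 0.044, "Circulating tumor cells": 0.190}
implied_mad = {name: value / 2.0 for name, value in table6_mnri.items()}
```

Halving the tabulated mNRI values gives implied mean absolute differences of 0.022 for serum testosterone and 0.095 for circulating tumor cells.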
TABLE 6.
NRI and modified NRI for the prostate data.
| Omitted factor | NRI | P-value NRI test | mNRI | P-value mNRI test |
|---|---|---|---|---|
| Albumin | 0.116 | 0.236 | 0.018 | 0.920 |
| Alkaline phosphatase | 0.336 | < 0.001 | 0.106 | 0.014 |
| Circulating tumor cells | 0.627 | < 0.001 | 0.190 | < 0.001 |
| Gleason score | 0.086 | 0.381 | 0.034 | 0.849 |
| Hemoglobin | 0.351 | < 0.001 | 0.088 | 0.020 |
| Lactate dehydrogenase | 0.027 | 0.787 | 0.056 | 0.322 |
| Prostate specific antigen | 0.359 | < 0.001 | 0.080 | 0.138 |
| Serum testosterone | 0.195 | 0.046 | 0.044 | 0.490 |
P-value NRI test = P-value generated from the NRI test procedure with a normal reference distribution
P-value mNRI test = P-value generated from the mNRI test procedure with the reference distribution specified in Theorem 4.
With the addition of serum testosterone, the mean absolute difference was only 0.022, and using the smooth mNRI, a test of whether the population NRI differed from zero generated a p-value equal to 0.490. Figure 1 provides corroborating evidence that adding serum testosterone does not meaningfully change the predicted event probabilities. An application of the NRI with a normal reference distribution (7), however, produced a p-value equal to 0.046, which mirrors the high false positive rate for the NRI found in the simulations. When the circulating tumor cell (CTC) biomarker was added to the risk model, the mean absolute difference between the estimated event probabilities was large, equal to 0.095, with an attending p-value less than 0.001. The addition of circulating tumor cells had a marked effect on the predicted probability of death within 24 months. This result is confirmed visually in Figure 2, where the estimated event probabilities change significantly from the base model to the expanded model due to the addition of CTC. Thus, the addition of CTC, but not serum testosterone, would materially change the predicted probabilities of surviving greater than 24 months. Furthermore, among the other single-variable deletions, only alkaline phosphatase and hemoglobin appreciably change the expanded model probabilities.
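The comparison of the two test procedures can be tabulated directly from the p-values in Table 6. The sketch below is illustrative only: it assumes a conventional 0.05 significance level, which the text does not state explicitly, and it records entries reported as "< 0.001" by their upper bound of 0.001.

```python
ALPHA = 0.05  # assumed significance level

# (NRI test p-value, mNRI test p-value) from Table 6; "< 0.001" stored as 0.001.
TABLE6_PVALUES = {
    "Albumin": (0.236, 0.920),
    "Alkaline phosphatase": (0.001, 0.014),
    "Circulating tumor cells": (0.001, 0.001),
    "Gleason score": (0.381, 0.849),
    "Hemoglobin": (0.001, 0.020),
    "Lactate dehydrogenase": (0.787, 0.322),
    "Prostate specific antigen": (0.001, 0.138),
    "Serum testosterone": (0.046, 0.490),
}

# Biomarkers flagged by the mNRI test at the assumed level.
mnri_significant = [name for name, (_, p_mnri) in TABLE6_PVALUES.items()
                    if p_mnri < ALPHA]

# Biomarkers on which the two test procedures reach different conclusions.
discordant = [name for name, (p_nri, p_mnri) in TABLE6_PVALUES.items()
              if (p_nri < ALPHA) != (p_mnri < ALPHA)]

print("Significant under the mNRI test:", mnri_significant)
print("NRI and mNRI tests disagree:", discordant)
```

Under this assumed threshold, the mNRI test flags alkaline phosphatase, circulating tumor cells, and hemoglobin, while the two procedures disagree for prostate specific antigen and serum testosterone, both of which are significant only under the NRI test.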
Figure 1:

Event probabilities for each individual estimated from the base model and the expanded model. The expanded model includes all eight biomarkers and the base model omits the biomarker serum testosterone. The symbols ‘o’ and ‘x’ represent individuals who survived 24 months from the start of treatment and those who did not, respectively.
Figure 2:

Event probabilities for each individual estimated from the base model and the expanded model. The expanded model includes all eight biomarkers, and the base model omits the biomarker circulating tumor cells. The symbols ‘o’ and ‘x’ represent individuals who survived 24 months from the start of treatment and those who did not, respectively.
6. DISCUSSION
The net reclassification improvement (NRI) statistic is a measure of change for a model based risk score due to the addition of new factors. Although the NRI is frequently applied, identified weaknesses of the statistic include that it is not a proper scoring function (or proper change score) and it does not produce a valid test procedure. A modification of this statistic (mNRI) corrects these deficiencies. The mNRI can be interpreted as a measure of association between the directional change in the risk score and the base model score residual. In the special but frequently applied case of logistic regression, an asymptotic analysis demonstrates that the mNRI is proportional to a mean absolute deviation measure, putting the mNRI on an easily interpretable difference in probability scale.
There remain, however, some concerns with the NRI that are not resolved through the mNRI (Kerr et al. 2014). The mNRI does not include risk thresholds for the purpose of intervention strategies, and therefore does not include the costs and benefits of a risk threshold guided intervention. As a result, its application should be directed to the model development stage. On this topic, there has been significant discussion surrounding the utility of the NRI, and even with the modification proposed here, the debate will almost surely continue. The contribution of this work is to put the statistic on a stronger statistical foundation and to clear away some of the arguments that obscure its properties, perhaps shedding more light and less heat on this measure.
Highlights.
A modification of the Net Reclassification Improvement statistic can produce a proper score.
This modified statistic has an interpretation as a mean absolute deviation measure.
The modification can also produce a valid test statistic in contrast to the conventional Net Reclassification Improvement test statistic methodology.
This testing approach can be used with either single samples or independent training-test samples.
ACKNOWLEDGEMENTS.
This work was supported by NIH Grants R01CA207220 and P30CA008748.
The author thanks an associate editor and the reviewers for comments that improved the content of this manuscript.
APPENDIX: PROOF OF THEOREMS
The following conditions and notation will be used in the appendix.
(C1). The set of binary response nested models
specify the relationship between the -dimensional existing factors , the -dimensional new factors , and the binary event outcome . The model with no covariates is the constant model, alone is the base model and is the expanded model. The inverse link function is known and monotonically increasing. Throughout this work, random variables are represented with upper case, their observed copies are written in lower case, and vectors are indicated in bold.
(C2). The log-likelihood used to estimate the regression coefficients is
where are independent identically distributed copies of . For , the expanded model maximum likelihood estimate is denoted by , and the two sets of restricted maximum likelihood estimates are for the base model, and , which is equal to the mean number of events , for the constant model.
(C3). The score vector, observed information matrix, and expected information matrix for are partitioned as
For the likelihood evaluation under the restriction , we use the notation
For all evaluations, the elements of the inverse information matrix are denoted with superscripts. For example, the upper left submatrix of the inverse of is represented as .
(C4). The likelihood parameterization will be utilized, where is the risk score and the corresponding score residual is
which will be useful to rewrite as
It is assumed that the score residual is bounded over .
Theorem 1. The modified NRI (mNRI) is a proper change score
Proof of Theorem 1
For a single random variable, the modified NRI with the base and constant model parameters evaluated at their true value is
Its expected value is equal to
where is a component of the score residual in (C4) evaluated under the base model.
To show is maximized at , and therefore the modified NRI is a proper change score, consider
This expectation is evaluated under two cases:
Case (i):
The first term in square brackets, , is non-negative due to the monotonicity of , and the second term in square brackets, the difference in indicator functions, is either 0 or 1. Therefore, since the weight function is positive, the expectation is non-negative for any .
Case (ii): .
Under this constraint, the first term in square brackets is negative and the second term in square brackets is either 0 or −1. It follows that the expectation is again non-negative for any .
Combining these two cases, is maximized at and therefore, the modified NRI is a proper change score.
Theorem 2.
Let represent a normal distribution function with scale parameter , which is chosen so that as , and . For
Then
Theorem 2a was proved in Lemma A.1 of Heller (2007).
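The smoothing device behind Theorem 2 can be illustrated numerically. The sketch below assumes the smoothed indicator takes the form Φ(u/h), a normal distribution function with scale parameter h; the symbol h, the function names, and the specific bandwidth values are assumptions made for illustration, since the display formulas are not reproduced here.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def smooth_indicator(u, h):
    """Smooth stand-in for the indicator I(u > 0) with scale parameter h > 0."""
    if h <= 0:
        raise ValueError("the scale parameter h must be positive")
    return normal_cdf(u / h)


def hard_indicator(u):
    """The discontinuous indicator that the smooth version approximates."""
    return 1.0 if u > 0 else 0.0


# As h shrinks, the smooth version approaches the indicator away from u = 0.
grid = [-0.5, -0.1, 0.1, 0.5]
for h in (1.0, 0.1, 0.01):
    worst = max(abs(smooth_indicator(u, h) - hard_indicator(u)) for u in grid)
    print(f"h = {h:<5} largest gap on the grid = {worst:.6f}")
```

The slope of the smoothed indicator at zero is the standard normal density at zero divided by h, which is one way to see why a normal density evaluated at 0 enters the Taylor expansions used in the later proofs.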
Proof of Theorem 2b
Consider
From Theorem 2a, under the condition , the first term is . Focusing on the second term, a first order expansion of around , and the asymptotic identity (Cox and Hinkley 1974)
results in
where
Theorem 2b follows from the central limit theorem.
Theorem 3.
Assume the covariate vectors and have dimension and , respectively. If , then the smooth NRI test statistic
The first term is the inner product of two positively correlated, q-dimensional, mean zero normal random vectors, and the second term is bilinear, where is a dimensional random vector with quadratic components, and is a dimensional constant vector.
Proof of Theorem 3:
The smooth NRI test statistic is
To determine its null reference distribution, Pepe et al. (2013) demonstrate that for correctly specified nested models iff and . This allows consideration of a second order Taylor expansion of around
| (A.1) |
where $\phi(0)$ represents the standard normal density function evaluated at 0, and since its derivative evaluated at zero is $\phi'(0) = 0$, each element of the matrix in the quadratic term of the expansion is equal to 0.
To further simplify, note that
| (A.2) |
which follows from a second order Taylor series approximation of the score statistic (C3), around , with
Substituting (A.2) into (A.1),
| (A.3) |
To obtain the result in Theorem 3, consider the elements in (A.3),
The remaining element is
First, under the null
which is rewritten as
| (A.4) |
where , and
The motivation for the weight comes from the Bernoulli loglikelihood (C2, C3)
and the recognition that
a matrix of zeros.
Therefore by projection theory (Tsiatis 2006),
and so the expectation in (A.4) is equal to zero.
It follows from the central limit theorem,
In addition, from (A.5) in the proof of Theorem 4, is strongly positively correlated with .
Theorem 3 now results from assembling the elements in (A.3).
Theorem 4.
Assume the binary regression models in (C1) are properly specified and the covariate vectors and have dimension and , respectively. If , then
where $\phi(0)$ is the standard normal density function evaluated at 0, and $\chi^2_q$ is a chi-square random variable with $q$ degrees of freedom, $q$ being the dimension of the new factors.
Proof of Theorem 4:
The mNRI test statistic is,
where is the score residual defined in (C4).
From Pepe et al. (2013), since and , a second order Taylor expansion of around results in
This approximation may be further simplified through the recognition that is the efficient score statistic for estimating in the presence of and evaluated under the constraint . It follows that (Bickel, Klaassen, Ritov, and Wellner, 1993)
| (A.5) |
and therefore,
That is,
where and is a chi-square random variable with degrees of freedom.
Theorem 5.
Assume the binary regression models for the training and test data have the same specification and are given in (C1), where the covariate vector has dimension and the covariate vector has dimension . Denote the estimated regression coefficients from the training data by , the coefficients from the test data by , and the data are drawn from the test sample. If ,
where is defined in Theorem are independent chi-square random variables each with one degree of freedom, and represent eigenvalues determined from the product matrix , where
and
Proof of Theorem 5:
The test statistic for the NRI derived from training and test data is
Employing the arguments provided in the proof of Theorem 4, the smooth mNRI may be asymptotically approximated by
The test statistic is bilinear, due to the different coefficient estimates from the training and test data. This statistic may be transformed to the quadratic
It follows from Baldessari (1967) that as ,
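Theorem 5 describes a reference distribution built from independent one-degree-of-freedom chi-square random variables weighted by eigenvalues. A tail probability for such a weighted sum can be approximated by simulation, as in the minimal sketch below. The weights passed in are hypothetical placeholders for the eigenvalues, any constant multiplier in the theorem is assumed to be absorbed into the weights, and Monte Carlo is only one of several ways to evaluate this distribution.

```python
import random


def weighted_chisq_tail(weights, observed, draws=20000, seed=0):
    """Monte Carlo estimate of P(sum_j w_j * Z_j^2 >= observed), Z_j iid N(0, 1).

    Each Z_j^2 is a chi-square random variable with one degree of freedom.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    exceed = 0
    for _ in range(draws):
        total = sum(w * rng.gauss(0.0, 1.0) ** 2 for w in weights)
        if total >= observed:
            exceed += 1
    return exceed / draws


# Hypothetical eigenvalues and observed statistic, for illustration only.
example_weights = [0.8, 0.5, 0.1]
p_value = weighted_chisq_tail(example_weights, observed=4.0)
print(f"approximate p-value: {p_value:.3f}")
```

With a single unit weight the function reduces to the tail of a chi-square distribution with one degree of freedom, which gives a quick sanity check against the familiar 5 percent critical value of about 3.84.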
REFERENCES
- Baldessari B (1967), "The Distribution of a Quadratic Form of Normal Random Variables," Annals of Mathematical Statistics, 38, 1700–1704.
- Bickel PJ, Klaassen CAJ, Ritov Y, and Wellner JA (1993), Efficient and Adaptive Estimation for Semiparametric Models. The Johns Hopkins University Press.
- Cieslikowski WA, Antczak A, Nowicki M, Zabel M, Budna-Tukan J (2021), "Clinical Relevance of Circulating Tumor Cells in Prostate Cancer Management," Biomedicines, 9, 1179.
- Cox DR and Hinkley DV (1974), Theoretical Statistics. Chapman and Hall.
- Demler OV, Pencina MJ, Cook NR, and D’Agostino RB Sr (2017), "Asymptotic distribution of ΔAUC, NRIs, and IDI based on theory of U-statistics," Statistics in Medicine, 36, 3334–3360.
- Gafita A, Calais J, Grogan TR, Hadaschik B, Wang H, Weber M, Sandhu S, Kratochwil C, Esfandiari R, Tauber R, Zeldin A, Rathke H, Armstrong WR, Robertson A, Thin P, D’Alessandria C, Rettig MB, Delpassand ES, Haberkorn U, Elashoff D, Herrmann K, Czernin J, Hofman MS, Fendler WP, Eiber M (2021), "Nomograms to predict outcomes after 177Lu-PSMA therapy in men with metastatic castration-resistant prostate cancer: an international, multicentre, retrospective study," Lancet Oncology, 22, 1115–1125.
- Gerds TA and Kattan MW (2021), Medical Risk Prediction Models With Ties to Machine Learning. CRC Press.
- Gneiting T and Raftery AE (2007), "Strictly proper scoring rules, prediction, and estimation," Journal of The American Statistical Association, 102, 359–378.
- Heller G (2007), "Smoothed rank regression with censored data," Journal of The American Statistical Association, 552–559.
- Horowitz JL (1998), Semiparametric Methods in Econometrics. Springer-Verlag.
- Kerr KF, McClelland RL, Brown ER, and Lumley T (2011), "Evaluating the Incremental Value of New Biomarkers With Integrated Discrimination Improvement," American Journal of Epidemiology, 174, 364–374.
- Kerr KF, Wang Z, Janes H, McClelland RL, Psaty BM, and Pepe MS (2014), "Net reclassification indices for evaluating risk-prediction instruments: A critical review," Epidemiology, 25, 114–121.
- Manski CF (1985), "Semiparametric analysis of discrete response: Asymptotic properties of the maximum score estimator," Journal of Econometrics, 27, 313–333.
- McCullagh P and Nelder JA (1983), Generalized Linear Models. Chapman and Hall.
- Pencina MJ, D’Agostino RB Sr, D’Agostino RD Jr, and Vasan R (2008), "Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond," Statistics in Medicine, 27, 157–172.
- Pencina MJ, D’Agostino RB Sr, and Steyerberg EW (2011), "Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers," Statistics in Medicine, 30, 11–21.
- Pencina MJ, D’Agostino RB Sr, and Demler OV (2012), "Novel metrics for evaluating improvement in discrimination: net reclassification and integrated discrimination improvement for normal variables and nested models," Statistics in Medicine, 31, 101–113.
- Pepe MS, Fan J, Feng Z, Gerds T, and Hilden J (2015), "The net reclassification index (NRI): A misleading measure of prediction improvement even with independent test data sets," Statistics in Biosciences, 7, 282–295.
- Pepe MS, Janes H, and Li CI (2014), "Net risk reclassification p values: Valid or misleading?" Journal of the National Cancer Institute, 106, 1–6.
- Pepe MS, Kerr KF, Longton G, and Wang Z (2013), "Testing for improvement in prediction model performance," Statistics in Medicine, 32, 1467–1482.
- Ryan CJ, Dutta S, Kelly WK, Russell C, Small EJ, Morris MJ, Taplin ME, Halabi S (2020), "Androgen Decline and Survival During Docetaxel Therapy in Metastatic Castration Resistant Prostate Cancer (mCRPC)," Prostate Cancer and Prostatic Disease, 23, 66–73.
- Saad F, Fizazi K, Jinga V, Efstathiou E, Fong PC, Hart LL, Jones R, McDermott R, Wirth M, Suzuki K, MacLean DB, Wang L, Akaza H, Nelson J, Scher HI, Dreicer R, Webb IJ, de Wit R, ELM-PC 4 investigators (2015), "Orteronel plus prednisone in patients with chemotherapy naive metastatic castration-resistant prostate cancer (ELM-PC 4): a double-blind, multicentre, phase 3, randomised, placebo-controlled trial," Lancet Oncology, 16, 338–348.
- Sayegh N, Swami U, and Agarwal N (2021), "Recent Advances in the Management of Metastatic Prostate Cancer," JCO Oncology Practice, 18, 45–55.
- Tsiatis AA (2006), Semiparametric Theory and Missing Data. Springer-Verlag.
