Author manuscript; available in PMC: 2024 Dec 1.
Published in final edited form as: J Stat Plan Inference. 2023 Mar 11;227:18–33. doi: 10.1016/j.jspi.2023.03.001

A Modified Net Reclassification Improvement Statistic

Glenn Heller 1
PMCID: PMC10079138  NIHMSID: NIHMS1882642  PMID: 37035267

Abstract

The continuous net reclassification improvement (NRI) statistic is a popular model change measure that was developed to assess the incremental value of new factors in a risk prediction model. Two prominent statistical issues identified in the literature call the utility of this measure into question: (1) it is not a proper scoring function and (2) it has a high false positive rate when testing whether new factors contribute to the risk model. For binary response regression models, these subjects are interrogated and a modification of the continuous NRI, guided by the likelihood-based score residual, is proposed to address these issues. Within a nested model framework, the modified NRI may be viewed as a distance measure between two risk models. An application of the modified NRI is illustrated using prostate cancer data.

Keywords: Binary response model, L1 distance, Nested models, Proper score, Valid test

1. INTRODUCTION

In the clinical setting, individual risk assessment is often derived through a regression model, which incorporates a combination of risk factors due to biological complexity. These risk models are used to forecast future health outcomes of an individual, such as treatment response or survival. The quality of the risk model, evaluated using statistical measures based on calibration, discrimination, explained variation, and the likelihood, reflects the level of confidence in the forecast (Gerds and Kattan 2021). When the objective is to incorporate a new set of factors into an existing risk model, assessing the impact of these new factors on the forecast is critical. For binary response regression, a discrimination measure, the net reclassification improvement (NRI), is one statistic used for this evaluative process. The NRI, also referred to as the net reclassification index, was developed to ascertain whether the introduction of new risk factors moves a model-derived forecast in a direction consonant with the binary response outcome (Pencina et al. 2008).

The NRI statistic has been criticized on numerous grounds with two prevailing points of contention. First, the NRI is not a proper scoring function. A statistic, or more generally an objective function, is called proper if it attains its minimum or maximum when the data generating process is properly specified. The use of a proper score statistic rewards the analyst for correctly identifying the data generation mechanism (Gneiting and Raftery, 2007). The second point of concern is that the NRI has a high false positive rate when testing whether the new factors contribute to the risk model, even in situations that include independent training and test datasets (Kerr et al. 2014 and Pepe et al. 2014, 2015). As a practical matter, measures with high false positive rates lead to the introduction of irrelevant factors into the model development process.

Despite these critiques, the NRI is a popular statistic, and in the three-year time period 2019–2021, it was cited in PubMed over 800 times. The purpose of this work is to elucidate the methodology underlying these two concerns and to propose a likelihood guided modification to the NRI to rectify these issues.

The NRI is defined through a series of nested regression models, where it is assumed that the existing factors alone (x), which includes a constant for the intercept term, or combined with new factors (z) are modeled as

$$\Pr(Y=1) = G(\beta), \qquad \Pr(Y=1 \mid x) = G(\beta_0^T x), \qquad \Pr(Y=1 \mid x, z) = G(\beta_0^T x + \gamma_0^T z) \tag{1}$$

where $Y$ is a binary outcome denoted as event ($Y=1$) or non-event ($Y=0$), $G$ is a known monotone inverse link from a generalized linear model (McCullagh and Nelder, 1983), $\beta_0^T x$ is the base model risk score, $\beta_0^T x + \gamma_0^T z$ is the expanded model risk score, and $\pi_0 = G(\beta)$ is the constant model risk score. Throughout this work, random variables are represented with upper case, their observed copies are written in lower case, and vectors are indicated in bold.

The log-likelihood used to estimate the model parameters is

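$$L(\beta,\gamma) = \sum_i \left\{ y_i \log G(\beta^T x_i + \gamma^T z_i) + (1-y_i)\log\left[1 - G(\beta^T x_i + \gamma^T z_i)\right] \right\},$$

where $(y_i, x_i, z_i)$, $i=1,\ldots,n$, are independent identically distributed copies of $(Y,X,Z)$. The maximum likelihood estimates from the three models are represented as $\hat\theta = (\hat\beta, \hat\gamma)$, $\hat\theta_0 = (\hat\beta_0, 0)$, and $\hat\pi = \bar y$, the observed proportion of events.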

Historically, the NRI was developed under the assumption that the base model risk score could be placed in risk classification categories. It was a measure of whether the expanded model risk score, due to the addition of new factors, would move into higher risk categories for subjects with an event and into lower risk categories for subjects without an event. This framework, however, requires a priori clinically meaningful risk categories, which are often not apparent at the time of analysis, particularly in the early stage of model development. As a result, the continuous NRI was developed (Pencina et al. 2011) and it is this measure that is the focus of this work.

The population NRI is defined as

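$$\rho(\theta_0;\theta_0;\pi_0) = 2\left\{\Pr(\beta_0^T X + \gamma_0^T Z \ge \beta_0^T X \mid Y=1) - \Pr(\beta_0^T X + \gamma_0^T Z \ge \beta_0^T X \mid Y=0)\right\}$$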

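where the first argument, $\theta_0 = (\beta_0, \gamma_0)$, indexes the expanded model and the second argument, $\theta_0 = (\beta_0, 0)$, indexes the base model. When multiplied by 1/2, the population NRI is estimated as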

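$$R_n(\hat\theta;\hat\theta_0;\hat\pi) = [n\bar y(1-\bar y)]^{-1}\sum_i (y_i - \bar y)\left[I(\hat\beta^T x_i + \hat\gamma^T z_i - \hat\beta_0^T x_i > 0) - \tfrac12\right]. \tag{2}$$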

Assuming at least one component of x is continuous, it can be asserted without loss of generality, that the indicator function can be extended as

$$I(u>0) = \begin{cases} 1 & \text{if } u>0 \\ \tfrac12 & \text{if } u=0 \\ 0 & \text{if } u<0. \end{cases} \tag{3}$$
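The estimate in (2), with the extended indicator in (3), can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names `extended_indicator` and `half_nri` are hypothetical, and the risk scores are assumed to have been computed already from fitted base and expanded models.

```python
def extended_indicator(u):
    """Extended indicator from (3): 1 if u > 0, 1/2 if u == 0, 0 if u < 0."""
    if u > 0:
        return 1.0
    if u == 0:
        return 0.5
    return 0.0


def half_nri(y, base_score, expanded_score):
    """Estimate (2): [n*ybar*(1-ybar)]^{-1} * sum (y_i - ybar) * [I(d_i > 0) - 1/2],
    where d_i is the expanded minus the base risk score for subject i."""
    n = len(y)
    ybar = sum(y) / n
    total = 0.0
    for yi, b, e in zip(y, base_score, expanded_score):
        total += (yi - ybar) * (extended_indicator(e - b) - 0.5)
    return total / (n * ybar * (1.0 - ybar))


# Toy data: every event moves up and every non-event moves down.
print(half_nri([1, 1, 0, 0], [0, 0, 0, 0], [1, 1, -1, -1]))  # 1.0
```

In this toy case every subject is reclassified in the direction of the outcome, so the statistic takes the value 1; when the two scores coincide for every subject, each indicator equals 1/2 and the statistic is 0.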

Although the net reclassification improvement statistic is a frequently applied model change measure, its lack of propriety and high false positive rate are problematic. In Section 2, a modified NRI (mNRI) is developed that satisfies the concept of a proper change score, which adapts the proper scoring principle to model change measures (Pepe et al. 2015). Section 3 demonstrates that a smooth version of the mNRI provides a valid test procedure when the population NRI is zero. This result is established in the single sample and the independent training and test data case. In Section 4, a prostate cancer data example is used to illustrate these concepts and Section 5 contains a discussion.

2. THE mNRI IS A PROPER CHANGE SCORE

For a correctly specified parametric risk model, a performance measure is a proper score if its expected value is minimized/maximized at the true model parameter value (Gneiting and Raftery, 2007). For example, the expected value of the Brier score applied to the expanded model

$$E\left[Y - G(\beta^T X + \gamma^T Z)\right]^2,$$

is minimized at $(\beta,\gamma) = (\beta_0,\gamma_0)$. If a performance measure is not a proper score, then the analyst may find inconsistent parameter estimates that make the measure look better. Population performance measures such as the expected value of the area under the curve (AUC), the Brier score (BS), and Kullback-Leibler divergence (KL), are maximized/minimized at their true parameter values and therefore are proper scores.
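The propriety of the Brier score can be checked numerically in the simplest possible case. The sketch below is a simplification of the statement above: it assumes a constant forecast `p` for a Bernoulli outcome with event probability `pi0` (both names are illustrative), rather than the full expanded regression model.

```python
def expected_brier(p, pi0):
    """E[(Y - p)^2] when Y ~ Bernoulli(pi0) and the forecast is the constant p."""
    return pi0 * (1.0 - p) ** 2 + (1.0 - pi0) * p ** 2


pi0 = 0.3
grid = [i / 100 for i in range(101)]          # candidate forecasts 0.00, 0.01, ..., 1.00
best_p = min(grid, key=lambda p: expected_brier(p, pi0))
print(best_p)  # 0.3
```

Because the expected score equals pi0(1 - pi0) + (p - pi0)^2, the minimizer is the true event probability, which is the sense in which the score rewards a correctly specified forecast.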

Proper scoring is more difficult to achieve for model change measures. Consider the case where a performance measure M is applied separately to the expanded model and the base model, and the change measure is

$$\Delta M(b,g;b_0) = M(b,g) - M(b_0).$$

If the performance measure (M) is convex,

$$(\beta_0,\gamma_0) = \arg\min_{(b,g)} E[M(b,g)], \qquad \beta_0 = \arg\min_{b_0} E[M(b_0)],$$

but the difference of two convex functions is not necessarily convex, and in general,

$$(\beta_0,\gamma_0,\beta_0) \ne \arg\min_{b,g,b_0} E[\Delta M(b,g;b_0)].$$

To adapt proper scoring to change measures, Pepe et al. (2015) orient the model parameter space so that the base model is evaluated at the true parameter β0. Based on their definition, a measure is a proper change score if its expected value is minimized/maximized at the true expanded model parameter value. In this setting, ΔM is termed a proper change score, since

$$(\beta_0,\gamma_0) = \arg\min_{(b,g)} E[\Delta M(b,g;\beta_0)],$$

recreating the single model evaluation. The term proper change score is used here to acknowledge the adaptation of the proper scoring principle to change measures. Under this definition, $\Delta$AUC, $\Delta$BS, and $\Delta$KL are proper change scores.

The NRI differs from other change measures because it is a statistic based on within subject change and not between model change as above. In addition, the statistic is composed of parameter estimates from three nested models. As a result, it is not covered under the previous argument. To satisfy the proper change score criterion, the NRI is modified

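$$T_n(\hat\theta;\hat\theta_0;\hat\pi) = [n\bar y(1-\bar y)]^{-1}\sum_i r(\hat\beta_0^T x_i)\left[I(\hat\beta^T x_i + \hat\gamma^T z_i - \hat\beta_0^T x_i > 0) - \tfrac12\right], \tag{4}$$

which is constructed by replacing the constant model score residual $y_i - \bar y$ in (2) with the base model score residual $r(\hat\beta_0^T x_i)$, where

$$r(\beta_0^T x_i) = \frac{\partial G(\beta_0^T x_i)}{\partial(\beta_0^T x_i)}\left[G(\beta_0^T x_i)\{1 - G(\beta_0^T x_i)\}\right]^{-1}\left[y_i - G(\beta_0^T x_i)\right].$$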

The modified NRI (mNRI) is closely akin to the maximum score statistic and the least absolute deviation statistic (Manski 1985, Horowitz 1998), which provide the framework for the derivation in Theorem 1.
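A minimal sketch of the mNRI in (4) follows, again using only the standard library. The function names are hypothetical, the logistic inverse link is used as the default only for illustration, and the risk scores are assumed to come from previously fitted models. The weight multiplying the raw residual is the derivative of G divided by G(1 - G), which equals 1 for the logistic link.

```python
import math


def logistic(u):
    """Logistic inverse link G(u)."""
    return 1.0 / (1.0 + math.exp(-u))


def logistic_deriv(u):
    """Derivative of the logistic inverse link, G(u) * (1 - G(u))."""
    g = logistic(u)
    return g * (1.0 - g)


def score_residual(y, u, G=logistic, dG=logistic_deriv):
    """Base model score residual: {G'(u) / [G(u)(1 - G(u))]} * (y - G(u))."""
    g = G(u)
    weight = dG(u) / (g * (1.0 - g))
    return weight * (y - g)


def mnri(y, base_score, expanded_score, G=logistic, dG=logistic_deriv):
    """mNRI (4): the NRI estimate with (y_i - ybar) replaced by the score residual."""
    n = len(y)
    ybar = sum(y) / n
    total = 0.0
    for yi, b, e in zip(y, base_score, expanded_score):
        d = e - b
        indicator = 1.0 if d > 0 else (0.5 if d == 0 else 0.0)
        total += score_residual(yi, b, G, dG) * (indicator - 0.5)
    return total / (n * ybar * (1.0 - ybar))


print(mnri([1, 1, 0, 0], [0, 0, 0, 0], [1, 1, -1, -1]))  # 1.0
```

Passing a different pair `G`, `dG` (for example a probit link and its density) changes only the residual weight; the directional indicator is unaffected.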

Theorem 1.

Consider the mNRI scoring function derived from a single random variable, with the base and constant model parameters given

$$T_1(\theta;\theta_0;\pi_0) = [\pi_0(1-\pi_0)]^{-1}\, r(\beta_0^T X)\left[I(\beta^T X + \gamma^T Z - \beta_0^T X > 0) - \tfrac12\right].$$

The mNRI scoring function is a proper change score,

$$E[T_1(\theta_0;\theta_0;\pi_0)] \ge E[T_1(\theta;\theta_0;\pi_0)] \quad \text{for any } \theta = (\beta,\gamma).$$

The theorem is proved in the appendix.

An interpretation of the mNRI statistic is obtained by rewriting it as

$$T_n(\hat\theta;\hat\theta_0;\hat\pi) = [2\bar y(1-\bar y)]^{-1}\,\frac{[s(\hat\theta;\hat\theta_0)]^T[r(\hat\theta_0)]}{[s(\hat\theta;\hat\theta_0)]^T[s(\hat\theta;\hat\theta_0)]}$$

where $r(\hat\theta_0) = [r(\hat\beta_0^T x_1), \ldots, r(\hat\beta_0^T x_n)]$ is the base model score residual vector and $s(\hat\theta;\hat\theta_0)$ is a sign vector with subject components $s_i(\hat\theta;\hat\theta_0) = 2I(\hat\beta^T x_i + \hat\gamma^T z_i - \hat\beta_0^T x_i > 0) - 1$. The mNRI is a function of the propensity of the event outcome ($\bar y$) and a regression coefficient representing the association between the direction of the risk score due to adding $z$ and the event outcome after taking into account $x$. This perspective is analogous to a partial residual plot, where a model covariate of interest $z$ is replaced by a between model directional covariate $s(\hat\theta;\hat\theta_0)$.
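The equivalence between the indicator form in (4) and the regression-coefficient form can be verified numerically. The sketch below assumes no ties in the score differences (so every sign component is +1 or -1 and the sign vector has squared length n); the residuals and differences are made-up numbers, not output from a fitted model, and `mnri_two_ways` is a hypothetical helper.

```python
def mnri_two_ways(y, resid, diff):
    """Return the mNRI computed (i) from the indicator form and (ii) as a
    no-intercept regression slope of the residuals on the sign vector."""
    n = len(y)
    ybar = sum(y) / n

    # (i) indicator form: [n*ybar*(1-ybar)]^{-1} * sum r_i * [I(d_i > 0) - 1/2]
    direct = sum(r * ((1.0 if d > 0 else 0.0) - 0.5) for r, d in zip(resid, diff))
    direct /= n * ybar * (1.0 - ybar)

    # (ii) regression form: [2*ybar*(1-ybar)]^{-1} * (s'r) / (s's)
    s = [2.0 * (1.0 if d > 0 else 0.0) - 1.0 for d in diff]
    slope = sum(si * ri for si, ri in zip(s, resid)) / sum(si * si for si in s)
    regression = slope / (2.0 * ybar * (1.0 - ybar))
    return direct, regression


y = [1, 0, 1, 0, 1]
resid = [0.4, -0.3, 0.2, -0.2, 0.1]   # illustrative base model score residuals
diff = [0.5, -0.2, -0.1, 0.3, 0.7]    # illustrative expanded-minus-base score differences
direct, regression = mnri_two_ways(y, resid, diff)
print(direct, regression)
```

The two numbers agree up to floating-point error because I(d > 0) - 1/2 equals s/2 when d is nonzero.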

An alternative interpretation of the mNRI may be considered from the viewpoint of its limiting value

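$$\tau(\theta_0;\theta_0;\pi_0) = [2\pi_0(1-\pi_0)]^{-1}\, E_{X,Z}\left\{h(\beta_0^T X)\left|G(\beta_0^T X + \gamma_0^T Z) - G(\beta_0^T X)\right|\right\} \tag{5}$$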

where the weight h(β0TX) stems from the base model score residual,

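$$h(\beta_0^T X) = \frac{\partial G(\beta_0^T X)}{\partial(\beta_0^T X)}\left[G(\beta_0^T X)\{1-G(\beta_0^T X)\}\right]^{-1}, \qquad r(\beta_0^T X) = h(\beta_0^T X)\left[Y - G(\beta_0^T X)\right].$$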

Thus, the population mNRI is a weighted L1 distance measure between the nested event probabilities. An important special case occurs when G is logistic and

$$\tau(\theta_0;\theta_0;\pi_0) = [2\pi_0(1-\pi_0)]^{-1}\, E_{X,Z}\left|G(\beta_0^T X + \gamma_0^T Z) - G(\beta_0^T X)\right|,$$

which results in an unweighted L1 distance measure. Here, the population mNRI is proportional to the mean absolute deviation (MAD) of the nested event probabilities. In addition to using the MAD as a summary measure, this result suggests that graphical insight into the mNRI may be obtained by plotting the base model event probability estimates against the expanded model event probability estimates.
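The step from the score residual to an L1 distance can be sketched as follows. This is an informal outline rather than the appendix proof, and it assumes that $G$ is increasing, that the weight $h$ is positive, and that the expanded model in (1) is correctly specified. The shorthand $G_e$ and $G_b$ is introduced only for this sketch.

```latex
\begin{align*}
G_e &= G(\beta_0^T X + \gamma_0^T Z), \qquad G_b = G(\beta_0^T X), \\
E\{r(\beta_0^T X) \mid X, Z\} &= h(\beta_0^T X)\,(G_e - G_b)
  && \text{since } E(Y \mid X, Z) = G_e, \\
I(\beta_0^T X + \gamma_0^T Z - \beta_0^T X > 0) &= I(G_e > G_b)
  && \text{since } G \text{ is increasing}, \\
(G_e - G_b)\left[I(G_e > G_b) - \tfrac12\right] &= \tfrac12\,|G_e - G_b|, \\
E\{T_1(\theta_0;\theta_0;\pi_0)\} &= [2\pi_0(1-\pi_0)]^{-1}\,
  E_{X,Z}\{h(\beta_0^T X)\,|G_e - G_b|\}.
\end{align*}
```

For the logistic link the derivative of $G$ equals $G(1-G)$, so $h \equiv 1$ and the weighted distance reduces to the unweighted one.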

Inference with the modified NRI (4) is complicated due to the presence of the indicator function. To proceed, a smooth version of the mNRI is

$$T^S_{n,h}(\hat\theta;\hat\theta_0;\hat\pi) = [n\bar y(1-\bar y)]^{-1}\sum_i r(\hat\beta_0^T x_i)\left[\Phi\left(\frac{\hat\beta^T x_i + \hat\gamma^T z_i - \hat\beta_0^T x_i}{h}\right) - \tfrac12\right], \tag{6}$$

where $\Phi(u/h)$ is a normal distribution function with scale parameter (bandwidth) $h$ that goes to zero as $n$ gets large. The smooth mNRI is employed in Theorem 2 below. This theorem provides the inferential framework through a two-step process. First, it demonstrates the asymptotic equivalence between the mNRI and its smooth counterpart. Second, via this asymptotic equivalence, the asymptotic distribution of the mNRI is derived.
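The smoothing in (6) replaces the indicator with a normal distribution function evaluated at the scaled score difference. A minimal sketch, assuming the residuals and score differences are already available (the numbers below are made up, and `smooth_mnri` is a hypothetical name):

```python
import math


def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def smooth_mnri(y, resid, diff, h):
    """Smooth mNRI (6): the indicator I(d_i > 0) is replaced by Phi(d_i / h)."""
    n = len(y)
    ybar = sum(y) / n
    total = sum(r * (normal_cdf(d / h) - 0.5) for r, d in zip(resid, diff))
    return total / (n * ybar * (1.0 - ybar))


y = [1, 0, 1, 0, 1]
resid = [0.4, -0.3, 0.2, -0.2, 0.1]   # illustrative base model score residuals
diff = [0.5, -0.2, -0.1, 0.3, 0.7]    # illustrative expanded-minus-base score differences

for h in (1.0, 0.1, 1e-6):
    print(h, smooth_mnri(y, resid, diff, h))
```

As the bandwidth shrinks, the smooth statistic approaches the indicator-based value in (4); a very large bandwidth pulls every term toward zero.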

Theorem 2.

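Consider $T_n$, $T^S_{n,h}$, and $\tau$ defined in (4), (6), and (5), with $\tau = 0$. Assume the scale parameter $h$ in the smooth mNRI (6) is chosen so that as $n \to \infty$, $nh \to \infty$ and $nh^4 \to 0$. Then

  (a) $T_n(\hat\theta;\hat\theta_0;\hat\pi) = T^S_{n,h}(\hat\theta;\hat\theta_0;\hat\pi) + O_p(h^2)$

  (b) $n^{1/2}\left[T_n(\hat\theta;\hat\theta_0;\hat\pi) - \tau(\theta_0;\theta_0;\pi_0)\right] \xrightarrow{D} N(0,v)$.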

Theorem 2(a) was demonstrated in Heller (2007). Theorem 2(b) is proved in the appendix.

3. THE NRI FALSE POSITIVE RATE

Empirical research on the utility of the NRI has raised questions as to whether it has an unacceptably high false positive rate, signifying a larger than anticipated value when the new factors have no effect on the binary response (Kerr et al. 2014 and Pepe et al. 2014, 2015). In this section, this issue is investigated, and a valid test procedure is developed, both in the case of a single sample and when independent training and test samples are included.

Pencina et al. (2008) state that under the null $\rho(\theta_0;\theta_0;\pi_0)=0$, the asymptotic distribution of the estimated NRI in (2) is

$$n^{1/2} R_n(\hat\theta;\hat\theta_0;\hat\pi) \xrightarrow{D} N[0,A] \tag{7}$$

where accounting for the multiplication by 1/2, the asymptotic variance is estimated as $\hat A = [(4n_1)^{-1} + (4n_0)^{-1}]$. Further work by Pencina et al. (2011, 2012) modified the asymptotic variance calculation. In a series of simulation experiments, Kerr et al. (2014) and Pepe et al. (2014, 2015) evaluated the adequacy of this result, using a conditional binormal model to produce nested logistic regression models. They found that on average, under the null, the NRI estimate was positive and that the type 1 error rate using the asymptotic normal reference distribution was as high as 0.63 (Pepe et al. 2014). Additional simulations that incorporated independent training and test datasets produced similar conclusions. Taken in total, these results represent a critical indictment against the test procedure in (7). A problem, recognized by these authors, and Demler et al. (2017), is that the asymptotic normal reference distribution is incorrect.

Consider a smooth NRI

$$R_n^S(\hat\theta;\hat\theta_0;\hat\pi) = [n\bar y(1-\bar y)]^{-1}\sum_i (y_i - \bar y)\left[\Phi(\hat\beta^T x_i + \hat\gamma^T z_i - \hat\beta_0^T x_i) - \tfrac12\right], \tag{8}$$

where similar to Theorem 2, the extended indicator function is replaced by a continuous normal distribution function $\Phi(u)$. The purpose of this smoothing is to facilitate the derivation of the asymptotic null reference distribution. Interestingly, in contrast to the smoothing result in Section 2, under the null $\rho(\theta_0;\theta_0;\pi_0)=0$, one can essentially set the scale parameter $h=1$ in the smooth NRI test statistic. This is due to the convergence properties of the maximum likelihood estimates under the null (Pepe et al. 2013), $\hat\gamma \xrightarrow{p} 0$ and $\hat\beta - \hat\beta_0 \xrightarrow{p} 0$. By setting $h=1$,

$$\Phi\left(\frac{\hat\beta^T x + \hat\gamma^T z - \hat\beta_0^T x}{h}\right) \xrightarrow{p} \tfrac12,$$

creating an asymptotically valid test statistic at a faster rate than allowing $h \to 0$ with increasing $n$.

Theorem 3.

Assume the binary response regression models in (1) are properly specified and the covariate vectors $x$ and $z$ have dimension $p$ and $q$, respectively. If $\rho(\theta_0;\theta_0;\pi_0)=0$, then

nRnS(θˆ;θˆ0;πˆ)DNq0,V1TNq0,V2+12DpTββ1cp.

Theorem 3 is derived in the appendix. The first term is the inner product of two positively correlated, q-dimensional, mean zero normal random vectors, and the second term is bilinear, where Dp is a p dimensional random vector with quadratic components, and cp is a p dimensional constant vector. This result demonstrates that the null distribution of the NRI is not normal, the distribution is not symmetric about zero, and in general, does not have mean zero, which explains the anomalous findings in Kerr et al. (2014) and Pepe et al. (2014, 2015).

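The reference distribution for the NRI test statistic $R_n^S(\hat\theta;\hat\theta_0;\hat\pi)$ is complex and difficult to apply. In contrast, the mNRI test statistic

$$T_n^S(\hat\theta;\hat\theta_0;\hat\pi) = [n\bar y(1-\bar y)]^{-1}\sum_i r(\hat\beta_0^T x_i)\left[\Phi(\hat\beta^T x_i + \hat\gamma^T z_i - \hat\beta_0^T x_i) - \tfrac12\right]$$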

has a straightforward null reference distribution.

Theorem 4.

Assume the binary regression models in (1) are properly specified and the covariate vectors $x$ and $z$ have dimension $p$ and $q$, respectively. If $\rho(\theta_0;\theta_0;\pi_0)=0$, then

$$n\,T_n^S(\hat\theta;\hat\theta_0;\hat\pi) \xrightarrow{D} k\,\chi^2_q,$$

where $k = \phi(0)[\pi_0(1-\pi_0)]^{-1}$, $\phi(0)$ is the standard normal density function evaluated at 0, and $\chi^2_q$ is a chi-square random variable with $q$ degrees of freedom. A proof of this result is found in the appendix.

Theorems 3 and 4 reorient one’s understanding of what constitutes meaningful NRI and mNRI statistics and Theorem 4 provides an uncomplicated metric to test the mNRI distance from zero. If the new clinical factors (z) are noise, then small positive values are simply random variation under the null, and only large positive values, as determined by the scaled chi-square reference distribution, are considered meaningful. A precursor to this result is found in Kerr et al. (2011).
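The scaled chi-square reference distribution in Theorem 4 translates into a simple p-value calculation. The sketch below uses only the standard library: `chi2_sf` is a hand-written upper-tail function for integer degrees of freedom (finite-sum formulas for the incomplete gamma function), and `mnri_pvalue` is a hypothetical helper. Plugging the observed event proportion in for $\pi_0$ when computing $k$ is an assumption of this sketch, and `t_smooth` is meant to be the smooth test statistic with $h=1$, not the bandwidth-based estimate in (6).

```python
import math


def chi2_sf(x, q):
    """Upper tail P(chi2_q > x) for integer q >= 1, using closed-form finite sums."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if q % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{q/2-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, q // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(q-1)/2} (x/2)^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(half))
    for j in range(1, (q - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def mnri_pvalue(t_smooth, n, ybar, q):
    """P(k * chi2_q > n * t_smooth), with k = phi(0) / [ybar * (1 - ybar)]."""
    phi0 = 1.0 / math.sqrt(2.0 * math.pi)      # standard normal density at 0
    k = phi0 / (ybar * (1.0 - ybar))
    return chi2_sf(n * t_smooth / k, q)


# Sanity checks of the tail function against familiar 5% critical values.
print(chi2_sf(3.841458820694124, 1))
print(chi2_sf(5.991464547107979, 2))
```

Only large positive values of the statistic lead to small p-values; a zero or negative statistic returns a p-value of 1 under this calculation.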

Theorem 4 covers the single sample case. Alternatively, the test statistic may be constructed from two independent data sets from the same population, where the regression coefficients are estimated from the training data $(\hat\theta,\hat\theta_0)$ and the test data $(\tilde\theta,\tilde\theta_0)$, and the data for the test statistic $(y_i,x_i,z_i)$ are drawn from the independent test data. Under these conditions, the reference distribution for the smooth mNRI test statistic

$$T_n^S(\hat\theta;\hat\theta_0,\tilde\theta_0;\tilde\pi) = [n\bar y(1-\bar y)]^{-1}\sum_i r(\tilde\beta_0^T x_i)\left[\Phi(\hat\beta^T x_i + \hat\gamma^T z_i - \hat\beta_0^T x_i) - \tfrac12\right]$$

is provided in Theorem 5.

Theorem 5.

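Assume the binary regression models for the training and test data have the same specification and are given in (1), where the covariate vector $x$ has dimension $p$ and the covariate vector $z$ has dimension $q$. If $\rho(\theta_0;\theta_0;\pi_0)=0$,

$$n\,T_n^S(\hat\theta;\hat\theta_0,\tilde\theta_0;\tilde\pi) \xrightarrow{D} \frac{k}{2}\sum_{j=1}^{2q}\lambda_j\chi_j^2,$$

where $k$ is defined in Theorem 4, $\chi_j^2$ are independent chi-square random variables each with one degree of freedom, and $\lambda_j$ represent eigenvalues determined from the product matrix $VC$ (Baldessari 1967), where

$$V = \begin{pmatrix} \mathrm{var}(\tilde\gamma) & 0 \\ 0 & \mathrm{var}(\hat\gamma) \end{pmatrix}, \qquad C = \begin{pmatrix} 0 & D \\ D & 0 \end{pmatrix},$$

and $D = [\mathrm{var}(\tilde\gamma)]^{-1}$. The details are provided in the appendix.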

4. SIMULATION STUDIES

Simulation studies were performed to assess the adequacy of the asymptotic distribution of the mNRI derived in Theorem 2 and the validity of the reference distributions derived in Theorems 4 and 5. A conditional bivariate normal covariate distribution was used to generate nested logistic risk models. The conditioning variable was the event status with Pr(Y = 1) = {0.25, 0.50, 0.75}. The bivariate normal had a common variance-covariance matrix across event status, with correlation parameters 0 or 0.5. The mean of (X, Z) was 0 for Y = 0. The mean of X for Y = 1 was {0.25, 0.50, 0.75, 1.0}. The mean of Z for Y = 1 was chosen to produce specified true mNRI values. Simulations with 200 and 500 observations per replicate were conducted. Five thousand replicates were run for each simulation.
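One replicate of the simulation design described above might be generated as in the sketch below. The text specifies a common variance-covariance matrix but not the variances, so unit variances are assumed here; the function names and the seed handling are illustrative only.

```python
import math
import random


def simulate(n, pi0, mu_x, mu_z, rho, seed=0):
    """Draw n triples (y, x, z): Y ~ Bernoulli(pi0) and (X, Z) | Y bivariate normal
    with unit variances (an assumption), correlation rho, mean (0, 0) when Y = 0
    and mean (mu_x, mu_z) when Y = 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = 1 if rng.random() < pi0 else 0
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = y * mu_x + e1
        z = y * mu_z + rho * e1 + math.sqrt(1.0 - rho ** 2) * e2
        data.append((y, x, z))
    return data


def z_coefficient(mu_x, mu_z, rho):
    """Logistic coefficient of Z implied by this design with unit variances:
    the second component of Sigma^{-1} (mu_x, mu_z)."""
    return (mu_z - rho * mu_x) / (1.0 - rho ** 2)


sample = simulate(n=200, pi0=0.25, mu_x=0.5, mu_z=0.25, rho=0.5, seed=1)
print(len(sample), z_coefficient(0.5, 0.25, 0.5))
```

With a common covariance matrix, the conditional normal design implies a logistic model for the event probability; under the unit-variance assumption the new factor drops out of the expanded model when the mean of Z for events equals rho times the mean of X for events, which is the null configuration in this sketch.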

Tables 1 and 2 estimate the mNRI, and use Theorem 2 to compute the asymptotic standard error of this estimate along with its coverage based on an asymptotic 95% confidence interval. Tables 3 and 4 compare the size estimates for the mNRI reference distributions in Theorems 4 and 5 with the NRI normal reference distribution in (7). The nominal type 1 error in all simulations was 0.05. Power simulations are summarized in Table 5, comparing the mNRI test from Theorem 4 with the Wald test for regression coefficients associated with the new factors.

TABLE 1.

Simulation results for the modified NRI (mNRI). The true mNRI is equal to 0.05 based on 50,000 replicates.

n  π0  μX | ρ = 0: Tn,hS  se(Tn,hS)  Sim se  Coverage | ρ = 0.5: Tn,hS  se(Tn,hS)  Sim se  Coverage

200 0.25 0.25 0.0663 0.0908 0.0613 0.9624 0.0649 0.1037 0.0607 0.9654
200 0.25 0.50 0.0642 0.0905 0.0600 0.9634 0.0625 0.1053 0.0586 0.9682
200 0.25 0.75 0.0608 0.0910 0.0573 0.9608 0.0632 0.1001 0.0582 0.9598
200 0.25 1.00 0.0637 0.0824 0.0570 0.9580 0.0653 0.0932 0.0566 0.9654

200 0.50 0.25 0.0577 0.0784 0.0531 0.9644 0.0563 0.0881 0.0516 0.9660
200 0.50 0.50 0.0559 0.0785 0.0508 0.9636 0.0579 0.0852 0.0524 0.9618
200 0.50 0.75 0.0569 0.0735 0.0508 0.9594 0.0600 0.0837 0.0517 0.9628
200 0.50 1.00 0.0553 0.0705 0.0487 0.9550 0.0630 0.0792 0.0507 0.9546

200 0.75 0.25 0.0643 0.0947 0.0590 0.9660 0.0680 0.1037 0.0601 0.9620
200 0.75 0.50 0.0674 0.0918 0.0591 0.9650 0.0640 0.1071 0.0597 0.9594
200 0.75 0.75 0.0628 0.0897 0.0585 0.9588 0.0645 0.1113 0.0583 0.9568
200 0.75 1.00 0.0627 0.0840 0.0560 0.9640 0.0654 0.1082 0.0576 0.9516

500 0.25 0.25 0.0562 0.0545 0.0434 0.9498 0.0518 0.0583 0.0428 0.9484
500 0.25 0.50 0.0518 0.0544 0.0422 0.9484 0.0512 0.0577 0.0412 0.9558
500 0.25 0.75 0.0499 0.0518 0.0403 0.9506 0.0524 0.0544 0.0407 0.9540
500 0.25 1.00 0.0556 0.0490 0.0407 0.9450 0.0548 0.0527 0.0409 0.9460

500 0.50 0.25 0.0496 0.0456 0.0383 0.9406 0.0444 0.0509 0.0363 0.9520
500 0.50 0.50 0.0446 0.0463 0.0358 0.9468 0.0497 0.0477 0.0371 0.9434
500 0.50 0.75 0.0494 0.0433 0.0362 0.9414 0.0523 0.0451 0.0366 0.9462
500 0.50 1.00 0.0487 0.0410 0.0353 0.9384 0.0576 0.0421 0.0359 0.9354

500 0.75 0.25 0.0520 0.0550 0.0422 0.9516 0.0556 0.0581 0.0430 0.9514
500 0.75 0.50 0.0550 0.0533 0.0424 0.9502 0.0520 0.0595 0.0423 0.9478
500 0.75 0.75 0.0519 0.0514 0.0414 0.9460 0.0527 0.0574 0.0417 0.9474
500 0.75 1.00 0.0535 0.0497 0.0398 0.9548 0.0571 0.0561 0.0405 0.9430

Tn,hS = smooth mNRI; Coverage = Simulation coverage from 95% confidence interval

se(Tn,hS) = average standard error for smooth mNRI

Sim se = Simulation standard error for smooth mNRI

n = Sample size within each simulation; ρ = Correlation between covariates (X, Z); π0 = Pr(Y = 1); μX = Population mean for X when Y = 1

TABLE 2.

Simulation results for the modified NRI (mNRI). The true mNRI is equal to 0.10 based on 50,000 replicates.

n  π0  μX | ρ = 0: Tn,hS  se(Tn,hS)  Sim se  Coverage | ρ = 0.5: Tn,hS  se(Tn,hS)  Sim se  Coverage

200 0.25 0.25 0.1053 0.0834 0.0701 0.9404 0.1019 0.0890 0.0695 0.9464
200 0.25 0.50 0.1019 0.0823 0.0684 0.9444 0.0957 0.0896 0.0674 0.9462
200 0.25 0.75 0.0956 0.0797 0.0661 0.9372 0.1002 0.0859 0.0672 0.9436
200 0.25 1.00 0.1021 0.0754 0.0648 0.9394 0.1029 0.0801 0.0647 0.9464

200 0.50 0.25 0.0984 0.0698 0.0606 0.9470 0.0932 0.0754 0.0608 0.9370
200 0.50 0.50 0.0941 0.0691 0.0599 0.9418 0.0963 0.0722 0.0601 0.9388
200 0.50 0.75 0.0963 0.0649 0.0584 0.9388 0.1011 0.0696 0.0587 0.9400
200 0.50 1.00 0.0969 0.0614 0.0557 0.9414 0.1060 0.0645 0.0561 0.9356

200 0.75 0.25 0.1008 0.0829 0.0687 0.9446 0.1058 0.0881 0.0696 0.9418
200 0.75 0.50 0.1059 0.0822 0.0687 0.9440 0.1002 0.0896 0.0687 0.9360
200 0.75 0.75 0.1004 0.0775 0.0671 0.9346 0.1014 0.0942 0.0674 0.9390
200 0.75 1.00 0.1008 0.0749 0.0643 0.9464 0.1028 0.0887 0.0655 0.9334

500 0.25 0.25 0.1041 0.0490 0.0480 0.9442 0.0983 0.0500 0.0473 0.9380
500 0.25 0.50 0.0984 0.0483 0.0465 0.9444 0.0928 0.0497 0.0470 0.9324
500 0.25 0.75 0.0933 0.0468 0.0458 0.9428 0.0974 0.0476 0.0454 0.9476
500 0.25 1.00 0.1014 0.0449 0.0441 0.9458 0.1003 0.0455 0.0440 0.9372

500 0.50 0.25 0.0971 0.0423 0.0422 0.9436 0.0904 0.0431 0.0412 0.9358
500 0.50 0.50 0.0915 0.0414 0.0404 0.9436 0.0958 0.0419 0.0406 0.9400
500 0.50 0.75 0.0962 0.0397 0.0391 0.9492 0.0995 0.0401 0.0394 0.9426
500 0.50 1.00 0.0961 0.0379 0.0383 0.9384 0.1056 0.0381 0.0374 0.9386

500 0.75 0.25 0.0978 0.0495 0.0470 0.9448 0.1026 0.0501 0.0475 0.9358
500 0.75 0.50 0.1028 0.0484 0.0468 0.9460 0.0970 0.0501 0.0469 0.9346
500 0.75 0.75 0.0978 0.0469 0.0456 0.9456 0.0985 0.0488 0.0458 0.9356
500 0.75 1.00 0.0989 0.0448 0.0437 0.9470 0.1023 0.0471 0.0432 0.9316

Tn,hS = smooth mNRI; Coverage = Simulation coverage from 95% confidence interval

se(Tn,hS) = average standard error for smooth mNRI

Sim se = Simulation standard error for smooth mNRI

n = Sample size within each simulation; ρ = Correlation between covariates (X, Z); π0 = Pr(Y = 1); μX = Population mean for X when Y = 1

TABLE 3.

Type 1 error for the NRI and the modified NRI test procedures using a single sample.

n  π0  μX | ρ = 0: mNRI test  NRI test | ρ = 0.5: mNRI test  NRI test

200 0.25 0.25 0.0494 0.0468 0.0496 0.0504
200 0.25 0.50 0.0504 0.0578 0.0466 0.0582
200 0.25 0.75 0.0484 0.0776 0.0470 0.0724
200 0.25 1.00 0.0452 0.1046 0.0494 0.1034

200 0.50 0.25 0.0508 0.0574 0.0516 0.0678
200 0.50 0.50 0.0488 0.0820 0.0546 0.0804
200 0.50 0.75 0.0538 0.1028 0.0500 0.1008
200 0.50 1.00 0.0510 0.1242 0.0444 0.1276

200 0.75 0.25 0.0454 0.0466 0.0444 0.0466
200 0.75 0.50 0.0432 0.0588 0.0462 0.0568
200 0.75 0.75 0.0464 0.0756 0.0474 0.0832
200 0.75 1.00 0.0426 0.1032 0.0462 0.1036

500 0.25 0.25 0.0522 0.0630 0.0456 0.0604
500 0.25 0.50 0.0468 0.0926 0.0510 0.1040
500 0.25 0.75 0.0496 0.1468 0.0480 0.1392
500 0.25 1.00 0.0572 0.2012 0.0502 0.1910

500 0.50 0.25 0.0596 0.0726 0.0494 0.0646
500 0.50 0.50 0.0494 0.1128 0.0478 0.1116
500 0.50 0.75 0.0462 0.1532 0.0482 0.1590
500 0.50 1.00 0.0582 0.2152 0.0506 0.2076

500 0.75 0.25 0.0470 0.0624 0.0480 0.0650
500 0.75 0.50 0.0480 0.0976 0.0504 0.0940
500 0.75 0.75 0.0470 0.1488 0.0506 0.1472
500 0.75 1.00 0.0474 0.1946 0.0490 0.1864

mNRI test = Modified NRI test with Theorem 4 reference distribution

NRI test = NRI test with normal reference distribution

n = Sample size within each simulation

ρ = Correlation between covariates (X, Z); π0 = Pr(Y = 1); μX = Population mean for X when Y = 1

TABLE 4.

Type 1 error for the NRI and the modified NRI test procedures using a training and an independent test sample.

n  π0  μX | ρ = 0: mNRI test  NRI test | ρ = 0.5: mNRI test  NRI test

200 0.25 0.25 0.0518 0.0492 0.0490 0.0454
200 0.25 0.50 0.0500 0.0586 0.0528 0.0590
200 0.25 0.75 0.0506 0.0750 0.0498 0.0718
200 0.25 1.00 0.0524 0.1128 0.0468 0.1058

200 0.50 0.25 0.0500 0.0602 0.0508 0.0556
200 0.50 0.50 0.0456 0.0722 0.0534 0.0818
200 0.50 0.75 0.0496 0.0966 0.0486 0.0984
200 0.50 1.00 0.0568 0.1210 0.0484 0.1288

200 0.75 0.25 0.0516 0.0486 0.0532 0.0520
200 0.75 0.50 0.0496 0.0582 0.0478 0.0640
200 0.75 0.75 0.0492 0.0834 0.0516 0.0834
200 0.75 1.00 0.0560 0.1076 0.0484 0.1070

500 0.25 0.25 0.0510 0.0620 0.0528 0.0634
500 0.25 0.50 0.0528 0.1060 0.0494 0.0932
500 0.25 0.75 0.0542 0.1340 0.0532 0.1448
500 0.25 1.00 0.0536 0.2012 0.0554 0.1862

500 0.50 0.25 0.0594 0.0684 0.0480 0.0654
500 0.50 0.50 0.0518 0.1068 0.0560 0.1140
500 0.50 0.75 0.0516 0.1510 0.0488 0.1528
500 0.50 1.00 0.0524 0.1924 0.0464 0.1922

500 0.75 0.25 0.0526 0.0628 0.0564 0.0610
500 0.75 0.50 0.0530 0.0980 0.0498 0.0998
500 0.75 0.75 0.0514 0.1476 0.0500 0.1302
500 0.75 1.00 0.0504 0.1894 0.0486 0.1862

mNRI test = Modified NRI test with Theorem 5 reference distribution

NRI test = NRI test with normal reference distribution

n = Sample size within each simulation; ρ = Correlation between covariates (X, Z); π0 = Pr(Y = 1); μX = Population mean for X when Y = 1

TABLE 5.

Power of the modified NRI and Wald test procedures.

n  π0  μX | ρ = 0: μZ  mNRI test  Wald test | ρ = 0.5: μZ  mNRI test  Wald test

200 0.25 0.25 0.50 0.8448 0.8466 0.55 0.8400 0.8422
200 0.25 0.50 0.50 0.8408 0.8426 0.70 0.8592 0.8622
200 0.25 0.75 0.50 0.8038 0.8066 0.85 0.8818 0.8820
200 0.25 1.00 0.50 0.7678 0.7740 1.00 0.8834 0.8854

200 0.50 0.25 0.50 0.9286 0.9288 0.55 0.9198 0.9202
200 0.50 0.50 0.50 0.9184 0.9190 0.65 0.8690 0.8698
200 0.50 0.75 0.50 0.8990 0.8998 0.80 0.8940 0.8944
200 0.50 1.00 0.50 0.8708 0.8716 0.95 0.8978 0.9000

200 0.75 0.25 0.50 0.8438 0.8488 0.55 0.8386 0.8422
200 0.75 0.50 0.50 0.8358 0.8410 0.70 0.8658 0.8692
200 0.75 0.75 0.50 0.8118 0.8156 0.85 0.8790 0.8820
200 0.75 1.00 0.50 0.7796 0.7838 1.00 0.8862 0.8882

500 0.25 0.25 0.30 0.8190 0.8196 0.40 0.8594 0.8598
500 0.25 0.50 0.30 0.7996 0.8022 0.55 0.9060 0.9070
500 0.25 0.75 0.30 0.7854 0.7856 0.70 0.9318 0.9322
500 0.25 1.00 0.30 0.7506 0.7500 0.80 0.8540 0.8542

500 0.50 0.25 0.30 0.9120 0.9114 0.35 0.8114 0.8104
500 0.50 0.50 0.30 0.8960 0.8958 0.50 0.8818 0.8816
500 0.50 0.75 0.30 0.8848 0.8852 0.65 0.9094 0.9092
500 0.50 1.00 0.30 0.8396 0.8408 0.75 0.8218 0.8214

500 0.75 0.25 0.30 0.8206 0.8224 0.40 0.8576 0.8592
500 0.75 0.50 0.30 0.8078 0.8104 0.55 0.9106 0.9108
500 0.75 0.75 0.30 0.7832 0.7860 0.70 0.9312 0.9320
500 0.75 1.00 0.30 0.7482 0.7514 0.80 0.8604 0.8618

mNRI test = Modified NRI test with Theorem 4 reference distribution

Wald test = Test for γ = 0 in the extended model

n = Sample size within each simulation; ρ = Correlation between covariates (X, Z); μX = Population mean for X when Y = 1; μZ = Population mean for Z when Y = 1; π0 = Pr(Y = 1)

The results in Tables 1 and 2 were generated for true mNRI values equal to 0.05 and 0.10, respectively. The true mNRI values were determined from simulations with 50,000 replicates, under the simulation structure detailed above, but using the known regression coefficient parameters. For the estimated smooth mNRI, the bandwidth $h$ was set equal to $\hat\sigma_{x,z}\, n^{-1/3}$, where $\hat\sigma_{x,z}$ is the estimated standard deviation of $(\hat\beta - \hat\beta_0)^T x + \hat\gamma^T z$, and the exponent was chosen to satisfy $nh^4 \to 0$. When the true mNRI value is 0.05, the estimate has a small bias for $n=200$, which improves when the sample size increases to 500. The mNRI estimate is relatively unbiased for all simulations when the true mNRI is 0.10. The same pattern occurs when comparing the average standard error for mNRI to its simulation standard error and the coverage of the asymptotic 95% confidence interval.
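The bandwidth rule described above is straightforward to compute. In the sketch below, `bandwidth` is a hypothetical helper, and the use of the sample standard deviation with divisor n - 1 is an assumption, since the text does not say which estimator was used.

```python
import math


def bandwidth(diff):
    """h = sd(diff) * n^(-1/3), where diff holds the fitted score differences
    (beta_hat - beta0_hat)'x_i + gamma_hat'z_i."""
    n = len(diff)
    mean = sum(diff) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diff) / (n - 1))
    return sd * n ** (-1.0 / 3.0)


# Rate conditions for a unit standard deviation: nh grows while nh^4 shrinks.
for n in (200, 500, 5000):
    h = n ** (-1.0 / 3.0)
    print(n, round(h, 4), round(n * h, 2), round(n * h ** 4, 4))
```

With this exponent, $nh$ is proportional to $n^{2/3}$ and $nh^4$ is proportional to $n^{-1/3}$, so both conditions of Theorem 2 hold as the sample size grows.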

For the single sample simulations in Table 3, using Theorem 4, the average type 1 error was 0.048 (n=200) and 0.050 (n=500). In contrast, applying the normal reference distribution in (7) produced average type 1 errors equal to 0.079 (n=200) and 0.129 (n=500). Similar results were found for the independent training-test sample simulations in Table 4. From Theorem 5, the average type 1 error was 0.051 (n=200) and 0.052 (n=500), whereas when using the normal reference distribution it was 0.079 (n=200) and 0.124 (n=500). These simulation results confirm that the modified NRI test statistics, with their associated reference distributions, are valid test procedures, and they confirm the poor operating characteristics of the asymptotic normal reference distribution, with divergence increasing with sample size. Table 5 provides the power calculations and demonstrates that the power of the mNRI test statistic is comparable to the Wald test for regression coefficients associated with the new factors.

5. PROSTATE CANCER DATA

Patients with metastatic prostate cancer are by definition high risk. Nevertheless, there is significant variability in the survival times of these patients (Sayegh, Swami, and Agarwal, 2021). Given this heterogeneity, there is a pressing need to identify new biomarkers that can accurately assess patient risk. Historically, the use of prostate specific antigen (PSA) and other blood-based biomarkers has produced risk models with only moderate calibration and discrimination in the metastatic prostate cancer setting (Gafita et al. 2021). As a result, the search for informative new biomarkers continues, with a recent focus on circulating tumor cells and serum testosterone (Cieslikowski et al. 2021; Ryan et al. 2019).

An application of the net reclassification improvement (NRI), based on the addition of circulating tumor cells and serum testosterone, was undertaken for metastatic prostate cancer patients treated on the control arm of a multicenter phase 3 randomized clinical trial (Saad et al. 2015). The control arm of the randomized trial, patients treated with steroids alone, is useful to assess the added prognostic utility of new biomarkers, because it approximates the natural history of the disease.

Four hundred and eighteen patients with a complete set of biomarkers and sufficient follow-up were used in the analysis. The binary endpoint was survival 24 months after the start of treatment. In this cohort, forty-seven percent of the patients survived longer than two years. In addition to circulating tumor cells and serum testosterone, traditional biomarkers for metastatic prostate cancer were incorporated into the risk model. The complete set of eight biomarkers included in the analysis was: albumin, alkaline phosphatase, circulating tumor cells, Gleason score, hemoglobin, lactate dehydrogenase, prostate specific antigen, and serum testosterone. Nested logistic regression models were fit for the binary 24-month survival endpoint; the expanded model incorporated all eight biomarkers and the base model represented a subset of seven biomarkers. All biomarkers except Gleason score were continuous. To create greater flexibility in the models, a restricted cubic spline with four knots was fit to each continuous biomarker. The knots were located at the {0.05, 0.35, 0.65, 0.95} quantiles of each covariate. Gleason score, an ordinal variable ranging from 2–10 that represents tumor complexity as determined by pathology, was dichotomized as 1–7 and 8–10.

Table 6 summarizes the results of the NRI, mNRI, and the p-values generated from their respective test procedures described in Section 3. For the logistic models, the mNRI equates to a scaled mean absolute difference (MAD) between the estimated event probabilities

$$[2n\bar y(1-\bar y)]^{-1}\sum_i \left|G(\hat\beta^T x_i + \hat\gamma^T z_i) - G(\hat\beta_0^T x_i)\right|.$$

For the prostate data, the observed proportion of events was 0.47, and so the mNRI ≈ 2 × MAD.
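The factor of roughly 2 follows directly from the scaling term with an observed event proportion of 0.47. A short arithmetic check (the helper name `mnri_from_mad` is illustrative):

```python
def mnri_from_mad(mad, ybar):
    """Logistic-model mNRI as a scaled mean absolute difference: MAD / [2 * ybar * (1 - ybar)]."""
    return mad / (2.0 * ybar * (1.0 - ybar))


scale = mnri_from_mad(1.0, 0.47)
print(round(scale, 3))  # 2.007
```

The scale factor is smallest, exactly 2, when the event proportion is 0.5, and grows as the proportion moves toward 0 or 1.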

TABLE 6.

NRI and modified NRI for the prostate data.

| Omitted factor | NRI | P-value NRI test | mNRI | P-value mNRI test |
|---|---|---|---|---|
| Albumin | 0.116 | 0.236 | 0.018 | 0.920 |
| Alkaline phosphatase | 0.336 | < 0.001 | 0.106 | 0.014 |
| Circulating tumor cells | 0.627 | < 0.001 | 0.190 | < 0.001 |
| Gleason score | 0.086 | 0.381 | 0.034 | 0.849 |
| Hemoglobin | 0.351 | < 0.001 | 0.088 | 0.020 |
| Lactate dehydrogenase | 0.027 | 0.787 | 0.056 | 0.322 |
| Prostate specific antigen | 0.359 | < 0.001 | 0.080 | 0.138 |
| Serum testosterone | 0.195 | 0.046 | 0.044 | 0.490 |

P-value NRI test = P-value generated from the NRI test procedure with a normal reference distribution

P-value mNRI test = P-value generated from the mNRI test procedure with the reference distribution specified in Theorem 4.

With the addition of serum testosterone, the mean absolute difference was only 0.022, and using the smooth mNRI, a test of whether the population NRI differed from zero generated a p-value equal to 0.490. Figure 1 provides corroborating evidence that adding serum testosterone does not meaningfully change the predicted event probabilities. An application of the NRI with a normal reference distribution (7), however, produced a p-value equal to 0.046, which mirrors the high false positive rate for the NRI found in the simulations. When the circulating tumor cell (CTC) biomarker was added to the risk model, the mean absolute difference between the estimated event probabilities was large and equal to 0.095, with an attending p-value less than 0.001. The addition of circulating tumor cells had a marked effect on the predicted probability of death within 24 months. This result is confirmed visually in Figure 2, where the estimated event probabilities change significantly from the base model to the expanded model due to the addition of CTC. Thus, the addition of CTC, but not serum testosterone, would consequentially change the predicted probabilities of surviving longer than 24 months. Furthermore, among the other single variable deletions, only the addition of alkaline phosphatase or hemoglobin appreciably changes the expanded model probabilities.

Figure 1:

Event probabilities for each individual estimated from the base model and the expanded model. The expanded model includes all eight biomarkers and the base model omits the biomarker serum testosterone. The symbols ‘o’ and ‘x’ represent individuals who survived 24 months from the start of treatment and those who did not, respectively.

Figure 2:

Event probabilities for each individual estimated from the base model and the expanded model. The expanded model includes all eight biomarkers, and the base model omits the biomarker circulating tumor cells. The symbols ‘o’ and ‘x’ represent individuals who survived 24 months from the start of treatment and those who did not, respectively.

6. DISCUSSION

The net reclassification improvement (NRI) statistic is a measure of change for a model based risk score due to the addition of new factors. Although the NRI is frequently applied, identified weaknesses of the statistic include that it is not a proper scoring function (or proper change score) and it does not produce a valid test procedure. A modification of this statistic (mNRI) corrects these deficiencies. The mNRI can be interpreted as a measure of association between the directional change in the risk score and the base model score residual. In the special but frequently applied case of logistic regression, an asymptotic analysis demonstrates that the mNRI is proportional to a mean absolute deviation measure, putting the mNRI on an easily interpretable difference in probability scale.
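The link between the mNRI and the mean absolute deviation in the logistic case can be sketched in a few lines. The shorthand $\eta_0 = \beta_0^T x$ and $\eta_1 = \beta^T x + \gamma^T z$ is introduced here only for illustration, and the sketch assumes the expanded model is correctly specified, so that $E(Y \mid x, z) = G(\eta_1)$.

```latex
% Logistic link: G'(\eta) = G(\eta)[1 - G(\eta)], so h(\eta) = 1 and
% the score residual is r(\eta_0) = y - G(\eta_0).
\begin{align*}
E\left\{ r(\eta_0)\left[ I(\eta_1 - \eta_0 > 0) - \tfrac{1}{2} \right] \,\middle|\, x, z \right\}
  &= \left[ G(\eta_1) - G(\eta_0) \right]\left[ I(\eta_1 - \eta_0 > 0) - \tfrac{1}{2} \right] \\
  &= \tfrac{1}{2}\left| G(\eta_1) - G(\eta_0) \right| ,
\end{align*}
% because G is increasing, so G(\eta_1) - G(\eta_0) has the same sign as \eta_1 - \eta_0.
% Averaging over subjects and scaling by [\bar{y}(1-\bar{y})]^{-1} gives
\[
\text{mNRI} \approx [2 n \bar{y}(1 - \bar{y})]^{-1} \sum_i \left| G(\eta_{1i}) - G(\eta_{0i}) \right| .
\]
```

This is only a heuristic version of the asymptotic argument; it ignores the estimation error in the regression coefficients.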

There remain, however, some concerns with the NRI that are not resolved through the mNRI (Kerr et al. 2014). The mNRI does not include risk thresholds for the purpose of intervention strategies, and therefore does not include the costs and benefits of a risk threshold guided intervention. As a result, its application should be directed to the model development stage. On this topic, there has been significant discussion surrounding the utility of the NRI, and even with the modification proposed here, the debate will almost surely continue. The contribution of this work is to put the statistic on a stronger statistical foundation and to clear away some of the arguments that obscure its properties, perhaps shedding more light and less heat on this measure.

Highlights.

  • A modification of the Net Reclassification Improvement statistic can produce a proper score.

  • This modified statistic has an interpretation as a mean absolute deviation measure.

  • The modification can also produce a valid test statistic in contrast to the conventional Net Reclassification Improvement test statistic methodology.

  • This testing approach can be used with either single samples or independent training-test samples.

ACKNOWLEDGEMENTS.

This work was supported by NIH Grants R01CA207220 and P30CA008748.

The author thanks an associate editor and the reviewers for comments that improved the content of this manuscript.

APPENDIX: PROOF OF THEOREMS

The following conditions and notation will be used in the appendix.

(C1). The set of binary response nested models

$$\begin{aligned}
\Pr(Y=1) &= G(\beta) \\
\Pr(Y=1 \mid x) &= G(\beta_0^T x) \\
\Pr(Y=1 \mid x, z) &= G(\beta_0^T x + \gamma_0^T z)
\end{aligned}$$

specify the relationship between the $p$-dimensional existing factors $x$, the $q$-dimensional new factors $z$, and the binary event outcome $y$. The model with no covariates is the constant model, $x$ alone is the base model and $(x,z)$ is the expanded model. The inverse link function $G$ is known and monotonically increasing. Throughout this work, random variables are represented with upper case, their observed copies are written in lower case, and vectors are indicated in bold.

(C2). The log-likelihood used to estimate the regression coefficients is

$$L(\beta,\gamma) = \sum_i \left\{ y_i \log G(\beta^T x_i + \gamma^T z_i) + (1 - y_i)\log\left[1 - G(\beta^T x_i + \gamma^T z_i)\right] \right\}$$

where $(y_i, x_i, z_i),\ i = 1, \ldots, n$ are independent identically distributed copies of $(Y, X, Z)$. For $\theta = (\beta,\gamma)$, the expanded model maximum likelihood estimate is denoted by $\hat\theta = (\hat\beta,\hat\gamma)$, and the two sets of restricted maximum likelihood estimates are $\hat\theta_0 = (\hat\beta_0, 0)$ for the base model, and $\hat\pi = G(\hat\beta)$, which is equal to the mean number of events $\bar{y}$, for the constant model.

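(C3). The score vector, observed information matrix, and expected information matrix for $L(\theta)$ are partitioned as

$$\frac{\partial L(\theta)}{\partial\theta} = \begin{pmatrix} U_\beta \\ U_\gamma \end{pmatrix}; \qquad \frac{\partial^2 L(\theta)}{\partial\theta\,\partial\theta^T} = \begin{pmatrix} U_{\beta\beta} & U_{\beta\gamma} \\ U_{\gamma\beta} & U_{\gamma\gamma} \end{pmatrix}; \qquad E\left[-n^{-1} U_{\theta\theta}\right] = \begin{pmatrix} \mathcal{I}_{\beta\beta} & \mathcal{I}_{\beta\gamma} \\ \mathcal{I}_{\gamma\beta} & \mathcal{I}_{\gamma\gamma} \end{pmatrix}.$$

For the likelihood evaluation under the restriction $\gamma = 0$, we use the notation

$$\left.\frac{\partial L(\alpha)}{\partial\alpha}\right|_{\alpha=\beta_0} = U_{\beta_0}; \qquad \left.\frac{\partial^2 L(\alpha)}{\partial\alpha\,\partial\alpha^T}\right|_{\alpha=\beta_0} = U_{\beta_0\beta_0}; \qquad \left. E\left[-n^{-1} U_{\alpha\alpha}\right]\right|_{\alpha=\beta_0} = \mathcal{I}_{\beta_0\beta_0}.$$

For all evaluations, the elements of the inverse information matrix are denoted with superscripts. For example, the upper left $p \times p$ submatrix of the inverse of $E[-n^{-1} U_{\theta\theta}]$ is represented as $\mathcal{I}^{\beta\beta}$.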

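(C4). The likelihood parameterization $L(\eta)$ will be utilized, where $\eta_i = \beta^T x_i + \gamma^T z_i$ is the risk score and the corresponding score residual $r(\eta_i)$ is

$$\frac{\partial L(\eta)}{\partial\eta_i} = \frac{dG(\eta_i)}{d\eta_i}\left\{ G(\eta_i)\left[1 - G(\eta_i)\right] \right\}^{-1}\left[ y_i - G(\eta_i) \right],$$

which will be useful to rewrite as

$$r(\eta_i) = h(\eta_i)\left[ y_i - G(\eta_i) \right].$$

It is assumed that the score residual is bounded over $\eta$.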
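The score residual in (C4) can be illustrated with a short Python sketch. The function names below are illustrative assumptions; the logistic and probit links are used only as two familiar choices of $G$. For the logistic link, $dG/d\eta = G(1-G)$, so $h(\eta) = 1$ and the score residual reduces to $y - G(\eta)$.

```python
import math


def logistic_cdf(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def logistic_pdf(eta):
    g = logistic_cdf(eta)
    return g * (1.0 - g)


def probit_cdf(eta):
    # Standard normal distribution function via the complementary error function.
    return 0.5 * math.erfc(-eta / math.sqrt(2.0))


def probit_pdf(eta):
    return math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)


def score_residual(y, eta, cdf, pdf):
    """r(eta) = h(eta) * [y - G(eta)], with h(eta) = G'(eta) / {G(eta)[1 - G(eta)]}."""
    g = cdf(eta)
    h = pdf(eta) / (g * (1.0 - g))
    return h * (y - g)


# At eta = 0 with y = 1: the logistic residual is 1 * (1 - 0.5),
# the probit residual is [phi(0) / 0.25] * (1 - 0.5).
r_logit = score_residual(1, 0.0, logistic_cdf, logistic_pdf)
r_probit = score_residual(1, 0.0, probit_cdf, probit_pdf)
```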

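Theorem 1. The modified NRI (mNRI) is a proper change score.

Proof of Theorem 1

For a single random variable, the modified NRI with the base and constant model parameters evaluated at their true value is

$$T_1(\theta;\theta_0;\pi_0) = [\pi_0(1-\pi_0)]^{-1}\, r(\beta_0^T X)\left[ I(\beta^T X + \gamma^T Z - \beta_0^T X > 0) - \tfrac{1}{2} \right].$$

Its expected value is equal to

$$E_{X,Z}\left\{ [\pi_0(1-\pi_0)]^{-1}\, h(\beta_0^T X) \times \left[ G(\beta_0^T X + \gamma_0^T Z) - G(\beta_0^T X) \right]\left[ I(\beta^T X + \gamma^T Z - \beta_0^T X > 0) - \tfrac{1}{2} \right] \right\}$$

where $h(\beta_0^T X)$ is a component of the score residual in (C4) evaluated under the base model.

To show $E[T_1(\theta;\theta_0;\pi_0)]$ is maximized at $\theta = \theta_0$, and therefore the modified NRI is a proper change score, consider

$$E\left[ T_1(\theta_0;\theta_0;\pi_0) - T_1(\theta;\theta_0;\pi_0) \right] = E_{X,Z}\Big\{ [\pi_0(1-\pi_0)]^{-1}\, h(\beta_0^T X)\left[ G(\beta_0^T X + \gamma_0^T Z) - G(\beta_0^T X) \right] \times \left[ I(\beta_0^T X + \gamma_0^T Z - \beta_0^T X > 0) - I(\beta^T X + \gamma^T Z - \beta_0^T X > 0) \right] \Big\}.$$

This expectation is evaluated under two cases:

Case (i): $\beta_0^T X + \gamma_0^T Z \ge \beta_0^T X$.

The first term in square brackets, $G(\beta_0^T X + \gamma_0^T Z) - G(\beta_0^T X)$, is non-negative due to the monotonicity of $G$, and the second term in square brackets, the difference in indicator functions, is either 0 or 1. Therefore, since the weight function $h(\beta_0^T X)$ is positive, the expectation is non-negative for any $\theta = (\beta,\gamma)$.

Case (ii): $\beta_0^T X + \gamma_0^T Z < \beta_0^T X$.

Under this constraint, the first term in square brackets is negative and the second term in square brackets is either 0 or −1. It follows that the expectation is again non-negative for any $\theta = (\beta,\gamma)$.

Combining these two cases, $E[T_1(\theta;\theta_0;\pi_0)]$ is maximized at $\theta = \theta_0$ and therefore, the modified NRI is a proper change score.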

Theorem 2.

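Let $\Phi(\cdot/h)$ represent a normal distribution function with scale parameter $h$, which is chosen so that as $n \to \infty$, $nh \to \infty$ and $nh^4 \to 0$. For

$$\begin{aligned}
T_n(\hat\theta;\hat\theta_0;\hat\pi) &= [n\bar{y}(1-\bar{y})]^{-1}\sum_i r(\hat\beta_0^T x_i)\left[ I(\hat\beta^T x_i + \hat\gamma^T z_i - \hat\beta_0^T x_i > 0) - \tfrac{1}{2} \right] \\
T_{n,h}^S(\hat\theta;\hat\theta_0;\hat\pi) &= [n\bar{y}(1-\bar{y})]^{-1}\sum_i r(\hat\beta_0^T x_i)\left[ \Phi\!\left( \frac{\hat\beta^T x_i + \hat\gamma^T z_i - \hat\beta_0^T x_i}{h} \right) - \tfrac{1}{2} \right] \\
\tau(\theta_0;\theta_0;\pi_0) &= \lim_{n\to\infty} T_n(\hat\theta;\hat\theta_0;\hat\pi)
\end{aligned}$$

Then

  (a) $T_n(\hat\theta;\hat\theta_0;\hat\pi) = T_{n,h}^S(\hat\theta;\hat\theta_0;\hat\pi) + O_p(h^2)$

  (b) $n^{1/2}\left[ T_n(\hat\theta;\hat\theta_0;\hat\pi) - \tau(\theta_0;\theta_0;\pi_0) \right] \xrightarrow{D} N(0, v)$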

Theorem 2a was proved in Lemma A.1 of Heller (2007).
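The two statistics in Theorem 2 can be sketched numerically for the logistic link, where the score residual reduces to $y_i - G(\hat\beta_0^T x_i)$. This is a minimal illustration under stated assumptions: the function name `mnri_logistic` is hypothetical, and the inputs are taken to be the already fitted linear predictors from the base and expanded models rather than the raw covariates.

```python
import math


def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def norm_cdf(x):
    # Standard normal distribution function.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def mnri_logistic(y, eta_base, eta_expanded, h=None):
    """mNRI for logistic models from fitted linear predictors.

    h=None uses the indicator I(delta > 0); a positive h replaces the
    indicator with the smooth version Phi(delta / h).
    """
    n = len(y)
    ybar = sum(y) / n
    total = 0.0
    for yi, e0, e1 in zip(y, eta_base, eta_expanded):
        resid = yi - expit(e0)   # logistic score residual under the base model
        delta = e1 - e0          # change in risk score, expanded minus base
        if h is None:
            direction = 1.0 if delta > 0 else 0.0
        else:
            direction = norm_cdf(delta / h)
        total += resid * (direction - 0.5)
    return total / (n * ybar * (1.0 - ybar))


# Hypothetical toy inputs, for illustration only.
y = [1, 0, 1, 0]
eta_base = [0.0, 0.0, 0.0, 0.0]
eta_expanded = [1.0, -1.0, 0.5, -0.5]
t_indicator = mnri_logistic(y, eta_base, eta_expanded)
t_smooth = mnri_logistic(y, eta_base, eta_expanded, h=0.01)
```

With a small bandwidth the smooth statistic is numerically indistinguishable from the indicator version in this toy example, which is the behavior part (a) of the theorem describes as $h \to 0$.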

Proof of Theorem 2b

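Consider

$$n^{1/2}\left[ T_n(\hat\theta;\hat\theta_0;\hat\pi) - \tau(\theta_0,\theta_0,\pi_0) \right] = n^{1/2}\left[ T_n(\hat\theta;\hat\theta_0;\hat\pi) - T_{n,h}^S(\hat\theta;\hat\theta_0;\hat\pi) \right] + n^{1/2}\left[ T_{n,h}^S(\hat\theta;\hat\theta_0;\hat\pi) - \tau(\theta_0,\theta_0,\pi_0) \right].$$

From Theorem 2a, under the condition $nh^4 \to 0$, the first term is $o_p(1)$. Focusing on the second term, a first order expansion of $n^{1/2}[T_{n,h}^S(\hat\theta;\hat\theta_0;\hat\pi) - \tau(\theta_0,\theta_0,\pi_0)]$ around $(\hat\theta;\hat\theta_0) = (\theta_0,\theta_0)$, and the asymptotic identity (Cox and Hinkley 1974)

$$(\hat\beta_0 - \beta_0) = (\hat\beta - \beta) + \mathcal{I}_{\beta\beta}^{-1}\mathcal{I}_{\beta\gamma}(\hat\gamma - \gamma_0) + o_p(n^{-1/2})$$

results in

$$\begin{aligned}
n^{1/2}\left[ T_{n,h}^S(\hat\theta;\hat\theta_0;\hat\pi) - \tau(\theta_0,\theta_0,\pi_0) \right] ={}& [\pi_0(1-\pi_0)]^{-1} n^{-1/2}\sum_i \left\{ r(\beta_0^T X_i)\left[ \Phi\!\left(\frac{\delta_i}{h}\right) - \tfrac{1}{2} \right] - \tau(\theta_0,\theta_0,\pi_0) \right\} \\
&+ [\pi_0(1-\pi_0)]^{-1} n^{-1/2}\sum_i \left\{ \psi^T\left[ \mathcal{I}^{\gamma\beta} U_\beta(X_i,Z_i) + \mathcal{I}^{\gamma\gamma} U_\gamma(X_i,Z_i) \right] + \xi^T \mathcal{I}^{\beta_0\beta_0} U_{\beta_0}(X_i,Z_i) \right\}
\end{aligned}$$

where

$$\begin{aligned}
\delta_i &= (\beta - \beta_0)^T X_i + \gamma_0^T Z_i \\
\psi &= \lim_{n\to\infty} (nh)^{-1}\sum_i r(\beta_0^T X_i)\left( Z_i - \mathcal{I}_{\gamma\beta}\mathcal{I}_{\beta\beta}^{-1} X_i \right)\phi\!\left(\frac{\delta_i}{h}\right) \\
\xi &= \lim_{n\to\infty} n^{-1}\sum_i \left[ \left.\frac{\partial r(\beta^T X_i)}{\partial\beta}\right|_{\beta=\beta_0} \right]\left[ \Phi\!\left(\frac{\delta_i}{h}\right) - \tfrac{1}{2} \right].
\end{aligned}$$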

Theorem 2b follows from the central limit theorem.

Theorem 3.

Assume the covariate vectors $x$ and $z$ have dimension $p$ and $q$, respectively. If $\rho(\theta_0;\theta_0;\pi_0) = 0$, then the smooth NRI test statistic

$$n R_n^S(\hat\theta;\hat\theta_0;\hat\pi) \xrightarrow{D} N_q(0,V_1)^T N_q(0,V_2) + \tfrac{1}{2} D_p^T \mathcal{I}_{\beta\beta}^{-1} c_p.$$

The first term is the inner product of two positively correlated, $q$-dimensional, mean zero normal random vectors, and the second term is bilinear, where $D_p$ is a $p$-dimensional random vector with quadratic components, and $c_p$ is a $p$-dimensional constant vector.

Proof of Theorem 3:

The smooth NRI test statistic is

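$$R_n^S(\hat\theta;\hat\theta_0;\hat\pi) = [n\bar{y}(1-\bar{y})]^{-1}\sum_i (y_i - \bar{y})\left[ \Phi(\hat\beta^T x_i + \hat\gamma^T z_i - \hat\beta_0^T x_i) - \tfrac{1}{2} \right].$$

To determine its null reference distribution, Pepe et al. (2013) demonstrate that for correctly specified nested models (C1), $\rho(\theta_0;\theta_0;\pi_0) = 0$ iff $\hat\beta - \hat\beta_0 \xrightarrow{p} 0$ and $\hat\gamma \xrightarrow{p} 0$. This allows consideration of a second order Taylor expansion of $R_n^S(\hat\theta;\hat\theta_0;\hat\pi)$ around $\hat\theta = \hat\theta_0$

$$n R_n^S(\hat\theta;\hat\theta_0;\hat\pi) = \frac{\phi(0)}{\bar{y}(1-\bar{y})}\sum_i \left[ (\hat\beta - \hat\beta_0)^T x_i + (\hat\gamma - \hat\gamma_0)^T z_i \right](y_i - \bar{y}) + o_p(1), \tag{A.1}$$

where $\phi(0)$ represents the standard normal density function evaluated at 0, and since its derivative evaluated at zero, $\phi'(0) = 0$, each element of the matrix in the quadratic term of the expansion is equal to 0.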

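To further simplify, note that

$$n^{1/2}(\hat\beta - \hat\beta_0) = -\mathcal{I}_{\beta\beta}^{-1}\mathcal{I}_{\beta\gamma}\, n^{1/2}(\hat\gamma - \gamma_0) + (4n)^{-1/2}\,\mathcal{I}_{\beta\beta}^{-1}\, d(\hat\theta;\hat\theta_0) + o_p(n^{-1/2}) \tag{A.2}$$

which follows from a second order Taylor series approximation of the score statistic (C3), $U_\beta(\theta)$, around $\hat\theta = \hat\theta_0$, with

$$d(\hat\theta;\hat\theta_0) = \begin{pmatrix} n^{1/2}(\hat\theta - \hat\theta_0)^T\left[ n^{-1} H_1(\hat\theta_0) \right] n^{1/2}(\hat\theta - \hat\theta_0) \\ \vdots \\ n^{1/2}(\hat\theta - \hat\theta_0)^T\left[ n^{-1} H_p(\hat\theta_0) \right] n^{1/2}(\hat\theta - \hat\theta_0) \end{pmatrix} \quad\text{and}\quad H_j(\theta) = \frac{\partial^2 U_{\beta_j}(\theta)}{\partial\theta\,\partial\theta^T}.$$

Substituting (A.2) into (A.1),

$$n R_n^S(\hat\theta;\hat\theta_0;\hat\pi) = \frac{\phi(0)}{\bar{y}(1-\bar{y})} \times \left\{ \left[ n^{1/2}(\hat\gamma - \gamma_0) \right]^T n^{-1/2}\sum_i \left( z_i - \mathcal{I}_{\gamma\beta}\mathcal{I}_{\beta\beta}^{-1} x_i \right)(y_i - \bar{y}) + \tfrac{1}{2}\, d(\hat\theta;\hat\theta_0)^T \mathcal{I}_{\beta\beta}^{-1}\, n^{-1}\sum_i x_i (y_i - \bar{y}) \right\} + o_p(1). \tag{A.3}$$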

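To obtain the result in Theorem 3, consider the elements in (A.3),

$$\begin{aligned}
\frac{\phi(0)}{\bar{y}(1-\bar{y})} &\xrightarrow{p} \frac{\phi(0)}{\pi_0(1-\pi_0)} \\
n^{1/2}(\hat\gamma - \gamma_0) &\xrightarrow{D} N_q(0, V_1) \\
n^{-1}\sum_i x_i (y_i - \bar{y}) &\xrightarrow{p} c_p \\
d(\hat\theta;\hat\theta_0) &\xrightarrow{D} D_p.
\end{aligned}$$

The remaining element is

$$n^{-1/2}\sum_i \left( z_i - \mathcal{I}_{\gamma\beta}\mathcal{I}_{\beta\beta}^{-1} x_i \right)(y_i - \bar{y}).$$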

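First, under the null

$$n^{-1}\sum_i \left( z_i - \mathcal{I}_{\gamma\beta}\mathcal{I}_{\beta\beta}^{-1} x_i \right)(y_i - \bar{y}) \xrightarrow{p} E_{X,Z}\left[ \left( Z - \mathcal{I}_{\gamma\beta}\mathcal{I}_{\beta\beta}^{-1} X \right)\left( G(\beta_0^T X) - \pi_0 \right) \right]$$

which is rewritten as

$$E_{X,Z}\left[ \left( Z^* - \mathcal{I}_{\gamma\beta}\mathcal{I}_{\beta\beta}^{-1} X^* \right)\left( W_X^{-1/2}\left[ G(\beta_0^T X) - \pi_0 \right] \right) \right] \tag{A.4}$$

where $Z^* = Z W_X^{1/2}$, $X^* = X W_X^{1/2}$, and $W_X = \operatorname{var}[r(\beta_0^T X) \mid X]$.

The motivation for the weight $W_X$ comes from the Bernoulli log-likelihood (C2, C3)

$$\mathcal{I}_{\beta\beta} = E\left[ X^* X^{*T} \right], \qquad \mathcal{I}_{\gamma\beta} = E\left[ Z^* X^{*T} \right]$$

and the recognition that

$$E_{X,Z}\left\{ \left( Z^* - \mathcal{I}_{\gamma\beta}\mathcal{I}_{\beta\beta}^{-1} X^* \right) X^{*T} \right\} = 0,$$

a $q \times p$ matrix of zeros.

Therefore by projection theory (Tsiatis 2006),

$$E\left[ Z^* \mid X^* \right] = \mathcal{I}_{\gamma\beta}\mathcal{I}_{\beta\beta}^{-1} X^*$$

and so the expectation in (A.4) is equal to zero.

It follows from the central limit theorem,

$$n^{-1/2}\sum_i \left( z_i - \mathcal{I}_{\gamma\beta}\mathcal{I}_{\beta\beta}^{-1} x_i \right)(y_i - \bar{y}) \xrightarrow{D} N_q(0, V_2).$$

In addition, from (A.5) in Theorem 4, $n^{-1/2}\sum_i (z_i - \mathcal{I}_{\gamma\beta}\mathcal{I}_{\beta\beta}^{-1} x_i)(y_i - \bar{y})$ is strongly positively correlated with $n^{1/2}(\hat\gamma - \gamma_0)$.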

Theorem 3 now results from assembling the elements in (A.3).

Theorem 4.

Assume the binary regression models in (C1) are properly specified and the covariate vectors $x$ and $z$ have dimension $p$ and $q$, respectively. If $\rho(\theta_0;\theta_0;\pi_0) = 0$, then

$$n T_n^S(\hat\theta;\hat\theta_0;\hat\pi) \xrightarrow{D} k\chi_q^2,$$

where $k = \phi(0)[\pi_0(1-\pi_0)]^{-1}$, $\phi(0)$ is the standard normal density function evaluated at 0, and $\chi_q^2$ is a chi-square random variable with $q$ degrees of freedom.

Proof of Theorem 4:

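The mNRI test statistic is

$$n T_n^S(\hat\theta;\hat\theta_0;\hat\pi) = [\bar{y}(1-\bar{y})]^{-1}\sum_i r(\hat\beta_0^T x_i)\left[ \Phi(\hat\beta^T x_i + \hat\gamma^T z_i - \hat\beta_0^T x_i) - \tfrac{1}{2} \right],$$

where $r(\cdot)$ is the score residual defined in (C4).

From Pepe et al. (2013), since $\hat\beta - \hat\beta_0 \xrightarrow{p} 0$ and $\hat\gamma \xrightarrow{p} 0$, a second order Taylor expansion of $T_n^S$ around $\hat\theta = \hat\theta_0$ results in

$$n T_n^S(\hat\theta;\hat\theta_0;\hat\pi) = \frac{\phi(0)}{\bar{y}(1-\bar{y})}\left[ n^{1/2}(\hat\gamma - \gamma_0) \right]^T n^{-1/2}\sum_i r(\hat\beta_0^T x_i)\left( z_i - \mathcal{I}_{\gamma\beta}\mathcal{I}_{\beta\beta}^{-1} x_i \right) + o_p(1).$$

This approximation may be further simplified through the recognition that $\sum_i r(\hat\beta_0^T x_i)(z_i - \mathcal{I}_{\gamma\beta}\mathcal{I}_{\beta\beta}^{-1} x_i)$ is the efficient score statistic for estimating $\gamma$ in the presence of $\beta$ and evaluated under the constraint $\gamma = 0$. It follows that (Bickel, Klaassen, Ritov, and Wellner, 1993)

$$n^{-1/2}\sum_i r(\hat\beta_0^T x_i)\left( z_i - \mathcal{I}_{\gamma\beta}\mathcal{I}_{\beta\beta}^{-1} x_i \right) = \left( \mathcal{I}^{\gamma\gamma} \right)^{-1} n^{1/2}(\hat\gamma - \gamma_0) + o_p(1), \tag{A.5}$$

and therefore,

$$n T_n^S(\hat\theta;\hat\theta_0;\hat\pi) = \frac{\phi(0)}{\pi_0(1-\pi_0)}\left[ n^{1/2}(\hat\gamma - \gamma_0) \right]^T \left( \mathcal{I}^{\gamma\gamma} \right)^{-1}\left[ n^{1/2}(\hat\gamma - \gamma_0) \right] + o_p(1).$$

That is,

$$\Pr\left( n T_n^S(\hat\theta;\hat\theta_0;\hat\pi) \le u \right) = \Pr\left( k\chi_q^2 \le u \right)$$

where $k = \phi(0)[\pi_0(1-\pi_0)]^{-1}$ and $\chi_q^2$ is a chi-square random variable with $q$ degrees of freedom.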
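The reference distribution in Theorem 4 suggests a simple way to turn an observed value of $nT_n^S$ into a p-value. The sketch below is one possible implementation under stated assumptions: the function names are hypothetical, $\pi_0$ is replaced by the observed event proportion, $q$ is a positive integer, and the test is taken to reject for large values of the statistic (upper tail). The chi-square survival function is written out with the standard closed forms for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability Pr(chi-square_df > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    t = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-t) * sum_{j=0}^{m-1} t^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= t / j
            total += term
        return math.exp(-t) * total
    # Odd df = 2m + 1: erfc(sqrt(t)) + exp(-t) * sum_{j=1}^{m} t^(j - 1/2) / Gamma(j + 1/2)
    total = math.erfc(math.sqrt(t))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-t) * t ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def mnri_scale_constant(pi_hat):
    """k = phi(0) / [pi (1 - pi)], with phi(0) the standard normal density at 0."""
    return (1.0 / math.sqrt(2.0 * math.pi)) / (pi_hat * (1.0 - pi_hat))


def mnri_pvalue(n_times_t, pi_hat, q):
    """Upper tail p-value from the k * chi-square_q reference distribution."""
    k = mnri_scale_constant(pi_hat)
    return chi2_sf(n_times_t / k, q)
```

For example, `mnri_pvalue(stat, 0.47, 3)` would refer a hypothetical observed value `stat` of $nT_n^S$ to $k\chi_3^2$ with $k \approx 1.60$.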

Theorem 5.

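Assume the binary regression models for the training and test data have the same specification and are given in (C1), where the covariate vector $x$ has dimension $p$ and the covariate vector $z$ has dimension $q$. Denote the estimated regression coefficients from the training data by $(\hat\theta, \hat\theta_0, \hat\pi)$ and the coefficients from the test data by $(\tilde\theta, \tilde\theta_0, \tilde\pi)$, and let the data $(y_i, x_i, z_i)$ be drawn from the test sample. If $\rho(\theta_0;\theta_0;\pi_0) = 0$,

$$n T_n^S(\hat\theta;\hat\theta_0,\tilde\theta_0;\tilde\pi) \xrightarrow{D} \frac{k}{2}\sum_{j=1}^{2q} \lambda_j \chi_j^2,$$

where $k$ is defined in Theorem 4, $\{\chi_j^2\}$ are independent chi-square random variables each with one degree of freedom, and $\lambda_j$ represent eigenvalues determined from the product matrix $VC$, where

$$V = \begin{pmatrix} \operatorname{var}(\tilde\gamma) & 0 \\ 0 & \operatorname{var}(\hat\gamma) \end{pmatrix}, \qquad C = \begin{pmatrix} 0 & D \\ D & 0 \end{pmatrix}$$

and $D = [\operatorname{var}(\tilde\gamma)]^{-1}$.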

Proof of Theorem 5:

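The test statistic for the mNRI derived from training and test data is

$$n T_n^S(\hat\theta;\hat\theta_0,\tilde\theta_0;\tilde\pi) = [\bar{y}(1-\bar{y})]^{-1}\sum_i r(\tilde\beta_0^T x_i)\left[ \Phi(\hat\beta^T x_i + \hat\gamma^T z_i - \hat\beta_0^T x_i) - \tfrac{1}{2} \right].$$

Employing the arguments provided in the proof of Theorem 4, the smooth mNRI may be asymptotically approximated by

$$n T_n^S(\hat\theta;\hat\theta_0,\tilde\theta_0;\tilde\pi) = \frac{\phi(0)}{\pi_0(1-\pi_0)}\left[ n^{1/2}(\hat\gamma - \gamma_0) \right]^T \left( \mathcal{I}^{\gamma\gamma} \right)^{-1}\left[ n^{1/2}(\tilde\gamma - \gamma_0) \right] + o_p(1).$$

The test statistic $T_n^S$ is bilinear, due to the different coefficient estimates $(\hat\gamma, \tilde\gamma)$ from the training and test data. This statistic may be transformed to the quadratic

$$\frac{k}{2}\begin{pmatrix} \hat\gamma - \gamma_0 \\ \tilde\gamma - \gamma_0 \end{pmatrix}^T \begin{pmatrix} 0 & D \\ D & 0 \end{pmatrix} \begin{pmatrix} \hat\gamma - \gamma_0 \\ \tilde\gamma - \gamma_0 \end{pmatrix}.$$

It follows from Baldessari (1967) that as $n \to \infty$,

$$\Pr\left( n T_n^S(\hat\theta;\hat\theta_0,\tilde\theta_0;\tilde\pi) \le t \right) = \Pr\left( \frac{k}{2}\sum_{j=1}^{2q} \lambda_j \chi_j^2 \le t \right).$$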
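The weighted chi-square reference distribution in Theorem 5 has no simple closed form, but it can be approximated by simulation. The sketch below is an assumption-laden illustration rather than the author's procedure: it uses NumPy, the function names are hypothetical, the covariance matrices of $\tilde\gamma$ and $\hat\gamma$ are assumed to be supplied by the user, and the tail probability is estimated by Monte Carlo draws.

```python
import numpy as np


def weighted_chisq_weights(var_test, var_train):
    """Eigenvalues of VC, with V = diag(var_test, var_train) and
    C = [[0, D], [D, 0]], D = inverse of var_test."""
    var_test = np.atleast_2d(np.asarray(var_test, dtype=float))
    var_train = np.atleast_2d(np.asarray(var_train, dtype=float))
    q = var_test.shape[0]
    zeros = np.zeros((q, q))
    D = np.linalg.inv(var_test)
    V = np.block([[var_test, zeros], [zeros, var_train]])
    C = np.block([[zeros, D], [D, zeros]])
    # VC is not symmetric in general; keep the real part of the eigenvalues.
    return np.sort(np.linalg.eigvals(V @ C).real)


def upper_tail_probability(t, k, lam, n_draws=200_000, seed=0):
    """Monte Carlo estimate of Pr{(k / 2) * sum_j lam_j * chi2_1 > t}."""
    rng = np.random.default_rng(seed)
    chisq = rng.chisquare(df=1, size=(n_draws, lam.size))
    draws = 0.5 * k * (chisq @ lam)
    return float(np.mean(draws > t))


# Hypothetical scalar case (q = 1) with equal variances: VC = [[0, 1], [1, 0]].
lam = weighted_chisq_weights([[2.0]], [[2.0]])
p_upper = upper_tail_probability(0.0, 1.0, lam)
```

In this toy case the eigenvalues are −1 and 1, so the reference distribution is symmetric about zero and the estimated upper tail probability at zero is close to one half.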

REFERENCES

  1. Baldessari B (1967), "The Distribution of a Quadratic Form of Normal Random Variables," Annals of Mathematical Statistics, 38, 1700–1704.
  2. Bickel PJ, Klaassen CAJ, Ritov Y, and Wellner JA (1993), Efficient and Adaptive Estimation for Semiparametric Models, The Johns Hopkins University Press.
  3. Cieslikowski WA, Antczak A, Nowicki M, Zabel M, Budna-Tukan J (2021), "Clinical Relevance of Circulating Tumor Cells in Prostate Cancer Management," Biomedicines, 9, 1179.
  4. Cox DR and Hinkley DV (1974), Theoretical Statistics. Chapman and Hall.
  5. Demler OV, Pencina MJ, Cook NR, and D’Agostino RB Sr (2017), "Asymptotic distribution of ΔAUC, NRIs, and IDI based on theory of U-statistics," Statistics in Medicine, 36, 3334–3360.
  6. Gafita A, Calais J, Grogan TR, Hadaschik B, Wang H, Weber M, Sandhu S, Kratochwil C, Esfandiari R, Tauber R, Zeldin A, Rathke H, Armstrong WR, Robertson A, Thin P, D’Alessandria C, Rettig MB, Delpassand ES, Haberkorn U, Elashoff D, Herrmann K, Czernin J, Hofman MS, Fendler WP, Eiber M (2021), "Nomograms to predict outcomes after 177 Lu-PSMA therapy in men with metastatic castration-resistant prostate cancer: an international, multicentre, retrospective study," Lancet Oncology, 22, 1115–1125.
  7. Gerds TA and Kattan MW (2021), Medical Risk Prediction Models With Ties to Machine Learning. CRC Press.
  8. Gneiting T and Raftery AE (2007), "Strictly proper scoring rules, prediction, and estimation," Journal of The American Statistical Association, 102, 359–378.
  9. Heller G (2007), "Smoothed rank regression with censored data," Journal of The American Statistical Association, 552–559.
  10. Horowitz JL (1998), Semiparametric Methods in Econometrics. Springer-Verlag.
  11. Kerr KF, McClelland RL, Brown ER, and Lumley T (2011), "Evaluating the Incremental Value of New Biomarkers With Integrated Discrimination Improvement," American Journal of Epidemiology, 174, 364–374.
  12. Kerr KF, Wang Z, Janes H, McClelland RL, Psaty BM, and Pepe MS (2014), "Net reclassification indices for evaluating risk-prediction instruments: A critical review," Epidemiology, 25, 114–121.
  13. Manski CF (1985), "Semiparametric analysis of discrete response: Asymptotic properties of the maximum score estimator," Journal of Econometrics, 27, 313–333.
  14. McCullagh P and Nelder JA (1983), Generalized Linear Models. Chapman and Hall.
  15. Pencina MJ, D’Agostino RB Sr, D’Agostino RD Jr, and Vasan R (2008), "Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond," Statistics in Medicine, 27, 157–172.
  16. Pencina MJ, D’Agostino RB Sr, and Steyerberg EW (2011), "Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers," Statistics in Medicine, 30, 11–21.
  17. Pencina MJ, D’Agostino RB Sr, and Demler OV (2012), "Novel metrics for evaluating improvement in discrimination: net reclassification and integrated discrimination improvement for normal variables and nested models," Statistics in Medicine, 31, 101–113.
  18. Pepe MS, Fan J, Feng Z, Gerds T, and Hilden J (2015), "The net reclassification index (NRI): A misleading measure of prediction improvement even with independent test data sets," Statistics in Biosciences, 7, 282–295.
  19. Pepe MS, Janes H, and Li CI (2014), "Net risk reclassification p values: Valid or misleading?" Journal of the National Cancer Institute, 106, 1–6.
  20. Pepe MS, Kerr KF, Longton G, and Wang Z (2013), "Testing for improvement in prediction model performance," Statistics in Medicine, 32, 1467–1482.
  21. Ryan CJ, Dutta S, Kelly WK, Russell C, Small EJ, Morris MJ, Taplin ME, Halabi S (2020), "Androgen Decline and Survival During Docetaxel Therapy in Metastatic Castration Resistant Prostate Cancer (mCRPC)," Prostate Cancer and Prostatic Disease, 23, 66–73.
  22. Saad F, Fizazi K, Jinga V, Efstathiou E, Fong PC, Hart LL, Jones R, McDermott R, Wirth M, Suzuki K, MacLean DB, Wang L, Akaza H, Nelson J, Scher HI, Dreicer R, Webb IJ, de Wit R, ELM-PC 4 investigators (2015), "Orteronel plus prednisone in patients with chemotherapy naive metastatic castration-resistant prostate cancer (ELM-PC 4): a double-blind, multicentre, phase 3, randomised, placebo-controlled trial," Lancet Oncology, 16, 338–348.
  23. Sayegh N, Swami U, and Agarwal N (2021), "Recent Advances in the Management of Metastatic Prostate Cancer," JCO Oncology Practice, 18, 45–55.
  24. Tsiatis AA (2006), Semiparametric Theory and Missing Data. Springer-Verlag.
