Author manuscript; available in PMC: 2014 Jul 1.
Published in final edited form as: Lifetime Data Anal. 2012 Dec 20;19(3):350–370. doi: 10.1007/s10985-012-9239-z

Evaluating incremental values from new predictors with net reclassification improvement in survival analysis

Yingye Zheng 1, Layla Parast 2, Tianxi Cai 3, Marshall Brown 4
PMCID: PMC3686882  NIHMSID: NIHMS430798  PMID: 23254468

Abstract

Developing individualized prediction rules for disease risk and prognosis has played a key role in modern medicine. When new genomic or biological markers become available to assist in risk prediction, it is essential to assess the improvement in clinical usefulness of the new markers over existing routine variables. Net reclassification improvement (NRI) has been proposed to assess improvement in risk reclassification in the context of comparing two risk models and the concept has been quickly adopted in medical journals. We propose both nonparametric and semiparametric procedures for calculating NRI as a function of a future prediction time t with a censored failure time outcome. The proposed methods accommodate covariate-dependent censoring, therefore providing more robust and sometimes more efficient procedures compared with the existing nonparametric-based estimators. Simulation results indicate that the proposed procedures perform well in finite samples. We illustrate these procedures by evaluating a new risk model for predicting the onset of cardiovascular disease.

Keywords: Inverse probability weighted (IPW) estimator, Net reclassification improvement (NRI), Risk prediction, Survival analysis

1 Introduction

Developing individualized prediction rules for disease risk and prognosis is fundamental for successful disease prevention and treatment selection. For many diseases, risk prediction models have been developed and incorporated into clinical practice guidelines. For example, the Gail model was developed for predicting individual breast cancer risk (Gail et al. 1989) and a risk calculator based on that model can be used to assist physicians making screening recommendations. For cardiovascular disease (CVD), prediction models such as the Framingham Risk Score (FRS) are used for stratifying patients into different levels of risks. However, much refinement is needed even for the best of these models because of their limited discriminatory accuracy. For example, the Framingham model, largely based on traditional clinical risk factors, has recognized limitations in its clinical utility (Hemann et al. 2007). A considerable fraction of patients who experienced CVD events had none of the identified risk factors, indicating a need to explore avenues beyond routine clinical measures for more accurate prediction (Khot et al. 2003). This fuels much of the current search for novel biologic markers and genetic factors that, when combined with routine clinical risk factors, may provide accurate prediction at the individual level.

When new genomic or biological markers become available to assist in risk prediction, it is essential to assess the clinical usefulness of these new markers compared to existing routine markers. Careful evaluation of the incremental value is particularly crucial when markers are either expensive or invasive to measure. To quantify the added clinical value of new markers over a conventional risk scoring system for predicting disease risk, one may calculate the difference in the prediction measures for the existing conventional model and the new model, which includes information from the new markers. For example, the difference in the areas under the receiver operating characteristic curves (AUC of ROC) is often used to quantify the improvement in discrimination attributable to added markers. Since a risk model is often used to stratify patients into proper risk categories, statistical summaries that depend on clinically meaningful risk thresholds may be more relevant (Cook 2007; Cui 2009; Lloyd-Jones 2010). As an alternative to measuring the difference between AUCs, net reclassification improvement (NRI) has also been proposed to assess improvement in risk reclassification in the context of comparing two risk models constructed with and without novel markers (Pencina et al. 2008). Using “up” and “down” to denote changes in one or more risk categories in the upward and downward directions, respectively, for a subject between their baseline and augmented risk values, the NRI is defined as

$$\mathrm{NRI} = [\Pr(\text{up} \mid \text{Diseased}) + \Pr(\text{down} \mid \text{Healthy})] - [\Pr(\text{down} \mid \text{Diseased}) + \Pr(\text{up} \mid \text{Healthy})].$$

Such a measure is appealing because it acknowledges both desirable risk reclassifications (up for diseased and down for healthy subjects) and undesirable risk reclassifications (down for diseased and up for healthy subjects). Due to its simplicity, NRI has been quickly adopted in medical journals. However, compared with many other measures for incremental values, the concept has not received much attention in the statistical literature.
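As a minimal sketch of this definition for a binary outcome without censoring, the four reclassification proportions can be computed directly from each subject's risk category under the two models. The function name and the toy data below are hypothetical and are not taken from the paper.

```python
def net_reclassification_improvement(old_cat, new_cat, diseased):
    """Categorical NRI for a binary outcome (no censoring).

    old_cat, new_cat: integer risk categories under the baseline and augmented models.
    diseased: booleans, True for diseased subjects.
    """
    def rate(direction, status):
        # Proportion of subjects with the given disease status who move in `direction`.
        group = [(o, n) for o, n, d in zip(old_cat, new_cat, diseased) if d == status]
        if not group:
            raise ValueError("no subjects with the requested disease status")
        moved = sum(1 for o, n in group if (n > o if direction == "up" else n < o))
        return moved / len(group)

    desirable = rate("up", True) + rate("down", False)
    undesirable = rate("down", True) + rate("up", False)
    return desirable - undesirable


# Hypothetical toy data: categories 0 (low), 1 (intermediate), 2 (high).
old_cat = [0, 1, 1, 2, 0, 1, 2, 1]
new_cat = [1, 2, 1, 1, 0, 0, 1, 2]
diseased = [True, True, True, True, False, False, False, False]
nri = net_reclassification_improvement(old_cat, new_cat, diseased)
```

In this toy example half of the diseased subjects move up and a quarter move down, while half of the healthy subjects move down and a quarter move up, so the NRI is (0.5 + 0.5) - (0.25 + 0.25) = 0.5.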

Since a risk model is often used for predicting an individual's future outcome, it is essential to incorporate the additional dimension of time when assessing the performance of a risk model in a cohort study. For both deriving and evaluating risk models, prospective cohort data is often used. In this setting a subject's health status at a future time t is sometimes unknown due to loss to follow-up, termination of a study or the occurrence of a competing risk event. Such censoring poses additional challenges compared with settings previously examined in the literature, which focus on incremental value calculation with a dichotomous outcome. Currently there is limited development in methods to estimate the incremental value of novel markers with censored failure time outcomes. Recently Pencina and D'Agostino (2011) proposed a method for calculating time-dependent NRI, based on nonparametric Kaplan–Meier (KM) estimators in order to account for censoring in cohort data. The asymptotic properties of a similar estimator are studied in detail in Uno et al. (2009). However, the validity of these estimators relies critically on the assumption that censoring is independent of predictors used in the risk models. Furthermore, the nonparametric procedure considered in these estimators may potentially lead to efficiency loss. A more flexible and more efficient estimating procedure is needed in practice.

In this manuscript, we propose quantitative procedures for calculating NRI as a function of a future prediction time t with a censored failure time outcome. Compared with existing nonparametric estimators, our procedures do not require the assumption that censoring is independent of predictors, therefore the methods would be widely applicable to many practical situations. We also consider procedures that aim to improve efficiency while maintaining robustness. This manuscript is organized as follows. In Sect. 2, we specify models and define NRI suitable for event time outcomes. In Sect. 3, we describe procedures for estimating time-dependent NRI using data obtained from a prospective cohort study with a failure time outcome. We comment on the theoretical properties of our proposed estimators in Sect. 4. We then describe simulation studies to evaluate the performance of the proposed estimators. The results are reported in Sect. 5. An application of our procedures for comparing two CVD risk models is presented in Sect. 6. Concluding remarks are in Sect. 7.

2 Measures of risk stratification and reclassification

Consider the situation in which a vector of predictors Y measured at baseline is used for predicting the time-to-event outcome T. Risk models can be built using sub-vectors of Y. Let $Y^{(1)}$, a function of Y, denote a vector of conventional predictor values in the existing model. Let $Y^{(2)}$, also a function of Y, denote a vector of predictors used in the new model that contains $Y^{(1)}$, but also new predictor values. Individual-level risk at a future time t can be derived as $P = \Pr(T \le t \mid Y^{(1)})$, based on the conventional model, and $Q = \Pr(T \le t \mid Y^{(2)})$, the corresponding risk based on the new model. Since, in practice, risk categories are often uncertain for many diseases, a more objective and flexible measure of improvement in risk prediction would be based on P or Q in their original continuous scales. Therefore, following the definition of Pencina and D'Agostino (2011), in this manuscript we focus on the time-dependent continuous NRI, which is a more general definition that does not rely on the existence of risk categories. In the time-dependent setting, we further denote an ‘event’ person at time t as one with $T \le t$, and a ‘nonevent’ person as one with T > t. Here, NRI(t) is equal to the sum of ‘event NRI’ and ‘nonevent NRI’, which are defined as:

$$\text{event NRI}_u(t) = \Pr(Q - P > u \mid T \le t) - \Pr(Q - P \le u \mid T \le t) = 2\Pr(Q - P > u \mid T \le t) - 1,$$

and

$$\text{nonevent NRI}_v(t) = \Pr(Q - P \le v \mid T > t) - \Pr(Q - P > v \mid T > t) = 1 - 2\Pr(Q - P > v \mid T > t).$$

Since $\mathrm{NRI}_{u,v}(t) = \text{event NRI}_u(t) + \text{nonevent NRI}_v(t)$, it follows that $\mathrm{NRI}_{u,v}(t) = 2\{\Pr(Q - P > u \mid T \le t) - \Pr(Q - P > v \mid T > t)\}$. In practice we may choose u and v such that improvement in risk estimates is meaningful (Uno et al. 2009). Setting u = v = 0 gives the ‘continuous NRI’ considered in Pencina and D'Agostino (2011). For ease of presentation, in the sequel we omit the subscripts u and v from our notation and assume u = v = 0, but note that our estimators can be constructed for arbitrary u and v. In the next section, we show how each component of NRI(t) can be estimated.
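To make the decomposition concrete, the following sketch computes the event NRI, the nonevent NRI and their sum when every event time is fully observed, which is an idealized case without censoring. The function name and the toy risks are hypothetical.

```python
def continuous_nri(p, q, event_time, t, u=0.0, v=0.0):
    """Event NRI, nonevent NRI and their sum when every T_i is observed (no censoring)."""
    events = [qi - pi for pi, qi, ti in zip(p, q, event_time) if ti <= t]
    nonevents = [qi - pi for pi, qi, ti in zip(p, q, event_time) if ti > t]
    if not events or not nonevents:
        raise ValueError("need at least one event and one nonevent at time t")
    # Empirical versions of Pr(Q - P > u | T <= t) and Pr(Q - P > v | T > t).
    pr_up_event = sum(d > u for d in events) / len(events)
    pr_up_nonevent = sum(d > v for d in nonevents) / len(nonevents)
    event_nri = 2.0 * pr_up_event - 1.0
    nonevent_nri = 1.0 - 2.0 * pr_up_nonevent
    return event_nri, nonevent_nri, event_nri + nonevent_nri


# Hypothetical risks under the old (p) and new (q) models, with observed event times.
p = [0.2, 0.3, 0.5, 0.4, 0.10, 0.6]
q = [0.3, 0.5, 0.4, 0.6, 0.05, 0.5]
event_time = [1.0, 2.0, 2.5, 5.0, 6.0, 7.0]
event_nri, nonevent_nri, total_nri = continuous_nri(p, q, event_time, t=3.0)
```

Here two of the three events and one of the three nonevents have Q > P, so the total equals 2 × (2/3 - 1/3) = 2/3, in agreement with the closed form above.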

3 Estimation

Suppose we have a cohort of N individuals from the targeted population followed prospectively. Due to censoring, the observed data consist of N i.i.d. copies of the vector, $D = \{D_i = (X_i, \delta_i, Y_i)^T, i = 1, \ldots, N\}$, where $X_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$, with $T_i$ and $C_i$ denoting the failure time and censoring time, respectively. $Y_i$ are predictors from individual i measured at time 0, including the subset $Y_i^{(1)}$ used in the existing model (model 1) and $Y_i^{(2)}$ used in the new model (model 2), such that $Y_i^{(1)} \subset Y_i^{(2)}$. For illustration, we first assume that P and Q both follow conventional Cox regression models. Specifically, at time t, we assume $P(\theta_1) = 1 - \exp[-\Lambda_{01}(t)\exp\{\beta_1^T Y^{(1)}\}]$ and $Q(\theta_2) = 1 - \exp[-\Lambda_{02}(t)\exp\{\beta_2^T Y^{(2)}\}]$, where $\Lambda_{0k}$ is the baseline cumulative hazard function and $\beta_k$ is an unknown vector of parameters for model k = 1, 2, and $\theta_1 = (\Lambda_{01}(t), \beta_1^T)^T$, $\theta_2 = (\Lambda_{02}(t), \beta_2^T)^T$. It is important to note that these models are most likely not correctly specified. Nevertheless, under a mild regularity condition, the standard maximum partial likelihood estimator $\hat\beta_k$ for $\beta_k$ converges to a constant vector as n → ∞ (Hjort 1992). This provides the theoretical ground for our asymptotic studies.

To estimate NRI(t), Pencina and D'Agostino (2011) first expressed the two key components as

$$\Pr\{B(\theta) > 0 \mid T \le t\} = \frac{\Pr\{T \le t \mid B(\theta) > 0\}\,\Pr\{B(\theta) > 0\}}{\Pr(T \le t)}$$

and

$$\Pr\{B(\theta) > 0 \mid T > t\} = \frac{\Pr\{T > t \mid B(\theta) > 0\}\,\Pr\{B(\theta) > 0\}}{\Pr(T > t)},$$

where $B(\theta) = Q(\theta_2) - P(\theta_1)$ and $\theta = (\theta_1, \theta_2)^T$. To account for censoring, Pencina and D'Agostino (2011) proposed to use the KM estimator to estimate the survival function, using data from all subjects for $\Pr(T \le t)$ and using subjects with B(θ) > 0 for estimation of $\Pr\{T \le t \mid B(\theta) > 0\}$. We refer to the resulting estimator as the ‘KM estimator’ hereafter.

Uno et al. (2009) considered estimating NRI(t) based on an inverse-probability-of-censoring weighted (IPW) estimator (hereafter referred to as the ‘IPW estimator’), with its key components estimated as

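$$\widehat{\Pr}_{\mathrm{IPW}}\{B(\theta) > 0 \mid T \le t\} = \frac{\sum_i I\{B_i(\hat\theta) > 0, X_i \le t\}\,\hat W_i(t)}{\sum_i I(X_i \le t)\,\hat W_i(t)} \tag{3.1}$$

$$\widehat{\Pr}_{\mathrm{IPW}}\{B(\theta) > 0 \mid T > t\} = \frac{\sum_i I\{B_i(\hat\theta) > 0, X_i > t\}\,\hat W_i(t)}{\sum_i I(X_i > t)\,\hat W_i(t)} \tag{3.2}$$

where $\hat\theta = (\hat\theta_1, \hat\theta_2)^T$, $\hat\theta_1 = (\hat\Lambda_{01}(t), \hat\beta_1^T)^T$, $\hat\theta_2 = (\hat\Lambda_{02}(t), \hat\beta_2^T)^T$, $\hat W_i(t) = \frac{I(X_i \le t)\,\delta_i}{\hat H(X_i)} + \frac{I(X_i > t)}{\hat H(t)}$ and $\hat H(\cdot)$ is the KM estimator of H(·) = P(C > ·). Due to the equivalence between the KM estimator and the IPW estimator for marginal survival functions under independent censoring (Satten and Datta 2001), the two estimators are likely to have very similar robustness and efficiency. Both estimators are consistent under an independent censoring assumption regardless of the adequacy of the two fitted models, $P(\theta_1)$ and $Q(\theta_2)$. This is particularly appealing for model comparisons.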
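The IPW components in (3.1) and (3.2) can be sketched in a few lines once the differences B_i = Q_i - P_i have been computed from the fitted models. The code below is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the risks are taken as given, and the handling of tied event and censoring times (the censoring survival function is evaluated at X_i itself) is an assumption.

```python
def km_censoring_survival(x, delta):
    """Return a function s -> H_hat(s), the Kaplan-Meier estimate of P(C > s).

    Censoring is treated as the 'event' here, so jumps occur at times with delta == 0.
    """
    censor_times = sorted(set(xi for xi, di in zip(x, delta) if di == 0))
    steps = []  # (censoring time, estimated survival just after that time)
    surv = 1.0
    for c in censor_times:
        at_risk = sum(1 for xi in x if xi >= c)
        n_cens = sum(1 for xi, di in zip(x, delta) if xi == c and di == 0)
        surv *= 1.0 - n_cens / at_risk
        steps.append((c, surv))

    def h_hat(s):
        value = 1.0
        for c, sv in steps:
            if c <= s:
                value = sv
            else:
                break
        return value

    return h_hat


def ipw_nri(x, delta, b, t):
    """IPW estimates of Pr(B > 0 | T <= t), Pr(B > 0 | T > t) and NRI(t)."""
    h_hat = km_censoring_survival(x, delta)
    h_t = h_hat(t)
    num_e = den_e = num_n = den_n = 0.0
    for xi, di, bi in zip(x, delta, b):
        if xi <= t:
            # Observed events before t get weight 1 / H_hat(X_i); censored subjects get 0.
            w = 1.0 / h_hat(xi) if di == 1 else 0.0
            den_e += w
            num_e += w * (bi > 0)
        else:
            # Subjects still under observation at t get weight 1 / H_hat(t).
            w = 1.0 / h_t
            den_n += w
            num_n += w * (bi > 0)
    pr_event = num_e / den_e
    pr_nonevent = num_n / den_n
    return pr_event, pr_nonevent, 2.0 * (pr_event - pr_nonevent)


# Hypothetical data: observed times, event indicators and sign-carrying differences B_i.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
delta = [1, 0, 1, 1, 0, 1]
b = [-1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
pr_event, pr_nonevent, nri_ipw = ipw_nri(x, delta, b, t=3.5)
```

In this toy data the only censoring time before t = 3.5 is 2.0, where five subjects are at risk, so the censoring survival estimate at 3.0 and at 3.5 is 0.8. The event at time 3.0 therefore carries weight 1.25, giving an event-group proportion of 1.25 / 2.25.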

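One potential weakness of both estimators is that they can be biased if censoring depends on a subset of $Y^{(2)}$. On the other hand, when model 2 is correctly specified, such covariate-dependent censoring can be incorporated based on the model since $C \perp T$ given $\beta_2^T Y^{(2)}$ or $Q(\theta_2)$. This motivates us to propose a more robust alternative to the Uno et al. (2009) estimator by estimating censoring probabilities given $Y^{(2)}$ via kernel smoothing over $Q(\theta_2)$. Let $H_q^{(1)}(t) = P(C > t \mid Q(\theta_2) = q, \Delta_i(\theta) = 1)$ and $H_q^{(\bullet)}(t) = P(C > t \mid Q(\theta_2) = q)$, where $\Delta_i(\theta) = I\{B_i(\theta) > 0\}$. To estimate NRI(t), we propose to modify equations (3.1) and (3.2) by considering the following more robust IPW censoring weights

$$\tilde W_i^{(\iota)}(t) = \frac{I(X_i \le t)\,\delta_i}{\tilde H^{(\iota)}_{Q_i(\hat\theta_2)}(X_i)} + \frac{I(X_i > t)}{\tilde H^{(\iota)}_{Q_i(\hat\theta_2)}(t)} \quad \text{for } \iota = 1 \text{ and } \bullet,$$

where

$$\tilde H_q^{(\iota)}(t) = \exp\{-\hat\Lambda_q^{(\iota)}(t)\} = \exp\left\{-\int_0^t \hat\pi_q^{(\iota)}(s)^{-1}\, d\hat N_{Cq}^{(\iota)}(s)\right\},$$

$$\hat N_{Cq}^{(\iota)}(s) = n^{-1} \sum_{i:\, \Delta_i(\hat\theta) \in \mathcal{S}_\iota} K_h\{Q_i(\hat\theta_2) - q\}\, N_{Ci}(s), \qquad \hat\pi_q^{(\iota)}(s) = n^{-1} \sum_{i:\, \Delta_i(\hat\theta) \in \mathcal{S}_\iota} K_h\{Q_i(\hat\theta_2) - q\}\, I(X_i \ge s),$$

$N_{Ci}(s) = I(X_i \le s)(1 - \delta_i)$, $\mathcal{S}_1 = \{1\}$ and $\mathcal{S}_\bullet = \{0, 1\}$, $K_h(x) = h^{-1} K(x/h)$, K is a symmetric kernel density function, and h = h(n) → 0 is the bandwidth. Note that $\{i : \Delta_i(\hat\theta) \in \mathcal{S}_1\}$ is simply the subset of individuals with $B_i(\hat\theta) > 0$ and $\{i : \Delta_i(\hat\theta) \in \mathcal{S}_\bullet\}$ is the set of all individuals. Consequently, we can use these more robust kernel smoothing weights in the IPW estimator to obtain the ‘Smooth-IPW (S-IPW) estimators’,

$$\widehat{\Pr}_{\text{S-IPW}}\{B(\theta) > 0 \mid T \le t\} = \frac{\sum_i \Delta_i(\hat\theta)\, \tilde W_i^{(1)}(t)\, I(X_i \le t)}{\sum_i \tilde W_i^{(\bullet)}(t)\, I(X_i \le t)} \quad \text{and} \tag{3.3}$$

$$\widehat{\Pr}_{\text{S-IPW}}\{B(\theta) > 0 \mid T > t\} = \frac{\sum_i \Delta_i(\hat\theta)\, \tilde W_i^{(1)}(t)\, I(X_i > t)}{\sum_i \tilde W_i^{(\bullet)}(t)\, I(X_i > t)}. \tag{3.4}$$

The resulting estimator for NRI(t) is

$$\widetilde{\mathrm{NRI}}(\hat\theta, t) = 2 \times \left[\widehat{\Pr}_{\text{S-IPW}}\{B(\hat\theta) > 0 \mid T \le t\} - \widehat{\Pr}_{\text{S-IPW}}\{B(\hat\theta) > 0 \mid T > t\}\right].$$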

The estimator can be shown to have the property of ‘double robustness’, i.e., it only requires that the risk model Q is correctly specified or that the independent censoring assumption holds.
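The kernel-smoothed censoring weights can be sketched as follows. This is an illustrative reading of the Smooth-IPW construction under several assumptions that are not in the paper: a Gaussian kernel, a user-supplied bandwidth h, risks Q_i and differences B_i taken as given, and hypothetical function names. The conditional censoring survival is computed as the exponential of minus a kernel-weighted Nelson-Aalen sum over censoring times, once within the subset with B_i > 0 and once over all subjects.

```python
import math


def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def smoothed_censoring_survival(q0, s, x, delta, q, h, subset):
    """Kernel-smoothed estimate of P(C > s | Q = q0) using only subjects in `subset`."""
    kern = {i: gaussian_kernel((q[i] - q0) / h) / h for i in subset}
    cum_hazard = 0.0
    for j in subset:
        # Each censoring time X_j <= s contributes its kernel weight divided by
        # the total kernel weight of subjects still at risk at X_j.
        if delta[j] == 0 and x[j] <= s and kern[j] > 0.0:
            at_risk = sum(kern[i] for i in subset if x[i] >= x[j])
            cum_hazard += kern[j] / at_risk
    return math.exp(-cum_hazard)


def s_ipw_nri(x, delta, b, q, t, h):
    """Smooth-IPW estimates of Pr(B > 0 | T <= t), Pr(B > 0 | T > t) and NRI(t)."""
    everyone = list(range(len(x)))
    up = [i for i in everyone if b[i] > 0]  # subjects with B_i > 0

    def weight(i, subset):
        if x[i] <= t and delta[i] == 0:
            return 0.0  # censored before t: no contribution
        s = x[i] if x[i] <= t else t
        return 1.0 / smoothed_censoring_survival(q[i], s, x, delta, q, h, subset)

    estimates = []
    for in_group in (lambda i: x[i] <= t, lambda i: x[i] > t):
        num = sum(weight(i, up) for i in up if in_group(i))
        den = sum(weight(i, everyone) for i in everyone if in_group(i))
        estimates.append(num / den)
    pr_event, pr_nonevent = estimates
    return pr_event, pr_nonevent, 2.0 * (pr_event - pr_nonevent)


# Hypothetical uncensored toy data: the weights are all 1, so the estimator
# reduces to simple proportions within the event and nonevent groups.
x = [1.0, 2.0, 2.5, 5.0, 6.0, 7.0]
delta = [1, 1, 1, 1, 1, 1]
b = [0.1, 0.2, -0.1, 0.2, -0.05, -0.1]
q = [0.3, 0.5, 0.4, 0.6, 0.05, 0.5]
result = s_ipw_nri(x, delta, b, q, t=3.0, h=0.2)
```

Bandwidth selection is not addressed in this sketch; in practice h would need to shrink with the sample size, as stated in the text.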

Additionally, to improve upon the efficiency of the class of nonparametric estimators, we propose considering a semiparametric estimator. Note that

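$$\Pr\{B(\theta) > 0 \mid T > t\} = \frac{E[E\{I(B(\theta) > 0, T > t) \mid Y^{(2)}\}]}{E[E\{I(T > t) \mid Y^{(2)}\}]} = \frac{E\{I(B(\theta) > 0)\, P(T > t \mid Y^{(2)})\}}{E\{P(T > t \mid Y^{(2)})\}}.$$

Therefore NRI(t) can be estimated semiparametrically as

$$\widehat{\mathrm{NRI}}(\hat\theta, t) = 2 \times \left\{\widehat{\Pr}_{\mathrm{SEM}}(B(\theta) > 0 \mid T \le t) - \widehat{\Pr}_{\mathrm{SEM}}(B(\theta) > 0 \mid T > t)\right\},$$

with the ‘SEM’ estimators,

$$\widehat{\Pr}_{\mathrm{SEM}}(B(\theta) > 0 \mid T \le t) = \frac{\sum_{i=1}^n \Delta_i(\hat\theta)\, Q_i(\hat\theta_2)}{\sum_{i=1}^n Q_i(\hat\theta_2)}, \tag{3.5}$$

$$\widehat{\Pr}_{\mathrm{SEM}}(B(\theta) > 0 \mid T > t) = \frac{\sum_{i=1}^n \Delta_i(\hat\theta)\, \{1 - Q_i(\hat\theta_2)\}}{\sum_{i=1}^n \{1 - Q_i(\hat\theta_2)\}}. \tag{3.6}$$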
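Because (3.5) and (3.6) depend on the data only through the fitted risks, the semiparametric estimator is short to sketch. The function name and toy risks below are hypothetical, and the fitted values P_i and Q_i are assumed to have been obtained elsewhere, for example from Cox model fits.

```python
def sem_nri(p, q):
    """Semiparametric estimates of Pr(B > 0 | T <= t), Pr(B > 0 | T > t) and NRI(t).

    p, q: fitted t-year risks under the old and new models for each subject.
    """
    up = [1.0 if qi - pi > 0 else 0.0 for pi, qi in zip(p, q)]
    # Events are weighted by the model-based event probability Q_i,
    # nonevents by the model-based survival probability 1 - Q_i.
    pr_event = sum(d * qi for d, qi in zip(up, q)) / sum(q)
    pr_nonevent = sum(d * (1.0 - qi) for d, qi in zip(up, q)) / sum(1.0 - qi for qi in q)
    return pr_event, pr_nonevent, 2.0 * (pr_event - pr_nonevent)


# Hypothetical fitted risks for two subjects.
sem_event, sem_nonevent, sem_total = sem_nri([0.2, 0.5], [0.4, 0.3])
```

No censoring weights appear here, which is why the validity of this estimator rests on the new risk model being correctly specified.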

Under the correctly specified model $Q(\theta_2)$, the semiparametric estimator accommodates covariate-dependent censoring and would be more efficient than the Smooth-IPW estimator. In practice, to estimate NRI(t), if estimates from such a semiparametric method agree well with those of the nonparametric methods, one may choose to report results based on the semiparametric method for additional gain in efficiency. To automate the procedure, we suggest considering a combined estimator (hereafter referred to as the ‘combined estimator’), which takes the form

$$\hat p \times \widehat{\mathrm{NRI}}(\hat\theta, t) + (1 - \hat p) \times \widetilde{\mathrm{NRI}}(\hat\theta, t),$$

with $\hat p$ being a weight that depends on the aptness of the semiparametric model. For example, $\hat p$ can be taken to be the p-value from a consistent test of the proportional hazards assumption for a Cox regression model fit. Such an estimator provides a simple procedure which is robust over a wide variety of situations. In numerical studies, we show that such a combined estimator can be more efficient compared with the nonparametric estimators, while maintaining the double robustness property.
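The weighting itself is a convex combination, sketched below. The function name is hypothetical, and the weight is assumed to be supplied by the caller, for instance as the p-value of a proportional hazards test computed with whatever survival software is in use.

```python
def combined_nri(nri_sem, nri_sipw, p_hat):
    """Weighted average of the semiparametric and Smooth-IPW estimates of NRI(t)."""
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must lie in [0, 1]")
    # A large p_hat (little evidence against the Cox model) favours the SEM estimate;
    # a small p_hat shifts the weight to the Smooth-IPW estimate.
    return p_hat * nri_sem + (1.0 - p_hat) * nri_sipw


combined = combined_nri(nri_sem=0.40, nri_sipw=0.50, p_hat=0.25)
```

With these hypothetical inputs the combined estimate is 0.25 × 0.40 + 0.75 × 0.50 = 0.475.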

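We note that the proposed estimators can be easily generalized to NRI based on risk categories. Consider a situation where individuals are classified as low, intermediate or high risk: low risk if their risks are below $r_1$, and high risk if their risks are above $r_2$. The reclassification accuracy of risk models in such a setting can be quantified with a 3-category NRI of the form $\mathrm{NRI}_{\mathrm{category}}(\hat\theta, t) = P(\text{up} \mid T \le t) - P(\text{down} \mid T \le t) + P(\text{down} \mid T > t) - P(\text{up} \mid T > t)$. To estimate $P(\text{up} \mid T \le t)$ and $P(\text{up} \mid T > t)$, we may simply replace $\Delta_i(\hat\theta)$ with $\Omega_i^{\mathrm{up}}(\hat\theta) = I(P_i(\theta_1) \le r_1, Q_i(\theta_2) > r_1) + I(r_1 < P_i(\theta_1) \le r_2, Q_i(\theta_2) > r_2)$ in Eqs. 3.3 and 3.4, respectively. Similarly, to estimate $P(\text{down} \mid T \le t)$ and $P(\text{down} \mid T > t)$, one may replace $\Delta_i(\hat\theta)$ with $\Omega_i^{\mathrm{down}}(\hat\theta) = I(Q_i(\theta_2) \le r_1, P_i(\theta_1) > r_1) + I(r_1 < Q_i(\theta_2) \le r_2, P_i(\theta_1) > r_2)$ in Eqs. 3.3 and 3.4. One may likewise obtain a semiparametric estimator of $\mathrm{NRI}_{\mathrm{category}}(\hat\theta, t)$ by replacing $\Delta_i(\hat\theta)$ with $\Omega_i^{\mathrm{up}}(\hat\theta)$ or $\Omega_i^{\mathrm{down}}(\hat\theta)$ in Eqs. 3.5 and 3.6.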
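The up and down indicators for the 3-category case can be sketched directly from their definitions. The function name, the thresholds and the toy risks are hypothetical.

```python
def reclassification_indicators(p, q, r1, r2):
    """Return (up, down) indicator lists for the 3-category NRI.

    p, q: risks under the old and new models; r1 < r2 are the category thresholds.
    """
    up, down = [], []
    for pi, qi in zip(p, q):
        # Up: low -> intermediate/high, or intermediate -> high.
        up.append(int((pi <= r1 and qi > r1) or (r1 < pi <= r2 and qi > r2)))
        # Down: the same conditions with the roles of the two models exchanged.
        down.append(int((qi <= r1 and pi > r1) or (r1 < qi <= r2 and pi > r2)))
    return up, down


# Hypothetical risks with thresholds at 10% and 20%.
p = [0.05, 0.15, 0.25, 0.15]
q = [0.12, 0.25, 0.15, 0.18]
up, down = reclassification_indicators(p, q, r1=0.10, r2=0.20)
```

The first two subjects move up one category, the third moves down from high to intermediate, and the fourth stays in the intermediate category, so `up` is `[1, 1, 0, 0]` and `down` is `[0, 0, 1, 0]`. These indicators would then take the place of the indicator of B > 0 in the weighted sums.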

4 Inference

To make inference about $\widetilde{\mathrm{NRI}}(\hat\theta, t)$, we study the asymptotic properties of the proposed estimators. In the Appendix, we show that $\widetilde{\mathrm{NRI}}(\hat\theta, t)$ is uniformly consistent for $\mathrm{NRI}(\theta_0, t)$, where $\theta_0 = (\Lambda_{k0}(\cdot), \beta_{k0}^T)^T$ with $\beta_{k0}$ being the unique maximizer of the expected value of the corresponding partial likelihood. Furthermore, we show that the process $\widetilde{W}(t) = \sqrt{n}\{\widetilde{\mathrm{NRI}}(\hat\theta, t) - \mathrm{NRI}(\theta_0, t)\}$ is asymptotically equivalent to a sum of i.i.d. terms, $n^{-1/2} \sum_{i=1}^n \epsilon_i(t)$, where $\epsilon_i(t)$ is defined in the Appendix. By a functional central limit theorem of Pollard (1990), the process $\widetilde{W}(t)$ converges weakly to a mean zero Gaussian process in t. We also show that $\widehat{\mathrm{NRI}}(\hat\theta, t)$ is uniformly consistent for $\mathrm{NRI}(\theta_0, t)$, and that the process $\widetilde{N}(t) = \sqrt{n}[\widehat{\mathrm{NRI}}(\hat\theta, t) - \mathrm{NRI}(\theta_0, t)]$ is asymptotically equivalent to a sum of i.i.d. terms $n^{-1/2} \sum_{i=1}^n \zeta_i(t)$, where $\zeta_i(t)$ is defined in the Appendix. Again, by a functional central limit theorem, the process $\widetilde{N}(t)$ converges weakly to a mean zero Gaussian process in t. With weak convergence of both $\widehat{\mathrm{NRI}}(\hat\theta, t)$ and $\widetilde{\mathrm{NRI}}(\hat\theta, t)$, it follows that the combined estimator converges to a zero-mean process. Due to the variation in $\hat p$, the limiting process of the combined estimator may not be Gaussian. We show in our simulation that, to make inference, resampling procedures such as a bootstrap method can provide a valid approximation of the limit distribution. Specifically, at the bth bootstrap iteration, b = 1, …, B, we draw a random sample with replacement from the original dataset and fit the old and new risk models on the sampled dataset, denoted $P_b(\hat\theta)$ and $Q_b(\hat\theta)$. The estimates from the fitted models are then used to calculate $\widehat{\mathrm{NRI}}_b(\hat\theta, t)$ and $\widetilde{\mathrm{NRI}}_b(\hat\theta, t)$ based on the bootstrapped sample. This procedure is repeated B times, and confidence intervals can be constructed either by the percentile method or by a normal approximation in which the standard error is calculated as the empirical standard deviation of $\{\widehat{\mathrm{NRI}}_b(\hat\theta, t), b = 1, \ldots, B\}$ and $\{\widetilde{\mathrm{NRI}}_b(\hat\theta, t), b = 1, \ldots, B\}$. Inference for the combined estimator can be made similarly by recalculating the weights on each bootstrap sample in addition to $\widehat{\mathrm{NRI}}_b(\hat\theta, t)$ and $\widetilde{\mathrm{NRI}}_b(\hat\theta, t)$.
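The resampling scheme can be sketched generically. In the sketch below, `estimator` is a hypothetical callable that takes a list of subject records and returns a single number; to mirror the procedure described above it should refit both risk models on the records it receives before computing NRI(t). The index rule used for the percentile interval is one simple convention among several and is an assumption.

```python
import math
import random
import statistics


def bootstrap_ci(rows, estimator, n_boot=200, level=0.95, seed=0):
    """Percentile and normal-approximation bootstrap confidence intervals."""
    rng = random.Random(seed)
    n = len(rows)
    point = estimator(rows)
    reps = []
    for _ in range(n_boot):
        # Resample subjects with replacement and recompute the estimate.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        reps.append(estimator(sample))
    reps.sort()
    alpha = 1.0 - level
    lo_idx = int(math.floor(alpha / 2.0 * (n_boot - 1)))
    hi_idx = int(math.ceil((1.0 - alpha / 2.0) * (n_boot - 1)))
    percentile = (reps[lo_idx], reps[hi_idx])
    # Bootstrap standard error: standard deviation of the replicates.
    se = statistics.stdev(reps)
    z = statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)
    normal = (point - z * se, point + z * se)
    return point, se, percentile, normal


# Usage with a stand-in estimator (the sample mean) on hypothetical data.
rows = [float(i) for i in range(1, 51)]
point, se, percentile, normal = bootstrap_ci(rows, statistics.mean)
```

For the combined estimator, the callable would also recompute the weight on each resampled dataset, as described in the text.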

In the absence of an independent validation set, often in practice the same dataset is used for both fitting the model with several predictors and calculating a measure such as NRI(t). Such an ‘apparent’ summary may potentially lead to the so-called ‘overfitting’ phenomenon, i.e. estimates of model performance will tend to be more optimistic compared with the corresponding estimates if the model were applied to a new dataset. Several methods for correcting the bias from apparent estimates can be considered. The 0.632 bootstrap method (Efron and Tibshirani 1997) has been shown to have better performance compared with a simple cross-validated approach. The estimator was derived in our simulation as follows: we first obtained a bootstrapped estimate $\widehat{\mathrm{NRI}}_{bt}(t)$ by sampling the data with replacement to obtain the training set. The training set is used to estimate the model parameters $\{\hat\theta_k^{(\mathrm{train})}, k = 1, 2\}$. The remaining subjects make up the validation set, and are used to calculate the various estimates of NRI using parameter values $\{\hat\theta_k^{(\mathrm{train})}, k = 1, 2\}$. This is repeated B times and $\widehat{\mathrm{NRI}}_{bt}(t)$ is the mean across the repetitions. The 0.632 bootstrap estimate is

$$\widehat{\mathrm{NRI}}_{0.632bt}(t) = 0.632\, \widehat{\mathrm{NRI}}_{bt}(t) + (1 - 0.632)\, \widehat{\mathrm{NRI}}_{\mathrm{apparent}}(t),$$

where $\widehat{\mathrm{NRI}}_{\mathrm{apparent}}(t)$ is the estimate without using cross-validation. To construct a confidence interval based on $\widehat{\mathrm{NRI}}_{0.632bt}(t)$, we follow the suggestions given in Tian et al. (2007) and Uno et al. (2007) and shift the confidence interval based on the apparent estimate by the estimated bias, $\widehat{\mathrm{bias}} = \widehat{\mathrm{NRI}}_{\mathrm{apparent}}(t) - \widehat{\mathrm{NRI}}_{0.632bt}(t)$. Specifically, if [L, R] is the confidence interval calculated based on the procedure described above, then the bias corrected confidence interval is $[L - \widehat{\mathrm{bias}}, R - \widehat{\mathrm{bias}}]$.
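The arithmetic of the 0.632 correction and the shifted interval is sketched below, together with a helper that splits subject indices into a bootstrap training set and the left-out validation set. Function names and numeric inputs are hypothetical; refitting the models and computing NRI(t) on each split are left to the caller.

```python
import random


def out_of_bag_split(n, rng):
    """Draw a bootstrap training set; subjects never drawn form the validation set."""
    train = [rng.randrange(n) for _ in range(n)]
    in_train = set(train)
    validation = [i for i in range(n) if i not in in_train]
    return train, validation


def bootstrap_632(nri_apparent, nri_bt, ci_apparent):
    """0.632 bootstrap estimate, estimated bias and bias-shifted confidence interval."""
    nri_632 = 0.632 * nri_bt + (1.0 - 0.632) * nri_apparent
    bias = nri_apparent - nri_632
    lower, upper = ci_apparent
    return nri_632, bias, (lower - bias, upper - bias)


train, validation = out_of_bag_split(20, random.Random(1))
# Hypothetical inputs: apparent estimate 0.50, mean validation-set estimate 0.40.
nri_632, bias, shifted_ci = bootstrap_632(0.50, 0.40, (0.30, 0.70))
```

With these inputs the corrected estimate is 0.632 × 0.40 + 0.368 × 0.50 = 0.4368, the estimated bias is 0.0632, and the interval [0.30, 0.70] shifts to [0.2368, 0.6368].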

5 Simulation studies

To examine the performance of various NRI(t) estimators, we conducted simulation studies under several different scenarios. Throughout we chose n = 500 and used 200 bootstrap samples to calculate standard errors. Results for each setting were produced from 1,000 simulations. We calculated NRI(t) at t = 3 to compare two risk models, using the KM, IPW, Smooth-IPW, SEM and combined estimators described in Sect. 3. We fitted Cox regression models to calculate risks for both the new and existing models using the corresponding predictors.

For the first setting presented in Table 1, two predictors Y1 and Y2 were simulated from a multivariate normal distribution with mean (0, 0.5), σy1 = σy2 = 1 and a correlation ρ of 0.25. The relationship between survival time T and Y followed a proportional hazards model with parameters β1 = log(3) and β2 equal to log(1.5). Censoring time was generated from a U(0, a) distribution where a was chosen to produce approximately 40% censoring. Note that in this setting, model Q is correctly specified and the independent censoring assumption is correct. We took the baseline model to consist of Y1 and the new model to include both predictors. As expected, all estimators shown in Table 1 provide unbiased estimates. The bootstrap-based variance estimators perform well with coverage percentage close to the 95% nominal level. Since the risk based on the new model is correctly specified, the semiparametric method is the most efficient. Improvement in efficiency over the nonparametric procedures is observed with our combined estimators.
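A data-generating sketch for this first setting is given below. Two inputs are assumptions because the text does not state them: a constant baseline hazard (the hypothetical `baseline_hazard`) and the upper limit `a` of the censoring distribution, which the authors tuned to give roughly 40% censoring. The function name is also hypothetical.

```python
import math
import random


def simulate_setting_one(n, a, baseline_hazard=0.1, seed=0):
    """Simulate (X, delta, Y1, Y2) under proportional hazards and uniform censoring."""
    rng = random.Random(seed)
    rho = 0.25
    beta1, beta2 = math.log(3.0), math.log(1.5)
    data = []
    for _ in range(n):
        # Bivariate normal predictors with means (0, 0.5), unit variances, correlation rho.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y1 = z1
        y2 = 0.5 + rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        # Exponential event time: constant baseline hazard is an assumption of this sketch.
        rate = baseline_hazard * math.exp(beta1 * y1 + beta2 * y2)
        t_event = rng.expovariate(rate)
        # Censoring independent of the predictors and of the event time.
        c = rng.uniform(0.0, a)
        data.append((min(t_event, c), 1 if t_event <= c else 0, y1, y2))
    return data


data = simulate_setting_one(n=500, a=20.0)
censoring_fraction = 1.0 - sum(d for _, d, _, _ in data) / len(data)
```

In practice one would adjust `a` (and the assumed baseline hazard) until `censoring_fraction` is close to the 40% described in the text.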

Table 1.

Simulation results under noninformative censoring and correctly specified new risk model: mean of bias (mean(Bias)) and standard deviation (Std. Dev.) of the estimated parameters across simulations, the mean of the standard error estimates calculated for each simulation using bootstrapping (mean(std error)), and coverage of the 95 % bootstrap confidence interval based on the normal approximation

| Method | Statistic | $\Pr(Q_i - P_i > 0 \mid T_i \le t)$ | $\Pr(Q_i - P_i > 0 \mid T_i > t)$ | NRI(t) |
| --- | --- | --- | --- | --- |
| True values | | 0.592 | 0.358 | 0.468 |
| KM | Mean(Bias) | 0.003 | 0.001 | 0.002 |
| KM | Std. Dev. | 0.034 | 0.030 | 0.104 |
| KM | Mean(std error) | 0.034 | 0.030 | 0.103 |
| KM | 95 % bootstrap CI cov. | 0.946 | 0.946 | 0.946 |
| IPW | Mean(Bias) | 0.002 | 0.002 | −0.001 |
| IPW | Std. Dev. | 0.034 | 0.030 | 0.105 |
| IPW | Mean(std error) | 0.034 | 0.031 | 0.104 |
| IPW | 95 % bootstrap CI cov. | 0.943 | 0.95 | 0.951 |
| Smooth IPW | Mean(Bias) | 0.001 | 0.003 | −0.003 |
| Smooth IPW | Std. Dev. | 0.034 | 0.030 | 0.104 |
| Smooth IPW | Mean(std error) | 0.034 | 0.030 | 0.103 |
| Smooth IPW | 95 % bootstrap CI cov. | 0.946 | 0.942 | 0.949 |
| SEM | Mean(Bias) | 0.001 | 0.003 | −0.003 |
| SEM | Std. Dev. | 0.024 | 0.029 | 0.082 |
| SEM | Mean(std error) | 0.025 | 0.028 | 0.080 |
| SEM | 95 % bootstrap CI cov. | 0.952 | 0.942 | 0.937 |
| Combined | Mean(Bias) | 0.002 | 0.003 | −0.002 |
| Combined | Std. Dev. | 0.029 | 0.028 | 0.089 |
| Combined | Mean(std error) | 0.031 | 0.029 | 0.095 |
| Combined | 95 % bootstrap CI cov. | 0.968 | 0.949 | 0.969 |

KM Kaplan–Meier estimator, IPW inverse probability weighted estimator, Smooth IPW smooth inverse probability weighted estimator, SEM semiparametric estimator, Combined combined estimator, as defined in the text

Under this setting we also considered a null model where β2 = 0, i.e. there is no incremental value of the new marker and NRI(t) = 0. We found that in this situation all estimators tend to slightly overestimate NRI(t), and variance estimators based on the bootstrap tend to be conservative (see Table 2). We do not recommend calculating NRI(t) in the case when the new marker does not independently predict outcome in a model with conventional predictors. Note that all theoretical results in the Appendix are derived under the assumption that β2 ≠ 0 and thus our proposed procedures are only valid under this assumption. In practice, if the null setting is a likely possibility, estimation should be treated with care.

Table 2.

Simulation results under noninformative censoring and correctly specified new risk model with mean of bias (mean(Bias)) and standard deviation (Std. Dev.) of the estimated parameters across simulations, the mean of the standard error estimates calculated for each simulation using bootstrapping (mean(std error)), and coverage of the 95 % bootstrap confidence interval based on the normal approximation. Data is generated under the null model that β2 = 0

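Null model: β2 = 0

| Method | Statistic | $\Pr(Q_i - P_i > 0 \mid T_i \le t)$ | $\Pr(Q_i - P_i > 0 \mid T_i > t)$ | NRI(t) |
| --- | --- | --- | --- | --- |
| True values | | 0.5 | 0.5 | 0 |
| KM | Mean(Bias) | 0.01 | −0.02 | 0.061 |
| KM | Std. Dev. | 0.034 | 0.026 | 0.091 |
| KM | Mean(std error) | 0.043 | 0.033 | 0.118 |
| KM | 95 % bootstrap CI cov. | 0.996 | 0.971 | 0.98 |
| IPW | Mean(Bias) | 0.01 | −0.019 | 0.058 |
| IPW | Std. Dev. | 0.034 | 0.026 | 0.092 |
| IPW | Mean(std error) | 0.044 | 0.033 | 0.119 |
| IPW | 95 % bootstrap CI cov. | 0.996 | 0.972 | 0.981 |
| Smooth IPW | Mean(Bias) | 0.009 | −0.019 | 0.055 |
| Smooth IPW | Std. Dev. | 0.034 | 0.026 | 0.092 |
| Smooth IPW | Mean(std error) | 0.044 | 0.033 | 0.118 |
| Smooth IPW | 95 % bootstrap CI cov. | 0.996 | 0.972 | 0.981 |
| SEM | Mean(Bias) | 0.009 | −0.019 | 0.057 |
| SEM | Std. Dev. | 0.023 | 0.025 | 0.067 |
| SEM | Mean(std error) | 0.029 | 0.031 | 0.081 |
| SEM | 95 % bootstrap CI cov. | 0.99 | 0.967 | 0.957 |
| Combined | Mean(Bias) | 0.008 | −0.019 | 0.055 |
| Combined | Std. Dev. | 0.029 | 0.025 | 0.077 |
| Combined | Mean(std error) | 0.039 | 0.032 | 0.104 |
| Combined | 95 % bootstrap CI cov. | 0.997 | 0.971 | 0.977 |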

KM Kaplan–Meier estimator, IPW inverse probability weighted estimator, Smooth IPW smooth inverse probability weighted estimator, SEM semiparametric estimator, Combined combined estimator, as defined in the text

The second setting we considered was identical to the first setting, except that censoring time was dependent on marker values. Here, censoring time,

$$C = UB + \exp(X - 3Y_2)(1 - B),$$

where U was generated from a Uniform(0, a) distribution with a chosen to yield about 40% censoring, X was generated from a N(0, 1) distribution and B was generated from a N(2·Y1, 1) distribution. Note that in this setting, model Q is correctly specified but the independent censoring assumption is not correct. As seen in the results presented in Table 3, the KM estimator yields biased estimates of both NRI(t) and its two key components. The IPW estimator is biased for both $\Pr(Q > P \mid T \le t)$ and NRI(t), whereas the smooth-IPW estimator substantially alleviates such biases. However, we observed larger variation in the nonparametric estimators of NRI(t) than in the semiparametric and combined estimators (Table 3).

Table 3.

Simulation results under covariate-dependent censoring and correctly specified new risk model with mean of bias (mean(Bias)) and standard deviation (Std. Dev.) of the estimated parameters across simulations, the mean of the standard error estimates calculated for each simulation using bootstrapping (mean(std error)), and coverage of the 95 % bootstrap confidence interval based on the normal approximation

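| Method | Statistic | $\Pr(Q_i - P_i > 0 \mid T_i \le t)$ | $\Pr(Q_i - P_i > 0 \mid T_i > t)$ | NRI(t) |
| --- | --- | --- | --- | --- |
| True values | | 0.611 | 0.45 | 0.322 |
| KM | Mean(Bias) | 0.067 | −0.062 | 0.259 |
| KM | Std. Dev. | 0.040 | 0.040 | 0.126 |
| KM | Mean(std error) | 0.041 | 0.040 | 0.129 |
| KM | 95 % bootstrap CI cov. | 0.615 | 0.659 | 0.483 |
| IPW | Mean(Bias) | −0.024 | 0.005 | −0.057 |
| IPW | Std. Dev. | 0.034 | 0.045 | 0.131 |
| IPW | Mean(std error) | 0.035 | 0.044 | 0.130 |
| IPW | 95 % bootstrap CI cov. | 0.897 | 0.944 | 0.918 |
| Smooth IPW | Mean(Bias) | −0.013 | 0.007 | −0.038 |
| Smooth IPW | Std. Dev. | 0.041 | 0.041 | 0.133 |
| Smooth IPW | Mean(std error) | 0.040 | 0.040 | 0.132 |
| Smooth IPW | 95 % bootstrap CI cov. | 0.937 | 0.939 | 0.941 |
| SEM | Mean(Bias) | 0 | −0.001 | 0.002 |
| SEM | Std. Dev. | 0.025 | 0.039 | 0.098 |
| SEM | Mean(std error) | 0.026 | 0.037 | 0.095 |
| SEM | 95 % bootstrap CI cov. | 0.951 | 0.932 | 0.938 |
| Combined | Mean(Bias) | −0.006 | 0.002 | −0.016 |
| Combined | Std. Dev. | 0.031 | 0.039 | 0.109 |
| Combined | Mean(std error) | 0.035 | 0.039 | 0.117 |
| Combined | 95 % bootstrap CI cov. | 0.975 | 0.951 | 0.971 |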

KM Kaplan–Meier estimator, IPW inverse probability weighted estimator, Smooth IPW smooth inverse probability weighted estimator, SEM semiparametric estimator, Combined combined estimator, as defined in the text

The third setting we investigated considers the case where survival time depends on four markers Yi, for i = 1, . . . , 4, but we only have access to the first two. In particular, Y comes from a multivariate normal distribution with mean 0, and σij = 1 for i = j and 0.25 otherwise. Survival time relates to the marker data through a model where the hazard function is specified as λ(t|Y) = 0.1*{3Y1 + 1.5Y2 + 2Y3 + 2.5Y4 + exp(3Y1)}. Note that in this setting, model Q is misspecified as depending only on Y1 and Y2. Censoring time in this setting is generated in the same way as in the first setting, and does not depend on T or Y. Since the SEM estimator misspecifies the relationship between T and Y as λ(t|Y) = λ0 exp(β1Y1 + β2Y2), it yields biased results. All other estimators are unbiased (Table 4). Throughout the three settings we considered, the combined estimator remained unbiased and more efficient than the nonparametric estimators.

Table 4.

Simulation results under noninformative censoring and misspecified new risk model with mean of bias (mean(Bias)) and standard deviation (Std. Dev.) of the estimated parameters across simulations, the mean of the standard error estimates calculated for each simulation using bootstrapping (mean(std error)), and coverage of the 95 % bootstrap confidence interval based on the normal approximation

| Method | Statistic | $\Pr(Q_i - P_i > 0 \mid T_i \le t)$ | $\Pr(Q_i - P_i > 0 \mid T_i > t)$ | NRI(t) |
| --- | --- | --- | --- | --- |
| True values | | 0.646 | 0.395 | 0.504 |
| KM | Mean(Bias) | 0.007 | −0.002 | 0.016 |
| KM | Std. Dev. | 0.072 | 0.023 | 0.160 |
| KM | Mean(std error) | 0.074 | 0.024 | 0.164 |
| KM | 95 % bootstrap CI cov. | 0.94 | 0.945 | 0.947 |
| IPW | Mean(Bias) | 0.004 | −0.001 | 0.008 |
| IPW | Std. Dev. | 0.072 | 0.023 | 0.160 |
| IPW | Mean(std error) | 0.074 | 0.024 | 0.165 |
| IPW | 95 % bootstrap CI cov. | 0.945 | 0.942 | 0.95 |
| Smooth IPW | Mean(Bias) | 0.003 | −0.001 | 0.007 |
| Smooth IPW | Std. Dev. | 0.072 | 0.023 | 0.160 |
| Smooth IPW | Mean(std error) | 0.074 | 0.024 | 0.164 |
| Smooth IPW | 95 % bootstrap CI cov. | 0.943 | 0.946 | 0.95 |
| SEM | Mean(Bias) | −0.046 | 0.003 | −0.099 |
| SEM | Std. Dev. | 0.022 | 0.022 | 0.068 |
| SEM | Mean(std error) | 0.022 | 0.023 | 0.068 |
| SEM | 95 % bootstrap CI cov. | 0.448 | 0.943 | 0.682 |
| Combined | Mean(Bias) | −0.009 | 0.000 | −0.020 |
| Combined | Std. Dev. | 0.057 | 0.022 | 0.128 |
| Combined | Mean(std error) | 0.062 | 0.023 | 0.139 |
| Combined | 95 % bootstrap CI cov. | 0.970 | 0.947 | 0.976 |

KM Kaplan–Meier estimator, IPW inverse probability weighted estimator, Smooth IPW smooth inverse probability weighted estimator, SEM semiparametric estimator, Combined combined estimator, as defined in the text

To evaluate the overfitting-correction procedures described in Sect. 4, we simulated 10 markers from a multivariate normal distribution with mean 0, σYi = 1 and pairwise correlations equal to 0.25. The number of parameters and sample size were chosen to mimic the setting of our data example described in Sect. 6. We consider a Cox model for failure time, with hazard ratio parameters for the 10 markers specified as β = (log(2), log(.77), 0, log(1.81), 0, 0, 0, log(0.5), 0, log(1.2)). The baseline model consists only of the first marker. To derive a new model based on the information on all 10 markers, for each simulation, we first fit a model with all ten markers. The expanded model consists of all markers whose β is significantly different from zero at the α = 0.05 level. We found that in the case of estimating NRI, under our simulated scenario, the apparent summaries are quite close to the true values in many cases. Since the bias is at the rate of g/N, where g is the number of predictors under consideration for risk model building, overfitting may be of more concern when large numbers of genetic markers are involved with a relatively small sample size. In situations where there is a slight indication of overfitting, the 0.632 bootstrap procedure appears to be adequate in correcting the bias (see Table 5).

Table 5.

Simulation results comparing apparent estimates and the 0.632 bootstrap for correcting overfitting

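| Estimator | Version | Statistic | $\Pr(Q_i - P_i > 0 \mid T_i \le t)$ | $\Pr(Q_i - P_i > 0 \mid T_i > t)$ | NRI(t) |
| --- | --- | --- | --- | --- | --- |
| True values | | | 0.684 | 0.275 | 0.817 |
| Smooth IPW | Apparent | Mean(Bias) | 0.000 | 0.004 | −0.007 |
| Smooth IPW | Apparent | Std. Dev. | 0.036 | 0.028 | 0.108 |
| Smooth IPW | Apparent | CI coverage | 0.962 | 0.963 | 0.964 |
| Smooth IPW | 0.632 Bootstrap | Mean(Bias) | −0.008 | 0.008 | −0.032 |
| Smooth IPW | 0.632 Bootstrap | Std. Dev. | 0.034 | 0.027 | 0.102 |
| Smooth IPW | 0.632 Bootstrap | CI coverage | 0.971 | 0.969 | 0.968 |
| Smooth IPW | Bootstrapped SE | Mean(std error) | 0.039 | 0.030 | 0.114 |
| SEM | Apparent | Mean(Bias) | 0.003 | −0.001 | 0.009 |
| SEM | Apparent | Std. Dev. | 0.023 | 0.025 | 0.072 |
| SEM | Apparent | CI coverage | 0.955 | 0.954 | 0.945 |
| SEM | 0.632 Bootstrap | Mean(Bias) | 0.005 | −0.003 | 0.015 |
| SEM | 0.632 Bootstrap | Std. Dev. | 0.022 | 0.024 | 0.072 |
| SEM | 0.632 Bootstrap | CI coverage | 0.953 | 0.962 | 0.937 |
| SEM | Bootstrapped SE | Mean(std error) | 0.024 | 0.025 | 0.071 |
| Combined | Apparent | Mean(Bias) | 0.001 | 0.001 | 0.001 |
| Combined | Apparent | Std. Dev. | 0.028 | 0.026 | 0.087 |
| Combined | Apparent | CI coverage | 0.982 | 0.969 | 0.975 |
| Combined | 0.632 Bootstrap | Mean(Bias) | −0.002 | 0.003 | −0.008 |
| Combined | 0.632 Bootstrap | Std. Dev. | 0.027 | 0.025 | 0.085 |
| Combined | 0.632 Bootstrap | CI coverage | 0.989 | 0.975 | 0.983 |
| Combined | Bootstrapped SE | Mean(std error) | 0.035 | 0.028 | 0.102 |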

Smooth IPW smooth inverse probability weighted estimator, SEM semiparametric estimator, Combined combined estimator, as defined in the text
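The 0.632 correction compared in Table 5 can be sketched generically. The weights 0.368 and 0.632 are those of the standard 0.632 bootstrap estimator; everything else here is an assumption for illustration, in particular the callback `evaluate(train_idx, test_idx)`, which stands in for "fit the risk models on the training subjects and compute the accuracy summary on the test subjects", and the toy squared-error statistic used in place of NRI.

```python
import numpy as np


def bootstrap_632(n, evaluate, n_boot=200, seed=0):
    """Combine apparent and out-of-bag estimates with 0.368/0.632 weights.

    evaluate(train_idx, test_idx) is a hypothetical callback that fits the
    risk model(s) on train_idx and returns the accuracy summary on test_idx.
    Assumes n >= 2 so that out-of-bag sets are non-empty in practice.
    """
    rng = np.random.default_rng(seed)
    all_idx = np.arange(n)
    apparent = evaluate(all_idx, all_idx)      # fit and evaluate on the same data

    oob = []
    for _ in range(n_boot):
        train = rng.integers(0, n, size=n)     # resample with replacement
        test = np.setdiff1d(all_idx, train)    # subjects left out of the resample
        if test.size > 0:
            oob.append(evaluate(train, test))
    return 0.368 * apparent + 0.632 * float(np.mean(oob))


# Toy usage: squared error of predicting each value by the training mean.
data = np.random.default_rng(1).normal(size=50)


def squared_error(train_idx, test_idx):
    return float(np.mean((data[test_idx] - data[train_idx].mean()) ** 2))


estimate = bootstrap_632(data.size, squared_error)
print(estimate)
```

To apply the same idea to NRI, `evaluate` would refit both risk models on the resampled subjects and return the NRI estimate computed on the left-out subjects.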

6 Example

The Framingham risk model (FRM) has been used for population-wide CVD risk assessment. The model was developed based on several common clinical risk factors, including age, gender, total cholesterol level, high-density lipoprotein (HDL) cholesterol level, smoking, systolic blood pressure and high blood pressure treatment (Wilson et al. 1998). To improve the predictive capacity of the FRM, a new risk model has been developed recently using data from the Women's Health Study (Cook et al. 2006), based on the variables in the Framingham risk model and an inflammation marker, C-reactive protein (CRP). Prior to adopting the new model in routine practice, it is important to quantify its prediction performance, especially in comparison to that of the FRM. We illustrate here how our proposed procedures can be used to evaluate and compare the clinical utility of the two risk models using an independent dataset from the Framingham Offspring Study (Kannel et al. 1979).

The Framingham Offspring Study was established in 1971 with 5,124 participants who were monitored prospectively for epidemiological and genetic risk factors for CVD. We consider here 1,728 female participants who had a CRP measurement and other clinical information at the second exam and were free of CVD at the time of examination. The average age of this subset was about 44 years (standard deviation = 10). The outcome we consider is the time from exam date to first major CVD event, including CVD-related death. During the follow-up period 269 participants were observed to experience at least one CVD event, and the 10-year event rate was about 4%. For illustration we chose t = 10 years as in Wilson et al. (1998). For each individual, two risk scores were calculated: one based on the FRM (Model 1), combining information on age, systolic blood pressure, smoking status, high-density lipoprotein (HDL), total cholesterol and medication for hypertension; the other based on an algorithm developed in Cook et al. (2006) (Model 2), with the addition of CRP concentration. We use Cox models to specify the relation between the time to CVD event and the model scores (linear predictors from the models).

Both models are well calibrated based on calibration plots (not shown). For comparison, we first give AUC results and use the bootstrap to obtain confidence intervals. The AUC at 10 years is 0.752 (95% CI: 0.721, 0.783) for Model 1 and 0.758 (95% CI: 0.729, 0.787) for Model 2. The difference between the two AUCs, 0.006 (95% CI: –0.033, 0.046), is not statistically significant. We now investigate whether the new model reclassifies patients in terms of their risks and CVD outcome at 10 years. We consider NRI(t) at t = 10 years for such an evaluation using the methods described in Sect. 3. Table 6 shows that the estimates from the three nonparametric estimators (KM, IPW and smooth IPW) are quite consistent, all indicating that the new model does not add significant improvement as gauged by NRI. The semiparametric estimator, however, does indicate a significant incremental value with NRI = 0.167 (SE = 0.067), and the combined estimator indicates a similar magnitude of improvement, though not significant (NRI = 0.132, SE = 0.137). Note that since we considered a continuous NRI with u = v = 0, an observed improvement of this magnitude may not be clinically substantial. Since different conclusions could be reached depending on which estimation method is chosen, this analysis highlights the need to consider multiple robust approaches for calculating NRI.

Table 6.

NRI estimates for two risk models for predicting 10-year CVD risk among women in the Framingham offspring cohort

Method Pr(P_i − Q_i > 0 | T_i ≤ t) Pr(P_i − Q_i > 0 | T_i > t) NRI(t)
KM
    Est 0.483 0.508 –0.049
    SE 0.069 0.028 0.176
IPW
    Est 0.478 0.508 –0.059
    SE 0.070 0.028 0.178
Smooth IPW
    Est 0.480 0.508 –0.057
    SE 0.070 0.028 0.178
SEM
    Est 0.587 0.503 0.167
    SE 0.015 0.026 0.067
Combined
    Est 0.570 0.504 0.132
    SE 0.054 0.027 0.137

KM Kaplan–Meier estimator, IPW inverse probability weighted estimator, Smooth IPW smooth inverse probability weighted estimator, SEM semiparametric estimator, Combined combined estimator, as defined in the text
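As a quick consistency check on Table 6: for the continuous NRI, the decomposition used in the Appendix writes NRI(t) as twice the difference between the conditional reclassification probabilities among events and non-events, so the third column can be recomputed from the first two. The sketch below does only this arithmetic on the reported (rounded) values. The Wald-type z statistic with a standard normal reference is an assumption added for illustration and is not necessarily how significance was assessed in the analysis.

```python
from math import erf, sqrt

# (Pr(. | T <= t), Pr(. | T > t), NRI(t), SE of NRI) as reported in Table 6.
TABLE6 = {
    "KM":         (0.483, 0.508, -0.049, 0.176),
    "IPW":        (0.478, 0.508, -0.059, 0.178),
    "Smooth IPW": (0.480, 0.508, -0.057, 0.178),
    "SEM":        (0.587, 0.503,  0.167, 0.067),
    "Combined":   (0.570, 0.504,  0.132, 0.137),
}


def continuous_nri(p_event, p_nonevent):
    """NRI(t) = 2 * {Pr(Delta = 1 | T <= t) - Pr(Delta = 1 | T > t)}."""
    return 2.0 * (p_event - p_nonevent)


def two_sided_p(z):
    """Two-sided p-value from a standard normal reference distribution."""
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))


for method, (p_ev, p_non, nri, se) in TABLE6.items():
    z = nri / se  # Wald-type statistic (an illustrative assumption)
    print(f"{method:10s} recomputed NRI = {continuous_nri(p_ev, p_non):+.3f}, "
          f"reported = {nri:+.3f}, z = {z:.2f}, p = {two_sided_p(z):.3f}")
```

Small discrepancies in the last digit are expected because the inputs are rounded to three decimals.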

7 Discussion

NRI provides an alternative tool for evaluating risk prediction models (Pencina et al. 2008) beyond the traditional ROC curve framework. The concept has continued to gain popularity in the medical literature, yet its statistical properties have not been well studied to date in the statistical literature, and existing methods for calculating NRI under the failure time outcome setting are limited. In this manuscript, we provide a more thorough investigation of a variety of estimation procedures. Our proposed nonparametric and semiparametric estimators improve upon existing methods in terms of both robustness and efficiency under a variety of practical situations. Such improvement is quite important, since we observe that, compared with other measures such as AUC, NRI estimates are in general not very stable, with substantial variation across the estimators we have considered. The proposed procedures can be used for estimating both continuous NRI and NRI with pre-specified fixed categories. As illustrated in the example, the choice of estimation method can lead to different conclusions. In practice, the method chosen should depend on a number of important considerations, including the likelihood that the model has been correctly specified and that the assumptions concerning censoring are correct. In addition, in situations where the new marker may be expensive or difficult to ascertain, an approach that considers both the risks and benefits of obtaining the marker should be considered in the decision-making process. We recommend that such measures be used in practice with caution. A thorough evaluation of a risk model should consider a wide spectrum of measures for assessing discrimination and calibration, and NRI may better serve as one of the summary measures to complement graphical displays of risk distributions (Gu and Pepe 2009). All analyses were performed in R. Code for implementing the proposed procedures is available upon request.

Acknowledgment

The Framingham Heart Study and the Framingham SHARe project are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University. The Framingham SHARe data used for the analyses described in this manuscript were obtained through dbGaP (access number: phs000007.v3.p2). This manuscript was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or the NHLBI. The work is supported by grants U01-CA86368, P01-CA053996, R01-GM085047 and R01-GM079330 awarded by the National Institutes of Health.

Appendix

Throughout, we assume that the joint density of (T, C, Y) is twice continuously differentiable, Y is bounded, and 1 > P(T > t) > 0, 1 > P(C > t) > 0. The kernel function K is a symmetric probability density function with compact support and bounded second derivative. The bandwidth h → 0 such that $nh^4 \to 0$. In addition, the estimator $\hat\theta_k$ converges to $\theta_{0k}$ for k = 1, 2 as n → ∞ (Hjort 1992), where $\beta_{0k}$ is the unique maximizer of the expected value of the corresponding partial likelihood and $\Lambda_{0k}$ is the baseline cumulative hazard for k = 1, 2. We denote the parameter space for $\theta_k$ by $\Omega_k$ and assume that $\Omega_k$ is a compact set containing $\theta_{0k}$. Furthermore, we assume that $\beta_2 \ne 0$ and note that $Q(\theta_2) = 1 - \exp\{-\Lambda_{02}(t) e^{\beta_2^{T} Y^{(2)}}\}$ and $P(\theta_1) = 1 - \exp\{-\Lambda_{01}(t) e^{\beta_1^{T} Y^{(1)}}\}$ are the respective limits of $Q(\hat\theta_2)$ and $P(\hat\theta_1)$, for any given $Y^{(2)}$ and $Y^{(1)}$. The in-probability convergence of $Q(\hat\theta_2)$ to $Q(\theta_{02})$ and of $P(\hat\theta_1)$ to $P(\theta_{01})$ is uniform in $Y^{(2)}$ and $Y^{(1)}$ due to the convergence of $\hat\theta$ to $\theta_0 = (\theta_{01}^{T}, \theta_{02}^{T})^{T}$.

Asymptotic Properties of $\widetilde{NRI}(\hat\theta, t)$

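From the same arguments as given in Cai et al. (2010) and Dabrowska (1997), it follows that $\tilde H_q^{(\iota)}(t)$ is uniformly consistent for $H_q^{(\iota)}(t)$, where $H_q^{(1)}(t) = P(C \ge t \mid Q(\theta_2) = q, \Delta(\theta) = 1)$ and $H_q^{(\bullet)}(t) = P(C \ge t \mid Q(\theta_2) = q, \Delta(\theta) \in \{0, 1\})$, for ι = 1 and •. It follows, using the uniform law of large numbers (Pollard 1990), that

$$\sup_{\theta} \left| \widetilde{NRI}(\theta, t) - NRI(\theta, t) \right| \to 0 \quad \text{in probability}.$$

This, along with the convergence of $\hat\theta$ to $\theta_0$, implies that $\widetilde{NRI}(\hat\theta, t)$ is uniformly consistent for $NRI(\theta_0, t)$.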

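Throughout, we will use the fact that $E\{\Delta_i(\theta) I(X_i \le t)\, \delta_i\, H^{(1)}_{Q_i(\theta_2)}(X_i)^{-1} \mid Q_i(\theta_2) = q\} = P(\Delta_i(\theta) = 1, T_i \le t \mid Q_i(\theta_2) = q)$ if either $C \perp (T, Y^{(2)})$ (the model may be misspecified) or $Q(\theta_2) = \Pr(T \le t \mid Y^{(2)})$, i.e., the Cox model is correctly specified though censoring may be such that $C \perp T \mid Y^{(2)}$ (double robustness). We first write the i.i.d. representation of $\sqrt{n}[\widetilde{NRI}(\theta, t) - NRI(\theta, t)]$ for any θ. Note that

$$\sqrt{n}\{\widetilde{NRI}(\theta, t) - NRI(\theta, t)\} = 2\sqrt{n}\{\widetilde{\Pr}(\Delta(\theta) = 1 \mid T \le t) - \Pr(\Delta(\theta) = 1 \mid T \le t)\} - 2\sqrt{n}\{\widetilde{\Pr}(\Delta(\theta) = 1 \mid T > t) - \Pr(\Delta(\theta) = 1 \mid T > t)\}.$$

We first examine the initial component,

$$\widetilde{\Pr}(\Delta(\hat\theta) = 1 \mid T \le t) = \frac{\sum_i \Delta_i(\hat\theta)\, I(X_i \le t)\, \delta_i / \tilde H^{(1)}_{Q_i(\hat\theta_2)}(X_i)}{\sum_i I(X_i \le t)\, \delta_i / \tilde H^{(\bullet)}_{Q_i(\hat\theta_2)}(X_i)} \equiv \frac{\hat N(t, \hat\theta, \tilde H)}{\hat D(t, \hat\theta, \tilde H)}$$

where $\hat N(t, \theta, H) = n^{-1} \sum_i \Delta_i(\theta)\, I(X_i \le t)\, \delta_i / H^{(1)}_{Q_i(\theta_2)}(X_i)$ and $\hat D(t, \theta, H) = n^{-1} \sum_i I(X_i \le t)\, \delta_i / H^{(\bullet)}_{Q_i(\theta_2)}(X_i)$. Let $N(t, \theta) = \Pr(\Delta(\theta) = 1, T \le t)$ and $D(t) = \Pr(T \le t)$. Then by the uniform consistency of the IPW weights, we have

$$\sqrt{n}\{\widetilde{\Pr}(\Delta(\theta) = 1 \mid T \le t) - \Pr(\Delta(\theta) = 1 \mid T \le t)\} \approx \frac{\sqrt{n}\{\hat N(t, \theta, \tilde H) D(t) - N(t, \theta) \hat D(t, \theta, \tilde H)\}}{D(t)^2}.$$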

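Examining the numerator, $\sqrt{n}\{\hat N(t, \theta, \tilde H) D(t) - N(t, \theta) \hat D(t, \theta, \tilde H)\} = \sqrt{n}\{(1) + (2) - (3)\}$ where $(1) = \hat N(t, \theta, H) D(t) - \hat D(t, \theta, H) N(t, \theta)$, $(2) = \hat N(t, \theta, \tilde H) D(t) - \hat N(t, \theta, H) D(t)$, and $(3) = N(t, \theta) \hat D(t, \theta, \tilde H) - \hat D(t, \theta, H) N(t, \theta)$. Note that

$$\sqrt{n}\,(1) = \sqrt{n}\{\hat N(t, \theta, H) D(t) - \hat D(t, \theta, H) N(t, \theta)\} = n^{-1/2} \sum_i U_{1i}(t), \quad \text{where} \quad U_{1i}(t) = \frac{I(X_i \le t)\, \delta_i}{H^{(1)}_{Q_i(\theta_2)}(X_i)} \Delta_i(\theta) D(t) - \frac{I(X_i \le t)\, \delta_i}{H^{(\bullet)}_{Q_i(\theta_2)}(X_i)} N(t, \theta).$$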

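Using a Taylor series expansion, Lemma A.3 of Bilias et al. (1997) and the asymptotic expansion for $\hat\Lambda_q(t)$ given in Du and Akritas (2002),

$$\begin{aligned}
\sqrt{n}\,(2) &= D(t) \sqrt{n}\{\hat N(t, \theta, \tilde H) - \hat N(t, \theta, H)\} \\
&= D(t)\, n^{-1/2} \sum_i \frac{\Delta_i(\theta)\, I(X_i \le t)\, \delta_i}{H^{(1)}_{Q_i(\theta_2)}(X_i)} \left[ \frac{H^{(1)}_{Q_i(\theta_2)}(X_i)}{\tilde H^{(1)}_{Q_i(\theta_2)}(X_i)} - 1 \right] \\
&= D(t)\, n^{-1/2} \int\!\!\int_0^t \left[ \frac{H^{(1)}_q(s)}{\tilde H^{(1)}_q(s)} - 1 \right] d \sum_i \frac{\Delta_i(\theta)\, \delta_i\, I(X_i \le s, Q_i(\theta_2) \le q)}{H^{(1)}_{Q_i(\theta_2)}(X_i)} \\
&\approx D(t) \int\!\!\int_0^t \sqrt{n}\left[ \hat\Lambda^{(1)}_q(s) - \Lambda^{(1)}_q(s) \right] d \left\{ \frac{1}{n} \sum_i \frac{\Delta_i(\theta)\, \delta_i\, I(X_i \le s, Q_i(\theta_2) \le q)}{H^{(1)}_{Q_i(\theta_2)}(X_i)} \right\} \\
&\approx D(t) \int\!\!\int_0^t \left[ n^{-1/2} \sum_i K_h\{q - Q_i(\theta_2)\}\, M^{(1)}_{Cq}(s, X_i, \delta_i) \right] dP(\Delta(\theta) = 1, T \le s, Q(\theta_2) \le q)
\end{aligned}$$

where

$$M^{(1)}_{Cq}(t, X_i, \delta_i) = \int_0^t \frac{dN_{Ci}(s) - I(X_i \ge s)\, d\Lambda^{(1)}_q(s)}{\pi^{(1)}_s(q)}.$$

Now by a change of variable, $\psi = \{q - Q_i(\theta_2)\}/h$, and with $f(t, q) \equiv \partial^2 P(\Delta(\theta) = 1, T \le t, Q(\theta_2) \le q) / \partial t\, \partial q$,

$$\sqrt{n}\,(2) \approx D(t) \int\!\!\int_0^t \sqrt{n} \left[ \frac{1}{n} \sum_i K(\psi)\, M_{C(\psi h + Q_i(\theta_2))}(s, X_i, \delta_i)\, f(s, \psi h + Q_i(\theta_2)) \right] ds\, d\psi = D(t)\, n^{-1/2} \sum_i \int\!\!\int_0^t K(\psi)\, a\{s, h\psi + Q_i(\theta_2), X_i\}\, ds\, d\psi \approx n^{-1/2} \sum_i U_{2i}(t),$$

where $U_{2i}(t) = D(t) \int_0^t a(s, q, X_i)\, ds$ and $a(t, q, X_i) = M_{Cq}(t, X_i, \delta_i) f(t, q)$. Similar arguments can be used to obtain an asymptotic expansion for the contribution of (3), $-\sqrt{n}\,(3) \approx n^{-1/2} \sum_i U_{3i}(t)$, and therefore, for the numerator, $\sqrt{n}[\hat N(t, \theta, \tilde H) D(t) - N(t, \theta) \hat D(t, \theta, \tilde H)] \approx n^{-1/2} \sum_i \{U_{1i}(t) + U_{2i}(t) + U_{3i}(t)\}$. The same arguments as given above can be used to obtain an asymptotic expansion for $\sqrt{n}\{\widetilde{\Pr}(\Delta(\theta) = 1 \mid T > t) - \Pr(\Delta(\theta) = 1 \mid T > t)\}$ as $n^{-1/2} \sum_{i=1}^n D^{-}(t)^{-2} \{U^{-}_{1i}(t) + U^{-}_{2i}(t) + U^{-}_{3i}(t)\}$, where $D^{-}(t)$, $U^{-}_{1i}(t)$, $U^{-}_{2i}(t)$, and $U^{-}_{3i}(t)$ are defined similarly to $D(t)$, $U_{1i}(t)$, $U_{2i}(t)$, and $U_{3i}(t)$ with $T \le t$ replaced by $T > t$. Therefore,

$$\sqrt{n}\{\widetilde{NRI}(\theta, t) - NRI(\theta, t)\} \approx n^{-1/2} \sum_{i=1}^n 2\left[ D(t)^{-2} \{U_{1i}(t) + U_{2i}(t) + U_{3i}(t)\} - D^{-}(t)^{-2} \{U^{-}_{1i}(t) + U^{-}_{2i}(t) + U^{-}_{3i}(t)\} \right] = n^{-1/2} \sum_{i=1}^n \eta_i(t).$$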

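Note that regardless of correct model specification, $\sqrt{n}(\hat\theta - \theta_0) = n^{-1/2} \sum_i \psi_i + o_p(1)$, where the $\psi_i$ are i.i.d. mean zero random variables, by Lin and Wei (1989) and Uno et al. (2009). Using a Taylor series approximation and the i.i.d. representation of $\sqrt{n}[\widetilde{NRI}(\theta, t) - NRI(\theta, t)]$ for any θ, we can write $\tilde W(t) = \sqrt{n}[\widetilde{NRI}(\hat\theta, t) - NRI(\theta_0, t)]$ as a sum of i.i.d. terms, $n^{-1/2} \sum_{i=1}^n \mathcal{W}_i(t)$, defined below.

$$\begin{aligned}
\sqrt{n}[\widetilde{NRI}(\hat\theta, t) - NRI(\theta_0, t)] &= \sqrt{n}[\widetilde{NRI}(\hat\theta, t) - NRI(\hat\theta, t) + NRI(\hat\theta, t) - NRI(\theta_0, t)] \\
&\approx \sqrt{n}\left[\widetilde{NRI}(\hat\theta, t) - NRI(\hat\theta, t) + \frac{\partial NRI(t)}{\partial \theta}\Big|_{\theta_0} (\hat\theta - \theta_0)\right] \\
&= \sqrt{n}[\widetilde{NRI}(\hat\theta, t) - NRI(\hat\theta, t)] + \sqrt{n}(\hat\theta - \theta_0) \frac{\partial NRI(t)}{\partial \theta}\Big|_{\theta_0} \\
&\approx \sqrt{n}[\widetilde{NRI}(\hat\theta, t) - NRI(\hat\theta, t)] + n^{-1/2} \sum_i \psi_i \frac{\partial NRI(t)}{\partial \theta}\Big|_{\theta_0} \\
&\approx n^{-1/2} \sum_{i=1}^n \eta_i(t) + n^{-1/2} \sum_i \psi_i \frac{\partial NRI(t)}{\partial \theta}\Big|_{\theta_0} \\
&= n^{-1/2} \sum_{i=1}^n \mathcal{W}_i(t)
\end{aligned}$$

where $\mathcal{W}_i(u, v, t) = \eta_i(u, v, t) + \psi_i\, \partial NRI(t)/\partial\theta \big|_{\theta_0}$. By a functional central limit theorem of Pollard (1990), the process $\tilde W(t)$ converges weakly to a mean zero Gaussian process in t.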
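The ratio form of the IPW estimator analyzed above can be illustrated with a short sketch. It is deliberately simplified and should not be read as the proposed procedure: the kernel-smoothed conditional weights $\tilde H^{(\iota)}_q$, which depend on the risk score and on Δ, are replaced here by a single marginal Kaplan–Meier estimate of the censoring survival function. The function names, the argument `reclass` (the indicators $\Delta_i$) and the simulated inputs are assumptions made for illustration.

```python
import numpy as np


def km_censoring_survival(x, delta):
    """Kaplan-Meier estimate of G(s) = Pr(C > s), treating censoring as the event."""
    cens_times = np.unique(x[delta == 0])
    at_risk = np.array([(x >= c).sum() for c in cens_times], dtype=float)
    n_cens = np.array([((x == c) & (delta == 0)).sum() for c in cens_times], dtype=float)
    surv = np.concatenate(([1.0], np.cumprod(1.0 - n_cens / at_risk)))

    def G(s):
        # The number of censoring times <= s indexes the step function.
        return surv[np.searchsorted(cens_times, s, side="right")]

    return G


def ipw_nri(x, delta, reclass, t):
    """IPW estimate of 2 * {Pr(Delta=1 | T <= t) - Pr(Delta=1 | T > t)}.

    x: observed times, delta: event indicators, reclass: 0/1 indicators Delta_i.
    Assumes no ties between event and censoring times.
    """
    G = km_censoring_survival(x, delta)
    case = (x <= t) & (delta == 1)   # events observed by time t
    control = x > t                  # known to be event-free at t

    w = np.zeros(len(x))
    w[case] = 1.0 / G(x[case])       # inverse probability of remaining uncensored
    p_case = np.sum(w * reclass) / np.sum(w)
    # With a marginal G, every control has the same weight 1 / G(t), which cancels.
    p_control = reclass[control].mean()
    return 2.0 * (p_case - p_control)


# Simulated illustration with independent uniform censoring.
rng = np.random.default_rng(0)
n = 2000
t_fail = rng.exponential(10.0, size=n)
c = rng.uniform(0.0, 30.0, size=n)
x, delta = np.minimum(t_fail, c), (t_fail <= c).astype(int)
reclass = rng.binomial(1, np.where(t_fail <= 5.0, 0.7, 0.3))
print(ipw_nri(x, delta, reclass, t=5.0))
```

In this toy setup the reclassification indicator equals 1 with probability 0.7 among subjects with an event by t = 5 and 0.3 otherwise, so the target value is 2 × (0.7 − 0.3) = 0.8. Covariate-dependent censoring would require the conditional weights described in the text rather than the marginal `G` used here.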

Asymptotic Properties of $\widehat{NRI}(\hat\theta, t)$

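Recall that we assume the Cox model is correctly specified and thus $Q(\theta_2) = Q(\theta_2, t, Y^{(2)}) = \Pr(T \le t \mid Y^{(2)}) = 1 - \exp\{-\Lambda_{02}(t) e^{\beta_2^{T} Y^{(2)}}\}$ and $S_{Q_i(\theta_2)}(t) = \Pr(T > t \mid Y^{(2)}) = \exp\{-\Lambda_{02}(t) e^{\beta_2^{T} Y^{(2)}}\}$. To derive asymptotic properties of $\widehat{NRI}(\hat\theta, t)$ we assume the same regularity conditions as in Andersen and Gill (1982). The uniform consistency of $Q(\hat\theta_2, t, Y^{(2)})$ for $Q(\theta_2, t, Y^{(2)})$ in t and $Y^{(2)}$ follows directly from the uniform consistency of $\hat\Lambda_{02}(t)$ and $\hat\beta_2$. It follows from the uniform law of large numbers (Pollard 1990) that $\widehat{NRI}(\hat\theta, t)$ is uniformly consistent for $NRI(\theta_0, t)$. Andersen and Gill (1982) show that $\sqrt{n}(\hat\beta_2 - \beta_{02})$ is asymptotically normal and that $\sqrt{n}(\hat\Lambda_{02}(t) - \Lambda_{02}(t))$ converges to a Gaussian process. By the functional delta method it can be shown that $\sqrt{n}\{Q(\hat\theta_2, t, Y^{(2)}) - Q(\theta_2, t, Y^{(2)})\}$ converges to a zero mean Gaussian process in t and $Y^{(2)}$ (Zheng et al. 2008). Similar to the derivation for $\widetilde{NRI}(\hat\theta, t)$, it can be shown that the process $\tilde N(t) = \sqrt{n}[\widehat{NRI}(\hat\theta, t) - NRI(\theta_0, t)]$ is asymptotically equivalent to $n^{-1/2} \sum_{i=1}^n \zeta_i(u, v, t)$. In particular, for a fixed θ, $\sqrt{n}\{\widehat{NRI}(\theta, t) - NRI(\theta, t)\} \approx n^{-1/2} \sum_{i=1}^n \eta_i(t)$ where

$$\eta_i(t) = 2\left[ D(t)^{-2} \{\Delta_i(\theta) Q_i(\theta_2) - \Pr(\Delta_i(\theta) = 1 \mid T_i \le t)\, Q_i(\theta_2)\} - D^{-}(t)^{-2} \{\Delta_i(\theta) [1 - Q_i(\theta_2)] - \Pr(\Delta_i(\theta) = 1 \mid T_i > t)\, [1 - Q_i(\theta_2)]\} \right].$$

Thus, $\tilde N(t) \approx n^{-1/2} \sum_{i=1}^n \zeta_i(t)$ where $\zeta_i(u, v, t) = \eta_i(t) + \psi_i\, \partial NRI(t)/\partial\theta \big|_{\theta_0}$. Once again, using a functional central limit theorem, this implies that $\tilde N(t)$ converges to a Gaussian process with mean zero.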
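The structure of $\eta_i(t)$ above suggests a plug-in ratio form in which subject i contributes $Q_i(\theta_2)$ to the event group and $1 - Q_i(\theta_2)$ to the non-event group. The sketch below computes an NRI estimate of that form. It assumes this plug-in form, takes the Cox quantities (`lambda0_t` for $\Lambda_{02}(t)$ and `beta` for $\beta_2$) as given rather than fitting them, and uses hypothetical names and toy inputs throughout.

```python
import numpy as np


def cox_risk(lambda0_t, beta, y):
    """Q = 1 - exp{-Lambda_0(t) * exp(beta' Y)} for each row of y."""
    return 1.0 - np.exp(-lambda0_t * np.exp(y @ beta))


def semiparametric_nri(q, reclass):
    """Plug-in NRI that weights each subject by the model-based event probability."""
    p_event = np.sum(reclass * q) / np.sum(q)
    p_nonevent = np.sum(reclass * (1.0 - q)) / np.sum(1.0 - q)
    return 2.0 * (p_event - p_nonevent)


# Hypothetical inputs: in practice lambda0_t and beta would come from a fitted Cox model.
y = np.array([[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8], [0.0, 0.0]])
beta = np.array([np.log(2.0), np.log(0.5)])
q = cox_risk(lambda0_t=0.1, beta=beta, y=y)
reclass = np.array([1, 1, 0, 0])
print(q.round(3), semiparametric_nri(q, reclass))
```

Because no censoring weights appear, the validity of an estimator of this form rests on the Cox model for the expanded marker set being correctly specified, as assumed in this subsection.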

Contributor Information

Yingye Zheng, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109, USA.

Layla Parast, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA.

Tianxi Cai, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA.

Marshall Brown, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109, USA.

References

  1. Andersen P, Gill R. Cox's regression model for counting processes: a large sample study. Ann Stat. 1982;10:1100–1120.
  2. Bilias Y, Gu M, Ying Z. Towards a general asymptotic theory for Cox model with staggered entry. Ann Stat. 1997;25:662–682.
  3. Cai T, Tian L, Uno H, Solomon S, Wei L. Calibrating parametric subject-specific risk estimation. Biometrika. 2010;97:389–404. doi: 10.1093/biomet/asq012.
  4. Cook N. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation. 2007;115:928. doi: 10.1161/CIRCULATIONAHA.106.672402.
  5. Cook N, Buring J, Ridker P. The effect of including C-reactive protein in cardiovascular risk prediction models for women. Ann Intern Med. 2006;145:21. doi: 10.7326/0003-4819-145-1-200607040-00128.
  6. Cui J. Overview of risk prediction models in cardiovascular disease research. Ann Epidemiol. 2009;19:711–717. doi: 10.1016/j.annepidem.2009.05.005.
  7. Dabrowska D. Smoothed Cox regression. Ann Stat. 1997;25(4):1510–1540.
  8. Du Y, Akritas M. Uniform strong representation of the conditional Kaplan–Meier process. Math Methods Stat. 2002;11:152–182.
  9. Efron B, Tibshirani R. Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc. 1997;92(438):548–560.
  10. Gail M, Brinton L, Byar D, Corle D, Green S, Schairer C, Mulvihill J. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst. 1989;81:1879. doi: 10.1093/jnci/81.24.1879.
  11. Gu W, Pepe M. Measures to summarize and compare the predictive capacity of markers. Int J Biostat. 2009;5:27. doi: 10.2202/1557-4679.1188.
  12. Hemann B, Bimson W, Taylor A. The Framingham risk score: an appraisal of its benefits and limitations. Am Heart Hosp J. 2007;5:91–96. doi: 10.1111/j.1541-9215.2007.06350.x.
  13. Hjort N. On inference in parametric survival data models. Int Stat Rev. 1992;60(3):355–387.
  14. Kannel W, Feinleib M, McNamara P, Garrison R, Castelli W. An investigation of coronary heart disease in families. Am J Epidemiol. 1979;110:281. doi: 10.1093/oxfordjournals.aje.a112813.
  15. Khot U, Khot M, Bajzer C, Sapp S, Ohman E, Brener S, Ellis S, Lincoff A, Topol E. Prevalence of conventional risk factors in patients with coronary heart disease. JAMA. 2003;290:898–904. doi: 10.1001/jama.290.7.898.
  16. Lin D, Wei L. The robust inference for the Cox proportional hazards model. J Am Stat Assoc. 1989;84:1074–1078.
  17. Lloyd-Jones D. Cardiovascular risk prediction. Circulation. 2010;121:1768–1777. doi: 10.1161/CIRCULATIONAHA.109.849166.
  18. Pencina M, D'Agostino R Sr. Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Stat Med. 2011;30:11–21. doi: 10.1002/sim.4085.
  19. Pencina M, D'Agostino R Sr, D'Agostino R Jr. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008;27:157–172. doi: 10.1002/sim.2929.
  20. Pollard D. Empirical processes: theory and applications. Hayward: Institute of Mathematical Statistics; 1990.
  21. Satten G, Datta S. The Kaplan-Meier estimator as an inverse-probability-of-censoring weighted average. Am Stat. 2001;55:207–210. doi: 10.1198/000313001317098185.
  22. Tian L, Cai T, Goetghebeur E, Wei L. Model evaluation based on the sampling distribution of estimated absolute prediction error. Biometrika. 2007;94:297–311.
  23. Uno H, Cai T, Tian L, Wei L. Evaluating prediction rules for t-year survivors with censored regression models. J Am Stat Assoc. 2007;102:527–537.
  24. Uno H, Tian L, Cai T, Kohane I, Wei L. Comparing risk scoring systems beyond the ROC paradigm in survival analysis. Harvard University Biostatistics Working Paper Series. 2009. p. 107.
  25. Wilson P, D'Agostino R, Levy D, Belanger A, Silbershatz H, Kannel W. Prediction of coronary heart disease using risk factor categories. Circulation. 1998;97:1837. doi: 10.1161/01.cir.97.18.1837.
  26. Zheng Y, Cai T, Pepe M, Levy W. Time-dependent predictive values of prognostic biomarkers with failure time outcome. J Am Stat Assoc. 2008;103:362–368. doi: 10.1198/016214507000001481.
