Author manuscript; available in PMC: 2016 Oct 1.
Published in final edited form as: Stat Biosci. 2014 Aug 23;7(2):282–295. doi: 10.1007/s12561-014-9118-0

Table 3.

Measures of improvement in prediction when risk models are fit to a training dataset of N-training = 50 observations and assessed on a test dataset of N-test = 5000 observations, where P(D = 1) = 0.5. The linear logistic regression models fit to the training datasets are (i) baseline: logit P(D = 1|X) = α0 + α1X; (ii) model(X, Y): logit P(D = 1|X, Y) = β0 + β1X + β2Y; (iii) model(X, Y, XY): logit P(D = 1|X, Y) = γ0 + γ1X + γ2Y + γ3XY. The baseline model is correct. The marker Y is informative, so the correct expanded model is model(X, Y), while model(X, Y, XY) is overfit. Data generation is described in the Supplementary Materials. Shown are the true values of the performance measures (True), which use the true risk, as well as averages of estimated performance, using training and test data, over 1,000 simulations. All measures are shown as %.

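One Informative Marker

| AUC_X | AUC_Y | NRI: True | NRI: Model(X,Y) | NRI: Model(X,Y,XY) | ΔROC(0.2): True | ΔROC(0.2): Model(X,Y) | ΔROC(0.2): Model(X,Y,XY) | ΔAUC: True | ΔAUC: Model(X,Y) | ΔAUC: Model(X,Y,XY) | ΔBrier: True | ΔBrier: Model(X,Y) | ΔBrier: Model(X,Y,XY) | ΔSNB(ρ): True | ΔSNB(ρ): Model(X,Y) | ΔSNB(ρ): Model(X,Y,XY) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.7 | 0.7 | 28.32 | 23.37 | 22.74 | 3.25 | 0.99 | −0.89 | 1.98 | 0.64 | −0.69 | 0.61 | 0.16 | −0.41 | 3.21 | 0.98 | −0.40 |
| 0.8 | 0.7 | 28.39 | 23.37 | 27.03 | 1.97 | 0.14 | −1.13 | 1.03 | 0.06 | −0.80 | 0.47 | 0.01 | −0.45 | 1.84 | 0.15 | −0.90 |
| 0.8 | 0.8 | 57.84 | 55.65 | 55.82 | 7.73 | 6.09 | 4.99 | 3.94 | 3.12 | 2.25 | 1.89 | 1.48 | 1.00 | 7.10 | 5.39 | 4.59 |
| 0.9 | 0.7 | 28.41 | 27.16 | 39.92 | 0.88 | −0.31 | −1.15 | 0.43 | −0.16 | −0.77 | 0.29 | −0.16 | −0.59 | 0.92 | −0.33 | −1.25 |
| 0.9 | 0.8 | 57.87 | 57.13 | 63.05 | 3.40 | 2.31 | 1.61 | 1.69 | 1.14 | 0.53 | 1.19 | 0.74 | 0.30 | 3.79 | 2.49 | 1.71 |
| 0.9 | 0.9 | 89.58 | 87.02 | 88.18 | 7.35 | 6.46 | 5.59 | 3.74 | 3.23 | 2.54 | 2.81 | 2.40 | 1.95 | 8.85 | 7.50 | 6.74 |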
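
The train-then-test procedure in the caption of Table 3 can be illustrated with a minimal Python sketch of a single simulation replicate. Everything about data generation here is an assumption, since the actual scheme is in the Supplementary Materials: X and Y are taken to be independent, unit-variance normal markers whose means are shifted in cases so that each attains its nominal AUC. Under that assumption the baseline model and model(X, Y) are both correctly specified, which agrees with the caption. The function names (`simulate`, `fit_logistic`, `one_replicate`) are hypothetical, the fitter is plain gradient ascent standing in for a standard maximum likelihood routine, and ΔBrier is taken as the reduction in Brier score relative to baseline. Only ΔAUC and ΔBrier are computed, because NRI, ΔROC(0.2), and ΔSNB(ρ) depend on definitions not given in this section.

```python
import math
import random
from statistics import NormalDist

FEATURES = {  # the three linear logistic models named in the caption
    "baseline": lambda x, y: [1.0, x],
    "model(X,Y)": lambda x, y: [1.0, x, y],
    "model(X,Y,XY)": lambda x, y: [1.0, x, y, x * y],
}


def simulate(n, auc_x, auc_y, rng):
    """ASSUMPTION: independent unit-variance normal markers, mean shifted in cases."""
    mu_x = math.sqrt(2) * NormalDist().inv_cdf(auc_x)  # so that AUC = Phi(mu / sqrt(2))
    mu_y = math.sqrt(2) * NormalDist().inv_cdf(auc_y)
    data = []
    for _ in range(n):
        d = 1 if rng.random() < 0.5 else 0  # P(D = 1) = 0.5
        data.append((d, rng.gauss(d * mu_x, 1.0), rng.gauss(d * mu_y, 1.0)))
    return data


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(coef, feats):
    """Predicted risk for one feature vector."""
    return sigmoid(sum(c * v for c, v in zip(coef, feats)))


def fit_logistic(rows, labels, steps=2000, lr=0.1):
    """Gradient ascent on the mean log-likelihood (stand-in for a standard fitter)."""
    coef = [0.0] * len(rows[0])
    for _ in range(steps):
        grad = [0.0] * len(coef)
        for feats, d in zip(rows, labels):
            resid = d - predict(coef, feats)
            for j, v in enumerate(feats):
                grad[j] += resid * v
        coef = [c + lr * g / len(labels) for c, g in zip(coef, grad)]
    return coef


def auc(scores, labels):
    """Mann-Whitney AUC estimate using average ranks, so ties count one half."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n1 = sum(labels)
    n0 = len(labels) - n1
    rank_sum = sum(r for r, d in zip(ranks, labels) if d == 1)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


def brier(risks, labels):
    """Mean squared difference between outcome and predicted risk."""
    return sum((d - p) ** 2 for p, d in zip(risks, labels)) / len(labels)


def one_replicate(auc_x, auc_y, n_train=50, n_test=5000, seed=0):
    """Return estimated dAUC and dBrier (in %) for both expanded models."""
    rng = random.Random(seed)
    train = simulate(n_train, auc_x, auc_y, rng)
    test = simulate(n_test, auc_x, auc_y, rng)
    d_train = [d for d, _, _ in train]
    d_test = [d for d, _, _ in test]
    perf = {}
    for name, feats in FEATURES.items():
        coef = fit_logistic([feats(x, y) for _, x, y in train], d_train)
        risks = [predict(coef, feats(x, y)) for _, x, y in test]
        perf[name] = (auc(risks, d_test), brier(risks, d_test))
    base_auc, base_brier = perf["baseline"]
    return {
        name: {"dAUC": 100 * (a - base_auc), "dBrier": 100 * (base_brier - b)}
        for name, (a, b) in perf.items() if name != "baseline"
    }
```

Calling `one_replicate(0.8, 0.8, seed=s)` for many seeds `s` and averaging the results mimics the averaging over 1,000 simulations described in the caption. The output should not be expected to reproduce the table's values unless the assumed data-generating model happens to match the one in the Supplementary Materials.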