2010 Sep 1;12(5):R66. doi: 10.1186/bcr2633

Table 2.

Pseudocode of nested cross-validation for model selection and model assessment

Repeat 100 times:
    Divide the data into 10 outer folds
    Repeat 10 times:
        Keep 1 outer fold for testing
        Select the remaining 9 outer folds for training
        Divide the 9 outer training folds into 10 inner folds
        Repeat 10 times:
            Keep 1 inner fold for testing
            Select the remaining 9 inner folds for training
            Move all variables into the list of available variables
            Create an empty list of nested model variables
            Iterate this backward selection procedure until only 1 variable is left in the list of available variables:
                Train Cox models on the inner training set, one per available variable. Each Cox model contains all available variables except that one
                Select the variable that contributes the least to the model likelihood
                Move the selected variable from the list of available variables to the top of the list of nested model variables
            Move the last available variable to the top of the list of nested model variables
            Iterate over the list of nested model variables:
                Train the Cox model containing the present variable and the variables above it in the list of nested model variables using the inner training set
                Evaluate the average time-dependent area under the receiver operating characteristic curve (ATD-AUCROC) h of the present Cox model using the 1 inner testing fold
                Record the variable usage U in the present Cox model and the size n of the model. U_X(v_m) = 1 if v_m is in model X, 0 otherwise
        Estimate, summing over all inner models X:
            - the expected model size <n> = Σ_X (h_X n_X) / Σ_X (h_X)
            - the (inner) variable stability score for each variable v_m: <v_m> = Σ_X (h_X U_X(v_m)) / Σ_X (h_X)
        Train the Cox model containing the most stable <n> variables using the outer training set
        Evaluate the ATD-AUCROC k of the present Cox model using the 1 outer testing fold
        Record the variable usage T in the present Cox model and the size s of the model. T_X(v_m) = 1 if v_m is in model X, 0 otherwise
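
The "Estimate" step of the pseudocode can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes the inner-loop results are already available as pairs of an ATD-AUCROC value h and the list of variables in that model; no Cox model is fitted here. The function names, the variable names ("age", "grade", "size"), the AUC values, and the rounding of the expected size to the nearest integer are all assumptions made for illustration.

```python
def nested_models(nested_variables):
    """Build the nested models from an ordered list of variables.

    The list is read top first: the first model holds the top variable,
    the second holds the top two, and so on.
    """
    return [nested_variables[:size] for size in range(1, len(nested_variables) + 1)]


def weighted_estimates(evaluated_models):
    """Compute the AUC-weighted expected size and variable stability scores.

    evaluated_models: list of (h, variables) pairs, one per inner model X,
    where h is the ATD-AUCROC on the inner testing fold.
    """
    total_h = sum(h for h, _ in evaluated_models)
    # <n> = sum_X(h_X * n_X) / sum_X(h_X)
    expected_size = sum(h * len(variables) for h, variables in evaluated_models) / total_h
    # <v_m> = sum_X(h_X * U_X(v_m)) / sum_X(h_X); variables that never appear
    # in any model get no entry (their score would be 0).
    stability = {}
    for h, variables in evaluated_models:
        for v in variables:
            stability[v] = stability.get(v, 0.0) + h / total_h
    return expected_size, stability


def most_stable(expected_size, stability):
    """Pick the most stable variables, as many as the expected model size.

    Assumption: the expected size is rounded to the nearest integer
    (at least 1); the pseudocode does not state the rounding rule.
    """
    size = max(1, round(expected_size))
    ranked = sorted(stability, key=lambda v: (-stability[v], v))
    return ranked[:size]


# Made-up example for a single inner fold: three hypothetical variables,
# listed top first, and invented AUC values for the models of size 1, 2, 3.
nested = ["age", "grade", "size"]
aucs = [0.6, 0.7, 0.7]
evaluated = list(zip(aucs, nested_models(nested)))

expected_size, stability = weighted_estimates(evaluated)
selected = most_stable(expected_size, stability)
print(expected_size, stability, selected)
```

With these invented numbers the expected size is about 2.05, so the two most stable variables ("age" and "grade") would be carried to the outer training set. In the full procedure the list of evaluated models would pool all 10 inner folds rather than one.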