1: for b = 1, …, B do
2:   Draw a bootstrap sample of the data; fit the model to the bootstrap data.
3:   Calculate the prediction error, Err_b, using the OOB data.
4:   Noise up the OOB data for β; use this to calculate the OOB error, Err_{β,b}.
5:   Calculate the bootstrap VIMP index δ_{β,b} = Err_{β,b} − Err_b.
6: end for
7: Calculate the VIMP index by averaging: δ_β = (1/B) Σ_{b=1}^{B} δ_{β,b}.
8: The OOB error for the model can also be obtained using Err_oob = (1/B) Σ_{b=1}^{B} Err_b.
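The steps above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes ordinary least squares as the model, mean squared error as the prediction error, and permutation as the noising-up mechanism (all illustrative choices; the function name `bootstrap_vimp` and its arguments are invented for this sketch).

```python
import numpy as np

def bootstrap_vimp(X, y, beta_idx, B=100, seed=None):
    """Sketch of Algorithm 1 for variable beta_idx.

    Illustrative choices: OLS model, MSE prediction error,
    permutation noising up. Returns (VIMP index, OOB error).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    deltas, errs = [], []
    for _ in range(B):
        # Step 2: draw a bootstrap sample and fit the model to it.
        boot = rng.integers(0, n, size=n)
        oob = np.setdiff1d(np.arange(n), boot)
        if oob.size == 0:
            continue  # no OOB data for this draw (vanishingly rare)
        coef, *_ = np.linalg.lstsq(X[boot], y[boot], rcond=None)
        # Step 3: prediction error Err_b on the OOB data.
        err_b = np.mean((y[oob] - X[oob] @ coef) ** 2)
        # Step 4: noise up the OOB data for beta by permuting its column.
        X_noise = X[oob].copy()
        X_noise[:, beta_idx] = rng.permutation(X_noise[:, beta_idx])
        err_beta_b = np.mean((y[oob] - X_noise @ coef) ** 2)
        # Step 5: bootstrap VIMP index delta_{beta,b}.
        deltas.append(err_beta_b - err_b)
        errs.append(err_b)
    # Steps 7-8: average over the bootstrap samples.
    return np.mean(deltas), np.mean(errs)
```

Passing the same seed for every variable reuses the same bootstrap draws across calls, which matches the requirement below that all variables be compared against the same Err_b values.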
As stated, the algorithm provides a VIMP index for a given variable β, but in practice one applies the same procedure to every variable in the model. The same bootstrap samples must be used when doing so: this ensures that the VIMP index for each variable is always compared against the same baseline value Err_b.
Because all calculations run independently of one another, Algorithm 1 can be implemented using parallel processing, which makes it extremely fast and scalable to big-data settings. The most natural way to parallelize is over the bootstrap samples. Thus, on a given machine in a computing cluster, a single bootstrap sample is drawn and Err_b determined; Steps 4 and 5 are then applied to each variable in the model for that bootstrap draw. Results from the different machines in the cluster are then averaged as in Steps 7 and 8.
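This parallelization scheme can be sketched as follows, again under the illustrative OLS/MSE assumptions (the function names and the use of `concurrent.futures` are choices made for this sketch, not prescribed by the algorithm). Each unit of parallel work is one bootstrap draw, within which Steps 4 and 5 are repeated for every variable:

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def one_replicate(args):
    """One unit of parallel work: a single bootstrap draw b.

    Returns Err_b together with delta_{beta,b} for every variable,
    so all variables share the same bootstrap sample and baseline.
    """
    X, y, seed = args
    rng = np.random.default_rng(seed)
    n, p = X.shape
    boot = rng.integers(0, n, size=n)
    oob = np.setdiff1d(np.arange(n), boot)
    coef, *_ = np.linalg.lstsq(X[boot], y[boot], rcond=None)
    err_b = np.mean((y[oob] - X[oob] @ coef) ** 2)
    deltas = np.empty(p)
    for j in range(p):
        # Steps 4-5 for variable j on this draw.
        X_noise = X[oob].copy()
        X_noise[:, j] = rng.permutation(X_noise[:, j])
        deltas[j] = np.mean((y[oob] - X_noise @ coef) ** 2) - err_b
    return err_b, deltas

def parallel_vimp(X, y, B=100, workers=4):
    """Farm the B replicates out to worker processes, then average
    as in Steps 7-8."""
    tasks = [(X, y, b) for b in range(B)]
    with ProcessPoolExecutor(max_workers=workers) as ex:
        results = list(ex.map(one_replicate, tasks))
    err_oob = np.mean([r[0] for r in results])
    vimp = np.mean([r[1] for r in results], axis=0)
    return vimp, err_oob
```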
Noising up a variable is typically done by permuting its data. This is called permutation noising up and is used for nonparametric regression models. In the case of parametric and semiparametric regression models (such as Cox regression), in place of permutation noising up, the regression coefficient estimate for β is set to zero. Setting the coefficient to zero is equivalent to setting the OOB data for β to zero and is a special feature of parametric models that provides a more direct and convenient way to noise up the data.
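The equivalence claimed here is easy to verify for a linear predictor: zeroing the coefficient for β produces exactly the same predictions as zeroing the column of OOB data for β. A small numerical check (illustrative data only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))      # toy OOB design matrix
coef = np.array([1.5, -2.0, 0.5])  # toy fitted coefficients
beta = 1                           # index of the variable being noised up

# Permutation noising up: shuffle the column for variable beta.
X_perm = X.copy()
X_perm[:, beta] = rng.permutation(X_perm[:, beta])

# Parametric shortcut: zero the coefficient for beta ...
coef_zero = coef.copy()
coef_zero[beta] = 0.0
# ... which gives the same linear predictor as zeroing the column itself.
X_zero = X.copy()
X_zero[:, beta] = 0.0
assert np.allclose(X @ coef_zero, X_zero @ coef)
```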
As a side effect, the algorithm can also be used to return the OOB error rate for the model, Err_oob (see Step 8). This can be useful for assessing the effectiveness of the model and identifying poorly constructed models.
Algorithm 1 requires being able to calculate prediction error, and the type of prediction error used is context specific. For example, in linear regression, prediction error can be measured using mean squared error or standardized mean squared error. In classification problems, prediction error is typically defined by misclassification. In survival problems, a common measure of prediction performance is Harrell's concordance index. Thus, unlike the P value, the interpretation of the VIMP index is context specific.
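The three error measures mentioned above can each be written in a few lines. The concordance function below is a deliberate simplification that assumes uncensored survival data, so every pair of subjects is comparable; a full Harrell's C would also handle censoring:

```python
import numpy as np

def mse(y, pred):
    # Regression: mean squared error.
    return np.mean((np.asarray(y) - np.asarray(pred)) ** 2)

def misclassification(y, pred_label):
    # Classification: fraction of incorrectly predicted labels.
    return np.mean(np.asarray(y) != np.asarray(pred_label))

def concordance(time, risk):
    """Concordance index for uncensored survival data (simplified).

    Counts pairs where the subject with the shorter survival time
    has the higher predicted risk; ties in risk score count 1/2.
    """
    num = den = 0.0
    n = len(time)
    for i in range(n):
        for j in range(i + 1, n):
            if time[i] == time[j]:
                continue  # tied times are not comparable here
            den += 1
            shorter, longer = (i, j) if time[i] < time[j] else (j, i)
            if risk[shorter] > risk[longer]:
                num += 1
            elif risk[shorter] == risk[longer]:
                num += 0.5
    return num / den
```

Any of these can be dropped into Steps 3 and 4 of Algorithm 1 in place of the generic prediction error; the resulting VIMP index then inherits the units and interpretation of the chosen measure.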