Table 3.
Variable Importance Measures for the Random Forest Classifier when Differentiating N from DM Individuals
Variable | Mean Decrease in Accuracy | Mean Decrease in Gini Coefficient
---|---|---
Adjacent area sum | 21.11 (21.059, 21.15) | 18.45 (18.44, 18.47)
Circ max | 10.21 (10.17, 10.26) | 15.27 (15.25, 15.28)
Ave percent sk | 6.44 (6.40, 6.48) | 11.01 (11.00, 11.017)
Mean ves int | 2.00 (1.96, 2.045) | 7.69 (7.68, 7.70)
Mean cap int | 12.81 (12.76, 12.85) | 12.99 (12.97, 13.00)
The table shows the mean decrease in accuracy estimated from 2000 runs of the Random Forest algorithm applied to the full dataset. The more the accuracy of the Random Forest classifier decreases when the link between a predictor variable and the outcome variable is broken, the more important that predictor variable is; variables with a large mean decrease in accuracy are therefore more important for classification of the data. The Gini importance measures the mean decrease in node impurity produced by splits on a given variable: a useful variable tends to split mixed-label nodes into pure single-class nodes. The table shows the average, across the 2000 runs of the algorithm, of the mean decrease in Gini coefficient for each predictor variable. In all cases the numbers in brackets denote the range corresponding to ±1.96 standard errors around the average. Thus the area of ischemic zones around the FAZ (adjacent area sum) and FAZ circularity (circ max) are the most important variables.
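The two importance measures described above can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data, not the authors' pipeline: `feature_importances_` gives the (normalized) mean decrease in impurity, and `permutation_importance` breaks the predictor-outcome link by shuffling a variable's values, analogous to the mean decrease in accuracy. The feature names are placeholders echoing the table, not the actual dataset.

```python
# Hedged sketch: the two Random Forest importance measures on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in for the real dataset (5 predictors, binary outcome).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
feature_names = ["adjacent_area_sum", "circ_max", "ave_percent_sk",
                 "mean_ves_int", "mean_cap_int"]  # illustrative labels only

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

# Gini importance: mean decrease in node impurity across all trees
# (scikit-learn normalizes these so they sum to 1).
gini_importance = rf.feature_importances_

# Permutation importance: mean decrease in accuracy when a predictor's
# link to the outcome is broken by shuffling its values.
perm = permutation_importance(rf, X, y, n_repeats=30, random_state=0)

for name, g, p in zip(feature_names, gini_importance, perm.importances_mean):
    print(f"{name}: gini={g:.3f}, accuracy_drop={p:.3f}")
```

Averaging such importances over repeated runs (here, 2000 in the paper) and reporting ±1.96 standard errors gives the intervals shown in the table.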