Table 2.
Results of the random forest regression models. R² values (% variance explained) are presented for the bean and maize aggregate phene metrics, both for models fitted with the entire set of variables and for models fitted with only the most important variables.
| Aggregate phenotypic metric | Bean R² (%), model with all variables | Bean R² (%), model with most important variables | Maize R² (%), model with all variables | Maize R² (%), model with most important variables |
|---|---|---|---|---|
| Total length | 89.5 | 91.6 | 82 | 85 |
| Total area | 87 | 87 | 78 | 81 |
| Total volume | 81.7 | 88.5 | 79 | 81.6 |
| Volume distribution | 87 | 91 | 61 | 66 |
| Max no. of roots | 78.8 | 84 | 67 | 72.8 |
| Median no. of roots | 79.9 | 87 | 71 | 75 |
| Bushiness | 62 | 67 | 36 | 41 |
| Max depth | 98.6 | 99.6 | 79 | 84 |
| Max width | 91 | 90 | 95 | 99 |
| Convex hull area | 97.8 | 97 | 90 | 93.4 |
| Convex hull volume | 97.6 | 97.6 | 87 | 89.9 |
| Ellipse minor axis | 94.9 | 93.6 | 80 | 85 |
| Ellipse major axis | 96.7 | 97.3 | 95 | 98.6 |
| Ellipse aspect ratio | 85.9 | 87.4 | 51.9 | 62 |
| Solidity | 97.4 | 97.5 | 89 | 89 |
| FD | 67 | 68 | 16 | 20 |
| FA | 93.5 | 94.9 | 88 | 90 |
Random forest provides its own internal statistics, which can be used for validation and model selection. The main criterion for assessing the internal predictive ability of the random forest models, and for selecting among them, is the value of R². In random forest, R² is interpreted as a measure of how well the model predicts independent samples. Random forest models were first run with each aggregate phenotypic metric as the dependent variable and all the phenes as predictor variables. The most important variables were then chosen based on the % increase in mean squared error, and the random forest models were rerun with only those variables.
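
The two quantities used above can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis code behind Table 2: it assumes that R² is computed as 1 − MSE / Var(y) on predictions for samples the model did not see, and it uses a literal percentage-increase definition of the importance score, which may differ from the scaled score that some random forest packages report. The function names, the phene names, and the threshold are all hypothetical.

```python
def percent_variance_explained(y_true, y_pred):
    """Return 100 * (1 - MSE / Var(y)), using the population variance of y.

    y_pred is assumed to hold predictions for samples the model did not
    train on (for example out-of-bag predictions).
    """
    n = len(y_true)
    mean_y = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    var_y = sum((t - mean_y) ** 2 for t in y_true) / n
    return 100.0 * (1.0 - mse / var_y)


def percent_increase_mse(mse_baseline, mse_permuted):
    """Percentage by which the MSE grows when one predictor is permuted."""
    return 100.0 * (mse_permuted - mse_baseline) / mse_baseline


def select_important(importances, threshold):
    """Return predictor names scoring at least `threshold`, best first."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]


# Toy data, made up for illustration only.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]
r2_percent = percent_variance_explained(y_true, y_pred)

# Hypothetical importance scores for three hypothetical phenes.
scores = {"phene_a": 12.0, "phene_b": 1.5, "phene_c": 30.0}
kept = select_important(scores, threshold=10.0)
```

In this sketch, a model refitted on only the predictors in `kept` would be compared with the full model using `percent_variance_explained`, mirroring the two columns reported for each species in Table 2.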