2018 Apr 3;5:180053. doi: 10.1038/sdata.2018.53

Table 1. Performance of a random forest model predicting conductivity for different feature sets and cross-validation resampling schemes.

| Cross-validation method / Feature set | Compositional (RMSE/SD, mtry) | Structural (RMSE/SD, mtry) | Synthesis (RMSE/SD, mtry) | Combined (RMSE/SD, mtry) |
|---|---|---|---|---|
| Positions (samples) | 1.35/0.02, 2 | 2.15/0.01, 16 | 0.976/0.03, 17 | 0.67/0.03, 129 |
| Sample Libraries | 1.68/0.13, 2 | 2.25/0.12, 2 | 1.93/0.11, 2 | 1.61/0.15, 2 |
| Studies | 1.83/0.48, 2 | 1.94/0.42, 2 | 1.93/0.55, 2 | 2.05/0.55, 2 |

The root mean square error (RMSE) and standard deviation (SD) give the average error and its spread, in log10(S/m), for the cross-validation. A parameter sweep was used to determine the best value of mtry, the number of randomly chosen features considered at each split of the tree-building algorithm. Models tuned using per-position cross-validation tend to overfit the data with a larger mtry, while mtry=2 performs well for the more conservative resampling schemes.
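
The RMSE/SD pairs in Table 1 depend on how samples are grouped into held-out folds. The following is a minimal sketch of that idea only, assuming leave-one-group-out folds. It does not reproduce the random forest or the mtry sweep. The toy values, the group labels, the `mean_baseline` placeholder model, and all function names are hypothetical and are not taken from the article.

```python
import math
import statistics
from collections import defaultdict


def grouped_folds(groups):
    """Yield (train_idx, test_idx) pairs, holding out one whole group at a time."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g, test_idx in by_group.items():
        train_idx = [i for i in range(len(groups)) if groups[i] != g]
        yield train_idx, test_idx


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def cv_rmse(y, groups, fit_predict):
    """Return (mean RMSE, SD of RMSE) across the held-out groups."""
    scores = []
    for train_idx, test_idx in grouped_folds(groups):
        preds = fit_predict(train_idx, test_idx)
        scores.append(rmse([y[i] for i in test_idx], preds))
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical toy data: log10 conductivity values and the study each came from.
y = [1.0, 1.2, 3.0, 3.4, -1.0, -0.6]
study = ["A", "A", "B", "B", "C", "C"]


def mean_baseline(train_idx, test_idx):
    # Placeholder model (an assumption, not the article's random forest):
    # predict the training-set mean for every held-out sample.
    mu = statistics.mean(y[i] for i in train_idx)
    return [mu] * len(test_idx)


# Conservative scheme: every sample from a study is held out together.
study_mean, study_sd = cv_rmse(y, study, mean_baseline)

# Per-sample scheme: each sample is its own group, so samples from the
# same study stay in the training set.
sample_mean, sample_sd = cv_rmse(y, list(range(len(y))), mean_baseline)

print(f"by study:  RMSE={study_mean:.2f}, SD={study_sd:.2f}")
print(f"by sample: RMSE={sample_mean:.2f}, SD={sample_sd:.2f}")
```

On this toy data the per-sample scheme reports a lower mean error than the per-study scheme, because samples from the same study remain in the training set. A real comparison would substitute a fitted model for `mean_baseline` and repeat the evaluation for each candidate mtry.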