2019 Sep 30;4(16):16774–16780. doi: 10.1021/acsomega.9b01512

Figure 1.

Summary of statistical analysis: (a) important variables selected via the random forest (green circle, eight descriptors) and gradient boosting machine (blue circle, five descriptors) algorithms; (b) variable importance as calculated by the random forest algorithm; (c) relative importance of variables predicted by the gradient boosting machine method. The overlapping zone of the two circles contains the descriptors common to both machine learning approaches. MeanDecreaseGini is a variable's total decrease in node impurity, with each decrease weighted by the proportion of samples reaching the corresponding node, averaged over the individual decision trees of a random-forest-based classification. The larger this value, the greater the contribution of the corresponding descriptor.
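The MeanDecreaseGini definition in the caption can be illustrated with a minimal sketch. It assumes binary splits and follows the caption's wording literally (weighting by the proportion of samples reaching a node); the exact scaling used by a given random forest library may differ. The function names and the toy labels are hypothetical, introduced only for illustration, and are not taken from the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini_decrease(parent, left, right, n_total):
    """Impurity decrease of one binary split, weighted by the
    proportion of all samples that reach the split node."""
    n = len(parent)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return (n / n_total) * (gini(parent) - children)


def mean_decrease_gini(splits_per_tree, n_total):
    """Average, over trees, of the summed weighted decreases.

    splits_per_tree holds one list per tree; each list contains
    (parent, left, right) label lists for every node in that tree
    that splits on the variable of interest (hypothetical layout).
    """
    per_tree = [
        sum(weighted_gini_decrease(p, l, r, n_total) for p, l, r in splits)
        for splits in splits_per_tree
    ]
    return sum(per_tree) / len(per_tree)


# Toy forest of two trees: the first splits once on the variable and
# separates the classes perfectly; the second never uses the variable.
toy_forest = [
    [([0, 0, 1, 1], [0, 0], [1, 1])],
    [],
]
importance = mean_decrease_gini(toy_forest, n_total=4)
print(importance)
```

In this toy case the single split removes all impurity at the root of the first tree, while the second tree contributes nothing, so the variable's average decrease is half the root impurity.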