Figure 6.
Random forest (RF)-based selection of seed identity markers from mass features of non-targeted metabolite profiles of experimental bakery products prepared with or without the addition of chia, linseed, or sesame seeds. (A) Variable importance measured as mean decrease in Gini index and mean decrease in accuracy (means ± standard deviations) from 12 RF analyses. These analyses used 84 pre-selected processing-dependent mass features plus eight manually added mass features containing previously identified markers of non-processed seeds. These mass features were selected from 19,761 mass features obtained by non-targeted metabolite profiling of polar extracts from experimental cookies. The cookies were prepared with 5, 10, 15, or 20% (w/w) defatted seed flour or with 10 or 20% (w/w) whole seeds (Supplemental Table S4). The classification models predicted four classes: cookies without added seeds and cookies with chia, linseed, or sesame seeds, irrespective of the amount of added seed material or of seed pre-processing. Top mass features were ranked by mean decrease in accuracy (Supplemental Table S4). (B) Characterization of the trained classification models by confusion matrix, class false negative rate (FNR), and class false discovery rate (FDR). Averages (AVG) and maxima (MAX) of class FNR and FDR were calculated from the individual confusion matrices of the 12 classification models. Each model was trained on 46 randomly sampled profiles from a total set of 93 cookie profiles: without added seeds (n = 5) and with chia (n = 28), linseed (n = 30), or sesame seeds (n = 30). The overall classification error was 6.70 ± 3.27% (mean ± standard deviation).
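The class FNR and FDR in panel (B) follow the standard confusion-matrix definitions. For each class, FNR = FN / (TP + FN) and FDR = FP / (TP + FP). The sketch below shows how the per-class AVG and MAX values and the overall error could be derived from a set of confusion matrices. It is a minimal illustration under several assumptions. Rows are taken as true classes and columns as predicted classes. The reported ± value is assumed to be the sample standard deviation (ddof = 1) across models. The function names `class_fnr_fdr` and `summarize_models` are hypothetical and are not taken from the authors' analysis pipeline.

```python
import numpy as np


def class_fnr_fdr(cm):
    """Per-class false negative rate and false discovery rate.

    Assumption: rows of `cm` are true classes, columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    actual = cm.sum(axis=1)     # TP + FN for each true class
    predicted = cm.sum(axis=0)  # TP + FP for each predicted class
    # Return 0 instead of dividing by zero when a class never occurs or is never predicted
    fnr = np.divide(actual - tp, actual, out=np.zeros_like(tp), where=actual > 0)
    fdr = np.divide(predicted - tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    return fnr, fdr


def summarize_models(cms):
    """Average/maximum class FNR and FDR plus overall error across several models."""
    rates = [class_fnr_fdr(cm) for cm in cms]
    fnr = np.array([r[0] for r in rates])  # shape: (n_models, n_classes)
    fdr = np.array([r[1] for r in rates])
    # Overall error per model: off-diagonal counts / all counts
    errors = np.array([1.0 - np.trace(np.asarray(cm)) / np.sum(cm) for cm in cms])
    return {
        "FNR_AVG": fnr.mean(axis=0), "FNR_MAX": fnr.max(axis=0),
        "FDR_AVG": fdr.mean(axis=0), "FDR_MAX": fdr.max(axis=0),
        "error_mean": errors.mean(),
        "error_sd": errors.std(ddof=1),  # sample SD; assumed, not confirmed by the source
    }
```

In the figure, `cms` would hold the 12 four-class confusion matrices, one per trained model. The class order shown in panel (B) is not encoded here.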