Construction of a random forest (RF) classifier using the single-cell Raman dataset and its performance in species classification
(A) Average of the preprocessed Raman spectra for each of the six prokaryotic species (E. coli, B. subtilis, T. thermophilus, T. kodakarensis, S. acidocaldarius, and N. viennensis). The 50 most important features are shown as vertical lines. The dashed rectangular box indicates the region over which the CH2-stretching band intensities were calculated (see Figure 4B).
(B) Averaged class probabilities in 10-fold cross-validation. Error bars represent ±SD (n = 10). Asterisks indicate statistically significant differences between the predicted probability of the true class and the probabilities of the other classes (Welch's t test, P < 0.05).
(C) Confusion matrix, C, for the six strain classes. Each entry of the confusion matrix, Cij, is the total number of spectra known to be in class i and predicted by the RF classifier to be in class j in 10-fold cross-validation. Correct classifications are shown in red boxes on the diagonal, and misclassifications in blue boxes off the diagonal. Precision and recall are also shown as percentages.
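As an illustration of how the entries Cij in panel (C) relate to the reported precision and recall, the following is a minimal sketch, not the authors' code. It assumes that precision for class j is the diagonal entry divided by its column sum and that recall for class i is the diagonal entry divided by its row sum, which is the usual convention when rows are true classes and columns are predicted classes. The function names and the toy labels are hypothetical and are not taken from the dataset.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count spectra with true class i (row) that were predicted as class j (column)."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    for i, j in zip(true_labels, predicted_labels):
        C[i, j] += 1
    return C


def precision_recall(C):
    """Return per-class precision and recall, in percent, from a confusion matrix.

    Assumes every class is predicted at least once and has at least one
    true member; otherwise the corresponding division is undefined.
    """
    correct = np.diag(C).astype(float)
    predicted_totals = C.sum(axis=0)  # column sums: spectra predicted as each class
    true_totals = C.sum(axis=1)       # row sums: spectra truly in each class
    precision = 100.0 * correct / predicted_totals
    recall = 100.0 * correct / true_totals
    return precision, recall


# Hypothetical toy labels for three classes (not data from the paper).
true_labels = [0, 0, 0, 1, 1, 2, 2, 2]
predicted_labels = [0, 0, 1, 1, 1, 2, 2, 0]

C = confusion_matrix(true_labels, predicted_labels, n_classes=3)
precision, recall = precision_recall(C)
print(C)
print(np.round(precision, 1), np.round(recall, 1))
```

In a cross-validation setting, the label lists would hold the pooled held-out predictions from all folds, so that each spectrum is counted exactly once.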