Performance comparison on rare or unknown cell group detection
All datasets were generated by Splatter. A. Cell population distribution of the simulated data (10 repeats), composed of 10,000 genes and 2000 cells split into 9 cell types with cell number proportions of 51.25%, 24.70%, 11.85%, 6.50%, 2.70%, 1.70%, 0.85%, 0.30%, and 0.15%, respectively. B. Plot illustrating cell type-specific accuracy across the 9 cell groups in (A) for the five annotation methods with overall accuracy > 0.8 and ARI > 0.8. The x-axis shows the cell groups in descending order of cell proportion, and the y-axis shows the cell type-specific accuracy score. Results are shown as mean ± SD over ten repetitions. C. Boxplots showing performance metrics (overall accuracy, ARI, and V-measure) on another simulated dataset, composed of 4000 genes and 2000 cells split into 5 cell types. For each prediction, one cell group was removed from the reference matrix while the query remained intact. The x-axis lists the methods with a rejection option (i.e., those allowing “unknown” labels), and the y-axis shows the classification metric after excluding the left-out group. D. A boxplot showing the overall accuracy of the methods in (C) when the “unknown” class is assigned to the left-out group in the query.
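The quantities in this legend can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the toy labels, and the reading of panel D (cells of the left-out group are counted as correct only when predicted as "unknown") are assumptions made for illustration. ARI and V-measure are omitted to keep the sketch dependency-free.

```python
from collections import defaultdict

# Proportions quoted for panel A, as percentages of 2000 cells.
PROPORTIONS = [51.25, 24.70, 11.85, 6.50, 2.70, 1.70, 0.85, 0.30, 0.15]
N_CELLS = 2000


def cells_per_group(proportions=PROPORTIONS, n_cells=N_CELLS):
    """Convert percentage proportions into integer cell counts."""
    return [round(p * n_cells / 100) for p in proportions]


def per_group_accuracy(true_labels, pred_labels):
    """Fraction of correctly labelled cells within each true group (panel B)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(true_labels, pred_labels):
        totals[t] += 1
        hits[t] += int(t == p)
    return {group: hits[group] / totals[group] for group in totals}


def accuracy(true_labels, pred_labels):
    """Overall accuracy: fraction of cells whose prediction matches the truth."""
    pairs = list(zip(true_labels, pred_labels))
    return sum(t == p for t, p in pairs) / len(pairs)


def leave_out_scores(true_labels, pred_labels, left_out, unknown="unknown"):
    """Two accuracies for a leave-one-group-out run (hypothetical helper).

    First value: accuracy after excluding cells of the left-out group (panel C).
    Second value: accuracy over all cells when the left-out group's true label
    is replaced by `unknown` (an assumed reading of panel D).
    """
    kept = [(t, p) for t, p in zip(true_labels, pred_labels) if t != left_out]
    acc_excluding = accuracy([t for t, _ in kept], [p for _, p in kept])
    relabelled = [unknown if t == left_out else t for t in true_labels]
    acc_with_unknown = accuracy(relabelled, pred_labels)
    return acc_excluding, acc_with_unknown


# Toy example with made-up labels; group "C" is absent from the reference.
toy_true = ["A", "A", "B", "B", "C", "C"]
toy_pred = ["A", "A", "B", "A", "unknown", "B"]
counts = cells_per_group()
toy_group_acc = per_group_accuracy(toy_true, toy_pred)
toy_excluding, toy_with_unknown = leave_out_scores(toy_true, toy_pred, "C")
```

With these proportions the rarest group in panel A contains only 3 of the 2000 cells, which is why per-group accuracy is reported separately from overall accuracy.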