Effect of increased DE scales and cell type classes on annotation performance
A. PCA plots of simulated datasets generated by Splatter. Each dataset comprises 10,000 genes and 2000 cells, split into five cell types in equal proportions. Each dataset contains the same proportion of DE genes in each cell type; the datasets differ in the magnitude of the DE factors applied to those genes, which modifies cell–cell similarity. We generated 20 datasets with cell group similarity spanning low, low–moderate, moderate, and high DE (see Materials and methods). Colors represent different cell types. B. Performance of each annotation method on the datasets in (A), shown as plots of three classification metrics: overall accuracy, ARI, and V-measure. The x-axis is the gene DE scale in each cell group, and the y-axis is the metric score. Results are shown as mean ± SD over five repetitions. Line colors and point shapes correspond to different methods. C. Performance of each method as the number of cell type classes increases, shown as plots of the same three classification metrics. Each simulated dataset comprises an increasing number (N) of cell types (N = 10, 20, 30, 40, 50), with the total cell number (10,000), gene number (20,000), and DE level among cell types held constant. The x-axis is the number of cell type classes in each dataset, and the y-axis is the metric score. PCA, principal component analysis; DE, differential expression; ARI, adjusted Rand index.
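For readers who want to reproduce the three metrics plotted in panels B and C, the following is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the authors' evaluation code: the function names (`accuracy`, `adjusted_rand_index`, `v_measure`, `summarize`) are hypothetical, V-measure is taken with equal weight on homogeneity and completeness, and the SD is assumed to be the sample standard deviation, since the legend does not specify which form was used.

```python
from collections import Counter
from math import comb, log
from statistics import mean, stdev


def accuracy(true, pred):
    """Fraction of cells whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def adjusted_rand_index(true, pred):
    """Adjusted Rand index computed from the label contingency table."""
    n = len(true)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(true, pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(true).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster on both sides
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def _entropy(counts, n):
    """Shannon entropy (natural log) of a partition given its group sizes."""
    return -sum(c / n * log(c / n) for c in counts)


def v_measure(true, pred):
    """Harmonic mean of homogeneity and completeness (equal weights assumed)."""
    n = len(true)
    h_true = _entropy(Counter(true).values(), n)
    h_pred = _entropy(Counter(pred).values(), n)
    h_joint = _entropy(Counter(zip(true, pred)).values(), n)
    # Conditional entropies: H(true | pred) = H(joint) - H(pred), and vice versa.
    homogeneity = 1.0 if h_true == 0 else 1 - (h_joint - h_pred) / h_true
    completeness = 1.0 if h_pred == 0 else 1 - (h_joint - h_true) / h_pred
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


def summarize(scores):
    """Mean and sample SD over repetitions (sample SD is an assumption)."""
    return mean(scores), stdev(scores)
```

Overall accuracy requires predicted labels to match the true cell type names exactly, whereas ARI and V-measure depend only on how cells are grouped, so a method that recovers the right groups under different names would score well on the latter two but poorly on accuracy.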