Effect of gene filtering on annotation performance
A. Features (genes) in the human pancreas Fluidigm C1 dataset were filtered by removing genes detected in fewer than three cells, leaving 19,211 genes. The retained genes were then randomly downsampled to 5000, 10,000, and 15,000 input features while preserving the original log count distribution. Each downsampling was repeated five times. SCINA failed when the number of features was reduced to 5000, so no point is shown for it at that size. B. Three classification metrics (overall accuracy, ARI, and V-measure) for each method applied to the downsampled feature sets in (A). C. Reads in the BAM files of the human pancreas Fluidigm C1 dataset were randomly downsampled to 25%, 50%, and 75% of the original read depth. D. Three classification metrics (overall accuracy, ARI, and V-measure) for each method applied to the downsampled data in (C). In (B) and (D), the x-axis shows the downsampling size (number of features or read depth) and the y-axis shows the metric score. Results are shown as mean ± SD. Line colors and point shapes denote the different methods.
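
The feature filtering and downsampling described in (A) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the caption does not say how the log count distribution was preserved, so the binning of genes by log total count with proportional sampling per bin is an assumption, and the function names `filter_genes` and `stratified_downsample`, the `n_bins` parameter, and the dict-of-lists input layout are all hypothetical.

```python
import math
import random
from collections import defaultdict


def filter_genes(counts, min_cells=3):
    """Keep genes detected (count > 0) in at least `min_cells` cells.

    `counts` maps gene name -> list of per-cell counts (assumed layout).
    """
    return {
        gene: row
        for gene, row in counts.items()
        if sum(1 for c in row if c > 0) >= min_cells
    }


def stratified_downsample(counts, n_keep, n_bins=10, seed=0):
    """Randomly pick `n_keep` genes, roughly preserving the distribution
    of log total counts (assumed strategy: equal-width bins on
    log1p(total count), proportional sampling within each bin).
    """
    rng = random.Random(seed)
    genes = sorted(counts)
    if n_keep >= len(genes):
        return genes

    # Assign each gene to an equal-width bin of its log total count.
    log_totals = {g: math.log1p(sum(counts[g])) for g in genes}
    lo, hi = min(log_totals.values()), max(log_totals.values())
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all totals match
    bins = defaultdict(list)
    for g in genes:
        idx = min(int((log_totals[g] - lo) / width), n_bins - 1)
        bins[idx].append(g)

    # Proportional allocation per bin; leftover slots go to the bins
    # with the largest fractional remainders.
    frac = n_keep / len(genes)
    quotas = {b: len(members) * frac for b, members in bins.items()}
    alloc = {b: int(q) for b, q in quotas.items()}
    leftover = n_keep - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda b: quotas[b] - alloc[b], reverse=True)
    for b in by_remainder[:leftover]:
        alloc[b] += 1

    selected = []
    for b in sorted(bins):
        selected.extend(rng.sample(bins[b], alloc[b]))
    return selected


# Five repeats at one feature size differ only in the random seed:
# kept = filter_genes(counts)
# repeats = [stratified_downsample(kept, 5000, seed=s) for s in range(5)]
```

Varying only the seed between repeats keeps each run reproducible, and the spread across repeats is the kind of variation that a mean ± SD summary reports.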