2022 May 16;23:114. doi: 10.1186/s13059-022-02682-2

Fig. 3.


Evaluation of the performance of cell clustering. A Scatter plots visualizing the UMAP embedding, colored by clustering labels from different methods: Cell Ranger on gene expression, Cell Ranger on chromatin accessibility, Seurat V4, intNMF, and scREG. B The same UMAP as in A, colored by the surrogate ground truth. Cell Ranger RNA-seq did not distinguish naïve CD4 T cells from naïve CD8 T cells, or CD56 (dim) NK cells from effector CD8 T cells. Cell Ranger ATAC-seq failed to separate non-classical monocytes from myeloid DCs, whereas scREG separates them clearly. In Seurat, the boundary between memory B cells and naïve B cells is shifted, so a large proportion of memory B cells are labeled as naïve B cells. Clustering performance was also assessed by calculating the normalized mutual information (NMI) and adjusted Rand index (ARI) against the surrogate ground truth. C–E scREG clustering of 10X multiome data from human cerebellum, mouse E18 brain, and lymph node from B cell lymphoma. The clustering results are consistent with known cell types and marker gene expression. F Comparison of scREG with Seurat using four clustering evaluation metrics on three datasets. Distances among cells are calculated as Euclidean distances on the top 20 principal components of gene expression and chromatin accessibility, respectively. The x-axis shows the metric calculated from the Seurat clustering labels, and the y-axis shows that from the scREG clustering labels. Colors represent different datasets and shapes represent different data types (triangle for scRNA-seq, diamond for scATAC-seq). A lower Davies-Bouldin index indicates better clustering; for the other three metrics, higher is better. scREG performs better than Seurat in all cases.
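As a rough illustration of how NMI and ARI compare a clustering against a surrogate ground truth, the following minimal sketch computes both from two label vectors using only the Python standard library. The function names and the toy labels are hypothetical and are not taken from the paper. The NMI here is normalized by the arithmetic mean of the two entropies, which is an assumption, since the caption does not state which normalization was used.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(truth, pred):
    """ARI between two labelings of the same cells (1.0 = identical partitions)."""
    n = len(truth)
    if n < 2:
        return 1.0  # no cell pairs to compare
    # Count pairs of cells that share a cluster in both, in truth, and in pred.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(truth, pred):
    """NMI with arithmetic-mean normalization (an assumed choice)."""
    n = len(truth)
    t_counts, p_counts = Counter(truth), Counter(pred)
    mi = sum(
        (c / n) * log(c * n / (t_counts[a] * p_counts[b]))
        for (a, b), c in Counter(zip(truth, pred)).items()
    )
    denom = (_entropy(truth) + _entropy(pred)) / 2
    return mi / denom if denom > 0 else 1.0


# Toy labels for illustration only; these are not data from the study.
surrogate_truth = ["B", "B", "T", "T", "NK", "NK"]
cluster_labels = [0, 0, 1, 1, 2, 2]
ari = adjusted_rand_index(surrogate_truth, cluster_labels)
nmi = normalized_mutual_information(surrogate_truth, cluster_labels)
```

Both scores depend only on how cells are grouped, not on the label names, so cluster IDs from any method can be compared directly with annotated cell types.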