2022 Mar 2;17:17. doi: 10.1186/s13024-022-00517-z

Table 2. Summary of the clustering analysis approaches for scRNA-seq data

| Method | Clustering strategy | Dimension reduction | Similarity | Notes |
|---|---|---|---|---|
| **Expression-based** | | | | |
| SC3 [111] | Consensus k-means over multiple similarity matrices | PCA | Euclidean distance; Spearman's correlation; Pearson's correlation | Joint calculation of multiple similarity matrices increases the computational burden |
| SIMLR [112] | A Gaussian kernel is learned jointly over multiple similarity measures to infer block structure in cell-cell similarity | t-SNE on the learned cell-cell similarity | Euclidean distance; Spearman's correlation; Pearson's correlation | Searches for consensus block structure across multiple similarities |
| DBSCAN [113] | Density-based clustering | User's choice (t-SNE is usually preferred) | NA | Results may vary because of the stochasticity of t-SNE |
| PhenoGraph [114] | k-nearest-neighbor graph | NA | Jaccard index; Euclidean distance | The Jaccard index is used to prune spurious links; GN modularity is optimized with the Louvain algorithm |
| SNN-Cliq [115] | Shared k-nearest-neighbor graph | NA | Euclidean distance | Small maximal cliques are found first; quasi-cliques connecting the detected maximal cliques are then identified to define dense subnetworks |
| MetaCell [116] | k-nearest-neighbor graph | NA | Pearson's correlation | A series of regularizations yields a balanced, symmetrized, weighted graph, followed by a k-means-like partition search on the graph |
| scvis [117] | Model-based: deep generative modeling with a neural-network model | Deep neural network | NA | The log-likelihood of the noise model serves as the loss for training a deep autoencoder-based model |
| scVI [96] | Model-based: deep generative modeling with a neural-network model | Deep neural network | NA | Similar to scvis, with additional noise parameters: dropout reads are modeled with a ZINB distribution and library sizes as Gaussian noise |
| DESC [118] | Neural-network-based dimension reduction plus Louvain-based iterative clustering | Deep neural network | NA | The autoencoder learns cluster-specific gene expression and handles technical variance (e.g. batch effects) when it is smaller than biological variance; GPU support scales to millions of cells; combining Louvain clustering with t-distribution-based cluster assignment iteratively refines clusters in the bottleneck layer |
| **Genotype-based** | | | | |
| demuxlet [110] | Supervised clustering of cells based on genotypes | NA | NA | The likelihood that a cell belongs to an individual is computed from alternate-allele frequencies |
| Vireo [108] | Supervised clustering of cells based on genotypes | NA | NA | Variational Bayesian inference estimates the number of unique individuals with distinct genotypes; each cell is assigned to the maximum-likelihood individual |
| scSplit [109] | Unsupervised clustering of cells based on an allele-fraction model | NA | NA | Expectation-maximization (EM) fits the allele-fraction model to the probability of observing alternate alleles from each individual |
| Souporcell [72] | Mixture modeling | NA | NA | Uses minimap2 instead of the STAR aligner to improve variant calling from scRNA-seq reads; the mixture model is fitted on allele fractions to cluster cells in genotype space |
| DENDRO [107] | Phylogeny reconstruction from genetic divergence between cells | NA | NA | Intended for tumor heterogeneity; genetic divergence is modeled with nuisance variables such as dropout rates and library sizes |
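To make the consensus idea behind the expression-based methods concrete, the sketch below follows the SC3 pattern, clustering the same cells under several cell-cell distance views (Euclidean, Pearson, Spearman), averaging the co-clustering assignments into a consensus matrix, and cutting a hierarchical tree on it. This is a minimal illustration on a synthetic matrix using scikit-learn and SciPy, not the SC3 implementation; the toy data, number of components, and cluster count are arbitrary choices for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# toy expression matrix: 60 cells x 50 genes with two planted groups
X = rng.normal(size=(60, 50))
X[30:, :10] += 3.0  # shift 10 genes in half the cells to create structure

k = 2
# three cell-cell distance views, as in SC3
euclid = squareform(pdist(X, metric="euclidean"))
pearson = 1 - np.corrcoef(X)
spear = 1 - spearmanr(X, axis=1)[0]

runs = []
for D in (euclid, pearson, spear):
    # PCA on each distance matrix, then k-means on the leading components
    comps = PCA(n_components=5).fit_transform(D)
    runs.append(KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(comps))

# consensus matrix: fraction of runs in which each pair of cells co-clusters
C = np.mean([(r[:, None] == r[None, :]).astype(float) for r in runs], axis=0)

# final labels from hierarchical clustering of the consensus distances
Z = linkage(squareform(1 - C, checks=False), method="average")
labels = fcluster(Z, t=k, criterion="maxclust")
print(labels)
```

Averaging over several distance views is what buys robustness here: a pair of cells must co-cluster under most metrics to end up tightly linked in the consensus matrix, which also explains the computational cost noted for SC3 in the table.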
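The genotype-based demultiplexers (demuxlet, Vireo) share a core step: score each cell's alternate-allele read counts against the known genotype of every individual and assign the cell to the maximum-likelihood match. The sketch below illustrates that step with a plain binomial read-count model; `assign_cell`, the toy genotype matrix, and the error rate are hypothetical constructs for illustration, and real tools additionally model doublets, ambient RNA, and sequencing error in more detail.

```python
import numpy as np

# toy setup: 3 individuals genotyped at 8 SNPs; each entry is the expected
# alt-allele fraction (0 = ref/ref, 0.5 = het, 1 = alt/alt)
G = np.array([
    [0.0, 0.5, 1.0, 0.0, 0.5, 1.0, 0.0, 0.5],
    [1.0, 0.5, 0.0, 1.0, 0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5, 0.5, 1.0, 0.0, 1.0, 1.0],
])
err = 0.01  # sequencing error rate; keeps expected fractions off exactly 0/1

def assign_cell(alt_reads, total_reads, genotypes, err=0.01):
    """Return the index of the individual whose genotype gives the observed
    alt-allele counts the highest binomial log-likelihood."""
    p = np.clip(genotypes, err, 1 - err)  # expected alt fraction per SNP
    ref_reads = total_reads - alt_reads
    # per-individual log-likelihood, summed over SNPs
    ll = (alt_reads * np.log(p) + ref_reads * np.log(1 - p)).sum(axis=1)
    return int(np.argmax(ll))

# simulate a cell drawn from individual 1: alt reads track its genotype
total = np.full(8, 20)
rng = np.random.default_rng(1)
alt = rng.binomial(total, np.clip(G[1], err, 1 - err))
print(assign_cell(alt, total, G))  # expected: 1
```

With even a modest number of covered SNPs, the log-likelihood gap between the true individual and the alternatives grows quickly, which is why supervised demultiplexing is reliable when reference genotypes are available.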