2022 Mar 2;17:17. doi: 10.1186/s13024-022-00517-z

Table 2. Summary of the clustering analysis approaches for scRNA-seq data

| Method | Clustering strategy | Dimension reduction | Similarity | Notes |
|---|---|---|---|---|
| **Expression-based** | | | | |
| SC3 [111] | Consensus k-means over multiple similarity matrices | PCA | Euclidean distance; Spearman's correlation; Pearson's correlation | Joint calculation of multiple similarity matrices increases the computational burden |
| SIMLR [112] | A Gaussian kernel is learned jointly over multiple similarity measures to infer block structure in cell-cell similarity | t-SNE on the learned cell-cell similarity | Euclidean distance; Spearman's correlation; Pearson's correlation | Searches for consensus block structure across multiple similarities |
| DBSCAN [113] | Density-based clustering | User's choice (t-SNE is usually preferred) | NA | Results may vary because of the stochasticity of t-SNE |
| PhenoGraph [114] | k-nearest-neighbor graph | NA | Jaccard index; Euclidean distance | The Jaccard index is used to prune spurious links; GN modularity is optimized with the Louvain algorithm |
| SNN-Cliq [115] | Shared k-nearest-neighbor graph | NA | Euclidean distance | Small maximal cliques are found first; quasi-cliques connecting the detected maximal cliques are then identified to define dense subnetworks |
| MetaCell [116] | k-nearest-neighbor graph | NA | Pearson's correlation | A series of regularizations yields a balanced, symmetrized, weighted graph, followed by a k-means-like partition search on the graph |
| scvis [117] | Model-based: deep generative modeling with a neural-network model | Deep neural network | NA | The log-likelihood of the noise model serves as the loss for training a deep autoencoder-based model |
| scVI [96] | Model-based: deep generative modeling with a neural-network model | Deep neural network | NA | Similar to scvis, with additional noise parameters: dropout reads are modeled with a ZINB distribution and library sizes as Gaussian noise |
| DESC [118] | Neural-network-based dimension reduction plus Louvain-based iterative clustering | Deep neural network | NA | The autoencoder learns cluster-specific gene expression and handles technical variance (e.g. batch effects) when it is smaller than biological variance; GPU support scales to millions of cells; combining Louvain clustering with t-distribution-based cluster assignment iteratively refines clusters in the bottleneck layer |
| **Genotype-based** | | | | |
| demuxlet [110] | Supervised clustering of cells based on genotypes | NA | NA | The likelihood that a cell belongs to an individual is computed from alternate-allele frequencies |
| Vireo [108] | Supervised clustering of cells based on genotypes | NA | NA | Variational Bayesian inference estimates the number of unique individuals with distinct genotypes; each cell is assigned to the maximum-likelihood individual |
| scSplit [109] | Unsupervised clustering of cells based on an allele-fraction model | NA | NA | Expectation-maximization (EM) fits the allele-fraction model to the probability of observing alternate alleles from each individual |
| Souporcell [72] | Mixture modeling | NA | NA | Uses minimap2 instead of the STAR aligner to improve variant calling from scRNA-seq reads; the mixture model is fitted on allele fractions to cluster cells in genotype space |
| DENDRO [107] | Phylogeny reconstruction from genetic divergence between cells | NA | NA | Intended for tumor heterogeneity; genetic divergence is modeled with nuisance variables such as dropout rates and library sizes |
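To make the consensus idea behind the expression-based methods concrete, the sketch below follows the SC3 pattern, clustering the same cells under several cell-cell distance views (Euclidean, Pearson, Spearman), averaging the co-clustering assignments into a consensus matrix, and cutting a hierarchical tree on it. This is a minimal illustration on a synthetic matrix using scikit-learn and SciPy, not the SC3 implementation; the toy data, number of components, and cluster count are arbitrary choices for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# toy expression matrix: 60 cells x 50 genes with two planted groups
X = rng.normal(size=(60, 50))
X[30:, :10] += 3.0  # shift 10 genes in half the cells to create structure

k = 2
# three cell-cell distance views, as in SC3
euclid = squareform(pdist(X, metric="euclidean"))
pearson = 1 - np.corrcoef(X)
spear = 1 - spearmanr(X, axis=1)[0]

runs = []
for D in (euclid, pearson, spear):
    # PCA on each distance matrix, then k-means on the leading components
    comps = PCA(n_components=5).fit_transform(D)
    runs.append(KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(comps))

# consensus matrix: fraction of runs in which each pair of cells co-clusters
C = np.mean([(r[:, None] == r[None, :]).astype(float) for r in runs], axis=0)

# final labels from hierarchical clustering of the consensus distances
Z = linkage(squareform(1 - C, checks=False), method="average")
labels = fcluster(Z, t=k, criterion="maxclust")
print(labels)
```

Averaging over several distance views is what buys robustness here: a pair of cells must co-cluster under most metrics to end up tightly linked in the consensus matrix, which also explains the computational cost noted for SC3 in the table.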
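The genotype-based demultiplexers (demuxlet, Vireo) share a core step: score each cell's alternate-allele read counts against the known genotype of every individual and assign the cell to the maximum-likelihood match. The sketch below illustrates that step with a plain binomial read-count model; `assign_cell`, the toy genotype matrix, and the error rate are hypothetical constructs for illustration, and real tools additionally model doublets, ambient RNA, and sequencing error in more detail.

```python
import numpy as np

# toy setup: 3 individuals genotyped at 8 SNPs; each entry is the expected
# alt-allele fraction (0 = ref/ref, 0.5 = het, 1 = alt/alt)
G = np.array([
    [0.0, 0.5, 1.0, 0.0, 0.5, 1.0, 0.0, 0.5],
    [1.0, 0.5, 0.0, 1.0, 0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5, 0.5, 1.0, 0.0, 1.0, 1.0],
])
err = 0.01  # sequencing error rate; keeps expected fractions off exactly 0/1

def assign_cell(alt_reads, total_reads, genotypes, err=0.01):
    """Return the index of the individual whose genotype gives the observed
    alt-allele counts the highest binomial log-likelihood."""
    p = np.clip(genotypes, err, 1 - err)  # expected alt fraction per SNP
    ref_reads = total_reads - alt_reads
    # per-individual log-likelihood, summed over SNPs
    ll = (alt_reads * np.log(p) + ref_reads * np.log(1 - p)).sum(axis=1)
    return int(np.argmax(ll))

# simulate a cell drawn from individual 1: alt reads track its genotype
total = np.full(8, 20)
rng = np.random.default_rng(1)
alt = rng.binomial(total, np.clip(G[1], err, 1 - err))
print(assign_cell(alt, total, G))  # expected: 1
```

With even a modest number of covered SNPs, the log-likelihood gap between the true individual and the alternatives grows quickly, which is why supervised demultiplexing is reliable when reference genotypes are available.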