Expression-based methods

| Method | Clustering approach | Dimension reduction | Similarity metric(s) | Notes |
| --- | --- | --- | --- | --- |
| SC3 [111] | Consensus k-means over multiple similarity matrices | PCA | Euclidean distance, Spearman’s correlation, Pearson’s correlation | Joint calculation of multiple similarity matrices increases the computational burden (see the consensus sketch below this table). |
| SIMLR [112] | A Gaussian kernel is learned jointly on Euclidean and Spearman’s correlation distances to infer block structure in the cell-cell similarity matrix | t-SNE on the learned cell-cell similarity | Euclidean distance, Spearman’s correlation, Pearson’s correlation | Searches for consensus block structure across multiple similarity measures. |
| DBSCAN [113] | Density-based clustering | User’s choice (t-SNE is usually preferred) | NA | Results may vary because of the stochasticity of t-SNE. |
| PhenoGraph [114] | k-nearest-neighbor graph | NA | Jaccard index, Euclidean distance | The Jaccard index prunes spurious links; Newman-Girvan (GN) modularity is optimized with Louvain’s algorithm (see the graph-clustering sketch below this table). |
| SNN-Cliq [115] | Shared k-nearest-neighbor graph | NA | Euclidean distance | A maximal-clique search first yields small cliques; quasi-cliques connecting the detected maximal cliques are then identified to delineate dense subnetworks. |
| MetaCell [116] | k-nearest-neighbor graph | NA | Pearson’s correlation | A series of regularizations produces a balanced, symmetrized, weighted graph, followed by a k-means-like partition search on the graph. |
| scvis [117] | Model-based deep generative modeling that trains a deep neural network | Deep neural network-based | NA | The log-likelihood of the noise model serves as the loss for training a deep autoencoder-based model. |
| scVI [96] | Model-based deep generative modeling that trains a deep neural network | Deep neural network-based | NA | Similar to scvis, with additional noise parameters: dropout reads are modeled with a ZINB distribution and library sizes as Gaussian noise. |
| DESC [118] | Neural network-based dimension reduction plus Louvain-based iterative clustering | Deep neural network-based | NA | An autoencoder learns cluster-specific gene expression and absorbs technical variance (e.g. batch effects) when it is smaller than the biological variance; GPU support scales the method to millions of cells. Louvain clustering combined with t-distribution-based cluster assignment iteratively refines clusters in the bottleneck layer. |
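To make the consensus idea behind SC3 concrete, the sketch below (a minimal illustration, not the SC3 package; the `expr` matrix, function name, and parameter values are assumptions) computes Euclidean-, Pearson-, and Spearman-based cell-cell distances, runs k-means on the leading principal components of each, and hierarchically clusters the resulting co-association matrix:

```python
# Minimal SC3-style consensus clustering sketch (illustrative only, not the SC3 package).
# Assumes `expr` is a cells x genes matrix of normalized, log-transformed expression.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def consensus_clusters(expr, k=5, n_components=15, seed=0):
    n_cells = expr.shape[0]
    n_components = min(n_components, n_cells - 1)

    # Three cell-cell distance matrices, as listed in the table above.
    d_euclid = squareform(pdist(expr, metric="euclidean"))
    d_pearson = 1 - np.corrcoef(expr)                  # Pearson-based distance
    d_spearman = 1 - spearmanr(expr, axis=1)[0]        # Spearman-based distance

    # k-means on the leading principal components of each distance matrix,
    # accumulating a co-association (consensus) matrix across the three runs.
    consensus = np.zeros((n_cells, n_cells))
    for dist in (d_euclid, d_pearson, d_spearman):
        comps = PCA(n_components=n_components, random_state=seed).fit_transform(dist)
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(comps)
        consensus += (labels[:, None] == labels[None, :])
    consensus /= 3.0

    # Hierarchical clustering of the consensus matrix yields the final labels.
    tree = linkage(squareform(1 - consensus, checks=False), method="complete")
    return fcluster(tree, t=k, criterion="maxclust")
```

The actual SC3 method additionally averages over a range of retained components and matrix transformations before building its consensus, which this sketch omits.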
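The graph-based entries above (PhenoGraph, SNN-Cliq, MetaCell) share a common skeleton: build a k-nearest-neighbor graph, reweight or prune edges by neighborhood overlap, and partition the graph. The following is a minimal PhenoGraph-style sketch, assuming a cells x components matrix `pcs` and using NetworkX’s Louvain implementation (NetworkX 2.8 or later) rather than any tool’s own code:

```python
# PhenoGraph-style sketch: kNN graph + Jaccard edge weights + Louvain modularity.
# Illustrative only; `pcs` is assumed to be a cells x principal-components matrix.
import numpy as np
import networkx as nx
from networkx.algorithms.community import louvain_communities
from sklearn.neighbors import NearestNeighbors

def knn_jaccard_louvain(pcs, k=15, seed=0):
    n_cells = pcs.shape[0]

    # k nearest neighbors by Euclidean distance (the first neighbor is the cell itself).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(pcs)
    _, idx = nn.kneighbors(pcs)
    neighbor_sets = [set(row[1:]) for row in idx]

    # Edge weight = Jaccard overlap of the two cells' neighbor sets;
    # zero-overlap (spurious) links are pruned.
    graph = nx.Graph()
    graph.add_nodes_from(range(n_cells))
    for i, neighbors in enumerate(neighbor_sets):
        for j in neighbors:
            shared = len(neighbor_sets[i] & neighbor_sets[j])
            union = len(neighbor_sets[i] | neighbor_sets[j])
            weight = shared / union
            if weight > 0:
                graph.add_edge(i, int(j), weight=weight)

    # Louvain modularity optimization on the weighted graph.
    communities = louvain_communities(graph, weight="weight", seed=seed)
    labels = np.empty(n_cells, dtype=int)
    for cluster_id, members in enumerate(communities):
        labels[list(members)] = cluster_id
    return labels
```

As noted in the table, SNN-Cliq replaces the Louvain step with maximal-clique and quasi-clique detection, and MetaCell with a balanced, k-means-like graph partition.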
Genotype-based methods

| Method | Clustering approach | Dimension reduction | Similarity metric(s) | Notes |
| --- | --- | --- | --- | --- |
| demuxlet [110] | Supervised clustering of cells based on genotypes | NA | NA | The likelihood that a cell belongs to an individual is computed from alternate-allele frequencies (see the assignment sketch after this table). |
| Vireo [108] | Supervised clustering of cells based on genotypes | NA | NA | Variational Bayesian inference estimates the number of unique individuals with distinct genotypes; each cell is assigned to the individual with maximum likelihood. |
| scSplit [109] | Unsupervised clustering of cells based on an allele-fraction model | NA | NA | Expectation-maximization (EM) fits the allele-fraction model to the probability of observing alternate alleles from each individual (see the EM sketch after this table). |
| Souporcell [72] | Mixture modeling | NA | NA | Uses minimap2 instead of the STAR aligner to improve variant calling from scRNA-seq reads; a mixture model fitted to allele fractions clusters cells in genotype space. |
| DENDRO [107] | Phylogeny reconstruction based on genetic divergence between cells | NA | NA | Intended for tumor heterogeneity; genetic divergence is modeled together with nuisance variables such as dropout rates and library sizes. |
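For the genotype-based tools, the supervised setting (demuxlet, Vireo) reduces to scoring each cell’s reads at known SNPs against each individual’s genotype. The sketch below is a simplified binomial version of that assignment (array names, the `error_rate` parameter, and the function itself are illustrative assumptions; demuxlet’s actual model also handles doublets and sequencing error more carefully):

```python
# Demultiplexing sketch: assign each cell to the most likely individual given
# reference genotypes. Simplified binomial model; real tools (demuxlet, Vireo)
# additionally model doublets, ambient reads, and genotype uncertainty.
import numpy as np

def assign_cells(alt_counts, ref_counts, genotypes, error_rate=0.01):
    """
    alt_counts, ref_counts : cells x SNPs integer arrays of alternate/reference reads.
    genotypes              : individuals x SNPs array with values 0, 1, 2
                             (copies of the alternate allele).
    Returns the best-matching individual per cell and the per-cell log-likelihoods.
    """
    # Expected alternate-allele fraction for genotypes 0/1/2, with a small error rate.
    alt_prob = np.array([error_rate, 0.5, 1.0 - error_rate])   # shape (3,)
    p = alt_prob[genotypes]                                     # individuals x SNPs

    # Binomial log-likelihood of each cell's allele counts under each individual
    # (the binomial coefficient is constant across individuals and is dropped).
    loglik = (alt_counts[:, None, :] * np.log(p)[None, :, :]
              + ref_counts[:, None, :] * np.log(1.0 - p)[None, :, :]).sum(axis=2)

    return loglik.argmax(axis=1), loglik
```

Vireo works in the same likelihood space but treats the genotypes as latent variables inferred by variational Bayes, which also lets it estimate the number of distinct individuals, as noted in the table.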
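When reference genotypes are unavailable, the unsupervised tools in the table (scSplit, Souporcell) instead fit a mixture model to the observed allele fractions. The EM sketch below illustrates that idea under simplified assumptions (binomial counts, a fixed number of donors, no doublet component; all names are illustrative):

```python
# scSplit/Souporcell-style sketch: EM on an allele-fraction mixture model.
# Each donor d has an unknown alternate-allele fraction p[d, snp]; each cell gets
# soft responsibilities over donors. Illustrative only.
import numpy as np

def allele_fraction_em(alt_counts, ref_counts, n_donors=2, n_iter=50, seed=0, eps=1e-6):
    rng = np.random.default_rng(seed)
    n_cells, n_snps = alt_counts.shape
    totals = alt_counts + ref_counts

    # Random initialization of per-donor allele fractions.
    p = rng.uniform(0.1, 0.9, size=(n_donors, n_snps))

    for _ in range(n_iter):
        # E-step: log-likelihood of each cell under each donor, then responsibilities.
        loglik = (alt_counts[:, None, :] * np.log(p + eps)[None, :, :]
                  + ref_counts[:, None, :] * np.log(1 - p + eps)[None, :, :]).sum(axis=2)
        loglik -= loglik.max(axis=1, keepdims=True)       # numerical stability
        resp = np.exp(loglik)
        resp /= resp.sum(axis=1, keepdims=True)           # cells x donors

        # M-step: update each donor's allele fraction as a responsibility-weighted
        # ratio of alternate reads to total reads.
        weighted_alt = resp.T @ alt_counts                # donors x SNPs
        weighted_tot = resp.T @ totals
        p = (weighted_alt + eps) / (weighted_tot + 2 * eps)

    return resp.argmax(axis=1), p
```

Souporcell pairs a similar mixture model with minimap2-based variant calling, and both tools add doublet handling and more careful read processing, which this sketch omits.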