Briefings in Bioinformatics. 2024 Jan 2;25(1):bbad497. doi: 10.1093/bib/bbad497

A critical assessment of clustering algorithms to improve cell clustering and identification in single-cell transcriptome study

Xiao Liang 1,2; Lijie Cao 3; Hao Chen 4; Lidan Wang 5; Yangyun Wang 6; Lijuan Fu 7,8; Xiaqin Tan 9; Enxiang Chen 10,11; Yubin Ding 12,13; Jing Tang 14,15
PMCID: PMC10782910  PMID: 38168839

Abstract

Cell clustering is typically the first step in single-cell RNA sequencing (scRNA-seq) analyses, and its performance strongly affects the validity and reproducibility of cell identification. Many clustering algorithms have been developed for scRNA-seq data; each generates a label set that assigns every cell to a cluster. However, different algorithms usually yield different label sets, which can introduce variation in the cell types identified from them. The performance of these algorithms has not yet been systematically evaluated in single-cell transcriptome studies. Herein, we performed a critical assessment of seven state-of-the-art clustering algorithms: four deep learning-based algorithms and the commonly used methods Seurat, Cosine-based Tanimoto similarity-refined graph for community detection using Leiden’s algorithm (CosTaL) and Single-cell consensus clustering (SC3). We systematically evaluated their clustering performance with diverse evaluation indices on 10 scRNA-seq benchmark datasets. Our results show that CosTaL, Seurat, Deep Embedding for Single-cell Clustering (DESC) and SC3 consistently outperformed Single-Cell Clustering Assessment Framework (SCCAF) and scDeepCluster on nine effectiveness scores. Notably, CosTaL and DESC performed particularly well in clustering specific cell types. The performance of single-cell Variational Inference tools (scVI) varied across datasets, suggesting sensitivity to certain dataset characteristics. DESC also showed promising results for identifying cell subtypes and capturing cellular heterogeneity. In addition, SC3 required more memory and ran more slowly than the other algorithms on the same dataset. In sum, this study provides useful guidance for selecting appropriate clustering methods in scRNA-seq data analysis.

Keywords: single-cell RNA sequencing, clustering algorithms, deep learning, performance evaluation, cell identification

INTRODUCTION

Single-cell RNA sequencing (scRNA-seq) is a popular and powerful technology for profiling the whole transcriptome of large numbers of individual cells [1, 2]. Compared with bulk RNA-seq, scRNA-seq offers high sensitivity [3], unbiased profiling [4, 5], dynamic monitoring [6] and the ability to detect differences between individual cells [7]. scRNA-seq has yielded important biological insights and discoveries: it has advanced the understanding of the development of different tissues and organs [8], revealed spatial and functional heterogeneity in the tumor microenvironment [9, 10], enabled the investigation of host–pathogen interactions under maximum containment [11, 12] and provided a promising approach for studying intercellular communication and signal transduction in cardiovascular diseases [13].

In scRNA-seq data processing, cell-type identification is a vital step for subsequent analysis [14, 15]. A typical strategy is cluster-based annotation, in which cells are first grouped by graph-based clustering and the resulting clusters are then annotated using marker genes differentially expressed at the cluster level [14]. However, graph-based clustering may fail to separate different cell types accurately, leaving some cell types indistinguishable. Contributing factors include batch effects that cannot be effectively removed [16]; the sparsity of scRNA-seq data caused by low RNA capture rates [17]; the subjective choice of clustering resolution [18]; and the increasing number of cells per dataset, which may reach thousands to millions [19].
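The cluster-then-annotate strategy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular package: the function `annotate_clusters`, the dictionary input layout and the toy expression values are hypothetical, and the score used here (mean marker expression inside a cluster minus that outside it) is a simple stand-in for a formal differential-expression test.

```python
from statistics import mean


def annotate_clusters(expr, clusters, markers):
    """Label each cluster with the cell type whose markers are most enriched in it.

    expr:     cell id -> {gene: expression value}   (hypothetical layout)
    clusters: cell id -> cluster id produced by a clustering algorithm
    markers:  cell type -> list of marker genes
    """
    def marker_mean(cells, genes):
        values = [expr[c].get(g, 0.0) for c in cells for g in genes]
        return mean(values) if values else 0.0

    annotation = {}
    for cl in sorted(set(clusters.values())):
        inside = [c for c, k in clusters.items() if k == cl]
        outside = [c for c, k in clusters.items() if k != cl]
        # crude differential score: marker expression inside minus outside the cluster
        scores = {t: marker_mean(inside, g) - marker_mean(outside, g)
                  for t, g in markers.items()}
        annotation[cl] = max(scores, key=scores.get)
    return annotation


expr = {
    "c1": {"CD3E": 5.0, "MS4A1": 0.1},
    "c2": {"CD3E": 4.0, "MS4A1": 0.0},
    "c3": {"CD3E": 0.2, "MS4A1": 6.0},
}
clusters = {"c1": 0, "c2": 0, "c3": 1}
markers = {"T cell": ["CD3E"], "B cell": ["MS4A1"]}
print(annotate_clusters(expr, clusters, markers))  # {0: 'T cell', 1: 'B cell'}
```

If the clustering merges two cell types or splits one across clusters, this annotation step inherits the error, which is why clustering quality matters for cell identification.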

To address the unreliable clustering caused by these problems, a variety of state-of-the-art clustering methods have been developed for single-cell transcriptome studies, including four deep learning-based algorithms: Deep Embedding for Single-cell Clustering (DESC) [19], single-cell Variational Inference tools (scVI) [20], scDeepCluster [21] and Single-Cell Clustering Assessment Framework (SCCAF) [22]. Besides investigating cell states and phenotypes in scRNA-seq data [23, 24], deep learning methods have been widely applied in other bioinformatics tasks, including predicting chromatin loops [25], chromatin interactions [26] and promoters [27]. Other clustering algorithms designed specifically for scRNA-seq data have also been developed recently, such as Single-cell consensus clustering (SC3) [28], PARC [29], Cosine-based Tanimoto similarity-refined graph for community detection using Leiden’s algorithm (CosTaL) [30] and SCMcluster [31]. Because clustering algorithms are sensitive to sequencing platforms [32] and dropouts [33], their results can vary greatly. Even a single algorithm is affected by data dimensionality: in high-dimensional space, the points representing cells become increasingly similar to one another, which degrades clustering performance [34, 35]. It is therefore crucial to evaluate these diverse clustering algorithms in order to improve cell clustering and identification in single-cell transcriptome studies.
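The dimensionality argument above can be made concrete with a small simulation on synthetic data. This sketch uses uniform random points, not scRNA-seq profiles, and the function `relative_contrast` is hypothetical. It measures how far the farthest neighbor of a query point is relative to the nearest one; as the number of dimensions grows, this contrast shrinks, so all points look roughly equally distant and distance-based clustering loses resolving power.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """(max - min) / min of Euclidean distances from one random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(200, dim), 3))
```

The printed contrast should fall sharply from 2 to 1000 dimensions, which is one motivation for the dimension reduction steps that most scRNA-seq clustering pipelines apply before clustering.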

Several existing clustering algorithms have been evaluated and compared in previous studies. For instance, Duò et al. [36] systematically assessed 14 clustering algorithms and found that SC3 and Seurat produced the most favorable results. Stassen et al. developed PARC [29] and showed that it consistently outperformed other state-of-the-art clustering algorithms, including PhenoGraph, FlowSOM and Flock. Li et al. [30] introduced CosTaL, which outperformed PhenoGraph, Scanpy and PARC. Wu et al. [31] developed SCMcluster, which uses cellular marker genes and showed superior performance compared with other clustering methods. However, a comprehensive comparison between these well-performing clustering algorithms and deep learning-based clustering algorithms is still lacking.

Herein, using 10 scRNA-seq datasets covering three types of biological settings as benchmarks, we systematically evaluated Seurat, DESC, scVI, scDeepCluster, SCCAF, CosTaL and SC3 from multiple perspectives. Based on clustering number, diverse indices of clustering effectiveness and computational efficiency, we show that CosTaL, Seurat, DESC and SC3 consistently perform well, particularly in clustering effectiveness as measured by nine evaluation metrics across different types of benchmarks. scVI performs well on some datasets but poorly on others. scDeepCluster and SCCAF generally perform worse than the other algorithms, and their performance varies substantially across benchmarks. Moreover, DESC shows promising results for cell subtype identification and capturing cellular heterogeneity. When applied to the same scRNA-seq dataset, SC3 requires more memory and runs more slowly than the other algorithms. This study provides useful guidance for selecting appropriate clustering methods for scRNA-seq data analysis.

MATERIALS AND METHODS

Benchmark datasets collection for single-cell clustering algorithm comparison

We evaluated clustering algorithms on 10 scRNA-seq benchmark datasets selected from the GEO and ArrayExpress databases (Table 1). Based on the culture system and cell type, we further classified the datasets into three categories: Type 1, different culture systems for the same cell type (GSE196091 and GSE189120); Type 2, the same culture system for different cell types (GSE180914, GSE159183, GSE89497, GSE171381 and E-MTAB-6701); and Type 3, different cell types and different culture systems (GSE194209, GSE139850 and GSE171993). Details of each benchmark dataset are provided in the Supplementary Data.

Table 1.

Ten scRNA-seq datasets were utilized to systematically analyze and compare seven single-cell clustering algorithms

| Dataset ID | Sample source | Cell label | Case type | Expression unit | Cells (genes) |
| --- | --- | --- | --- | --- | --- |
| GSE196091 | Homo sapiens | Known by experiment | Type 1: different culture systems for the same cell type | Read counts | 31,730 (20,028) |
| GSE189120 | Homo sapiens | Known by experiment | Type 1: different culture systems for the same cell type | Read counts | 25,511 (20,838) |
| GSE180914 | Pleurodeles waltl | Generated by scRNA-seq | Type 2: the same culture system for different cell types | Read counts | 681 (16,545) |
| GSE159183 | Mus musculus | Known by experiment | Type 2: the same culture system for different cell types | Read counts | 738 (24,711) |
| GSE89497 | Homo sapiens | Known by experiment | Type 2: the same culture system for different cell types | TPM | 1,567 (21,465) |
| GSE171381 | Homo sapiens | Generated by scRNA-seq | Type 2: the same culture system for different cell types | Read counts | 83,378 (25,127) |
| E-MTAB-6701 | Homo sapiens | Generated by scRNA-seq | Type 2: the same culture system for different cell types | Read counts | 64,734 (26,212) |
| GSE194209 | Mus musculus | Known by experiment | Type 3: different cell types and different culture systems | Read counts | 20,393 (17,454) |
| GSE139850 | Homo sapiens | Known by experiment | Type 3: different cell types and different culture systems | Read counts | 14,874 (22,401) |
| GSE171993 | Mus musculus | Known by experiment | Type 3: different cell types and different culture systems | Read counts | 52,834 (24,043) |

The datasets GSE196091 and GSE189120 were classified as Type 1 cases, while GSE180914, GSE159183, GSE89497, GSE171381 and E-MTAB-6701 were classified as Type 2 cases. Lastly, GSE194209, GSE139850 and GSE171993 were classified as Type 3 cases.

Clustering algorithms utilized for performance evaluation in this study

In this work, seven cell clustering algorithms were analyzed (Supplementary Table S1): the commonly used, traditional Seurat method; four state-of-the-art deep learning-based algorithms, namely DESC, scVI, single-cell Model-based Deep Embedded Clustering (scDeepCluster) and SCCAF; and two algorithms designed specifically for scRNA-seq data, CosTaL and SC3. Details of these clustering algorithms are provided in the Supplementary Methods.

Diverse evaluating indices used for determining performance of clustering algorithm

In this study, several types of metrics were used to assess clustering performance: t-SNE visualization together with clustering number, nine effectiveness scores, and computation speed and memory usage. The nine effectiveness scores were the Adjusted Rand Index (ARI), Homogeneity Score (H Score), Kullback–Leibler (KL) Divergence, Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI), FlowCAP-I F1 score (FF1), Hungarian algorithm-based F1 score (HF1), Fowlkes–Mallows Index (FMI) and V-measure; their characteristics are described in Supplementary Table S2. First, clustering results were visualized after dimension reduction with t-SNE [37]. We evaluated the ability of each algorithm to identify cell subtypes and characterize cellular heterogeneity using a single metric, the clustering number (CN), defined as the number of clusters the algorithm produced for a given cell type. Second, the true label of each cell was taken from the well-established cell annotations provided in the original publications. These labels represent the known cell types or experimental conditions of each cell and serve as the reference against which the clustering algorithms are evaluated and compared.

The predictive labels were assigned using a majority voting scheme [38]. A clustering algorithm partitions the data into a number of clusters such that cells within a cluster are similar to one another and cells in different clusters are not [39]. For each cluster produced by an algorithm, we counted the cells carrying each true label and assigned the most frequent true label to that cluster as its predictive label. This ensures that the predictive label reflects the dominant identity of the cells in the cluster, enabling a more accurate evaluation of the clustering algorithm's performance.
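The majority voting step described above can be illustrated with a minimal Python sketch. The function name `majority_vote_labels` is hypothetical, and the tie rule is our assumption, since the text does not specify one: here `Counter.most_common` keeps the label encountered first in the cluster when two labels are equally frequent.

```python
from collections import Counter


def majority_vote_labels(true_labels, cluster_ids):
    """Assign each cluster its most frequent true label, then relabel every cell."""
    per_cluster = {}
    for label, cluster in zip(true_labels, cluster_ids):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    # most_common(1) breaks ties in favor of the label seen first in the cluster
    cluster_to_label = {c: cnt.most_common(1)[0][0] for c, cnt in per_cluster.items()}
    return [cluster_to_label[c] for c in cluster_ids]


true = ["B", "B", "T", "T", "T", "NK"]
pred = [0, 0, 1, 1, 2, 2]
print(majority_vote_labels(true, pred))  # ['B', 'B', 'T', 'T', 'T', 'T']
```

Note that a minority population merged into a larger cluster (the single NK cell above) disappears from the predictive labels, which is exactly the kind of error the downstream effectiveness scores penalize.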

RESULTS AND DISCUSSIONS

Study overview

Diverse clustering algorithms were evaluated using 10 scRNA-seq datasets. Seurat can identify distinct clusters, but its clusters are highly sensitive to the choice of resolution parameter, with higher values tending to produce more clusters [40, 41]. Deep learning-based clustering algorithms have emerged as promising approaches for addressing challenges such as batch effects in scRNA-seq data and have been shown to be effective for cell clustering and identification [23, 24]. In addition, other algorithms designed specifically for single-cell clustering, such as SC3 and CosTaL, were evaluated. We adopted nine metrics, grouped into three categories, to evaluate algorithm performance. An overview of the comparison framework is shown in Figure 1.

Figure 1.


Overview of the study framework. The framework comprises three major parts: types of scRNA-seq datasets, single-cell clustering algorithms and performance evaluation. True labels are the cell identities reported in the corresponding original publications, whereas predictive labels are the cell identities generated by each clustering algorithm. The performance of seven clustering algorithms was assessed using nine metrics.

Performance evaluation of clustering algorithms using t-SNE plot and clustering number

For identifying cell subtypes and characterizing cellular heterogeneity, the number of clusters generated by a clustering algorithm can indicate how well it captures the underlying heterogeneity [42]. A larger number of well-separated clusters within a given cell type suggests that the algorithm can distinguish distinct subtypes of that cell type [42]. For instance, the identification of three subtypes of placental cytotrophoblast cells (CTBs) indicates that CTBs have high and variable proliferative capacity [43]. To compare algorithms, we therefore use the clustering number (the number of distinct, well-separated clusters) as a measure of each algorithm's ability to identify cell subtypes and heterogeneity. Figures 2, 3 and 4 show t-SNE plots of the clustering results for the three dataset types.
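A minimal sketch of the clustering number (CN) computation follows, assuming CN for a cell type is the number of predicted clusters that contain cells of that true type. The function `clustering_number` and its `min_cells` threshold are hypothetical; the threshold is an illustrative addition for ignoring clusters that hold only a few stray cells of a type. The CTB/EVT labels are toy values, not data from this study.

```python
from collections import defaultdict


def clustering_number(true_labels, cluster_ids, min_cells=1):
    """CN per true cell type: clusters holding at least `min_cells` cells of that type."""
    counts = defaultdict(lambda: defaultdict(int))
    for label, cluster in zip(true_labels, cluster_ids):
        counts[label][cluster] += 1
    return {label: sum(1 for n in per_cluster.values() if n >= min_cells)
            for label, per_cluster in counts.items()}


true = ["CTB"] * 6 + ["EVT"] * 3
pred = [0, 0, 1, 1, 2, 2, 3, 3, 2]
print(clustering_number(true, pred))               # {'CTB': 3, 'EVT': 2}
print(clustering_number(true, pred, min_cells=2))  # {'CTB': 3, 'EVT': 1}
```

A raw count like this does not check whether the clusters are visually well separated, which is why the study pairs CN with the t-SNE plots.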

Figure 2.


Performance comparison of clustering results for Type 1 data across six algorithms. (A–F) The t-SNE plots show the visualization of clustering results for Type 1 data using Seurat (A), DESC (B), scVI (C), SCCAF (D), scDeepCluster (E) and CosTaL (F).

Figure 4.


Performance comparison of clustering results for Type 3 data across seven algorithms. (A–G) The t-SNE plots show the visualization of clustering results for Type 3 data using Seurat (A), DESC (B), scVI (C), SCCAF (D), scDeepCluster (E), CosTaL (F) and SC3 (G).

Application to datasets of different culture systems for the same cell type (Type 1)

Figure 2 shows the clustering results for Type 1 data across the algorithms. In GSE196091, all algorithms performed poorly in clustering N6 cells under different treatments, particularly N6 cells in AZD medium (blue) and in DMSO medium (yellow). This is likely because N6 cells under the different treatments do not differ significantly in overall gene expression. In GSE189120, the three true labels (pDC_24hrs, pDC_6hrs and pDC_baseline) were well separated by all algorithms. Notably, as shown in Figure 2(B), DESC can split a single cell type into multiple subgroups; for example, it produced more distinct and well-separated clusters for pDC_6hrs cells (red) than the other algorithms. Although Seurat generated the largest number of clusters (Supplementary Figure S1), these clusters were not clearly separated, which may reflect the choice of resolution parameter. The t-SNE plots of the clusterings for Type 1 are shown in Supplementary Figure S1. Overall, for the Type 1 datasets, DESC often yielded more clearly separated subgroups within the same cell type than the other algorithms, indicating good performance in identifying cell subtypes and cellular heterogeneity. Because of memory limitations, SC3 did not produce comparable clustering results for these datasets.

Application to datasets of the same culture system for different cell types (Type 2)

Figure 3 shows the clustering results for Type 2 data using seven algorithms. For all five datasets, all clustering algorithms clearly separated the distinct cell types. For example, in GSE180914, the seven cell types (B cells, erythrocyte-1, erythrocyte-2, intermediate cells, macrophages, platelets and T cells) were clearly separated. In GSE159183, the clustering results of all seven algorithms are consistent with the true labels, clearly dividing the cells into ES, PrES, TS and XEN types. Notably, DESC (Figure 3B) performed better than the other algorithms in clustering HE8WEVT and HE24WEVT cells (GSE89497), Dec_DSC and Dec_FB cells (GSE171381) and NK CD16+ and dNK cells (E-MTAB-6701), whereas Seurat and scVI failed to cluster these cell types effectively. Because of memory limitations, SCCAF, scDeepCluster, CosTaL and SC3 did not produce comparable clustering results for GSE171381 and E-MTAB-6701. In addition, the GSE89497 dataset provides only normalized data in transcripts per million (TPM) rather than read counts (Table 1); we therefore did not evaluate CosTaL and scDeepCluster on this dataset. Moreover, as shown in Figure 3(B), DESC can split a single cell type into multiple subgroups; for example, it produced more distinct and well-separated clusters for erythrocyte-2 (GSE180914), XEN (GSE159183) and HE8WEVT (GSE89497) than the other algorithms. The t-SNE plots of the clusterings for Type 2 are shown in Supplementary Figure S2. In sum, these results indicate that DESC performs well in identifying cell subtypes and cellular heterogeneity.

Figure 3.


Performance comparison of clustering results for Type 2 data across seven algorithms. (A–G) The t-SNE plots show the visualization of clustering results for Type 2 data using Seurat (A), DESC (B), scVI (C), SCCAF (D), scDeepCluster (E), CosTaL (F) and SC3 (G).

Application to datasets of different cell types and different culture systems (Type 3)

Figure 4 shows the clustering results for Type 3 data using seven algorithms. In GSE194209, none of the algorithms effectively separated the CD45+ cells (isolated from tumor tissue) treated with TightMWA from those treated with MWA. In GSE139850, SCCAF performed poorly in distinguishing cells (isolated from embryoid bodies) under IWP2 and CHIRSB treatments. In GSE171993, DESC and CosTaL performed well in distinguishing the different cell types (isolated on different postnatal days), particularly B cells and T cells (Figure 4B and F), whereas Seurat and scVI failed to separate these two cell types. Moreover, as shown in Figure 4(B), DESC can split a single cell type into multiple subgroups; for example, it produced more distinct and well-separated clusters for CHIRSB (GSE139850) and EC (GSE171993) than the other algorithms. The t-SNE plots of the clusterings for Type 3 are shown in Supplementary Figure S3. These results indicate that DESC performs well in identifying cell subtypes and heterogeneity. Because of the large number of cells in GSE171993, SCCAF, scDeepCluster and SC3 did not produce comparable clustering results for that dataset.

Overall, these findings show that all algorithms except SCCAF performed well in clustering different cell types across most benchmark datasets. DESC and CosTaL consistently performed strongly across diverse datasets, particularly for specific cell types. DESC may be well suited to identifying cell subtypes and characterizing cellular heterogeneity, as it clusters cells of the same type into distinct, well-separated groups. One factor contributing to this performance may be DESC's use of a biologically interpretable cluster assignment probability, which can reveal both the discrete and the pseudotemporal structure of cells [19].

Performance evaluation of clustering algorithms using effectiveness scores

To quantify clustering effectiveness, we evaluated the algorithms with nine metrics: ARI, H Score, KL Divergence, NMI, AMI, FF1, HF1, FMI and V-measure. These metrics are widely used to assess how well a clustering recovers known classes and together provide a comprehensive evaluation. Higher values of ARI, H Score, NMI, AMI, FF1, HF1, FMI and V-measure indicate better performance, whereas a lower KL value indicates better performance. The characteristics of all metrics are described in Supplementary Table S2. Following the categories adopted by Li et al. [30], we grouped the nine metrics into three categories: pair counting, set overlap and information theory.

Performance of metrics (pair counting)

Figure 5(A) and (B) compares ARI and FMI among the seven algorithms. CosTaL, DESC, Seurat and SC3 consistently achieved higher ARI and FMI on most datasets, while SCCAF and scDeepCluster consistently performed poorly. scVI, however, performed inconsistently across datasets: its ARI and FMI scores were the lowest in E-MTAB-6701 and GSE171993 but relatively high in GSE180914, GSE171381, GSE139850 and GSE159183. In addition, SCCAF achieved higher ARI and FMI scores than scDeepCluster in GSE159183 and GSE180914 but lower scores in GSE189120, GSE196091 and GSE139850. Interestingly, in GSE196091, scDeepCluster even outperformed CosTaL and DESC. All ARI and FMI scores are shown in Supplementary Tables S3–S4.
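Both pair-counting metrics can be computed from the contingency table of true and predicted labels. The following is a minimal pure-Python sketch using the standard textbook formulas for ARI (Hubert–Arabie adjustment) and FMI; the function names are hypothetical, and the study's own implementations (released on the CAFCA website) may differ in edge-case handling. Established implementations exist as `adjusted_rand_score` and `fowlkes_mallows_score` in `sklearn.metrics`.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(true_labels, pred_labels):
    """Return (pairs together in both labelings, pairs together in truth,
    pairs together in prediction, all pairs)."""
    contingency = Counter(zip(true_labels, pred_labels))
    together_both = sum(comb(n, 2) for n in contingency.values())
    together_true = sum(comb(n, 2) for n in Counter(true_labels).values())
    together_pred = sum(comb(n, 2) for n in Counter(pred_labels).values())
    return together_both, together_true, together_pred, comb(len(true_labels), 2)


def adjusted_rand_index(true_labels, pred_labels):
    both, a, b, total = pair_counts(true_labels, pred_labels)
    if total == 0:                    # fewer than two cells: nothing to compare
        return 1.0
    expected = a * b / total          # pair agreement expected by chance
    maximum = (a + b) / 2
    if maximum == expected:           # identical trivial partitions
        return 1.0
    return (both - expected) / (maximum - expected)


def fowlkes_mallows(true_labels, pred_labels):
    both, a, b, _ = pair_counts(true_labels, pred_labels)
    # geometric mean of pairwise precision (both / b) and recall (both / a)
    return both / sqrt(a * b) if a and b else 0.0


true = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 2, 2]
print(round(adjusted_rand_index(true, pred), 4))  # 0.2424
print(round(fowlkes_mallows(true, pred), 4))      # 0.4714
```

ARI is corrected for chance, so a random clustering scores near 0, while FMI is not chance-corrected; reporting both guards against over-reading either one.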

Figure 5.


Performance comparison of different clustering algorithms based on nine evaluation metrics. (A) ARI: Type 1 (left), Type 2 (middle), Type 3 (right). (B) FMI: Type 1 (left), Type 2 (middle), Type 3 (right). (C) HF1: Type 1 (left), Type 2 (middle), Type 3 (right). (D) FF1: Type 1 (left), Type 2 (middle), Type 3 (right). (E) H Score: Type 1 (left), Type 2 (middle), Type 3 (right). (F) AMI: Type 1 (left), Type 2 (middle), Type 3 (right). (G) NMI: Type 1 (left), Type 2 (middle), Type 3 (right). (H) V-measure: Type 1 (left), Type 2 (middle), Type 3 (right). (I) KL: Type 1 (left), Type 2 (middle), Type 3 (right).

Performance of metrics (set overlap)

Figure 5(C) and (D) compares HF1 and FF1 among the seven algorithms. CosTaL, Seurat, DESC and SC3 achieved higher HF1 and FF1 scores on most datasets, indicating better performance. The HF1 and FF1 scores of SCCAF and scDeepCluster were consistently low on most datasets, indicating poor performance across datasets. scVI had low HF1 and FF1 in E-MTAB-6701 and GSE171993 but high values on the other datasets. These findings suggest that scDeepCluster and scVI perform inconsistently across datasets. All HF1 and FF1 scores are shown in Supplementary Tables S5–S6.
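The two set-overlap scores both start from an F1 value for every pair of true class and predicted cluster. The sketch below reflects our reading of the metric names, not the study's exact definitions, which are given in Supplementary Table S2. FF1 takes each class's best-matching cluster and weights by class size, following the FlowCAP-I convention as we understand it. HF1 averages F1 over a one-to-one class/cluster matching; here an exhaustive search over matchings stands in for the Hungarian algorithm, giving the same optimum but only for a handful of classes. In practice `scipy.optimize.linear_sum_assignment` solves the assignment efficiently. All function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def f1_matrix(true_labels, pred_labels):
    """F1 for every (true class T, cluster P) pair: 2 * shared / (|T| + |P|)."""
    joint = Counter(zip(true_labels, pred_labels))
    t_size, p_size = Counter(true_labels), Counter(pred_labels)
    f1 = {(t, p): 2 * joint[(t, p)] / (t_size[t] + p_size[p])
          for t in t_size for p in p_size}
    return f1, t_size, p_size


def flowcap_f1(true_labels, pred_labels):
    """Best F1 per true class, averaged with weights proportional to class size."""
    f1, t_size, p_size = f1_matrix(true_labels, pred_labels)
    n = len(true_labels)
    return sum(t_size[t] / n * max(f1[(t, p)] for p in p_size) for t in t_size)


def hungarian_f1(true_labels, pred_labels):
    """Mean F1 over true classes under the best one-to-one class/cluster matching."""
    f1, t_size, p_size = f1_matrix(true_labels, pred_labels)
    classes = list(t_size)
    # None slots let a class stay unmatched (F1 = 0) when there are too few clusters
    slots = list(p_size) + [None] * len(classes)
    best = 0.0
    for assignment in permutations(slots, len(classes)):
        score = sum(f1[(t, p)] for t, p in zip(classes, assignment) if p is not None)
        best = max(best, score)
    return best / len(classes)


true = ["A", "A", "A", "B", "B", "C"]
pred = [0, 0, 1, 1, 1, 1]
print(round(flowcap_f1(true, pred), 4))    # 0.6889
print(round(hungarian_f1(true, pred), 4))  # 0.4889
```

The example shows why the two scores can diverge: FF1 lets classes B and C both claim cluster 1, whereas the one-to-one matching behind HF1 forces one of them to go unmatched.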

Performance of metrics (information theory)

Figure 5(E)–(H) compares H Score, NMI, AMI and V-measure among the seven algorithms. As with the pair-counting and set-overlap metrics, CosTaL, Seurat, DESC and SC3 achieved higher H Score, NMI, AMI and V-measure on most datasets. The scores of SCCAF and scDeepCluster were consistently low in most cases, indicating poor performance on these datasets. scVI had low scores in E-MTAB-6701 and GSE171993 but high scores on the other datasets. These findings suggest that scDeepCluster and scVI perform inconsistently across datasets. Figure 5(I) compares KL among the seven algorithms. CosTaL, Seurat, DESC and SC3 achieved lower KL values on all datasets, indicating better performance, whereas the KL values of SCCAF and scDeepCluster were consistently high across all datasets, indicating poor performance in most cases. scVI had high KL values in E-MTAB-6701 and GSE171993 but low values on the other datasets, again indicating inconsistent performance. All H Score, NMI, AMI, V-measure and KL Divergence scores are shown in Supplementary Tables S7–S11.
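Several of the information-theoretic scores share the same ingredients: the entropy of the true classes H(C), the entropy of the clusters H(K) and their mutual information I(C;K). The minimal pure-Python sketch below uses the standard definitions: homogeneity I/H(C), completeness I/H(K), V-measure as their harmonic mean, and NMI with arithmetic-mean normalization, which makes NMI and V-measure numerically identical. Whether the study used that normalization is an assumption. AMI, which subtracts the expected mutual information under random labeling, and the KL divergence are not covered here. Function names are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(nij / n * log(n * nij / (ca[i] * cb[j])) for (i, j), nij in joint.items())


def information_scores(true_labels, pred_labels):
    h_c, h_k = entropy(true_labels), entropy(pred_labels)
    mi = mutual_information(true_labels, pred_labels)
    homogeneity = mi / h_c if h_c > 0 else 1.0    # 1 - H(C|K)/H(C), since I = H(C) - H(C|K)
    completeness = mi / h_k if h_k > 0 else 1.0   # 1 - H(K|C)/H(K)
    denom = homogeneity + completeness
    v_measure = 2 * homogeneity * completeness / denom if denom else 0.0
    nmi = 2 * mi / (h_c + h_k) if h_c + h_k > 0 else 1.0  # arithmetic-mean normalization
    return {"homogeneity": homogeneity, "completeness": completeness,
            "v_measure": v_measure, "nmi": nmi}


scores = information_scores([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
print({k: round(v, 4) for k, v in scores.items()})
# homogeneity 0.6667, completeness 0.4206, V-measure = NMI = 0.5158
```

Over-splitting a class, as in this example, costs completeness but not homogeneity, which is worth keeping in mind when an algorithm such as DESC deliberately produces several subclusters per cell type.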

Notably, in GSE194209 and GSE196091, and especially in GSE194209, all algorithms produced much lower scores (ARI, H Score, NMI, AMI, FF1, HF1, FMI and V-measure) than on the other datasets. In GSE194209, this may be because the CD45+ cells isolated from tumor tissue do not show obvious biological variation between the TightMWA and MWA treatments; in GSE196091, N6 cells cultured in AZD and DMSO media do not differ significantly in overall gene expression. The scores are consistent with the t-SNE plots in Figures 2 and 4, suggesting that cells under the different conditions in these datasets may lack distinct biological variation. It is important to recognize that effectiveness scores may not fully capture clustering performance in such cases: these metrics mainly measure agreement between the clustering results and the ground-truth labels or known biological features, and when the underlying biological variation is minimal or absent, it becomes difficult for any clustering algorithm to identify meaningful clusters. In general, datasets with clear biological variation between conditions are well suited to clustering analysis, whereas datasets without obvious biological variation may not be. Across the different dataset types, CosTaL, Seurat, DESC and SC3 outperformed SCCAF and scDeepCluster, whereas scVI performed well on some datasets and poorly on others. This variability suggests that scVI may be more sensitive to certain dataset characteristics.

Performance evaluation of clustering algorithms using computation speed and memory usage

To assess the clustering algorithms in terms of computation speed and memory usage, we compared their runtime and memory consumption. As shown in Supplementary Table S12, Seurat was the fastest of the seven clustering algorithms on all 10 benchmark datasets. DESC, SCCAF and CosTaL were also fast, although CosTaL became slower than DESC and SCCAF as dataset size increased. scDeepCluster and scVI showed moderate computation speed, while SC3 was the slowest. For memory usage, SC3 required the most memory on a given dataset, scDeepCluster and scVI required moderate amounts, and Seurat, DESC, SCCAF and CosTaL required little memory. As expected, runtime and memory usage both increased with the number of cells analyzed. Furthermore, we released the source code for the seven single-cell clustering algorithms and the nine effectiveness scores on the CAFCA website (http://rdblab.cn/cafca/), where it can be downloaded and run on local computers.
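The text does not state how runtime and memory were measured, so the following is only one possible approach for Python-based tools, sketched with the standard library. `profile` and `toy_clustering` are hypothetical names; the toy function is a stand-in workload, not a real clustering method. A caveat: `tracemalloc` traces only memory allocated through Python's allocators, so it may miss memory used by R packages (Seurat, SC3), some native extensions or GPUs. For a cross-language benchmark, process-level peak resident memory (for example `resource.getrusage` on Unix, or `/usr/bin/time -v`) is a closer measure.

```python
import time
import tracemalloc


def profile(func, *args, **kwargs):
    """Run func once; return (result, wall-clock seconds, peak traced memory in MiB)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak / 2**20


def toy_clustering(n_cells):
    # stand-in workload: assign each of n_cells synthetic values to one of 10 bins
    values = [(i * 7919) % 1000 for i in range(n_cells)]
    return [v // 100 for v in values]


labels, seconds, peak_mib = profile(toy_clustering, 200_000)
print(f"{len(labels)} cells, {seconds:.3f} s, peak {peak_mib:.1f} MiB")
```

Repeating the call on subsamples of increasing size gives the kind of scaling trend described above, with runtime and memory growing as the number of cells increases.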

CONCLUSIONS

We systematically compared the performance of clustering algorithms across 10 benchmark datasets using diverse evaluation metrics. Our results indicate that CosTaL, Seurat, DESC and SC3 consistently outperformed SCCAF and scDeepCluster on nine effectiveness scores across most benchmark datasets. Notably, CosTaL and DESC performed particularly well in clustering specific cell types. Our evaluation metrics also revealed variability in the performance of scVI, suggesting sensitivity to certain dataset characteristics. In addition, DESC showed promising results for cell subtype identification and capturing cellular heterogeneity, which may be attributed to its use of a biologically interpretable cluster assignment probability that reveals discrete and pseudotemporal structure in cells. We also compared computation speed and memory usage and found that SC3 requires more memory and runs more slowly than the other algorithms on the same scRNA-seq dataset. Overall, our study offers researchers guidance for selecting clustering algorithms suited to their specific research needs, highlighting the strengths and limitations of each algorithm in terms of performance, interpretability, computational efficiency and suitability for different cell types and dataset characteristics.

Key Points

  • A systematic comparison of state-of-the-art single-cell clustering algorithms was conducted using 10 scRNA-seq benchmark datasets covering three types of case situations.

  • To improve cell clustering and identification, the performance of these clustering algorithms was assessed using a variety of well-established indices based on distinct underlying theories.

  • To capture the underlying biological complexity and heterogeneity of cell populations, the number of distinct, well-separated clusters generated for the same cell type was used as a measure.

Supplementary Material

Tang-Supplementary_Materials_bbad497

ACKNOWLEDGEMENTS

We are sincerely grateful to the Supercomputing Center of Chongqing Medical University for its strong support with the large-memory computations in this article.

Author Biographies

Xiao Liang, Lijie Cao, Hao Chen and Lidan Wang are master's degree candidates in the School of Basic Medicine at Chongqing Medical University. They are working on single-cell analysis and web server construction.

Lijuan Fu is a professor at Chongqing Medical University. She is working on the molecular mechanisms, epidemiology and clinical translation of female reproduction-related diseases.

Yangyun Wang and Xiaqin Tan are undergraduates at Chongqing Medical University.

Enxiang Chen is a postdoctoral fellow in the School of Basic Medicine at Chongqing Medical University. He is interested in trophoblastic stem cells and organoids.

Yubin Ding is a professor at the Women and Children’s Hospital of Chongqing Medical University. He is working on the molecular mechanisms, epidemiology and clinical translation of female reproduction-related diseases.

Jing Tang is an associate professor in the School of Basic Medicine at Chongqing Medical University. She is interested in single-cell analysis and web server construction.

Contributor Information

Xiao Liang, Department of Obstetrics and Gynecology, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China; School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China.

Lijie Cao, School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China.

Hao Chen, School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China.

Lidan Wang, School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China.

Yangyun Wang, School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China.

Lijuan Fu, Joint International Research Laboratory of Reproduction and Development of the Ministry of Education of China, School of Public Health, Chongqing Medical University, Chongqing 400016, China; Department of Pharmacology, Academician Workstation, Changsha Medical University, Changsha 410219, China.

Xiaqin Tan, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China.

Enxiang Chen, School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Laboratory of Reproduction and Development of the Ministry of Education of China, School of Public Health, Chongqing Medical University, Chongqing 400016, China.

Yubin Ding, Department of Obstetrics and Gynecology, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China; Joint International Research Laboratory of Reproduction and Development of the Ministry of Education of China, School of Public Health, Chongqing Medical University, Chongqing 400016, China.

Jing Tang, Department of Obstetrics and Gynecology, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China; School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China.

AUTHOR CONTRIBUTIONS

J.T., Y.D. and E.C. conceived the idea and supervised the work. X.L. performed the research, constructed the scheme and wrote the scripts. X.L., L.C. and Y.W. implemented the visualization and data curation. X.L., L.C., H.C., L.W., Y.W., L.F. and X.T. prepared and analyzed the data. J.T. and X.L. wrote the manuscript. All authors reviewed and approved the final version of the manuscript.

FUNDING

This work was supported by the National Natural Science Foundation of China (82301909, 82171664), the Science and Technology Project of Chongqing Education Committee (KJZD-K202200408, KJQN202100421), the Natural Science Foundation of Chongqing Municipality of China (CSTB2022NSCQ-LZX0062) and the Basic Research and Frontiers Exploration Project of Science and Technology Committee of Yuzhong District, Chongqing (20210119).

DATA AVAILABILITY

The source codes for seven single-cell clustering algorithms and the nine effectiveness scores are available at the CAFCA website (http://rdblab.cn/cafca/).

References

  • 1. Andrews TS, Kiselev VY, McCarthy D, Hemberg M. Tutorial: guidelines for the computational analysis of single-cell RNA sequencing data. Nat Protoc 2021;16:1–9.
  • 2. Jovic D, Liang X, Zeng H, et al. Single-cell RNA sequencing technologies and applications: a brief overview. Clin Transl Med 2022;12:e694.
  • 3. Yan LY, Yang MY, Guo HS, et al. Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells. Nat Struct Mol Biol 2013;20:1131–9.
  • 4. Klein AM, Mazutis L, Akartuna I, et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 2015;161:1187–201.
  • 5. Van de Sande B, Lee JS, Mutasa-Gottgens E, et al. Applications of single-cell RNA sequencing in drug discovery and development. Nat Rev Drug Discov 2023;22:496–520.
  • 6. Trapnell C, Cacchiarelli D, Grimsby J, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol 2014;32:381–6.
  • 7. Zhang M, Liu S, Miao Z, et al. IDEAS: individual level differential expression analysis for single-cell RNA-seq data. Genome Biol 2022;23:33.
  • 8. Elmentaite R, Dominguez Conde C, Yang L, et al. Single-cell atlases: shared and tissue-specific cell types across human organs. Nat Rev Genet 2022;23:395–410.
  • 9. Lambrechts D, Wauters E, Boeckx B, et al. Phenotype molding of stromal cells in the lung tumor microenvironment. Nat Med 2018;24:1277–89.
  • 10. Wu K, Lin K, Li X, et al. Redefining tumor-associated macrophage subpopulations and functions in the tumor microenvironment. Front Immunol 2020;11:1731.
  • 11. Kotliar D, Lin AE, Logue J, et al. Single-cell profiling of Ebola virus disease in vivo reveals viral and host dynamics. Cell 2020;183:1383–401.e19.
  • 12. Ziegler CGK, Allon SJ, Nyquist SK, et al. SARS-CoV-2 receptor ACE2 is an interferon-stimulated gene in human airway epithelial cells and is detected in specific cell subsets across tissues. Cell 2020;181:1016–35.e19.
  • 13. Paik DT, Cho S, Tian L, et al. Single-cell RNA sequencing in cardiovascular development, disease and medicine. Nat Rev Cardiol 2020;17:457–73.
  • 14. Shao X, Liao J, Lu X, et al. scCATCH: automatic annotation on cell types of clusters from single-cell RNA sequencing data. iScience 2020;23:100882.
  • 15. Fang J, Chan C, Owzar K, et al. Clustering Deviation Index (CDI): a robust and accurate internal measure for evaluating scRNA-seq data clustering. Genome Biol 2022;23:269.
  • 16. Tran HTN, Ang KS, Chevrier M, et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol 2020;21:12.
  • 17. Choi K, Chen Y, Skelly DA, Churchill GA. Bayesian model selection reveals biological origins of zero inflation in single-cell transcriptomics. Genome Biol 2020;21:183.
  • 18. Kiselev VY, Andrews TS, Hemberg M. Publisher correction: challenges in unsupervised clustering of single-cell RNA-seq data. Nat Rev Genet 2019;20:310.
  • 19. Li X, Wang K, Lyu Y, et al. Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis. Nat Commun 2020;11:2338.
  • 20. Lopez R, Regier J, Cole MB, et al. Deep generative modeling for single-cell transcriptomics. Nat Methods 2018;15:1053–8.
  • 21. Tian T, Wan J, Song Q, Wei Z. Clustering single-cell RNA-seq data with a model-based deep learning approach. Nat Mach Intell 2019;1:191–8.
  • 22. Miao Z, Moreno P, Huang N, et al. Putative cell type discovery from single-cell gene expression data. Nat Methods 2020;17:621–8.
  • 23. Brendel M, Su C, Bai Z, et al. Application of deep learning on single-cell RNA sequencing data analysis: a review. Genom Proteom Bioinform 2022;20:814–35.
  • 24. Li X, Zhang S, Wong KC. Deep embedded clustering with multiple objectives on scRNA-seq data. Brief Bioinform 2021;22:bbab090.
  • 25. Zhang P, Wu Y, Zhou H, et al. CLNN-loop: a deep learning model to predict CTCF-mediated chromatin loops in the different cell lines and CTCF-binding sites (CBS) pair types. Bioinformatics 2022;38:4497–504.
  • 26. Zhang P, Wu H. IChrom-deep: an attention-based deep learning model for identifying chromatin interactions. IEEE J Biomed Health Inform 2023;27:4559–68.
  • 27. Zhang P, Zhang H, Wu H. iPro-WAEL: a comprehensive and robust framework for identifying promoters in multiple species. Nucleic Acids Res 2022;50:10278–89.
  • 28. Kiselev VY, Kirschner K, Schaub MT, et al. SC3: consensus clustering of single-cell RNA-seq data. Nat Methods 2017;14:483–6.
  • 29. Stassen SV, Siu DMD, Lee KCM, et al. PARC: ultrafast and accurate clustering of phenotypic data of millions of single cells. Bioinformatics 2020;36:2778–86.
  • 30. Li Y, Nguyen J, Anastasiu DC, Arriaga EA. CosTaL: an accurate and scalable graph-based clustering algorithm for high-dimensional single-cell data analysis. Brief Bioinform 2023;24:bbad157.
  • 31. Wu H, Zhou H, Zhou B, Wang M. SCMcluster: a high-precision cell clustering algorithm integrating marker gene set with single-cell RNA sequencing data. Brief Funct Genomics 2023;22:329–40.
  • 32. Mereu E, Lafzi A, Moutinho C, et al. Benchmarking single-cell RNA-sequencing protocols for cell atlas projects. Nat Biotechnol 2020;38:747–55.
  • 33. Haque A, Engel J, Teichmann SA, Lönnberg T. A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications. Genome Med 2017;9:75.
  • 34. Andrews TS, Hemberg M. Identifying cell populations with scRNASeq. Mol Aspects Med 2018;59:114–22.
  • 35. Becht E, McInnes L, Healy J, et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol 2018;37:38–44.
  • 36. Duo A, Robinson MD, Soneson C. A systematic performance evaluation of clustering methods for single-cell RNA-seq data. F1000Res 2018;7:1141.
  • 37. Bushati N, Smith J, Briscoe J, Watkins C. An intuitive graphical visualization technique for the interrogation of transcriptome data. Nucleic Acids Res 2011;39:7380–9.
  • 38. Millan Arias P, Alipour F, Hill KA, et al. DeLUCS: deep learning for unsupervised clustering of DNA sequences. PLoS One 2022;17:e0261531.
  • 39. Xu R, Wunsch D. Survey of clustering algorithms. IEEE Trans Neural Netw 2005;16:645–78.
  • 40. Sun Z, Wang T, Deng K, et al. DIMM-SC: a Dirichlet mixture model for clustering droplet-based single cell transcriptomic data. Bioinformatics 2018;34:139–46.
  • 41. Zeng Y, Wei Z, Zhong F, et al. A parameter-free deep embedded clustering method for single-cell RNA-seq data. Brief Bioinform 2022;23:bbac172.
  • 42. Duan X, Wang W, Tang M, et al. Dissecting cellular heterogeneity based on network denoising of scRNA-seq using local scaling self-diffusion. Front Genet 2021;12:811043.
  • 43. Liu Y, Fan X, Wang R, et al. Single-cell RNA-seq reveals the diversity of trophoblast subtypes and patterns of differentiation in the human placenta. Cell Res 2018;28:819–32.



Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
