Abstract
Background
Current single cell analysis methods annotate cell types at cluster-level rather than ideally at single cell level. Multiple exchangeable clustering methods and many tunable parameters have a substantial impact on the clustering outcome, often leading to incorrect cluster-level annotation or multiple runs of subsequent clustering steps. To address these limitations, methods based on well-annotated reference atlas has been proposed. However, these methods are currently not robust enough to handle datasets with different noise levels or from different platforms.
Results
Here, we present gCAnno, a graph-based Cell type Annotation method. First, gCAnno constructs cell type-gene bipartite graph and adopts graph embedding to obtain cell type specific genes. Then, naïve Bayes (gCAnno-Bayes) and SVM (gCAnno-SVM) classifiers are built for annotation. We compared the performance of gCAnno to other state-of-art methods on multiple single cell datasets, either with various noise levels or from different platforms. The results showed that gCAnno outperforms other state-of-art methods with higher accuracy and robustness.
Conclusions
gCAnno is a robust and accurate cell type annotation tool for single cell RNA analysis. The source code of gCAnno is publicly available at https://github.com/xjtu-omics/gCAnno.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12864-020-07223-4.
Keywords: Graph embedding, Cell type annotation, Single cell RNA analysis
Background
Bulk RNA sequencing measures average gene expression level in a large population of cells, hindering dissection of heterogeneous cell types [1]. In 2009, single cell RNA sequencing (scRNA-seq) technology was developed to provide valuable insights into cell heterogeneity [2].
In general, accurate cell type annotation for single cell data is a prerequisite for any further investigation of cell heterogeneous [3–6]. The commonly used cell type annotation methods, including Seurat [7], SCANPY [8] and SINCERA [9], adopts a similar procedure of data quality control, reads mapping, UMI quantification, expression normalization, clustering, differentially expressed genes (DEGs) of each cluster identification and cell type assignment based on biomarker genes [10]. However, those methods report cluster-level rather than truly single cell-level annotation results, masking subtle differences within each cluster. In addition, different clustering methods and many tunable parameters led to uncertain clustering outcome. These above two factors cause incorrect cluster-level annotations or multiple runs of subsequent clustering steps [10].
To overcome the above issues, two distinct strategies, namely biomarker-based and reference-based approaches, have been proposed. The biomarker-based methods, such as Garnett [11] and CellAssign [12], aim to establish mappings between the query dataset and the well-studied biomarkers. In particular, Garnett trains a classifier based on the user defined markup language. CellAssign builds a probabilistic model that leverages prior knowledge of cell-type marker genes for annotation. However, collecting a comprehensive biomarker set of different cell types is cumbersome, time-consuming and subjective. Thus recently reference-based approaches, such as Scmap [13], Chetah [14] and scPred [15] have been developed and are gaining popularity after a number of well-annotated single cell data were published, especially the datasets released by human cell atlas (HCA) [16]. The reference-based methods follow data-driven strategy and construct mappings between query dataset and the well-annotated reference datasets. For example, Scmap uses drop-based method to select feature genes as variables and constructs mapping by distance and correlation coefficient. Another method, scPred selects differential principle components (PCs) calculated by gene expression value between cell types and trains an SVM model with these PCs. Recently, a comprehensive benchmark study [17] of 22 cell type classification methods indicated that SVM classifier has overall the best performance. However, these methods are sensitive to experiment batches, sequencing platforms and noises, all of which are intrinsic properties of the single cell datasets.
Here, we propose a reference-based method, gCAnno, using graph representation feature selection strategy to comprehensively represent the global view of associations between cell types and genes for robust and high accuracy single cell-level annotation. Our gCAnno method starts with construction of a weighted cell type-gene bipartite graph. Then, graph embedding is applied to capture the cell type specific genes and naïve Bayes (gCAnno-Bayes) and SVM (gCAnno-SVM) classifiers are built for further annotation (Fig. 1). We compared gCAnno with the state-of-the-art methods on four published datasets as the basic test [3–6]. We also reported the performance comparison on large dataset with deep annotation level [18], different single cell platforms, simulated datasets with either various cell type imbalance situations and different dropout noise levels as the advanced test. Finally, runtime is summarized to demonstrate the efficiency of gCAnno.
Results
To evaluate the performance of gCAnno, we first evaluated the cell type-gene specific relation, and then compared gCAnno with five state-of-art methods, including Scmap-cell, Scmap-cluster, Chetah, scPred and SVM, in the following four aspects: 1) cell type specificity of gCAnno detected genes, 2) overall performance on different scRNA-seq datasets, 3) robustness test on simulated drop-out and imbalance noise data, 4) cross platform annotation.
Cell type specificity of gene sets detected by gCAnno
After graph embedding step, gCAnno selects cell type specific gene sets, which largely determines the performance of our approach. Thus, we first evaluated the cell type specificity of gene sets detected in the four datasets. We noticed that clear cell type specific expression patterns are observed for these selected genes (Fig. 2; Additional file 9: Figure S5; Additional file 10: Figure S6). Among the reported marker genes from the corresponding publications, gCAnno is able to capture an average of 57% of them, indicating gCAnno’s effectiveness of cell type specific gene identification (Additional file 11: Figure S7; Additional file 12: Table S4).
Overall and large dataset performance evaluation
We next evaluated and compared overall performance of gCAnno, Scmap, scPred, Chetah and SVM with four published scRNA-seq datasets (Table 1). We found that the comprehensive kappa coefficient of both gCAnno was consistently much higher than those of Scmap-cluster, Scmap-cell and scPred, respectively (p < 0.05, Wilcoxon rank sum test) (Fig. 3a-d) (Additional file 13: Table S5), hinting gCAnno’s better performance than other methods on cell type annotation across different species (e.g. human or plant), organs (e.g. liver or pancreases), or disease states (e.g. health or cancer). In 20 mouse organs dataset, the comprehensive kappa coefficient of both gCAnno were 0.74 (gCAnno-Bayes) and 0.94 (gCAnno-SVM), and other methods achieve 0.16 (Scmap-cluster), 0.18 (Scmap-cell), 0.80 (Chetah), 0.63 (scPred) and 0.92 (SVM), respectively (Fig. 3e). We found that gCAnno-SVM achieved highest performance than other methods in large dataset with deep annotation level (Additional file 6: Table S2; Additional file 14: Figure S8).
Table 1.
Dataset | #Cells | #Genes | # Cell types |
---|---|---|---|
Liver [4] | 8444 | 20,007 | 14 |
Pancreas [3] | 8562 | 20,126 | 13 |
AT root [6] | 7053 | 32,833 | 19 |
HCC, ICCA [5] | 4729 | 19,379 | 8 |
# means the number of
Robustness on dropout and imbalance noisy data
Besides basic accuracy, we examined its robustness in the presence of different types of noises. Dropout and cell count imbalance noises are two major types and the most challenging in scRNA-seq data. Dropout is a technical noise in the form of missing value in gene expression [10], while cell number imbalance among cell types is coming from biology itself. We found gCAnno achieved the highest and rather stable kappa coefficients for both reference dropout and query dropout tests in four datasets (Fig. 4; Additional file 15: Figure S9; Additional file 16: Table S6; Additional file 17: Figure S10). Remarkably, gCAnno achieved average kappa coefficients of 0.88 (gCAnno-SVM) and 0.79 (gCAnno-Bayes) even when dropout rate was as high as 50%, while other methods achieve 0 (Scmap-cluster), 0.44 (Scmap-cell), 0.37 (Chetah), 0.25 (scPred) and 0.79 (SVM), respectively. Moreover, we found gCAnno, SVM and Scmap-cell achieved the highest and stable kappa coefficients (average values are about 0.99) for different cell count imbalance ratios (Additional file 15: Figure S9; Additional file 18: Table S7). All of these results show gCAnno is better than other methods for dropout and cell count imbalance noises and achieved the best performance on highly noisy data (e.g. 50% dropout rate and 1:0.1 imbalance rate), suggesting the effectiveness of the wCGBG in selecting accurate features in the presence of high noise.
Cross platform annotation
Different single cell sequencing platforms have platform specific features or bias [19], limiting cross platform cell type annotation. We evaluated the platform compatibility of gCAnno on two liver datasets [4, 20] and two pancreas datasets [3, 21] from four platforms (10x, mCel-seq2, Drop-seq, and Smart-seq2) (Table 2). We used one platform dataset as the training data and the other as the testing data. For the performance comparison, gCAnno achieved consistently high kappa coefficient values for liver dataset tests (Fig. 5a and b) and for pancreas dataset tests (Fig. 5c and d) (Additional file 19: Table S8). These results show gCAnno is able to maintain high annotation accuracy for real heterogeneous and cross platform data in the presence of systematic platform specific bias.
Table 2.
Dataset | #Cells | #Genes | # Cell types | Platform |
---|---|---|---|---|
Liver [4] | 8103 | 20,007 | 7 | 10x |
Pancreas [3] | 8037 | 20,126 | 9 | Drop-seq |
Liver [20] | 7130 | 33,941 | 7 | mCel-seq2 |
Pancreas [21] | 2068 | 25,526 | 9 | Smart-seq2 |
# means the number of
Runtime evaluation
Finally, we evaluated the runtime of gCAnno based on datasets in above tests (Additional file 20: Table S9; Additional file 21: Figure S11). We found that the time takes in model building (including graph construction and embedding) step is positive correlated with the number of graph nodes (Pearson’s correlation is 0.94). Once the model has been built, the annotation step only takes less than 1 min (e.g. for mCel-seq2 platform liver dataset with 8103 cells only takes 48 s).
Discussion
In this study, we present gCAnno, a novel graph-based cell type identification method for scRNA-seq data. The most significant feature of gCAnno is the construction of wCGBG, enabling gCAnno to capture the global characteristics of association between cell types and genes. This feature allows gCAnno to detect accurate feature genes for each cell type, leading to accurate annotation results and robustness for different noise types and rates. In addition, gCAnno is able to annotate not only human scRNA-seq, but also plant scRNA-seq (e.g. Arabidopsis data) and its stable and high performance across two platforms.
gCAnno contains SVM version (gCAnno-SVM) and naïve Bayes version (gCAnno-Bayes). The SVM version takes into account the effect of expression value while naïve Bayes version only considers the existence of cell type specific genes. From the evaluation result, the SVM version seems suitable for the dataset with deep annotation level and contains largely similar cell types between training and test sets. However, in cross platform datasets from different studies and different sequencing platforms, gene expression value might fluctuate significantly, rendering better performance of naïve Bayes version than SVM version.
Since gCAnno is a reference-based cell type annotation method, it lacks the ability to identify novel cell types. For novel type cells, gCAnno assigns the closest cell types with the most similar expression profiles to them, which might be reasonable in most of applications but probably require further improvement. Integrating the biomarker-based method for novel cell type annotation and reference-based method for accurate pre-defined cell type annotation, we think, will be one direction to explore.
Conclusion
We have implemented a stable and high-performance automated cell type annotation tool, gCAnno, for scRNA-seq datasets. With an easy use Python running script as an example, we hope gCAnno will be useful for the scRNA-seq data analysis.
Methods
Here we summarized the framework of gCAnno. gCAnno adopts graph structure for cell type specific gene set detection and accurate cell type annotation. Firstly, gCAnno builds cell type-gene bipartite graph based on gene expression abundances and intensities, in which gene expression abundance is the proportion of cells expressing the gene in a given cell type while intensity is the average expression in cells expressing the gene. Then, graph embedding is adopted to obtain the embedding vectors of gene nodes and cell type nodes. Next, gCAnno selects a set of genes for each cell type with similar profiles in the embedding space. Finally, based on the detected cell type specific genes, gCAnno trains naïve Bayes and SVM classifiers. The workflow of gCAnno is depicted in Fig. 1.
Cell type-gene bipartite graph construction
Starting from the well-annotated reference scRNA-seq data, we constructed a weighted cell type-gene bipartite graph (wCGBG) containing both cell type nodes (CTN) and gene nodes (GN). Edges between CTN and GN indicate the correlation of a gene and a cell type while weight W measures significance of correlation. The weight is calculated by:
1 |
where nk is the cell count of cell type k, mj, k is the number of cells expressed gene j in cell type k. is the expression vector of gene j in cell type k. W is the product of the gene expression abundance and intensity. We use gene expression abundance and intensity to establish a relationship between cell types and genes in the form of proportion to reduce the impact of individual gene loss (dropout) or cell number imbalance.
Graph embedding and cell type-gene specific relation detection
After wCGBG construction, we used node2vec to obtain the low dimensional vectors (the embedding vectors) of gene nodes and cell type nodes. The first step is construction of a neighborhood set N(u) of each node u (either gene or cell type node) by a probability walk [22]. Then, we optimized the following objective function f(u) by maximizing the log-probability of observing a neighborhood set.
2 |
This optimization step enables the embedding vectors to capture the specificity and strength of interactions between cell node and gene node, e.g. if one gene is specific and highly expressed in one cell type, the corresponding two embedding vectors are similar. Then, we calculated Euclidean distance between the vector of genes and cell types. We selected top n (a user defined parameter, default n = 65, Additional file 1: Figure S1) closest genes for each cell type as the cell type specific gene set based on the overall performance on the five datasets we used [3–6, 18].
Classifier construction
After obtaining the cell type specific gene set, we build naïve Bayes (gCAnno-Bayes) and SVM (gCAnno-SVM) classifiers for annotation. For gCAnno-SVM, we directly use the expression of cell type specific genes as features to train an SVM classifier. For gCAnno-Bayes, we build a binary matrix to presents cell type and its corresponding specific genes, e.g. the element bij = 1 indicates gene j is one of the specific genes in cell type i. We train a Bernoulli Naïve Bayes to get genes’ conditional probability in each cell type and the prior probability of cell types. The query dataset is binarized and the annotation is based on maximum posterior probability of single cell’s cell type specific genes expression.
Performance measurement and dataset
Performance assessment and comparison
Cell type annotation is a typical multi-classification problem. We applied kappa coefficient as the performance measurement of classification, defined as Eq. (3).
3 |
where Ncorr is the ratio of total number of cells with corrected cell type annotation, Nt is the total number of cells in the dataset, K is the number of truly cell types, ai is the number of corrected annotated cells in the i-th cell type, and bi is the number of cells in the i-th cell type, po is the accuracy, ai × bi is the product of the actual and predicted quantity, pe punishes bias for unbalance evaluation.
To evaluate the performance of gCAnno, we performed both cross-validation test and independent heterogeneous test (cross-platform test). First, we adopted the five-fold cross-validation strategy following recent single cell analysis comparison published earlier [15, 17] on four published datasets and simulated noise datasets to evaluate the overall and robustness performance (Additional file 2: File S1). Then, we performed independent test on datasets from different sequencing platforms (the cross-platform testing) to evaluate the generalization capability of gCAnno.
Tools in comparison
The calculation results of Scmap, Chetah and scPred were obtained from the corresponding publications [13–15]. For SVM, we followed the previous report [17] which is using drop-based method [23] for feature selection.
Datasets used in basic overall performance test
To illustrate the stable performance of gCAnno across various species and tissue types, we compared gCAnno with other methods using four published datasets, including liver, pancreas, Arabidopsis thaliana root (AT root), hepatocellular carcinoma and intrahepatic cholangiocarcinoma (HCC and ICCA) datasets (Table 1; Additional file 2: File S1; Additional file 3: Figure S2; Additional file 4: Table S1). The true labels of the cells in each dataset are obtained from the corresponding publications.
Large dataset with deep annotation level
To demonstrate the performance of gCAnno in large dataset (cell number more than 50,000) with deep annotation level (more than 20 cell types). We compared gCAnno with other methods in 20 mouse organs dataset with 54,246 cells, 29 cell types and 23,433 genes. The true labels of the cells in each dataset are also obtained from the original publications [18] (Additional file 2: File S1; Additional file 5: Figure S3; Additional file 6: Table S2).
Simulated dropout and imbalance datasets
To evaluate the robustness of gCAnno in the presence of dropout noise, we simulated different dropout rates in four above datasets (Table 1), by modifying the expression level of a random gene subset (10, 20, 30, 40 and 50% of all genes) to zero (Additional file 2: File S1). Similarly, we used five-fold cross validation to evaluate its performance. In each validation, we simulated the dropout noise in either training group (reference dropout) or test group (query dropout), and calculated the kappa coefficient for each method.
To simulate the cell number imbalance noise, we randomly sampled different proportions (0.1:1, 0.3:1, 0.5:1, 0.7:1, 0.9:1, 1:0.9, 1:0.7, 1:0.5, 1:0.3 and 1:0.1) of cell count in two cell types (Hepatocyte and GamaDetaT) in liver dataset as the reference data for classifier constructing. To get more accuracy testing, this simulation was repeated five times (Additional file 2: File S1).
Cross platform datasets
To compare cross platform performance (various studies using different sequencing platforms), we searched and identified four datasets suitable for this purpose, including two liver datasets from 10x and mCel-seq2 platforms and two pancreas datasets from drop-seq and smart-seq2 platforms (Table 2). We noticed that the cell type annotation labels of the same tissue from different platforms are not identical. Thus, we unified the labels by removing cell types absent in either of the datasets (Additional file 7: Figure S4; Additional file 8: Table S3; Additional file 2: File S1).
Supplementary Information
Acknowledgements
The authors thank Yujing Liu, Peng Jia and Yongyong Kang for their important suggestions and feedback.
Abbreviations
- kappa
Kappa coefficient
- DEGs
Differentially expressed genes
- UMI
Unique Molecular Identifier
- scRNA-seq
Single-cell RNA-seq
- PCs
Principle components
- HCA
Human cell atlas
- wCGBG
Weighted cell type-gene bipartite graph
- 10x
10x Genomics platform
- SVM
Support Vector Machine
Authors’ contributions
KY and XY conceived the study. SG and TW designed and performed the experiments. SG, TW and BY analysed the data. SG developed the program. XY and SG wrote the manuscript. SG and ND completed figures of manuscript. All authors read and approved the final manuscript.
Funding
This work was supported by the National Key R&D Program of China (2018YFC0910400, 2017YFC0907500 and 2018ZX10302205), the National Science Foundation of China (61702406 and 31671372), the General Financial Grant from the China Postdoctoral Science Foundation (2017 M623178) and the Fundamental Research Funds for the Central Universities (xzy012020012).
Availability of data and materials
Datasets used for the analyses in this study are summarized in Additional file 2: File S1. The source code of gCAnno is publicly available at https://github.com/xjtu-omics/gCAnno.
Ethics approval and consent to participate
This study used previously published data and did not obtain any new data directly involved humans, plants or animals.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Xiaofei Yang, Shenghan Gao and Tingjie Wang contributed equally to this work.
References
- 1.Kolodziejczyk AA, Kim JK, Svensson V, Marioni JC, Teichmann SA. The technology and biology of single-cell RNA sequencing. Mol Cell. 2015;58(4):610–620. doi: 10.1016/j.molcel.2015.04.005. [DOI] [PubMed] [Google Scholar]
- 2.Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N, Wang X, Bodeau J, Tuch BB, Siddiqui A, et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods. 2009;6(5):377–382. doi: 10.1038/nmeth.1315. [DOI] [PubMed] [Google Scholar]
- 3.Baron M, Veres A, Wolock SL, Faust AL, Gaujoux R, Vetere A, Ryu JH, Wagner BK, Shen-Orr SS, Klein AM, et al. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst. 2016;3(4):346–360.e344. doi: 10.1016/j.cels.2016.08.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.MacParland SA, Liu JC, Ma XZ, Innes BT, Bartczak AM, Gage BK, Manuel J, Khuu N, Echeverri J, Linares I, et al. Single cell RNA sequencing of human liver reveals distinct intrahepatic macrophage populations. Nat Commun. 2018;9(1):4383. doi: 10.1038/s41467-018-06318-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Ma L, Hernandez MO, Zhao Y, Mehta M, Tran B, Kelly M, Rae Z, Hernandez JM, Davis JL, Martin SP, et al. Tumor cell biodiversity drives microenvironmental reprogramming in liver cancer. Cancer Cell. 2019;36(4):418–430.e416. doi: 10.1016/j.ccell.2019.08.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Zhang TQ, Xu ZG, Shang GD, Wang JW. A single-cell RNA sequencing profiles the developmental landscape of Arabidopsis root. Mol Plant. 2019;12(5):648–660. doi: 10.1016/j.molp.2019.04.004. [DOI] [PubMed] [Google Scholar]
- 7.Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018;36(5):411–420. doi: 10.1038/nbt.4096. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19(1):15. doi: 10.1186/s13059-017-1382-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Guo M, Wang H, Potter SS, Whitsett JA, Xu Y. SINCERA: a pipeline for single-cell RNA-Seq profiling analysis. PLoS Comput Biol. 2015;11(11):e1004575. doi: 10.1371/journal.pcbi.1004575. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Kiselev VY, Andrews TS, Hemberg M. Challenges in unsupervised clustering of single-cell RNA-seq data. Nat Rev Genet. 2019;20(5):273–282. doi: 10.1038/s41576-018-0088-9. [DOI] [PubMed] [Google Scholar]
- 11.Pliner HA, Shendure J, Trapnell C. Supervised classification enables rapid annotation of cell atlases. Nat Methods. 2019;16(10):983–986. doi: 10.1038/s41592-019-0535-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Zhang AW, O'Flanagan C, Chavez EA, Lim JLP, Ceglia N, McPherson A, Wiens M, Walters P, Chan T, Hewitson B, et al. Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling. Nat Methods. 2019;16(10):1007–1015. doi: 10.1038/s41592-019-0529-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Kiselev VY, Yiu A, Hemberg M. scmap: projection of single-cell RNA-seq data across data sets. Nat Methods. 2018;15(5):359–362. doi: 10.1038/nmeth.4644. [DOI] [PubMed] [Google Scholar]
- 14.de Kanter JK, Lijnzaad P, Candelli T, Margaritis T, Holstege FCP. CHETAH: a selective, hierarchical cell type identification method for single-cell RNA sequencing. Nucleic Acids Res. 2019;47(16):e95. doi: 10.1093/nar/gkz543. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Alquicira-Hernandez J, Sathe A, Ji HP, Nguyen Q, Powell JE. scPred: accurate supervised method for cell-type classification from single-cell RNA-seq data. Genome Biol. 2019;20(1):264. doi: 10.1186/s13059-019-1862-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Regev A, Teichmann SA, Lander ES, Amit I, Benoist C, Birney E, Bodenmiller B, Campbell P, Carninci P, Clatworthy M, et al. The human cell atlas. Elife. 2017;6:e27041. doi: 10.7554/eLife.27041. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Abdelaal T, Michielsen L, Cats D, Hoogduin D, Mei H, Reinders MJT, Mahfouz A. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. 2019;20(1):194. doi: 10.1186/s13059-019-1795-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Tabula Muris C. Overall c. Logistical c. Organ c, processing. Library p, sequencing. Computational data a. Cell type a. Writing g et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature. 2018;562(7727):367–372. doi: 10.1038/s41586-018-0590-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Ziegenhain C, Vieth B, Parekh S, Reinius B, Guillaumet-Adkins A, Smets M, Leonhardt H, Heyn H, Hellmann I, Enard W. Comparative analysis of single-cell RNA sequencing methods. Mol Cell. 2017;65(4):631–643.e634. doi: 10.1016/j.molcel.2017.01.023. [DOI] [PubMed] [Google Scholar]
- 20.Aizarani N, Saviano A, Sagar, Mailly L, Durand S, Herman JS, Pessaux P, Baumert TF, Grun D. A human liver cell atlas reveals heterogeneity and epithelial progenitors. Nature. 2019;572(7768):199–204. doi: 10.1038/s41586-019-1373-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Segerstolpe A, Palasantza A, Eliasson P, Andersson EM, Andreasson AC, Sun X, Picelli S, Sabirsh A, Clausen M, Bjursell MK, et al. Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes. Cell Metab. 2016;24(4):593–607. doi: 10.1016/j.cmet.2016.08.020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Grover A, Leskovec J. node2vec: scalable feature learning for networks. KDD. 2016;2016:855–864. doi: 10.1145/2939672.2939754. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Andrews TS, Hemberg M. M3Drop: dropout-based feature selection for scRNASeq. Bioinformatics. 2019;35(16):2865–2867. doi: 10.1093/bioinformatics/bty1044. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
Datasets used for the analyses in this study are summarized in Additional file 2: File S1. The source code of gCAnno is publicly available at https://github.com/xjtu-omics/gCAnno.