Table 2.
Dataset name | Protocol | No. of cells | No. of genes | No. of cell types | Species/tissue description | Refs. |
---|---|---|---|---|---|---|
PBMC sorted | 10X | 91,649 | 18,986 | 7 | Human PBMCs | [33] |
PBMC-3K | 10X | 2467 | 13,714 | 6 | Human PBMCs | |
Pancreas sorted | CEL-Seq2 | 2285 | 34,363 | 13 | Human pancreas | [10], [31] |
Pancreas | Fluidigm C1 | 638 | 34,363 | 13 | Human pancreas | [10], [32] |
TM full sorted | Smart-Seq2 | 24,622 | 22,252 | 37 | Mouse | [3] |
TM full | 10X | 20,000 | 17,866 | 32 | Mouse | [3] |
TM lung sorted | Smart-Seq2 | 1563 | 22,253 | 10 | Mouse lung | [3] |
TM lung | 10X | 1303 | 17,866 | 8 | Mouse lung | [3] |
Simulation 1 true | Splatter | 2000 | 4000 | 5 | Simulation data for cross-dataset prediction | |
Simulation 1 raw | Splatter | 2000 | 4000 | 5 | Simulation data for cross-dataset prediction | |
Simulation 2 true | Splatter | 2000 | 10,000 | 5 | Simulation data with increasing differential expression scales (low, low–moderate, moderate, high), each generated with 5 random seeds | |
Simulation 2 raw | Splatter | 2000 | 10,000 | 5 | Simulation data with increasing differential expression scales (low, low–moderate, moderate, high), each generated with 5 random seeds | |
Simulation 3 true | Splatter | 10,000 | 20,000 | 10/20/30/40/50 | Simulation data with increasing No. of cell type classes from 10 to 50 | |
Simulation 3 raw | Splatter | 10,000 | 20,000 | 10/20/30/40/50 | Simulation data with increasing No. of cell type classes from 10 to 50 | |
Simulation 4 true | Splatter | 2000 | 10,000 | 9 | Simulation data with descending cell proportion for each cell group, generated with 10 random seeds | |
Simulation 4 raw | Splatter | 2000 | 10,000 | 9 | Simulation data with descending cell proportion for each cell group, generated with 10 random seeds | |
Simulation 5 true | Splatter | 5000/10,000/15,000/20,000/25,000/50,000 | 20,000 | 5 | Simulation data with increasing No. of cells from 5000 to 50,000 | |
Simulation 5 raw | Splatter | 5000/10,000/15,000/20,000/25,000/50,000 | 20,000 | 5 | Simulation data with increasing No. of cells from 5000 to 50,000 | |
Note: PBMC-3K data were obtained from https://support.10xgenomics.com/single-cell-gene-expression/datasets/. Raw data are the true simulation data with dropouts added. Sorted data were generated by fluorescence-activated cell sorting. TM, Tabula Muris; PBMC, peripheral blood mononuclear cell.
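
To make the relation between the "true" and "raw" simulation datasets concrete, the following minimal Python sketch zeroes out entries of a true count matrix to mimic dropouts. It is an illustration only and not Splatter's implementation (Splatter is an R package). The function name `add_dropouts`, the parameters `midpoint` and `shape`, and the logistic dependence of the dropout probability on a gene's log mean count are assumptions chosen for this example.

```python
import math
import random


def add_dropouts(true_counts, midpoint=1.0, shape=-1.0, seed=0):
    """Return a copy of a genes x cells count matrix with dropouts added.

    Assumption (illustrative, not taken from the table): the dropout
    probability of a gene is a logistic function of the log of its mean
    count, so lowly expressed genes are zeroed more often.
    `midpoint` and `shape` are hypothetical parameters; every gene row
    must contain at least one cell.
    """
    rng = random.Random(seed)
    raw_counts = []
    for gene_row in true_counts:
        mean_count = sum(gene_row) / len(gene_row)
        log_mean = math.log(mean_count + 1.0)
        # With a negative shape, the probability falls as expression rises.
        p_drop = 1.0 / (1.0 + math.exp(-shape * (log_mean - midpoint)))
        # Each entry is either kept as-is or replaced by zero.
        raw_counts.append(
            [0 if rng.random() < p_drop else count for count in gene_row]
        )
    return raw_counts


# Toy "true" matrix: 2 genes x 4 cells.
true_counts = [[5, 3, 8, 0], [1, 0, 2, 1]]
raw_counts = add_dropouts(true_counts, seed=1)
```

Under these assumptions the raw matrix has the same dimensions as the true matrix and differs only in entries that were set to zero, which is consistent with each true/raw pair in the table sharing the same numbers of cells and genes.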