2020 Dec 24;19(2):267–281. doi: 10.1016/j.gpb.2020.07.004

Table 2.

Datasets used in this study

| Dataset name | Protocol | No. of cells | No. of genes | No. of cell types | Species/tissue description | Refs. |
|---|---|---|---|---|---|---|
| PBMC sorted | 10X | 91,649 | 18,986 | 7 | Human PBMCs | [33] |
| PBMC-3K | 10X | 2467 | 13,714 | 6 | Human PBMCs | |
| Pancreas sorted | CEL-Seq2 | 2285 | 34,363 | 13 | Human pancreas | [10], [31] |
| Pancreas | Fluidigm C1 | 638 | 34,363 | 13 | Human pancreas | [10], [32] |
| TM full sorted | Smart-Seq2 | 24,622 | 22,252 | 37 | Mouse | [3] |
| TM full | 10X | 20,000 | 17,866 | 32 | Mouse | [3] |
| TM lung sorted | Smart-Seq2 | 1563 | 22,253 | 10 | Mouse lung | [3] |
| TM lung | 10X | 1303 | 17,866 | 8 | Mouse lung | [3] |
| Simulation 1 true | Splatter | 2000 | 4000 | 5 | Simulation data for cross-dataset prediction | |
| Simulation 1 raw | Splatter | 2000 | 4000 | 5 | Simulation data for cross-dataset prediction | |
| Simulation 2 true | Splatter | 2000 | 10,000 | 5 | Simulation data with increasing differential expression scales from low, low–moderate, moderate to high, each generated with 5 random seeds | |
| Simulation 2 raw | Splatter | 2000 | 10,000 | 5 | Simulation data with increasing differential expression scales from low, low–moderate, moderate to high, each generated with 5 random seeds | |
| Simulation 3 true | Splatter | 10,000 | 20,000 | 10/20/30/40/50 | Simulation data with increasing No. of cell type classes from 10 to 50 | |
| Simulation 3 raw | Splatter | 10,000 | 20,000 | 10/20/30/40/50 | Simulation data with increasing No. of cell type classes from 10 to 50 | |
| Simulation 4 true | Splatter | 2000 | 10,000 | 9 | Simulation data with descending cell proportion for each cell group, generated with 10 random seeds | |
| Simulation 4 raw | Splatter | 2000 | 10,000 | 9 | Simulation data with descending cell proportion for each cell group, generated with 10 random seeds | |
| Simulation 5 true | Splatter | 5000/10,000/15,000/20,000/25,000/50,000 | 20,000 | 5 | Simulation data with increasing No. of cells from 5000 to 50,000 | |
| Simulation 5 raw | Splatter | 5000/10,000/15,000/20,000/25,000/50,000 | 20,000 | 5 | Simulation data with increasing No. of cells from 5000 to 50,000 | |

Note: PBMC-3K data were obtained from https://support.10xgenomics.com/single-cell-gene-expression/datasets/. Raw data are the true simulation data with dropouts added. Sorted data were generated by fluorescence-activated cell sorting. TM, Tabula Muris; PBMC, peripheral blood mononuclear cell.
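
The relationship between the "true" and "raw" simulation datasets (raw = true counts plus dropouts) can be illustrated with a minimal sketch. This is not the Splatter implementation and not the authors' code: the function name `add_dropouts`, the parameters `midpoint` and `shape`, the logistic dropout model, and the toy count values are all assumptions chosen for illustration.

```python
import math
import random


def add_dropouts(true_counts, midpoint=0.0, shape=-1.0, seed=0):
    """Return a 'raw' copy of true_counts with some entries set to zero.

    Assumption (not taken from the table): each count is dropped with a
    probability given by a logistic function of log(count + 1). With a
    negative shape, low counts are dropped more often than high counts.
    """
    rng = random.Random(seed)
    raw = []
    for row in true_counts:
        raw_row = []
        for count in row:
            x = math.log(count + 1.0)
            p_drop = 1.0 / (1.0 + math.exp(-shape * (x - midpoint)))
            # Keep the true count unless a dropout event occurs.
            raw_row.append(0 if rng.random() < p_drop else count)
        raw.append(raw_row)
    return raw


# Toy "true" matrix: 3 cells x 4 genes (hypothetical values).
true_counts = [
    [0, 3, 12, 150],
    [1, 0, 8, 90],
    [2, 5, 0, 200],
]
raw_counts = add_dropouts(true_counts, midpoint=1.0)
```

In this sketch the raw matrix has the same dimensions as the true matrix, which matches the table: each true/raw pair lists identical numbers of cells and genes, and the two differ only in the extra zeros.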