2021 Mar 23;4(6):e202001004. doi: 10.26508/lsa.202001004

Table 2.

Summary of benchmark tasks. In total, five tasks were designed to evaluate scaling, sensitivity, stability, correspondence to batch characteristics in real data, and computational time and memory.

| Name | Measure | Aim |
| --- | --- | --- |
| Task 1: Batch characteristics | Spearman correlation of metrics with surrogates of batch strength (e.g., PVE-Batch and proportion of DE genes between batches) across datasets | Test whether metrics reflect batch strength/confounding across datasets |
| Task 2: Scaling with batch label permutation | Spearman correlation of metrics with the percentage of randomly permuted batch labels | Serves as a negative control and determines whether metrics scale with randomness |
| Task 3: Scaling with batch strength and detection limits | Spearman correlation of metrics with the batch logFC in a simulation series on the same dataset; minimal batch logFC that the metrics recognize as a batch effect | Test whether metrics scale with (synthetic) batch strength; estimate the lower limit of batch detection |
| Task 4: Unbalanced batches | Response of metrics to imbalanced cell type abundance within the same dataset | Test sensitivity to imbalance in cell type abundance |
| Task 5: Computational time and memory | CPU time and memory usage as a function of the number of cells and number of genes | Assess computational cost of metrics |

For each task, different datasets (synthetic, semi-synthetic, or real) were used.
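
As an illustration of the measure used in Task 2, the following minimal sketch permutes an increasing fraction of batch labels and computes the Spearman correlation between that fraction and a metric score. Everything here is an assumption made for illustration: the simulated one-gene data, the helper names (`permute_labels`, `toy_metric`, `spearman`), and in particular `toy_metric`, which is a simple stand-in (absolute difference in mean expression between batches) and not one of the benchmarked metrics.

```python
import random


def rank(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def permute_labels(labels, fraction, rng):
    """Shuffle the batch labels of a random `fraction` of cells among themselves."""
    labels = list(labels)
    idx = rng.sample(range(len(labels)), int(round(fraction * len(labels))))
    shuffled = [labels[i] for i in idx]
    rng.shuffle(shuffled)
    for i, lab in zip(idx, shuffled):
        labels[i] = lab
    return labels


def toy_metric(expr, labels):
    """Hypothetical stand-in metric: |mean(batch 0) - mean(batch 1)|."""
    a = [e for e, lab in zip(expr, labels) if lab == 0]
    b = [e for e, lab in zip(expr, labels) if lab == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))


rng = random.Random(0)
n_cells = 1000
true_labels = [i % 2 for i in range(n_cells)]
# Assumed simulation: one gene, batch 1 shifted by 2 units relative to batch 0.
expr = [rng.gauss(2.0 * lab, 1.0) for lab in true_labels]

fractions = [i / 10 for i in range(11)]  # 0%, 10%, ..., 100% permuted
scores = [toy_metric(expr, permute_labels(true_labels, f, rng)) for f in fractions]
rho = spearman(fractions, scores)
print(f"Spearman correlation with permuted fraction: {rho:.2f}")
```

Under these assumptions, a metric that scales with randomness should yield a strongly negative correlation, because its score falls as more labels are permuted. Whether a real metric increases or decreases with batch strength depends on its definition, so the sign would need to be interpreted accordingly.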