Summary
Feature selection by expectation maximization test (Festem) enables the direct selection of cell type marker genes, facilitating downstream clustering of single-cell RNA sequencing (scRNA-seq) data. Here, we present a protocol for using Festem to identify marker genes in scRNA-seq data and perform subsequent analyses. We describe comprehensive steps for setting up the environment, marker gene selection, clustering, and marker gene assignment. This protocol yields both clustering results and identified marker genes, enhancing the interpretation of biological information in scRNA-seq data.
For complete details on the use and execution of this protocol, please refer to Chen et al.1
Subject areas: bioinformatics, single cell, RNA-seq, systems biology
Highlights
• Instructions for directly selecting cell type marker genes using Festem
• Steps for clustering using Festem-selected marker genes
• Guidance on batch effect removal based on Festem-selected genes
Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.
Before you begin
In single-cell RNA sequencing (scRNA-seq) research, cell types and their marker genes are typically identified through clustering and differentially expressed gene (DEG) analysis. Traditionally, genes are selected based on surrogate criteria such as variance and deviance. These selected genes are then used for clustering, and markers are identified through DEG analysis that treats the resulting clusters as known cell types. However, surrogate criteria may overlook crucial genes or include irrelevant ones, and DEG analysis can suffer from selection bias because the same data are used both to define the clusters and to test for differences between them.2 To address these limitations, we developed Festem, a novel method that directly selects marker genes for optimal cell-type identification by exploiting the intrinsic clustering information within each gene's expression distribution. By doing so, Festem circumvents the pitfalls of surrogate criteria and avoids the selection bias of post-clustering DEG analysis.
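To illustrate what "clustering information within a gene's expression distribution" means, the sketch below fits a two-component mixture to one gene's counts by expectation maximization (EM) and reports how much the log-likelihood improves over a single-component fit. This is a simplified illustration under stated assumptions, not Festem's implementation: it swaps in a Poisson mixture for Festem's negative binomial model, and the helper names fit_poisson_mixture and mixture_gain are hypothetical. A gene whose cells split into low- and high-expression groups should yield a large gain, whereas a homogeneously expressed gene should yield a small one.

```r
# Illustrative sketch only: two-component Poisson mixture fitted by EM.
# Festem itself uses a negative binomial model and its own EM-based test.
fit_poisson_mixture <- function(x, max_iter = 500, tol = 1e-8) {
  # Initialize the two component means at the lower and upper quartiles
  lambda <- as.numeric(quantile(x, c(0.25, 0.75))) + 0.1
  pi1 <- 0.5
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step on the log scale: posterior probability of component 1 per cell
    l1 <- log(pi1) + dpois(x, lambda[1], log = TRUE)
    l2 <- log(1 - pi1) + dpois(x, lambda[2], log = TRUE)
    m <- pmax(l1, l2)
    log_total <- m + log(exp(l1 - m) + exp(l2 - m))
    w <- exp(l1 - log_total)
    loglik <- sum(log_total)
    if (loglik - loglik_old < tol) break  # EM has converged
    loglik_old <- loglik
    # M-step: update the mixing proportion and the two component means
    pi1 <- mean(w)
    lambda <- c(sum(w * x) / sum(w), sum((1 - w) * x) / sum(1 - w))
  }
  list(lambda = lambda, pi1 = pi1, loglik = loglik)
}

# Log-likelihood gain of the two-component fit over a single Poisson fit
mixture_gain <- function(x) {
  single <- sum(dpois(x, mean(x), log = TRUE))
  fit_poisson_mixture(x)$loglik - single
}

set.seed(1)
bimodal <- c(rpois(500, 1), rpois(500, 20))  # two expression states
unimodal <- rpois(1000, 10)                   # one expression state
mixture_gain(bimodal)   # large gain
mixture_gain(unimodal)  # small gain
```

The log-sum-exp form of the E-step keeps the sketch numerically stable when one component assigns a vanishingly small density to a cell.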
Installation and environment setup
Timing: ∼1 h
1. Install Miniconda.
a. Download Miniconda from https://docs.anaconda.com/miniconda/.
b. Run the following command in the terminal.
bash Miniconda3-latest-Linux-x86_64.sh
Note: The installation process may vary depending on your operating system (Windows, macOS, or Linux). For comprehensive guidance, please refer to the official Miniconda installation instructions at the following link: https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html.
2. Create a virtual environment.
a. Run the following commands in the terminal.
conda create -n Festem python=3.12.2
conda activate Festem
3. Install R using Miniconda.
a. Run the following command in the terminal.
conda install conda-forge::r-base=4.4.1
Note: If you are using Windows, you will also need to install RTools (4.4). You can download it from the following link: https://cran.r-project.org/bin/windows/Rtools/.
4. Install R packages. Troubleshooting 1 and Troubleshooting 2.
a. Install Seurat and devtools. Run the following commands in the conda terminal.
conda install conda-forge::r-seurat=5.1.0
conda install conda-forge::r-devtools
b. Install related packages. Run the following commands in the R Console (access the R Console by running the command “R” in the conda terminal).
install.packages("BiocManager")
BiocManager::install("edgeR")
devtools::install_github("satijalab/seurat-data")
Note: Seurat uses several additional packages to significantly improve speed and performance. Based on the developers’ recommendations, we suggest installing the following packages:
setRepositories(ind = 1:3, addURLs = c("https://satijalab.r-universe.dev", "https://bnprks.r-universe.dev/"))
install.packages(c("BPCells", "presto", "glmGamPoi"))
Optional: If you intend to process scRNA-seq data with multiple batches, we recommend installing the Harmony package for effective batch-effect removal.
install.packages("harmony")
c. Install Festem. Troubleshooting 3.
devtools::install_github("XiDsLab/Festem")
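Before moving on, it can help to confirm that the installed packages load in the Festem environment. The helper below is a minimal base R sketch; check_packages is a hypothetical name, not a function from Festem or Seurat. It uses requireNamespace() to test whether each package can be loaded without attaching it and reports the installed version, which can then be compared with the Key resources table.

```r
# Report whether each package can be loaded, and its installed version.
check_packages <- function(pkgs) {
  version <- vapply(pkgs, function(p) {
    # requireNamespace() loads the namespace without attaching the package
    if (requireNamespace(p, quietly = TRUE)) as.character(utils::packageVersion(p))
    else NA_character_
  }, character(1), USE.NAMES = FALSE)
  data.frame(package = pkgs, installed = !is.na(version), version = version,
             stringsAsFactors = FALSE)
}

# Packages used in this protocol
check_packages(c("Seurat", "SeuratData", "edgeR", "harmony", "Festem"))
```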
Data collection
Timing: 5 min
5. Download an scRNA-seq dataset using SeuratData. Run the following commands in the R Console. Troubleshooting 4.
library(SeuratData)
options(timeout = 1000)
InstallData("ifnb")
Note: Single-cell datasets analyzed in this protocol were preprocessed and deposited into Zenodo: https://doi.org/10.5281/zenodo.11331165.
Key resources table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Deposited data | ||
| Immune cell gene expressions from eight patients with lupus | Kang et al.3 | GEO: GSE96583 |
| Software and algorithms | ||
| Miniconda3 | Anaconda, Inc. | https://docs.anaconda.com/miniconda/ |
| R v.4.4.1 | R Core Team | https://www.r-project.org/ |
| Bioconductor (BiocManager v.1.30.23) | Bioconductor Core Team4 | https://bioconductor.org/ |
| Festem v.1.2.1 | Chen et al.1 | https://github.com/XiDsLab/Festem |
| Seurat v.5.1.0 | Hao et al.5 | https://cloud.r-project.org/web/packages/Seurat/index.html |
| SeuratData v.0.2.2.9001 | Satija et al.6 | https://github.com/satijalab/seurat-data |
| devtools v.2.4.5 | Wickham et al.7 | https://cran.r-project.org/web/packages/devtools/index.html |
| edgeR v.3.32.1 | Robinson et al.8 | https://bioconductor.org/packages/release/bioc/html/edgeR.html |
| ScottKnott v.1.3–1 | Jelihovschi et al.9 | https://cran.r-project.org/web/packages/ScottKnott/index.html |
| harmony v.1.2.0 | Korsunsky et al.10 | https://cran.r-project.org/web/packages/harmony/index.html |
| dplyr v.1.1.4 | Wickham et al.11 | https://cloud.r-project.org/web/packages/dplyr/index.html |
| Other | ||
| Personal computer | HP, Inc. | HP Star Book Pro 14 (AMD Ryzen 7 8845H processor) |
Materials and equipment
All analyses performed here (and the associated timing estimates) were conducted on a personal computer with an AMD Ryzen 7 8845H processor with 8 cores and 32 GB of RAM (running on Windows 11).
Step-by-step method details
The Festem protocol consists of three main parts: marker gene identification (identifying all heterogeneously distributed genes), cell clustering, and assignment of marker genes to clusters (determining which cluster each marker gene represents).
For scRNA-seq data, different instructions are required depending on the presence of batch effects. To ensure clarity, we provide two variants of the Festem protocol to handle both types of data: For data without batch effects, follow steps 1–7; for multi-batch data, follow steps 8–14.
Single scRNA-seq dataset workflow
Timing: <1 min (for step 1)
Timing: ∼5 min (for step 2)
Timing: <1 min (for step 3)
Timing: ∼1 s (for step 4)
Timing: ∼1 min (for steps 5–7)
1. Package and data import. Troubleshooting 5.
library(Seurat)
library(SeuratData)
library(Festem)
library(dplyr)
data("ifnb")
# Update objects created with SeuratObject v4 when SeuratObject v5 is installed
if (packageVersion("SeuratObject") >= "5.0.0") {
  ifnb <- UpdateSeuratObject(ifnb)
}
# Keep only cells from the control (unstimulated) group
ifnb <- ifnb[, ifnb@meta.data$stim == "CTRL"]
2. Run Festem to select cell-type marker genes. Troubleshooting 6 and Troubleshooting 7.
ifnb <- RunFestem(ifnb, num.threads = 4)
Note: To speed up Festem, you can enable parallelization by setting the “num.threads” parameter to the desired number of CPU cores.
3. Clustering using Festem-selected marker genes.
# Use the 2,500 top-ranked Festem genes as clustering features
gene_set <- rownames(ifnb)[ifnb[["RNA"]][[]][, "Festem_rank"] <= 2500]
ifnb <- NormalizeData(ifnb)
ifnb <- ScaleData(ifnb, features = gene_set)
ifnb <- RunPCA(ifnb, verbose = FALSE, features = gene_set)
# Build the neighbor graph on the first 20 PCs and cluster the cells
ifnb <- FindNeighbors(object = ifnb, dims = 1:20)
ifnb <- FindClusters(object = ifnb, resolution = 1.5)
# Compute a t-SNE embedding for visualization
ifnb <- RunTSNE(ifnb, reduction = "pca", dims = 1:20)
4. Visualize the clusters (Figure 1A).
DimPlot(ifnb, label = TRUE) + NoLegend()
5. Assign Festem-detected marker genes to identified clusters (Figure 1B).
marker <- AllocateMarker(ifnb, VariableFeatures(ifnb))
6. Visualize marker gene expression across clusters and annotate clusters (Figures 2A and 2B). Troubleshooting 8.
ifnb <- RenameIdents(ifnb, "0" = "CD4 Naive T", "1" = "CD4 Memory T", "2" = "CD14 Monocyte", "3" = "CD14 Monocyte", "4" = "CD16 Monocyte", "5" = "CD14 Monocyte", "6" = "B", "7" = "T Activated", "8" = "NK", "9" = "CD4 Memory T", "10" = "DC", "11" = "CD8 T", "12" = "B Activated", "13" = "T cell:Monocyte Complex", "14" = "HSP+ CD4 T", "15" = "IFNhi CD14 Monocyte", "16" = "Mk", "17" = "pDC", "18" = "CD34+ Progenitors", "19" = "CD14 Monocyte")
DimPlot(ifnb, label = TRUE) + NoLegend()
MarkerHeatmap(ifnb, VariableFeatures(ifnb))
CRITICAL: Cluster indices and the t-SNE layout may differ slightly from Figure 1A across operating systems. Before renaming clusters, verify the expression of canonical cell type markers (e.g., those in Figure 2C) to avoid mislabeling clusters.
7. Check canonical markers of annotated cell types (Figure 2C).
# Check canonical markers
marker_list <- c("CD3D","CREM","IL7R","CCR7","CD27","SELL","GIMAP5","CACYBP","TCF7","GNLY","NKG7","CCL5","CD247","GZMB","CD8A","MS4A1","CD79A","CD37","MIR155HG","NME1","FCGR3A","VMO1","MS4A7","CCL2","S100A9","CD14","LYZ","HLA-DQA1","GPR183","FCER1A","CST3","CD1C","TSPAN13","IL3RA","IGJ","HSPA1A","HSPB1","HSPA1B","HSPH1","HSPE1","HSPD1","CD34","TPSAB1","GATA2","SNHG7","ISG15","ISG20","IFI6","IFIT1","PPBP","PF4")
# Reorder cell types
ifnb@active.ident <- factor(ifnb@active.ident,levels = c("Mk","T cell:Monocyte Complex","IFNhi CD14 Monocyte","CD34+ Progenitors","HSP+ CD4 T","pDC","DC","CD14 Monocyte","CD16 Monocyte","B Activated","B","CD8 T","NK","T Activated","CD4 Naive T","CD4 Memory T"))
ifnb <- ScaleData(ifnb,features = marker_list)
DotPlot(ifnb, features = marker_list, cols = c("blue", "red"), dot.scale = 8, idents = levels(ifnb@active.ident)) + RotatedAxis()
Figure 1.
Clustering and marker genes in the control group of the IFNB dataset
(A) t-SNE plot of the control group of the IFNB dataset, generated in step 4.
(B) An example of marker genes detected by Festem in step 5. p values are adjusted with the Benjamini-Hochberg method.
Figure 2.
Cell type annotations and marker genes in the control group of the IFNB dataset
(A) Cell type annotations generated in step 6.
(B) Heatmap of marker gene expression generated in step 6.
(C) Expression of canonical markers of different cell types, generated in step 7.
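Step 3 selects features with logical indexing on the Festem_rank column. If that column ever contains missing values (whether it can is an assumption here, not something this protocol states), logical indexing returns NA entries rather than dropping those genes. The sketch below, with a hypothetical helper select_top_genes, shows the difference on a toy named vector and uses which() so that NA ranks are discarded.

```r
# Return the names of genes ranked within the top n, dropping NA ranks.
select_top_genes <- function(ranks, n = 2500) {
  # which() keeps only TRUE comparisons, so NA ranks are dropped
  names(ranks)[which(ranks <= n)]
}

toy_rank <- c(GENE_A = 1, GENE_B = 3, GENE_C = NA, GENE_D = 2)
names(toy_rank)[toy_rank <= 2]     # "GENE_A" NA "GENE_D"
select_top_genes(toy_rank, n = 2)  # "GENE_A" "GENE_D"

# Applied to step 3 (not run here):
# ranks <- setNames(ifnb[["RNA"]][[]][, "Festem_rank"], rownames(ifnb))
# gene_set <- select_top_genes(ranks, n = 2500)
```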
Multi-batch scRNA-seq dataset workflow
Timing: <1 min (for step 8)
Timing: ∼10 min (for step 9)
Timing: ∼1 min (for steps 10 and 11)
Timing: ∼2 min (for step 12)
Timing: <1 min (for steps 13 and 14)
In our pipeline, we first use Festem to select marker genes, and then apply batch correction. Festem is applied to each batch individually, and the results are combined to obtain the final set of marker genes. The rationale behind this approach is that each batch contains valuable information about whether a gene is a marker. We can evaluate whether a gene is a marker gene for each batch individually and then combine evidence from all batches to achieve a more confident evaluation. The advantages of this method include: (1) minimal influence of batch effects on marker gene selection; (2) independence from the batch correction method used; (3) easy parallelization, making it computationally efficient for large datasets.
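The per-batch design described above follows a split-apply pattern: split the cells by batch label, run a per-gene test within each batch, and collect a genes-by-batches matrix of p-values. In the minimal sketch below, an index-of-dispersion test against a Poisson model stands in for Festem's EM test, and the helper names dispersion_pvalue and per_batch_pvalues are hypothetical. Because each batch is processed independently, the outer loop could be parallelized (for example, with parallel::mclapply on Linux or macOS).

```r
# Placeholder per-gene test (index-of-dispersion test against a Poisson model);
# it stands in for Festem's EM test and is NOT what Festem computes.
dispersion_pvalue <- function(x) {
  m <- mean(x)
  if (m == 0) return(1)  # gene not detected in this batch: no evidence
  stat <- sum((x - m)^2) / m
  pchisq(stat, df = length(x) - 1, lower.tail = FALSE)
}

# Split cells by batch and test each gene separately within each batch.
# Returns a genes-by-batches matrix of p-values.
per_batch_pvalues <- function(counts, batch, test = dispersion_pvalue) {
  cells_by_batch <- split(seq_len(ncol(counts)), batch)
  vapply(cells_by_batch, function(cells) {
    apply(counts[, cells, drop = FALSE], 1, test)
  }, numeric(nrow(counts)))
}

# Toy data: one two-state "marker" gene and one homogeneous gene, two batches
set.seed(2)
counts <- rbind(marker = c(rpois(100, 1), rpois(100, 30), rpois(100, 1), rpois(100, 30)),
                flat = rpois(400, 5))
batch <- rep(c("CTRL", "STIM"), each = 200)
p <- per_batch_pvalues(counts, batch)
p
```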
8. Package and data import.
library(Seurat)
library(SeuratData)
library(Festem)
library(dplyr)
data("ifnb")
# Update objects created with SeuratObject v4 when SeuratObject v5 is installed
if (packageVersion("SeuratObject") >= "5.0.0") {
  ifnb <- UpdateSeuratObject(ifnb)
}
9. Run Festem to select cell-type marker genes. Troubleshooting 6 and Troubleshooting 7.
ifnb <- RunFestem(ifnb, batch = "stim", num.threads = 4)
Note: You can also provide a vector containing batch labels for each cell, for example:
ifnb <- RunFestem(ifnb, batch = ifnb@meta.data$stim, num.threads = 4)
10. Batch-effect removal based on Festem-selected marker genes via Harmony.10
# Select the 2,500 top-ranked Festem genes, then normalize, scale, and run PCA
gene_set <- rownames(ifnb)[ifnb[["RNA"]][[]][, "Festem_rank"] <= 2500]
ifnb <- NormalizeData(ifnb)
ifnb <- ScaleData(ifnb, features = gene_set)
ifnb <- RunPCA(ifnb, verbose = FALSE, features = gene_set)
# Integrate the PCA embedding across the "stim" groups with Harmony
ifnb <- harmony::RunHarmony(ifnb, "stim", plot_convergence = TRUE, lambda = 10)
11. Clustering using Festem-selected marker genes.
# Cluster and embed the cells using the Harmony-corrected embedding
ifnb <- FindNeighbors(object = ifnb, dims = 1:30, reduction = "harmony")
ifnb <- FindClusters(object = ifnb, resolution = 1.25)
ifnb <- RunTSNE(ifnb, reduction = "harmony", dims = 1:30)
12. Assign Festem-detected marker genes to identified clusters.
marker <- AllocateMarker(ifnb, VariableFeatures(ifnb))
13. Visualize marker gene expression across clusters and annotate clusters (Figures 3A and 3B). Troubleshooting 8.
ifnb <- RenameIdents(ifnb, "0" = "IFNhi CD14 Monocyte", "1" = "CD4 Naive T", "2" = "CD14 Monocyte", "3" = "CD4 Memory T", "4" = "CD16 Monocyte", "5" = "B", "6" = "CD8 T", "7" = "NK", "8" = "T Activated", "9" = "HSP+ CD4 T", "10" = "DC", "11" = "B Activated", "12" = "T cell:Monocyte Complex", "13" = "CD4 Naive T", "14" = "CD8 T", "15" = "Mk", "16" = "pDC", "17" = "B", "18" = "CD34+ Progenitors", "19" = "Eryth")
DimPlot(ifnb, label = TRUE) + NoLegend()
MarkerHeatmap(ifnb, VariableFeatures(ifnb))
14. Check canonical markers of annotated cell types (Figure 3C).
# Check canonical markers
marker_list <- c("CD3D","CREM","IL7R","CCR7","CD27","SELL","GIMAP5","CACYBP","TCF7","GNLY","NKG7","CCL5","CD247","GZMB","CD8A","MS4A1","CD79A","CD37","MIR155HG","NME1","FCGR3A","VMO1","MS4A7","CCL2","S100A9","CD14","LYZ","HLA-DQA1","GPR183","FCER1A","CST3","CD1C","TSPAN13","IL3RA","IGJ","HSPA1A","HSPB1","HSPA1B","HSPH1","HSPE1","HSPD1","CD34","TPSAB1","GATA2","SNHG7","ISG15","ISG20","IFI6","IFIT1","PPBP","PF4","HBA2","HBB")
# Reorder cell types
ifnb@active.ident <- factor(ifnb@active.ident,levels = c("Eryth","Mk","T cell:Monocyte Complex","IFNhi CD14 Monocyte","CD34+ Progenitors","HSP+ CD4 T","pDC","DC","CD14 Monocyte","CD16 Monocyte","B Activated","B","CD8 T","NK","T Activated","CD4 Naive T","CD4 Memory T"))
ifnb <- ScaleData(ifnb,features = marker_list)
DotPlot(ifnb, features = marker_list, cols = c("blue", "red"), dot.scale = 8, idents = levels(ifnb@active.ident)) + RotatedAxis()
Figure 3.
Joint analysis of the control and stimulated groups in the IFNB dataset
(A) Cell type annotations generated in step 13.
(B) Heatmap of marker gene expression generated in step 13.
(C) Expression of canonical markers of different cell types, generated in step 14.
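After Harmony integration, a quick way to check how well the batches mix is to cross-tabulate clusters against batch labels. The base R sketch below (batch_mixing is a hypothetical helper name) reports, for each cluster, its size and the largest share contributed by any single batch. In this dataset the batch variable "stim" is also the experimental condition, so a cluster dominated by one group may reflect either residual batch effects or a genuine stimulation-specific cell state; the table only flags clusters for closer inspection.

```r
# Cluster sizes and the largest single-batch share within each cluster.
batch_mixing <- function(cluster, batch) {
  tab <- table(cluster = cluster, batch = batch)
  share <- prop.table(tab, margin = 1)  # batch composition within each cluster
  data.frame(cluster = rownames(tab),
             n_cells = as.vector(rowSums(tab)),
             max_batch_share = as.vector(apply(share, 1, max)),
             stringsAsFactors = FALSE)
}

# Toy example: cluster "1" contains cells from a single batch
batch_mixing(cluster = c("0", "0", "0", "1", "1", "1"),
             batch = c("CTRL", "STIM", "CTRL", "CTRL", "CTRL", "CTRL"))

# With the protocol object after step 11 (not run here):
# batch_mixing(Idents(ifnb), ifnb$stim)
```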
Expected outcomes
Running either pipeline generates a Seurat object containing the marker genes identified by Festem, along with clustering and dimensionality reduction results (Figure 1A). Specifically, the detected marker genes are stored as the “VariableFeatures” of the Seurat object, and their adjusted p-values are stored in the feature-level “meta.data” of the active assay (accessible via ifnb[["RNA"]][[]], as in step 3). In addition, AllocateMarker generates a data frame containing the fold changes and adjusted p-values of the marker genes (Figure 1B).
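The marker data frame returned by AllocateMarker can be summarized further, for example by keeping significant markers and listing the top few per cluster. The sketch below assumes hypothetical column names (gene, cluster, log_fc, p_adj); the actual column names of the AllocateMarker output may differ, so check them with names(marker) and adapt the code. top_markers_per_cluster is likewise a hypothetical helper.

```r
# Keep markers with adjusted p < fdr and return the top k per cluster by fold change.
# Column names gene, cluster, log_fc and p_adj are placeholders.
top_markers_per_cluster <- function(markers, k = 5, fdr = 0.05) {
  # which() drops rows whose adjusted p-value is NA
  sig <- markers[which(markers$p_adj < fdr), , drop = FALSE]
  # Sort by cluster, then by decreasing fold change within each cluster
  sig <- sig[order(sig$cluster, -sig$log_fc), , drop = FALSE]
  out <- do.call(rbind, lapply(split(sig, sig$cluster), head, n = k))
  rownames(out) <- NULL
  out
}

# Toy marker table
m <- data.frame(gene = c("g1", "g2", "g3", "g4", "g5"),
                cluster = c("0", "0", "0", "1", "1"),
                log_fc = c(1.5, 3.0, 2.0, 0.5, 4.0),
                p_adj = c(0.01, 0.001, 0.2, 0.04, NA),
                stringsAsFactors = FALSE)
top_markers_per_cluster(m, k = 1)  # g2 for cluster "0", g4 for cluster "1"
```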
Quantification and statistical analysis
In this protocol, all p-values are adjusted using the Benjamini-Hochberg procedure.12 When the data consist of multiple batches, the per-batch p-values of each gene are first combined using the Bonferroni method13 before FDR adjustment. Marker genes are called at an FDR level of 0.05.
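The two-stage adjustment described above can be written in a few lines of base R. This is a minimal sketch assuming the per-batch p-values are available as a genes-by-batches matrix; combine_and_adjust is a hypothetical helper, not a Festem function. For each gene, the Bonferroni combination of K batch-level p-values is min(1, K × min_k p_k); these combined p-values are then adjusted across genes with p.adjust(method = "BH"), and genes with adjusted p-values below 0.05 are called markers.

```r
# Bonferroni combination across batches, then Benjamini-Hochberg across genes.
combine_and_adjust <- function(pmat, fdr = 0.05) {
  k <- ncol(pmat)                                 # number of batches
  p_combined <- pmin(1, k * apply(pmat, 1, min))  # Bonferroni merge per gene
  p_adj <- p.adjust(p_combined, method = "BH")    # FDR control across genes
  data.frame(gene = rownames(pmat),
             p_combined = unname(p_combined),
             p_adj = unname(p_adj),
             is_marker = unname(p_adj < fdr),
             stringsAsFactors = FALSE)
}

# Toy genes-by-batches matrix of per-batch p-values
pmat <- matrix(c(0.001, 0.20,
                 0.030, 0.04,
                 0.900, 0.50),
               ncol = 2, byrow = TRUE,
               dimnames = list(c("g1", "g2", "g3"), c("CTRL", "STIM")))
combine_and_adjust(pmat)
# p_combined: 0.002, 0.06, 1; p_adj: 0.006, 0.09, 1; only g1 is called a marker
```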
Limitations
Festem assumes that gene expression follows a negative binomial distribution, a model suitable for most scRNA-seq count data, especially the widely used unique molecular identifier (UMI) data.14,15 Festem may have low precision or recall if the data substantially violate this assumption. Another limitation is that Festem requires more computational time than many gene selection methods because it relies on EM iterations. Nevertheless, Festem performs gene selection and cell-type marker identification simultaneously, with computational times comparable to or shorter than those of many widely used DEG identification methods.1 We therefore consider the computational time required by Festem to be acceptable in most cases.
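A rough way to probe the negative binomial assumption for an individual gene is to compare its observed fraction of zero counts with the fraction expected under a negative binomial fitted by the method of moments (variance = μ + φμ²). This diagnostic is a sketch, not part of Festem; nb_zero_check is a hypothetical helper, and simulated counts stand in for real data. A large excess of observed zeros suggests zero inflation beyond what the model accommodates.

```r
# Compare observed zeros with zeros expected under a moment-matched NB model.
nb_zero_check <- function(x) {
  mu <- mean(x)
  if (mu == 0) stop("gene has no detected counts")
  # Method-of-moments dispersion; floored near 0 for Poisson-like genes
  phi <- max((var(x) - mu) / mu^2, 1e-8)
  c(mu = mu,
    phi = phi,
    observed_zero = mean(x == 0),
    expected_zero = dnbinom(0, size = 1 / phi, mu = mu))
}

set.seed(3)
x_nb <- rnbinom(20000, size = 2, mu = 3)  # genuinely negative binomial
x_zi <- ifelse(runif(20000) < 0.5, 0, rnbinom(20000, size = 2, mu = 3))  # extra zeros
nb_zero_check(x_nb)  # observed and expected zero fractions should agree
nb_zero_check(x_zi)  # observed zeros should exceed the NB expectation
```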
Troubleshooting
Problem 1
Failed to install R package igraph. R Console reports an error: “installation of package 'igraph' had non-zero exit status”.
Potential solution
This issue may arise when the GLPK library is missing or installed in a non-default location, particularly on Linux systems without root privileges. To resolve this, install igraph using Miniconda before installing Seurat with the following command:
conda install conda-forge::r-igraph
Problem 2
Conda dependency resolution takes a long time.
Potential solution
You can use mamba as an alternative to conda. First, download Miniforge from https://github.com/conda-forge/miniforge?tab=readme-ov-file and configure mamba according to its tutorial at https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html. Then, open the Miniforge terminal and install R and R packages using the following commands.
mamba create -n Festem python=3.12.2
mamba activate Festem
mamba install conda-forge::r-base=4.4.1
mamba install conda-forge::r-seurat=5.1.0
mamba install conda-forge::r-devtools
Problem 3
Festem failed to install on Windows. R Console reports an error: “Error in system(paste(MAKE, p1(paste("-f", shQuote(makefiles))), "compilers"), : 'make' not found”.
Potential solution
1. Download Rtools (4.4) from https://cran.r-project.org/bin/windows/Rtools/rtools44/rtools.html and install it on your system. Then, open the conda terminal and add the Rtools path to the PATH environment variable:
set PATH=%PATH%;C:\rtools44\usr\bin
(Replace “C:\rtools44\usr\bin” with the path where you installed Rtools. The set command only affects the current terminal session, so start R from that same terminal.)
2. Alternatively, you can download the binary version of Festem from https://github.com/XiDsLab/Festem/releases/download/v1.2.1/Festem_1.2.1.zip and install it from the R Console:
install.packages("Festem_1.2.1.zip", repos = NULL)
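To confirm from within R that the make program is visible before retrying the installation, the check below uses base R's Sys.which(); make_available is a hypothetical helper name. The commented line shows a Windows-only way to append the Rtools folder to PATH for the current R session; the path is an example and must match your installation.

```r
# Returns TRUE if a 'make' executable is visible on R's PATH.
make_available <- function() {
  nzchar(Sys.which("make")[[1]])
}

if (make_available()) {
  message("make found at: ", Sys.which("make")[[1]])
} else {
  message("make not found: add the Rtools usr\\bin folder to PATH and restart R")
}

# Windows-only alternative for the current R session (adjust the path):
# Sys.setenv(PATH = paste(Sys.getenv("PATH"), "C:\\rtools44\\usr\\bin",
#                         sep = .Platform$path.sep))
```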
Problem 4
Failed to download package “ifnb.SeuratData”.
Potential solution
Manually download the package “ifnb.SeuratData” from https://seurat.nygenome.org/src/contrib/ifnb.SeuratData_3.1.0.tar.gz. Then, run the following command in R console to install it.
install.packages("ifnb.SeuratData_3.1.0.tar.gz", repos = NULL, type = "source")
library(ifnb.SeuratData)
LoadData("ifnb")
Problem 5
Failed to subset the control group of the dataset. R Console reports an error: “invalid class "Assay" object: slots in class definition but not in object: ‘assay.orig’” (related to step 1).
Potential solution
This issue arises because the downloaded data are stored as a Seurat v4 object, whereas you are using Seurat v5. To resolve this, run the following command.
ifnb <- UpdateSeuratObject(ifnb)
Problem 6
When running Festem, R runs out of memory (related to step 2 and step 9).
Potential solution
Use a smaller block size during parallelization, for example:
# Single scRNA-seq dataset
ifnb <- RunFestem(ifnb, num.threads = 4, block_size = 1e4)
# Multi-batch scRNA-seq dataset
ifnb <- RunFestem(ifnb, batch = "stim", num.threads = 4, block_size = 1e4)
Problem 7
We have prior information about cell types. How can we incorporate this prior into the Festem protocol? (related to step 2 and step 9).
Potential solution
If we have a good estimate of the number of cell types in the dataset, we can set the parameter “G” so that Festem generates a pre-clustering with G cell types. For example:
ifnb <- RunFestem(ifnb, G = 14, num.threads = 4)
If we have a pre-clustering result or cell labels for the dataset, we can also provide it to Festem by setting the parameter “prior”. For example, if a pre-clustering result is stored in a vector called “label”, then we can use the following code.
ifnb <- RunFestem(ifnb, prior = label, num.threads = 4)
Problem 8
R fails or takes too much time to generate a heatmap for marker gene expressions (related to step 6 and step 13).
Potential solution
To reduce the time and computational resources required, you can sub-sample a smaller fraction of cells for plotting by setting the parameter “plot_cell_prop”. For example:
MarkerHeatmap(ifnb, VariableFeatures(ifnb), plot_cell_prop = 0.01)
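If you prefer to sub-sample cells yourself (for example, to reuse the same subset across several plots), a stratified sample that keeps a fixed fraction of each cluster, and at least one cell from every cluster, preserves small populations. The helper below (subsample_cells) is a hypothetical base R sketch, not a description of how plot_cell_prop is implemented; the commented usage assumes that Seurat's subset() with a cells argument and MarkerHeatmap accept the subsetted object.

```r
# Stratified subsampling: keep a fraction of cells from every cluster,
# with at least one cell per non-empty cluster.
subsample_cells <- function(cells, clusters, prop = 0.01, seed = 1) {
  set.seed(seed)  # note: resets the global random number generator
  idx_by_cluster <- split(seq_along(cells), clusters)
  keep <- unlist(lapply(idx_by_cluster, function(i) {
    if (length(i) == 0) return(integer(0))  # unused factor level
    n <- max(1, round(length(i) * prop))    # at least one cell per cluster
    i[sample.int(length(i), n)]             # avoids sample()'s length-1 pitfall
  }), use.names = FALSE)
  cells[sort(keep)]
}

cells <- paste0("cell", 1:1000)
clusters <- c(rep("big", 990), rep("tiny", 10))
kept <- subsample_cells(cells, clusters, prop = 0.01)  # 10 "big" + 1 "tiny" cell

# With the protocol object (not run here):
# cells_keep <- subsample_cells(colnames(ifnb), Idents(ifnb), prop = 0.05)
# MarkerHeatmap(subset(ifnb, cells = cells_keep), VariableFeatures(ifnb))
```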
Resource availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Dr. Ruibin Xi (ruibinxi@math.pku.edu.cn).
Technical contact
Technical questions on executing this protocol should be directed to and will be answered by the technical contacts, Zihao Chen (g.e.challenger@pku.edu.cn) and Dr. Changhu Wang (wangch156@pku.edu.cn).
Materials availability
This study did not generate any new reagents.
Data and code availability
• The datasets used in these workflows are downloaded using SeuratData. Raw expression counts are available at 10× Genomics: https://www.10xgenomics.com/resources/datasets.
• Additional information and R code for the latest version of Festem are provided in the GitHub repository: https://github.com/XiDsLab/Festem and at Zenodo: https://doi.org/10.5281/zenodo.14159896.
• Preprocessed datasets analyzed in this protocol are available at Zenodo: https://doi.org/10.5281/zenodo.11331165.
Acknowledgments
We thank Qinghua Ran for testing our scripts. This work was supported by the National Key R&D Program of China (2020YFE0204200 to R.X.), the National Natural Science Foundation of China (12425110 and 12371286 to R.X.), the Foundation of Shuanghu Laboratory (SH-2024JK10 to R.X.), and the Sino-Russian Mathematics Center.
Author contributions
Conceptualization, R.X. and C.W.; methodology, C.W. and Z.C.; software, Z.C. and C.W.; formal analysis, Z.C.; writing, Z.C. and R.X.; funding acquisition and supervision, R.X.
Declaration of interests
R.X. holds stock in GeneX Health Co., Ltd.
Declaration of generative AI and AI-assisted technologies in the writing process
During the preparation of this work, the authors used Microsoft Copilot in Bing in order to polish the text. After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the content of the published article.
Contributor Information
Zihao Chen, Email: g.e.challenger@pku.edu.cn.
Changhu Wang, Email: wangch156@pku.edu.cn.
Ruibin Xi, Email: ruibinxi@math.pku.edu.cn.
References
- 1. Chen Z., Wang C., Huang S., Shi Y., Xi R. Directly selecting cell-type marker genes for single-cell clustering analyses. Cell Rep. Methods. 2024;4 doi: 10.1016/j.crmeth.2024.100810.
- 2. Zhang J.M., Kamath G.M., Tse D.N. Valid post-clustering differential analysis for single-cell RNA-seq. Cell Syst. 2019;9:383–392.e6. doi: 10.1016/j.cels.2019.07.012.
- 3. Kang H.M., Subramaniam M., Targ S., Nguyen M., Maliskova L., McCarthy E., Wan E., Wong S., Byrnes L., Lanata C.M., et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 2018;36:89–94. doi: 10.1038/nbt.4042.
- 4. Huber W., Carey V.J., Gentleman R., Anders S., Carlson M., Carvalho B.S., Bravo H.C., Davis S., Gatto L., Girke T., et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods. 2015;12:115–121. doi: 10.1038/nmeth.3252.
- 5. Hao Y., Stuart T., Kowalski M.H., Choudhary S., Hoffman P., Hartman A., Srivastava A., Molla G., Madad S., Fernandez-Granda C., Satija R. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 2024;42:293–304. doi: 10.1038/s41587-023-01767-y.
- 6. Satija R., Hoffman P., Butler A. SeuratData: Install and Manage Seurat Datasets R package version 0.2.2.9001, commit 4dc08e022f51c324bc7bf785b1b5771d2742701d. 2023. https://github.com/satijalab/seurat-data
- 7. Wickham H., Hester J., Chang W., Bryan J. devtools: Tools to Make Developing R Packages Easier R package version 2.4.5. 2022. https://CRAN.R-project.org/package=devtools
- 8. Robinson M.D., McCarthy D.J., Smyth G.K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–140. doi: 10.1093/bioinformatics/btp616.
- 9. Jelihovschi E., Faria J.C., Allaman I.B. ScottKnott: a package for performing the Scott-Knott clustering algorithm in R. TeMA. 2014;15:003–017.
- 10. Korsunsky I., Fan J., Slowikowski K., Zhang F., Wei K., Baglaenko Y., Brenner M.B., Loh P.-R., Raychaudhuri S. Fast, sensitive, and accurate integration of single cell data with Harmony. Nat. Methods. 2019;16:1289–1296. doi: 10.1038/s41592-019-0619-0.
- 11. Wickham H., François R., Henry L., Müller K., Vaughan D. dplyr: A Grammar of Data Manipulation R package version 1.1.4. 2023. https://CRAN.R-project.org/package=dplyr
- 12. Benjamini Y., Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Stat. Soc. B. 1995;57:289–300.
- 13. Vovk V., Wang R. Combining p-values via averaging. Biometrika. 2020;107:791–808.
- 14. Chen W., Li Y., Easton J., Finkelstein D., Wu G., Chen X. UMI-count modeling and differential expression analysis for single-cell RNA sequencing. Genome Biol. 2018;19:70. doi: 10.1186/s13059-018-1438-9.
- 15. Choi K., Chen Y., Skelly D.A., Churchill G.A. Bayesian model selection reveals biological origins of zero inflation in single-cell transcriptomics. Genome Biol. 2020;21:183. doi: 10.1186/s13059-020-02103-2.