Abstract
The application and analysis of single-cell transcriptomics in toxicology presents unique challenges. These include identifying cell sub-populations sensitive to perturbation; interpreting dynamic shifts in cell type proportions in response to chemical exposures; and performing differential expression analysis in dose-response studies spanning multiple treatment conditions. This review examines these challenges and presents best practices for critical single-cell analysis tasks, including cell type identification, analysis of differential cell type abundance, differential gene expression analysis, and analysis of cellular trajectories. Towards enhancing the use of single-cell transcriptomics in toxicology, this review aims to address key challenges in this field and offer practical analytical solutions. Overall, applying appropriate bioinformatic techniques to single-cell transcriptomic data can yield valuable insights into cellular responses to toxic exposures.
1. Introduction
The era of single cell RNA sequencing (scRNA-seq) technologies has given us an unprecedented view of transcriptomic heterogeneity, even among cells of the same type and within the same tissue (Nguyen et al., 2018). Both the basal state of the cellular transcriptome and its response to perturbation by chemicals or drugs can be heterogeneous (Chen et al. 2023). One source of this heterogeneity is transcriptional noise, or fluctuations in gene expression, which can result from both intrinsic and extrinsic factors, such as transcriptional bursting and subtle variations in the immediate chemical environment of the cell (Elowitz et al., 2002). The extent of noise in gene expression influences rates of switching among cellular states and consequently the resulting shape of dose-response curves (Bhattacharya et al., 2010). Single cell transcriptomics, unlike bulk measures, can reveal differential cellular sensitivities to a chemical or drug (Q. Zhang et al., 2019). For example, an enrichment score based on single-cell gene expression signatures of tumor cells was used to predict the extent of perturbation induced by various drugs in “therapeutic clusters”, i.e., groups of cells with a similar drug response profile (Fustero-Torre et al., 2021). However, such characterization of cellular heterogeneity in responsiveness to environmental pollutants is lacking.
In addition to heterogeneity of response, cellular identities may be malleable to chemical exposure, thus leading to altered cell type proportions as a function of dose. These shifts in cell composition can be attributed to cell type-specific physiological and pathological processes such as cell death, immune infiltration, and altered differentiation (Haimbaugh et al. 2022; Nault et al. 2021; Khan et al. 2023). For example, the testes of zebrafish exposed to 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) exhibited significant depletion of sperm cell types (Haimbaugh et al. 2022). Additionally, within the remaining sperm cells, scRNA-seq revealed enrichment of apoptosis and sperm disorder pathways which were not detectable in bulk RNA-seq. In livers of mice exposed to TCDD, the proportion of multiple immune cell types increased as a function of dose (Nault et al. 2021). Cell type proportions can also shift following chemical insult because of alterations in the process of cellular differentiation. scRNA-seq analysis of human hematopoiesis in vitro revealed that exposure to TCDD increased the proportions of monocytes and granulocytes, accompanied by a simultaneous depletion of B cells (Khan et al. 2023).
Below we focus on challenges most often faced in single cell transcriptomic analyses in toxicology, namely cell type identification, differential cell type abundance analysis, differential expression analysis, and cell trajectory analysis.
2. Preprocessing
Preprocessing is the essential first step in single cell transcriptomic analysis, consisting of quality control and normalization (Figure 1). Quality control should ideally be performed on each sample such that cells and genes with poor data quality are filtered out. In practice, most single cell preprocessing protocols filter out cells expressing fewer than 200 or more than 2500 genes, or cells in which more than 5–20% of counts originate from mitochondrial genes (S. Zhang et al., 2023). Cells expressing an excessively large number of genes may reflect a technical artifact where two cells were captured together and mistakenly labelled as one, referred to as a doublet. In datasets with higher sequencing depth and multiple cell types, doublets can be more accurately removed by specialized algorithms rather than simple cutoffs. In a recent benchmark of such algorithms, DoubletFinder (McGinnis et al., 2019) was found to have the best overall doublet detection accuracy (Xi & Li, 2021). Additionally, chemical treatment can alter the physical properties of cells, such as cell-cell adhesion (Collins et al., 2009), which can in turn affect the likelihood of cells forming doublets. Chemical exposure can also result in cell death and release of mRNA molecules into the solution, increasing the likelihood of capturing cell-free (ambient) mRNA in droplet-based scRNA-seq experiments (Zheng et al., 2017). Computational tools like SoupX (Young & Behjati, 2020) can correct mRNA counts for the contaminating effects of cell-free ambient RNA.
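The simple threshold filters described above can be sketched in a few lines of plain Python. This is a minimal illustration only: the `CellQC` container, the `passes_qc` function, and the example cells are hypothetical, and the 5% mitochondrial cutoff is just one choice within the 5–20% range quoted above. It does not replace dedicated doublet detection or ambient RNA correction.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell quality metrics (hypothetical container for this sketch)."""
    barcode: str
    n_genes: int        # number of genes with at least one count
    total_counts: int   # total counts observed in the cell
    mito_counts: int    # counts assigned to mitochondrial genes


def passes_qc(cell, min_genes=200, max_genes=2500, max_mito_frac=0.05):
    """Return True if the cell survives the simple threshold filters."""
    if cell.total_counts == 0:
        return False
    mito_frac = cell.mito_counts / cell.total_counts
    return min_genes <= cell.n_genes <= max_genes and mito_frac <= max_mito_frac


# Made-up cells chosen so that each filter is triggered once.
cells = [
    CellQC("cell_a", n_genes=1200, total_counts=4000, mito_counts=120),   # passes
    CellQC("cell_b", n_genes=150, total_counts=300, mito_counts=10),      # too few genes
    CellQC("cell_c", n_genes=3100, total_counts=15000, mito_counts=300),  # too many genes
    CellQC("cell_d", n_genes=900, total_counts=2000, mito_counts=500),    # high mito fraction
]

kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```

In a treated sample, where cell death may raise mitochondrial fractions, the cutoffs would likely need to be examined per sample rather than fixed in advance.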
Figure 1. Single cell RNA-seq preprocessing workflow.

Raw counts from single cell or single nuclei RNA-seq experiments undergo quality control and normalization prior to downstream analysis. First, as part of quality control, low quality cells and genes are removed. Then doublets are identified and removed with DoubletFinder. Finally, ambient RNA is removed with SoupX, and pooling normalization is performed using scran.
Following quality control, the count data requires normalization before downstream analyses. Normalization transforms the count data to minimize technical cell-to-cell variation and cell-specific biases, such as capture efficiency and library size. The pooling normalization of scran is an effective and often used normalization method (Lun et al., 2016). Finally, the normalized data should be log(x+1) transformed, where x is the normalized gene count.
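To make the normalization and log(x+1) steps concrete, the sketch below scales each cell by a size factor and then log-transforms the result. Note the assumption: for simplicity the size factor here is the plain library size relative to a hypothetical `target_sum`, not the pooled, deconvolved size factors computed by scran. The function name and example counts are illustrative only.

```python
import math


def normalize_log1p(counts, target_sum=10_000):
    """Scale each cell to `target_sum` total counts, then apply log(x + 1).

    counts: list of per-cell lists of raw gene counts.
    NOTE: simple library-size scaling, used here as a stand-in for the
    pooling-based size factors of scran.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell has no information; keep zeros.
            normalized.append([0.0 for _ in cell])
            continue
        size_factor = total / target_sum
        normalized.append([math.log1p(x / size_factor) for x in cell])
    return normalized


raw = [
    [10, 0, 90],     # shallowly sequenced cell
    [100, 0, 900],   # same composition, sequenced ten times deeper
]
norm = normalize_log1p(raw)
for row in norm:
    print([round(v, 3) for v in row])
```

The two example cells differ only in sequencing depth, so after normalization their profiles coincide, which is the technical variation the step is meant to remove.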
3. Cell Type Identification
Cell type identification is one of the more challenging tasks in the analysis of scRNA-seq data. While methods which simultaneously quantify transcriptomes and cell surface protein abundance like CITE-seq (Stoeckius et al. 2017) can provide more concrete cell type biomarkers, most single cell protocols rely only on mRNA expression for cell type identification (Heumos et al. 2023). Typically, cell type identification involves unsupervised clustering of cells, followed by manual cluster annotation using cell type-specific marker genes (Figure 2).
Figure 2. Cell type identification workflow.

Features of normalized single cell gene expression are filtered such that only a proportion of highly variable genes are retained using scanpy (Python) or Seurat (R). Samples are integrated using scVI or Scanorama if reference cell type expression is not available and with scANVI if the reference is available. Integrated data is clustered with Leiden clustering or with GiniClust – depending on whether rare cell types are present and how important it is to identify them. Finally, clusters are annotated as different cell types using cell type-specific marker genes available in databases such as PanglaoDB.
3.1. Integration with batch correction
Data integration across multiple samples prior to clustering ensures a higher number of cells within each cell type, increasing the statistical power of clustering methods. This is especially relevant in toxicology, where we can expect to have several disparate samples representing different doses or durations of treatment. Confounding sources of variation that result from technical differences in single cell experiments, such as different sequencing technologies, laboratories, protocols, or batches across different biological replicates, are called batch effects. These effects should be removed so as not to mask biological variation when integrating single cell data. In addition, certain biological factors such as inter-donor and sampling location variability are often considered unwanted and can also be removed as batch effects (Heumos et al. 2023). The choice of method for data integration and batch correction depends on the properties of the data set. On smaller and less complex datasets (i.e., < 10,000 cells), tools that utilize canonical correlation analysis (CCA) like Seurat (Butler et al. 2018) are more appropriate. A recent benchmark of 16 data integration tools (Luecken et al., 2022) demonstrated that scVI (Lopez et al. 2018) and Scanorama (Hie et al., 2019) perform better when applied to larger and more complex datasets. In the case of tissues with well-established reference datasets, e.g., the Allen single cell mouse brain atlas (S. Zhang et al., 2023), automated frameworks that identify cell types in a semi-supervised manner like scANVI (Xu et al. 2021) outperform other methods. Additionally, selection of highly variable genes prior to integration improves performance in data integration (Luecken et al., 2022). When performing batch correction, the proper selection of batch covariates is vital. It is possible to quantify the variance attributable to different technical covariates and make the choice of batch covariates accordingly (Sikkema et al., 2023).
3.2. Clustering
Clustering is an unsupervised machine learning technique that can be applied to gene expression data to discover groups of cells that are transcriptionally more similar within groups than across groups. Several types of clustering methods have been developed, with different advantages and weaknesses. Community-detection-based methods such as Leiden clustering (Traag, Waltman, and van Eck 2019) are commonly used but cannot estimate the number of clusters and do not perform well for data sets with rare cell types. On the other hand, density-based methods such as GiniClust (Jiang et al., 2016) perform better in detecting rare cell types but sacrifice performance on larger clusters (S. Zhang et al., 2023). Further, to make the clustering methods more effective, we need to first perform dimensionality reduction on the normalized or integrated data, such as filtering of top principal components derived from principal component analysis (S. Zhang et al., 2023). Finally, the expression data can be visualized using non-linear dimensionality reduction methods such as UMAP (Becht et al., 2019).
3.3. Cell type annotation
Marker genes for cell type annotation can be obtained from compendia of curated cell type markers such as PanglaoDB (Franzén et al., 2019). Manual annotation of cell types requires a good understanding of the underlying gene expression patterns of individual cell types in the tissue of interest. In toxicology, this process can be made more difficult by the possibility that chemical exposure could alter the expression of key genes within a given cell type. For example, in hepatocytes making up the liver lobule, a building block of the liver with spatially distributed patterns of expression, sub-chronic TCDD treatment resulted in dose-dependent repression of both portal and central hepatocyte marker genes such as Cyp2f2 and Glul (Nault et al., 2023). Therefore, to avoid relying on marker genes that may have been altered by treatment, it is advisable to investigate cluster-specific expression of multiple cell-type marker genes when annotating cells.
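The advice to rely on several markers rather than one can be illustrated with a small scoring sketch. Everything here is hypothetical: the function names, the expression values, and the placeholder marker names (`portal_marker_2`, `central_marker_2`, and so on) are invented for illustration; only Cyp2f2 and Glul are taken from the text above. The sketch simply averages a cluster's mean expression over each candidate cell type's marker set and picks the highest-scoring type.

```python
def score_cluster(mean_expr, marker_sets):
    """Average a cluster's mean expression over each cell type's markers."""
    scores = {}
    for cell_type, genes in marker_sets.items():
        present = [mean_expr[g] for g in genes if g in mean_expr]
        scores[cell_type] = sum(present) / len(present) if present else 0.0
    return scores


def annotate(mean_expr, marker_sets):
    """Return the best-scoring cell type and all scores."""
    scores = score_cluster(mean_expr, marker_sets)
    return max(scores, key=scores.get), scores


# Placeholder marker sets; only Cyp2f2 and Glul come from the text.
marker_sets = {
    "portal hepatocyte": ["Cyp2f2", "portal_marker_2", "portal_marker_3"],
    "central hepatocyte": ["Glul", "central_marker_2", "central_marker_3"],
}

# Made-up cluster means for a treated sample in which Cyp2f2 is repressed.
mean_expr = {
    "Cyp2f2": 0.2, "portal_marker_2": 2.5, "portal_marker_3": 2.2,
    "Glul": 0.4, "central_marker_2": 0.3, "central_marker_3": 0.1,
}

label, scores = annotate(mean_expr, marker_sets)
print(label, scores)
```

In this invented example, comparing Cyp2f2 against Glul alone would favor the central label, whereas the multi-marker score still favors the portal label despite repression of Cyp2f2.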
4. Differential cell type abundance analysis
Influx or depletion of cell types within the sampled tissue following treatment will invariably affect the relative proportions of other cell types. For example, in livers of TCDD-treated mice the proportion of B cells increased to 24.7% compared to just 0.5% in control livers (Nault et al. 2021). Consequently, the proportion of hepatocytes was reduced drastically. However, it is unlikely that such a considerable hepatocyte loss or death occurred due to treatment. To address this problem, differential abundance analysis methods were developed to describe statistically significant changes in abundances of different cell types between conditions. A recent article by Büttner et al. benchmarked many potential differential abundance analysis methods, such as scDC (Y. Cao et al., 2019), against their own method, scCODA (Büttner et al., 2021). The authors concluded that scCODA performed the best on low-sample cases on non-rare cell types. However, power analysis revealed that for most methods, much higher sample numbers (20–30 samples) would be required to discover similar changes in rare cell type populations (Büttner et al., 2021).
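The compositional effect described above can be reproduced with simple arithmetic. In the sketch below the cell counts are invented for illustration (they are not taken from Nault et al. 2021): the absolute number of hepatocytes is held fixed while B cells infiltrate, and the hepatocyte proportion nonetheless falls.

```python
def proportions(counts):
    """Convert absolute cell counts per type into relative proportions."""
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


# Hypothetical absolute counts; only the B cell count changes with treatment.
control = {"hepatocyte": 8000, "B cell": 50, "other": 1950}
treated = {**control, "B cell": 3000}

p_control = proportions(control)
p_treated = proportions(treated)

for cell_type in control:
    print(cell_type, round(p_control[cell_type], 3), round(p_treated[cell_type], 3))
```

Because proportions must sum to one, an increase in one cell type lowers all others even when their absolute numbers are unchanged, which is why naive per-cell-type comparisons of proportions can mislead.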
5. Differential gene expression analysis
Once cell types are identified from the data, cell type-specific differential gene expression analysis (DGEA) can be performed to determine which genes exhibit significantly altered expression in a particular experimental condition. In toxicology this is often the result of chemical exposure, a change in diet, or onset of disease. In the case of a simple two-condition scenario, there is a plethora of single cell differential expression methods that could be applied (Squair et al. 2021).
In bulk tissue, differential expression is performed by considering two conditions and their biological replicates, fitting the raw counts to a negative binomial distribution, and comparing them using a statistical test such as the likelihood ratio test or Wald test, implemented in software packages like DESeq2 (Love, Huber, and Anders 2014). However, single cell transcriptomics introduces a wrinkle in this standard procedure. Instead of having only biological replicates, we also have pseudo-replicates in the form of individual cells (Zimmerman, Espeland, and Langefeld 2021). Most single cell analysis packages, such as Seurat (Butler et al. 2018) and SCANPY (Wolf et al., 2018), implement the Wilcoxon rank-sum test (Wilcoxon 1945), treating individual cells as independent observations. Consequently, these approaches ignore the inherent correlations among cells from the same sample and thus often exhibit inflated false positive rates. An alternative approach that exhibits improved control over false positives consists of aggregating (e.g., averaging) gene expression across all cells of a given cell type within each biological replicate and then applying bulk-tissue differential expression methods like edgeR (Robinson, McCarthy, and Smyth 2010). This aggregated gene expression is often referred to as pseudo-bulk expression (Squair et al. 2021; Zimmerman, Espeland, and Langefeld 2021).
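A pseudo-bulk aggregation step can be sketched as a grouping operation over cells. The sketch below assumes that aggregation is done per sample and per cell type and that raw counts are summed, which is the usual input for count-based tools such as edgeR; an average would differ only by a per-group scaling. The `pseudobulk` function, the sample names, and the gene names are hypothetical.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts over all cells sharing a (sample, cell type) pair.

    cells: iterable of (sample_id, cell_type, {gene: count}) tuples.
    Returns {(sample_id, cell_type): {gene: summed_count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample_id, cell_type, counts in cells:
        for gene, count in counts.items():
            totals[(sample_id, cell_type)][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}


# Made-up cells from two samples.
cells = [
    ("ctrl_1", "hepatocyte", {"GeneA": 3, "GeneB": 0}),
    ("ctrl_1", "hepatocyte", {"GeneA": 5, "GeneB": 1}),
    ("ctrl_1", "B cell", {"GeneA": 0, "GeneB": 4}),
    ("dose_1", "hepatocyte", {"GeneA": 9, "GeneB": 2}),
]

profiles = pseudobulk(cells)
for key, genes in profiles.items():
    print(key, genes)
```

Each resulting profile then plays the role of one bulk sample, so the unit of replication becomes the biological replicate rather than the individual cell.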
Differential gene expression analysis in toxicology is further complicated by multiple-group study designs, for instance with several treatment doses or conditions. One approach is to analyze differential gene expression in a pairwise fashion between each combination of doses or conditions using standard two-condition methods. However, this approach is prone to inflated false positives as the number of pairwise comparisons increases. A fit-for-purpose multiple-condition test that can account for multiple comparisons would reduce false positive rates at the expense of an increase in false negatives (Nault et al. 2022). Still, as false positives often lead to wasted efforts in identifying chemical or drug mode of action, the conservative fit-for-purpose scBT method (Nault et al. 2022) is preferable over the repeated use of two-condition comparison tests for the analysis of study designs with multiple doses or chemicals. A notable exception is the case of cell types that experience a dose-dependent decrease in abundance with treatment. In this case, the linear multiple group likelihood ratio test outperforms other methods (Nault et al. 2022).
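The growth in false positives with repeated pairwise testing can be quantified with a short calculation. The sketch below is a generic illustration, not the scBT method: it counts all pairwise comparisons among dose groups and, under the simplifying assumption that the tests for a given gene are independent, computes the chance of at least one false positive. The function names and the example group numbers are hypothetical.

```python
from math import comb


def n_pairwise(n_groups):
    """Number of distinct pairwise comparisons among n_groups conditions."""
    return comb(n_groups, 2)


def familywise_error(n_tests, alpha=0.05):
    """Probability of >= 1 false positive across n_tests independent tests.

    Independence is an assumption of this sketch; pairwise comparisons that
    share a group are correlated in practice.
    """
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


for n_groups in (2, 4, 9):
    m = n_pairwise(n_groups)
    print(n_groups, m, round(familywise_error(m), 3), bonferroni_alpha(m))
```

With a control and eight doses, for example, there are 36 pairwise comparisons per gene, so a correction such as Bonferroni must use a much stricter per-test threshold, which is the source of the added false negatives noted above.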
6. Cell trajectory analysis and comparison
Trajectories in single cell data represent changes over time (e.g., in mammalian organogenesis) (J. Cao et al. 2019) or dose (e.g., dose-response of a particular cell-type). Here cells are ordered based on expression patterns along a continuum of the biological process (“pseudo-time”) or dose-response process underlying the trajectory (Qiu et al. 2017; Haghverdi et al. 2016). Of the many methods for pseudo-time trajectory inference, the selection of which one to use will partially depend on the underlying topology of the process (Saelens et al. 2019). On simpler, non-branching trajectories, methods that simply order cells along the pseudo-time like TSCAN can be applied (Ji and Ji 2016). On more complex, branching trajectories, methods like Slingshot should be applied instead (Street et al. 2018). Alternatively, if the trajectory describes a dose-response process, then the recently implemented “pseudo-dose method” may be more appropriate (Kana et al. 2023).
Comparing trajectories within and across treatment conditions can yield useful biological insights. The PseudotimeDE method (Song and Li 2021) can perform differential gene expression analysis along one trajectory (e.g., from hematopoietic stem cells to fully developed B cells), or across multiple trajectories (e.g., myelopoiesis vs. lymphopoiesis). Comparison across treatment conditions can be handled by the condiments method (Bézieux et al. 2021) by considering whether: 1) the underlying process in each condition represents a shared trajectory (differential topology); 2) there are any differences of progression along shared trajectories (differential progression); and 3) there are differentially expressed genes with respect to biological condition. The Lamian package adopts a similar framework but performs statistical inference after accounting for cross-sample variability, effectively reducing the sample-specific false discovery rates (Hou et al., 2021). However, comparing statistical approaches for differential trajectory analysis across conditions is still an area of active study. It is unknown how well more traditional bulk RNA-seq time-series methods, whether based on the likelihood ratio test, e.g., DESeq2 (Love, Huber, and Anders 2014), or on Bayes factors, e.g., DyNB (Äijö et al. 2014), compare with the above methods.
7. Regulatory network inference
Recent advances in single-cell omics have enabled the inference of cell type-specific gene regulatory networks, providing insights into the mechanisms driving cellular state transitions (Badia-i-Mompel et al., 2023). By integrating transcriptomic and epigenomic data, tools like Sc-compReg (Duren et al., 2021) systematically compare inferred transcription factor-target wiring between conditions. Additionally, scGeneRAI (Keyl et al., 2023) uses deep learning to reconstruct networks at the level of individual cells and compares networks between two cells or samples by computing a “network similarity” score: the average number of regulatory links shared between the two networks. Such a score can give an aggregate measure of differential gene regulation under chemical perturbation and help characterize dose-response from a network perspective.
8. Spatial transcriptomic analysis
The emergence of imaging- and sequencing-based spatially resolved transcriptomics has enabled discovery of biological mechanisms and alterations in gene expression at the tissue level (Bressan et al., 2023). Chemically induced pathologies can be localized to certain regions of complex tissues, e.g., acetaminophen-induced liver injury arising in the pericentral region of the liver lobule. Development of non-alcoholic fatty liver disease entails liver region-specific infiltration of immune cells (Karlmark et al., 2009). Alterations in localized cell-type composition and cell-cell communication patterns resulting from pathologies can be profiled with spatial omics. The data sets generated by spatial omics tools tend to be extremely large and require specialized algorithms for analysis. The analysis pipeline typically includes additional steps enabled by the spatial dimension embedded in the data, such as identification of: (i) spatially variable genes, (ii) tissue “neighborhoods” with coordinated patterns of gene expression, and (iii) ligand-receptor signaling interactions between spatially proximal cell types (Bressan et al., 2023). Cell types in heterogeneous tissue regions can be decomposed by tools like RCTD (Cable et al., 2022), while spatial patterns of gene expression can be assessed and compared across conditions with SpatialDE (Svensson et al., 2018). In toxicology, assessing differential patterns of spatial gene expression across dose-response will require development of novel computational tools.
9. Concluding remarks
The rise of single cell transcriptomics has enabled the examination of distinct cell types and the role they play in response to chemical exposure. Differential expression analysis of bulk transcriptomic data cannot distinguish between gene expression changes arising from altered cell type proportions or altered expression within a cell type, especially within a small subpopulation of cells particularly sensitive to perturbation (Nault et al., 2021). Single-cell differential gene expression analysis can inform estimation of biologically relevant points of departure and improve our understanding of adverse responses at the cell and tissue levels (National Toxicology Program, 2018).
However, application of single-cell transcriptomics in toxicology is still in its infancy and poses unique challenges. Given that toxicants can alter the expression of cell type-specific marker genes (Nault et al., 2023), integrating the whole data set together may lead to misidentification of cell types. Additionally, since relative cell type proportions can shift with chemical treatment, certain cell types may not be represented in all samples. This is particularly difficult to resolve when the cell type that shifts in abundance is already poorly represented in the cell population. Choice of the best approach to data integration and cell type identification in toxicological datasets remains an open challenge.
Highlights.
The analysis of single-cell gene expression data in toxicology presents unique challenges.
Chemical exposure can alter cell type proportions and expression of cell type-specific marker genes.
We discuss best practices for cell type identification, analysis of differential cell type abundance, differential expression analysis, and cell trajectory analysis in toxicology.
Funding and acknowledgments
This work was partially supported by Michigan AgBioResearch and the National Institute of Environmental Health Sciences of the National Institutes of Health (R01 ES031937 and P42 ES004911). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Sudin Bhattacharya reports financial support was provided by National Institutes of Health. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Declaration of generative AI in scientific writing
During the preparation of this work the authors used claude.ai in order to improve the clarity of writing by providing prompts such as “rephrase in a way that increases clarity” to ensure that the key points were clearly communicated. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.
Declaration of competing interest
The authors declare no conflict of interests.
Data availability
No data was used for the research described in the article.
References
- Äijö T, Butty V, Chen Z, Salo V, Tripathi S, Burge CB, Lahesmaa R, & Lähdesmäki H (2014). Methods for time series analysis of RNA-seq data with application to human Th17 cell differentiation. Bioinformatics, 30(12), i113. 10.1093/BIOINFORMATICS/BTU274 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Badia-i-Mompel P, Wessels L, Müller-Dott S, Trimbour R, Ramirez Flores RO, Argelaguet R, & Saez-Rodriguez J (2023). Gene regulatory network inference in the era of single-cell multi-omics. Nature Reviews. Genetics, 24(11), 739–754. 10.1038/S41576-023-00618-5 [DOI] [PubMed] [Google Scholar]
- Bézieux H. R. de, Berge K Van den, Street K, & Dudoit S (2021). Trajectory inference across multiple conditions with condiments: differential topology, progression, differentiation, and expression. BioRxiv, 2021.03.09.433671. 10.1101/2021.03.09.433671 [DOI] [Google Scholar]
- Bhattacharya S, Conolly RB, Kaminski NE, Thomas RS, Andersen ME, & Zhang Q (2010). A bistable switch underlying B-cell differentiation and its disruption by the environmental contaminant 2,3,7,8-Tetrachlorodibenzo-p-dioxin. Toxicological Sciences, 115(1), 51–65. 10.1093/toxsci/kfq035 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bressan D, Battistoni G, & Hannon GJ (2023). The dawn of spatial omics. Science (New York, N.Y.), 381(6657). 10.1126/SCIENCE.ABQ4964 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Butler A, Hoffman P, Smibert P, Papalexi E, & Satija R (2018). Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nature Biotechnology 2018 36:5, 36(5), 411–420. 10.1038/nbt.4096 [DOI] [PMC free article] [PubMed] [Google Scholar]
- (*).Büttner M, Ostner J, Müller CL, Theis FJ, & Schubert B (2021). scCODA is a Bayesian model for compositional single-cell data analysis. Nature Communications, 12(1). 10.1038/S41467-021-27150-6 [DOI] [PMC free article] [PubMed] [Google Scholar]; This study introduces scCODA (single-cell compositional data analysis), a Bayesian fit for purpose model for detection of changes in cell type composition between different experimental conditions. scCODA demonstrated excellent detection performance, while reliably controlling for false discoveries. Further, scCODA identified experimentally verified cell type changes that were overlooked by other differential cell type abundance analysis methods.
- Cable DM, Murray E, Zou LS, Goeva A, Macosko EZ, Chen F, & Irizarry RA (2022). Robust decomposition of cell type mixtures in spatial transcriptomics. Nature Biotechnology, 40(4), 517–526. 10.1038/S41587-021-00830-W [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cao J, Spielmann M, Qiu X, Huang X, Ibrahim DM, Hill AJ, Zhang F, Mundlos S, Christiansen L, Steemers FJ, Trapnell C, & Shendure J (2019). The single-cell transcriptional landscape of mammalian organogenesis. Nature 2019 566:7745, 566(7745), 496–502. 10.1038/s41586-019-0969-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cao Y, Lin Y, Ormerod JT, Yang P, Yang JYH, & Lo KK (2019). scDC: single cell differential composition analysis. BMC Bioinformatics, 20(Suppl 19). 10.1186/S12859-019-3211-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen JY, Hug C, Reyes J, Tian C, Gerosa L, Fröhlich F, Ponsioen B, Snippert HJG, Spencer SL, Jambhekar A, Sorger PK, & Lahav G (2023). Multi-range ERK responses shape the proliferative trajectory of single cells following oncogene induction. Cell Reports, 42(3), 112252. 10.1016/J.CELREP.2023.112252 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Collins LL, Lew BJ, & Lawrence BP (2009). TCDD exposure disrupts mammary epithelial cell differentiation and function. Reproductive Toxicology (Elmsford, N.Y.), 28(1), 11–17. 10.1016/J.REPROTOX.2009.02.013 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Duren Z, Lu WS, Arthur JG, Shah P, Xin J, Meschi F, Li ML, Nemec CM, Yin Y, & Wong WH (2021). Sc-compReg enables the comparison of gene regulatory networks between conditions using single-cell data. Nature Communications, 12(1). 10.1038/S41467-021-25089-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Elowitz MB, Levine AJ, Siggia ED, & Swain PS (2002). Stochastic gene expression in a single cell. Science (New York, N.Y.), 297(5584), 1183–1186. 10.1126/SCIENCE.1070919 [DOI] [PubMed] [Google Scholar]
- Franzén O, Gan LM, & Björkegren JLM (2019). PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data. Database : The Journal of Biological Databases and Curation, 2019(1), baz046–baz046. 10.1093/DATABASE/BAZ046 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fustero-Torre C, Jiménez-Santos MJ, García-Martín S, Carretero-Puche C, García-Jimeno L, Ivanchuk V, Di Domenico T, Gómez-López G, & Al-Shahrour F (2021). Beyondcell: targeting cancer therapeutic heterogeneity in single-cell RNA-seq data. Genome Medicine, 13(1). 10.1186/S13073-021-01001-X [DOI] [PMC free article] [PubMed] [Google Scholar]
- Haghverdi L, Büttner M, Wolf FA, Buettner F, & Theis FJ (2016). Diffusion pseudotime robustly reconstructs lineage branching. Nature Methods 2016 13:10, 13(10), 845–848. 10.1038/nmeth.3971 [DOI] [PubMed] [Google Scholar]
- Haimbaugh A, Akemann C, Meyer D, Gurdziel K, & Baker TR (2022). Insight into 2,3,7,8-tetrachlorodibenzo-p-dioxin-induced disruption of zebrafish spermatogenesis via single cell RNA-seq. PNAS Nexus, 1(3). 10.1093/PNASNEXUS/PGAC060 [DOI] [PMC free article] [PubMed] [Google Scholar]
- (**).Heumos L, Schaar AC, Lance C, Litinetskaya A, Drost F, Zappia L, Lücken MD, Strobl DC, Henao J, Curion F, Aliee H, Ansari M, Badia-i-Mompel P, Büttner M, Dann E, Dimitrov D, Dony L, Frishberg A, He D, … Theis FJ (2023). Best practices for single-cell analysis across modalities. Nature Reviews Genetics 2023 24:8, 24(8), 550–572. 10.1038/s41576-023-00586-w [DOI] [PMC free article] [PubMed] [Google Scholar]; This review summarizes independent benchmarking studies of unimodal and multimodal single cell analyses and provides a comprehensive set of best-practice workflows for most common types of single cell analyses.
- Hie B, Bryson B, & Berger B (2019). Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nature Biotechnology, 37(6), 685–691. 10.1038/S41587-019-0113-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hou W, Ji Z, Chen Z, Wherry EJ, Hicks SC, & Ji H (2021). A statistical framework for differential pseudotime analysis with multiple single-cell RNA-seq samples. BioRxiv. 10.1101/2021.07.10.451910 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ji Z, & Ji H (2016). TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Research, 44(13), e117–e117. 10.1093/NAR/GKW430 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jiang L, Chen H, Pinello L, & Yuan GC (2016). GiniClust: detecting rare cell types from single-cell gene expression data with Gini index. Genome Biology, 17(1). 10.1186/S13059-016-1010-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- (*).Kana O, Nault R, Filipovic D, Marri D, Zacharewski T, & Bhattacharya S (2023). Generative modeling of single-cell gene expression for dose-dependent chemical perturbations. Patterns, 4(8), 100817. 10.1016/j.patter.2023.100817 [DOI] [PMC free article] [PubMed] [Google Scholar]; This study introduces scVIDR (single-cell variational inference of dose-response), a generative deep learning framework for prediction of single cell transcriptomic changes due to chemical perturbations across cell types and chemical doses. scVIDR is a fit-for-purpose tool for the prediction of transcriptomic perturbations across the dose-response and outperforms most current tools for prediction of single cell perturbations which are usually designed for single dose perturbations.
- Karlmark KR, Weiskirchen R, Zimmermann HW, Gassler N, Ginhoux F, Weber C, Merad M, Luedde T, Trautwein C, & Tacke F (2009). Hepatic recruitment of the inflammatory Gr1+ monocyte subset upon liver injury promotes hepatic fibrosis. Hepatology (Baltimore, Md.), 50(1), 261–274. doi:10.1002/HEP.22950
- Keyl P, Bischoff P, Dernbach G, Bockmayr M, Fritz R, Horst D, Blüthgen N, Montavon G, Müller KR, & Klauschen F (2023). Single-cell gene regulatory network prediction by explainable AI. Nucleic Acids Research, 51(4), e20. doi:10.1093/NAR/GKAC1212
- Khan DMIO, Karmaus PWF, Bach A, Crawford RB, & Kaminski NE (2023). An in vitro model of human hematopoiesis identifies a regulatory role for the aryl hydrocarbon receptor. Blood Advances, 7(20). doi:10.1182/BLOODADVANCES.2023010169
- Lopez R, Regier J, Cole MB, Jordan MI, & Yosef N (2018). Deep generative modeling for single-cell transcriptomics. Nature Methods, 15(12), 1053–1058. doi:10.1038/S41592-018-0229-2
- Love MI, Huber W, & Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12), 1–21. doi:10.1186/S13059-014-0550-8
- (**) Luecken MD, Büttner M, Chaichoompu K, Danese A, Interlandi M, Mueller MF, Strobl DC, Zappia L, Dugas M, Colomé-Tatché M, & Theis FJ (2022). Benchmarking atlas-level data integration in single-cell genomics. Nature Methods, 19(1), 41–50. doi:10.1038/s41592-021-01336-8. This study provides a benchmark of 68 combinations of single cell data integration and preprocessing methods for complex data integration tasks such as atlas-level single cell transcriptomic integration. Scanorama, scVI, scGen and scANVI methods were shown to perform well, particularly on complex transcriptomic integration tasks.
- Lun ATL, Bach K, & Marioni JC (2016). Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biology, 17(1). doi:10.1186/S13059-016-0947-7
- McGinnis CS, Murrow LM, & Gartner ZJ (2019). DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors. Cell Systems, 8(4), 329–337.e4. doi:10.1016/J.CELS.2019.03.003
- Nault R, Fader KA, Bhattacharya S, & Zacharewski TR (2021). Single-Nuclei RNA Sequencing Assessment of the Hepatic Effects of 2,3,7,8-Tetrachlorodibenzo-p-dioxin. Cellular and Molecular Gastroenterology and Hepatology, 11(1), 147–159. doi:10.1016/J.JCMGH.2020.07.012
- (**) Nault R, Saha S, Bhattacharya S, Dodson J, Sinha S, Maiti T, & Zacharewski T (2022). Benchmarking of a Bayesian single cell RNAseq differential gene expression test for dose–response study designs. Nucleic Acids Research, 50(8), e48. doi:10.1093/NAR/GKAC019. This study introduces scBT, a multiplicity-corrected Bayesian multiple group testing method developed specifically for differential gene expression analysis of dose-response scRNA-seq experiments. Nine existing and proposed methods were benchmarked on simulated and real experimental dose-response datasets, and fit-for-purpose methods such as scBT outperformed the standard two-group comparison methods.
- Nault R, Saha S, Bhattacharya S, Sinha S, Maiti T, & Zacharewski T (2023). Single-cell transcriptomics shows dose-dependent disruption of hepatic zonation by TCDD in mice. Toxicological Sciences: An Official Journal of the Society of Toxicology, 191(1), 135–148. doi:10.1093/toxsci/kfac109
- Nguyen QH, Lukowski SW, Chiu HS, Senabouth A, Bruxner TJC, Christ AN, Palpant NJ, & Powell JE (2018). Single-cell RNA-seq of human induced pluripotent stem cells reveals cellular heterogeneity and cell state transitions between subpopulations. Genome Research, 28(7), 1053–1066. doi:10.1101/GR.223925.117
- NTP Research Report on National Toxicology Program Approach to Genomic Dose-Response Modeling: Research Report 5 [Internet]. (2018). doi:10.22427/NTP-RR-5
- Qiu X, Mao Q, Tang Y, Wang L, Chawla R, Pliner HA, & Trapnell C (2017). Reversed graph embedding resolves complex single-cell trajectories. Nature Methods, 14(10), 979–982. doi:10.1038/nmeth.4402
- Robinson MD, McCarthy DJ, & Smyth GK (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1), 139. doi:10.1093/BIOINFORMATICS/BTP616
- Saelens W, Cannoodt R, Todorov H, & Saeys Y (2019). A comparison of single-cell trajectory inference methods. Nature Biotechnology, 37(5), 547–554. doi:10.1038/s41587-019-0071-9
- Sikkema L, Ramírez-Suástegui C, Strobl DC, Gillett TE, Zappia L, Madissoon E, Markov NS, Zaragosi LE, Ji Y, Ansari M, Arguel MJ, Apperloo L, Banchero M, Bécavin C, Berg M, Chichelnitskiy E, Chung MI, Collin A, Gay ACA, … Theis FJ (2023). An integrated cell atlas of the lung in health and disease. Nature Medicine, 29(6), 1563–1577. doi:10.1038/S41591-023-02327-2
- Song D, & Li JJ (2021). PseudotimeDE: inference of differential gene expression along cell pseudotime with well-calibrated p-values from single-cell RNA sequencing data. Genome Biology, 22(1), 1–25. doi:10.1186/S13059-021-02341-Y
- Squair JW, Gautier M, Kathe C, Anderson MA, James ND, Hutson TH, Hudelle R, Qaiser T, Matson KJE, Barraud Q, Levine AJ, La Manno G, Skinnider MA, & Courtine G (2021). Confronting false discoveries in single-cell differential expression. Nature Communications, 12(1), 1–15. doi:10.1038/s41467-021-25960-2
- Stoeckius M, Hafemeister C, Stephenson W, Houck-Loomis B, Chattopadhyay PK, Swerdlow H, Satija R, & Smibert P (2017). Simultaneous epitope and transcriptome measurement in single cells. Nature Methods, 14(9), 865–868. doi:10.1038/nmeth.4380
- Street K, Risso D, Fletcher RB, Das D, Ngai J, Yosef N, Purdom E, & Dudoit S (2018). Slingshot: Cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics, 19(1), 1–16. doi:10.1186/S12864-018-4772-0
- Svensson V, Teichmann SA, & Stegle O (2018). SpatialDE: identification of spatially variable genes. Nature Methods, 15(5), 343–346. doi:10.1038/NMETH.4636
- Traag VA, Waltman L, & van Eck NJ (2019). From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9(1), 1–12. doi:10.1038/s41598-019-41695-z
- Wilcoxon F (1945). Individual Comparisons by Ranking Methods. Biometrics Bulletin, 1(6), 80. doi:10.2307/3001968
- Wolf FA, Angerer P, & Theis FJ (2018). SCANPY: large-scale single-cell gene expression data analysis. Genome Biology, 19(1). doi:10.1186/S13059-017-1382-0
- Xi NM, & Li JJ (2021). Benchmarking Computational Doublet-Detection Methods for Single-Cell RNA Sequencing Data. Cell Systems, 12(2), 176–194.e6. doi:10.1016/J.CELS.2020.11.008
- Xu C, Lopez R, Mehlman E, Regier J, Jordan MI, & Yosef N (2021). Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models. Molecular Systems Biology, 17(1), e9620. doi:10.15252/MSB.20209620
- Young MD, & Behjati S (2020). SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data. GigaScience, 9(12), 1–10. doi:10.1093/GIGASCIENCE/GIAA151
- Zhang Q, Caudle WM, Pi J, Bhattacharya S, Andersen ME, Kaminski NE, & Conolly RB (2019). Embracing Systems Toxicology at Single-Cell Resolution. Current Opinion in Toxicology, 16, 49–57. doi:10.1016/J.COTOX.2019.04.003
- (**) Zhang S, Li X, Lin J, Lin Q, & Wong KC (2023). Review of single-cell RNA-seq data clustering for cell-type identification and characterization. RNA (New York, N.Y.), 29(5). doi:10.1261/RNA.078965.121. This study provides a comprehensive review of existing single-cell RNA-seq data clustering methods and summarizes their advantages and limitations. Preprocessing techniques such as quality control, normalization and dimensionality reduction are also reviewed.
- Zheng GXY, Terry JM, Belgrader P, Ryvkin P, Bent ZW, Wilson R, Ziraldo SB, Wheeler TD, McDermott GP, Zhu J, Gregory MT, Shuga J, Montesclaros L, Underwood JG, Masquelier DA, Nishimura SY, Schnall-Levin M, Wyatt PW, Hindson CM, … Bielas JH (2017). Massively parallel digital transcriptional profiling of single cells. Nature Communications, 8. doi:10.1038/NCOMMS14049
- Zimmerman KD, Espeland MA, & Langefeld CD (2021). A practical solution to pseudoreplication bias in single-cell studies. Nature Communications, 12(1), 1–9. doi:10.1038/s41467-021-21038-1
Data Availability Statement
No data was used for the research described in the article.
