PLOS One. 2025 Sep 24;20(9):e0332440. doi: 10.1371/journal.pone.0332440

Understanding and mitigating the impact of ambient mRNA contamination in single-cell RNA-sequencing analysis

Jantarika Kumar Arora 1,2,*, Louisa K James 1, Varodom Charoensawan 2,3,4,5,6,7,*
Editor: Wan-Tien Chiang 8
PMCID: PMC12459771  PMID: 40991648

Abstract

Droplet-based single-cell RNA sequencing (scRNA-seq) is frequently affected by contamination from cell-free mRNAs, known as “ambient mRNAs”, which can substantially distort the interpretation of single-cell transcriptome data. In this study, we investigate the impact of ambient mRNA contamination on differential gene expression and biological pathway enrichment analyses, using two independent scRNA-seq datasets: ten peripheral blood mononuclear cell (PBMC) samples from dengue-infected patients and forty-two scRNA-seq samples of human fetal liver tissues. We apply two independent ambient mRNA correction approaches – CellBender (automated correction) and SoupX (using a predefined set of potential ambient mRNA genes). We demonstrate that, before correction, ambient mRNA transcripts appear among differentially expressed genes (DEGs), subsequently leading to the identification of significant ambient-related biological pathways in unexpected cell subpopulations. In contrast, after suitable correction, we observe a reduction in ambient mRNA expression levels, resulting in improved DEG identification and highlighting biologically relevant pathways specific to cell subpopulations. Our study underscores the importance of understanding and applying appropriate corrections for ambient mRNA contamination to enhance the reliability and accuracy of scRNA-seq data analyses, thereby improving the robustness of data interpretation in droplet-based scRNA-seq datasets.

Introduction

Single-cell RNA-sequencing (scRNA-seq) has become a powerful technique for investigating transcriptomic profiles and complex cellular heterogeneity at single-cell resolution [1–3]. This technology not only offers profound insights into cellular heterogeneity, but also improves our understanding of the functions of highly complex biological systems in both normal and disease-related physiological contexts [4,5]. Droplet-based scRNA-seq platforms, such as Drop-seq [6], inDrop [7], and Chromium 10x Genomics [8], have been widely implemented in various biological contexts, mainly due to their capacity to capture a large number of individual cells at a relatively low cost per cell [9]. These platforms are also suitable for detecting novel cell types [10,11], as well as identifying cell subpopulations within intricate biological samples [12–14].

Despite the several advantages of droplet-based single-cell technologies, one important challenge is the contamination of cell-free mRNAs, frequently referred to as “ambient mRNA”. This contamination can significantly confound the biological interpretation of single-cell datasets, as demonstrated in previous studies [15–20]. In brain single-nuclei RNA sequencing, for example, previously annotated neuronal cell types were separated by ambient mRNA contamination and immature oligodendrocytes were found to be contaminated with ambient mRNAs [16]. However, after computational removal of this ambient contamination, committed oligodendrocyte progenitor cells (a rare population) were detected, which had not been annotated in most previous adult human brain datasets [16]. This underscores how ambient mRNA contamination can impact cell type annotation. Several computational tools, including SoupX [20], CellBender [17], and DecontX [19] among others, have been developed to estimate and remove ambient mRNA contamination, subsequently improving the quality of expression matrices and enhancing the expression pattern of cell type-specific marker genes [15,17–20]. Previous studies have also demonstrated the impact of ambient mRNA contamination on downstream analyses [16,18]. However, several aspects of ambient mRNA contamination remain to be characterised. Despite these advancements, the effects of ambient mRNA correction on certain downstream analyses, including differential gene expression and pathway enrichment analyses, particularly at the subpopulation level, remain largely unclear.

In this study, we performed comprehensive analyses to evaluate the impact of ambient mRNA contamination on the biological interpretation of real scRNA-seq datasets. These included re-analyses of a time-course scRNA-seq dataset of immune cell responses during the acute phase of infection in dengue patients [13], and of an integrated dataset of forty-two scRNA-seq samples obtained from human fetal liver tissues [21]. We employed two independent ambient mRNA correction tools, CellBender [17] and SoupX [20]. To ensure a more comprehensive assessment of the impact of ambient mRNA contamination on downstream analyses, independent of methodological differences between correction approaches, we provided a predefined set of predicted potential ambient mRNA genes for SoupX correction, whereas CellBender performed automated prediction and correction. We went on to investigate the influence of contaminating ambient mRNAs on downstream analyses, focusing on the identification of differentially expressed genes (DEGs) and biological pathway enrichments within T and B cell subpopulations. Comparing transcriptomic profiles of these immune cell subpopulations before and after ambient mRNA correction revealed an improvement in DEG identification, subsequently leading to the emergence of biologically relevant pathways specific to cell subpopulations after correction. Overall, our study highlights the critical importance of addressing ambient mRNA contamination to enhance the reliability of scRNA-seq data interpretation and downstream biological insights.

Materials and methods

Single-cell RNA-sequencing datasets

The raw sequencing reads of the single-cell RNA-seq datasets used in this study include the peripheral blood mononuclear cell (PBMC) dataset, which consists of eight single-cell experiments from dengue patients and one from a healthy donor [13], available through the ArrayExpress repository: E-MTAB-9467. Another healthy sample was obtained from the 10x Genomics website (4k PBMCs from a healthy donor, Single Cell Gene Expression Dataset by Cell Ranger 2.1.0, 10x Genomics, 2017, November 8, https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc4k). In addition, the raw sequencing data of forty-two single-cell experiments of human fetal liver tissues [21] were obtained from the ArrayExpress database, under the accession number: E-MTAB-7407. The raw and filtered 10x Genomics species-mixing dataset, which contains a mixture of human HEK293T and mouse NIH3T3 cells, is available at https://www.10xgenomics.com/datasets/10-k-1-1-mixture-of-human-hek-293-t-and-mouse-nih-3-t-3-cells-3-v-3-1-3-1-standard-6-0-0, retrieved on 20 June 2025.

Pre-processing and quality control of scRNA-seq data

Data pre-processing.

All datasets used in this study were processed using the same pipeline, described as follows. Raw FASTQ files were aligned and quantified using the CellRanger Single-Cell Software Suite (version 8.0.1) and the reference human genome GRCh38-2024-A (10x Genomics, USA). Standard preprocessing steps for individual single-cell data, including normalization of gene expression levels, scaling, clustering, and dimensionality reduction, were carried out using Seurat V.5.2.1 [22]. Expression levels of genes in each cell were normalised using the LogNormalize approach available from the NormalizeData function, where the unique molecular identifier (UMI) counts of each gene were divided by the total number of UMIs per cell, then multiplied by a scale factor of 10,000, and subsequently log-transformed. Cell clusters were identified using the FindClusters function from the Seurat package (V.5.2.1) [22] with the default settings. Dimensionality reduction was performed using the Uniform Manifold Approximation and Projection (UMAP) method via the RunUMAP function, with the first 10 principal components (PCs). Cells with total mitochondrial gene expression exceeding 10% were excluded. Doublets were detected and discarded using DoubletFinder [23] with the default settings.
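The per-cell normalisation and mitochondrial filter described above can be illustrated with a minimal sketch. This is not the Seurat implementation: the function names, the toy cell, and the "MT-" gene-name prefix used to identify mitochondrial genes are assumptions made for illustration, and the natural-log log1p transform is used on the understanding that this is what LogNormalize applies.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """LogNormalize-style transform for one cell: log1p(count / total * scale_factor)."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: math.log1p(c / total * scale_factor) for gene, c in counts.items()}

def percent_mito(counts, prefix="MT-"):
    """Percentage of a cell's UMIs from mitochondrial genes (prefix is an assumption)."""
    total = sum(counts.values())
    mito = sum(c for gene, c in counts.items() if gene.startswith(prefix))
    return 100.0 * mito / total if total else 0.0

# Hypothetical UMI counts for a single cell (illustrative values only).
cell = {"CD3E": 3, "MS4A1": 0, "MT-CO1": 1, "ACTB": 16}
normalized = log_normalize(cell)
keep_cell = percent_mito(cell) <= 10.0  # cells above 10% mitochondrial content are excluded
```

Genes with zero counts stay at zero after the transform, so sparsity is preserved.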

Ambient mRNA correction.

To correct for ambient mRNA contamination, CellBender (v.0.3.0) [17] and SoupX (v1.6.2) [20] were applied using the default settings for all libraries, except for the 10x Genomics species-mixing dataset, where the contamination fraction was estimated automatically using autoEstCont with the parameters: tfidfMin = 0.01, soupQuantile = 0.8, and forceAccept = TRUE. The raw and filtered gene-barcode matrices were used as inputs to estimate the expression profiles of ambient mRNAs. To enhance the accuracy of estimating the contamination fraction in individual cells, we incorporated a curated set of genes that are not typically expressed by cells of a certain type, along with clustering information. Specifically, for the single-cell dengue dataset, we included a set of immunoglobulin (Ig) genes, whereas for the human fetal liver dataset, a set of hemoglobin (Hb) genes was provided.
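The idea of using genes that a cell type should not express to estimate the contamination fraction can be sketched as follows. This is a deliberately simplified illustration in the spirit of the marker-gene approach, not the algorithm implemented in SoupX or CellBender: the function names, the toy counts, and the single global contamination fraction (rho) are all assumptions.

```python
def soup_profile(empty_droplet_counts):
    """Fraction of ambient UMIs contributed by each gene, from pooled empty droplets."""
    total = sum(empty_droplet_counts.values())
    return {gene: c / total for gene, c in empty_droplet_counts.items()}

def estimate_contamination(cells, soup, marker_genes):
    """Estimate rho from cells assumed NOT to express marker_genes.

    Any marker UMIs seen in these cells are attributed to the ambient pool, so
    rho = observed marker UMIs / (total UMIs * ambient fraction of the markers).
    """
    observed = sum(cell.get(g, 0) for cell in cells for g in marker_genes)
    marker_soup_fraction = sum(soup.get(g, 0.0) for g in marker_genes)
    expected_if_all_soup = sum(sum(cell.values()) for cell in cells) * marker_soup_fraction
    if expected_if_all_soup == 0:
        raise ValueError("marker genes are absent from the ambient profile")
    return observed / expected_if_all_soup

def subtract_ambient(cell, soup, rho):
    """Remove the expected ambient counts from one cell, flooring at zero."""
    n_umi = sum(cell.values())
    return {g: max(0.0, c - rho * n_umi * soup.get(g, 0.0)) for g, c in cell.items()}

# Hypothetical example: Ig transcripts dominate the ambient pool.
soup = soup_profile({"IGKC": 50, "CD3E": 10, "ACTB": 40})
t_cells = [{"IGKC": 5, "CD3E": 45, "ACTB": 50}, {"IGKC": 5, "CD3E": 55, "ACTB": 40}]
rho = estimate_contamination(t_cells, soup, marker_genes=["IGKC"])
corrected = subtract_ambient(t_cells[0], soup, rho)
```

In this toy case the Ig counts in the T cells are fully explained by the ambient profile, while the T cell marker loses only a small fraction of its counts.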

Data integration and normalisation

After the quality control steps described above, individual samples were integrated using the SCTransform v2 normalisation approach from Seurat V.5.2.1 [22,24], with the default settings. Cell clustering and UMAP dimensionality reduction were then performed using FindClusters and RunUMAP, using the first 30 PCs. Expression levels of genes in each cell were normalised using the NormalizeData function, as mentioned above.

Cell type annotation and subpopulation analysis

Cell type annotation was performed using Azimuth (v.0.5.0) [25]. For the dengue datasets, we applied the “Human - PBMC” reference [25], while the “Human-Liver” reference [26–30] was used for fetal tissue datasets. T and B cell populations were selected based on the “celltype.l1” and “celltype.l2” annotation levels from Azimuth [25]. Additionally, where relevant, we excluded non-T cells using known canonical marker genes (S1 Table) and non-B cells using the annotations from the original article [21], to enhance the accuracy of downstream analyses at subpopulation levels (Figs 3 and 4).

Differential gene expression analysis

A total of 38,606 genes were tested for differential gene expression (DEG) analysis using the Wilcoxon rank sum test, implemented in the FindAllMarkers function of Seurat V.5.2.1 [22]. Genes were considered differentially expressed using Seurat’s default parameters – specifically, they were expressed in at least 10% of cells in either group and had a minimum log2 fold-change threshold of 0.1. DEGs were defined as those with an adjusted p-value < 0.01 after Bonferroni correction for multiple testing (also using default settings). These default parameters were used for the initial identification of DEGs and were not intended to define biologically meaningful expression changes. The total numbers of DEGs passing these cutoffs across datasets are listed in S2 Table.
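The cutoffs described above can be expressed as a simple per-gene filter. The sketch below is not Seurat code: the function names are hypothetical, the thresholds are taken from the text, and applying the fold-change threshold to the absolute value is an assumption about how the filter is used.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)

def is_deg(pct_1, pct_2, avg_log2fc, p_value,
           n_tests=38_606, min_pct=0.1, logfc_threshold=0.1, alpha=0.01):
    """Apply the expression, fold-change, and adjusted p-value cutoffs to one gene."""
    expressed = max(pct_1, pct_2) >= min_pct          # detected in >= 10% of cells in either group
    changed = abs(avg_log2fc) >= logfc_threshold      # minimum log2 fold-change
    significant = bonferroni(p_value, n_tests) < alpha
    return expressed and changed and significant

# Hypothetical genes (illustrative statistics only).
strong_hit = is_deg(pct_1=0.50, pct_2=0.05, avg_log2fc=1.2, p_value=1e-8)
fails_correction = is_deg(pct_1=0.50, pct_2=0.05, avg_log2fc=1.2, p_value=1e-6)
```

With 38,606 tests, a raw p-value must fall below roughly 2.6e-7 to pass the adjusted threshold of 0.01, which is why the second gene is rejected.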

Pathway analyses

g:Profiler2 (v.0.2.3) [31] was applied to systematically assess biological processes using human reference genes from GRCh38.p14 and focusing on Gene Ontology Biological Process (GO:BP). To reduce false positives, the false discovery rate (FDR) approach was applied for multiple testing correction, and adjusted p-values < 0.05 were considered statistically significant. For GO:BP analyses, DEGs were selected by ranking genes based on average log2 fold-change (avg_log2FC), without applying a fixed fold-change threshold. This rank-based approach prioritises the most biologically distinct genes and avoids arbitrary cutoffs, which may be too stringent, particularly when comparing results before and after ambient mRNA correction. The top 20 DEGs (ranked by avg_log2FC) were selected in the dengue datasets and for each Seurat cluster in the 10x Genomics species-mixing dataset (10x Genomics; https://www.10xgenomics.com/datasets/10-k-1-1-mixture-of-human-hek-293-t-and-mouse-nih-3-t-3-cells-3-v-3-1-3-1-standard-6-0-0, retrieved on 20 June 2025). In the human fetal tissues, the top 2000 DEGs of “pro B cells” were selected. To visualise unique and overlapping GO:BP terms before and after ambient mRNA correction, Venn diagrams were constructed using InteractiVenn [32].
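The rank-based gene selection and the before/after comparison of enriched terms can be sketched as below. The function names, gene statistics, and term sets are hypothetical and only illustrate the logic; the actual enrichment was run with g:Profiler and the diagrams drawn with InteractiVenn.

```python
def top_degs(degs, n=20):
    """Return the n gene names with the highest avg_log2FC (no fixed threshold)."""
    ranked = sorted(degs, key=lambda d: d["avg_log2FC"], reverse=True)
    return [d["gene"] for d in ranked[:n]]

def venn_partition(before_terms, after_terms):
    """Split two collections of GO terms into the three regions of a two-set Venn diagram."""
    before, after = set(before_terms), set(after_terms)
    return {
        "before_only": before - after,
        "shared": before & after,
        "after_only": after - before,
    }

# Hypothetical DEG table and GO:BP term sets (illustrative only).
degs = [
    {"gene": "IGKC", "avg_log2FC": 2.5},
    {"gene": "ISG15", "avg_log2FC": 1.8},
    {"gene": "IFI27", "avg_log2FC": 3.1},
]
selected = top_degs(degs, n=2)
regions = venn_partition(
    before_terms=["GO:0006955", "GO:0019724"],
    after_terms=["GO:0006955", "GO:0007049"],
)
```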

Results and discussion

Evaluating the impact of ambient mRNA contamination in the peripheral blood mononuclear cell samples

To demonstrate the presence of ambient mRNA contamination and its impact on downstream analysis, we first utilised a publicly available peripheral blood mononuclear cell (PBMC) dataset [13]. The dataset exhibited contamination from ambient mRNAs or background noise, as evidenced by the presence of nonzero counts of known marker genes in unexpected cell types [17,19,20] (S1 Fig in S1 File). After performing the quality control steps, we assigned immune cell types based on cell type annotations from Azimuth [25] using the “Human - PBMC” reference [25], supplemented by established marker genes where relevant (Fig 1A, S1 Table, and S2 Fig in S1 File). Major PBMC populations, including CD8 T cells, NK cells, B cells and plasma cells (PCs), exhibited canonical marker gene expression consistent with their identity (Fig 1A, S1 Table, and S2 Fig in S1 File).

Fig 1. Expression of B cell-related genes in non-B cell populations.

(A). Uniform Manifold Approximation and Projection (UMAP) plot showing single-cell transcriptome profiles of peripheral blood mononuclear cells (PBMCs) of the dengue dataset [13], colored by cell types annotated using Azimuth, supplemented by canonical cell-type marker genes where relevant (S1 Table and S2 Fig in S1 File). (B). UMAP feature plots demonstrating the relative transcription levels of T cell marker genes (first panel), B cell and plasma cell (PC) marker genes (second panel), and B cell-related genes (third and fourth panels). Cell types are annotated using Azimuth, supplemented by canonical cell-type marker genes (S1 Table) where relevant. pDCs = plasmacytoid dendritic cells.

Based on the expression of known canonical marker genes, we observed relatively high expression levels of the T cell markers CD3E (encoding the CD3-epsilon polypeptide, which forms part of the T cell receptor complex) and TRAC (T cell Receptor Alpha Constant) in annotated T cells, with average expression ranging from 1.29 to 1.43 and up to 75% of T cells expressing the genes, compared to other clusters (average expression 0.03–0.82, < 54% expressing cells) (Fig 1B, S3 Table, and S2 Fig in S1 File). Similarly, the B cell lineage genes MS4A1 (B-lymphocyte-specific membrane protein) and CD79A (B-cell antigen receptor complex-associated protein) exhibited higher expression levels (1.26–2.10) and were expressed in > 76.23% of B cells and PCs, compared to other cell types (average expression < 0.14 and < 16% expressing cells) (Fig 1B, S4 Table, and S2 Fig in S1 File). In contrast, certain immunoglobulin (Ig) genes (referred to herein as “B cell-related genes”), comprising IGKC, IGHG1, IGHG4, and JCHAIN, were detected across multiple other populations, at lower average expression levels (0.45–3.33) but in up to 99% of cells, compared with the B cells and PCs expected to express these genes (average expression 1.06–6.08 and 62–100% expressing cells) (Fig 1B, S4 Table, and S2 Fig in S1 File).

For droplet-based scRNA-seq technology, a small amount of cell-free mRNA molecules can be distributed into droplets and subsequently sequenced alongside mRNAs from intact single cells [17,19,20], resulting in non-zero molecule counts within droplets containing cell-free mRNAs [20]. The cell-free mRNA molecules, commonly known as “ambient mRNAs”, are likely derived from cells that have undergone lysis, stress, or apoptosis during the experiment [17]. This can subsequently introduce several challenges for the analyses, including known marker genes of certain cell types being observed in other unexpected cell types [20].

Appropriate correction of ambient mRNA levels reduced pervasive expression of marker genes

Having observed substantial levels of ambient mRNA contamination in the selected scRNA-seq dataset (Fig 1, S2–S4 Tables, and S1–S2 Figs in S1 File), we implemented ambient mRNA correction using two independent methods: CellBender [17] and SoupX [20]. These tools estimate the contamination fraction of ambient mRNAs within individual cells and subsequently adjust the transcription levels in expression matrices. CellBender performs automated prediction and correction of ambient mRNA contamination, whereas SoupX is capable of either automated or manual ambient mRNA correction, allowing for fine-tuning of the detection process [17,20]. Typically, known marker genes of specific cell types that are expressed at relatively low levels across other cell types can be considered ambient mRNA contamination [17,20]. In this particular PBMC dataset, we specified Ig genes as potential contaminating genes for estimating contamination fractions with SoupX. Ig genes were indeed among the top 10 genes predicted by SoupX as potential ambient mRNA contamination (S5 Table). Moreover, they exhibited relatively low expression levels across multiple cell types (Fig 1, S2–S4 Tables, and S1–S2 Figs in S1 File).

After applying the ambient mRNA correction, we compared gene transcription levels to those before correction (Fig 2; see S3 Fig for SoupX in S1 File). The transcription levels of canonical marker genes in corresponding cell types, including selected T and B cell markers, remained consistently high and relatively unchanged after correction (S6 Table and S4 Fig in S1 File) with both CellBender (S6 Table and S5 Fig in S1 File) and SoupX (S6 Table and S6 Fig in S1 File). Specifically, the average expression levels of these genes across all cells were 0.36 before correction, 0.34 after CellBender and 0.35 after SoupX corrections (S6 Table). This suggests the robustness of the ambient mRNA correction step in accurately preserving the expression patterns of well-established cell type markers.

Fig 2. Expression profiles of ambient mRNAs before and after correction.

(A). UMAP plot representing the single-cell transcriptome profiles of PBMCs before (left panel) and after (right panel) ambient mRNA correction using CellBender [17] (refer to S3 Fig in S1 File for results with SoupX [20]). Cell types are annotated using Azimuth [25] and supplemented with established canonical marker genes where relevant (S1 Table and S2 Fig in S1 File). (B). Violin plots showing the normalised transcription levels of the B cell-related genes in each of the non-B cell and B cell populations (comprising B cells and PCs), comparing between before and after ambient mRNA correction using CellBender (see S3 Fig for SoupX in S1 File).

Conversely, we observed significant decreases in the transcription levels of B cell-related genes in “non-B cell” populations following correction with both CellBender and SoupX (Figs 2A-2B, see S3 Fig for SoupX, and S7 Fig in S1 File). The average expression levels of B cell-related genes in “non-B cells” dropped from 0.47 before correction to 0.07 after CellBender and 0.05 after SoupX correction (S7 Table). These results suggest that the correction process effectively removed the expression signals originating from ambient mRNAs. Additionally, we observed a reduction in the percentage of cells expressing these genes after correction (S8 Table). Importantly, the transcription levels of these B cell-related genes in “B cells” (comprising B cells and PCs here) remained unchanged before and after ambient mRNA correction (Fig 2B, see S3 Fig for SoupX in S1 File), indicating that the correction process successfully retained the intrinsic gene expression patterns specific to B cells.
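The two summary statistics used in this comparison, average expression and percentage of expressing cells, can be computed per gene and cell group as in the sketch below. The values are hypothetical, the function name is an assumption, and a plain mean of normalised values is used, which may differ from how the supplementary tables were generated.

```python
def summarize(values):
    """Return (mean expression, percent of cells with non-zero expression)."""
    if not values:
        raise ValueError("need at least one cell")
    mean = sum(values) / len(values)
    pct_expressing = 100.0 * sum(1 for v in values if v > 0) / len(values)
    return mean, pct_expressing

# Hypothetical normalised values of one Ig gene in four non-B cells.
before = [0.6, 0.4, 0.0, 0.8]
after = [0.0, 0.0, 0.0, 0.2]

mean_before, pct_before = summarize(before)
mean_after, pct_after = summarize(after)
```

A drop in both statistics in non-B cells, with little change in B cells and PCs, is the pattern expected when the removed signal is ambient rather than endogenous.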

Enhancement of differential gene expression and biological pathway enrichment in T cell subpopulations after ambient mRNA correction

As we observed the presence of ambient mRNA contamination across T cells in the PBMC dataset used to showcase our points here (Figs 1–2, and S1–S7 Figs in S1 File), we then investigated the extent to which the ambient mRNA correction impacts subsequent bioinformatic analyses, specifically focusing on the identification of differentially expressed genes (DEGs) and biological pathway enrichments. Following the integration of ten scRNA-seq profiles from the same representative study [13], we observed no apparent batch effects in T cells (S8 Fig in S1 File). To specifically assess the impact of ambient mRNA contamination on T cell subpopulations without confounding signals from other immune cell types, we further extracted T cell subpopulations based on the “celltype.l2” annotation level from Azimuth. Non-T cell populations were then excluded using established canonical marker genes (S1 Table and S2 Fig in S1 File). The number of annotated T cells remained relatively comparable before and after ambient mRNA correction (S9 Table and S9 Fig in S1 File), suggesting that contamination did not impact the annotation.

While T cell subpopulations expressed T cell markers, we also observed widespread expression of Ig genes, including IGKC, IGHA1, IGHG4, and IGLC2, across almost all annotated T cell subsets, despite these genes being specific to B cells and PCs (S10 Fig in S1 File). After applying ambient mRNA correction, the expression of these Ig genes was noticeably reduced across T cells (S11 Fig for CellBender and S12 Fig for SoupX in S1 File), with average expression levels decreasing from 0.54 before correction, to 0.10 after CellBender and 0.06 after SoupX (S10 Table), suggesting effective removal of background contamination. Specifically, following correction using CellBender, which performs automated ambient mRNA removal, IGKC and IGHG4 were still observed but were primarily restricted to “CD4 Proliferating” and “CD8 Proliferating” T cells (S11 Fig in S1 File and S10 Table). In contrast, after correction using SoupX, in which Ig genes were explicitly provided as predefined potential ambient mRNA genes, expression levels of these Ig genes were reduced to low or near-zero counts across T cells (S12 Fig in S1 File and S10 Table).

We further investigated how ambient mRNA contamination influences the identification of DEGs within T cell subpopulations. Without the ambient mRNA correction, fifteen B cell-related genes appeared among the top 20 DEGs in at least one of the three biological conditions (acute, convalescent, or healthy control), when compared to the other two conditions, across a wide range of well-characterised T cell subpopulations (Fig 3A; left panel). After correction, Ig genes were still present but at markedly lower expression levels in most T cell subpopulations compared to before correction, as shown using the same color bar (Fig 3A and S13 Fig in S1 File). The reduction in Ig gene expression levels was observed in the results from both the CellBender and SoupX pipelines but was slightly more pronounced in the SoupX result in this case (S13 Fig in S1 File). Additionally, nine Ig genes remained in the top 20 DEGs after CellBender correction (S13 Fig in S1 File; middle panel) and eleven Ig genes after SoupX (S13 Fig in S1 File; lower panel), out of the fifteen Ig genes previously identified as DEGs (S13 Fig in S1 File; upper panel). Accordingly, six Ig genes – IGHM, IGHG3, IGHA2, IGLC2, IGLC3, and IGKV1–18 – after CellBender correction and four – IGHM, IGHA2, IGLC3, and IGKV1–18 – after SoupX correction, were effectively removed from the top 20 DEGs in T cell subpopulations (S13 Fig in S1 File).

Fig 3. Enhancement of T cell-specific DEGs and biological pathways after ambient mRNA contamination correction.

(A). Heatmap showing the relative transcription levels of Ig genes among the top 20 DEGs ranked by average log2 fold-change (avg_log2FC) in each biological condition (acute, convalescent, or healthy control) compared to the other two, across T cell subpopulations annotated using Azimuth, both before (left panel) and after (right panel) ambient mRNA correction with CellBender (see S14 Fig for SoupX in S1 File). (B). The Gene Ontology Biological Processes (GO:BPs) of the DEGs from (A) before (left panel) and after (right panel) ambient mRNA correction with CellBender. Pathways labeled in red are B cell-specific pathways that have been previously reported in a dengue infection study [13] (see S16 Fig for SoupX in S1 File).

We next explored the impact of ambient mRNA contamination on pathway enrichment analysis, based on Gene Ontology Biological Processes (GO:BPs) of the top 20 DEGs (ranked by average log2 fold-change) in each biological condition (acute, convalescent, or healthy control) across T cell subpopulations (Fig 3B). While the number of input genes for pathway analysis was similar before and after ambient mRNA correction, we identified 17 significant GO:BP terms prior to correction, all of which were influenced by Ig genes (contributing from 34% to 100%; S11 Table). In contrast, after correction, the number of significant GO:BP terms increased substantially – 191 GO:BPs in the CellBender result (S12 Table and S15 Fig in S1 File), and 46 significant GO:BP terms in the SoupX result (S13 Table and S16 Fig in S1 File; top panel). The low number of significant biological pathways before correction is likely due to the presence of non-specific genes, such as Ig genes from B cells and PCs, which dominate the pathway analysis and hence obscure relevant biological signals. Conversely, ambient mRNA correction adjusted the expression levels and rankings of DEGs, allowing the detection of more biologically relevant genes and yielding a marked increase in statistically significant GO:BP enrichments.

Interestingly, out of the 17 GO:BP terms identified before correction, 13 were observed only prior to ambient mRNA removal and were absent from the CellBender-corrected result (Fig 3B; left panel and S15 Fig in S1 File). These 13 GO:BP terms represented biological pathways that have been shown to be B cell-specific and upregulated during acute dengue infection [13], including GO:0016064 (immunoglobulin mediated immune response), GO:0019724 (B cell mediated immunity), GO:0006959 (humoral immune response), GO:0002460 (adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains), and GO:0050853 (B cell receptor signaling pathway) (Fig 3B; left panel). However, this was not the case after the correction using SoupX, where all 17 GO:BPs were still identified, although several GO terms, including the B cell-specific pathways mentioned above, showed lower statistical significance (S16 Fig in S1 File). This suggests that the correction effectively reduced the impact of ambient mRNA contamination, which likely contributed to inflated statistical significance of certain pathways before correction.

We then observed four GO:BP terms – GO:0050896 (response to stimulus), GO:0098542 (defense response to other organism), GO:0006955 (immune response), and GO:0042742 (defense response to bacterium) – that were identified both before and after the CellBender correction (S15 Fig in S1 File). Remarkably, while these GO:BP terms remained consistent, the composition of the intersection genes contributing to them changed significantly (S15 Fig in S1 File). Specifically, before correction, these pathways were dominated by Ig genes, likely due to ambient mRNA contamination originating from B cells and PCs. After the correction using CellBender, however, these pathways were instead associated with contributions from interferon-stimulated genes (ISGs) and interferon-induced gene products (e.g., ISG15, IFI44L, IFI27, IFIT2, IFIT3), which were previously undetected (S15 Fig in S1 File). Notably, these genes have been reported to be highly expressed in specific T cell subsets during acute dengue infection [33,34].

Among the 187 uniquely identified GO:BPs after CellBender correction, we observed biological pathways associated with cell cycle processes, cell division, and DNA repair (Fig 3B), which have been identified in HLA-DR+ CD38+ CD8 T cells during dengue fever [35]. Additionally, we found type I interferon-related pathways (Fig 3B), which have been shown to be expressed in T cells [13,34,36]. These results suggest that ambient mRNA correction effectively reduced the confounding influence of Ig genes, thereby revealing a more relevant immune signature in T cell subpopulations during acute dengue infection. We note that the choice of ambient mRNA correction tools, such as DecontX [19], FastCAR [15], and scAR [37], may yield variable results in terms of adjusted expression values. This variability may influence the outcomes of downstream analyses, including differential gene expression (e.g., log2 fold-change values) and pathway enrichment analyses (e.g., p-values or significant pathways), as different tools make different assumptions and use different approaches for evaluating and removing ambient mRNA contamination.
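The GO:BP counts reported in this section for the CellBender comparison are mutually consistent, which can be checked with a few lines of arithmetic. The variable names below are ours; the input numbers are those stated in the text.

```python
# Counts reported in the text for T cell subpopulations (CellBender comparison).
terms_before = 17        # significant GO:BP terms before correction
terms_after = 191        # significant GO:BP terms after CellBender correction
before_only = 13         # terms seen only before correction

# Terms present in both analyses, and terms that appear only after correction.
shared = terms_before - before_only
after_only = terms_after - shared
```

This gives four shared terms and 187 terms unique to the corrected data, matching the figures quoted above.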

Improving differential gene expression and biological pathway enrichment in B cell subpopulations after ambient mRNA correction

We then asked whether the improvement in DEG identification and pathway enrichment after ambient mRNA correction could be seen in an independent dataset and/or different cell types. Here, we obtained the scRNA-seq data of forty-two samples derived from human fetal liver tissues [21]. After data integration with no batch effects (S17 Fig in S1 File), hemoglobin (Hb) genes were identified as potential ambient mRNA contaminants, ranking among the top 5 genes predicted by SoupX (S14 Table). Consistently, we observed a large number of cells expressing Hb genes (S18 Fig in S1 File), which are known markers of the erythroid (red blood cell) lineage, indicating contamination from Hb transcripts. We employed CellBender [17] and SoupX [20], using Hb genes as a predefined list of potential ambient mRNA contaminants to estimate the contamination fractions in individual cells. Following the same analytic pipeline described above, we extracted B cell subpopulations as annotated by Azimuth, excluding non-B cells based on the annotation from the original study [21]. Similar to T cell subsets, the number of annotated B cell subpopulations remained relatively comparable before and after ambient mRNA correction (S9 Table and S19 Fig in S1 File). Here, we focused on B cell subsets because the contamination was primarily driven by Hb genes, which are markers of the erythroid lineage. Since B cells and erythroid cells originate from distinct progenitor lineages, this allowed us to assess the impact of ambient mRNA contamination from an unrelated cell type and evaluate the effectiveness of the correction.

Before ambient mRNA correction, nine Hb genes, namely HBQ1, HBM, HBG1, HBG2, HBA1, HBA2, HBZ, HBB, and HBD, were identified as DEGs across all annotated B cell subpopulations (Fig 4A; left panel and S20 Fig in S1 File; upper panel). After correction, Hb genes remained detectable but exhibited lower expression levels and were less widespread across B cell subsets (as shown using the same scale) in both CellBender (Fig 4A) and SoupX (S20 Fig in S1 File) results, with a more pronounced reduction in SoupX (S20 Fig in S1 File). Following correction with CellBender, eight of the nine previously identified Hb genes remained among the DEGs (all except HBQ1) (S20 Fig in S1 File; middle panel). In contrast, after correction with SoupX, where Hb genes were predefined as potential ambient mRNAs, only three Hb genes (HBQ1, HBA1, and HBD) were identified among the DEGs (S20 Fig in S1 File; lower panel).

Fig 4. Enhancement of B cell-specific DEGs and biological pathways after ambient mRNA contamination correction.


(A) Heatmap showing the relative expression levels of Hb genes among the DEGs of each B cell subpopulation, as compared to the rest, before (left panel) and after (right panel) ambient mRNA correction with CellBender (see S21 Fig for SoupX in S1 File). B cell subpopulations were annotated using Azimuth, excluding non-B cells based on the annotation from the original study [21]. (B) Gene Ontology Biological Processes (GO:BPs) enriched among the top 2000 DEGs ranked by average log2 fold-change (avg_log2FC) in pro-B cells before (upper panel) and after (lower panel) ambient mRNA correction with CellBender (see S21 and S22 Figs for SoupX in S1 File).

We further investigated the impact of ambient mRNA contamination on pathway analyses, focusing on GO:BPs derived from the top 2000 DEGs ranked by average log2 fold-change (avg_log2FC) in pro-B cells. Similar to the PBMC-derived T cell subpopulations, biological pathways driven by Hb genes were identified only before ambient mRNA correction and were absent after correction with both CellBender (Fig 4B and S15 Table) and SoupX (S22 Fig in S1 File and S16 Table). These pathways include GO:0007599 (hemostasis), GO:0050817 (coagulation), GO:0042743 (hydrogen peroxide metabolic process), GO:0030168 (platelet activation), GO:0015670 (carbon dioxide transport), and GO:0070527 (platelet aggregation) (Fig 4B, S22 Fig in S1 File, and S15-S16 Tables). The presence of these pathways suggests that ambient Hb gene contamination can lead to misleading enrichment of erythroid- and platelet-associated processes, which are biologically unrelated to pro-B cells. Their absence after ambient mRNA removal (Fig 4B, S22 Fig in S1 File, and S15-S18 Tables) supports the effectiveness of ambient contamination correction in eliminating spurious signals and refining biological pathway interpretation. Pro-B cells are lineage-committed precursors of mature B cells, in which V(D)J recombination occurs to generate a functional B cell receptor, followed by positive selection. Consistent with this, pro-B cells were enriched in genes associated with recombination and DNA repair/stability (e.g., HMGB1, PCNA, DDX11, and MSH2), as well as genes involved in proliferation and cell survival (AKT1, PDE3A, and TNFSF13B). Importantly, these genes were observed in biological pathways that emerged only after correction with both CellBender (Fig 4B and S17 Table) and SoupX (S22 Fig in S1 File and S18 Table), further supporting that ambient mRNA removal improves the accuracy of downstream functional analyses.
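The two steps described here, selecting the top DEGs by avg_log2FC and then asking which enriched terms are unique to the uncorrected or corrected data, can be sketched as follows. The sketch assumes descending ranking by avg_log2FC and that enrichment results are available as sets of GO identifiers. The fold-change values and the "after" term set are placeholders for illustration and are not results from this study; only the six Hb-driven GO identifiers are taken from the text.

```python
from typing import Dict, List, Set, Tuple


def top_degs(avg_log2fc: Dict[str, float], n: int = 2000) -> List[str]:
    """Return up to n gene names ranked by descending avg_log2FC."""
    ranked = sorted(avg_log2fc.items(), key=lambda kv: kv[1], reverse=True)
    return [gene for gene, _ in ranked[:n]]


def compare_terms(before: Set[str], after: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split GO terms into before-only, shared, and after-only sets."""
    return before - after, before & after, after - before


# Hypothetical fold-changes, for illustration only.
toy_fc = {"HBB": 3.2, "HBA1": 2.9, "VPREB1": 2.5, "IGLL1": 2.1, "CD79A": 1.4}
print(top_degs(toy_fc, n=3))

# Hb-driven terms named in the text as present only before correction.
hb_terms = {"GO:0007599", "GO:0050817", "GO:0042743",
            "GO:0030168", "GO:0015670", "GO:0070527"}
# Placeholder term sets (NOT the study's enrichment output).
terms_before = hb_terms | {"GO:0002376"}
terms_after = {"GO:0002376", "GO:0006281", "GO:0006310"}

before_only, shared, after_only = compare_terms(terms_before, terms_after)
print(sorted(before_only))
print(sorted(after_only))
```

In this toy setup the Hb-driven terms fall entirely in the before-only set, which mirrors the pattern reported for pro-B cells.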

To further validate our findings, we used a publicly available cross-species dataset from 10x Genomics (10-k-1-1-mixture-of-human-hek-293-t-and-mouse-nih-3-t-3-cells-3-v-3-1-3-1-standard-6-0-0, retrieved on 20 June 2025), in which human cells were intentionally contaminated with mouse transcripts, providing a well-defined “ground-truth” for cross-species ambient mRNA contamination (S23 Fig in S1 File). Similar to the observations in our T and B cell subpopulations, contamination by mouse transcripts led to numerous false-positive DEGs and to the presence of mouse genes in pathway enrichment results when no correction was applied (S24-S25 Figs in S1 File). After ambient mRNA correction, the cross-species contamination signals were markedly reduced, as shown using the same scales before and after correction (S24 Fig in S1 File), and no mouse genes were present in the significant biological pathways (S19 Table). Additionally, one GO:BP term (GO:0002181, cytoplasmic translation) was enriched both before and after contamination correction. Prior to correction, the genes in this GO:BP term were exclusively (100%) mouse genes, whereas after correction the term was composed of human genes (S25 Fig in S1 File).
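A check of this kind can be expressed as the fraction of mouse genes among the genes supporting an enriched term. The sketch below assumes that gene names carry a species prefix, as is common in mixed-species references; the prefixes `GRCh38_` and `mm10_` and the gene lists are assumptions for illustration and should be adapted to the reference actually used.

```python
from typing import Iterable


def mouse_fraction(
    genes: Iterable[str],
    mouse_prefix: str = "mm10_",      # assumed naming convention
    human_prefix: str = "GRCh38_",    # assumed naming convention
) -> float:
    """Fraction of species-labelled genes that are mouse genes."""
    n_mouse = 0
    n_human = 0
    for gene in genes:
        if gene.startswith(mouse_prefix):
            n_mouse += 1
        elif gene.startswith(human_prefix):
            n_human += 1
        # Genes without a recognised prefix are ignored.
    labelled = n_mouse + n_human
    if labelled == 0:
        raise ValueError("no gene carries a recognised species prefix")
    return n_mouse / labelled


# Hypothetical gene lists for one GO term, for illustration only.
term_genes_before = ["mm10___Rpl3", "mm10___Rps2", "mm10___Rpl10"]
term_genes_after = ["GRCh38_RPL3", "GRCh38_RPS2", "GRCh38_RPL10"]

print(mouse_fraction(term_genes_before))
print(mouse_fraction(term_genes_after))
```

A value near 1 for a term detected in human cells would indicate that the enrichment is driven by contaminating transcripts rather than by the cells themselves.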

Taken together, our study highlights the practical benefits of mitigating ambient mRNA contamination, including improved DEG identification and enhanced specificity of cell type-specific gene expression profiles and biological pathways in biological scRNA-seq datasets. Systematic studies using simulated or controlled experimental settings – for instance, those mimicking ambient contamination arising from cell death (e.g., abnormally elevated housekeeping, ribosomal, or mitochondrial genes) or from overloaded droplets – represent a promising direction for further research into the impact of ambient mRNA on downstream analyses.

Conclusion

Our study demonstrates the substantial improvements in data quality and biological insight achieved by addressing ambient mRNA contamination in single-cell transcriptome data. Specifically, we have shown that correcting ambient contamination enhances the identification of differentially expressed genes and refines biological pathway enrichment analyses, leading to more accurate interpretations of cell type-related functions. Our work emphasises the critical importance of appropriate ambient mRNA contamination correction during scRNA-seq preprocessing to enhance the robustness of biological interpretation of complex biological systems.

Supporting information

S1 Table. Known canonical markers.

(XLSX)

pone.0332440.s001.xlsx (9.4KB, xlsx)
S2 Table. The total numbers of DEGs.

(XLSX)

pone.0332440.s002.xlsx (9.9KB, xlsx)
S3 Table. Average expression levels and percentages of cells expressing known canonical marker genes across annotated cell types, related to Fig 1B (dengue infection datasets).

(XLSX)

pone.0332440.s003.xlsx (12.7KB, xlsx)
S4 Table. Average expression levels and percentages of cells expressing known canonical marker genes across annotated cell types, related to Fig 1B (dengue infection datasets).

(XLSX)

pone.0332440.s004.xlsx (13.2KB, xlsx)
S5 Table. Top 50 potential ambient mRNA genes predicted by SoupX (dengue infection-dataset).

(XLSX)

pone.0332440.s005.xlsx (6.2KB, xlsx)
S6 Table. Average expression levels of known canonical marker genes across annotated cell types before and after correction, related to Fig 2 (dengue infection datasets).

(XLSX)

pone.0332440.s006.xlsx (17.8KB, xlsx)
S7 Table. Average expression levels of B cell-related genes across B and Non-B cells before and after correction, related to Fig 2 (dengue infection datasets).

(XLSX)

pone.0332440.s007.xlsx (13.6KB, xlsx)
S8 Table. Number and percentage of cells expressing Ig genes.

(XLSX)

pone.0332440.s008.xlsx (24.7KB, xlsx)
S9 Table. Numbers of annotated T and B cell subpopulations before and after correction. Related to Figs 3 and 4.

(XLSX)

pone.0332440.s009.xlsx (18.8KB, xlsx)
S10 Table. Average expression levels of immunoglobulin genes across T cell subpopulations before and after correction, related to Fig 3 (dengue infection datasets).

(XLSX)

pone.0332440.s010.xlsx (16.2KB, xlsx)
S11 Table. Significant 17 GO:BPs in T subsets before correction (dengue infection-dataset).

(XLSX)

pone.0332440.s011.xlsx (10.1KB, xlsx)
S12 Table. Significant 191 GO:BPs in T subsets after CellBender correction (dengue infection-dataset).

(XLSX)

pone.0332440.s012.xlsx (29.7KB, xlsx)
S13 Table. Significant 46 GO:BPs in T subsets after SoupX correction (dengue infection-dataset).

(XLSX)

pone.0332440.s013.xlsx (12.3KB, xlsx)
S14 Table. Top 50 potential ambient mRNA genes predicted by SoupX (fetal liver tissue-dataset).

(XLSX)

pone.0332440.s014.xlsx (6.2KB, xlsx)
S15 Table. Significant 279 unique GO:BPs in Pro-B cells before correction, compared to CellBender (fetal liver tissue-dataset).

(XLSX)

pone.0332440.s015.xlsx (41.7KB, xlsx)
S16 Table. Significant 365 unique GO:BPs in Pro-B cells before correction, compared to SoupX (fetal liver tissue-dataset).

(XLSX)

pone.0332440.s016.xlsx (52.4KB, xlsx)
S17 Table. Significant 60 unique GO:BPs in Pro-B cells after CellBender correction (fetal liver tissue-dataset).

(XLSX)

pone.0332440.s017.xlsx (14.9KB, xlsx)
S18 Table. Significant 47 unique GO:BPs in Pro-B cells after SoupX correction (fetal liver tissue-dataset).

(XLSX)

pone.0332440.s018.xlsx (12.3KB, xlsx)
S19 Table. Significant 82 GO:BPs in a species-mixing dataset (10x Genomics) after SoupX correction.

(XLSX)

pone.0332440.s019.xlsx (20.4KB, xlsx)
S1 File. Supporting Figs S1–S25.

(PDF)

pone.0332440.s020.pdf (6.8MB, pdf)

Acknowledgments

This research utilised Queen Mary’s Apocrita HPC facility, supported by QMUL Research-IT (http://doi.org/10.5281/zenodo.438045). We acknowledge the ITS Research Team at QMUL for their support. Resources for data processing were also provided by Mahidol University and the Office of the Ministry of Higher Education, Science, Research and Innovation under the Reinventing University project: the Center of Excellence in AI-Based Medical Diagnosis (AI-MD) sub-project. We thank Sarintip Nguantad for running CellBender to preprocess single-cell RNA-seq data.

Data Availability

Single-cell RNA-sequencing datasets: The raw sequencing reads of the single-cell RNA-seq datasets used in this study include the peripheral blood mononuclear cell (PBMC) datasets, which consist of eight single-cell experiments from dengue patients and one healthy donor [13], available through the ArrayExpress repository: E-MTAB-9467. Another healthy sample was obtained from the 10x Genomics website (4k PBMCs from a healthy donor, Single Cell Gene Expression Dataset by Cell Ranger 2.1.0, 10x Genomics, 2017, November 8, https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc4k). In addition, the raw sequencing data of forty-two single-cell experiments of human fetal liver tissues [21] were obtained from the ArrayExpress database, under the accession number: E-MTAB-7407. The raw and filtered 10x Genomics species-mixing dataset, which contains a mixture of human HEK293T and mouse NIH3T3 cells, is available at https://www.10xgenomics.com/datasets/10-k-1-1-mixture-of-human-hek-293-t-and-mouse-nih-3-t-3-cells-3-v-3-1-3-1-standard-6-0-0, retrieved on 20 June 2025.

Funding Statement

This project is funded by the mid-career researcher grant from National Research Council of Thailand (NRCT) and Mahidol University (N42A670557) through VC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Haque A, Engel J, Teichmann SA, Lönnberg T. A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications. Genome Med. 2017;9(1):75. doi: 10.1186/s13073-017-0467-4
  • 2. Hwang B, Lee JH, Bang D. Single-cell RNA sequencing technologies and bioinformatics pipelines. Exp Mol Med. 2018;50(8):1–14. doi: 10.1038/s12276-018-0071-8
  • 3. Schäfer PSL, Dimitrov D, Villablanca EJ, Saez-Rodriguez J. Integrating single-cell multi-omics and prior biological knowledge for a functional characterization of the immune system. Nat Immunol. 2024;25(3):405–17. doi: 10.1038/s41590-024-01768-2
  • 4. Huang D, Ma N, Li X, Gou Y, Duan Y, Liu B, et al. Advances in single-cell RNA sequencing and its applications in cancer research. J Hematol Oncol. 2023;16(1):98. doi: 10.1186/s13045-023-01494-6
  • 5. Potter SS. Single-cell RNA sequencing for the study of development, physiology and disease. Nat Rev Nephrol. 2018;14(8):479–92. doi: 10.1038/s41581-018-0021-7
  • 6. Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, et al. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell. 2015;161(5):1202–14. doi: 10.1016/j.cell.2015.05.002
  • 7. Zilionis R, Nainys J, Veres A, Savova V, Zemmour D, Klein AM, et al. Single-cell barcoding and sequencing using droplet microfluidics. Nat Protoc. 2017;12(1):44–73. doi: 10.1038/nprot.2016.154
  • 8. Zheng GXY, Terry JM, Belgrader P, Ryvkin P, Bent ZW, Wilson R, et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun. 2017;8:14049. doi: 10.1038/ncomms14049
  • 9. Ziegenhain C, Vieth B, Parekh S, Reinius B, Guillaumet-Adkins A, Smets M, et al. Comparative Analysis of Single-Cell RNA Sequencing Methods. Mol Cell. 2017;65(4):631-643.e4. doi: 10.1016/j.molcel.2017.01.023
  • 10. Ma J, Tran G, Wan AMD, Young EWK, Kumacheva E, Iscove NN, et al. Microdroplet-based one-step RT-PCR for ultrahigh throughput single-cell multiplex gene expression analysis and rare cell detection. Sci Rep. 2021;11(1):6777. doi: 10.1038/s41598-021-86087-4
  • 11. Pan XW, Zhang H, Xu D, Chen JX, Chen WJ, Gan SS. Identification of a novel cancer stem cell subpopulation that promotes progression of human fatal renal cell carcinoma by single-cell RNA-seq analysis. Int J Biol Sci. 2020;16(16):3149.
  • 12. Arora JK, Matangkasombut P, Charoensawan V, Opasawatchai A, DENFREE Thailand. Single-cell RNA sequencing reveals the expansion of circulating tissue-homing B cell subsets in secondary acute dengue viral infection. Heliyon. 2024;10(10):e30314. doi: 10.1016/j.heliyon.2024.e30314
  • 13. Arora JK, Opasawatchai A, Poonpanichakul T, Jiravejchakul N, Sungnak W, DENFREE Thailand, et al. Single-cell temporal analysis of natural dengue infection reveals skin-homing lymphocyte expansion one day before defervescence. iScience. 2022;25(4):104034. doi: 10.1016/j.isci.2022.104034
  • 14. Kinker GS, Greenwald AC, Tal R, Orlova Z, Cuoco MS, McFarland JM, et al. Pan-cancer single-cell RNA-seq identifies recurring programs of cellular heterogeneity. Nat Genet. 2020;52(11):1208–18. doi: 10.1038/s41588-020-00726-6
  • 15. Berg M, Petoukhov I, van den Ende I, Meyer KB, Guryev V, Vonk JM, et al. FastCAR: fast correction for ambient RNA to facilitate differential gene expression analysis in single-cell RNA-sequencing datasets. BMC Genomics. 2023;24(1):722. doi: 10.1186/s12864-023-09822-3
  • 16. Caglayan E, Liu Y, Konopka G. Neuronal ambient RNA contamination causes misinterpreted and masked cell types in brain single-nuclei datasets. Neuron. 2022;110(24):4043-4056.e5. doi: 10.1016/j.neuron.2022.09.010
  • 17. Fleming SJ, Chaffin MD, Arduini A, Akkad A-D, Banks E, Marioni JC, et al. Unsupervised removal of systematic background noise from droplet-based single-cell experiments using CellBender. Nat Methods. 2023;20(9):1323–35. doi: 10.1038/s41592-023-01943-7
  • 18. Floriddia E. The impact of ambient RNA. Nat Neurosci. 2022;25(12):1583. doi: 10.1038/s41593-022-01232-0
  • 19. Yang S, Corbett SE, Koga Y, Wang Z, Johnson WE, Yajima M, et al. Decontamination of ambient RNA in single-cell RNA-seq with DecontX. Genome Biol. 2020;21(1):57. doi: 10.1186/s13059-020-1950-6
  • 20. Young MD, Behjati S. SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data. Gigascience. 2020;9(12):giaa151. doi: 10.1093/gigascience/giaa151
  • 21. Popescu D-M, Botting RA, Stephenson E, Green K, Webb S, Jardine L, et al. Decoding human fetal liver haematopoiesis. Nature. 2019;574(7778):365–71. doi: 10.1038/s41586-019-1652-y
  • 22. Hao Y, Stuart T, Kowalski MH, Choudhary S, Hoffman P, Hartman A, et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat Biotechnol. 2024;42(2):293–304. doi: 10.1038/s41587-023-01767-y
  • 23. McGinnis CS, Murrow LM, Gartner ZJ. DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors. Cell Syst. 2019;8(4):329-337.e4. doi: 10.1016/j.cels.2019.03.003
  • 24. Choudhary S, Satija R. Comparison and evaluation of statistical error models for scRNA-seq. Genome Biol. 2022;23(1):27. doi: 10.1186/s13059-021-02584-9
  • 25. Hao Y, Hao S, Andersen-Nissen E, Mauck WM 3rd, Zheng S, Butler A, et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184(13):3573-3587.e29. doi: 10.1016/j.cell.2021.04.048
  • 26. Aizarani N, Saviano A, Sagar, Mailly L, Durand S, Herman JS, et al. A human liver cell atlas reveals heterogeneity and epithelial progenitors. Nature. 2019;572(7768):199–204. doi: 10.1038/s41586-019-1373-2
  • 27. MacParland SA, Liu JC, Ma X-Z, Innes BT, Bartczak AM, Gage BK, et al. Single cell RNA sequencing of human liver reveals distinct intrahepatic macrophage populations. Nat Commun. 2018;9(1):4383. doi: 10.1038/s41467-018-06318-7
  • 28. Payen VL, Lavergne A, Alevra Sarika N, Colonval M, Karim L, Deckers M, et al. Single-cell RNA sequencing of human liver reveals hepatic stellate cell heterogeneity. JHEP Rep. 2021;3(3):100278. doi: 10.1016/j.jhepr.2021.100278
  • 29. Ramachandran P, Dobie R, Wilson-Kanamori JR, Dora EF, Henderson BEP, Luu NT, et al. Resolving the fibrotic niche of human liver cirrhosis at single-cell level. Nature. 2019;575(7783):512–8. doi: 10.1038/s41586-019-1631-3
  • 30. Zhang M, Yang H, Wan L, Wang Z, Wang H, Ge C, et al. Single-cell transcriptomic architecture and intercellular crosstalk of human intrahepatic cholangiocarcinoma. J Hepatol. 2020;73(5):1118–30. doi: 10.1016/j.jhep.2020.05.039
  • 31. Kolberg L, Raudvere U, Kuzmin I, Adler P, Vilo J, Peterson H. g:Profiler-interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Res. 2023;51(W1):W207–12. doi: 10.1093/nar/gkad347
  • 32. Heberle H, Meirelles GV, da Silva FR, Telles GP, Minghim R. InteractiVenn: a web-based tool for the analysis of sets through Venn diagrams. BMC Bioinformatics. 2015;16(1):169. doi: 10.1186/s12859-015-0611-3
  • 33. Patil VS, Madrigal A, Schmiedel BJ, Clarke J, O’Rourke P, de Silva AD, et al. Precursors of human CD4+ cytotoxic T lymphocytes identified by single-cell transcriptome analysis. Sci Immunol. 2018;3(19):eaan8664. doi: 10.1126/sciimmunol.aan8664
  • 34. Waickman AT, Friberg H, Gromowski GD, Rutvisuttinunt W, Li T, Siegfried H, et al. Temporally integrated single cell RNA sequencing analysis of PBMC from experimental and natural primary human DENV-1 infections. PLoS Pathog. 2021;17(1):e1009240. doi: 10.1371/journal.ppat.1009240
  • 35. Chandele A, Sewatanon J, Gunisetty S, Singla M, Onlamoon N, Akondy RS, et al. Characterization of Human CD8 T Cell Responses in Dengue Virus-Infected Patients from India. J Virol. 2016;90(24):11259–78. doi: 10.1128/JVI.01424-16
  • 36. Tsai C-Y, Liong KH, Gunalan MG, Li N, Lim DSL, Fisher DA, et al. Type I IFNs and IL-18 regulate the antiviral response of primary human γδ T cells against dendritic cells infected with Dengue virus. J Immunol. 2015;194(8):3890–900. doi: 10.4049/jimmunol.1303343
  • 37. Sheng C, Lopes R, Li G, Schuierer S, Waldt A, Cuttat R, et al. Probabilistic machine learning ensures accurate ambient denoising in droplet-based single-cell omics. Cold Spring Harbor Laboratory. 2022. doi: 10.1101/2022.01.14.476312

Decision Letter 0

Wan-Tien Chiang

4 Sep 2024

PONE-D-24-33285

Understanding and mitigating the impact of ambient mRNA contamination in single-cell RNA-sequencing analysis

PLOS ONE

Dear Dr. Kumar Arora,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 19 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Wan-Tien Chiang

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following financial disclosure: 

"This project is funded by the mid-career researcher grant from National Research Council of Thailand (NRCT) and Mahidol University (N42A670557) through VC."

Please state what role the funders took in the study.  If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." 

If this statement is not correct you must amend it as needed. 

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

3. We noted in your submission details that a portion of your manuscript may have been presented or published elsewhere. [Single-cell RNA-seq datasets of dengue patients and one healthy donor were available through the ArrayExpress repository under accession number E-MTAB-9467. Additionally, one healthy sample was obtained from the 10x Genomics website (4k PBMCs from a healthy donor, Single Cell Gene Expression Dataset by Cell Ranger 2.1.0, 10x Genomics, 2017, November 8, https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc4k). The raw and filtered gene-barcode matrices of the ten samples were obtained from: https://data.mendeley.com/datasets/6ry3x7r8hf/3. The raw and filtered gene-barcode matrices of forty single-cell experiments from human fetal liver tissues were deposited in the GigaDB repository, available at: http://gigadb.org/dataset/100836#.  ] Please clarify whether this [conference proceeding or publication] was peer-reviewed and formally published. If this work was previously peer-reviewed and published, in the cover letter please provide the reason that this work does not constitute dual publication and should be included in the current manuscript.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors present a study of the effects of removal of cell-free or "ambient" RNA on downstream interpretations of scRNA-seq data. The authors used publicly-available datasets for this study. This topic is an interesting one for the field, because removal of cell-free RNA has not yet become a ubiquitous part of scRNA-seq data analysis, and it is important for studies like this to demonstrate rigorously whether or not removal of cell-free RNA impacts biological findings in an important and quantitative way.

While the topic and stated aims of the study are of great interest, the analyses as presented may not go as far as I would personally like to see. I have several recommendations to strengthen the study.

Major concerns:

1. The major claim about the usefulness of removing ambient RNA is that it enhances "the accuracy of computational analyses and the robustness of data interpretation in scRNA-seq datasets." While I do agree with this statement, the results in the text focus on (1) increased log2-fold-changes of known marker genes, and (2) "notable improvement in clustering efficiency". (1) has been demonstrated by previous authors as cited in this manuscript. See the SoupX paper [15] Figure 3 for an analysis of PBMCs, which is quite similar to Figures 2 and 3 in this manuscript. And see the SoupX paper [15] Figure 4 for an analysis of kidney data, specifically showing the removal of the hemoglobin gene HBB, which is quite similar to Figure 4 in this manuscript. That is not to say that there is nothing novel here, but I am not sure that log-fold-changes of marker genes goes far enough to demonstrate the utility of ambient RNA removal.

2. Major claim (2) above – “notable improvement in clustering efficiency” – may be a bit problematic. How do the authors quantify improvements in clustering efficiency? What does clustering “efficiency” mean? In Figure 4 for example, I see UMAPs before and after correction. And I see a plot of PCs before and after correction. The problem with the UMAP is that it is qualitative. The problem with the PC plot is that I am not sure whether the movement of dots on this plot is meaningful. How meaningful? How can this be quantified?

3. The authors state in the introduction that “the effects of ambient mRNA correction on certain downstream analyses … remains largely unclear.” While I agree, the authors should probably note that other studies have been carried out which focus largely on this question, including PMID: 36240767, but there may be others.

4. The authors focused the study solely on the SoupX method. While it makes sense to limit the scope of work, it is unclear whether the conclusions would be the same if DecontX or CellBender were used. (See suggestions below.)

Minor concerns:

1. The dataset integration in the UMAP in Figure S9 looks potentially problematic. Was the data integrated by the authors? How? Is that the statement in the Methods about “processed as described in the original article”? I am not sure this would give me enough information to reproduce the analysis here. What does that UMAP look like if cells are colored by “batch”?

2. CD79A is listed in the text as a B cell marker, but shows up in Table S1 as a plasma cell marker.

3. It is unclear to me whether it is sound to run SoupX with a manually-provided set of hemoglobin genes, and then look at the output performance on hemoglobin genes to show that SoupX worked well. Namely “virtually no Hb genes were detected after the ambient mRNA correction.” Is this a good thing? How do we know that SoupX is not overcorrecting? How do we know if SoupX worked for other genes which were not provided on the input “potential contamination gene list”?

4. TUBB and TUBA1B are referred to as “adhesion molecules”, which I am not sure is accurate.

5. Section 3.4 begins “We then asked if the improvement on cell type identification and DEG analyses as a result of ambient mRNA correction can be seen in other independent dataset…” Did the authors demonstrate improvement in cell type identification? I would be very interested to see this, but I am not sure this was clearly demonstrated here. Something like scPred (PMID: 31829268) or another automatic-cell-type-annotation tool could potentially be used to annotate the datasets before and after ambient RNA removal. Then the performance enhancement could be quantified.

6. How are there so many B cell populations in Fig. 4 and Fig. S11? There appear to be 12 B cell clusters after correction. This seems like a large number of clusters of B cells. Are these biological meaningful clusters, and how can a reader be convinced of that? Or are we over-clustering here? What does this UMAP look like if cells are colored by “batch”?

7. I was confused about the point being made in Fig. 4. What is the relationship between the cells in “before correction” cluster 13 and “after correction” clusters 10 and 11? Are they the same cells? In general, how do the authors identify which “before correction” cluster corresponds to which “after correction” cluster (especially as there are different numbers of clusters)?

8. What does Fig. 4C look like “before correction”, if you use the same “after correction” cluster labels for cluster 10 and 11? Are the histone-related gene markers something that is only seen after ambient RNA correction?
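One generic way to approach the correspondence question raised in points 7 and 8 is to cross-tabulate the cluster label that each cell barcode receives before and after correction. The sketch below is illustrative only and is not the authors' procedure; the function names `cluster_overlap` and `best_match`, the barcodes, and the cluster labels are all hypothetical.

```python
from collections import Counter, defaultdict


def cluster_overlap(before, after):
    """Cross-tabulate cluster labels for barcodes present in both runs.

    before, after: dicts mapping cell barcode -> cluster label.
    Returns {before_label: Counter({after_label: n_cells})}.
    """
    table = defaultdict(Counter)
    for barcode, b_label in before.items():
        if barcode in after:  # cells missing from one run are skipped
            table[b_label][after[barcode]] += 1
    return dict(table)


def best_match(table):
    """For each 'before' cluster, return the 'after' cluster that shares the
    most cells, plus the fraction of shared cells that fall into it."""
    result = {}
    for b_label, counts in table.items():
        a_label, n = counts.most_common(1)[0]
        result[b_label] = (a_label, n / sum(counts.values()))
    return result


# Toy labels (hypothetical): "before" cluster 13 splits into 10 and 11.
before = {"c1": 13, "c2": 13, "c3": 13, "c4": 5}
after = {"c1": 10, "c2": 10, "c3": 11, "c4": 5}
table = cluster_overlap(before, after)
matches = best_match(table)
```

A "before" cluster whose cells spread over several "after" clusters shows up as a row with more than one non-zero entry, which makes splits and merges explicit regardless of how the clusters are numbered.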

Suggestions:

This study would be much more impactful if it either (1) went further with analyses of downstream effects, and/or (2) compared multiple methods of ambient RNA removal, rather than SoupX alone.

Recommendations:

I would really like to see a study like this published, and I think the authors have followed an interesting line of work. However, I have some concerns about whether the current analyses as presented go farther than what is shown in the literature. In particular, there is work that seems comparable in Figures 3 and 4 of [15], or Figure 2 and 3 of [14], or Figure 4 and 5 of [16]. I recommend this article should be published with some subset of the above-suggested revisions.

Reviewer #2: Please see the attached comments.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: comments.docx

pone.0332440.s021.docx (15.9KB, docx)
PLoS One. 2025 Sep 24;20(9):e0332440. doi: 10.1371/journal.pone.0332440.r002

Author response to Decision Letter 1


27 Feb 2025

Dear Dr. Wan-Tien Chiang, PLOS ONE Editorial Team, and Reviewers,

We would like to thank you for the opportunity to improve our manuscript and for the insightful comments that have helped enhance the quality of our study. We have carefully addressed and incorporated the suggestions into our revised manuscript. Please find point-by-point responses and specific details of the changes made in the “Response to Reviewers” document.

Best regards,

Jantarika Kumar Arora (on behalf of the authors)

Attachment

Submitted filename: Response to Reviewers.pdf

pone.0332440.s023.pdf (6.6MB, pdf)

Decision Letter 1

Wan-Tien Chiang

20 May 2025

PONE-D-24-33285R1

Understanding and mitigating the impact of ambient mRNA contamination in single-cell RNA-sequencing analysis

PLOS ONE

Dear Dr. Kumar Arora,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jul 04 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Wan-Tien Chiang

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: All comments have been addressed

Reviewer #4: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Yes

Reviewer #4: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: I Don't Know

Reviewer #4: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: Yes

Reviewer #4: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: Yes

Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: 1. Introduction: regarding the impact of ambient mRNA contamination in single-cell RNA-sequencing analysis on differential gene expression and biological pathway enrichment analyses, can the authors explain the differences between real biological findings and artifacts induced by ambient mRNA contamination?

2. Methods/results: the authors could use simulated datasets to assess the impact of ambient mRNA contamination on differential gene expression and biological pathway enrichment analyses, with a separate simulated matrix for each of the following sources of contamination:

1) Cell death or damage: background gene expression signals; Marker genes appearing unexpectedly in non-target cells (false positives); Abnormally elevated housekeeping, ribosomal, and mitochondrial genes

2) Cross-contamination in library preparation: Low-level expression of highly cell-type-specific genes in unrelated cell populations

3) Improper cell concentration: Overloaded cells result in multiple cells per droplet (doublets); Mixed expression patterns from multiple cell types (e.g., T cell and monocyte markers simultaneously detected)

3. Methods/results: to determine the impact of ambient mRNA contamination in single-cell RNA-sequencing analysis on differential gene expression and biological pathway enrichment analyses, the authors may need to define quantitative standards rather than relying only on qualitative descriptions such as "increase/decrease".

4. Discussion: the authors may need to explain potential benchmark effects for related software such as DecontX and scAR.
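The simulation idea in point 2 can be sketched by mixing counts drawn from an ambient profile into a cell's own counts at a chosen contamination fraction. This is a minimal sketch under stated assumptions, not a method used in the manuscript: the function `add_ambient`, the toy gene counts, and the ambient profile are hypothetical, and ambient molecules are assumed to be drawn independently in proportion to the profile.

```python
import random


def add_ambient(true_counts, ambient_profile, rho, rng):
    """Return a copy of true_counts with ambient UMIs mixed in.

    true_counts: dict gene -> UMI count originating from the cell itself.
    ambient_profile: dict gene -> relative abundance in the ambient pool.
    rho: target fraction of the final UMIs that are ambient (0 <= rho < 1).
    rng: random.Random instance, passed in so the draw is reproducible.
    """
    n_true = sum(true_counts.values())
    # Solve n_ambient / (n_true + n_ambient) = rho for n_ambient.
    n_ambient = round(n_true * rho / (1.0 - rho))
    genes = list(ambient_profile)
    weights = [ambient_profile[g] for g in genes]
    observed = dict(true_counts)
    for gene in rng.choices(genes, weights=weights, k=n_ambient):
        observed[gene] = observed.get(gene, 0) + 1
    return observed


# Toy example (hypothetical values): a T cell picking up hemoglobin and
# immunoglobulin transcripts from the ambient pool.
rng = random.Random(0)
t_cell = {"CD3E": 60, "IL7R": 30}
ambient = {"HBB": 0.7, "IGKC": 0.3}
observed = add_ambient(t_cell, ambient, rho=0.1, rng=rng)
```

Because the true counts are known, differential expression results on the contaminated matrix can be compared directly with those on the clean matrix at several values of `rho`.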

Reviewer #4: The authors applied two computational methods (SoupX and CellBender) to estimate and remove ambient mRNAs in droplet-based scRNA-seq datasets and observed fewer false positive differentially expressed genes (DEGs) in two cell types of interest and, subsequently, improved biological pathway results. The results in this manuscript show dataset-dependent performance improvements as well as some level of persisting artifacts after using decontamination methods. This would serve as a useful reference for the field, encouraging caution and scrutiny when drawing conclusions from scRNA-seq analyses.

I appreciate the comments from Reviewer 1 and Reviewer 2 and the authors’ additional analyses and improved clarity. I have a few suggestions that I believe are necessary to further improve the manuscript:

1. Authors should report cell-level as well as dataset level contamination percentage estimated by both methods (SoupX and CellBender).

As pointed out by Reviewer 2, a simulation study could further illustrate how the contamination level affects downstream analysis. Although simulation itself could require work beyond the scope of this manuscript, deeper insight could be provided by reporting the contamination levels estimated by both methods and commenting on their association with the dataset-dependent improvements.

2. 2000 DEGs are a lot for a subpopulation – i.e. a highly functionally specific cell type (“top 2000 DEGs in pro-B cells” Page 6 Line 156). And this result is even more concerning when they are only the positive set of the DEGs (“the positive markers were retained (Fig 3A and 4A).” Page 6 Line 148) – implying the whole DEG set including both the positively and negatively differentially expressed gene pool is even larger.

Can the authors explain whether this is biologically plausible? In addition, authors should report the number of genes used for performing DE analysis as well as the resulting number of DEGs.

3. Fig S9 and S19 provided relative cell-type composition information but lacked absolute value information – the number of cells in each type before and after ambient mRNA correction. Cell-type assignment could change after correction, and it would be informative to show magnitude of such changes for both methods given two very different methodological approaches between SoupX and CellBender.
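The cell-level and dataset-level percentages requested in point 1 can be approximated generically from the count matrices before and after correction, as the fraction of raw counts that the correction removed. The sketch below is an assumption-laden illustration: it does not reproduce the contamination estimates that SoupX or CellBender report internally, and the function name `contamination_fractions` and the toy counts are hypothetical.

```python
def contamination_fractions(raw, corrected):
    """Fraction of raw counts removed by correction, per cell and overall.

    raw, corrected: dicts mapping barcode -> {gene: count}.
    Only barcodes present in both inputs are compared.
    Returns (per_cell, dataset_level).
    """
    per_cell = {}
    raw_sum = 0
    kept_sum = 0
    for barcode in raw.keys() & corrected.keys():
        raw_total = sum(raw[barcode].values())
        kept_total = sum(corrected[barcode].values())
        if raw_total == 0:
            continue  # nothing to compare for an empty cell
        per_cell[barcode] = 1.0 - kept_total / raw_total
        raw_sum += raw_total
        kept_sum += kept_total
    dataset_level = 1.0 - kept_sum / raw_sum if raw_sum else float("nan")
    return per_cell, dataset_level


# Toy matrices (hypothetical values).
raw = {
    "c1": {"HBB": 10, "CD3E": 90},
    "c2": {"HBB": 5, "CD79A": 45},
}
corrected = {
    "c1": {"HBB": 0, "CD3E": 90},
    "c2": {"HBB": 1, "CD79A": 44},
}
per_cell, dataset_level = contamination_fractions(raw, corrected)
```

Running the same function on the output of each correction method gives values that are directly comparable across methods and datasets, even though the methods estimate contamination in different ways.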

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

Reviewer #4: No

**********

PLoS One. 2025 Sep 24;20(9):e0332440. doi: 10.1371/journal.pone.0332440.r004

Author response to Decision Letter 2


3 Jul 2025

We would like to thank you for the opportunity to revise our manuscript and for the insightful comments that have significantly improved the quality of our study. We have carefully addressed and incorporated the suggestions into the revised manuscript. Please find our point-by-point responses and specific details of the changes made in the “Response to Reviewers” document.

Attachment

Submitted filename: Response_to_Reviewers_auresp_2.pdf

pone.0332440.s024.pdf (1.1MB, pdf)

Decision Letter 2

Wan-Tien Chiang

29 Jul 2025

PONE-D-24-33285R2

Understanding and mitigating the impact of ambient mRNA contamination in single-cell RNA-sequencing analysis

PLOS ONE

Dear Dr. Kumar Arora,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Sep 12 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Wan-Tien Chiang

Academic Editor

PLOS ONE

Journal Requirements:

1. If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise. 

2. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: All comments have been addressed

Reviewer #4: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Yes

Reviewer #4: (No Response)

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes

Reviewer #4: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: Yes

Reviewer #4: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: (No Response)

Reviewer #4: (No Response)

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: The remarks from the last round of review were mostly addressed:

1. Introduction: Add a new example;

2. Methods: include a new ambient mRNA correction method;

3. Methods/Results: statistical methods were used in celltype annotation/DE analysis/pathway enrichment;

4. Add a new validation dataset.

Reviewer #4: Thanks to the authors for clearly addressing my comments.

There is a minor clarification about the log-fold change threshold that I would ask the authors to state in the manuscript:

The manuscript states that the log-fold change criterion for the DE analysis is 0.1: "a minimum log2 fold-change threshold of 0.1" (Page 6 Line 157). For the purpose of examining false positive DE genes in the context of this manuscript, this is acceptable. However, it should not be used in general to draw biological conclusions, because a log2 ratio of 0.1 means that the expression ratio between the two compared groups is 1.0717, i.e. one group has about 7% higher expression than the other, which is extremely low.

Hence, the authors should state in the manuscript that the threshold of 0.1 was chosen to obtain a larger gene pool for showing the bias that ambient RNAs introduce into DE analysis, rather than as the normal standard for DE analysis.
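The arithmetic behind this point can be checked directly: a log2 fold change converts to an expression ratio by raising 2 to that power. The helper name `fold_change_from_log2` below is hypothetical and used only for illustration.

```python
def fold_change_from_log2(log2_fc):
    """Convert a log2 fold change into a plain expression ratio."""
    return 2.0 ** log2_fc


ratio = fold_change_from_log2(0.1)        # about 1.0718
percent_higher = (ratio - 1.0) * 100.0    # about 7% higher expression

# For comparison, a log2 fold change of 1 corresponds to a doubling.
doubling = fold_change_from_log2(1.0)     # 2.0
```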

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

Reviewer #4: No

**********

PLoS One. 2025 Sep 24;20(9):e0332440. doi: 10.1371/journal.pone.0332440.r006

Author response to Decision Letter 3


2 Aug 2025

We greatly appreciate the reviewers’ thoughtful and insightful feedback. We have carefully addressed all comments and incorporated the suggested revisions into the manuscript. Please find our point-by-point responses and details of the changes in the “Response to Reviewers” document.

Sincerely,

Jantarika Kumar Arora (on behalf of the authors)

Attachment

Submitted filename: Response_to_Reviewers_auresp_3.pdf

pone.0332440.s025.pdf (103.9KB, pdf)

Decision Letter 3

Wan-Tien Chiang

31 Aug 2025

Understanding and mitigating the impact of ambient mRNA contamination in single-cell RNA-sequencing analysis

PONE-D-24-33285R3

Dear Dr. Kumar Arora,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Wan-Tien Chiang

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Wan-Tien Chiang

PONE-D-24-33285R3

PLOS ONE

Dear Dr. Kumar Arora,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Wan-Tien Chiang

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Known canonical markers.

    (XLSX)

    pone.0332440.s001.xlsx (9.4KB, xlsx)
    S2 Table. The total numbers of DEGs.

    (XLSX)

    pone.0332440.s002.xlsx (9.9KB, xlsx)
    S3 Table. Average expression levels and percentages of cells expressing known canonical marker genes across annotated cell types, related to Fig 1B (dengue infection datasets).

    (XLSX)

    pone.0332440.s003.xlsx (12.7KB, xlsx)
    S4 Table. Average expression levels and percentages of cells expressing known canonical marker genes across annotated cell types, related to Fig 1B (dengue infection datasets).

    (XLSX)

    pone.0332440.s004.xlsx (13.2KB, xlsx)
    S5 Table. Top 50 potential ambient mRNA genes predicted by SoupX (dengue infection-dataset).

    (XLSX)

    pone.0332440.s005.xlsx (6.2KB, xlsx)
    S6 Table. Average expression levels of known canonical marker genes across annotated cell types before and after correction, related to Fig 2 (dengue infection datasets).

    (XLSX)

    pone.0332440.s006.xlsx (17.8KB, xlsx)
    S7 Table. Average expression levels of B cell-related genes across B and Non-B cells before and after correction, related to Fig 2 (dengue infection datasets).

    (XLSX)

    pone.0332440.s007.xlsx (13.6KB, xlsx)
    S8 Table. Number and percentage of cells expressing Ig genes.

    (XLSX)

    pone.0332440.s008.xlsx (24.7KB, xlsx)
    S9 Table. Numbers of annotated T and B cell subpopulations before and after correction. Related to Figs 3 and 4.

    (XLSX)

    pone.0332440.s009.xlsx (18.8KB, xlsx)
    S10 Table. Average expression levels of immunoglobulin genes across T cell subpopulations before and after correction, related to Fig 3 (dengue infection datasets).

    (XLSX)

    pone.0332440.s010.xlsx (16.2KB, xlsx)
    S11 Table. Significant 17 GO:BPs in T subsets before correction (dengue infection-dataset).

    (XLSX)

    pone.0332440.s011.xlsx (10.1KB, xlsx)
    S12 Table. Significant 191 GO:BPs in T subsets after CellBender correction (dengue infection-dataset).

    (XLSX)

    pone.0332440.s012.xlsx (29.7KB, xlsx)
    S13 Table. Significant 46 GO:BPs in T subsets after SoupX correction (dengue infection-dataset).

    (XLSX)

    pone.0332440.s013.xlsx (12.3KB, xlsx)
    S14 Table. Top 50 potential ambient mRNA genes predicted by SoupX (fetal liver tissue-dataset).

    (XLSX)

    pone.0332440.s014.xlsx (6.2KB, xlsx)
    S15 Table. Significant 279 unique GO:BPs in Pro-B cells before correction, compared to CellBender (fetal liver tissue-dataset).

    (XLSX)

    pone.0332440.s015.xlsx (41.7KB, xlsx)
    S16 Table. Significant 365 unique GO:BPs in Pro-B cells before correction, compared to SoupX (fetal liver tissue-dataset).

    (XLSX)

    pone.0332440.s016.xlsx (52.4KB, xlsx)
    S17 Table. Significant 60 unique GO:BPs in Pro-B cells after CellBender correction (fetal liver tissue-dataset).

    (XLSX)

    pone.0332440.s017.xlsx (14.9KB, xlsx)
    S18 Table. Significant 47 unique GO:BPs in Pro-B cells after SoupX correction (fetal liver tissue-dataset).

    (XLSX)

    pone.0332440.s018.xlsx (12.3KB, xlsx)
    S19 Table. Significant 82 GO BPs in a species-mixing dataset (10x Genomics) after SoupX correction.

    (XLSX)

    pone.0332440.s019.xlsx (20.4KB, xlsx)
    S1 File. Supporting Figs S1–S25.

    (PDF)

    pone.0332440.s020.pdf (6.8MB, pdf)
    Attachment

    Submitted filename: comments.docx

    pone.0332440.s021.docx (15.9KB, docx)
    Attachment

    Submitted filename: Response to Reviewers.pdf

    pone.0332440.s023.pdf (6.6MB, pdf)
    Attachment

    Submitted filename: Response_to_Reviewers_auresp_2.pdf

    pone.0332440.s024.pdf (1.1MB, pdf)
    Attachment

    Submitted filename: Response_to_Reviewers_auresp_3.pdf

    pone.0332440.s025.pdf (103.9KB, pdf)

    Data Availability Statement

    Single-cell RNA-sequencing datasets: The raw sequencing reads of the single-cell RNA-seq datasets used in this study include the peripheral blood mononuclear cell (PBMC) datasets, which consist of eight single-cell experiments from dengue patients and one healthy donor [13], available through the ArrayExpress repository: E-MTAB-9467. Another healthy sample was obtained from the 10x Genomics website (4k PBMCs from a healthy donor, Single Cell Gene Expression Dataset by Cell Ranger 2.1.0, 10x Genomics, 2017, November 8, https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc4k). In addition, the raw sequencing data of forty-two single-cell experiments of human fetal liver tissues [21] were obtained from the ArrayExpress database, under the accession number: E-MTAB-7407. The raw and filtered 10x Genomics species-mixing dataset, which contains a mixture of human HEK293T and mouse NIH3T3 cells, is available at https://www.10xgenomics.com/datasets/10-k-1-1-mixture-of-human-hek-293-t-and-mouse-nih-3-t-3-cells-3-v-3-1-3-1-standard-6-0-0, retrieved on 20 June 2025.


    Articles from PLOS One are provided here courtesy of PLOS
