Abstract
Cancer tissues are heterogeneous mixtures of tumor, stromal and immune cells, where each component comprises multiple distinct cell types and/or states. Mapping this heterogeneity and understanding the unique contributions of each cell type to the tumor transcriptome is crucial for advancing cancer biology, yet high-throughput expression profiles from tumor tissues only represent combined signals from all cellular sources. Computational deconvolution of these mixed signals has emerged as a powerful approach to dissect both cellular composition and cell-type-specific expression patterns. Here, we provide a comprehensive guide to transcriptome deconvolution, specifically tailored for cancer researchers, presenting a systematic framework for selecting and applying deconvolution methods, considering the unique complexities of tumor tissues, data availability, and method assumptions. We detail 43 deconvolution methods and outline how different approaches serve distinctive applications in cancer research: from understanding tumor-immune surveillance to identifying cancer subtypes, discovering prognostic biomarkers, and characterizing spatial tumor architecture. By examining the capabilities and limitations of these methods, we highlight emerging trends and future directions, particularly in addressing tumor cell plasticity and dynamic cell states.
Introduction
The central dogma of molecular biology describes the unidirectional flow of genetic information from DNA to RNA to protein, emphasizing the critical role of RNA as a messenger. Unlike DNA, RNA expression levels vary substantially across genes, cells, and biological conditions, providing crucial instructions for protein synthesis. The composition of RNA molecules varies greatly across cell types and states, making gene expression patterns invaluable for understanding cellular functions within tissues1.
Quantifying RNA molecules across cells and tissues is essential for elucidating the molecular mechanisms underlying human diseases. Initial efforts to quantify RNA content at the cellular level began with techniques like Northern blotting2 and quantitative PCR (qPCR)3. Additional methods like RNA fluorescence in situ hybridization (FISH)4,5 and specialized flow cytometry6 techniques were later developed to detect RNA alongside protein markers7. While pioneering, these techniques were limited to analyzing only a few genes or cells simultaneously. Technological advances have since transformed the field, beginning with microarrays in 1995 ref8–10 and advancing to short-read sequencing in the 2000s ref11–14. These approaches typically involve RNA extracted from tissue samples containing thousands to millions of heterogeneous cells. Further advances have been driven by the development of single-cell RNA sequencing (scRNA-seq), which has become a powerful tool for dissecting the complex cellular landscape of cancer15–18, given the heterogeneous nature of most tumor tissues. By 2015, scRNA-seq had become mainstream in biological research19,20, with continuous further refinements over the following decade.
Cancer is characterized by profound changes in gene expression and pathway activity, which drive its hallmark features21. Solid tumors consist of at least three major groups of cells: tumor cells, immune cells, and non-immune stromal cells, including cancer-associated fibroblasts, endothelial cells, and other cells (Box 1). Unlike bulk RNA-seq, which produces a composite signal from these diverse populations, scRNA-seq enables the analysis of individual cell types, providing critical insights into cancer biology. This approach has revealed substantial variability in tumor cell states22–24, ranging from fully differentiated to undifferentiated, as well as the diverse functional states of fibroblasts25 and immune cells22,26, highlighting the dynamic nature of the tumor microenvironment (TME). Despite its strengths, scRNA-seq faces several technical and analytical limitations (Fig. 1), which are exacerbated by the complexity of tumor tissues (Box 1). Bulk RNA-seq, while less detailed, is more resource-efficient and more logistically suitable for large cohort studies (Fig. 1), with extensive data available for thousands of patients across various cancers. Therefore, while scRNA-seq excels in resolving cellular heterogeneity, bulk RNA-seq remains indispensable for analyzing large patient cohorts with long-term clinical outcome data, which is essential for establishing prognostic biomarkers.
Box 1.
Unique challenges of transcriptomic deconvolution in cancer
Key features that create significant barriers to accurate transcriptomic deconvolution include:
Tumor heterogeneity and plasticity.
Cancer tissues exhibit remarkable variability in cellular composition at multiple levels, and the proportion of malignant cells can range from less than 10% to over 90% between different patients or among tumor regions of the same patient. Further, tumor cells display exceptional transcriptional diversity15,166,167 and phenotypic plasticity, whereby transitioning between cellular states is associated with vastly different transcriptional landscapes23,30. This plasticity violates a core assumption of most deconvolution approaches: that reference profiles remain stable across samples. Even when single-cell studies identify recurring cancer cell states, their dynamic nature and tendency to exist along a continuum rather than as discrete entities168 impede accurate deconvolution.
Tumors also contain a diverse mixture of infiltrating immune cells, stromal cells, and vascular components, all in varying activation states and proportions. These elements create distinct microenvironmental niches. Cancer-associated stromal cells also exhibit a wide range of activation states that differ markedly from or are absent in their counterparts in healthy tissues169–173. For example, cancer-associated fibroblasts can exist in at least 4-6 distinct functional states not observed in normal tissues174–177, while tumor-associated macrophages display polarization signatures beyond the classical M1 and M2 dichotomy observed in normal inflammation178,179.
While heterogeneity exists in non-malignant contexts180, even in blood samples, the combinatorial complexity arising from genetic instability, clonal evolution, and microenvironmental remodeling creates unique, cancer-specific challenges for deconvolution. Even within the same tumor type, substantial variation in cellular composition can occur between patients and across different stages of disease progression - at a magnitude far surpassing what is typically observed in normal tissues.
Limitations of reference data.
A fundamental challenge in cancer deconvolution is obtaining comprehensive reference data for all cell types and subtypes present in tumor tissues. Recent benchmarking studies indicate that cell-type deconvolution methods, when applied to tumor samples, can exhibit reduced accuracy compared to non-cancer applications51,181. Using peripheral blood-derived immune cell signatures for tumor deconvolution can result in lower accuracy (correlation coefficients as low as r=0.04) compared to using tumor-derived references (r>0.8)182. This gap in performance due to reference data limitations constitutes a major technical challenge in deconvolution and represents a critical distinction between cancer and non-cancer applications.
Technical challenges in reference generation.
Generating scRNA-seq and snRNA-seq data that can serve as a reliable reference for cancer presents substantial technical hurdles155,156,158. Low capture efficiency and dropout effects result in sparse expression matrices that may miss key marker genes. Cell doublets can create artificial hybrid expression profiles that confound reference signatures183. Tissue dissociation can bias against certain cell populations. These factors can further mask rare cell-type-specific markers; as a result, some immune cell populations (for example, neutrophils157) and less characterized stem-like epithelial cells184 are likely underrepresented in single-cell data185. These limitations can introduce large biases in cell-type-specific estimates when single-cell data are used as a reference.
Multicollinearity in cell types.
Cell types or states often display highly correlated transcriptional programs, creating multicollinearity issues that confound accurate deconvolution79,186,187. This is particularly evident in related immune cell states within the TME (for example, different T cell subtypes) and in tumor cells with gradient-like state transitions. Standard deconvolution methods show substantial estimation errors when cell types share significant portions of their marker genes, a common challenge in the heterogeneous tumor microenvironment74,76. Only a few methods have adopted steps to explicitly address this challenge. For example, CIBERSORT uses ν-support vector regression (ν-SVR) with LASSO regularization to select discriminative marker genes and mitigate the effects of collinearity between cell types27, and DeMixT introduced a profile likelihood-based score to select genes that more closely follow the model assumption of distinctions in gene expression between mixing components61. Other methods like MuSiC implement a hierarchical approach to first deconvolve major cell types and then further deconvolve cell subtypes within each major group, thereby reducing the dimensionality of the problem at the initial step29,44. In general, cell subtypes with collinear gene expression patterns exhibit higher error rates or uncertainty than others68. Therefore, the benchmarked deconvolution accuracy of a given method is a fluid concept, as it may decrease substantially when more granular cell subtypes are of primary interest, that is, when high-resolution deconvolution is required.
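The effect of collinearity on a reference matrix can be checked before any deconvolution is run. The following is a minimal sketch, not taken from any of the cited methods: the function name `collinearity_report` and the two toy reference matrices are illustrative assumptions. It reports the pairwise correlation between cell-type profiles and the condition number of the reference matrix; a large condition number indicates that small perturbations in the bulk signal can produce large changes in least-squares proportion estimates.

```python
import numpy as np


def collinearity_report(reference):
    """Summarize collinearity in a genes x cell-types reference matrix.

    Returns the cell-type-by-cell-type Pearson correlation matrix and
    the condition number of the reference matrix.
    (Hypothetical helper, for illustration only.)
    """
    reference = np.asarray(reference, dtype=float)
    corr = np.corrcoef(reference, rowvar=False)  # columns are cell types
    cond = np.linalg.cond(reference)
    return corr, cond


# Toy references (3 genes x 2 cell types); values are made up.
distinct = np.array([[10.0, 0.0],
                     [0.0, 10.0],
                     [5.0, 5.0]])
similar = np.array([[10.0, 9.0],
                    [0.0, 1.0],
                    [5.0, 5.0]])

_, cond_distinct = collinearity_report(distinct)
corr_similar, cond_similar = collinearity_report(similar)

print("condition number, distinct profiles:", round(cond_distinct, 2))
print("condition number, similar profiles:", round(cond_similar, 2))
print("correlation between similar profiles:", round(corr_similar[0, 1], 3))
```

In this toy case the two near-identical profiles yield a much larger condition number than the two distinct ones, which is one simple way to flag cell-type pairs that a regression-based method is likely to confuse.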
Rare cell population detection.
A recognized challenge in single-cell analysis is that biological discoveries are often limited by difficulties in capturing and profiling rare cell populations that may disproportionately influence the clinicopathological outcome of tumors160,161,188,189. In cancer, examples include stem-like tumor cells, cells of minimal residual disease, specific effector or regulatory immune subtypes, or early metastatic precursors190–193. It is nearly impossible to estimate, through deconvolution, rare cell populations that have yet to be fully characterized. Even for those that are well characterized, the transcriptomic signal from rare cells is easily obscured by dominant signals from more abundant cell types and background noise. This unfavorable signal-to-noise ratio typically becomes problematic when fractions fall below 1-5%194. Consistent detection and accurate quantification of very rare populations (those below 1% within tumors) remain among the biggest challenges in cancer transcriptomic deconvolution. While computational advances continue to emerge in this area, rigorous benchmarking of method sensitivity for rare populations is still needed. Establishing robust lower detection limits for deconvolution methods, potentially drawing insights from ultra-sensitive techniques in liquid biopsy analysis195, will be crucial for reliable transcriptomic quantification of clinically significant rare cell populations.
Figure 1. Benefits and limitations of bulk and scRNA-seq data generation.
Schematic overview of the bulk and scRNA-seq data generation. scRNA-seq begins with high-quality tissue samples dissociated into single cells or nuclei. These samples undergo rigorous quality control, both experimentally (for example, cell sorting and filtering out low viability cells) and bioinformatically (for example, excluding cells with low unique molecular identifier (UMI) counts, low gene reads, high mitochondrial content, or doublets). The resulting data is a carefully curated single-cell resolution atlas characterized by high costs, low sample sizes, and complex experimental procedures. Widespread use of scRNA-seq is hampered by several technical challenges. The dissociation of fresh tissues varies in efficiency across cancer types155, often depletes some cell populations51,156,157, and can induce substantial transcriptional changes156. Single-nucleus RNA sequencing (snRNA-seq) can address some of these limitations, enabling the profiling of fragile or adherent cell types, particularly from frozen tissues158. However, it brings other disadvantages, such as lower transcriptome coverage, loss of cytoplasmic RNA information, and an increased risk of ambient RNA contamination159. Both scRNA-seq and snRNA-seq approaches are further subjected to shallow sequencing depth and dropout effects, resulting in sparse data160. From the generated raw data, the manual curation and annotation of single cells involves a series of subjective decision-making steps, where technical effects such as doublets or low-quality cells confound the biological variation of cells161. Batch effects from experimental complexity and patient heterogeneity impede cross-study data integration162, hindering accurate identification of cell subpopulations. These challenges are further compounded by limitations in clinical application, as most of the available single-cell datasets present a small number of patients and lack long-term follow-up data. 
10X scRNA-seq databases like 3CA163 (965 patients from 76 studies) and TISCH2 ref164 (969 patients from 139 studies) show notably smaller cohort sizes, and neither database provides clinical information. Researchers may have to perform an extensive literature review to identify scRNA-seq data linked to clinical metadata (see Supplementary Table 1 for details of scales of data availability for four major cancer types). In contrast, bulk RNA-seq can be applied to a variety of sample types, including fresh, fresh-frozen, and FFPE tissues. The process involves tissue lysis followed by sequencing, allowing for application across large cohorts due to its cost-effectiveness, low technical variability, robustness to sample quality, and straightforward experimental protocols. Bulk RNA-seq, while confounded by cellular heterogeneity, benefits from long-term clinical follow-up. As of December 2024, cBioPortal165 (a database for bulk cohorts) contained data from over 20,000 patients from more than 80 studies spanning various cancer types, with survival information available for 86% of the cases. By integrating both types of sequencing data through deconvolution, researchers can mitigate the limitations of each approach, combining the comprehensive resolution of scRNA-seq with the broad applicability of bulk RNA-seq to enhance the analysis of complex tissue samples.
The integration of bulk RNA-seq and scRNA-seq data offers a compelling opportunity to advance cancer research by leveraging the strengths of both methodologies. However, bulk RNA-seq data analysis requires interpreting signals derived from a mixture of cell types, which is an important challenge. Mathematical deconvolution addresses this challenge by separating these complex signals into their constituent cell-type-specific components. Overall, current deconvolution methods enable the estimation of cell-type proportions, with some methods further estimating cell-type-specific gene expression profiles, thereby extracting detailed biological insights from bulk datasets.
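As a point of orientation, many regression- and factorization-based methods build on a linear mixing model of the following general form. The symbols used here (B, S, F, ε, and the indices g, j, k) are notation assumed for illustration and are not taken from any specific method; individual methods differ in their noise models, constraints, and in whether the cell-type profiles are allowed to vary by sample.

```latex
B_{gj} \;=\; \sum_{k=1}^{K} S_{gk}\, F_{kj} \;+\; \varepsilon_{gj},
\qquad F_{kj} \ge 0,
\qquad \sum_{k=1}^{K} F_{kj} = 1
```

Here B_{gj} is the bulk expression of gene g in sample j, S_{gk} is the expression of gene g in cell type k, F_{kj} is the proportion of cell type k in sample j, and ε_{gj} is residual noise. Estimating F given B and S corresponds to estimating cell-type proportions, whereas estimating S (or a sample-specific version of it) corresponds to recovering cell-type-specific expression profiles.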
Many deconvolution methods were initially developed and validated using non-cancer tissue samples27–29, where cellular composition remains relatively stable and cell types maintain consistent gene expression patterns. However, analyzing cancer tissues adds unique complexities (Box 1), including tumor heterogeneity, cellular plasticity and intercellular interactions23,24,30,31, which were mapped with the help of spatial sequencing technologies32–35. These features underscore the need for specialized approaches or careful adaptation of existing deconvolution methods in cancer research.
In this Review, we emphasize the pivotal role of transcriptomic deconvolution in harmonizing bulk and single-cell data from cancer, facilitating the estimation of individual cell type contributions, and mapping these insights to clinical outcomes. For the purpose of this review, we define deconvolution as an in silico procedure that uses bulk RNA-seq data to produce cell-type- or compartment-specific measurements, such as absolute or relative counts of cells, or corresponding gene expression levels. We aim to equip cancer researchers with bioinformatics knowledge necessary to fully utilize bulk RNA-seq data, ultimately enhancing our understanding of tumor biology. We guide researchers to effectively navigate through the process of performing deconvolution, select the optimal method(s) based on their specific research question, and understand future developments in the field. To achieve this, we first provide an overview of the available deconvolution techniques, including reference-based, semi-reference-based, and reference-free approaches, along with insights from benchmarking studies; we then detail applications of transcriptomic deconvolution in cancer research, covering estimation of TME cellular composition, tumor subtype identification, extraction of tumor-specific expression profiles, and prediction of patient outcomes and treatment responses; and finally, discuss future directions in methodology development and clinical translation.
Overview of transcriptomic deconvolution techniques
Transcriptomic deconvolution techniques can be broadly categorized into three approaches: reference-based methods, semi-reference-based methods, and reference-free methods (Fig. 2, Table 1). These methods differ in their reliance on prior knowledge and the type of reference data (Box 2) they require, each offering distinct advantages and disadvantages influenced by the biological context under investigation.
Figure 2. Conceptual overview of bulk RNA-seq deconvolution methods.
High-level classification of deconvolution methods based on application scenarios and computational models. Deconvolution methods are categorized into reference-based, semi-reference-based, and reference-free approaches. The first row illustrates reference-based methods, where individual methods employ one of the following computational frameworks: probabilistic models, Non-negative Matrix Factorization (NMF), linear regression, and deep learning frameworks. The second row depicts semi-reference-based methods, combining probabilistic models and NMF. The third row represents reference-free methods, which either utilize pre-defined cell type signatures, leverage gene signatures or operate without any reference, primarily relying on NMF. The outputs are classified into cell-type proportions (solid box, provided by all methods) and deconvolved cell-type-specific expression profiles (dotted box, provided by a subset of methods).
Table 1. Summary of bulk transcriptomic deconvolution tools classified by their underlying principles.
| Method | Reference (in addition to the bulk expression data): Ref. matrix | Reference: Marker gene list | Reference: Adjacent normal tissue sample | Output: Cell fraction | Output: Mean expression | Output: Per-sample expression | Output: Marker gene list | Statistical modeling | Applicability a: Cancer cell-specific | Applicability a: Immune cells in cancer tissues | Applicability a: Non-cancer specific / General |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bulk-reference based | |||||||||||
| CIBERSORT27 | ✓ b | – | – | ✓ | – | – | – | Support vector regression (SVR) | – | ✓ | ✓ |
| DeconRNASeq66 | ✓ | ✓ | Non-negative constrained least square regression | – | ✓ | ✓ | |||||
| EPIC28 | ✓ b | ✓ | Constrained least square regression | ✓ | ✓ | ✓ | |||||
| FARDEEP136 | ✓ | ✓ | Adaptive least-trimmed-squares regression | – | ✓ | ✓ | |||||
| GEDIT137 | ✓ | ✓ | Non-negative linear regression | ✓ | ✓ | ||||||
| MIXTURE101 | ✓ | ✓ | ν-SVR | ✓ | ✓ | ||||||
| PERT138 | ✓ | ✓ | Non-negative maximum likelihood | ✓ | ✓ | ||||||
| quanTIseq40 | ✓ b | ✓ | Constrained least square regression | ✓ | – | ||||||
| TIMER39 | ✓ b | ✓ | Constrained least square regression | ✓ | |||||||
| Single-cell-reference based | |||||||||||
| AutoGeneS41 | ✓ | – | – | ✓ | – | – | – | SVR | – | ✓ | ✓ |
| BayesPrism46 | ✓ | ✓ | ✓ | ✓ | Gibbs sampling | ✓ | ✓ | ✓ | |||
| Bisque42 | ✓ | ✓ | – | – | Non-negative least squares (NNLS) | – | ✓ | ✓ | |||
| CDseqR48 | ✓ | ✓ | ✓ | Bayesian hierarchical model | – | ✓ | ✓ | ||||
| CIBERSORTx55 | ✓ | ✓ | ✓ | ✓ | ✓ | SVR | ✓ | ✓ | ✓ | ||
| CPM139 | ✓ | ✓ | – | – | – | SVR | ✓ | ✓ | ||||
| deconvSeq140 | ✓ | ✓ | Generalized linear model | ✓ | ✓ | ||||||
| DeMixSC53 | ✓ b | ✓ | Weighted NNLS (wNNLS) trained with benchmark data from ovarian cancer | ✓ | ✓ | ✓ | |||||
| DWLS43 | ✓ | ✓ | Weighted non-negative least squares (wNNLS) | ✓ | ✓ | ✓ | |||||
| GTM-decon50 | ✓ | ✓ | ✓ | ✓ | Guided topic model | – | ✓ | ✓ | |||
| Kassandra102 | ✓ | ✓ | – | – | Supervised machine learning | ✓ | ✓ | ✓ | |||
| MuSiC2.0 ref44 | ✓ | ✓ | wNNLS | – | ✓ | ✓ | |||||
| MuSiC29 | ✓ | ✓ | wNNLS | ✓ | ✓ | ||||||
| ProM141 | ✓ | ✓ | wNNLS | ✓ | ✓ | ||||||
| RNA-Sieve142 | ✓ | ✓ | Maximum likelihood estimate (MLE) | ✓ | ✓ | ||||||
| Scaden49 | ✓ | ✓ | Deep learning trained with pseudo-bulk | ✓ | ✓ | ||||||
| SCDC45 | ✓ | ✓ | wNNLS | ✓ | ✓ | ||||||
| SQUID52 | ✓ | ✓ | Linear transform with dampened weighted least squares | ✓ | ✓ | ✓ | |||||
| Semi-reference based | |||||||||||
| BayICE47 | ✓ | – | – | ✓ | ✓ | – | – | Bayesian hierarchical model | ✓ | ✓ | ✓ |
| Deblender143 | ✓ | ✓ | ✓ | ✓ | Least squares NMF; quadratic programming (QP) | ✓ | ✓ | ✓ | |||
| DeCompress64 | ✓ | ✓ | ✓ | Compressed sensing | ✓ | ✓ | ✓ | ||||
| DeMix60 | – | ✓ | ✓ | ✓ | ✓ | MLE | ✓ c | – | – | ||
| DeMixT61 | ✓ | ✓ | ✓ | ✓ | Iterated conditional modes (ICM) for MLE | ✓ c | ✓ | – | |||
| ISOpureR62 | ✓ | ✓ | ✓ | – | Dirichlet-multinomial mixture model | ✓ c | – | – | |||
| NITUMID144 | ✓ | – | ✓ | ✓ | Semi-supervised NMF | ✓ | ✓ | ✓ | |||
| PREDE63 | ✓ | ✓ | ✓ | Iterative constrained QP with NMF | ✓ | ✓ | ✓ | ||||
| SECRET145 | ✓ | ✓ | – | Weighted constrained nonlinear optimization | ✓ | ✓ | ✓ | ||||
| Semi-CAM146 | ✓ | ✓ | ✓ | ✓ | Convex analysis of mixtures, semi-NMF | – | ✓ | ✓ | |||
| Reference free | |||||||||||
| CAM70,147 | – | – | – | ✓ | ✓ | – | ✓ | Convex analysis of mixtures (CAM) | – | ✓ | ✓ |
| DeClust71 | ✓ | ✓ | ✓ | Iterative optimization minimizing mean squared error (MSE) | ✓ | – | – | ||||
| deconf65 | ✓ | ✓ | – | Least squares non-negative matrix factorization (NMF) | – | ✓ | ✓ | ||||
| linseed68 | ✓ | – | ✓ | Simplex corner detection with fast combinatorial non-negative least squares (NNLS) | ✓ | ✓ | ✓ | ||||
| MMAD148 | ✓ | ✓ | – | MLE | – | ✓ | – | ||||
| TOAST69 | ✓ | – | ✓ | Least squares NMF | ✓ | ✓ | ✓ | ||||
Available transcriptional deconvolution methods relevant to cancer are listed, including those published up to December 2024. Additional methods that fall under at least one of the following categories were excluded from the table: (1) focus on specific non-cancer tissue types, limiting applicability to tumor deconvolution; (2) proposed as immediate extension beyond existing approaches; (3) primary application to non-transcriptomic data types (for example, methylation data); and (4) lack of validation in cancer contexts or over established methods. For completeness, those additional computational deconvolution methods that are not included in this table but were included in a recent methods review79 are provided in Supplementary Table 2.
(a) Applicability classifications are based on each tool’s intended design purpose and/or demonstrated effectiveness in published studies (original method papers and independent benchmarks51,52,58,72,74,76,149–153). Check marks (✓) indicate either: (1) the method was specifically designed for that application, or (2) the method has been successfully validated for that application in published studies. Absence of check marks may reflect design limitations or lack of validation studies rather than definitive unsuitability.
We discern three categories: tumor-specific deconvolution (estimates cancer cell proportions/expression profiles, and/or identifies cancer cell subtypes), immune cell-type estimation (quantifies immune cell populations in cancer tissues), and non-cancer specific or general cell type deconvolution (applicability to healthy tissues and other biological settings not yet demonstrated in cancer context).
(b) The method has a built-in reference matrix without user input.
(c) The method has been demonstrated to output the cancer cell-specific transcript proportion.
Box 2.
Choice of reference data
We define reference data broadly as input data that are required in addition to the raw mixed expression matrix to perform deconvolution. Most methods, except for reference-free deconvolution, require some reference data (Table 1). These come in varying formats: as a matrix of signature or marker gene expression per cell type; single-cell derived pooled mean expression across genes per cell type; or gene expression matrices for individual cell types (with incomplete cell types for semi-reference-based and complete cell types for reference-based deconvolution). The choice of these input data is critical, and therefore we provide the following considerations.
Marker genes and signature genes.
These two concepts are often used interchangeably to represent stably expressed genes that are specific for a given cell type, hence whose expression levels can be used to form a reference matrix. Historically, the term marker gene was also used to annotate a gene that is exclusively expressed in one cell type, for example, for some immune cell types36; these are ideal markers for regression-based deconvolution methods. However, it is challenging to identify stably expressed genes for tumor cells in individual cancer types. Although several studies developed approaches to find these signature genes75,196, more investigations are warranted.
Comprehensive cell type and tissue representation.
Effective cancer transcriptome deconvolution requires reference data that captures the full spectrum of relevant cell types within their tissue-specific transcriptional context. For example, immune reference panels such as LM2227, derived from PBMCs, have been widely used to profile immune infiltration across tumors. However, these references lack non-immune components and fail to reflect the tissue-specific transcriptional profiles of immune cells themselves, limiting their accuracy76. This limitation becomes more pronounced for non-immune populations such as epithelial cells or cancer-associated fibroblasts, which exhibit broader transcriptional diversity and stronger tissue dependence. Unlike many non-cancer applications where one generic reference may suffice, cancer deconvolution demands tissue- and context-specific references to capture the heterogeneity of both primary and metastatic tumors. Existing references often underrepresent rare, transitional, or site-specific cell states that play critical roles in tumor biology. Expanding reference datasets to cover a wider range of cell types, tissue origins, tumor subtypes, and treatment conditions could enhance deconvolution accuracy, though such efforts must be weighed against the complexity of data collection and integration.
Matching sequencing platforms.
Recent studies demonstrated the importance of training the deconvolution model using the measurable discrepancy between platforms in single-cell versus bulk RNA sequencing from the same tumor samples51,53. This type of advanced reference data is yet to be generated for most tumor tissues, due to experimental challenges.
Reference data provided by the tool.
From the users’ perspective, not having to develop and provide appropriate reference data is a desirable feature of a deconvolution tool. Although this is not possible for many tools, some do provide such utility (annotated methods in Table 1). This useful feature comes with a tradeoff that the internal reference data can generate biased results when they are applied to bulk data that do not align with the key model assumptions. There are two main types of internal reference data: marker genes and gene expression levels for major immune cell types27, and histology-type specific single-cell and bulk RNA-seq paired benchmark data51,53.
Reference-based deconvolution methods
Reference-based deconvolution is a widely used strategy for analyzing complex mixtures of cell types in heterogeneous tumor tissue samples. This approach relies on predefined reference data over a complete list of cell types, in the form of a gene-by-cell-type expression matrix that captures the expression patterns of distinct cell types within the bulk tumor samples to be deconvolved (Table 1). Earlier work utilized genes uniquely and abundantly expressed in one cell type at a time36, and later used bulk-derived references created from bulk sequencing of purified or sorted cells27,37,38. More recent work has focused on using scRNA-seq-derived references, which are created by averaging gene expression across cells of the same cell type.
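The construction of an scRNA-seq-derived reference by averaging can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `build_reference`, the toy count matrix, and the cell-type labels are hypothetical, and steps that real pipelines typically apply before averaging (quality control, normalization, marker gene selection) are omitted.

```python
import numpy as np


def build_reference(counts, cell_types):
    """Average expression across cells sharing a cell-type label.

    counts     : cells x genes array of expression values
    cell_types : one label per cell (same order as the rows of counts)
    Returns (sorted list of cell types, genes x cell-types matrix).
    (Hypothetical helper; normalization is deliberately left out.)
    """
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(cell_types)
    types = sorted(set(cell_types))
    # One column per cell type: the mean profile over its cells.
    ref = np.column_stack([counts[labels == t].mean(axis=0) for t in types])
    return types, ref


# Toy data: 4 cells x 3 genes, two annotated cell types.
counts = np.array([[10, 0, 2],
                   [12, 0, 4],
                   [0, 8, 2],
                   [0, 6, 2]])
cell_types = ["T cell", "T cell", "tumor", "tumor"]

types, ref = build_reference(counts, cell_types)
print(types)
print(ref)
```

The resulting matrix has one row per gene and one column per cell type, which is the gene-by-cell-type layout that reference-based methods expect as input.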
Many bulk reference-based deconvolution methods use constrained least squares regression (for example, TIMER39, quanTIseq40, and EPIC28) or support vector regression (CIBERSORT27). The advent of scRNA-seq enhanced the resolution of reference profiles, allowing the development of more refined deconvolution methods. Most single-cell reference-based methods29,41–45 employ weighted non-negative least squares (wNNLS) with single-cell-based reference matrices to iteratively estimate cell-type proportions from bulk mixtures, differing in their preprocessing of the reference matrices and the weight functions in the wNNLS framework. Other modeling frameworks include hierarchical Bayesian models (BayesPrism46, BayICE47, and CDseqR48), a deep learning framework trained on large-scale single-cell simulated bulk samples (Scaden49), and a guided topic modeling approach (GTM-decon50). Guided topic modeling identifies topics instead of cell types from scRNA-seq data, hence providing an attractive alternative that does not depend on manual cell-type annotations, albeit at a cost of losing some biological interpretability.
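The core regression step shared by the least-squares family can be illustrated with a short sketch. It uses `scipy.optimize.nnls` on a toy reference and a noise-free simulated mixture; the function name `estimate_fractions`, the optional gene weights, and all numeric values are assumptions made for illustration. The weighting shown is generic weighted least squares (rows scaled by the square root of the weight) and does not reproduce the weight function of any specific published method.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_fractions(reference, bulk, gene_weights=None):
    """Estimate cell-type fractions for one bulk sample by NNLS.

    reference    : genes x cell-types matrix
    bulk         : length-genes expression vector
    gene_weights : optional non-negative weight per gene
    (Hypothetical helper; real methods choose weights differently.)
    """
    reference = np.asarray(reference, dtype=float)
    bulk = np.asarray(bulk, dtype=float)
    if gene_weights is not None:
        # Weighted least squares: scale each gene by sqrt(weight).
        w = np.sqrt(np.asarray(gene_weights, dtype=float))
        reference = reference * w[:, None]
        bulk = bulk * w
    coef, _ = nnls(reference, bulk)  # non-negative coefficients
    total = coef.sum()
    # Rescale so the estimated fractions sum to one.
    return coef / total if total > 0 else coef


# Toy reference (3 genes x 2 cell types) and a simulated mixture.
reference = np.array([[11.0, 0.0],
                      [0.0, 7.0],
                      [3.0, 2.0]])
true_fractions = np.array([0.3, 0.7])
bulk = reference @ true_fractions

fractions = estimate_fractions(reference, bulk)
weighted = estimate_fractions(reference, bulk, gene_weights=[1.0, 2.0, 0.5])
print(fractions, weighted)
```

Because the simulated mixture contains no noise and the toy cell-type profiles are clearly distinct, both calls recover the simulated fractions; with real data, the choice of genes and weights is where the methods listed above differ.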
A crucial area for further development is addressing the inherent technical differences between bulk and scRNA-seq platforms. These differences lead to divergent gene expression measurements from the same set of cells, which undermines the accuracy of current single-cell-based deconvolution methods51–54. Recent efforts have specifically addressed the technical differences between scRNA-seq and bulk RNA-seq platforms. CIBERSORTx55 uses a batch correction approach called S-mode that transforms single-cell reference profiles to match the distributional properties of bulk expression data. SQUID52 leverages a linear transformation matrix42 derived from paired single-cell and bulk samples to enhance cross-platform compatibility. DeMixSC53 incorporates matched benchmark datasets where both single-cell and bulk sequencing are performed on the same tumor samples, enabling direct calibration of the deconvolution model to account for platform-specific biases. These approaches help mitigate the technical artifacts that would otherwise compromise deconvolution accuracy when integrating data across platforms.
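The general idea of calibrating across platforms with paired data can be sketched with a per-gene linear fit. This is a generic illustration only and is not the implementation used by CIBERSORTx, SQUID, or DeMixSC: the function names `fit_gene_transform` and `apply_gene_transform`, the direction of the mapping (single-cell-derived pseudobulk scale to bulk scale), and the toy values are all assumptions.

```python
import numpy as np


def fit_gene_transform(pseudobulk, bulk):
    """Fit bulk ~ slope * pseudobulk + intercept separately per gene.

    pseudobulk, bulk : genes x samples arrays from the SAME samples
    Returns (slope, intercept), each of length genes.
    (Hypothetical helper; ordinary least squares per gene.)
    """
    pseudobulk = np.asarray(pseudobulk, dtype=float)
    bulk = np.asarray(bulk, dtype=float)
    x_mean = pseudobulk.mean(axis=1)
    y_mean = bulk.mean(axis=1)
    x_centered = pseudobulk - x_mean[:, None]
    y_centered = bulk - y_mean[:, None]
    var = (x_centered ** 2).sum(axis=1)
    cov = (x_centered * y_centered).sum(axis=1)
    slope = np.ones_like(var)  # fall back to slope 1 for constant genes
    nonzero = var > 0
    slope[nonzero] = cov[nonzero] / var[nonzero]
    intercept = y_mean - slope * x_mean
    return slope, intercept


def apply_gene_transform(profile, slope, intercept):
    """Map a single-cell-derived profile onto the bulk scale."""
    return slope * np.asarray(profile, dtype=float) + intercept


# Toy paired data: 2 genes x 3 samples.
pseudobulk = np.array([[1.0, 2.0, 3.0],
                       [2.0, 4.0, 6.0]])
bulk = np.array([[3.0, 5.0, 7.0],   # gene 1: bulk = 2 * pseudobulk + 1
                 [1.0, 2.0, 3.0]])  # gene 2: bulk = 0.5 * pseudobulk

slope, intercept = fit_gene_transform(pseudobulk, bulk)
adjusted = apply_gene_transform([4.0, 8.0], slope, intercept)
print(slope, intercept, adjusted)
```

A fit of this kind requires samples profiled on both platforms, which is why the availability of matched benchmark datasets is a limiting factor for such calibration.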
While this Review primarily focuses on deconvolution methods that directly estimate cell-type proportions, we briefly note that gene set enrichment analysis (GSEA)-based approaches such as xCell37, ESTIMATE56 and MCPcounter57 provide a different perspective on cellular composition. Unlike regression-based deconvolution methods that estimate absolute proportions of cell types within each sample, GSEA-based methods measure the relative abundance of cell types across a group of samples. These methods compute enrichment scores using predefined gene sets characteristic of different cell types or tissue compartments. For instance, ESTIMATE56 provides stromal and immune scores that indicate the relative presence of these compartments across tumor samples, but cannot determine the absolute fraction of these components within any individual sample. A key drawback of enrichment methods is the lack of specificity: gene sets from one cell type or cell state can easily include genes shared by related cell types/states, potentially leading to ambiguous interpretations37,58,59. Because the gene sets are predefined, the enriched cell types may not include the nuanced cell subtypes of interest. On the other hand, GSEA-based methods circumvent the need to collect external reference data by directly utilizing pre-defined gene sets as their reference system, hence they are straightforward to run and can be useful in exploratory studies, large-scale cohort analyses, and when reference-quality single-cell data are unavailable for the tissue or cancer type of interest. These approaches are especially valuable for rapid assessment of immune infiltration patterns across heterogeneous datasets or when investigating understudied cancer types with limited characterized cell states. 
However, GSEA-based approaches may be less appropriate when precise quantitative estimates of cell-type proportions are required, when studying rare or novel cell subpopulations not captured in predefined gene sets, or when investigating tumor-specific cellular states that require tailored reference profiles.
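The distinction between relative enrichment scores and absolute proportions can be illustrated with a deliberately simplified signature score. The sketch below averages per-gene z-scores across a cohort; it is an assumption made for illustration and is not the scoring scheme of xCell, ESTIMATE, or MCPcounter. The gene names are real T-cell markers, but the expression values are invented.

```python
from statistics import mean, pstdev

# Hypothetical sketch: a cohort-relative signature score (mean z-score of marker genes).

def signature_scores(expression, gene_set):
    """Return one score per sample.

    expression: dict mapping gene -> list of expression values across samples.
    gene_set: list of marker genes for one cell type or compartment.
    """
    n_samples = len(next(iter(expression.values())))
    used = [g for g in gene_set if g in expression]
    scores = [0.0] * n_samples
    for gene in used:
        values = expression[gene]
        mu, sd = mean(values), pstdev(values)
        for i, v in enumerate(values):
            # Genes with no variation across the cohort contribute zero.
            z = (v - mu) / sd if sd > 0 else 0.0
            scores[i] += z / len(used)
    return scores


# Invented expression values for three samples.
expression = {
    "CD3D": [1.0, 2.0, 6.0],
    "CD3E": [0.5, 1.5, 4.0],
    "ACTB": [10.0, 10.0, 10.0],
}
t_cell_markers = ["CD3D", "CD3E"]
scores = signature_scores(expression, t_cell_markers)

# Multiplying every value by ten leaves the scores unchanged: the score ranks
# samples within the cohort but carries no information about absolute fractions.
scaled = {g: [10.0 * v for v in vals] for g, vals in expression.items()}
scaled_scores = signature_scores(scaled, t_cell_markers)
print(scores)
print(scaled_scores)
```

Because the scores are centered on the cohort, they sum to zero across samples. Adding or removing samples changes every score, which is one reason such scores are compared within, rather than across, cohorts.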
Semi-reference-based deconvolution methods
Semi-reference-based deconvolution has become an invaluable tool in cancer research, addressing the specific challenges posed by tumor cell plasticity and heterogeneous tissue samples. Semi-reference-based methods are most useful when reference expression profiles for the complete list of cell types are unavailable, as they can work with incomplete or partial data. Specifically, ´partial´ in this context refers to gene expression profiles or marker gene lists that represent only a subset of the cell types or genes that would typically be included in a full reference profile. Several approaches have been developed to tackle this complex task: DeMix60 used an iterative heuristic search for maximum likelihood estimation. This was later superseded by DeMixT61 using a more principled Iterated Conditional Modes (ICM) [G] algorithm, iteratively refining estimates of cell proportions and gene expression profiles to maximize the likelihood of the observed data. ISOpureR62 applies a Dirichlet-multinomial mixture [G] model to estimate cancer cell proportions, maintaining accuracy even with varying levels of tumor purity. PREDE63 enhances cell-type specificity through a Non-negative Matrix Factorization [G] (NMF) framework, accounting for small variances in gene expression. DeCompress64 integrates linear subspace identification [G] with compressed sensing techniques [G] to parse complex cellular signals more effectively. BayICE47 further enhances the capabilities of semi-reference-based deconvolution by employing a hierarchical Bayesian model that simultaneously estimates cell-type proportions, expression profiles, and novel gene signatures. This approach addresses issues such as shift-invariance [G] to facilitate unbiased estimates of cellular components.
However, easing the data requirements for detecting novel biology comes with a major trade-off: reduced model identifiability and an exacerbated curse of dimensionality. Both are well-known statistical challenges that destabilize parameter estimation. Therefore, some of these methods47,63 ignore the expression variation and fix cell-type expression at the mean level for the partial references, sacrificing accuracy for robust estimation. Others60,61 model both mean and variance of cell-type-specific expression levels and employ additional techniques to tackle these challenges, limiting the model to two or three components to avoid the parameter space becoming excessively large.
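The identifiability problem can be seen in a two-component toy model, sketched here under the assumption that a bulk profile y is a mixture y = pi * t + (1 - pi) * n of an unknown tumor profile t and a known normal reference n. The function names and numbers are hypothetical, and this is not how DeMixT, ISOpureR, or any other cited method estimates its parameters.

```python
# Hypothetical sketch: why an unconstrained tumor component is not identifiable
# from a single bulk sample.

def implied_tumor_profile(bulk, normal_ref, tumor_fraction):
    """Solve y = pi * t + (1 - pi) * n for t, gene by gene."""
    return [(y - (1.0 - tumor_fraction) * n) / tumor_fraction
            for y, n in zip(bulk, normal_ref)]


def mix(tumor, normal_ref, tumor_fraction):
    """Forward model: bulk expression for a given tumor fraction."""
    return [tumor_fraction * t + (1.0 - tumor_fraction) * n
            for t, n in zip(tumor, normal_ref)]


normal_ref = [4.0, 2.0, 8.0]   # invented normal-tissue reference, three genes
bulk = [7.0, 5.0, 6.0]         # invented bulk tumor sample

# Two different tumor fractions both explain the bulk sample perfectly,
# each with its own implied tumor profile.
profile_50 = implied_tumor_profile(bulk, normal_ref, 0.5)
profile_60 = implied_tumor_profile(bulk, normal_ref, 0.6)

# Non-negativity rules out some fractions (here 0.2 implies negative expression),
# but it does not single out one solution.
profile_20 = implied_tumor_profile(bulk, normal_ref, 0.2)
print(profile_50, profile_60, profile_20)
```

Methods in this category narrow the solution space with further assumptions, such as fixed mean references or distributional models fitted across many samples. This is the trade-off between flexibility and stable estimation described above.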
By leveraging bulk RNA-seq data alongside partial reference profiles, these methods may offer significant flexibility and cost-efficiency, allowing researchers to deconvolve both well-defined and uncharacterized cell types, thereby providing critical insights into the complex and dynamic nature of tumors where full reference profiles are often unavailable. The corresponding biological scenarios include investigation of highly variable tumor cells or fibroblasts.
Reference-free deconvolution methods
Reference-free deconvolution methods are desirable as they require no additional reference data for any cell types, hence in principle they are the most generalizable and widely applicable. So far, approaches that fall under this category have focused on first identifying marker genes from the bulk expression data to be deconvolved and then using these markers to estimate cell-type-specific proportions and expression. The development of these approaches was pioneered early in the field, driven by the natural mathematical framework provided by NMF and the attractive potential for broad application without reference requirements. However, earlier methods (for example, deconf65, DeconRNASeq66) were limited to deconvolving one patient sample at a time, thereby lacking the capability to borrow strength across samples for unbiased parameter estimation. Early approaches were also too general to incorporate prior biological knowledge. These disadvantages may have contributed to diminished interest in further development, while semi-reference and reference-based methods were shown to be effective in utilizing marker gene expression. More recently, however, with the successful adaptation of scRNA-seq studies67, NMF has been revisited for reference-free deconvolution. For example, Linseed68 leverages the mutual linearity of tissue-specific genes to uncover the topological space necessary for NMF deconvolution and TOAST69 builds on deconf65 but iteratively identifies cell-type-specific features and performs deconvolution. In addition to NMF, CAM70 uses convex analysis [G] to geometrically identify extreme points (vertices) of the data structure, which correspond to subpopulation-specific marker genes. Another example is DeClust71, which employs prior markers to determine the proportions of immune and stromal cells, before stratifying patients based on these cellular compartments.
While flexible and broadly applicable, reference-free methods, at the current time, often underperform compared to reference-based tools when suitable reference data is available72. Nevertheless, these methods remain valuable for exploratory studies and can be particularly useful in cancer types that lack well-characterized reference profiles, where their application may provide initial insights into cellular composition without relying on predefined reference data73.
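The NMF formulation behind several reference-free methods can be sketched as factorizing a genes-by-samples matrix X into a non-negative signature matrix W and a non-negative mixing matrix H, so that X is approximately W H. The code below uses the standard multiplicative update rules for the squared-error objective on invented data. It is an illustrative assumption and does not reproduce Linseed, TOAST, CAM, or deconf, which add feature selection or geometric steps on top of this idea.

```python
import random

# Hypothetical sketch: plain NMF with multiplicative updates (squared-error objective).

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    wh = matmul(w, h)
    return sum((p - q) ** 2 for rx, ry in zip(x, wh) for p, q in zip(rx, ry)) ** 0.5


def nmf(x, rank, n_iter=500, seed=0, eps=1e-9):
    """Factorize x (genes x samples) into w (genes x rank) and h (rank x samples)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(wt, matmul(w, h))
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(matmul(w, h), ht)
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


def column_fractions(h):
    """Rescale each sample's mixing weights to sum to one."""
    sums = [sum(col) for col in zip(*h)]
    return [[v / s for v, s in zip(row, sums)] for row in h]


# Invented data: 4 genes x 3 samples generated from two hidden components.
x = [
    [2.0, 5.0, 8.0],
    [6.4, 4.0, 1.6],
    [5.0, 5.0, 5.0],
    [5.2, 4.0, 2.8],
]
w0, h0 = nmf(x, rank=2, n_iter=0)   # random starting point
w, h = nmf(x, rank=2)               # fitted factors
fractions = column_fractions(h)
print(frobenius_error(x, w0, h0), frobenius_error(x, w, h))
```

The recovered components are defined only up to ordering and scaling, and nothing in the factorization labels them as cell types. That is why reference-free outputs need marker-based or biological interpretation afterwards.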
Lessons from benchmarking studies
The unique challenges of cancer transcriptome deconvolution (Box 1) and correspondingly higher demands on the reference data generation (Box 2) require tissue- and context-specific benchmarking approaches to mimic the complex real-data scenarios as closely as possible. While there have been extensive benchmarking efforts (Fig. 3A) to evaluate the performance of deconvolution methods, only a handful of studies40,51,53,74,75 took the unique features of cancer tissues into consideration. Other non-cancer-specific benchmarks, for example, based on peripheral blood mononuclear cells (PBMCs) or cell-line mixtures without cancer cells (Box 1), may inform what the worst performing model types under the specific data scenario are, but limited inference can be made for future applications to cancer studies. Overall, creating a gold-standard dataset for evaluating deconvolution methods in cancer is a major unresolved challenge, requiring a compromise of transferability to real-data scenarios for experimental feasibility (Fig. 3A).
Figure 3. Lessons from benchmarking studies.
(A) Overview of the current benchmarking designs. Current experimental designs to generate benchmark datasets for evaluating deconvolution methods vary largely in terms of the technical difficulty in data generation as well as the transferability of their conclusions to real data scenarios. Simulated mixtures (in silico) such as those generated by computational aggregation of single-cell RNA sequencing (scRNA-seq) samples provide convenient but oversimplified conditions. Cell line mixtures (Experimental approach) with predefined proportions of cultured cell lines or purified primary cells provide controlled yet biologically relevant complexity. Transferability to real data depends on the complexity of the design and what type of cell lines are being mixed. Experimental approaches offer increasing biological realism. Bulk RNA-seq paired with immunohistochemistry (IHC) or flow cytometry (both experimental approaches) offers cell composition inferences from adjacent sections (IHC) or flow cytometry of the bulk sample. Matched bulk and single-cell RNA-seq data derived from one aliquot of the same tissue sample captures the full biological and technical complexity, enabling benchmarking and providing robust assessment. (B) To benchmark semi-reference-based methods, we recommend using cell line mixtures with controlled ratios, enabling precise quantification of both cell counts and RNA content. This dual measurement approach allows evaluation of cell count proportions and cell transcript proportions using the methods of interest. (C) For benchmarking reference-based methods, we recommend generating matched data from tissue samples that are dissociated into single-cell or single-nucleus suspensions and divided into two aliquots. Both undergo identical poly(A) mRNA enrichment protocols. This matched approach minimizes confounding effects from tissue digestion differences and technical biases, enabling simultaneous benchmarking and reference construction. 
(D) Recommended methods based on previous independent benchmarking studies, whose designs were limited to evaluating reference-based deconvolution methods.
Building on studies that use the current benchmarking designs51–53,75,76 to balance real-data applicability against experimental feasibility, insights into the performance of deconvolution methods can be obtained across a spectrum of validation frameworks77. For semi-reference-based methods, controlled cell-line mixtures serve as a reasonable benchmarking strategy because they make it possible to distinguish between cell-type proportions and cell-type-specific RNA transcript proportions (Fig. 3B). In DeMixT61 and its application to pan-cancer datasets75, this approach was employed to accurately assess method performance with known cellular ratios and to measure tumor-cell transcript outputs. For reference-based methods, a reasonable and achievable approach involves generating matched bulk and single-cell data from the same dissociated cell suspension using identical library preparation protocols (Fig. 3C). The paired bulk and scRNA-seq design enables more accurate benchmarking with aligned ground truth and enhances deconvolution performance through the use of context-matched reference profiles. This approach, implemented by Guo et al.53 (as part of reference data) in retina, by Hippen et al.51 in ovarian cancer tissues, and most recently in human prefrontal cortex78, minimizes confounding effects from tissue digestion differences and technical biases between platforms. Finally, current benchmarking efforts indicate that reference-free methods generally underperform reference-based approaches in cancer applications, while semi-reference-based methods lack sufficient independent validation studies79. We therefore provide recommendations for reference-based methods where multiple independent benchmarking studies, including cancer-specific evaluations, have been performed (Fig. 3D).
More efforts on designing and generating cancer-specific benchmarking datasets are needed to gain further insights across all categories of methods54. The ideal situation of having one perfect benchmarking study may not be attainable due to the exceptionally high transcriptional variability across cancer tissues (Box 1). Therefore, interpreting model output under a biological context, with or without the ideal benchmarking, becomes ever more important.
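The simplest of the benchmarking designs discussed above, in silico mixtures built by aggregating single cells, can be sketched as follows. The function names, the toy cells, and the choice of root-mean-square error as the metric are assumptions made for illustration; published benchmarks use larger simulations and several complementary metrics.

```python
import random
from math import sqrt

# Hypothetical sketch: build a pseudobulk sample from labeled single cells and
# score an estimate against the known cell-count fractions.

def simulate_pseudobulk(cells, n_cells, rng):
    """Sample cells with replacement and sum their expression.

    cells: list of (cell_type, expression_vector) tuples.
    Returns (bulk_expression, true_cell_count_fractions).
    """
    sampled = [rng.choice(cells) for _ in range(n_cells)]
    n_genes = len(sampled[0][1])
    bulk = [sum(expr[g] for _, expr in sampled) for g in range(n_genes)]
    counts = {}
    for cell_type, _ in sampled:
        counts[cell_type] = counts.get(cell_type, 0) + 1
    fractions = {k: v / n_cells for k, v in counts.items()}
    return bulk, fractions


def rmse(estimated, truth):
    """Root-mean-square error over the union of cell types (missing types count as 0)."""
    keys = sorted(set(estimated) | set(truth))
    return sqrt(sum((estimated.get(k, 0.0) - truth.get(k, 0.0)) ** 2 for k in keys)
                / len(keys))


# Invented single-cell profiles for three genes.
cells = [
    ("T cell", [5.0, 0.0, 1.0]),
    ("T cell", [4.0, 0.0, 2.0]),
    ("tumor", [0.0, 9.0, 3.0]),
    ("tumor", [1.0, 7.0, 4.0]),
]
rng = random.Random(0)
bulk, truth = simulate_pseudobulk(cells, 100, rng)
print(bulk, truth)
print(rmse(truth, truth))
```

The ground truth here is a cell-count fraction, and the pseudobulk inherits the single-cell platform's technical properties. This is why such mixtures are convenient but oversimplified compared with matched bulk and single-cell data.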
A framework for deconvolution in cancer
Different biological or clinical questions drive the diverse transcriptome deconvolution needs of researchers. Using our systematic decision path (Fig. 4), researchers can navigate key checkpoints to select appropriate deconvolution strategies based on data availability, model assumptions, and specific research objectives.
Figure 4. A guide to cancer transcriptomic deconvolution: from study design to method selection.
(A) A decision tree guiding the selection of an appropriate transcriptomic analysis strategy based on available data. When internal single-cell RNA sequencing (scRNA-seq) data are accessible, researchers can directly investigate novel cell states, assess cell-type composition across conditions, or identify cell-type-specific signature genes, optionally integrating with bulk RNA-seq data if statistical power is insufficient or for external validation. Without internal scRNA-seq data, hypotheses can be generated using differential expression analysis (DEA) or gene set enrichment analysis (GSEA) from bulk RNA-seq cohorts, potentially augmented by publicly available scRNA-seq datasets. Subsequently, bulk RNA-seq deconvolution can be performed, and the analysis can be directed towards either a comprehensive characterization of the tumor microenvironment (TME) or a tumor cell-specific characterization. To validate TME-specific signals, researchers often employ techniques like immunohistochemistry (IHC) and cytometry by time-of-flight (CyTOF). To validate tumor cell intrinsic properties, such as stem-like cell states, researchers often use in vitro assays (for example, genetically modified cell lines) and in vivo models such as patient-derived xenografts (PDX) or genetically engineered mouse models (GEMMs). (B) A decision tree for selecting an appropriate deconvolution method once the need for deconvolution has been established. For TME cell analysis (blue shading), single-cell reference-based methods are recommended, with their accuracy dependent on the availability of tissue-matched bulk or single-cell data. For tumor cell analysis (red shading), reference-based methods, semi-reference-based methods, or reference-free approaches are available depending on data type. Semi-reference-based deconvolution methods are advantageous for studying tumor-intrinsic activities as they impose no constraints on tumor cell expression profiles.
Hence, even when scRNA-seq data are available, semi-reference-based methods should also be applied. Other more nuanced tumor-specific signals will likely be discovered through reference-free approaches. Finally, the availability of other measurements of the tumor samples to benchmark cell-type specific signals may be accounted for when prioritizing the deconvolution strategy.
After applying suitable deconvolution methods to bulk data, researchers can extract insights from the primary cellular or molecular data (for example, TME composition, tumor subtypes, and tumor-specific expression profiles) and then perform downstream analysis for clinical or biological investigations (e.g., prognostic biomarkers and treatment response predictors). The following sections explore these applications in detail, demonstrating how deconvolution bridges fundamental cancer biology and clinical decision-making.
Estimation of tumor microenvironment cellular composition
The TME is a complex interplay of immune cells, stromal cells, and extracellular matrix components that influence tumor progression, immune evasion, and response to therapy80. Deconvolution methods offer powerful tools to dissect this complexity and quantify the cellular composition of the TME.
Immune profiling through deconvolution is one of the most advanced applications in TME studies. Immune cells are characterized by relatively small transcriptomes and distinct gene expression patterns, reflecting their specialized functions81–83. For example, CD3D and CD3E are highly expressed in T cells, CD19 and membrane spanning 4-domains A1 (MS4A1, encoding CD20) in B cells, CD68 and CD163 in macrophages, and ITGAX (also known as CD11c) in dendritic cells. More detailed marker genes include CD8A and CD8B and granzyme B (GZMB) for cytotoxic T cells, CD4 and interferon gamma (IFNG) in helper T cells, forkhead box P3 (FOXP3) and cytotoxic T-lymphocyte associated protein 4 (CTLA4) for regulatory T cells, and CD38 and syndecan 1 (SDC1, also known as CD138) for plasma cells84. The application of reference-based deconvolution (Table 1) for immune profiling has been successful since its inception in 2016, where bulk expression profiles of individual immune cell types extracted from PBMCs have been used as reference datasets27,39. Dissecting specific immune cell types has provided critical insights into how various immune cells influence tumor behavior. For instance, using transcriptomic deconvolution approaches including CIBERSORT and immune gene expression signatures, six distinct immune subtypes across 10,000 tumors from 33 cancer types were identified, revealing the complexity of immune landscapes within tumors85. Using CIBERSORT27 to analyze tumor-associated leukocytes across a broad spectrum of human cancers, a pan-cancer map was constructed that identified specific immune cell populations86. This study highlighted the distinct roles of tumor-associated neutrophils and plasma cells in the tumor microenvironment.
Understanding the dynamics between tumor, immune, and stromal components is essential, but it is more difficult to quantify these components jointly. This complexity arises from the large transcriptome size and high transcriptional variability observed in fibroblasts, endothelial cells, and epithelial cells75. Tumor cell plasticity further introduces considerable variation (Box 1) in tumor-cell-specific expression, often violating the core assumption of reference-based methods that a mean reference profile reliably represents cell-type specific transcriptional activity. In contrast, semi-reference-based and reference-free methods have provided a more adaptable strategy for deconvolving these complex components, as they allow the tumor cell component to remain unknown, offering greater flexibility in analyzing stromal elements (Table 1). For example, DeMixT61 was successfully applied to head-and-neck squamous cell carcinoma samples from The Cancer Genome Atlas (TCGA) project to estimate proportions of tumor, stromal, and immune components using only normal tissues as reference for stromal cells, while keeping the tumor component unconstrained. Similarly, BayICE47 was successfully applied to deconvolve non-small-cell lung cancer samples by using reference profiles for immune cells and normal lung tissue, while leaving the highly variable tumor component unconstrained. These approaches demonstrate how semi-reference-based methods can overcome the limitations of reference-based approaches when analyzing the full complexity of the TME.
With the increasing availability of scRNA-seq data, researchers can now employ a two-step strategy: first analyzing scRNA-seq data to identify and annotate the major cell types present in a specific tissue and then using these annotated cell types to create reference profiles for deconvolution of bulk RNA-seq data from the same tissue type (Table 1). This approach enables the identification of tissue- and biology-relevant cell subtypes and provides more reasonable reference profiles for the matched tumor cells. In a study on pancreatic ductal adenocarcinoma, scRNA-seq was leveraged to pinpoint a distinct interleukin 1β (IL-1β)+ macrophage subpopulation marked by elevated expression of inflammatory genes such as IL1B, tumor necrosis factor (TNF), and CXC motif chemokine ligand 8 (CXCL8)87. By applying CIBERSORTx55 to deconvolve bulk transcriptomic datasets from pancreatic adenocarcinoma using single-cell annotations as a reference, the prevalence of this macrophage subtype was found to substantially contribute to the inflammatory landscape of the tumors.
Spatial transcriptomic technologies provide an important dimension to TME profiling by offering spatially resolved gene expression, enabling the investigation of the spatial organization and interactions between diverse cell types within the complex tumor architecture (Box 3). Deconvolution approaches for spatial transcriptomic data are valuable as they can reveal cellular composition and functional states while preserving crucial information about physical locations and proximity relationships between different cell populations. Advances in this rapidly evolving field will further enhance our understanding of tumor heterogeneity, cell–cell communication networks, and response to therapies in a native spatial context.
Box 3.
Integrating spatial context: deconvolution in spatial transcriptomics
While bulk RNA-seq deconvolution supports dissecting TME components at the patient level, recent advances in spatial transcriptomics [G] provide powerful tools to investigate the spatially-defined transcriptome197. Among the available platforms with high transcript throughput, several of the widely adopted spatial platforms (including the currently popular 10X Visium platform198,199) have multicellular resolution, where each bulk spatial unit, or ´spot´, contains multiple cells. Thus, for these data, computational tools for spot-level deconvolution to resolve the cellular composition within each spot are needed to refine the analysis of spatial heterogeneity in cellular architecture and tumor ecosystems54. These methods will facilitate the integration of spatially-resolved phenotypes with clinical outcomes, ultimately providing a more comprehensive understanding of the TME200.
Extending from bulk deconvolution methods, tools for spatial transcriptomics spot deconvolution require a bespoke design for sparse, spatially resolved data. These methods can be similarly categorized based on reference requirements201 and share similar methodological strengths and weaknesses (Figure). The computational frameworks underlying these approaches include probabilistic models that use statistical distributions to quantify uncertainty in cell-type assignments, NMF that decomposes gene expression matrices while maintaining biological interpretability, linear regression approaches that model spot expression as weighted combinations of cell-type profiles, and deep learning methods that capture complex non-linear relationships between expression patterns and cellular composition.
Reference-based approaches utilize scRNA-seq data to deconvolve spots into fractions of different cell types202 or map single cells to individual spots203. Semi-reference-based204 and reference-free205 approaches focus on cell fraction deconvolution when reference data are limited or incomplete. Given the nascency of this field, limited biological discoveries using these methods have been reported. In one notable example, Ma et al.32 uncovered the distribution of subtypes of cancer-associated fibroblasts and their role in shaping an immunosuppressive TME. However, most studies utilizing this approach remain exploratory with at least a 10-fold smaller sample size at the patient level, indicating that the field is still in the early stages of direct clinical translation206,207. Integrating spatial transcriptomics data with data rich in sample size and clinical information, e.g., bulk RNA-seq, may offer broader applications. For a more comprehensive exploration of spot-deconvolution technologies, benchmarking, and applications, readers are directed to recent reviews151,200,208–211.
Identifying subtype-specific features through deconvolution
Deconvolution techniques can be employed to stratify tumors into distinct subtypes, driven by either intrinsic tumor cell properties or by interactions between tumor cells and their microenvironment. While traditional clustering of molecular profiles across patients for subtyping has been very successful88,89 — exemplified by methods like the well-established breast cancer subtyping method prediction analysis microarray 50 (PAM50)90 and consensus molecular subtypes (CMS)91 for colorectal cancer — deconvolution offers the potential to uncover new subtypes through more granular analysis of cellular composition that is often masked in bulk expression profiles.
To subtype tumors based on the TME, or to identify tumor ecosystem subtypes, methods that accurately quantify the complex cellular interactions within the TME are essential. Reference-based methods, such as CIBERSORT27 and its single-cell-based version, CIBERSORTx55, are widely used for estimating cell-type proportions using cell-type-specific gene signatures. CIBERSORTx55 offers additional capabilities, including the identification of multicellular patterns, referred to as ´ecotypes´ within tumors. These ecotypes are recurring combinations of cell types and states that represent unique cellular environments that correlate with specific tumor subtypes and biological behaviors.
For instance, applying CIBERSORTx55 and DWLS43 to breast cancer samples identified nine ecotypes, each defined by distinct combinations of tumor, immune, and stromal components92. Ecotype 3 was characterized by a predominance of basal-like and highly proliferative tumor cells, suggesting a more aggressive tumor phenotype, while Ecotype 4 featured substantial infiltration of anti-tumor immune cells, indicating a more immune-active tumor microenvironment92. These findings underscore how deconvolution can reveal tumor subtypes based on their cellular ecosystem.
While deconvolution can reveal novel tumor ecosystem subtypes, their identification typically serves as a steppingstone toward clinical applications. Most studies correlate these cellular composition-based subtypes with patient prognosis and treatment response87,93, as tumor architecture directly influences biological behavior and therapeutic vulnerability.
Extraction of tumor cell RNA expression levels
Identifying cancer-intrinsic features requires tools that specifically output tumor-specific expression profiles, separating tumor signals from the complex mixture of stromal and immune components. Methods such as DeClust71, DeMixT61, ISOpureR62, CIBERSORTx55, and BayesPrism46 are particularly useful in this regard, as they separate cancer cell RNA expression levels from confounding non-cancerous cell expression levels (Table 1), thereby reducing inter-sample heterogeneity and increasing statistical power.
Deconvolution methods that output tumor-specific RNA expression typically produce three types of data: total tumor mRNA abundance, representing the overall transcriptional activity of malignant cells; patient-specific gene-level expression profiles, showing individual tumor expression patterns; and population-average gene-level expression profiles, representing the mean tumor expression patterns across samples. These different outputs serve complementary purposes in cancer research. Methods like DeMixT61 focus primarily on estimating tumor mRNA proportions (the fraction of total mRNA derived from tumor cells, rather than cellular proportions), with tumor-specific expression for each patient as a secondary output, whereas ISOpureR62 is optimized for population-average tumor expression profiles only. CIBERSORTx55 and BayesPrism46 provide both patient-specific and population-average expression estimates across various cell types. Table 1 shows additional methods with diverse statistical approaches, each offering different advantages depending on the research question.
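The difference between cellular proportions and mRNA proportions can be made explicit with a small worked example. The sketch assumes that each cell type contributes mRNA in proportion to its cell count multiplied by an average per-cell mRNA content. The function name and the per-cell values are hypothetical and are not taken from any cited method.

```python
# Hypothetical sketch: converting cell-count fractions into mRNA (transcript) fractions.

def transcript_proportions(cell_fractions, rna_per_cell):
    """Weight each cell-count fraction by its average per-cell mRNA content.

    cell_fractions: dict cell_type -> fraction of cells (sums to 1).
    rna_per_cell: dict cell_type -> relative mRNA content per cell (arbitrary units).
    """
    weighted = {k: cell_fractions[k] * rna_per_cell[k] for k in cell_fractions}
    total = sum(weighted.values())
    return {k: v / total for k, v in weighted.items()}


# Invented example: equal numbers of tumor and immune cells, but each tumor cell
# is assumed to carry three times as much mRNA as an immune cell.
cell_fractions = {"tumor": 0.5, "immune": 0.5}
rna_per_cell = {"tumor": 3.0, "immune": 1.0}
mrna_fractions = transcript_proportions(cell_fractions, rna_per_cell)
print(mrna_fractions)
```

Under these assumed values a sample that is half tumor by cell count is three-quarters tumor by mRNA. This is why estimates from expression-based deconvolution and from cell-counting assays need not agree.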
This capability enables identification of distinct molecular signatures corresponding to established tumor subtypes, providing a more accurate representation of tumor-intrinsic characteristics. For example, applying ISOpure62 to bulk gene expression data from breast cancer samples separated the gene expression profiles of tumor cells from those of non-tumor cells94. The study discovered that the PAM50 genes90 exhibited different expression patterns between tumor-specific and non-tumor mRNA profiles, with tumor-specific mRNA profiles providing clearer subtype separation.
A method to determine tumor-specific mRNA expression levels (TmS), an application extended from DeMixT61, revealed that higher tumor-specific transcript expression is associated with epithelial-to-mesenchymal transition, stemness signatures, hypoxia, active metabolic pathways, increased abundance of open-chromatin, and more frequent TP53 driver mutations, whereas tumor subtypes with low TmS scores display less aggressive features across cancers75. By isolating tumor-specific expression profiles, the confounding effects of non-tumor components are minimized, offering a more precise representation of tumor biology that can reveal previously masked associations with genomic alterations, biological pathways, and clinical outcomes.
Predicting patient outcomes and treatment responses
Building upon the insights generated through deconvolution, approaches for predicting patient outcomes and treatment responses can be developed, enabling the translation of these insights into clinically actionable information. This represents one of the major applications that deconvolution is uniquely suited for.
Specific cell-type proportions within tumors, as determined by deconvolution, can serve as prognostic markers. For example, using TmS75, with expression data further adjusted for tumor purity and ploidy, high tumor-specific total mRNA expression was shown to be a key marker associated with tumor cell plasticity and disease progression. Additionally, using CIBERSORT27, higher proportions of CD8+ T cells were linked to improved survival across multiple cancer types, and using PREDE63, higher macrophage proportions were found to correlate with worse survival in breast and bladder cancers95. Moreover, the neutrophil-to-lymphocyte ratio, as calculated by CIBERSORT27, has been consistently associated with poor prognosis across various cancer types86. Some deconvolution tools also extend these analyses beyond cell proportions to include variations in phenotypic cell states. For instance, using BayesPrism, shifts in macrophage states that correlate with clinical outcomes in cutaneous melanoma and glioblastoma were identified46.
The integration of single-cell discovery with bulk validation through deconvolution has emerged as a powerful translational approach that leverages the high resolution of single-cell data while overcoming its limited sample size. As mentioned earlier, a novel IL-1β+ macrophage population was first identified through scRNA-seq analysis, and its presence and clinical relevance were then validated in larger bulk cohorts using CIBERSORTx87. This macrophage subtype, characterized by elevated expression of inflammatory genes (IL1B, TNF, CXCL8), was shown to significantly contribute to the pathogenic inflammatory landscape of pancreatic tumors, establishing it as a potential therapeutic target for reducing tumor-promoting inflammation. Similarly, in hepatocellular carcinoma, unique ´onco-fetal´ cell populations, including EPCAM+ onco-fetal hepatocytes and PDGFRA+ onco-fetal fibroblasts, were identified93. Through deconvolution of bulk hepatocellular carcinoma datasets, a higher proportion of these onco-fetal cells was found to correlate with an increased risk of relapse and poorer response to PD-1 blockade immunotherapy, emphasizing the potential of scRNA-seq-based deconvolution in uncovering clinically relevant tumor subtypes.
While immunotherapy has transformed cancer treatment with remarkable responses in some patients, identifying those who will benefit remains a critical clinical challenge, as current response rates vary widely across cancer types. Deconvolution tools are particularly valuable in monitoring treatment response and predicting therapy outcomes, especially for immunotherapies where the TME composition directly impacts efficacy. CIBERSORTx55 has been used to define cell types that predict response to immunotherapy across different cancer types96–100, with multiple studies demonstrating that pre-treatment immune profiles can stratify responders from non-responders. Other deconvolution tools such as MIXTURE101, Kassandra102, and ProM103 have shown promise in predicting immunotherapy outcomes. A recent comprehensive transcriptome deconvolution analysis104 using non-negative least squares regression across 1,486 tumors from multiple cancer datasets, including TCGA cohorts and immune checkpoint blockade-treated patients, demonstrated that stromal — rather than cancer — cell expression of PDL1, CXCL9, and CXCL13 served as key determinants of immune checkpoint inhibitor response. This finding challenges the conventional focus on tumor cell biomarkers and highlights how deconvolution can reveal cell-type-specific contributions to treatment response by separating cancer and stromal cell expression signatures from bulk tumor transcriptomes.
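To make the non-negative least squares idea mentioned above concrete, the following is a minimal sketch in Python. It is not the pipeline of the cited study104; it only illustrates the generic linear mixing model in which a bulk expression vector is approximated as a non-negative combination of reference profiles. The signature matrix, the three cell types and the function name estimate_fractions are hypothetical and chosen purely for illustration; scipy.optimize.nnls is the only library routine used.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_fractions(signature, bulk):
    """Estimate cell-type fractions for one bulk sample by NNLS.

    signature: (genes x cell_types) reference expression matrix.
    bulk: (genes,) mixed expression vector for one sample.
    """
    # Solve min ||signature @ w - bulk||_2 subject to w >= 0.
    coef, _residual_norm = nnls(signature, bulk)
    total = coef.sum()
    # Rescale weights so they sum to 1; leave an all-zero solution unchanged.
    return coef / total if total > 0 else coef


# Toy, made-up reference: 4 genes x 3 hypothetical cell types.
signature = np.array([
    [10.0, 1.0, 0.5],
    [0.5, 8.0, 1.0],
    [1.0, 0.5, 9.0],
    [5.0, 5.0, 5.0],
])
true_fractions = np.array([0.6, 0.3, 0.1])
bulk = signature @ true_fractions  # noiseless synthetic mixture

fractions = estimate_fractions(signature, bulk)
```

On this noiseless toy mixture the estimated fractions recover the simulated ones. Real tumor data add noise, platform effects and cell types missing from the reference, which is why the choice of method and reference discussed throughout this Review matters.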
Future directions
Transcriptomic deconvolution holds immense potential to further advance both cancer research and clinical practice. Progress in the field will depend on improving two factors: the translation of deconvolution tools into clinical applications and the continued refinement of the underlying methodologies.
Clinical application of deconvolution
Deconvolution methods are increasingly being integrated into clinical research as exploratory tools to understand tumor microenvironment dynamics and treatment responses. Currently, several clinical trials employ these approaches as secondary or correlative analyses across three main applications: (i) monitoring TME dynamics in response to treatment, such as in triple-negative breast cancer (NCT03979508105) and metastatic colorectal cancer (NCT06522919106); (ii) comparing genotypes at diagnosis and at relapse, as in early-stage follicular lymphoma (NCT05929222107); and (iii) profiling immune cell composition to inform treatment decisions, including a trial in colorectal cancer (NCT03827967108) and a prospective analysis of acute myeloid leukemia and sarcoma (NCT06764589109).
Among these trials, only CIBERSORT27 and CIBERSORTx55 were explicitly specified (NCT03979508, NCT05929222, NCT03827967105,107,108), which likely reflects the nascent stage of clinical application of deconvolution tools. While currently serving as secondary outcome measures, the incorporation of deconvolution tools in these trials highlights both the growing recognition of their potential value and the existing barriers to clinical implementation. Key challenges include the lack of standardized protocols for different sample types, uncertainty regarding optimal method selection for specific tumor types and clinical questions, and the absence of validated clinical interpretation thresholds for deconvolution-derived metrics.
The successful clinical translation of deconvolution tools will require addressing these implementation gaps through multi-disciplinary collaborative efforts: establishing standardized workflows, validating method performance across diverse clinical sample types, and establishing evidence-based guidelines for interpreting deconvolution results in clinical contexts. These efforts will be essential for transitioning these analytical tools from exploratory research to clinical decision support, enabling applications in prognostic biomarker discovery and treatment stratification.
One important ongoing step in adapting deconvolution methods for clinical use is extending their application to formalin-fixed paraffin-embedded (FFPE) samples, as this is the standard for storing clinical samples. Translating deconvolution approaches to clinical practice therefore requires addressing practical tissue preservation challenges. FFPE samples constitute the vast majority of archived clinical specimens with long-term follow-up data, making them invaluable for translational cancer research. However, the fixation process causes RNA fragmentation, chemical modifications, and cross-linking, resulting in reduced fidelity of signals, decreased mapping ability, increased duplicate rates, 3’ bias in gene coverage, and significantly different transcriptomic profiles compared to fresh-frozen tissues110–112, all of which complicate the deconvolution process. The developers of several deconvolution methods like Kassandra102, CIBERSORT27 and CIBERSORTx55 emphasized the potential application of these techniques to FFPE samples. The performance of CIBERSORT and CIBERSORTx across tissue preservation techniques was evaluated in their original studies27,55. These approaches could be useful when fresh-frozen tissues are not readily available, especially for retrospective bulk RNA-sequencing. The deconvolution resolution may remain limited, since success has been primarily reported in deconvolving major cell types (such as leukocytes) in FFPE samples. While initial attempts at deconvolving FFPE-derived bulk RNA sequencing data in a cancer setting have been made113, independent benchmarking of deconvolution methods for FFPE samples remains limited to non-cancer settings114. As FFPE-derived bulk RNA deconvolution is notably affected by sample quality and is still at an early stage of development, optimization strategies are primarily focused on experimental design.
For FFPE transcriptomic profiling, a DV200 score (the percentage of RNA fragments >200 nucleotides) of ≥30% has been shown to be a reliable quality control metric for assessing RNA integrity115, and protocols with extended proteinase K digestion improve extraction yields116. Researchers are advised to use either direct hybridization platforms (for example, NanoString nCounter117,118) or FFPE-optimized library preparation kits (for example, Illumina TruSeq RNA Exome, Ion AmpliSeq119 and the 3’ RNA-seq method Lexogen QuantSeq120) to accommodate degraded RNA and enable reliable expression profiling.
Once FFPE specimen transcriptomes are stably and accurately profiled, deconvolution algorithms will still need to be benchmarked for FFPE and optimized to mitigate the technical differences across platforms and tissue preservation techniques52,53,55. Studies have demonstrated that batch-correction methods like ComBat121 and limma122 can successfully integrate data across both sequencing technologies and sample types123, enabling meta-analysis of archival and fresh-frozen tissue data, although the effectiveness of batch correction may depend on FFPE sample quality metrics like DV200. Future work may involve building error models that are trained on paired fresh-frozen and FFPE samples124, or integrating molecular barcoding techniques that mitigate degradation effects114,125. Successfully adapting deconvolution for FFPE data would enable retrospective studies on the vast number of available samples associated with long-term clinical outcomes and their incorporation into clinical workflows, offering deeper insights into disease progression and treatment response.
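As a small illustration of the DV200 quality check described above, the sketch below computes the percentage of RNA signal in fragments longer than 200 nucleotides from a simplified fragment-size table and compares it with the ≥30% cutoff. It assumes the fragment-size distribution has already been summarized as (size, abundance) pairs; in practice, instrument software derives DV200 from electropherogram traces. The function names dv200 and passes_dv200 and the example values are hypothetical.

```python
def dv200(fragment_sizes_nt, abundances):
    """Percentage of RNA signal in fragments longer than 200 nucleotides."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    # Sum the signal from fragments strictly longer than 200 nt.
    above = sum(a for size, a in zip(fragment_sizes_nt, abundances) if size > 200)
    return 100.0 * above / total


def passes_dv200(score, threshold=30.0):
    """True if the DV200 score meets the quality cutoff (default: >= 30%)."""
    return score >= threshold


# Made-up fragment-size summary for one FFPE RNA sample.
sizes = [50, 150, 250, 500, 1000]        # fragment length in nucleotides
signal = [20.0, 30.0, 25.0, 15.0, 10.0]  # relative abundance per size bin

score = dv200(sizes, signal)
sample_ok = passes_dv200(score)
```

Keeping the threshold as a parameter reflects that a suitable cutoff may differ between library preparation protocols; the 30% default simply mirrors the value cited in the text.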
The incorporation of machine learning into deconvolution methodologies represents another promising direction for enhancing both accuracy and interpretability, particularly for challenging sample types like FFPE. Machine learning models can identify complex, nonlinear patterns in gene expression data that traditional approaches might not capture. Integrating machine learning into deconvolution workflows could substantially improve analytical precision and uncover new biomarkers or therapeutic targets. For example, when sufficient training data is available, machine learning can directly predict cell-type-specific outputs from raw mixed data, bypassing traditional deconvolution steps49,102. Several existing deconvolution methods leverage machine learning techniques, including the supervised learning framework in Kassandra102, ensemble deep learning in Scaden49, and tissue-specific neural networks like GBMPurity126 for specialized applications. Emerging approaches include graph neural networks for cross-platform compatibility127 and uncertainty quantification methods for improved reliability128, with substantial opportunities remaining for novel architectures and training strategies. Additionally, such models could be particularly valuable for FFPE samples, where they might learn to correct degradation-specific artifacts. However, machine learning applications present inherent risks, including overfitting, reliance on poorly annotated training data, and the ‘black box’ nature of complex models129. Careful validation and development of transparent, explainable models will be essential to fully harness the potential of machine learning in cancer biology.
Methodological improvements
Advancing the technical capabilities of deconvolution methods is crucial, particularly in the context of temporal dynamics within tumors. Tumors evolve over time in response to environmental pressures, immune surveillance, and therapeutic interventions, whereas traditional deconvolution methods provide only a static snapshot of these processes. By integrating time-series data, temporal deconvolution models could reveal how cell populations within tumors change over time, offering insights into the development of treatment resistance. However, this approach presents substantial methodological challenges, including the need to manage nonlinear dynamics and the inherent variability of longitudinal data. Addressing these challenges will be essential for accurately capturing the temporal evolution of tumors, thereby refining the precision of deconvolution analyses.
There is a long-standing interest in advancing deconvolution techniques to resolve cell-type-specific gene expression for each patient sample. Most available methods provide mean expression profiles for cell types (Table 1), which can mask critical sample variation caused by intra-cell-type heterogeneity, a well-known feature of multiple cell types in tumor tissues. This limitation has already driven the development of advanced methods capable of estimating cell-type-specific expression at the patient level and identifying statistically significant differentially expressed genes within these subpopulations, such as BayesPrism46, CIBERSORTx55, DeMix60, and DeMixT61. Future efforts are needed to refine these methods in order to uncover cell populations with variably expressed genes that play crucial roles in treatment resistance or metastasis. Achieving this level of precision in deconvolution analysis will require novel computational approaches that balance the need for detailed resolution with the necessity of producing reliable, biologically meaningful insights. As these techniques evolve, ensuring that they avoid common pitfalls, such as overfitting or misinterpretation, will be essential to their successful implementation in cancer research.
Further potential for advancement of deconvolution methods lies in the incorporation of biological principles that have not yet been fully explored. Specifically for deconvolution of cancer samples, somatic copy number alterations are an important driver of cancer cell gene expression130, and this information-rich signal is still relatively unexplored in existing deconvolution methods. Additionally, the emergence of atlases incorporating methylation or assay for transposase-accessible chromatin with sequencing (ATAC-seq) data presents an exciting opportunity to integrate transcriptomic deconvolution with epigenetic data131. Just as tools like epiSCORE132 can deconvolve bulk methylation data using scRNA-seq data, there is growing potential to leverage epigenetic data for the deconvolution of transcriptional profiles. Overall, approaches that integrate additional data types (such as DNA-seq or epigenetic data) with RNA-seq data to identify synergistic signals are likely an important avenue for further deconvolution method development75,133.
Expanding the scope of deconvolution methods to include the full diversity of RNA species offers both technical challenges and clinical opportunities. Human cells contain tens of thousands of distinct RNA molecules, and current single-cell sequencing methods do not comprehensively capture the diversity of non-coding RNAs, such as long non-coding RNAs (lncRNAs), miRNAs, and circular RNAs (circRNAs)131. Hence, extending deconvolution techniques to these RNA species could provide a more comprehensive understanding of the regulatory networks driving tumor development. However, these RNA species present unique challenges, such as diverse expression patterns and stability, complicating their analysis. Recent technological advances such as Smart-seq-total134 and parallel single-cell small RNA sequencing (PSCSR-seq)135 can overcome some of these limitations and may be used to create robust reference sets for non-coding and small RNA species. Developing specialized reference datasets and analytical frameworks will be critical for accurately deconvolving these RNA types, thereby enhancing our understanding of their roles in tumor biology and translating these insights into clinical biomarkers and therapeutic targets.
Conclusions
With more than 40 methods published over the past 15 years, the progress in the field of bulk transcriptomic deconvolution shows no sign of slowing down. Building upon the notable amount of work in the field, we have developed a guide for cancer researchers to navigate through many possible routes of deconvolution and maximally utilize bulk RNA-seq data. In the context of the TME, researchers primarily interested in understanding the immune environment in a broad sense can readily benefit from advances made by many reference-based deconvolution methods. Those who are interested in the dynamic ecosystem of tumor-stromal-immune cells may utilize either single-cell reference-based or semi-reference-based methods, depending on the specific question, available prior knowledge, and experimental means for validation for the cancer type of interest. In the context of patient subtyping or prediction of prognosis or response to treatment, although a bird’s-eye-view ecosystem characterization can be informative, there remain significant gaps in knowledge to be filled to accurately characterize tumor and stromal (that is, non-immune) compartments through deconvolution. These questions are currently best addressed using semi-reference-based or specially designed single-cell reference-based methods. Reference-free methods are gaining popularity again, though their application to cancer research has yet to gain momentum.
The future of transcriptomic deconvolution in cancer research is full of potential, and realizing this potential will require addressing substantial remaining technical and conceptual challenges, particularly in the integration of other data modalities in a question-specific fashion. Major efforts are needed to generate biologically meaningful reference and benchmark data. By refining existing methods, developing new approaches, and thoughtfully considering the balance between innovation and practicality, the field will continue to advance our understanding of cancer biology, ultimately leading to more effective and personalized cancer therapies.
Supplementary Tables
Glossary
- Deconvolution
A computational method that separates bulk tissue RNA sequencing data into individual cell type contributions by estimating the proportion and gene expression of each cell type present in the sample.
- Expression matrix
A data structure where rows represent genes and columns represent samples, with each cell containing the measured expression level of a specific gene in a specific sample.
- Non-negative least squares
A mathematical optimization technique that finds the best-fitting solution to a linear equation system while constraining all values to be zero or positive, commonly used in deconvolution to ensure biologically meaningful cell proportions.
- Bayesian models
Statistical frameworks that incorporate prior knowledge or assumptions about parameters and update these beliefs based on observed data to make probabilistic inferences.
- Pseudo-bulk
Measurement data obtained by summing the expression counts of individual cells across all cells, for each gene and each sample.
- Guided topic modeling approach
A machine learning method adapted from natural language processing that identifies topics (cell types) in documents (samples) by discovering patterns of co-occurring words (genes).
- Iterated Conditional Modes (ICM)
An optimization algorithm that iteratively updates each parameter set while keeping others fixed, commonly used to find maximum likelihood estimates in complex statistical models.
- Dirichlet-multinomial mixture
A probabilistic model that combines Dirichlet distributions (which model proportions) with multinomial distributions (which model counts) to handle over-dispersed count data.
- Non-negative Matrix Factorization
A mathematical technique that decomposes a matrix into two or more matrices with non-negative elements, often used to identify hidden patterns or reduce dimensionality in data.
- Linear subspace identification
A mathematical approach that finds lower-dimensional linear spaces within high-dimensional data that capture the essential variation or structure.
- Compressed sensing techniques
Mathematical methods that reconstruct signals or data from fewer measurements than traditionally required by exploiting sparsity or structure in the data.
- Shift-invariance
A mathematical property where the output of a system remains unchanged when the input is shifted in time, space, or another dimension.
- Convex analysis
A branch of mathematics dealing with convex sets and functions, providing theoretical foundations for optimization problems with guaranteed global solutions.
- Black box
A computational method or algorithm whose internal workings are not transparent or easily interpretable; only the inputs and outputs are observable.
- Phenotypic plasticity
The ability of cells to change their observable characteristics, gene expression patterns, or functional states in response to environmental conditions without altering their genetic code.
- Capture efficiency
The proportion of RNA molecules from a cell that are successfully captured, reverse-transcribed, and sequenced during single-cell RNA sequencing experiments.
- Dropout effects
The phenomenon in single-cell RNA sequencing where genes that are expressed in a cell fail to be detected, resulting in false zero counts in the expression data.
- Multicollinearity
A statistical condition where two or more predictor variables in a model are highly correlated, making it difficult to determine the individual contribution of each variable.
- LASSO regularization
A statistical technique that adds a penalty term to regression models to prevent overfitting by forcing some coefficients to become exactly zero, effectively performing feature selection.
- Background noise
Random variation or systematic artifacts in measurement data that are not related to the biological signal of interest, providing positive baseline measurements when no biological signals are present.
- Spatial transcriptomics
A technology that measures gene expression while preserving information about the physical location of cells or tissue regions, enabling analysis of spatial organization and cell-cell interactions.
Acknowledgements
P.V.L. is a CPRIT Scholar in Cancer Research and acknowledges CPRIT grant support (RR210006). C.C. and P.V.L. were supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (CC2008), the UK Medical Research Council (CC2008), and the Wellcome Trust (CC2008). Y.D., M.D.M. and W.W. are supported by NCI (R01CA286380). S.G. and W.W. are supported by the U.S. Department of Defense (PC210079).
Footnotes
Competing interests
The authors declare no competing interests.
Author contributions
W.W. conceived the review scope and framework, an outline of the content and the figures. All authors conducted comprehensive literature searches on deconvolution methods and applications. Y.D. contributed to all sections, while S.G. contributed to the single-cell-based deconvolution section, M.D.M. contributed to the semi-reference-based methods section, C.C. contributed to the reference-free methods part, and Y.P. contributed to the clinical study descriptions and FFPE sections. S.G. and Y.D. contributed to the design and creation of figures, while Y.D. and C.C. contributed to summarizing tables and designing the structure of the tables. All authors contributed to writing the article. W.W. and P.V.L. reviewed and edited the manuscript. W.W. and P.V.L. jointly supervised the project.
Peer review information
Nature Reviews Cancer thanks [Referee#1 name], [Referee#2 name] and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Related links
cbioportal: https://www.cbioportal.org/
Gene Expression Omnibus (GEO): https://www.ncbi.nlm.nih.gov/geo/
Curated cancer cell atlas (3CA): https://www.weizmann.ac.il/sites/3CA/
Tumor immune Single-cell Hub 2 (TISCH2): http://tisch.comp-genomics.org/
References
- 1.Regev A, et al. The human cell atlas. eLife. 2017 doi: 10.7554/eLife.27041. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Alwine JC, Kemp DJ, Stark GR. Method for detection of specific RNAs in agarose gels by transfer to diazobenzyloxymethyl-paper and hybridization with DNA probes. Proc Natl Acad Sci U S A. 1977;74:5350–5354. doi: 10.1073/pnas.74.12.5350. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Higuchi R, Fockler C, Dollinger G, Watson R. Kinetic PCR Analysis: Real-time Monitoring of DNA Amplification Reactions. Bio/Technology. 1993;11:1026–1030. doi: 10.1038/nbt0993-1026. [DOI] [PubMed] [Google Scholar]
- 4.Pardue ML, Gall JG. Molecular hybridization of radioactive DNA to the DNA of cytological preparations. Proc Natl Acad Sci U S A. 1969;64:600–604. doi: 10.1073/pnas.64.2.600. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.John HA, Birnstiel ML, Jones KW. RNA-DNA hybrids at the cytological level. Nature. 1969;223:582–587. doi: 10.1038/223582a0. [DOI] [PubMed] [Google Scholar]
- 6.Fulwyler MJ. Electronic Separation of Biological Cells by Volume. Science. 1965;150:910–911. doi: 10.1126/science.150.3698.910. [DOI] [PubMed] [Google Scholar]
- 7.Arrigucci R, et al. FISH-Flow, a protocol for the concurrent detection of mRNA and protein in single cells using fluorescence in situ hybridization and flow cytometry. Nat Protoc. 2017;12:1245–1260. doi: 10.1038/nprot.2017.039. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995;270:467–470. doi: 10.1126/science.270.5235.467. [DOI] [PubMed] [Google Scholar]
- 9.Perou CM, et al. Molecular portraits of human breast tumours. Nature. 2000 doi: 10.1038/35021093. [DOI] [PubMed] [Google Scholar]
- 10.Poulsen CB, et al. Microarray-based classification of diffuse large B-cell lymphoma. Eur J Haematol. 2005;74:453–465. doi: 10.1111/j.1600-0609.2005.00429.x. [DOI] [PubMed] [Google Scholar]
- 11.Brenner S, et al. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol. 2000;18:630–634. doi: 10.1038/76469. [DOI] [PubMed] [Google Scholar]
- 12.Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63. doi: 10.1038/nrg2484. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Trapnell C, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28:511–515. doi: 10.1038/nbt.1621. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Weinstein JN, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013;45:1113–1120. doi: 10.1038/ng.2764. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Patel AP, et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science. 2014;344:1396–1401. doi: 10.1126/science.1254257. [This is one of the earlier works that uses single-cell RNA-seq to demonstrate a high degree of heterogeneity in both cell types and cell states within a primary cancer type] [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Puram SV, et al. Single-Cell Transcriptomic Analysis of Primary and Metastatic Tumor Ecosystems in Head and Neck Cancer. Cell. 2017;171:1611–1624.:e24. doi: 10.1016/j.cell.2017.10.044. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Azizi E, et al. Single-Cell Map of Diverse Immune Phenotypes in the Breast Tumor Microenvironment. Cell. 2018;174:1293–1308.:e36. doi: 10.1016/j.cell.2018.05.060. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Papalexi E, Satija R. Single-cell RNA sequencing to explore immune cell heterogeneity. Nat Rev Immunol. 2018;18:35–45. doi: 10.1038/nri.2017.76. [DOI] [PubMed] [Google Scholar]
- 19.Klein AM, et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell. 2015 doi: 10.1016/j.cell.2015.04.044. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Macosko EZ, et al. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell. 2015;161:1202–1214. doi: 10.1016/j.cell.2015.05.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Hanahan D. Hallmarks of Cancer: New Dimensions. Cancer Discov. 2022;12:31–46. doi: 10.1158/2159-8290.CD-21-1059. [DOI] [PubMed] [Google Scholar]
- 22.Hölzel M, Bovier A, Tüting T. Plasticity of tumour and immune cells: a source of heterogeneity and a cause for therapy resistance? Nat Rev Cancer. 2013;13:365–376. doi: 10.1038/nrc3498. [DOI] [PubMed] [Google Scholar]
- 23.Meacham CE, Morrison SJ. Tumour heterogeneity and cancer cell plasticity. Nature. 2013;501:328–337. doi: 10.1038/nature12624. [This is an earlier review of phenotypic and functional heterogeneity among cancer cells within the same tumor, with implications for cancer stem-cell models as well as therapy resistance and disease progression] [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Pérez-González A, Bévant K, Blanpain C. Cancer cell plasticity during tumor progression, metastasis and response to therapy. Nat Cancer. 2023;4:1063–1082. doi: 10.1038/s43018-023-00595-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Luo H, et al. Pan-cancer single-cell analysis reveals the heterogeneity and plasticity of cancer-associated fibroblasts in the tumor microenvironment. Nat Commun. 2022;13:6619. doi: 10.1038/s41467-022-34395-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.DuPage M, Bluestone JA. Harnessing the plasticity of CD4+ T cells to treat immune-mediated disease. Nat Rev Immunol. 2016;16:149–163. doi: 10.1038/nri.2015.18. [DOI] [PubMed] [Google Scholar]
- 27.Newman AM, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods. 2015;12:453–457. doi: 10.1038/nmeth.3337. [This is a seminal study to characterize the fraction of multiple immune and non-immune cell types across multiple cancer types by leveraging bulk-based references] [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Racle J, Gfeller D. In: Bioinformatics for Cancer Immunotherapy: Methods and Protocols. Boegel S, editor. New York, NY: Springer US; 2020. EPIC: A Tool to Estimate the Proportions of Different Cell Types from Bulk Gene Expression Data; pp. 233–248. [DOI] [PubMed] [Google Scholar]
- 29.Wang X, Park J, Susztak K, Zhang NR, Li M. Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat Commun. 2019 doi: 10.1038/s41467-018-08023-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Hanahan D, Weinberg RA. Hallmarks of cancer: The next generation. Cell. 2011 doi: 10.1016/j.cell.2011.02.013. [DOI] [PubMed] [Google Scholar]
- 31.Quail DF, Joyce JA. Microenvironmental regulation of tumor progression and metastasis. Nat Med. 2013;19:1423–1437. doi: 10.1038/nm.3394. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Ma C, et al. Pan-cancer spatially resolved single-cell analysis reveals the crosstalk between cancer-associated fibroblasts and tumor microenvironment. Mol Cancer. 2023;22:170. doi: 10.1186/s12943-023-01876-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Khaliq AM, et al. Spatial transcriptomic analysis of primary and metastatic pancreatic cancers highlights tumor microenvironmental heterogeneity. Nat Genet. 2024;56:2455–2465. doi: 10.1038/s41588-024-01914-4. [DOI] [PubMed] [Google Scholar]
- 34.Valdeolivas A, et al. Profiling the heterogeneity of colorectal cancer consensus molecular subtypes using spatial transcriptomics. Npj Precis Oncol. 2024;8:1–16. doi: 10.1038/s41698-023-00488-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Wang F, et al. Single-cell and spatial transcriptome analysis reveals the cellular heterogeneity of liver metastatic colorectal cancer. Sci Adv. 2023;9:eadf5464. doi: 10.1126/sciadv.adf5464. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Zhong Y, Wan Y-W, Pang K, Chow LML, Liu Z. Digital sorting of complex tissues for cell type-specific gene expression profiles. BMC Bioinformatics. 2013;14:89. doi: 10.1186/1471-2105-14-89. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Aran D, Hu Z, Butte AJ. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 2017;18:220. doi: 10.1186/s13059-017-1349-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Monaco G, et al. RNA-Seq Signatures Normalized by mRNA Abundance Allow Absolute Deconvolution of Human Immune Cell Types. Cell Rep. 2019;26:1627–1640.e7. doi: 10.1016/j.celrep.2019.01.041.
- 39.Li T, et al. TIMER: A Web Server for Comprehensive Analysis of Tumor-Infiltrating Immune Cells. Cancer Res. 2017;77:e108–e110. doi: 10.1158/0008-5472.CAN-17-0307.
- 40.Finotello F, et al. Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data. Genome Med. 2019;11:1–20. doi: 10.1186/s13073-019-0638-6.
- 41.Aliee H, Theis FJ. AutoGeneS: Automatic gene selection using multi-objective optimization for RNA-seq deconvolution. Cell Syst. 2021;12:706–715.e4. doi: 10.1016/j.cels.2021.05.006.
- 42.Jew B, et al. Accurate estimation of cell composition in bulk expression through robust integration of single-cell information. Nat Commun. 2020;11:1971. doi: 10.1038/s41467-020-15816-6.
- 43.Tsoucas D, et al. Accurate estimation of cell-type composition from gene expression data. Nat Commun. 2019. doi: 10.1038/s41467-019-10802-z.
- 44.Fan J, et al. MuSiC2: cell-type deconvolution for multi-condition bulk RNA-seq data. Brief Bioinform. 2022;23:bbac430. doi: 10.1093/bib/bbac430.
- 45.Dong M, et al. SCDC: bulk gene expression deconvolution by multiple single-cell RNA sequencing references. Brief Bioinform. 2021;22:416–427. doi: 10.1093/bib/bbz166.
- 46.Chu T, Wang Z, Pe’er D, Danko CG. Cell type and gene expression deconvolution with BayesPrism enables Bayesian integrative analysis across bulk and single-cell RNA sequencing in oncology. Nat Cancer. 2022;3:505–517. doi: 10.1038/s43018-022-00356-3.
- 47.Tai A-S, Tseng GC, Hsieh W-P. BayICE: A Bayesian hierarchical model for semireference-based deconvolution of bulk transcriptomic data. Ann Appl Stat. 2021;15:391–411.
- 48.Kang K, Huang C, Li Y, Umbach DM, Li L. CDSeqR: fast complete deconvolution for gene expression data from bulk tissues. BMC Bioinformatics. 2021;22:262. doi: 10.1186/s12859-021-04186-5.
- 49.Menden K, et al. Deep learning–based cell composition analysis from tissue expression profiles. Sci Adv. 2020;6:eaba2619. doi: 10.1126/sciadv.aba2619.
- 50.Swapna LS, Huang M, Li Y. GTM-decon: guided-topic modeling of single-cell transcriptomes enables sub-cell-type and disease-subtype deconvolution of bulk transcriptomes. Genome Biol. 2023;24:190. doi: 10.1186/s13059-023-03034-4.
- 51.Hippen AA, et al. Performance of computational algorithms to deconvolve heterogeneous bulk ovarian tumor tissue depends on experimental factors. Genome Biol. 2023;24:239. doi: 10.1186/s13059-023-03077-7. [This study provides the first benchmark dataset for high-grade serous ovarian tumors, guiding the generation of benchmark datasets for other cancer types]
- 52.Cobos FA, et al. Effective methods for bulk RNA-seq deconvolution using scnRNA-seq transcriptomes. Genome Biol. 2023;24:177. doi: 10.1186/s13059-023-03016-6.
- 53.Guo S, et al. A deconvolution framework that uses single-cell sequencing plus a small benchmark dataset for accurate analysis of cell type ratios in complex tissue samples. Genome Res. 2024:gr.278822.123. doi: 10.1101/gr.278822.123. [A recent method leveraging paired bulk and scRNA-seq data to enhance deconvolution accuracy in large patient cohorts]
- 54.Garmire LX, et al. Challenges and perspectives in computational deconvolution of genomics data. Nat Methods. 2024;21:391–400. doi: 10.1038/s41592-023-02166-6.
- 55.Newman AM, et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol. 2019;37:773–782. doi: 10.1038/s41587-019-0114-2. [This is a significant advancement in the field of digital cytometry, enabling the estimation of cell-type-specific gene expression from bulk tissue RNA profiles without the need for physical cell isolation, using single-cell RNA-seq data instead]
- 56.Yoshihara K, et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun. 2013;4:2612. doi: 10.1038/ncomms3612.
- 57.Becht E, et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 2016;17:218. doi: 10.1186/s13059-016-1113-y.
- 58.Sutton GJ, et al. Comprehensive evaluation of deconvolution methods for human brain gene expression. Nat Commun. 2022;13:1358. doi: 10.1038/s41467-022-28655-4.
- 59.Lin Y, et al. DAISM-DNNXMBD: Highly accurate cell type proportion estimation with in silico data augmentation and deep neural networks. Patterns. 2022;3:100440. doi: 10.1016/j.patter.2022.100440.
- 60.Ahn J, et al. DeMix: deconvolution for mixed cancer transcriptomes using raw measured data. Bioinformatics. 2013;29:1865–1871. doi: 10.1093/bioinformatics/btt301. [DeMix is one of the earliest methods to perform semi-reference-based deconvolution and is able to deconvolve cancer expression profiles]
- 61.Wang Z, et al. Transcriptome Deconvolution of Heterogeneous Tumor Samples with Immune Infiltration. iScience. 2018;9:451–460. doi: 10.1016/j.isci.2018.10.028.
- 62.Quon G, et al. Computational purification of individual tumor gene expression profiles leads to significant improvements in prognostic prediction. Genome Med. 2013;5:29. doi: 10.1186/gm433. [ISOPureR is one of the earliest methods to perform semi-reference-based bulk deconvolution; it is able to deconvolve cancer expression profiles and shows better association with survival than bulk data]
- 63.Qin Y, et al. Deconvolution of heterogeneous tumor samples using partial reference signals. PLOS Comput Biol. 2020;16:e1008452. doi: 10.1371/journal.pcbi.1008452.
- 64.Bhattacharya A, Hamilton AM, Troester MA, Love MI. DeCompress: tissue compartment deconvolution of targeted mRNA expression panels using compressed sensing. Nucleic Acids Res. 2021;49:e48. doi: 10.1093/nar/gkab031.
- 65.Repsilber D, et al. Biomarker discovery in heterogeneous tissue samples - taking the in-silico deconfounding approach. BMC Bioinformatics. 2010;11:27. doi: 10.1186/1471-2105-11-27. [This is one of the earliest works introducing a computational approach to address tissue heterogeneity without reference, marking a key step in the evolution of transcriptomic analysis]
- 66.Gong T, Szustakowski JD. DeconRNASeq: a statistical framework for deconvolution of heterogeneous tissue samples based on mRNA-Seq data. Bioinformatics. 2013;29:1083–1085. doi: 10.1093/bioinformatics/btt090. [One of the earliest studies to perform reference-free deconvolution for cell fraction estimation from bulk RNA-seq data]
- 67.Song D, Li K, Hemminger Z, Wollman R, Li JJ. scPNMF: sparse gene encoding of single cells to facilitate gene selection for targeted gene profiling. Bioinformatics. 2021;37:i358–i366. doi: 10.1093/bioinformatics/btab273.
- 68.Zaitsev K, Bambouskova M, Swain A, Artyomov MN. Complete deconvolution of cellular mixtures based on linearity of transcriptional signatures. Nat Commun. 2019;10:2209. doi: 10.1038/s41467-019-09990-5.
- 69.Li Z, Wu H. TOAST: improving reference-free cell composition estimation by cross-cell type differential analysis. Genome Biol. 2019;20:190. doi: 10.1186/s13059-019-1778-0.
- 70.Wang N, et al. Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues. Sci Rep. 2016;6:18909. doi: 10.1038/srep18909.
- 71.Wang L, et al. A reference profile-free deconvolution method to infer cancer cell-intrinsic subtypes and tumor-type-specific stromal profiles. Genome Med. 2020;12:24. doi: 10.1186/s13073-020-0720-0.
- 72.Jin H, Liu Z. A benchmark for RNA-seq deconvolution analysis under dynamic testing environments. Genome Biol. 2021;22:102. doi: 10.1186/s13059-021-02290-6.
- 73.Mohammadi S, Zuckerman N, Goldsmith A, Grama A. A Critical Survey of Deconvolution Methods for Separating cell-types in Complex Tissues. Proc IEEE. 2017;105:340–366.
- 74.Sturm G, et al. Comprehensive evaluation of transcriptome-based cell-type quantification methods for immuno-oncology. Bioinformatics. 2019;35:i436–i445. doi: 10.1093/bioinformatics/btz363.
- 75.Cao S, et al. Estimation of tumor cell total mRNA expression in 15 cancer types predicts disease progression. Nat Biotechnol. 2022;40:1624–1633. doi: 10.1038/s41587-022-01342-x. [This is the first study to utilize semi-reference-based deconvolution to study tumor cell plasticity, showing tumor-cell total mRNA abundance to be an important biomarker of disease progression across 15 cancer types]
- 76.Cobos F, Alquicira-Hernandez J, Powell JE, Mestdagh P, De Preter K. Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat Commun. 2020;11:5650. doi: 10.1038/s41467-020-19015-1.
- 77.Vathrakokoili Pournara A, et al. CATD: a reproducible pipeline for selecting cell-type deconvolution methods across tissues. Bioinforma Adv. 2024;4:vbae048. doi: 10.1093/bioadv/vbae048.
- 78.Huuki-Myers LA, et al. Benchmark of cellular deconvolution methods using a multi-assay dataset from postmortem human prefrontal cortex. Genome Biol. 2025;26:88. doi: 10.1186/s13059-025-03552-3.
- 79.Nguyen H, Nguyen H, Tran D, Draghici S, Nguyen T. Fourteen years of cellular deconvolution: methodology, applications, technical evaluation and outstanding challenges. Nucleic Acids Res. 2024;52:4761–4783. doi: 10.1093/nar/gkae267.
- 80.de Visser KE, Joyce JA. The evolving tumor microenvironment: From cancer initiation to metastatic outgrowth. Cancer Cell. 2023;41:374–403. doi: 10.1016/j.ccell.2023.02.016.
- 81.Ziegenhain C, et al. Comparative Analysis of Single-Cell RNA Sequencing Methods. Mol Cell. 2017;65:631–643.e4. doi: 10.1016/j.molcel.2017.01.023.
- 82.Svensson V, et al. Power analysis of single-cell RNA-sequencing experiments. Nat Methods. 2017;14:381–387. doi: 10.1038/nmeth.4220.
- 83.Yamawaki TM, et al. Systematic comparison of high-throughput single-cell RNA-seq methods for immune cell profiling. BMC Genomics. 2021;22:66. doi: 10.1186/s12864-020-07358-4.
- 84.Hiam-Galvez KJ, Allen BM, Spitzer MH. Systemic immunity in cancer. Nat Rev Cancer. 2021;21:345–359. doi: 10.1038/s41568-021-00347-z.
- 85.Thorsson V, et al. The Immune Landscape of Cancer. Immunity. 2018;48:812–830.e14. doi: 10.1016/j.immuni.2018.03.023. [This is among the first pan-cancer immune profiling studies using deconvolution]
- 86.Gentles AJ, et al. The prognostic landscape of genes and infiltrating immune cells across human cancers. Nat Med. 2015;21:938–945. doi: 10.1038/nm.3909.
- 87.Caronni N, et al. IL-1β+ macrophages fuel pathogenic inflammation in pancreatic cancer. Nature. 2023;623:415–422. doi: 10.1038/s41586-023-06685-2. [One of the recent works that utilize single-cell reference-based deconvolution to validate novel biological findings of an immune cell subtype in the TCGA patient cohort]
- 88.Ciriello G, et al. Emerging landscape of oncogenic signatures across human cancers. Nat Genet. 2013;45:1127–1133. doi: 10.1038/ng.2762.
- 89.Hoadley KA, et al. Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer. Cell. 2018;173:291–304.e6. doi: 10.1016/j.cell.2018.03.022.
- 90.Parker JS, et al. Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes. J Clin Oncol. 2009;27:1160–1167. doi: 10.1200/JCO.2008.18.1370.
- 91.Guinney J, et al. The consensus molecular subtypes of colorectal cancer. Nat Med. 2015;21:1350–1356. doi: 10.1038/nm.3967.
- 92.Wu SZ, et al. A single-cell and spatially resolved atlas of human breast cancers. Nat Genet. 2021;53:1334–1347. doi: 10.1038/s41588-021-00911-1.
- 93.Li Z, et al. Presence of onco-fetal neighborhoods in hepatocellular carcinoma is associated with relapse and response to immunotherapy. Nat Cancer. 2024;5:167–186. doi: 10.1038/s43018-023-00672-2.
- 94.Fox NS, Haider S, Harris AL, Boutros PC. Landscape of transcriptomic interactions between breast cancer and its microenvironment. Nat Commun. 2019;10:3116. doi: 10.1038/s41467-019-10929-z.
- 95.Ali HR, Chlon L, Pharoah PDP, Markowetz F, Caldas C. Patterns of Immune Infiltration in Breast Cancer and Their Clinical Implications: A Gene-Expression-Based Retrospective Study. PLoS Med. 2016;13:e1002194. doi: 10.1371/journal.pmed.1002194.
- 96.Gurjao C, et al. Intrinsic Resistance to Immune Checkpoint Blockade in a Mismatch Repair Deficient Colorectal Cancer. Cancer Immunol Res. 2019;7:1230–1236. doi: 10.1158/2326-6066.CIR-18-0683.
- 97.Bi K, et al. Tumor and immune reprogramming during immunotherapy in advanced renal cell carcinoma. Cancer Cell. 2021;39:649–661.e5. doi: 10.1016/j.ccell.2021.02.015.
- 98.Bagaev A, et al. Conserved pan-cancer microenvironment subtypes predict response to immunotherapy. Cancer Cell. 2021;39:845–865.e7. doi: 10.1016/j.ccell.2021.04.014.
- 99.Haber PK, et al. Molecular Markers of Response to Anti-PD1 Therapy in Advanced Hepatocellular Carcinoma. Gastroenterology. 2023;164:72–88.e18. doi: 10.1053/j.gastro.2022.09.005.
- 100.Wong D, Yin Y. Immune micro-environment analysis and establishment of response prediction model for PD-1 blockade immunotherapy in glioblastoma based on transcriptome deconvolution. J Cancer Res Clin Oncol. 2023;149:11689–11703. doi: 10.1007/s00432-023-05026-0.
- 101.Fernández EA, et al. Unveiling the immune infiltrate modulation in cancer and response to immunotherapy by MIXTURE-an enhanced deconvolution method. Brief Bioinform. 2021;22:bbaa317. doi: 10.1093/bib/bbaa317.
- 102.Zaitsev A, et al. Precise reconstruction of the TME using bulk RNA-seq and a machine learning algorithm trained on artificial transcriptomes. Cancer Cell. 2022;40:879–894.e16. doi: 10.1016/j.ccell.2022.07.006.
- 103.Wang L, et al. Single-cell transcriptomic-informed deconvolution of bulk data identifies immune checkpoint blockade resistance in urothelial cancer. iScience. 2024;27:109928. doi: 10.1016/j.isci.2024.109928.
- 104.Guo YA, et al. Transcriptome Deconvolution Reveals Absence of Cancer Cell Expression Signature in Immune Checkpoint Blockade Response. Cancer Res Commun. 2024;4:1581–1596. doi: 10.1158/2767-9764.CRC-23-0442.
- 105.US National Library of Medicine. ClinicalTrials.gov. 2020. https://www.clinicaltrials.gov/study/NCT03979508
- 106.US National Library of Medicine. ClinicalTrials.gov. 2025. https://www.clinicaltrials.gov/study/NCT06522919
- 107.US National Library of Medicine. ClinicalTrials.gov. 2023. https://www.clinicaltrials.gov/study/NCT05929222
- 108.US National Library of Medicine. ClinicalTrials.gov. 2019. https://www.clinicaltrials.gov/study/NCT03827967
- 109.US National Library of Medicine. ClinicalTrials.gov. 2024. https://www.clinicaltrials.gov/study/NCT06764589
- 110.Srinivasan M, Sedmak D, Jewell S. Effect of Fixatives and Tissue Processing on the Content and Integrity of Nucleic Acids. Am J Pathol. 2002;161:1961–1971. doi: 10.1016/S0002-9440(10)64472-0.
- 111.von Ahlfen S, Missel A, Bendrat K, Schlumpberger M. Determinants of RNA quality from FFPE samples. PLoS One. 2007;2:e1261. doi: 10.1371/journal.pone.0001261.
- 112.Groelz D, et al. Non-formalin fixative versus formalin-fixed tissue: A comparison of histology and RNA quality. Exp Mol Pathol. 2013;94:188–194. doi: 10.1016/j.yexmp.2012.07.002.
- 113.Poudel BH, Koks S. The whole transcriptome analysis using FFPE and fresh tissue samples identifies the molecular fingerprint of osteosarcoma. Exp Biol Med. 2024;249:10161. doi: 10.3389/ebm.2024.10161.
- 114.Liu Y, et al. Evaluating cell type deconvolution in FFPE breast tissue: application to benign breast disease. NAR Genomics Bioinforma. 2024;6:lqae098.
- 115.Manjunath HS, et al. Gene Expression Profiling of FFPE Samples: A Titration Test. Technol Cancer Res Treat. 2022;21:15330338221129710. doi: 10.1177/15330338221129710.
- 116.Abramovitz M, et al. Optimization of RNA extraction from FFPE tissues for expression profiling in the DASL assay. BioTechniques. 2008;44:417–423. doi: 10.2144/000112703.
- 117.Geiss GK, et al. Direct multiplexed measurement of gene expression with color-coded probe pairs. Nat Biotechnol. 2008;26:317–325. doi: 10.1038/nbt1385.
- 118.Zheng C-M, et al. Study on the use of Nanostring nCounter to analyze RNA extracted from formalin-fixed-paraffin-embedded and fresh frozen bladder cancer tissues. Cancer Genet. 2022;268–269:137–143. doi: 10.1016/j.cancergen.2022.10.143.
- 119.Li W, et al. Comprehensive evaluation of AmpliSeq transcriptome, a novel targeted whole transcriptome RNA sequencing methodology for global gene expression analysis. BMC Genomics. 2015;16:1069. doi: 10.1186/s12864-015-2270-1.
- 120.Moll P, Ante M, Seitz A, Reda T. QuantSeq 3′ mRNA sequencing for RNA quantification. Nat Methods. 2014;11:i–iii.
- 121.Zhang Y, Parmigiani G, Johnson WE. ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genomics Bioinforma. 2020;2:lqaa078.
- 122.Ritchie ME, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47. doi: 10.1093/nar/gkv007.
- 123.Turnbull AK, et al. Unlocking the transcriptomic potential of formalin-fixed paraffin embedded clinical tissues: comparison of gene expression profiling approaches. BMC Bioinformatics. 2020;21:30. doi: 10.1186/s12859-020-3365-5.
- 124.Li J, Fu C, Speed TP, Wang W, Symmans WF. Accurate RNA Sequencing From Formalin-Fixed Cancer Tissue to Represent High-Quality Transcriptome From Frozen Tissue. JCO Precis Oncol. 2018. doi: 10.1200/PO.17.00091.
- 125.Marczyk M, et al. The impact of RNA extraction method on accurate RNA sequencing from formalin-fixed paraffin-embedded tissues. BMC Cancer. 2019;19:1189. doi: 10.1186/s12885-019-6363-0.
- 126.Thomas MPH, Ajaib S, Tanner G, Bulpitt AJ, Stead LF. GBMPurity: A machine learning tool for estimating glioblastoma tumor purity from bulk RNA-sequencing data. Neuro-Oncol. 2025;27:1458–1473. doi: 10.1093/neuonc/noaf026.
- 127.Dai Z, et al. Deciphering Cell Type Abundance in Proteomics Data Through Graph Neural Networks. Adv Sci. e02987. doi: 10.1002/advs.202502987.
- 128.Huang J, et al. DeepDeconUQ estimates malignant cell fraction prediction intervals in bulk RNA-seq tissue. PLOS Comput Biol. 2025;21:e1013133. doi: 10.1371/journal.pcbi.1013133.
- 129.Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell. 2019;1:206–215. doi: 10.1038/s42256-019-0048-x.
- 130.Calabrese C, et al. Genomic basis for RNA alterations in cancer. Nature. 2020;578:129–136. doi: 10.1038/s41586-020-1970-0.
- 131.Danese A, et al. EpiScanpy: integrated single-cell epigenomic analysis. Nat Commun. 2021;12:5228. doi: 10.1038/s41467-021-25131-3.
- 132.Teschendorff AE, Zhu T, Breeze CE, Beck S. EPISCORE: cell type deconvolution of bulk tissue DNA methylomes from single-cell RNA-Seq data. Genome Biol. 2020;21:221. doi: 10.1186/s13059-020-02126-9.
- 133.Lu Q, Liu Z, Wang X. Inferring tumor purity using multi-omics data based on a uniform machine learning framework MoTP. Brief Bioinform. 2025;26:bbaf056. doi: 10.1093/bib/bbaf056.
- 134.Isakova A, Neff N, Quake SR. Single-cell quantification of a broad RNA spectrum reveals unique noncoding patterns associated with cell types and states. Proc Natl Acad Sci. 2021;118:e2113568118. doi: 10.1073/pnas.2113568118.
- 135.Li J, Zhang Z, Zhuang Y, Wang F, Cai T. Small RNA transcriptome analysis using parallel single-cell small RNA sequencing. Sci Rep. 2023;13:7501. doi: 10.1038/s41598-023-34390-7.
- 136.Hao Y, Yan M, Heath BR, Lei YL, Xie Y. Fast and robust deconvolution of tumor infiltrating lymphocyte from expression profiles using least trimmed squares. PLoS Comput Biol. 2019;15:e1006976. doi: 10.1371/journal.pcbi.1006976.
- 137.Nadel BB, et al. The Gene Expression Deconvolution Interactive Tool (GEDIT): accurate cell type quantification from gene expression data. GigaScience. 2021;10:giab002. doi: 10.1093/gigascience/giab002.
- 138.Qiao W, et al. PERT: a method for expression deconvolution of human blood samples from varied microenvironmental and developmental conditions. PLoS Comput Biol. 2012;8:e1002838. doi: 10.1371/journal.pcbi.1002838.
- 139.Frishberg A, et al. Cell composition analysis of bulk genomics using single-cell data. Nat Methods. 2019;16:327–332. doi: 10.1038/s41592-019-0355-5.
- 140.Du R, Carey V, Weiss ST. deconvSeq: deconvolution of cell mixture distribution in sequencing data. Bioinformatics. 2019;35:5095–5102. doi: 10.1093/bioinformatics/btz444.
- 141.Wang L, et al. Single-cell Transcriptomic-informed Deconvolution of Bulk Data Identifies Immune Checkpoint Blockade Resistance in Urothelial Cancer. iScience. 2024;27:109928. doi: 10.1016/j.isci.2024.109928.
- 142.Erdmann-Pham DD, Fischer J, Hong J, Song YS. Likelihood-based deconvolution of bulk gene expression data using single-cell references. Genome Res. 2021;31:1794–1806. doi: 10.1101/gr.272344.120.
- 143.Dimitrakopoulou K, Wik E, Akslen LA, Jonassen I. Deblender: a semi-/unsupervised multi-operational computational method for complete deconvolution of expression data from heterogeneous samples. BMC Bioinformatics. 2018;19:408. doi: 10.1186/s12859-018-2442-5.
- 144.Tang D, Park S, Zhao H. NITUMID: Nonnegative matrix factorization-based Immune-TUmor MIcroenvironment Deconvolution. Bioinformatics. 2020;36:1344–1350. doi: 10.1093/bioinformatics/btz748.
- 145.Lu Y, Chen QM, An L. Semi-reference based cell type deconvolution with application to human metastatic cancers. NAR Genomics Bioinforma. 2023;5:lqad109.
- 146.Dong L, Kollipara A, Darville T, Zou F, Zheng X. Semi-CAM: A semi-supervised deconvolution method for bulk transcriptomic data with partial marker gene information. Sci Rep. 2020;10:5434. doi: 10.1038/s41598-020-62330-2.
- 147.Wu C-T, et al. CAM3.0: determining cell type composition and expression from bulk tissues with fully unsupervised deconvolution. Bioinformatics. 2024;40:btae107. doi: 10.1093/bioinformatics/btae107.
- 148.Liebner DA, Huang K, Parvin JD. MMAD: microarray microdissection with analysis of differences is a computational tool for deconvoluting cell type-specific contributions from tissue samples. Bioinformatics. 2014;30:682–689. doi: 10.1093/bioinformatics/btt566.
- 149.Finotello F, Trajanoski Z. Quantifying tumor-infiltrating immune cells from transcriptomics data. Cancer Immunol Immunother. 2018;67:1031–1040. doi: 10.1007/s00262-018-2150-z.
- 150.Jiménez-Sánchez A, Cast O, Miller ML. Comprehensive Benchmarking and Integration of Tumor Microenvironment Cell Estimation Methods. Cancer Res. 2019;79:6238–6246. doi: 10.1158/0008-5472.CAN-18-3560.
- 151.Li H, et al. A comprehensive benchmarking with practical guidelines for cellular deconvolution of spatial transcriptomics. Nat Commun. 2023;14:1548. doi: 10.1038/s41467-023-37168-7.
- 152.Nadel BB, et al. Systematic evaluation of transcriptomics-based deconvolution methods and references using thousands of clinical samples. Brief Bioinform. 2021;22:bbab265. doi: 10.1093/bib/bbab265.
- 153.Vallania F, et al. Leveraging heterogeneity across multiple datasets increases cell-mixture deconvolution accuracy and reduces biological and technical biases. Nat Commun. 2018;9:4735. doi: 10.1038/s41467-018-07242-6.
- 154.Maron SB, et al. Determinants of Survival with Combined HER2 and PD-1 Blockade in Metastatic Esophagogastric Cancer. Clin Cancer Res. 2023;29:3633–3640. doi: 10.1158/1078-0432.CCR-22-3769.
- 155.Slyper M, et al. A single-cell and single-nucleus RNA-Seq toolbox for fresh and frozen human tumors. Nat Med. 2020;26:792–802. doi: 10.1038/s41591-020-0844-1.
- 156.Denisenko E, et al. Systematic assessment of tissue dissociation and storage biases in single-cell and single-nucleus RNA-seq workflows. Genome Biol. 2020;21:130. doi: 10.1186/s13059-020-02048-6.
- 157.Salcher S, et al. High-resolution single-cell atlas reveals diversity and plasticity of tissue-resident neutrophils in non-small cell lung cancer. Cancer Cell. 2022;40:1503–1520.e8. doi: 10.1016/j.ccell.2022.10.008.
- 158.Ding J, et al. Systematic comparison of single-cell and single-nucleus RNA-sequencing methods. Nat Biotechnol. 2020;38:737–746. doi: 10.1038/s41587-020-0465-8.
- 159.Janssen P, et al. The effect of background noise and its removal on the analysis of single-cell expression data. Genome Biol. 2023;24:140. doi: 10.1186/s13059-023-02978-x.
- 160.Lähnemann D, et al. Eleven grand challenges in single-cell data science. Genome Biol. 2020;21:31. doi: 10.1186/s13059-020-1926-6.
- 161.Heumos L, et al. Best practices for single-cell analysis across modalities. Nat Rev Genet. 2023;24:550–572. doi: 10.1038/s41576-023-00586-w.
- 162.Luecken MD, et al. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods. 2022;19:41–50. doi: 10.1038/s41592-021-01336-8.
- 163.Gavish A, et al. Hallmarks of transcriptional intratumour heterogeneity across a thousand tumours. Nature. 2023;618:598–606. doi: 10.1038/s41586-023-06130-4.
- 164.Han Y, et al. TISCH2: expanded datasets and new tools for single-cell transcriptome analyses of the tumor microenvironment. Nucleic Acids Res. 2023;51:D1425–D1431. doi: 10.1093/nar/gkac959.
- 165.Cerami E, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2:401–404. doi: 10.1158/2159-8290.CD-12-0095.
- 166.Chung W, et al. Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer. Nat Commun. 2017;8:15081. doi: 10.1038/ncomms15081.
- 167.Barkley D, et al. Cancer cell states recur across tumor types and form specific interactions with the tumor microenvironment. Nat Genet. 2022;54:1192–1201. doi: 10.1038/s41588-022-01141-9.
- 168.Barkley D, Yanai I. Plasticity and Clonality of Cancer Cell States. Trends Cancer. 2019;5:655–656. doi: 10.1016/j.trecan.2019.09.002.
- 169.Granot Z, Jablonska J. Distinct Functions of Neutrophil in Cancer and Its Regulation. Mediators Inflamm. 2015;2015:701067. doi: 10.1155/2015/701067.
- 170.Hanahan D, Coussens LM. Accessories to the crime: functions of cells recruited to the tumor microenvironment. Cancer Cell. 2012;21:309–322. doi: 10.1016/j.ccr.2012.02.022.
- 171.Binnewies M, et al. Understanding the tumor immune microenvironment (TIME) for effective therapy. Nat Med. 2018;24:541–550. doi: 10.1038/s41591-018-0014-x.
- 172.Ribas A, Wolchok JD. Cancer immunotherapy using checkpoint blockade. Science. 2018;359:1350–1355. doi: 10.1126/science.aar4060.
- 173.Sahai E, et al. A framework for advancing our understanding of cancer-associated fibroblasts. Nat Rev Cancer. 2020;20:174–186. doi: 10.1038/s41568-019-0238-1.
- 174.Chen Y, McAndrews KM, Kalluri R. Clinical and therapeutic relevance of cancer-associated fibroblasts. Nat Rev Clin Oncol. 2021;18:792–804. doi: 10.1038/s41571-021-00546-5.
- 175.Galbo PM, Jr, Zang X, Zheng D. Molecular Features of Cancer-associated Fibroblast Subtypes and their Implication on Cancer Pathogenesis, Prognosis, and Immunotherapy Resistance. Clin Cancer Res. 2021;27:2636–2647. doi: 10.1158/1078-0432.CCR-20-4226.
- 176.Lavie D, Ben-Shmuel A, Erez N, Scherz-Shouval R. Cancer-associated fibroblasts in the single-cell era. Nat Cancer. 2022;3:793–807. doi: 10.1038/s43018-022-00411-z.
- 177.Liu Y, et al. Conserved spatial subtypes and cellular neighborhoods of cancer-associated fibroblasts revealed by single-cell spatial multi-omics. Cancer Cell. 2025;43:905–924.e6. doi: 10.1016/j.ccell.2025.03.004.
- 178.Ginhoux F, Schultze JL, Murray PJ, Ochando J, Biswas SK. New insights into the multidimensional concept of macrophage ontogeny, activation and function. Nat Immunol. 2016;17:34–40. doi: 10.1038/ni.3324.
- 179.DeNardo DG, Ruffell B. Macrophages as regulators of tumour immunity and immunotherapy. Nat Rev Immunol. 2019;19:369–382. doi: 10.1038/s41577-019-0127-6.
- 180.Luca BA, et al. Atlas of clinically distinct cell states and ecosystems across human solid tumors. Cell. 2021;184:5482–5496. doi: 10.1016/j.cell.2021.09.014. [Building upon estimated cell-type fractions and cell-type-specific expression levels, this is the first study to identify and characterize cellular states and ecosystems across patients and associate these ecotypes with survival of multiple cancer types]
- 181.Tran KA, et al. Performance of tumour microenvironment deconvolution methods in breast cancer using single-cell simulated bulk mixtures. Nat Commun. 2023;14:5758. doi: 10.1038/s41467-023-41385-5.
- 182.Schelker M, et al. Estimation of immune cell content in tumour tissue using single-cell RNA-seq data. Nat Commun. 2017;8:2032. doi: 10.1038/s41467-017-02289-3.
- 183.Xi NM, Li JJ. Benchmarking Computational Doublet-Detection Methods for Single-Cell RNA Sequencing Data. Cell Syst. 2021;12:176–194.e6. doi: 10.1016/j.cels.2020.11.008.
- 184.Wen L, Tang F. Single-cell sequencing in stem cell biology. Genome Biol. 2016;17:71. doi: 10.1186/s13059-016-0941-0.
- 185.Ivich A, et al. Missing cell types in single-cell references impact deconvolution of bulk data but are detectable. Genome Biol. 2025;26:86. doi: 10.1186/s13059-025-03506-9.
- 186.Li B, Liu JS, Liu XS. Revisit linear regression-based deconvolution methods for tumor gene expression data. Genome Biol. 2017;18:127. doi: 10.1186/s13059-017-1256-5.
- 187.Crow M, Gillis J. Co-expression in Single-Cell Analysis: Saving Grace or Original Sin? Trends Genet. 2018;34:823–831. doi: 10.1016/j.tig.2018.07.007.
- 188.Kharchenko PV. The triumphs and limitations of computational methods for scRNA-seq. Nat Methods. 2021;18:723–732. doi: 10.1038/s41592-021-01171-x.
- 189.Liang S, et al. Single-cell manifold-preserving feature selection for detecting rare cell populations. Nat Comput Sci. 2021;1:374–384. doi: 10.1038/s43588-021-00070-7.
- 190.Savas P, et al. Single-cell profiling of breast cancer T cells reveals a tissue-resident memory subset associated with improved prognosis. Nat Med. 2018;24:986–993. doi: 10.1038/s41591-018-0078-7.
- 191.Davies D, et al. PD-1 defines a distinct, functional, tissue-adapted state in Vδ1+ T cells with implications for cancer immunotherapy. Nat Cancer. 2024;5:420–432. doi: 10.1038/s43018-023-00690-0.
- 192.Monti M, et al. Plasmacytoid dendritic cells at the forefront of anti-cancer immunity: rewiring strategies for tumor microenvironment remodeling. J Exp Clin Cancer Res. 2024;43:196. doi: 10.1186/s13046-024-03121-9.
- 193.Jenkins BH, et al. Single cell and spatial analysis of immune-hot and immune-cold tumours identifies fibroblast subtypes associated with distinct immunological niches and positive immunotherapy response. Mol Cancer. 2025;24:3. doi: 10.1186/s12943-024-02191-9.
- 194.White BS, et al. Community assessment of methods to deconvolve cellular composition from bulk gene expression. Nat Commun. 2024;15:7362. doi: 10.1038/s41467-024-50618-0. [This study presents the Tumor Deconvolution DREAM Challenge results, benchmarking methods for estimating cellular composition from bulk expression data across cancer types]
- 195.Heitzer E, Haque IS, Roberts CES, Speicher MR. Current and future perspectives of liquid biopsies in genomics-driven oncology. Nat Rev Genet. 2019;20:71–88. doi: 10.1038/s41576-018-0071-5.
- 196.Biswas D, et al. A clonal expression biomarker associates with lung cancer mortality. Nat Med. 2019;25:1540–1548. doi: 10.1038/s41591-019-0595-z.
- 197.Marx V. Method of the Year: spatially resolved transcriptomics. Nat Methods. 2021;18:9–14. doi: 10.1038/s41592-020-01033-y.
- 198.Ståhl PL, et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 2016;353:78–82. doi: 10.1126/science.aaf2403.
- 199.Xu Z, et al. STOmicsDB: a comprehensive database for spatial transcriptomics data sharing, analysis and visualization. Nucleic Acids Res. 2024;52:D1053–D1061. doi: 10.1093/nar/gkad933.
- 200.Chen J, Larsson L, Swarbrick A, Lundeberg J. Spatial landscapes of cancers: insights and opportunities. Nat Rev Clin Oncol. 2024. doi: 10.1038/s41571-024-00926-7.
- 201.Gulati GS, D’Silva JP, Liu Y, Wang L, Newman AM. Profiling cell identity and tissue architecture with single-cell and spatial transcriptomics. Nat Rev Mol Cell Biol. 2024. doi: 10.1038/s41580-024-00768-2.
- 202.Cable DM, et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat Biotechnol. 2022;40:517–526. doi: 10.1038/s41587-021-00830-w.
- 203.Vahid MR, et al. High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nat Biotechnol. 2023;41:1543–1548. doi: 10.1038/s41587-023-01697-9.
- 204.Geras A, et al. Celloscope: a probabilistic model for marker-gene-driven cell type deconvolution in spatial transcriptomics data. Genome Biol. 2023;24:120. doi: 10.1186/s13059-023-02951-8.
- 205.Miller BF, Huang F, Atta L, Sahoo A, Fan J. Reference-free cell type deconvolution of multi-cellular pixel-resolution spatially resolved transcriptomics data. Nat Commun. 2022;13:2339. doi: 10.1038/s41467-022-30033-z.
- 206.Cheng M, et al. Spatially resolved transcriptomics: a comprehensive review of their technological advances, applications, and challenges. J Genet Genomics. 2023;50:625–640. doi: 10.1016/j.jgg.2023.03.011.
- 207.Yu Q, Jiang M, Wu L. Spatial transcriptomics technology in cancer research. Front Oncol. 2022;12:1019111. doi: 10.3389/fonc.2022.1019111.
- 208.Yan L, Sun X. Benchmarking and integration of methods for deconvoluting spatial transcriptomic data. Bioinformatics. 2023;39:btac805. doi: 10.1093/bioinformatics/btac805.
- 209.Li B, et al. Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution. Nat Methods. 2022;19:662–670. doi: 10.1038/s41592-022-01480-9.
- 210.Chen J, et al. A comprehensive comparison on cell-type composition inference for spatial transcriptomics data. Brief Bioinform. 2022;23:bbac245. doi: 10.1093/bib/bbac245.
- 211.Gaspard-Boulinc LC, Gortana L, Walter T, Barillot E, Cavalli FMG. Cell-type deconvolution methods for spatial transcriptomics. Nat Rev Genet. 2025:1–19. doi: 10.1038/s41576-025-00845-y.