Abstract
Single-cell RNA sequencing has revolutionized the high-resolution characterization of cellular heterogeneity within the tumor microenvironment. Cellular heterogeneity is now widely recognized as a major determinant of variation in key clinical phenotypes, including patient survival, tumor size, lymph node involvement, metastasis, and response to anticancer therapies. However, comprehensive databases that systematically link single-cell data to these clinically relevant phenotypes are still lacking. To address this gap, we developed CPRCSdb (http://www.yzbio.top/CPRCSdb/), a comprehensive database that integrates clinical phenotypes with single-cell transcriptomic heterogeneity. CPRCSdb includes over 4.05 million cells from 1053 manually curated single-cell samples and 101 integrated datasets, along with 11,032 bulk RNA-seq samples, covering 29 cancer types. The database spans 5 disease phenotype categories and therapy response phenotypes for 30 drugs. We analyzed 6197 associations between bulk-level phenotypes and single-cell heterogeneity, identifying over 1.13 million clinically relevant cells. Rigorous quality control measures were applied to ensure data accuracy. For each association, CPRCSdb supports downstream analyses, including differential expression, functional enrichment, and cell–cell communication inference. We anticipate that CPRCSdb will serve as a valuable and integrative resource for investigating the mechanisms underlying tumorigenesis and cancer progression.
Keywords: Single cell database, Cell heterogeneity, Disease phenotype association
1. Introduction
Current single-cell technologies enable genome-wide transcriptional profiling at the individual cell level, allowing researchers to resolve previously inaccessible heterogeneity within cell populations [1]. Emerging evidence indicates that transcriptional diversity among tumor microenvironment cells plays a central role in shaping diverse clinical phenotypes, ranging from tumor development to patient survival outcomes [2], [3], [4], [5], [6]. For example, Zhang et al. found two clusters of macrophages with distinct molecular features in the liver cancer microenvironment. MDSC-like macrophages highly express S100A family genes, FCN1, and VCAN, whereas TAM-like macrophages highly express genes such as APOE, C1QA, and SLC40A1. Only the TAM-like macrophages are significantly associated with shorter overall survival [7]. Cellular heterogeneity is also a key determinant of proliferative diversity in cancer, driving tumor growth through aberrant expression of cell cycle-related genes. As demonstrated by pan-cancer single-cell studies, gene modules including TOP2A and PCNA are highly recurrent across various tumor types [8]. Suma et al. revealed five tumor cell clusters in T follicular helper lymphoma, with heterogeneity in TFH markers, proliferation, and tissue localization. For example, proliferative C3/C4 cells were enriched in lymph nodes, whereas cytotoxic C1/C2 clusters were mainly present in peripheral blood [9]. In colorectal cancer metastasis, cellular heterogeneity has been reported to drive cancer stem cells to overcome niche-restricted growth and acquire the ability to grow in remote tissues [10]. Similarly, the response to nivolumab treatment is significantly influenced by cellular heterogeneity, which shapes the UBASH3B-NR1I2-CEACAM1-HAVCR2 signaling axis to modulate immune suppression and the tumor microenvironment [11].
The above examples demonstrate that heterogeneity of the tumor microenvironment plays a crucial role in the occurrence and progression of tumors. By using the Scissor algorithm to integrate clinical phenotypes (e.g., patient survival, tumor stage, drug treatment response) with single-cell data, the cell subpopulations that drive these phenotypes can be identified [12]. Such subpopulations can facilitate targeted therapy development and biomarker discovery. However, existing databases such as CancerSCEM, TISCH2, CancerSEA, and PanglaoDB rely on unsupervised clustering followed by marker gene analysis of each cluster to define the functions of cell subpopulations, and none of them identifies the cell subpopulations that drive disease phenotypes. To address these limitations, we developed a comprehensive database that integrates key clinical phenotypes such as survival, tumor stage, and drug response with single-cell transcriptomic heterogeneity, accompanied by detailed functional annotations (Supplementary Table S1). Furthermore, we validated the consistency of inferred phenotype-associated cell subpopulations on independent datasets (Supplementary Figure S1) as well as their robustness to batch-effect correction across datasets (Supplementary Figure S2). Together, these results indicate that Scissor is effective in identifying phenotype-driving cell subpopulations, which supports the precision and reliability of our database.
Here, we introduce CPRCSdb (http://www.yzbio.top/CPRCSdb/), a curated and comprehensive cancer single-cell database that systematically maps clinical phenotypes from bulk samples onto individual cells, enabling integrative analysis of transcriptional heterogeneity and clinical phenotypes (Fig. 1). In brief, CPRCSdb covers over 4.05 million cells (derived from 1053 manually collected single-cell samples and 101 integrated datasets), together with 11,032 bulk RNA-seq samples, spanning 29 cancer types. The database covers five clinical phenotypes (patient survival status, tumor stage, tumor size, lymph node involvement, and distant metastasis) as well as 30 anticancer drug response phenotypes. Following rigorous quality control of both single-cell and bulk RNA-seq data, we performed 6197 association analyses between bulk phenotypes and single-cell data using the Scissor method. For each association, CPRCSdb offers downstream analyses, including differential expression and enrichment analysis between phenotype-associated cell subpopulations. To further understand the interactions among phenotype-associated subpopulations across different cell types, we performed cell type annotation and cell-cell communication analysis. Additionally, six online gene enrichment analysis tools are provided to help users explore the phenotype-associated cell subpopulations. In summary, CPRCSdb is a user-friendly platform designed for querying, analyzing, and visualizing how single-cell heterogeneity drives cancer phenotypes. We anticipate that CPRCSdb will serve as a valuable resource for investigating the mechanisms underlying tumorigenesis and cancer development.
Fig. 1.
Database content and analytical framework of CPRCSdb. The database integrates bulk and single-cell RNA-seq data. A unified computational pipeline enables the identification of phenotype-associated cell populations, and a user-friendly interface supports the exploration of how tumor single-cell heterogeneity drives phenotypic outcomes.
2. Materials and methods
2.1. Data collection and manual curation
We initiated data collection by searching the Gene Expression Omnibus (GEO) database using keyword combinations such as “scRNA”, “single-cell RNA”, “cancer”, or “tumor”. This search yielded 197 original single-cell RNA sequencing (scRNA-seq) datasets, encompassing 1271 individual single-cell samples. For each dataset, we downloaded the “Series Matrix File(s)”, which contain detailed sample information, including the tissue of origin, disease type, and sequencing platform. To restrict our analysis to human cancer RNA sequencing data, we excluded:
- a) studies based on mouse-derived tumors;
- b) studies related to non-cancer diseases;
- c) studies that did not utilize RNA sequencing.
After initial curation, we observed fewer cancer types in our dataset compared to TCGA. To expand cancer type diversity, we performed targeted searches with additional keywords and added more samples. As a result, our final dataset includes representative samples from 29 distinct cancer types.
2.2. Single cell data preprocessing and quality control
- a) Cells with fewer than 400 detected genes were removed.
- b) Cells with a high proportion of mitochondrial transcripts were excluded, using a cutoff chosen from the distribution observed in violin plots of mitochondrial content [13].
- c) Expression values were normalized using the LogNormalize method, in which the counts of each gene are scaled relative to the total counts of the cell and then log-transformed.
- d) Highly variable genes (HVGs) were identified using the VST method and were subsequently used for downstream dimensionality reduction [14].
- e) Principal component analysis (PCA) was performed on the scaled expression values of the HVGs, and the top principal components were selected for clustering and visualization [15]. Cell clustering was carried out using the Louvain algorithm based on a shared nearest neighbor (SNN) graph constructed from the PCA results. Uniform Manifold Approximation and Projection (UMAP) was applied to visualize cellular heterogeneity in a two-dimensional space.
- f) All of the above steps were performed using the R package Seurat (5.0.3).
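The filtering and normalization steps above can be illustrated with a minimal, library-free Python sketch. It is not the Seurat pipeline itself. The `MT-` gene prefix, the 20% mitochondrial cutoff, and the function names are assumptions made for illustration (the text chooses the mitochondrial cutoff from violin plots rather than fixing a number), and the 10,000 scale factor is the default that Seurat documents for LogNormalize.

```python
import math

MIN_GENES = 400          # from the text: cells with fewer detected genes are removed
MAX_MITO_FRACTION = 0.2  # hypothetical cutoff; the text derives it from violin plots
SCALE_FACTOR = 10_000    # Seurat's documented default for LogNormalize


def passes_qc(counts):
    """counts: dict mapping gene symbol -> raw count for a single cell."""
    detected = sum(1 for c in counts.values() if c > 0)
    total = sum(counts.values())
    if detected < MIN_GENES or total == 0:
        return False
    # Assumption: human mitochondrial genes carry the "MT-" prefix.
    mito = sum(c for gene, c in counts.items() if gene.startswith("MT-"))
    return mito / total <= MAX_MITO_FRACTION


def log_normalize(counts):
    """Scale each gene by the cell's total counts, then apply log(1 + x)."""
    total = sum(counts.values())
    return {gene: math.log1p(c / total * SCALE_FACTOR) for gene, c in counts.items()}
```

In this sketch, a cell is first tested with `passes_qc` and only then passed to `log_normalize`, which mirrors the order of steps a) to c).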
2.3. Bulk data and clinical phenotype processing workflow
- a) Using the GDCquery and GDCdownload functions from the R package TCGAbiolinks (2.30.4), we retrieved RNA-seq expression profiles and corresponding clinical information from the TCGA database. Drug response information was obtained from the GDC data portal (https://portal.gdc.cancer.gov/).
- b) Adjacent tissue samples were removed from the original data, and only tumor samples were retained. When a patient had multiple tumor samples, their expression values were averaged to represent the expression profile of that patient.
- c) Survival data: The vital_status, days_to_death, and days_to_last_follow_up fields were merged, with death set as the survival endpoint.
- d) TNM staging: We simplified the TNM staging by binarizing its individual components. Tumor size (T) was categorized as T1/T2 vs. T3/T4. Lymph node involvement (N) was categorized as N0 vs. N1/N2/N3. Distant metastasis (M) was categorized as M0 vs. M1. Stage was categorized as stage I/II vs. stage III/IV.
- e) Drug response data: The bcr_patient_barcode, drug_name, and measure_of_response fields were extracted from TCGA clinical drug files (such as nationwidechildrens.org_clinical_drug_kich.txt). The measure_of_response variable was categorized as resistant for records of “Stable Disease” or “Clinical Progressive Disease” and as sensitive for records of “Partial Response” or “Complete Response”. Brand names and ingredient names of drugs in the original files were standardized to common names. Only drugs with at least 3 resistant and 3 sensitive samples in a given cancer project were retained for the subsequent association analysis. The raw data are available in the “Anticancer Drugs” section of the website's Browse page, while the processed data and results of the drug-phenotype association analysis are displayed on the “Search -> Drug” page.
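The phenotype encoding rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names, the 0/1 encoding, the prefix matching of TNM strings, and the "Dead" value of vital_status are assumptions, and each drug record is treated as one sample.

```python
from collections import Counter

RESISTANT = {"Stable Disease", "Clinical Progressive Disease"}
SENSITIVE = {"Partial Response", "Complete Response"}

# (low group, high group) prefixes for each binarized TNM component
T_GROUPS = (("T1", "T2"), ("T3", "T4"))
N_GROUPS = (("N0",), ("N1", "N2", "N3"))
M_GROUPS = (("M0",), ("M1",))


def binarize_category(value, low, high):
    """Return 0 for the low group, 1 for the high group, None if unclassifiable."""
    s = value.strip().upper()
    if s.startswith(low):
        return 0
    if s.startswith(high):
        return 1
    return None


def survival_record(vital_status, days_to_death, days_to_last_follow_up):
    """Return (time_in_days, event) with death as the event, or None if time is missing."""
    if vital_status == "Dead":  # assumed field value
        time, event = days_to_death, 1
    else:
        time, event = days_to_last_follow_up, 0
    return None if time is None else (time, event)


def classify_response(measure_of_response):
    if measure_of_response in RESISTANT:
        return "resistant"
    if measure_of_response in SENSITIVE:
        return "sensitive"
    return None


def drugs_to_keep(records, min_per_group=3):
    """records: iterable of (project, drug_name, measure_of_response).

    Returns the (project, drug) pairs with at least min_per_group
    resistant and min_per_group sensitive records.
    """
    tally = Counter()
    for project, drug, response in records:
        label = classify_response(response)
        if label is not None:
            tally[(project, drug, label)] += 1
    pairs = {(project, drug) for project, drug, _ in tally}
    return {
        (project, drug)
        for project, drug in pairs
        if tally[(project, drug, "resistant")] >= min_per_group
        and tally[(project, drug, "sensitive")] >= min_per_group
    }
```

For example, `binarize_category("T2a", *T_GROUPS)` falls in the T1/T2 group. Overall stage is omitted here because stage strings need exact matching rather than prefix matching ("Stage I" is a prefix of "Stage II").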
2.4. Phenotype-associated cell subpopulations
To link cell subpopulations with specific phenotypes, we employed the Scissor (2.1.0) algorithm [12]. Quantile normalization was applied to both single-cell data and bulk data to remove batch effects [16]. The clinical phenotype data are of two types: survival data and categorical data. For survival data, the Cox model was selected through the family parameter of the Scissor function; for categorical data, the logistic regression model was used. Finally, cells labeled “Scissor+” and “Scissor-” were designated as positively and negatively associated with phenotype progression, respectively.
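To make the normalization and model selection steps concrete, the following Python sketch shows a basic quantile normalization over a set of expression columns and the choice of regression family by phenotype type. It is a simplified illustration, not the Scissor implementation: ties are broken by position rather than averaged, the function and dictionary names are hypothetical, and the family strings are those accepted by the `family` argument of the Scissor R function.

```python
# Regression family passed to Scissor, keyed by phenotype type.
SCISSOR_FAMILY = {
    "survival": "cox",          # Cox model for survival phenotypes
    "categorical": "binomial",  # logistic regression for binary phenotypes
}


def quantile_normalize(columns):
    """columns: list of equal-length lists, one per sample (bulk sample or cell).

    Every column is forced onto the same distribution: the value at rank r
    in each column is replaced by the mean of the rank-r values across columns.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[r] for col in sorted_cols) / len(columns) for r in range(n)
    ]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # gene indices by ascending value
        normalized = [0.0] * n
        for rank, gene_index in enumerate(order):
            normalized[gene_index] = rank_means[rank]
        result.append(normalized)
    return result
```

After this step all columns share an identical set of values and differ only in how those values are assigned to genes, which is why the method is used to put bulk and single-cell profiles on a common scale.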
2.5. Differential gene expression analysis
For each association between single-cell data and a phenotype, we performed differential analysis on the inferred phenotype-associated subpopulations (“Scissor+” vs. “Scissor-”) using the FindMarkers function in Seurat (5.0.3) with the default Wilcoxon rank sum test. Differentially expressed genes were defined using thresholds of FDR < 0.05 and absolute FC > 2. The results are presented as interactive volcano plots on the website, with the data available for download in a table format.
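The thresholding step can be sketched as follows in Python. The text does not state how the FDR was computed, so the Benjamini-Hochberg procedure is assumed here; the function names and the input layout are likewise hypothetical. An absolute fold change above 2 is expressed as an absolute log2 fold change above 1.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Walk from the largest p-value down, keeping a running minimum.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def select_degs(rows, fdr_cutoff=0.05, fc_cutoff=2.0):
    """rows: list of (gene, log2_fold_change, p_value) for Scissor+ vs. Scissor-.

    Returns (gene, direction) pairs passing both thresholds.
    """
    fdr = benjamini_hochberg([p for _, _, p in rows])
    min_abs_log2fc = math.log2(fc_cutoff)
    selected = []
    for (gene, log2fc, _), q in zip(rows, fdr):
        if q < fdr_cutoff and abs(log2fc) > min_abs_log2fc:
            selected.append((gene, "up" if log2fc > 0 else "down"))
    return selected
```

The direction label returned for each gene corresponds to the up- and downregulated groups that are later tested separately for enrichment.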
2.6. Gene set enrichment analysis
Firstly, we obtained the annotated gene sets from MSigDB database [17], including Gene Ontology [18], KEGG pathway [19], and Curated Cancer Cell Atlas (3CA) [20]. Then, we extracted the gene sets from differential expression analysis and color-coded them according to the direction of gene upregulation or downregulation. Finally, we performed a hypergeometric test to assess the enrichment of differential genes in annotated gene sets [21].
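As an illustration of the hypergeometric test, the Python sketch below computes the upper-tail probability of observing at least the given overlap between a list of differential genes and an annotated gene set. The function names are hypothetical, and the choice of background universe is an assumption, since the text does not specify it.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, gene_set_size, n_query, overlap):
    """P(X >= overlap) when n_query genes are drawn without replacement
    from a universe containing gene_set_size annotated genes."""
    upper = min(gene_set_size, n_query)
    total = comb(universe_size, n_query)
    tail = sum(
        comb(gene_set_size, k) * comb(universe_size - gene_set_size, n_query - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def enrichment_p(query_genes, gene_set, universe):
    """Restrict both lists to the background universe, then test their overlap."""
    universe = set(universe)
    query = set(query_genes) & universe
    annotated = set(gene_set) & universe
    return hypergeom_enrichment_p(
        len(universe), len(annotated), len(query), len(query & annotated)
    )
```

A small p-value means the overlap is larger than expected if the differential genes were a random draw from the background.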
2.7. Automated cell-type annotation
Cell type annotation was performed using the SingleR (2.4.1) algorithm, a computational tool for automated cell type labeling in single-cell RNA sequencing data [22]. Given that CPRCSdb contains only cancer-derived single-cell profiles, we used immune-cell-specific reference datasets from SingleR, including Blueprint/ENCODE [23], [24], the Database of Immune Cell Expression (DICE) [25], the Novershtern hematopoietic dataset [26], and the Monaco immune dataset [27], to ensure robust and biologically interpretable cell type annotation.
2.8. Inference of cell-cell communication
To investigate intercellular communication networks within the tumor microenvironment, we employed CellChat (1.6.1), a computational tool that infers and visualizes cell-cell interactions based on ligand-receptor interaction profiles [28]. First, we merged the Scissor labels with SingleR annotations to define more refined cell subpopulations, such as “T cell (Poor)”, denoting a T-cell subpopulation associated with poor prognosis. Then, we employed CellChat to infer intercellular communication, including signal directionality and ligand-receptor interactions, among these subpopulations. Finally, the results are presented on the website as interactive network diagrams and downloadable tables.
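The label-merging step can be sketched in Python as below. This is an illustrative assumption about the bookkeeping, not the authors' code: the function names, the dictionary inputs keyed by cell barcode, and the decision to drop cells without a Scissor label are hypothetical, and the "Poor"/"Good" suffixes apply to the overall survival phenotype.

```python
def merge_labels(scissor_label, cell_type, positive_name="Poor", negative_name="Good"):
    """Combine a Scissor label with a SingleR cell type, e.g. 'T cell (Poor)'."""
    if scissor_label == "Scissor+":
        return f"{cell_type} ({positive_name})"
    if scissor_label == "Scissor-":
        return f"{cell_type} ({negative_name})"
    return None  # cells not selected by Scissor get no refined label here


def refine_subpopulations(scissor_labels, cell_types):
    """Both arguments are dicts keyed by cell barcode.

    Returns barcode -> refined label for cells that have both a
    Scissor+/Scissor- label and a cell type annotation.
    """
    refined = {}
    for barcode, scissor_label in scissor_labels.items():
        cell_type = cell_types.get(barcode)
        if cell_type is None:
            continue
        label = merge_labels(scissor_label, cell_type)
        if label is not None:
            refined[barcode] = label
    return refined
```

The refined labels would then serve as the group identities between which ligand-receptor communication is inferred.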
2.9. User-friendly online enrichment tool
CPRCSdb provides a suite of six web-based tools for gene set enrichment analysis across clinical phenotypes, including overall survival, tumor stage, tumor size, lymph node status, metastasis, and drug response. After a user submits a gene set on the website, the system returns its enrichment results against the genes upregulated in phenotype-positively-associated cells. The enrichment of the user-submitted gene set in phenotype-associated genes was measured by the hypergeometric test P-value.
2.10. Web-based database development
CPRCSdb was developed as a web-based database system using the Django framework in Python, providing a robust and scalable foundation for both frontend presentation and backend data management. The database backend is managed through Django’s Object-Relational Mapping (ORM) system, which enables efficient interaction with a relational database (e.g., PostgreSQL or MySQL), ensuring data integrity, security, and ease of maintenance. The user interface was designed with responsiveness and interactivity in mind, leveraging several modern JavaScript libraries and associated CSS frameworks. Bootstrap 5 was employed to ensure a responsive layout and to style interactive UI components such as buttons, navigation bars, and modal dialogs. Tabular data are rendered using DataTables.js, which supports dynamic sorting, filtering, and pagination for enhanced usability. Interactive and publication-quality visualizations—including bar plots, scatter plots, heatmaps, and network diagrams—are generated using Plotly.js, Apache ECharts, and Chart.js, enabling users to explore gene expression patterns, enrichment results, and cell–cell communication networks through dynamic charts. Font Awesome provides scalable vector icons that improve visual navigation and user experience. The system is deployed on a 2-core, 2 GB RAM Ubuntu 22.04 LTS (64-bit) server, optimized for lightweight performance. To ensure efficient and secure web serving, Nginx is configured as a reverse proxy to handle static files and forward dynamic requests, while Gunicorn acts as the WSGI HTTP server to run the Django application in production mode. This architecture ensures reliable performance, scalability, and compatibility with modern web standards.
3. Results
3.1. Web interface for data browsing
CPRCSdb is a comprehensive single-cell RNA sequencing database encompassing 29 cancer types, dedicated to elucidating the relationship between tumor cellular heterogeneity and clinical phenotypes. The Browse page provides access to single-cell RNA-seq, bulk RNA-seq, and drug response data through three primary navigation buttons (Fig. 2A):
- a) scRNA-seq: This section provides access to all collected single-cell samples within CPRCSdb, totaling 1053 samples. Sample details include GSM ID, GSE ID, Organism, Project Name, and Tissue type.
- b) Bulk RNA-seq: This section presents statistics on the bulk RNA-seq samples and associated clinical phenotypes for each cancer project in CPRCSdb.
- c) Anticancer Drugs: This section contains all drug response data downloaded from TCGA. Each row corresponds to a patient’s response to a specific anticancer drug. The “Go to Drug Detail” button allows users to access the results of drug-phenotype association analysis derived from the single-cell data.
Fig. 2.
The main functions and usages of CPRCSdb. (A) The Browse page provides navigation to the single-cell RNA-seq, bulk RNA-seq, and drug response data modules. (B) Two search methods: one for scRNA-seq and bulk RNA-seq data, and another for drug data. (C) A table containing GSM ID, Project Name, QC status, and other metadata. (D) Sample overview, bulk-to-single-cell label mapping, differential expression and enrichment analysis, and cell-cell communication analysis. (E) Bulk sample details, single-cell sample information, and differential survival analysis. (F) Drug response analysis results. (G) Phenotype association enrichment analysis tool. (H) Data download.
By integrating these diverse datasets, CPRCSdb facilitates a deeper understanding of how cellular heterogeneity influences clinical outcomes, offering valuable insights into tumor biology and potential therapeutic strategies.
3.2. Online search and sample information portal
CPRCSdb provides two search methods, one for scRNA-seq and bulk RNA-seq data, and another for drug data (Fig. 2B).
3.2.1. Fuzzy matching search for single-cell datasets and bulk projects
Users can search the database by entering keywords such as GSM ID, cancer name, tissue name, etc. Search results are displayed in a table format including GSM ID, Project Name, QC status, etc. Clicking on the GSM ID leads to a detailed single-cell sample page with four sections: Sample Overview, Bulk to Single-Cell Label Mapping, Differential and Enrichment Analysis, and Cell-Cell Communication Analysis (Fig. 2C-D). Clicking on the project abbreviation leads to a bulk sample detail page containing three sections: Bulk Sample Details, Single-Cell Sample Information, and Differential Survival Analysis (Fig. 2E).
3.2.2. Exploring drug-cancer associations through dual search paths
Given the complex many-to-many relationship between drugs and cancer projects, we define two complementary approaches for querying drug phenotype annotations. The first involves searching by drug to identify associated cancer projects, and the second involves searching by cancer project to identify the anticancer drugs used. On the drug-focused detail page, basic drug information, single-cell sample details, and drug response analysis results are provided (Fig. 2F).
3.3. Phenotype association enrichment analysis tool
Users can paste a gene set into the input box and click “Start Analysis” to identify relevant single-cell samples in CPRCSdb. The “Reset” button clears the input, while “Show Example” populates it with a predefined gene set for demonstration (Fig. 2G, Fig. 4).
Fig. 4.
Online enrichment tool of CPRCSdb. (A) Gene set input. (B) Enrichment results table. (C) Integrated drug analysis outputs, including phenotype associations, response heatmaps, and differential analysis.
3.4. Documentation and data download portal
The “Help” page describes how data are stored in CPRCSdb, how to use the database, and how to use the online analysis tools. The “Download” page provides access to analysis results and processed data, including single-cell clinical phenotype association labels, cell type annotation labels, bulk sample expression profiles, and drug information. In addition, a download button is provided wherever a table is displayed on the website (Fig. 2H).
3.5. Case studies
Case Study of Phenotype-Associated Cell Identification
To demonstrate the analytical capabilities and biological utility of CPRCSdb, we conducted a case study using single-cell RNA sequencing data from patients with bladder urothelial carcinoma. This example illustrates the workflow for querying disease-specific samples, performing phenotype-associated cell identification, and interpreting molecular and cellular mechanisms underlying clinical outcomes.
We initiated the analysis by searching for “Bladder” in the sample search module of CPRCSdb. From the resulting dataset, we selected sample GSM5360669 for in-depth investigation. The “Detail” page for a single-cell sample is accessible by clicking its GSM ID and begins with the “Sample overview” section, which provides essential metadata, including Project Name, Tissue Type, Disease Type, PMID, and cell/gene counts before and after quality control (Fig. 3A).
Fig. 3.
Online search interface of CPRCSdb. (A) Search interface with an example query for "Bladder" tissue and the corresponding results table. (B) The section displays cell subpopulations associated with five phenotypes and cell type annotations based on four reference datasets. (C) Multi-analysis results: differential expression, functional enrichment, and cell-cell communication.
Next, the “Bulk-to-ScRNA-seq Label Transfer” section provides two methods for identifying cell subpopulations. One is the identification of phenotype-driving cell subpopulations, a highlight of CPRCSdb, which by default shows subpopulations associated with overall survival. The other is automated cell type annotation, common to single-cell databases, which by default displays annotations based on the Blueprint-Encode reference dataset (Fig. 3B).
In the analysis of cell subpopulations associated with overall survival, we found that poor-prognosis cells (Scissor+) compared to good-prognosis cells (Scissor-) exhibited upregulation of genes including KRT5 and YAP1. Both of these genes are known drivers involved in the pathogenesis of bladder cancer (Fig. 3C, Supplementary Dataset S1). For instance, YAP1 has been shown to drive tumor progression and immune evasion via the IL-6/STAT3 signaling axis and dysregulation of CXCL chemokines [29], while transgenic overexpression of COX-2 under the KRT5 promoter induces transitional cell hyperplasia and carcinoma in murine bladder models [30].
Functional enrichment analysis of downregulated genes in poor-prognosis cells highlighted significant depletion in immune-related pathways, most notably KEGG_ANTIGEN_PROCESSING_AND_PRESENTATION (Fig. 3C, Supplementary Dataset S2). This pathway is essential for MHC class I-mediated antigen presentation and immune recognition; its suppression is a well-documented mechanism of immune escape in cancer, often mediated through loss-of-function mutations or epigenetic silencing of genes such as B2M and HLA family members [31].
Cell type annotation of the overall survival-associated subpopulations showed predominant enrichment of CD8⁺ T cells and natural killer cells in the good-prognosis group (Fig. 3C, Supplementary Dataset S3). This finding is consistent with previous studies indicating that, within the tumor immune microenvironment, both NK cells and CD8⁺ T cells serve as critical cytotoxic lymphocytes capable of directly eliminating abnormal cells, such as infected or tumor cells [32], [33]. In contrast, cells linked to poor prognosis were primarily composed of epithelial cells [34]. Notably, signaling from CD8⁺ T and NK cells (associated with good prognosis) to epithelial cells (associated with poor prognosis) was extremely weak. In contrast, reverse signaling from the poor-prognosis epithelial cells to the good-prognosis CD8⁺ T and NK cells was significantly stronger. This asymmetric communication pattern suggests that tumor-associated epithelial cells may actively suppress immune surveillance by secreting immunomodulatory ligands or cytokines, thereby facilitating immune evasion [35].
Case Study of Phenotype Enrichment Analysis
To further demonstrate CPRCSdb's ability to link cellular heterogeneity with drug treatment response phenotypes, we applied the drug phenotype enrichment analysis tool for illustration (Fig. 4A). Given that “immune-cold” tumors are frequently associated with therapy resistance due to the accumulation of immunosuppressive cell types, we used a panel of myeloid-derived suppressor cell (MDSC) marker genes reported in the study by Pritam Sadhukhan et al. to identify relevant samples [29], [36].
We analyzed the top-ranked candidate, TCGA-CESC_GSM5917944_Paclitaxel, from the MDSC marker gene enrichment results. This entry maps the paclitaxel response data from bulk TCGA-CESC patients to single cells in sample GSM5917944 (Fig. 4B). Our results indicated that cells linked to paclitaxel resistance were primarily enriched within the CD8⁺ T cell subpopulations, whereas B cells were more strongly associated with drug sensitivity. This unexpected finding, that cytotoxic T cells correlate with resistance rather than response, may indicate the presence of an exhausted or dysfunctional CD8⁺ T cell state in the tumor microenvironment, which could impair effective antitumor immunity (Fig. 4C, Supplementary Dataset S4-6). This observation is consistent with the original study, which reported that CD8⁺ T cells in cervical cancer exhibit features of functional impairment and exhaustion [37].
4. Discussion
Cancer remains one of the leading causes of death worldwide. Complex molecular and cellular heterogeneity underlies key aspects of tumorigenesis, including tumor dynamics, heterogeneous treatment responses, and patient overall survival. A deeper understanding of how distinct cancer cell subpopulations drive tumor progression, influence clinical outcomes, and contribute to treatment resistance is essential for advancing precision oncology and developing more effective therapeutic strategies.
Despite the rapid growth of single-cell cancer atlases, existing databases lack systematic links between cell subpopulations and key clinical phenotypes such as tumor stage, metastasis status, survival outcomes, or drug response. To address the lack of such resources, we developed the Cancer-Phenotype-Related Cancer Cell Subpopulations Database (CPRCSdb), a comprehensive resource that systematically maps single-cell data to clinical phenotypes across 29 cancer types. The platform provides a suite of user-friendly interactive interfaces to help users explore phenotype-driving cell subpopulations. Users can investigate these subpopulations through downstream differential expression analysis, functional enrichment analysis, and cell-cell communication analysis.
In summary, CPRCSdb serves as a powerful resource for investigating the relationship between cancer cell subpopulations and clinical phenotypes. We believe this platform will facilitate the discovery of phenotype-associated cell subpopulations, advance the understanding of tumor heterogeneity, and ultimately support the development of personalized cancer therapies.
CRediT authorship contribution statement
Jinxiu Deng: Data curation. Jiayi Zhao: Data curation. Zhaomeng Liu: Data curation. Han Zhang: Supervision, Software. Yuxiang Yan: Visualization, Validation, Formal analysis, Data curation. Shang Desi: Writing – review & editing, Supervision, Project administration, Funding acquisition. Yizhen Gong: Writing – original draft, Visualization, Software, Investigation, Formal analysis, Data curation, Conceptualization. Anqi Chen: Visualization. You Chen: Visualization. Li Ling: Data curation. Guanghao Yang: Visualization.
Funding
This work was supported by the National Natural Science Foundation of China [62272211], the Natural Science Foundation of Hunan Province [2023JJ30535], and the Clinical Research 4310 Program of the University of South China [20224310NHYCG05].
Declaration of Competing Interest
The manuscript has not been submitted for publication elsewhere, in print or electronic form, and the authors have no financial interests (beyond their basic academic careers) that could give rise to a conflict of interest.
Footnotes
Supplementary data associated with this article can be found in the online version at doi:10.1016/j.csbj.2025.11.033.
Appendix A. Supplementary material
Supplementary material
Data availability
CPRCSdb is freely accessible to the research community without registration or login at http://www.yzbio.top/CPRCSdb/. The data related to CPRCSdb, including single-cell expression profiles, cell annotations, and the results of downstream analyses, have also been deposited on Zenodo (https://zenodo.org/uploads/17454198).
References
- 1. Potter S.S. Single-cell RNA sequencing for the study of development, physiology and disease. Nat Rev Nephrol. 2018;14(8):479–492. doi: 10.1038/s41581-018-0021-7.
- 2. Mangena V., Chanoch-Myers R., Sartore R., Paulsen B., Gritsch S., Weisman H., et al. Glioblastoma cortical organoids recapitulate cell-state heterogeneity and intercellular transfer. Cancer Discov. 2025;15(2):299–315. doi: 10.1158/2159-8290.CD-23-1336.
- 3. Baldassarre G., LdlS I., Vallette F.M. Death-ision: the link between cellular resilience and cancer resistance to treatments. Mol Cancer. 2025;24(1):144. doi: 10.1186/s12943-025-02339-1.
- 4. Li H., Zandberg D.P., Kulkarni A., Chiosea S.I., Santos P.M., Isett B.R., et al. Distinct CD8(+) T cell dynamics associate with response to neoadjuvant cancer immunotherapies. Cancer Cell. 2025;43(4):757–775.e8. doi: 10.1016/j.ccell.2025.02.026.
- 5. Pei G., Min J., Rajapakshe K.I., Branchi V., Liu Y., Selvanesan B.C., et al. Spatial mapping of transcriptomic plasticity in metastatic pancreatic cancer. Nature. 2025;642(8066):212–221. doi: 10.1038/s41586-025-08927-x.
- 6. Shi Q., Chen Y., Li Y., Qin S., Yang Y., Gao Y., et al. Cross-tissue multicellular coordination and its rewiring in cancer. Nature. 2025;643(8071):529–538. doi: 10.1038/s41586-025-09053-4.
- 7. Zhang Q., He Y., Luo N., Patel S.J., Han Y., Gao R., et al. Landscape and dynamics of single immune cells in hepatocellular carcinoma. Cell. 2019;179(4):829–845.e20. doi: 10.1016/j.cell.2019.10.003.
- 8. Barkley D., Moncada R., Pour M., Liberman D.A., Dryg I., Werba G., et al. Cancer cell states recur across tumor types and form specific interactions with the tumor microenvironment. Nat Genet. 2022;54(8):1192–1201. doi: 10.1038/s41588-022-01141-9.
- 9. Suma S., Suehara Y., Fujisawa M., Abe Y., Hattori K., Makishima K., et al. Tumor heterogeneity and immune-evasive T follicular cell lymphoma phenotypes at single-cell resolution. Leukemia. 2024;38(2):340–350. doi: 10.1038/s41375-023-02093-7.
- 10. Canellas-Socias A., Sancho E., Batlle E. Mechanisms of metastatic colorectal cancer. Nat Rev Gastroenterol Hepatol. 2024;21(9):609–625. doi: 10.1038/s41575-024-00934-z.
- 11. Zeng F., Zhang Q., Tsui Y.M., Ma H., Tian L., Husain A., et al. Multimodal sequencing of neoadjuvant nivolumab treatment in hepatocellular carcinoma reveals cellular and molecular immune landscape for drug response. Mol Cancer. 2025;24(1):110. doi: 10.1186/s12943-025-02314-w.
- 12. Sun D., Guan X., Moran A.E., Wu L.Y., Qian D.Z., Schedin P., et al. Identifying phenotype-associated subpopulations by integrating bulk and single-cell sequencing data. Nat Biotechnol. 2022;40(4):527–538. doi: 10.1038/s41587-021-01091-3.
- 13. Satija R., Farrell J.A., Gennert D., Schier A.F., Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015;33(5):495–502. doi: 10.1038/nbt.3192.
- 14. Hafemeister C., Satija R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 2019;20(1):296. doi: 10.1186/s13059-019-1874-1.
- 15. Butler A., Hoffman P., Smibert P., Papalexi E., Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018;36(5):411–420. doi: 10.1038/nbt.4096.
- 16. Bolstad B.M., Irizarry R.A., Astrand M., Speed T.P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19(2):185–193. doi: 10.1093/bioinformatics/19.2.185.
- 17.Castanza A.S., Recla J.M., Eby D., Thorvaldsdottir H., Bult C.J., Mesirov J.P. Extending support for mouse data in the molecular signatures database (MSigDB) Nat Methods. 2023;20(11):1619–1620. doi: 10.1038/s41592-023-02014-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., et al. Gene ontology: tool for the unification of biology. Gene Ontol Consort Nat Genet. 2000;25(1):25–29. doi: 10.1038/75556. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Kanehisa M. Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30. doi: 10.1093/nar/28.1.27. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Gavish A., Tyler M., Greenwald A.C., Hoefflin R., Simkin D., Tschernichovsky R., et al. Hallmarks of transcriptional intratumour heterogeneity across a thousand tumours. Nature. 2023;618(7965):598–606. doi: 10.1038/s41586-023-06130-4. [DOI] [PubMed] [Google Scholar]
- 21.Yu G., Wang L.G., Han Y., He Q.Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012;16(5):284–287. doi: 10.1089/omi.2011.0118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Aran D., Looney A.P., Liu L., Wu E., Fong V., Hsu A., et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat Immunol. 2019;20(2):163–172. doi: 10.1038/s41590-018-0276-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Martens J.H., Stunnenberg H.G. BLUEPRINT: mapping human blood cell epigenomes. Haematologica. 2013;98(10):1487–1489. doi: 10.3324/haematol.2013.094243. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Consortium EP. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57–74. doi: 10.1038/nature11247. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Schmiedel B.J., Singh D., Madrigal A., Valdovino-Gonzalez A.G., White B.M., Zapardiel-Gonzalo J., et al. Impact of Genetic Polymorphisms on Human Immune Cell Gene Expression. Cell. 2018;175(6):1701–1715. doi: 10.1016/j.cell.2018.10.022. e16. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Novershtern N., Subramanian A., Lawton L.N., Mak R.H., Haining W.N., McConkey M.E., et al. Densely interconnected transcriptional circuits control cell states in human hematopoiesis. Cell. 2011;144(2):296–309. doi: 10.1016/j.cell.2011.01.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Monaco G., Lee B., Xu W., Mustafah S., Hwang Y.Y., Carre C., et al. RNA-Seq signatures normalized by mRNA abundance allow absolute deconvolution of human immune cell types. Cell Rep. 2019;26(6):1627–1640. doi: 10.1016/j.celrep.2019.01.041. e7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Jin S., Guerrero-Juarez C.F., Zhang L., Chang I., Ramos R., Kuan C.H., et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun. 2021;12(1):1088. doi: 10.1038/s41467-021-21246-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Sadhukhan P., Feng M., Illingworth E., Sloma I., Ooki A., Matoso A., et al. YAP1 induces bladder cancer progression and promotes immune evasion through IL-6/STAT3 pathway and CXCL deregulation. J Clin Invest. 2024;135(2) doi: 10.1172/JCI171164. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Klein R.D., Van Pelt C.S., Sabichi A.L., Dela Cerda J., Fischer S.M., Furstenberger G., et al. Transitional cell hyperplasia and carcinomas in urinary bladders of transgenic mice with keratin 5 promoter-driven cyclooxygenase-2 overexpression. Cancer Res. 2005;65(5):1808–1813. doi: 10.1158/0008-5472.CAN-04-3567. [DOI] [PubMed] [Google Scholar]
- 31.Trombetta E.S., Mellman I. Cell biology of antigen processing in vitro and in vivo. Annu Rev Immunol. 2005;23:975–1028. doi: 10.1146/annurev.immunol.22.012703.104538. [DOI] [PubMed] [Google Scholar]
- 32.Chu J., Gao F., Yan M., Zhao S., Yan Z., Shi B., et al. Natural killer cells: a promising immunotherapy for cancer. J Transl Med. 2022;20(1):240. doi: 10.1186/s12967-022-03437-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Li Q., Lin L., Shou P., Liu K., Xue Y., Hu M., et al. MHC class Ib-restricted CD8(+) T cells possess strong tumoricidal activities. Proc Natl Acad Sci USA. 2023;120(43) doi: 10.1073/pnas.2304689120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Pastushenko I., Brisebarre A., Sifrim A., Fioramonti M., Revenco T., Boumahdi S., et al. Identification of the tumour transition states occurring during EMT. Nature. 2018;556(7702):463–468. doi: 10.1038/s41586-018-0040-3. [DOI] [PubMed] [Google Scholar]
- 35.Oliveira G., Wu C.J. Dynamics and specificities of T cells in cancer immunotherapy. Nat Rev Cancer. 2023;23(5):295–316. doi: 10.1038/s41568-023-00560-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Wu B., Zhang B., Li B., Wu H., Jiang M. Cold and hot tumors: from molecular mechanisms to targeted therapy. Signal Transduct Target Ther. 2024;9(1):274. doi: 10.1038/s41392-024-01979-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Qu X., Wang Y., Jiang Q., Ren T., Guo C., Hua K., et al. Interactions of Indoleamine 2,3-dioxygenase-expressing LAMP3(+) dendritic cells with CD4(+) regulatory T cells and CD8(+) exhausted T cells: synergistically remodeling of the immunosuppressive microenvironment in cervical cancer and therapeutic implications. Cancer Commun (Lond) 2023;43(11):1207–1228. doi: 10.1002/cac2.12486. [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
CPRCSdb is freely accessible to the research community, without registration or login, at http://www.yzbio.top/CPRCSdb/. The data underlying CPRCSdb, including single-cell expression profiles, cell annotations, and the results of downstream analyses, have also been deposited on Zenodo (https://zenodo.org/uploads/17454198).




