Neoplasia (New York, N.Y.). 2026 Mar 27;75:101300. doi: 10.1016/j.neo.2026.101300

Identification of tumor initiating cells and early marker genes in histologically normal colonic mucosa that lead to neoplastic transformation

Sangeeta Jaiswal a, Stephanie The b, Tse-Shao Chang c, Jiaqi Shi d, Thomas D Wang a,c,e
PMCID: PMC13054087  PMID: 41903466

Abstract

Background & Aims

Colorectal cancer (CRC) remains a leading cause of cancer‑related morbidity and mortality worldwide. Although the adenoma–carcinoma sequence and its genetic drivers are well described, the earliest cellular and molecular events initiating tumorigenesis within histologically normal colonic epithelium remain poorly defined. This study aims to identify tumor‑initiating cells (TICs), distinguish them from normal stem‑like cells (nSTMs), and delineate early transcriptional and signaling programs using single‑cell RNA sequencing (scRNA‑seq) from paired normal‑appearing and transformed human colonic tissues.

Methods

Fresh biopsies from histologically normal mucosa and matched polyps, including tubular adenomas, sessile serrated adenomas, and adenocarcinomas, were collected from seven subjects. Single‑cell transcriptomes were generated using the 10x Genomics platform and analyzed with Seurat, Monocle2, CytoTRACE, GSEA/GSVA, RNA velocity, InferCNV, CellChat, and NicheNet. Spatial validation was performed using RNA‑FISH.

Results

We resolved 51,054 high‑quality single‑cell transcriptomes into 33 clusters. Tumor-specific stem-like (tSTM) and deep crypt secretory (tDCS) populations were enriched in adenomas. Subclustering of tSTM identified TIC-like subsets predominantly derived from histologically normal mucosa that localized to the root of lineage trajectories leading to polyp-enriched tSTM states. Compared to nSTMs, TICs exhibited enhanced stemness potential, early epithelial–mesenchymal transition (EMT) and interferon signaling, suppression of oxidative phosphorylation, and distinct genomic and signaling features, indicating early neoplastic reprogramming. ETS2, SLC12A2, and LEFTY1 were identified as TIC‑specific markers; SOD3 and GPRC5A increased along the TIC‑to‑tSTM trajectory. RNA‑FISH confirmed candidate marker localization. Independent validation using the COLONMAP dataset (30 polyps, 35 normal samples) demonstrated that TIC-like cells were predominantly enriched in tubular adenomas but were scarce in serrated lesions. Across this independent cohort, TIC marker genes showed reproducible upregulation in TIC-like populations, supporting the robustness of these observations across cohorts.

Conclusions

Our results identify TICs as the origin of neoplastic stem‑like states in the conventional tubular adenoma pathway and define early transcriptional, metabolic, and microenvironmental reprogramming events that distinguish TICs from nSTMs. In contrast to serrated pathways described in other atlases, our data support a stem‑like expansion model for tubular adenomas and nominate biomarkers with translational potential for early CRC detection and intervention.

Keywords: Tumor-initiating cells; Colorectal cancer; Single-cell RNA-seq; Epithelial-to-mesenchymal transition; ETS2, SOD3, GPRC5A

Introduction

Colorectal cancer (CRC) contributes substantially to the worldwide health care burden. Globally, over 1.9 million cases are diagnosed each year, leading to more than 900,000 deaths annually [1]. In the U.S., about 152,810 people are diagnosed yearly, and annual mortality is about 53,010 [2]. The adenoma-carcinoma sequence is widely accepted as the underlying molecular process that leads to sporadic CRC development [3]. A series of genetic mutations occurs in normal colonic mucosa, resulting in spontaneous formation of adenomas followed by invasive cancer [4]. Inactivation of tumor suppressor genes, such as APC, leads to dysregulated WNT signaling [5], and activation of oncogenes, such as KRAS, stimulates the MAPK pathway [6]. Sequential genetic and epigenetic changes over time then drive proliferative changes that may result in adenocarcinoma [7]. Thus, new approaches to detect CRC at an early stage, when treatment options are more effective, are urgently needed. Such approaches may be enabled by defining precursor cell populations, identifying early marker genes, and clarifying molecular pathways that initiate malignant transformation.

Previously, bulk RNA sequencing methods have been used primarily to investigate CRC molecular genetics [8]. Transcriptome profiling, biomarker discovery, cancer heterogeneity characterization, and investigation of therapeutic resistance mechanisms have been performed using mucosal tissues. However, bulk methods measure only average gene expression levels across diverse cell populations [9]. Recently, cancer stem cells (CSCs) have been implicated as key players in CRC initiation and growth [[10], [11], [12]]. These self-renewing, pluripotent cells have an innate capacity to regenerate as well as initiate tumors. Unlike established CSCs that sustain growth within overt tumors, tumor-initiating cells (TICs) are hypothesized to represent early precursor states that arise within histologically normal epithelium and give rise to neoplastic stem-like populations. Marker genes for CSCs in CRC have been reported, and include CD44, CD133, LGR5, DCLK1, CD166, CD26, and CD24 [[13], [14], [15], [16], [17], [18]]. Single-cell RNA sequencing (scRNA-seq) is an emerging approach that measures gene expression at the level of individual cells and can provide a more detailed analysis of cellular diversity [[19], [20], [21]]. This approach can be used to identify rare cell subpopulations, such as CSCs, subtle transcriptional variations, and temporal gene expression dynamics [22,23]. It therefore provides an opportunity to distinguish marker genes in premalignant versus malignant epithelium that may drive cancer initiation.

Tumor initiation within histologically normal colonic epithelium is a complex, multistep process involving the emergence of precursor cell states, transitional programs, and progressive microenvironmental remodeling [[24], [25], [26]]. While single-cell RNA sequencing (scRNA-seq) has been widely applied to characterize colorectal tumor heterogeneity, build molecular atlases, and define cancer stem cell states, few studies have directly interrogated the earliest transcriptional events that precede histologic transformation. In this study, we apply scRNA-seq to paired human colorectal polyps and adjacent normal-appearing mucosa to address three specific objectives: (1) to identify tumor-initiating cell–like populations within histologically normal epithelium, (2) to define the early transcriptional, metabolic, and signaling programs associated with their progression toward neoplastic stem-like states, and (3) to validate candidate TIC markers and pathway specificity using spatial approaches and independent single-cell datasets.

Methods

Human subject and tissue collection

Fresh human colonic tissues were obtained from 7 adult patients undergoing routine screening colonoscopy at Michigan Medicine under IRB-approved protocol HUM00102771 (IRBMED). Informed written consent was obtained from all participants prior to enrollment. For each subject, paired biopsies were collected from colonic polyps and adjacent normal-appearing mucosa. Specimen IDs and corresponding pathology are provided in Table S1. Each specimen was deidentified prior to processing to ensure patient confidentiality. All authors had full access to the study data and approved the final version of the manuscript.

Tissue processing and histological evaluation

Biopsied colonic tissues were bisected upon collection. One portion was immediately processed for scRNA-seq to preserve transcriptomic integrity while the other was fixed in 10% neutral buffered formalin and paraffin-embedded for histological evaluation. H&E staining was performed on 5-μm tissue sections, and diagnoses were confirmed by an expert GI pathologist (JS).

Single-cell suspension preparation and sequencing

Single-cell suspensions were generated from freshly collected colon tissues using the Neural Tissue Dissociation Kit (Miltenyi Biotec, #130-092-628) according to the manufacturer’s protocol. Following dissociation, single cells were encapsulated into nanodroplets using the Chromium Controller (10X Genomics), and single-cell libraries were constructed using the Chromium Single Cell 5′ Library & Gel Bead Kit (10X Genomics). High-throughput sequencing of the resulting libraries was performed on the NovaSeq 6000 sequencer to enable transcriptome-wide profiling at single-cell resolution.

Data preprocessing and integration

Raw sequencing reads were aligned and quantified using Cell Ranger (10X Genomics) against the GRCh38 human reference genome [27]. Processed data were analyzed using Seurat v3.1.0 within the R environment [28,29]. Standard quality control filtering was applied to exclude cells with fewer than 200 detected genes or with mitochondrial gene content >25% of total UMI counts. Following quality control, data were normalized, log-transformed, and integrated across all patient samples using the Seurat integration pipeline to correct for inter-sample batch effects. Batch correction was evaluated qualitatively through t-SNE visual inspection, confirming that cells from different patient samples were well mixed and did not segregate by batch.
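The quality control rule above can be illustrated with a minimal Python sketch. This is not the Seurat code used in the study; the `CellQC` record, its field names, and the example barcodes are hypothetical, and the sketch assumes that the filter is meant to drop cells whose mitochondrial fraction is above 25% and that cells sitting exactly on either threshold are kept.

```python
from dataclasses import dataclass

@dataclass
class CellQC:
    """Per-cell QC summary (hypothetical record, for illustration only)."""
    barcode: str
    n_genes: int      # number of genes detected in the cell
    total_umis: int   # total UMI counts in the cell
    mito_umis: int    # UMI counts assigned to mitochondrial genes

MIN_GENES = 200            # cells with fewer detected genes are excluded
MAX_MITO_FRACTION = 0.25   # assumed: cells above this fraction are excluded

def passes_qc(cell: CellQC) -> bool:
    """Return True if the cell survives both QC thresholds."""
    if cell.total_umis == 0:
        return False  # empty droplet, nothing to keep
    mito_fraction = cell.mito_umis / cell.total_umis
    return cell.n_genes >= MIN_GENES and mito_fraction <= MAX_MITO_FRACTION

cells = [
    CellQC("AAAC", n_genes=1500, total_umis=6000, mito_umis=300),  # 5% mito
    CellQC("AAAG", n_genes=150, total_umis=400, mito_umis=20),     # too few genes
    CellQC("AACT", n_genes=900, total_umis=2000, mito_umis=800),   # 40% mito
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```

Only the first cell passes in this toy input: the second fails the gene-count threshold and the third fails the mitochondrial threshold.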

Clustering and cell type annotation

Dimensionality reduction was performed using principal component analysis (PCA), with the top 18 principal components selected for downstream clustering based on variance explained. Unsupervised clustering was then applied to group cells into transcriptionally distinct populations. Cluster identities were visualized using both UMAP and t-SNE projections to assess separation between clusters in the embedding. Differentially expressed genes (DEGs) for each cluster were identified using the Seurat function FindAllMarkers. Cluster annotation was guided by the expression of established canonical marker genes and led to the identification of major cell types.

Identification of tumor-specific clusters

Cell clusters were annotated based on composition across tissue types and the expression of canonical marker genes to identify tumor-associated epithelial subpopulations. Clusters enriched in polyp-derived epithelial cells were examined for differential abundance compared to clusters derived from histologically normal mucosa. Marker genes associated with intestinal stem cells, e.g. OLFM4 and LGR5, and secretory lineages, e.g. REG4 and MUC2, were used to classify stem-like and deep crypt secretory (DCS) cell populations. Corresponding clusters with similar marker profiles in both normal and polyp tissue were annotated as normal stem-like (nSTM) and normal DCS (nDCS) populations. Immunofluorescence (IF) staining was performed on FFPE sections using antibodies against OLFM4 and REG4 to validate gene expression at the protein level. Confocal microscopy was used to image fluorescence signal and confirm spatial localization of marker expression in epithelial crypts.

Gene set enrichment analysis (GSEA)

Differentially expressed genes (DEGs) between tumor-specific and normal epithelial subtypes, specifically, tumor stem-like (tSTM) versus normal stem-like (nSTM), and tumor deep crypt secretory (tDCS) versus normal DCS (nDCS) cells, were subjected to gene set enrichment analysis (GSEA) using the clusterProfiler package in R [30]. Gene sets for Hallmark pathways were obtained from the Molecular Signatures Database (MSigDB) [[31], [32], [33], [34]]. DEGs with adjusted p-values <0.05 were ranked and used to compute enrichment scores. Enrichment was assessed using normalized enrichment scores (NES) and false discovery rate (FDR)-adjusted q-values. Visualization of enriched pathways was performed using dot plots to illustrate functional profiles of tumor-associated versus normal epithelial cell populations.

CytoTRACE analysis

CytoTRACE (Cellular Trajectory Reconstruction Analysis using gene Counts and Expression) was performed to infer the differentiation potential of single cells using a pre-processed Seurat object [35]. The Seurat object, containing filtered and normalized single-cell RNA-seq data, was converted to a gene expression matrix using the as.matrix() function on the raw counts slot (arkov_object[[“RNA”]]@counts). This matrix was used as input for the CytoTRACE() function from the CytoTRACE R package (v0.3.3). Default parameters were applied unless stated otherwise. The resulting CytoTRACE scores, which estimate cellular plasticity based on transcriptional diversity, were mapped back onto the Seurat object metadata. These scores were then visualized on UMAP embeddings to reveal differentiation gradients across clusters. Higher CytoTRACE scores indicated less differentiated, more stem-like cell states, and were used to support downstream trajectory analyses.

InferCNV analysis

The copy number variation (CNV) score in the stem cell populations was calculated from the single-cell transcriptomic profiles using InferCNV (https://github.com/broadinstitute/inferCNV, ver 1.22.0) [36]. Cells from clusters 11 and 12 obtained from normal specimens were selected as references. For the InferCNV analysis, the following parameters were used: “denoise,” default hidden Markov model settings, and a value of 0.1 as the “cutoff” value. Finally, the subclusters with relatively higher CNV scores were considered malignant cells. CNV scores were calculated at the single-cell level as the mean absolute deviation of inferred CNV signal across genes for each cell. Comparisons between TICs and tSTM populations were performed by aggregating per-cell CNV scores within each cluster or condition, without additional per-cluster normalization.
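The per-cell CNV score and its aggregation by cluster can be sketched in a few lines of Python. This is an illustrative reading of the description above, not the authors' code: it assumes that "deviation" is measured from a neutral (copy-number-normal) value of the inferred signal, and the `neutral` parameter, the function names, and the toy values are hypothetical.

```python
from statistics import mean

def cnv_score(signal, neutral=1.0):
    """Mean absolute deviation of one cell's inferred CNV signal across genes.

    `neutral` is an assumed baseline for a copy-number-normal gene; set it to
    match the scale of the inferred signal matrix actually being used.
    """
    return mean(abs(x - neutral) for x in signal)

def group_mean_scores(cells, labels, neutral=1.0):
    """Average the per-cell scores within each label (cluster or condition)."""
    per_group = {}
    for signal, label in zip(cells, labels):
        per_group.setdefault(label, []).append(cnv_score(signal, neutral))
    # Plain mean of per-cell scores, with no additional per-group normalization
    return {label: mean(scores) for label, scores in per_group.items()}

# Toy input: three cells, four genes each (values are made up)
cells = [
    [1.0, 1.0, 1.0, 1.0],   # flat profile
    [1.0, 1.1, 1.0, 0.9],   # small deviations
    [1.2, 0.8, 1.5, 1.0],   # gains and losses
]
labels = ["normal", "normal", "polyp"]
print(group_mean_scores(cells, labels))
```

In this toy input the polyp cell has the larger mean score, which is the kind of contrast the cluster-level comparison is meant to capture.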

Subclustering and tumor-initiating cell (TIC) identification

Epithelial clusters were re-clustered into transcriptionally distinct subpopulations using Seurat to investigate the transcriptional evolution of tumor-specific stem-like (tSTM) cells. Subclusters were annotated based on tissue origin, and specific subpopulations predominantly derived from histologically normal epithelium were flagged for further analysis as candidate tumor-initiating cells (TICs). Differential gene expression analysis was performed between TIC-enriched subclusters and the main tSTM population to identify early molecular events associated with tumor initiation. GSVA was subsequently conducted to examine pathway alterations associated with this transition, focusing on Hallmark gene sets. Genes showing progressive upregulation along the TIC-to-tSTM continuum were selected for further analysis to identify early transformation markers. Expression trajectories of candidate genes were visualized using pseudotime mapping.

Pseudotime trajectory analysis

Trajectory inference was performed using the Monocle 2 R package to investigate transcriptional transitions during epithelial transformation [37]. For pseudotime analysis, stem cell-associated clusters were extracted. Subclusters within the tSTM population were also isolated to evaluate the potential origin and differentiation trajectories of tumor-initiating cells (TICs) to further resolve lineage dynamics. Dimensionality reduction was carried out using the DDRTree algorithm, and cells were ordered along inferred trajectories using the orderCells function. Differential gene expression across pseudotime was computed using differentialGeneTest. Principal component-based visualization was used to map transcriptional transitions across pseudotemporal space. Gene expression changes and pathway activity along trajectory components were used to characterize phenotypic shifts associated with early tumorigenesis.

GSVA and correlation with pseudotime

GSVA was performed to quantify cell-level pathway activity across pseudotime trajectories with a focus on key Hallmark pathways such as EMT and oxidative phosphorylation (OXPHOS). GSVA scores were computed for each cell using predefined gene sets from the Molecular Signatures Database (MSigDB). Pearson’s correlation was calculated between GSVA scores and the primary trajectory component to evaluate the relationship between pathway activity and transcriptional progression. Gene expression trends for candidate early transformation markers were visualized along pseudotime. These analyses were used to characterize temporal dynamics of transcriptional reprogramming during the transition from tumor-initiating cells (TICs) to tumor stem-like (tSTM) cells.
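As a worked illustration of the correlation step, the following Python sketch computes Pearson's correlation between a pathway score and a trajectory coordinate. It does not compute GSVA scores; the per-cell values are made up, and the variable names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Toy per-cell values (made up): position on trajectory component 1 and
# pathway activity scores for the same five cells
component_1 = [0.0, 1.0, 2.0, 3.0, 4.0]
emt_score = [0.1, 0.3, 0.5, 0.7, 0.9]      # rises along the trajectory
oxphos_score = [0.9, 0.7, 0.5, 0.3, 0.1]   # falls along the trajectory

r_emt = pearson(component_1, emt_score)
r_oxphos = pearson(component_1, oxphos_score)
print(round(r_emt, 3), round(r_oxphos, 3))
```

A positive coefficient indicates pathway activity that increases along the trajectory component, and a negative coefficient indicates activity that decreases.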

RNA velocity analysis

RNA velocity analysis was conducted using the VeloVAE framework to infer directional transcriptional dynamics and predict future cell state transitions. Spliced and unspliced transcript count matrices were generated using the Kallisto|Bustools pipeline with a pre-built human reference index optimized for RNA velocity inference. The resulting matrices were processed through VeloVAE, a variational autoencoder-based model that estimates latent time, kinetic parameters, and RNA velocity vectors across cells. Following standard preprocessing, dimensionality reduction and clustering were re-applied to ensure alignment between velocity-derived trajectories and existing cell annotations. The inferred velocity field and latent temporal ordering were used to assess lineage progression and validate pseudotime-based trajectories of tumor-initiating cell populations.

RNA in situ hybridization

RNA fluorescence in situ hybridization (RNA-FISH) was performed to validate the spatial expression of early transformation markers in both normal and polyp tissue. FFPE sections were cut at 5 μm thickness and mounted on Superfrost Plus glass slides. Sections were deparaffinized, subjected to heat-mediated antigen retrieval, and hybridized with RNA probes. ViewRNA™ Tissue Fluorescence Assay (Thermo Scientific, QVT0646B) was performed to detect RNA expression. RNA probes for SOD3 and ETS2 (VX06, Assay ID: VA1-3004554-VT and VX01, Assay ID: VA6-3168063VT) were obtained from Thermo Scientific. The FISH assay was performed per the manufacturer’s protocol. Nuclei were counterstained with DAPI for cellular localization. Fluorescence imaging was conducted using a confocal microscope equipped with a 40 × oil-immersion objective to assess marker expression patterns in situ. RNA-FISH was used to corroborate scRNA-seq–based identification of early transformation signatures in morphologically normal and polyp tissue compartments.

Analysis of publicly available data

The QC-filtered data from the Colorectal Molecular Atlas Project [38] were downloaded from the HTAN data portal: https://data.humantumoratlas.org. For this study, processed Seurat objects for the discovery datasets were downloaded. All downstream processing was performed according to the methods described in the previous sections.

CellChat analysis

Intercellular communication was inferred using the R package CellChat [39], which models signaling interactions based on single-cell transcriptomic data and a curated ligand–receptor interaction database. Briefly, normalized gene expression data and cluster labels were used to create CellChat objects for each condition of interest. CellChat computes communication probabilities for ligand–receptor pairs between cell populations by integrating expression data with known interaction networks, accounting for complex ligand and receptor structures and cofactors. Significant signaling pathways and interactions were identified using permutation testing and compared between tumor-initiating cell (TIC) and normal stem-like (nSTM) conditions. Visualizations of communication networks and pathway-specific interactions were generated using built-in CellChat functions. This approach enabled quantitative inference and comparison of intercellular signaling landscapes from scRNA-seq data. Accordingly, inferred signaling differences between TIC and nSTM populations should be interpreted as putative communication programs that may represent stress-associated signaling, niche formation, or a combination of both.

Transcription factor activity inference

Transcription factor (TF) activity was inferred from single-cell RNA-seq data using the DoRothEA regulon collection coupled with the VIPER algorithm. TF-target interactions with high confidence (levels A–C) were used to estimate TF activities by evaluating the expression of their downstream targets rather than TF expression alone, yielding proxy activity scores for each TF across cells. Briefly, we subset the Seurat object to include TIC and nSTM cells and extracted normalized expression data. DoRothEA human regulons were filtered for confidence levels A–C and provided as the input network for the run_viper() function, which computes normalized enrichment scores representing TF activity. The resulting TF activity matrix was incorporated back into the Seurat object as a new assay, followed by scaling and dimensionality reduction to visualize TF activity patterns. Differential TF activity between TIC and nSTM populations was assessed using Seurat’s differential expression framework applied to the TF activity assay.

NicheNet ligand activity analysis

To infer which ligands expressed by TIC and nSTM cells may regulate gene expression in fibroblasts, we applied the NicheNet approach, which integrates prior knowledge of ligand–receptor interactions and ligand–target relationships with gene expression data to prioritize active ligands and their downstream targets. NicheNet predicts ligand activity by assessing how well the predicted targets of each ligand explain the observed expression patterns in a receiver cell population, allowing identification of candidate signaling drivers of intercellular communication. Briefly, TIC, nSTM, and fibroblast subsets were extracted from the Seurat object, and expressed genes were defined based on a minimum detection threshold. A curated ligand–receptor network (lr_network) and ligand–target regulatory prior (ligand_target_matrix) were used to identify ligands expressed in sender populations whose receptors are expressed in fibroblasts. For each ligand set, we performed ligand activity prediction using the predict_ligand_activities() function, ranking ligands by their predicted regulatory potential on a gene set of interest in fibroblasts.

Statistical analysis

All analyses were performed in R (v4.1.0) unless otherwise specified. Cells were analyzed as nested within patients, and no statistical test treated individual patients as independent replicates unless explicitly stated. Differentially expressed genes (DEGs) were identified using the FindAllMarkers function in Seurat (v3.1.0), which applies the Wilcoxon rank-sum test with Benjamini–Hochberg correction for multiple testing. Genes were required to be expressed in at least 10% of cells in either cluster (min.pct = 0.1) and to show an absolute log fold change greater than 0.25 (logfc.threshold = 0.25). Genes with adjusted p < 0.05 and average log2 fold-change > 0.25 were considered significant. GSEA was performed using clusterProfiler, with significance defined as FDR-adjusted q < 0.05. GSVA scores were correlated with pseudotime using Pearson’s correlation. ROC analyses were performed using the pROC package. For each gene, area-under-the-curve (AUC), sensitivity, and specificity across thresholds were calculated, and the optimal threshold was determined by Youden’s index. Visualizations were performed with ggplot2 and Seurat functions. Thresholds and statistical metrics are reported in figures and supplementary tables. ROC analyses were conducted using per-cell measurements, and resulting performance metrics reflect cell-level discrimination rather than patient-level classification.
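The threshold selection by Youden's index can be sketched in Python as follows. This is a simplified illustration rather than the pROC implementation: candidate thresholds are taken directly from the observed values, a cell is called positive when its value is at or above the threshold, and the toy expression values and labels are made up.

```python
def youden_threshold(values, is_positive):
    """Pick the threshold that maximizes J = sensitivity + specificity - 1.

    Returns (J, threshold, sensitivity, specificity). Ties keep the lowest
    threshold. Candidate thresholds are the observed values themselves, which
    is a simplification made for this sketch.
    """
    n_pos = sum(1 for p in is_positive if p)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best = None
    for t in sorted(set(values)):
        true_pos = sum(1 for v, p in zip(values, is_positive) if p and v >= t)
        true_neg = sum(1 for v, p in zip(values, is_positive) if not p and v < t)
        sensitivity = true_pos / n_pos
        specificity = true_neg / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, t, sensitivity, specificity)
    return best

# Toy per-cell expression of one marker (made up); True marks the positive class
expression = [0.2, 0.5, 0.6, 0.4, 0.7, 0.9]
labels = [False, False, False, True, True, True]

j, threshold, sens, spec = youden_threshold(expression, labels)
print(threshold, round(sens, 3), round(spec, 3), round(j, 3))
```

Because each input value is a single cell, the resulting sensitivity and specificity describe cell-level discrimination, in line with the caveat stated above.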

Results

Single-cell transcriptomic profiling identifies tumor-associated epithelial subpopulations

Single-cell RNA sequencing (scRNA-seq) was performed on paired colonic biopsies from 7 human subjects and captured both polyps and adjacent histologically normal mucosa, Fig. 1A. The polyp cohort contained diverse histopathologic subtypes, including tubular adenomas, sessile and traditional serrated adenomas, and adenocarcinoma, as confirmed by pathology, Fig. S1, Table S1. After quality control, batch correction, and integration, Fig. S2A,B, 51,054 high-quality single-cell transcriptomes were obtained, including 31,376 normal and 19,678 polyp, Fig. 1B. After clustering, 33 transcriptionally distinct cell populations were identified, Table S2. These clusters were annotated into major epithelial, stromal, and immune cell types and used to investigate tumor-associated transcriptional reprogramming, Table S3. Clusters 0 and 10 were markedly enriched in polyp-derived epithelial cells compared to normal, and defined tumor-associated subpopulations, Fig. 1B,C. Dot plot analysis revealed distinct transcriptional programs across clusters, Fig. 1D, with cluster 0 exhibiting high expression of OLFM4 and LGR5, consistent with a tumor-specific stem-like (tSTM) epithelial phenotype. Cluster 10 showed elevated REG4 and MUC2, characteristic of a tumor-specific deep crypt secretory (tDCS) identity. In contrast, clusters 11 and 12 (normal stem-like, nSTM) and cluster 1 (normal deep crypt secretory, nDCS) were present in both normal and polyp tissues, reflecting homeostatic epithelial populations. Immunofluorescence validated strong upregulation of OLFM4 and REG4 in adenomatous epithelium relative to minimal expression in paired normal mucosa, Fig. 1E.

Fig. 1.

Single-cell transcriptomic profiling of human colon. A) Paired biopsies from 7 subjects were divided for scRNA-seq and pathology review. Polyps included premalignant lesions and one adenocarcinoma. B) Integration of 51,054 high-quality cells identified 33 transcriptionally distinct clusters with notable enrichment of clusters 0 and 10 in polyps. C) Frequency analysis confirmed tumor-specific enrichment of clusters 0 and 10. D) Dot plot revealed high expression of OLFM4 and LGR5 in cluster 0 and of REG4 and MUC2 in cluster 10, consistent with stem-like and deep crypt secretory phenotypes, respectively. E) Immunofluorescence staining validated upregulation of OLFM4 and REG4 in adenoma compared to minimal expression in matched normal tissue (arrows). OLFM4 and REG4 immunofluorescence highlights tSTM/tDCS identity, indicating tumor-associated stem-like cells rather than TIC-specific expression.

Subclustering and lineage trajectory analyses identify TIC precursors of tSTM cells

To investigate the developmental origin of tumor-specific stem-like cells, tSTM (cluster 0) was further resolved into 8 epithelial subclusters, Fig. 2A. Among these, subclusters 4 and 6 were predominantly derived from histologically normal mucosa and identified as candidate tumor-initiating cells (TICs), whereas subcluster 0 was enriched in polyp tissue, Fig. 2B. Subcluster 4 exhibited the highest stemness potential using CytoTRACE analysis with intermediate levels in subcluster 6, Fig. 2C,D. Monocle 2 trajectory analysis of TIC (sub 4 and sub 6) and one of the tSTM clusters (sub 0) revealed a lineage continuum in which subclusters 4 and 6 mapped to early states and progressed directionally toward sub 0, representing a polyp-enriched epithelial branch, Fig. 3A,B. Projection of CytoTRACE scores along pseudotime confirmed that cells at the trajectory root exhibited the greatest stem cell potential, Fig. 3C,D. RNA velocity analysis was performed to further support this model, and showed transcriptional flow from subclusters 4 and 6 toward subcluster 0, consistent with a unidirectional differentiation pathway from early-stage TICs to tumor-specific stem-like states, Fig. 3E,F.

Fig. 2.

Identification of tumor initiating cells (TICs). A) Integrated t-SNE visualization of single cells colored by Seurat-defined subclusters (sub0–sub7) illustrates transcriptionally distinct cell populations. B) The same t-SNE embedding colored by experimental condition (N versus P) highlights condition-enriched regions with notable enrichment within subclusters 4 and 6 (outlined). These subclusters were operationally defined as TICs based on tissue origin. C) CytoTRACE scores projected onto the t-SNE embedding where higher scores indicate greater developmental potency (less differentiated states) and lower scores indicate more differentiated states. D) Boxplots summarizing CytoTRACE scores across subclusters to demonstrate heterogeneity in inferred differentiation potential with subclusters 4 and 0 exhibiting higher median potency compared with more differentiated clusters such as subcluster 6.

Fig. 3.

Trajectory inference and lineage progression of tSTM subclusters. A) Monocle 2 trajectory analysis revealed differentiation paths for subclusters 0, 4, and 6. Subclusters 4 and 6 are positioned at early stages, and subcluster 0 represents a polyp-enriched epithelial branch. B) Pseudotime ordering demonstrated a continuum from early to late states with terminal branches corresponding to more differentiated populations. C) CytoTRACE mapping along the trajectory confirmed the highest stemness scores at the trajectory root. D) Cells with the least differentiated state localized to the trajectory origin. E) RNA velocity was performed on the integrated dataset, showing a directional flow from subcluster 4 toward subclusters 6 and 0, consistent with a unidirectional differentiation trajectory; patient-specific variation was not assessed. F) Pseudotime heatmap further validated the temporal progression from early (purple) to late (yellow) states, consistent with RNA velocity and transcriptional activity.

Trajectory analysis reveals progressive transcriptional reprogramming

To explore how TICs transition toward more aggressive phenotypes, we reconstructed their transcriptional trajectories along the principal component continuum. The progression of these clusters along component 1, Fig. S3A, revealed a coordinated increase in epithelial–mesenchymal transition (EMT) activity accompanied by a decline in oxidative phosphorylation (OXPHOS), Fig. S3B,C, consistent with a gradual shift toward mesenchymal-like and metabolically reprogrammed states. Gene set enrichment analysis of TICs further supported this trend, showing downregulation of OXPHOS and upregulation of EMT and interferon-α signaling pathways, indicative of a stress-responsive, pro-tumorigenic transcriptional program, Fig. S3D. Trajectory analysis reinforced these observations, revealing a progressive reorganization of cellular and metabolic pathways that accompanies the evolution of cancer stem cell–like states from TICs, Fig. S3E,F. Along the principal component 1 trajectory, OXPHOS-related genes (ATP5MC2, COX7C) showed a coordinated decrease, whereas EMT/stemness-associated genes (CD44, TGFBI) increased, peaking mid-trajectory, consistent with a transition from metabolic to mesenchymal-like programs in tumor-initiating cells, Fig. S3G. Together, these findings suggest that metabolic reprogramming and activation of stress signaling are integral features of the early transition from TICs to stem cell–like populations, potentially linking EMT dynamics to the establishment of tumor-initiating potential.

Early transformation markers revealed by pseudotime analysis and validated by RNA-FISH

Early transformation markers were identified by differential gene expression analysis and by expression along the trajectory. Two genes, SOD3 and GPRC5A, rose steadily along pseudotime, highlighting their role as candidate molecular indicators of neoplastic progression, Fig. 4A-C. The expression of SOD3 and GPRC5A was significantly higher in tSTM compared to TIC, Fig. 4D. RNA-FISH analysis confirmed spatial upregulation of SOD3 in adenomatous crypts with minimal expression in adjacent normal mucosa, Fig. 4E.

Fig. 4.

Pseudotime dynamics and tissue-level validation of SOD3 and GPRC5A expression. Trajectory inference analysis showing gene expression of A) SOD3 and B) GPRC5A projected onto the inferred pseudotime manifold. Cells are colored by log-transformed expression levels (log10[value + 0.1]) with numbered nodes indicating major trajectory branch points. C) Smoothed expression trends of SOD3 (top) and GPRC5A (bottom) along pseudotime, stratified by condition (N versus P), demonstrating progressive upregulation along the trajectory, with enrichment in the P condition. Note that SOD3 and GPRC5A reflect trajectory-associated progression along pseudotime rather than TIC-specific identity. D) Violin plots comparing expression levels of SOD3 (top) and GPRC5A (bottom) between TIC and tSTM states, showing significantly higher expression in the more advanced tSTM state. Horizontal bars denote median expression. E) Representative RNA in situ hybridization images (SOD3 RNA, TRITC, red; nuclei, DAPI, blue) and corresponding H&E staining in polyp and normal tissues. SOD3 signal is increased in polyp epithelium compared with normal mucosa (arrows), corroborating transcriptomic findings. Scale bars as indicated.

CNV analysis reveals genomic instability in tumor-associated cells

Copy number variation (CNV) from the single-cell transcriptomes was evaluated using InferCNV to provide orthogonal evidence of neoplastic transformation. nSTM cells originating from normal epithelium were used as reference. Epithelial clusters from normal mucosa showed minimal CNV alterations, consistent with genomic stability, Fig. S4A. Polyp-derived tSTM and nSTM populations from the observation clusters displayed widespread chromosomal amplifications and deletions, Fig. S4B. Notably, TICs (Cluster 0_N) did not exhibit significant chromosomal aberrations, suggesting that CNV acquisition occurs downstream of TIC emergence. Accordingly, the mean CNV score was higher in the tumor cluster compared with normal clusters, Fig. S4C. These findings demonstrate that tumor-associated epithelial subtypes are defined not only by transcriptional and pathway reprogramming but also by underlying genomic instability, highlighting their central role in early neoplastic progression.
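The idea behind expression-based CNV inference can be illustrated with a minimal sketch. It is not the InferCNV implementation: it only assumes that genes are already ordered by chromosomal position, centers each cell on a reference profile, smooths with a moving average, and summarizes the result as a mean squared deviation. The function names, the window size, and the toy values are hypothetical.

```python
def moving_average(values, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def cnv_score(cell_expr, reference_mean, window=3):
    """Mean squared smoothed deviation of one cell from the reference profile.

    Both inputs are log-expression values for genes sorted by genomic
    position (an assumption of this sketch, not a detail from the study).
    """
    centered = [c - r for c, r in zip(cell_expr, reference_mean)]
    smoothed = moving_average(centered, window)
    return sum(v * v for v in smoothed) / len(smoothed)


# Hypothetical toy values: five genes in genomic order.
reference = [1.0, 1.0, 1.0, 1.0, 1.0]      # e.g. mean of reference nSTM cells
stable_cell = [1.0, 1.0, 1.0, 1.0, 1.0]    # matches the reference
gained_cell = [1.0, 2.0, 2.0, 2.0, 1.0]    # contiguous block of higher expression

stable_score = cnv_score(stable_cell, reference)
gained_score = cnv_score(gained_cell, reference)
```

In this toy case the cell that matches the reference scores zero, whereas the cell with a contiguous block of elevated expression scores higher, which is the kind of contrast summarized by a mean CNV score per cluster.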

TICs are mostly associated with tubular adenoma

Previous work by Chen et al., 2021 reported that serrated polyps originate through metaplastic processes. The dataset used in our study included samples representing multiple histologic subtypes of colorectal polyps. To investigate whether the TIC-associated clusters identified in our analysis were linked to particular histologic subtypes, we tested the association between TIC status and histology. A Chi-square test revealed a significant enrichment of TIC-high cells in tubular adenomas, suggesting that tubular adenomas may arise from the expansion of stem-like cell populations, Fig. 5A. To validate these findings, we analyzed an independent scRNA-seq dataset from the COLONMAP study, which includes transcriptomic profiles from 30 polyp and 35 normal (NL) colorectal specimens. Histologically, the cohort comprised 14 adenomas (AD), 10 serrated lesions (SER; 6 hyperplastic polyps and 4 sessile serrated polyps), and 6 specimens with unclassified histology (UNC). The dataset was preprocessed to include cell-type annotations, Fig. 5B. Consistent with our primary dataset, two distinct epithelial clusters, ASC (adenoma stem-like cells) and SSC (serrated stem-like cells), were enriched in polyp specimens, Fig. 5C. Notably, ASCs were predominantly enriched in tubular adenomas, whereas SSCs were more abundant in serrated polyps, mirroring the histologic distinctions observed in Chen et al., 2021. Using the “AddModuleScore” function in Seurat, we calculated a TIC-associated gene signature score for each cell in the COLONMAP dataset, identifying TIC-high cells based on their module scores rather than via label transfer, Fig. 5D. Most TIC-high cells were derived from adenoma specimens, Fig. 5E. A subsequent Chi-square analysis again demonstrated a significant association between TIC-high cells and tubular adenomas, reinforcing the notion that tubular adenomas likely originate from stem cell expansion, Fig. 5F.
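Module scoring can be sketched in simplified form as the mean expression of signature genes minus the mean expression of control genes. The sketch below is an illustration only: Seurat's AddModuleScore draws its control genes at random from expression-matched bins, whereas here a fixed control list is supplied. The gene names, expression values, and the zero threshold used to call TIC-high cells are hypothetical and are not taken from the study.

```python
def module_score(cell, signature, controls):
    """Mean expression of signature genes minus mean of control genes."""
    sig_mean = sum(cell[g] for g in signature) / len(signature)
    ctl_mean = sum(cell[g] for g in controls) / len(controls)
    return sig_mean - ctl_mean


def label_cells(cells, signature, controls, threshold=0.0):
    """Return {cell_id: 'TIC-high' or 'TIC-low'} based on the module score."""
    labels = {}
    for cell_id, expr in cells.items():
        score = module_score(expr, signature, controls)
        labels[cell_id] = "TIC-high" if score > threshold else "TIC-low"
    return labels


# Placeholder genes and normalized expression values (all hypothetical).
signature = ["SIG1", "SIG2", "SIG3"]
controls = ["CTRL1", "CTRL2"]
cells = {
    "cell_a": {"SIG1": 2.0, "SIG2": 3.0, "SIG3": 1.0, "CTRL1": 1.0, "CTRL2": 1.0},
    "cell_b": {"SIG1": 0.0, "SIG2": 1.0, "SIG3": 0.5, "CTRL1": 1.0, "CTRL2": 1.0},
}
labels = label_cells(cells, signature, controls)
```

Scoring each cell against a signature, rather than transferring cluster labels, keeps the call independent of how the validation dataset was clustered.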

Fig. 5.

Enrichment of TIC-high epithelial states across colorectal lesion types and epithelial subpopulations. A) Heatmap showing enrichment scores of TIC-high and TIC-low transcriptional programs across histopathologic categories (ADC, AD, NL, SSA, TSA), indicating preferential enrichment of the TIC-high program in adenocarcinoma and adenoma. B) UMAP visualization of epithelial cells colored by major epithelial subtypes, including absorptive (ABS), adenoma stem-like (ASC), colonocyte (CT), enteroendocrine (EE), goblet (GOB), serrated stem-like (SSC), stem/transit-amplifying (STM), tuft/TA (TAC), and tuft (TUF) populations. C) Stacked bar plots showing the proportion of cells from each histopathologic category (adenoma, AD; normal, NL; serrated, SER; unclassified histology, UNC) within individual epithelial subtypes, highlighting differential disease composition across lineages. D) UMAP embedding highlighting TIC-high (red) and TIC-low (gray) cells, demonstrating localization of TIC-high cells within specific epithelial compartments. E) Overall proportion of TIC-high cells across histopathologic categories, with the highest frequency observed in adenomas (AD) compared with normal (NL) and serrated (SER) lesions. F) Heatmap summarizing enrichment scores of TIC-high and TIC-low programs across disease categories, confirming strong association of the TIC-high state with adenomatous lesions and relative depletion in normal epithelium.

Integrated characterization of cell cycle, transcriptional programs, and cell–cell communication in TIC vs nSTM cells

Cell cycle differences between nSTM and TIC populations

We conducted a multi‑layered comparison between TICs and nSTM cells within normal epithelium to define distinct cellular states. TICs exhibited a more quiescent cell cycle profile relative to nSTMs, with significantly lower S‑phase and G2/M‑phase scores (Wilcoxon rank‑sum tests, p < 2.2 × 10⁻¹⁶ for both). The distribution of cell cycle phases also differed markedly (χ²(2) = 1296.6, p < 2.2 × 10⁻¹⁶), as nSTMs were predominantly assigned to S and G2/M phases, whereas TICs were enriched in G1, Fig. 6A–C.
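The comparison of phase distributions corresponds to a Pearson chi-square test on a 2 × 3 contingency table (two populations, three phases), which has 2 degrees of freedom. A minimal sketch follows; the cell counts are hypothetical and are not the counts behind the reported statistic. For 2 degrees of freedom the upper-tail p-value has the closed form exp(−χ²/2), which the sketch uses in place of a statistics library.

```python
import math


def chi_square_table(table):
    """Pearson chi-square statistic and degrees of freedom for an r x k table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_dof2(stat):
    """Upper-tail p-value of the chi-square distribution with 2 d.o.f."""
    return math.exp(-stat / 2.0)


# Hypothetical counts per phase (columns: G1, S, G2/M).
counts = [
    [200, 450, 350],  # nSTM
    [700, 200, 100],  # TIC
]
stat, dof = chi_square_table(counts)
p_value = p_value_dof2(stat)
```

With these toy counts, TICs concentrate in G1 and the test rejects equal phase distributions. The closed-form p-value applies only when the table has exactly 2 degrees of freedom.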

Fig. 6.

TICs exhibit reduced proliferation, distinct transcriptional programs, and rewired intercellular signaling compared with nSTM cells. Violin plots showing A) S-phase and B) G2/M-phase scores for normal stem-like (nSTM) and TIC cells, demonstrating significantly reduced cell-cycle activity in TICs (P < 2.2 × 10⁻¹⁶, Wilcoxon test). C) Proportional distribution of cell-cycle phases (G1, S, G2/M) in nSTM and TIC populations, confirming enrichment of TICs in G1 and depletion of cycling states. This quiescent state is consistent with tumor-initiating cell biology, reflecting stem-like properties that enable stress resistance and long-term regenerative potential. D) Gene set enrichment analysis (GSEA) comparing TIC and nSTM transcriptional programs. TICs are enriched for inflammatory, hypoxia, NF-κB, KRAS, EMT, and TGF-β signaling, whereas nSTM cells are enriched for proliferative programs including MYC targets, DNA repair, mitotic spindle, E2F targets, and G2/M checkpoint pathways. E) Dot plot of transcription factor activity and expression in nSTM versus TIC cells, highlighting preferential activation of differentiation- and stress-associated factors in TICs and cell-cycle regulators in nSTM cells. F) Differential CellChat analysis of outgoing and incoming signaling pathways between TIC and nSTM populations, revealing altered communication via MK, MIF, APP, laminin, and related pathways in TICs. G) Heatmap showing associations between key transcription factors and ligand-mediated signaling pathways, indicating coordinated regulation of TIC-enriched signaling states. H) Pathway–ligand significance matrix linking EMT and hypoxia programs with laminin, thrombospondin (THBS), MK, and MIF signaling in TICs. I) Integrated regulatory network summarizing interactions among TIC-associated ligands, transcription factors, and hallmark pathways (EMT and hypoxia), illustrating a coordinated, non-proliferative, stress-adapted TIC state.

Differential gene expression and pathway enrichment

Differential expression analysis revealed distinct transcriptional programs between TICs and nSTMs (Fig. S5A). Gene Set Enrichment Analysis (GSEA) using Hallmark pathways showed that inflammatory and hypoxia-related signatures, including TNFA_SIGNALING_VIA_NFKB and HYPOXIA, were enriched in TICs. In contrast, cell cycle–associated programs such as G2M_CHECKPOINT, E2F_TARGETS, and MITOTIC_SPINDLE were enriched in nSTMs, Fig. 6D, Table S4. These results highlight the transcriptional divergence between quiescent TICs and proliferative nSTMs.
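The enrichment statistic underlying GSEA can be illustrated with its unweighted running-sum form: walking down a ranked gene list, the sum rises at each gene set member and falls otherwise, and the enrichment score is the largest deviation from zero. The sketch below assumes a list ranked from most TIC-associated to most nSTM-associated; it omits the correlation weighting, permutation testing, and normalization used in practice, and the gene names are placeholders.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted GSEA-style enrichment score (running-sum statistic)."""
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up for a gene set member, down for any other gene.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Placeholder ranking: first genes are highest in TICs, last are highest in nSTMs.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
es_top = enrichment_score(ranked, {"g1", "g2"})      # set concentrated at the top
es_bottom = enrichment_score(ranked, {"g5", "g6"})   # set concentrated at the bottom
```

A positive score indicates a set concentrated among TIC-associated genes and a negative score a set concentrated among nSTM-associated genes, matching the direction of enrichment reported for the Hallmark pathways.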

Transcription factor activity differences

We inferred transcription factor (TF) activities using the DoRothEA/VIPER framework. This analysis identified significant differential activity between TICs and nSTMs. Proliferation-linked TFs (e.g., E2F2, E2F3, E2F4, FOXM1, TFDP1) were attenuated in TICs, whereas TFs such as RUNX1, RARA, SMAD3, and FOXO1 exhibited divergent activity patterns consistent with distinct cellular programs, Fig. 6E, Fig. S5B.

Cell–cell communication landscape

CellChat analysis revealed distinct intercellular signaling profiles between TICs and nSTMs, Fig. S6. Pathways including APP, LAMININ, THBS, MIF, and MK were among the most differentially engaged, Fig. 6F. Focusing on the APP signaling pathway, TICs exhibited stronger interaction strength with most target populations compared with nSTMs. Notably, APP–CD74 signaling to fibroblasts was absent in nSTMs but present in TICs, Fig. S7A,B. Quantification of the communication probability for the fibroblast APP–CD74 interaction revealed a higher probability from TICs (0.0370, p = 0.05) than from nSTMs (0.0323, p = 0.24), indicating stronger TIC–fibroblast signaling, Fig. S7C. Consistently, APP expression was significantly higher in TICs compared with nSTMs, while the APP-associated receptor ITGB1 was comparable between the populations, Fig. S7D. Expression of APLP2, a homolog of APP, did not differ between TICs and nSTMs, suggesting that reduced APP ligand availability rather than receptor expression or compensation by related family members underlies the loss of APP signaling from nSTMs to fibroblasts. To assess ligand-driven regulation of fibroblast gene expression, we performed targeted NicheNet analysis using TICs and nSTMs as sender populations and fibroblasts as receivers. Ligands expressed in TICs or nSTMs with cognate receptors detected in fibroblasts were prioritized, and ligand regulatory activity was inferred based on their ability to predict fibroblast variable gene expression. Comparison of predicted ligand activities revealed largely overlapping regulatory potentials between TIC and nSTM-derived ligands, with no major differences in top-ranked ligands influencing fibroblast transcriptional programs, Fig. S7E. Collectively, these results support that TICs engage distinct signaling pathways with the microenvironment, notably enhanced APP-mediated interactions with fibroblasts.

Integration of TF, cell–cell communication, and hallmark pathways

Integration of TF activity, differential signaling pathways, and enriched Hallmark programs revealed coordinated regulatory modules. Scaled associations between TFs and selected signaling pathways showed that classic cell cycle TFs link to specific communication axes, whereas other TFs associate with pathways reflecting stress and differentiation programs, Fig. 6G. Overlap analysis between CellChat ligand–receptor gene sets and Hallmark EMT and HYPOXIA pathways further supported functional links between intercellular signaling and core transcriptional processes, Fig. 6H. Finally, an integrated network combining differential TF activity, selected CellChat pathways, and Hallmark programs illustrated that key regulators (e.g., E2F family members, FOXM1, TFDP1) associate with both communication pathways (APP, LAMININ, THBS, MIF, MK) and enriched pathways such as epithelial–mesenchymal transition and hypoxia, reflecting coordinated shifts in regulatory and signaling states distinguishing TICs from nSTMs, Fig. 6I. Together, these analyses demonstrate that TICs and nSTMs occupy distinct molecular states defined by coordinated differences in cell cycle progression, transcriptional regulation, and intercellular communication.

TIC-specific biomarkers identified and validated by statistical and spatial analyses

To identify markers that specifically distinguish TICs from other epithelial cells in normal epithelium, we performed a differential gene expression analysis on normal epithelial cells within Cluster 0. These TIC-specific markers are distinct from the broader TIC gene signature used to score cells, providing a focused set of genes that uniquely define TIC identity in normal tissue. Comparative analysis of TICs versus normal epithelial cells revealed strong enrichment of ETS2, SLC12A2, and LEFTY1 within TIC populations, Fig. 7A,B. Each gene demonstrated high diagnostic performance in distinguishing TICs from normal epithelium, with sensitivities ranging from 0.73 to 0.86, Table S5. Spatial validation by RNA-FISH confirmed the presence of TICs in normal crypt epithelium marked by elevated ETS2 transcript levels, Fig. 7C. Notably, TICs were also identified in polyp specimens, Fig. 7C. The COLONMAP dataset was then analyzed to evaluate the diagnostic performance of the TIC marker genes in an independent cohort. Supporting our primary data, ETS2, SLC12A2, and LEFTY1 showed higher expression in TICs than in normal epithelium, Fig. 7D, Table S5. ROC analysis showed that these genes detected TICs with high sensitivity and specificity, Fig. 7E.
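The diagnostic metrics for a single marker gene can be sketched as follows. The AUC is computed as the probability that a randomly chosen TIC has higher expression than a randomly chosen normal epithelial cell, with ties counted as one half; sensitivity and specificity are evaluated at a fixed expression cutoff. The expression values and the cutoff are hypothetical and do not come from Table S5.

```python
def auc(pos, neg):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(pos, neg, threshold):
    """Call a cell positive when its expression is at or above the threshold."""
    true_pos = sum(1 for p in pos if p >= threshold)
    true_neg = sum(1 for n in neg if n < threshold)
    return true_pos / len(pos), true_neg / len(neg)


# Hypothetical normalized expression of one marker gene.
tic_expr = [2.1, 1.8, 0.4, 2.5]     # cells labeled TIC
nepi_expr = [0.2, 0.9, 1.9, 0.1]    # normal epithelial cells

marker_auc = auc(tic_expr, nepi_expr)
sens, spec = sensitivity_specificity(tic_expr, nepi_expr, threshold=1.0)
```

The pairwise definition of the AUC is equivalent to the area under the ROC curve and avoids choosing a threshold, whereas sensitivity and specificity depend on the cutoff selected.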

Fig. 7.

Identification and validation of TIC-associated marker genes ETS2, SLC12A2, and LEFTY1. A) Violin plots showing single-cell expression levels of ETS2, SLC12A2, and LEFTY1 in normal epithelial cells (nEpi) versus TICs, demonstrating significant upregulation of all three genes in the TIC population. Horizontal bars indicate median expression. B) ROC curves evaluating the ability of ETS2, SLC12A2, and LEFTY1 to distinguish TICs from non-TIC epithelial cells, with areas under the curve (AUCs) of 0.82, 0.80, and 0.79, respectively. C) Representative RNA in situ hybridization images showing ETS2 RNA expression (TRITC, red) in polyp and normal colonic tissues, with increased signal in polyp epithelium compared with normal mucosa. Nuclei are counterstained with DAPI (blue). Insets and arrowheads highlight epithelial regions with elevated expression. Scale bars as indicated. D) Violin plots showing expression of ETS2, SLC12A2, and LEFTY1 in an independent validation cohort, confirming enrichment in TICs relative to nEpi cells. E) ROC curves from the validation cohort demonstrating robust discriminatory performance of SLC12A2 (AUC = 0.82), ETS2 (AUC = 0.76), and LEFTY1 (AUC = 0.75) for identifying TICs.

Discussion

In this study, we applied single-cell RNA sequencing (scRNA-seq) to paired biopsies of colonic adenomas and adjacent normal mucosa from seven patients to investigate the earliest cellular and molecular events in colorectal tumorigenesis. Our analysis uncovered two tumor-associated epithelial populations: a tumor-specific stem-like (tSTM, cluster 0) state marked by OLFM4 and LGR5, and a tumor-specific deep crypt secretory (tDCS, cluster 10) state characterized by REG4 with limited MUC2 expression. Subclustering of the tSTM population revealed eight epithelial subsets, among which subclusters predominantly derived from histologically normal mucosa emerged as candidate tumor-initiating cells (TICs). Although our dataset did not directly measure canonical Wnt activity (e.g., β-catenin targets), the enrichment of LGR5 and OLFM4 in TICs is consistent with Wnt/β-catenin pathway engagement, as LGR5 is a well-characterized Wnt target that has been repeatedly identified as a defining marker of intestinal stem cells and colorectal cancer stem cells [40,41]. CytoTRACE, pseudotime ordering, and RNA velocity consistently placed TICs at the root of lineage trajectories progressing toward polyp-enriched tSTM cells, supporting their role as precursors of neoplastic stem-like states.

The transcriptional trajectory from TICs to tSTM cells was marked by progressive reprogramming, including activation of epithelial–mesenchymal transition (EMT) and suppression of oxidative phosphorylation, consistent with mounting evidence that metabolic plasticity and EMT are integral to early neoplastic transformation and cancer stem-like states in colorectal tissues [42–46]. GSVA and GSEA analyses further revealed enrichment of proliferative, inflammatory, and stress-associated pathways, including E2F, MYC, KRAS, and TNFα/NFκB signaling. Copy number variation (CNV) profiling provided orthogonal evidence of genomic instability in polyp tSTM cells relative to their normal counterparts, while TICs exhibited largely stable genomes, consistent with their origin from normal epithelium. Together, these observations provide a framework for understanding early cellular and molecular events in colorectal neoplasia and generate testable hypotheses regarding the mechanisms driving the transition from TICs to tumor-like stem cells.

Pseudotime and differential expression analyses identified SOD3 and GPRC5A as early transformation markers, validated by RNA-FISH in adenomatous crypts. Comparative analyses of TICs versus normal epithelium identified ETS2, SLC12A2, and LEFTY1 as robust TIC-specific biomarkers with high diagnostic sensitivity and specificity. To address specimen heterogeneity and validate these findings, we analyzed an independent scRNA-seq dataset from the COLONMAP study, comprising 30 polyp and 35 normal colorectal specimens across multiple histologic subtypes [38]. This analysis confirmed that TIC-like cells were enriched predominantly in tubular adenomas and that ETS2, SLC12A2, and LEFTY1 were consistently upregulated in these cells. By including specimens with varying histologies and from independent patients, the COLONMAP validation strengthens the generalizability of our observations, mitigating limitations associated with small cohort size and heterogeneity in the primary dataset.

The COLONMAP validation is particularly important in light of recent work by Chen et al., who generated a comprehensive single-cell atlas of human colorectal precancers that revealed distinct origins and microenvironmental programs for conventional adenomas versus serrated polyps [38]. Their study showed that conventional adenomas arise from Wnt-driven expansion of stem cells, whereas serrated polyps derive from differentiated cells through gastric metaplasia, with divergent immune microenvironments associated with these precancer pathways. Our data align with these observations by demonstrating that TICs in the conventional adenoma pathway resemble normal stem-like cells and follow a progression toward stem-like tumor states. The strong enrichment of TICs in tubular adenomas in both our primary and COLONMAP cohorts supports a model in which stem-like expansion, rather than differentiated or metaplastic processes, underlies early transformation in this route, distinguishing the conventional adenoma pathway from serrated neoplasia.

Importantly, our study extends prior work by evaluating intercellular communication dynamics between TICs and the surrounding microenvironment. CellChat analysis revealed that TICs engage enhanced APP–CD74 signaling with fibroblasts, which was largely absent in nSTM cells. Specifically, APP–CD74 interactions were significantly more probable from TICs than nSTMs, while expression of APP itself was elevated in TICs and the receptor ITGB1 remained comparable between populations. Expression of the APP homolog APLP2 did not differ, suggesting that loss of APP ligand availability, rather than receptor downregulation or compensatory family member expression, underlies the absence of APP signaling from nSTMs. These findings indicate that TICs not only acquire intrinsic transcriptional and metabolic reprogramming but also actively remodel their local microenvironment via ligand-mediated crosstalk, which may support early neoplastic progression and stem cell niche establishment. Collectively, these insights highlight the possibility that early stromal modulation may represent a targetable axis for intervention in the initial stages of colorectal neoplasia.

Functionally, TIC–fibroblast APP signaling may reinforce tumor-initiating programs by influencing fibroblast behavior, consistent with prior studies linking APP–CD74 interactions to cellular adhesion, proliferation, and inflammatory responses. Our integrated analysis of transcription factor activity, signaling pathways, and Hallmark programs further revealed coordinated regulatory modules in TICs, where classic cell cycle TFs were coupled to communication pathways and stress/differentiation-associated TFs aligned with EMT and hypoxia signatures. This underscores the interplay between intrinsic regulatory networks and extrinsic microenvironmental signaling in establishing early tumorigenic states.

In summary, our study maps the earliest molecular events in colorectal tumorigenesis, revealing the emergence of TICs from normal epithelium, their progressive reprogramming toward stem-like and mesenchymal states, and their selective engagement of microenvironmental signaling such as APP–CD74. Validation in the COLONMAP cohort confirms these patterns across patient samples and histologic subtypes. These findings highlight TIC-associated programs as potential targets for early detection, prevention, and intervention, and motivate future functional studies using organoid or xenograft models to dissect their role in adenoma initiation and progression.

Abbreviations

  • AUC – Area Under the Curve

  • CRC – Colorectal Cancer

  • CSC – Cancer Stem Cell

  • DCS – Deep Crypt Secretory

  • DEG – Differentially Expressed Gene

  • EMT – Epithelial-to-Mesenchymal Transition

  • FFPE – Formalin-Fixed Paraffin-Embedded

  • FISH (RNA-FISH) – RNA Fluorescence In Situ Hybridization

  • GSEA – Gene Set Enrichment Analysis

  • GS – Gene Signature

  • GSVA – Gene Set Variation Analysis

  • H&E – Hematoxylin & Eosin

  • MSigDB – Molecular Signatures Database

  • nDCS – Normal Deep Crypt Secretory

  • NES – Normalized Enrichment Score

  • nSTM – Normal Stem

  • PCA – Principal Component Analysis

  • OXPHOS – Oxidative Phosphorylation

  • R – R Statistical Computing Environment

  • scRNA-seq – single cell RNA sequencing

  • STM – Stem

  • tDCS – Tumor-Specific Deep Crypt Secretory

  • TIC – Tumor Initiating Cell

  • t-SNE – t-distributed Stochastic Neighbor Embedding

  • tSTM – Tumor-Specific Stem

  • UMAP – Uniform Manifold Approximation and Projection

  • UMI – Unique Molecular Identifier

Declarations

Ethics approval and consent to participate

Fresh human colon specimens were obtained with informed written consent from patients undergoing routine colonoscopy at the University of Michigan Hospital. All patient reports and human tissues were deidentified prior to the study. Patient specimens were collected with the approval of the Michigan Medicine IRB under protocol HUM00102771.

Consent for publication

Not Applicable

Availability of data and materials

The raw transcriptome data will be deposited in the Genome Sequence Archive (GSA). All other relevant data are available on request from the authors. The code used to generate the graphic presentation is available on GitHub (https://github.com/tstephie/Jaiswal_scRNA-seq_colon).

Funding

This study was funded in part by the National Institutes of Health (NIH) U01 CA230669 and R01 CA249851 (TDW), and R37CA262209 (JS).

Model for early colorectal tumor initiation and progression

Schematic illustrating a proposed stepwise trajectory from histologically normal colonic mucosa to adenoma through the emergence of tumor-initiating cells (TICs) and subsequent tumor-specific stem-like cells (tSTMs). Within normal mucosa, a subset of epithelial cells acquires a TIC state characterized by enhanced stemness, early activation of epithelial–mesenchymal transition (EMT) programs, and suppression of oxidative phosphorylation. TICs then progress toward tSTMs, marked by upregulation of early transformation markers (e.g., SOD3, GPRC5A), acquisition of copy number variations (CNVs), and increased interaction with the microenvironment. Notably, tSTMs engage fibroblasts through APP–CD74–mediated signaling, contributing to niche remodeling and aberrant epithelial expansion, ultimately culminating in adenoma formation.

CRediT authorship contribution statement

Sangeeta Jaiswal: Conceptualization, Formal analysis, Investigation, Visualization, Validation, Writing – original draft. Stephanie The: Formal analysis, Investigation, Validation, Visualization. Tse-Shao Chang: Data curation, Project administration. Jiaqi Shi: Funding acquisition, Supervision. Thomas D Wang: Supervision, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We thank MC Shivamadhu for technical support and the University of Michigan Advanced Genomics Core for RNA sequencing.

Footnotes

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.neo.2026.101300.

References

  • 1. Morgan E., Arnold M., Gini A., Lorenzoni V., Cabasag C.J., Laversanne M., Vignat J., Ferlay J., Murphy N., Bray F. Global burden of colorectal cancer in 2020 and 2040: incidence and mortality estimates from GLOBOCAN. Gut. 2023;72:338–344. doi: 10.1136/gutjnl-2022-327736.
  • 2. Siegel R.L., Wagle N.S., Cercek A., Smith R.A., Jemal A. Colorectal cancer statistics, 2023. CA Cancer J. Clin. 2023;73:233–254. doi: 10.3322/caac.21772.
  • 3. Fearon E.R., Vogelstein B. A genetic model for colorectal tumorigenesis. Cell. 1990;61:759–767. doi: 10.1016/0092-8674(90)90186-i.
  • 4. Leslie A., Carey F.A., Pratt N.R., Steele R.J. The colorectal adenoma–carcinoma sequence. Br. J. Surg. 2002;89:845–860. doi: 10.1046/j.1365-2168.2002.02120.x.
  • 5. Zhao H., Ming T., Tang S., Ren S., Yang H., Liu M., Tao Q., Xu H. Wnt signaling in colorectal cancer: pathogenic role and therapeutic target. Mol. Cancer. 2022;21:144.
  • 6. Yuan J., Dong X., Yap J., Hu J. The MAPK and AMPK signalings: interplay and implication in targeted cancer therapy. J. Hematol. Oncol. 2020;13:113. doi: 10.1186/s13045-020-00949-4.
  • 7. Armaghany T., Wilson J.D., Chu Q., Mills G. Genetic alterations in colorectal cancer. Gastrointest. Cancer Res. 2012;5:19–27.
  • 8. Lu D., Li X., Yuan Y., Li Y., Wang J., Zhang Q., Yang Z., Gao S., Zhang X., Zhou B. Integrating TCGA and single-cell sequencing data for colorectal cancer: a 10-gene prognostic risk assessment model. Discov. Onc. 2023;14:168. doi: 10.1007/s12672-023-00789-x.
  • 9. Kim Z., Lee J., Yoon Y.E., Yun J.W. Unveiling prognostic RNA biomarkers through a multi-cohort study in colorectal cancer. Int. J. Mol. Sci. 2024;25:3317. doi: 10.3390/ijms25063317.
  • 10. Ayob A.Z., Ramasamy T.S. Cancer stem cells as key drivers of tumour progression. J. Biomed. Sci. 2018;25:20. doi: 10.1186/s12929-018-0426-4.
  • 11. Li J., Wu Z., Zhao L., Liu Y., Su Y., Gong X., Liu F., Zhang L. The heterogeneity of mesenchymal stem cells: an important issue to be addressed in cell therapy. Stem Cell Res. Ther. 2023;14:381. doi: 10.1186/s13287-023-03587-y.
  • 12. Wang A., Chen L., Li C., Zhu Y. Heterogeneity in cancer stem cells. Cancer Lett. 2015;357:63–68. doi: 10.1016/j.canlet.2014.11.040.
  • 13. Jahangiri L. Cancer stem cell markers and properties across gastrointestinal cancers. Curr. Tissue Microenviron. Rep. 2023;4:77–89.
  • 14. Jalil A.T., Abdulhadi M.A., Al Jawadri A.M.H., Talib H.A., Al-Azzawi A.K.J., Zabibah R.S., Ali A. Cancer stem cells in colorectal cancer: implications for targeted immunotherapies. J. Gastrointest. Cancer. 2023;54:1046–1057. doi: 10.1007/s12029-023-00945-0.
  • 15. Zhao Q., Zong H., Zhu P., Su C., Tang W., Chen Z., Jin S. Crosstalk between colorectal CSCs and immune cells in tumorigenesis, and strategies for targeting colorectal CSCs. Exp. Hematol. Oncol. 2024;13:6. doi: 10.1186/s40164-024-00474-x.
  • 16. Huang E.H., Hynes M.J., Zhang T., Ginestier C., Dontu G., Appelman H., Fields J.Z., Wicha M.S., Boman B.M. Aldehyde dehydrogenase 1 is a marker for normal and malignant human colonic stem cells (SC) and tracks SC overpopulation during colon tumorigenesis. Cancer Res. 2009;69:3382–3389. doi: 10.1158/0008-5472.CAN-08-4418.
  • 17. Hirsch D., Barker N., McNeil N., Hu Y., Camps J., McKinnon K., Clevers H., Ried T., Gaiser T. LGR5 positivity defines stem-like cells in colorectal cancer. Carcinogenesis. 2013;35:849–858. doi: 10.1093/carcin/bgt377.
  • 18. Ihemelandu C., Naeem A., Parasido E., Berry D., Chaldekas K., Harris B.T., Rodriguez O., Albanese C. Clinicopathologic and prognostic significance of LGR5, a cancer stem cell marker in patients with colorectal cancer. Colorectal Cancer. 2019;8:CRC11. doi: 10.2217/crc-2019-0009.
  • 19. Huang D., Ma N., Li X., Gou Y., Duan Y., Liu B., Xia J., Zhao X., Wang X., Li Q., Rao J., Zhang X. Advances in single-cell RNA sequencing and its applications in cancer research. J. Hematol. Oncol. 2023;16:98. doi: 10.1186/s13045-023-01494-6.
  • 20. Chisanga D., Keerthikumar S., Pathan M., Ariyaratne D., Kalra H., Boukouris S., Mathew N.A., Al Saffar H., Gangoda L., Ang C.S., Sieber O.M., Mariadason J.M., Dasgupta R., Chilamkurti N., Mathivanan S. Colorectal cancer atlas: an integrative resource for genomic and proteomic annotations from colorectal cancer cell lines and tissues. Nucleic Acids Res. 2016;44:D969–D974. doi: 10.1093/nar/gkv1097.
  • 21.Anaparthy N., Ho Y.J., Martelotto L., Hammell M., Hicks J. Single-cell applications of next-generation sequencing. Cold. Spring. Harb. Perspect. Med. 2019;9 doi: 10.1101/cshperspect.a026898. a026898. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Zhao J., Shi Y., Cao G. The application of single-cell RNA sequencing in the inflammatory tumor microenvironment. Biomolecules. 2023;13:344. doi: 10.3390/biom13020344.
  • 23.Choi Y.H., Kim J.K. Dissecting cellular heterogeneity using single-cell RNA sequencing. Mol. Cells. 2019;42:189–199. doi: 10.14348/molcells.2019.2446.
  • 24.O’Brien C.A., Pollett A., Gallinger S., Dick J.E. A human colon cancer cell capable of initiating tumour growth in immunodeficient mice. Nature. 2007;445:106–110. doi: 10.1038/nature05372.
  • 25.Ricci-Vitiani L., Lombardi D.G., Pilozzi E., Biffoni M., Todaro M., Peschle C., De Maria R. Identification and expansion of human colon-cancer-initiating cells. Nature. 2007;445:111–115. doi: 10.1038/nature05384.
  • 26.Dalerba P., Dylla S.J., Park I.K., Liu R., Wang X., Cho R.W., Hoey T., Gurney A., Huang E.H., Simeone D.M., et al. Phenotypic characterization of human colorectal cancer stem cells. Proc. Natl. Acad. Sci. U.S.A. 2007;104:10158–10163. doi: 10.1073/pnas.0703478104.
  • 27.Zheng G.X., Terry J.M., Belgrader P., Ryvkin P., Bent Z.W., Wilson R., Ziraldo S.B., Wheeler T.D., McDermott G.P., Zhu J., Gregory M.T., Shuga J., Montesclaros L., Underwood J.G., Masquelier D.A., Nishimura S.Y., Schnall-Levin M., Wyatt P.W., Hindson C.M., Bharadwaj R., Wong A., Ness K.D., Beppu L.W., Deeg H.J., McFarland C., Loeb K.R., Valente W.J., Ericson N.G., Stevens E.A., Radich J.P., Mikkelsen T.S., Hindson B.J., Bielas J.H. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 2017;8:14049. doi: 10.1038/ncomms14049.
  • 28.Hao Y., Hao S., Andersen-Nissen E., Mauck W.M. 3rd, Zheng S., Butler A., Lee M.J., Wilk A.J., Darby C., Zager M., Hoffman P., Stoeckius M., Papalexi E., Mimitou E.P., Jain J., Srivastava A., Stuart T., Fleming L.M., Yeung B., Rogers A.J., McElrath J.M., Blish C.A., Gottardo R., Smibert P., Satija R. Integrated analysis of multimodal single-cell data. Cell. 2021;184:3573–3587.e29. doi: 10.1016/j.cell.2021.04.048.
  • 29.R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing; 2022.
  • 30.Wu T., Hu E., Xu S., Chen M., Guo P., Dai Z., Feng T., Zhou L., Tang W., Zhan L., Fu X., Liu S., Bo X., Yu G. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovation. 2021;2:100141. doi: 10.1016/j.xinn.2021.100141.
  • 31.Subramanian A., Tamayo P., Mootha V.K., Mukherjee S., Ebert B.L., Gillette M.A., Paulovich A., Pomeroy S.L., Golub T.R., Lander E.S., Mesirov J.P. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102.
  • 32.Liberzon A., Subramanian A., Pinchback R., Thorvaldsdóttir H., Tamayo P., Mesirov J.P. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27:1739–1740. doi: 10.1093/bioinformatics/btr260.
  • 33.Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., et al. Gene ontology: tool for the unification of biology. Nat. Genet. 2000;25:25–29. doi: 10.1038/75556.
  • 34.Gene Ontology Consortium. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res. 2021;49:D325–D334. doi: 10.1093/nar/gkaa1113.
  • 35.Gulati G.S., Sikandar S.S., Wesche D.J., Manjunath A., Bharadwaj A., Berger M.J., Ilagan F., Kuo A.H., Hsieh R.W., Cai S., Zabala M., Scheeren F.A., Lobo N.A., Qian D., Yu F.B., Dirbas F.M., Clarke M.F., Newman A.M. Single-cell transcriptional diversity is a hallmark of developmental potential. Science. 2020;367(6476):405–411. doi: 10.1126/science.aax0249.
  • 36.Tickle T., Tirosh I., Georgescu C., Brown M., Haas B. inferCNV of the Trinity CTAT Project [Internet]. Klarman Cell Observatory, Broad Institute of MIT and Harvard; Cambridge, MA, USA: 2019. Available from: https://github.com/broadinstitute/inferCNV
  • 37.Trapnell C., Cacchiarelli D., Grimsby J., Pokharel P., Li S., Morse M., Lennon N.J., Livak K.J., Mikkelsen T.S., Rinn J.L. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 2014;32:381–386. doi: 10.1038/nbt.2859.
  • 38.Chen B., Scurrah C.R., McKinley E.T., Simmons A.J., Ramirez-Solano M.A., Zhu X., Markham N.O., Heiser C.N., Vega P.N., Rolong A., Kim H., Sheng Q., Drewes J.L., Zhou Y., Southard-Smith A.N., Xu Y., Ro J., Jones A.L., Revetta F., Berry L.D., Niitsu H., Islam M., Pelka K., Hofree M., Chen J.H., Sarkizova S., Ng K., Giannakis M., Boland G.M., Aguirre A.J., Anderson A.C., Rozenblatt-Rosen O., Regev A., Hacohen N., Kawasaki K., Sato T., Goettel J.A., Grady W.M., Zheng W., Washington M.K., Cai Q., Sears C.L., Goldenring J.R., Franklin J.L., Su T., Huh W.J., Vandekar S., Roland J.T., Liu Q., Coffey R.J., Shrubsole M.J., Lau K.S. Differential pre-malignant programs and microenvironment chart distinct paths to malignancy in human colorectal polyps. Cell. 2021;184(26):6262–6280.e26. doi: 10.1016/j.cell.2021.11.031.
  • 39.Jin S., Guerrero-Juarez C.F., Zhang L., et al. Inference and analysis of cell-cell communication using CellChat. Nat. Commun. 2021;12:1088. doi: 10.1038/s41467-021-21246-9.
  • 40.He S., Zhou H., Zhu X., Hu S., Fei M., Wan D., Gu W., Yang X., Shi D., Zhou J., Zhou J., Zhu Z., Wang L., Li D., Zhang Y. Expression of Lgr5, a marker of intestinal stem cells, in colorectal cancer and its clinicopathological significance. Biomed. Pharmacother. 2014;68(5):507–513. doi: 10.1016/j.biopha.2014.03.016.
  • 41.Haegebarth A., Clevers H. Wnt signaling, Lgr5, and stem cells in the intestine and skin. Am. J. Pathol. 2009;174(3):715–721. doi: 10.2353/ajpath.2009.080758.
  • 42.Mani S.A., Guo W., Liao M.J., Eaton E.N., Ayyanan A., Zhou A.Y., Brooks M., Reinhard F., Zhang C.C., Shipitsin M., Campbell L.L., Polyak K., Brisken C., Yang J., Weinberg R.A. The epithelial-mesenchymal transition generates cells with properties of stem cells. Cell. 2008;133:704–715. doi: 10.1016/j.cell.2008.03.027.
  • 43.Zhi Y., Mou Z., Chen J., He Y., Dong H., Fu X., Wu Y. B7H1 expression and epithelial-to-mesenchymal transition phenotypes on colorectal cancer stem-like cells. PLoS One. 2015;10:e0135528. doi: 10.1371/journal.pone.0135528.
  • 44.Chang Y.W., Su Y.J., Hsiao M., Wei K.C., Lin W.H., Liang C.L., Chen S.C., Lee J.L. Diverse targets of β-catenin during the epithelial-mesenchymal transition define cancer stem cells and predict disease relapse. Cancer Res. 2015;75:3398–3410. doi: 10.1158/0008-5472.CAN-14-3265.
  • 45.Radisky D.C., LaBarge M.A. Epithelial-mesenchymal transition and the stem cell phenotype. Cell Stem Cell. 2008;2:511–512. doi: 10.1016/j.stem.2008.05.007.
  • 46.Scheel C., Weinberg R.A. Phenotypic plasticity and epithelial-mesenchymal transitions in cancer and normal stem cells? Int. J. Cancer. 2011;129:2310–2314. doi: 10.1002/ijc.26311.

Associated Data

Supplementary Materials

mmc1.docx (2.6MB, docx)
mmc2.xlsx (9.2KB, xlsx)
mmc3.csv (2.5MB, csv)
mmc4.xlsx (9.9KB, xlsx)
mmc5.csv (5.6KB, csv)
mmc6.xlsx (10.4KB, xlsx)

Data Availability Statement

The raw transcriptome data will be deposited in the Genome Sequence Archive (GSA). All other relevant data are available on request from the authors. The code used to generate the graphic presentation is available on GitHub (https://github.com/tstephie/Jaiswal_scRNA-seq_colon).


Articles from Neoplasia (New York, N.Y.) are provided here courtesy of Neoplasia Press