Discover Oncology. 2025 Jun 5;16:1006. doi: 10.1007/s12672-025-02843-2

Multi-omics exploration of CAV1+ tumor epithelial subcelltype in oral squamous cell carcinoma and its impact on the immune microenvironment

Yan’an Shi 1,2,#, Yiting Shao 3,#, Bin Xia 1,2, Wenying Yang 2,4, Yan Wang 2,5, Lifu Yu 1,2, Yonghui Zhang 2,5
PMCID: PMC12141713  PMID: 40471478

Abstract

Background

Oral squamous cell carcinoma originating from the gingiva and buccal mucosa (OSCC-GB) involves complex molecular mechanisms and immune evasion within the tumor microenvironment. This study aimed to characterize tumor epithelial cell subcelltypes and their roles in tumor progression.

Methods

We analyzed single-cell RNA sequencing (scRNA-seq) data from oral squamous cell carcinoma (OSCC) and classified tumor epithelial cells into six subcelltypes. Using spatial transcriptomics, we investigated the spatial distribution of these subcelltypes. Focusing on the highly malignant tumor epithelial subcelltype, we identified its specific transcription factors and analyzed their effects on cell state and function. We further examined the interactions between this subcelltype and immune cells in the tumor immune microenvironment, along with the underlying mechanisms.

Results

Our findings indicate that the CAV1+ epithelial subcelltype (CAV1+EIP) has the highest copy number variation scores and is significantly associated with poor patient prognosis. Analysis of subcelltype-specific transcription factors revealed that high expression of LHX1 and ATF1 is associated with malignant features, whereas IGF2BP1, a target gene of these transcription factors, is negatively correlated with immune regulatory pathways. Further analysis showed that CAV1+EIP interacts with T cells through the NECTIN1-CD96 signaling network, potentially leading to immune evasion and tumor progression.

Conclusion

This study elucidates the critical role of the CAV1+EIP in the development of OSCC and its interplay with the immune microenvironment. These findings provide new insights into how tumor cells evade immune surveillance and guide future immunotherapeutic strategies.

Supplementary Information

The online version contains supplementary material available at 10.1007/s12672-025-02843-2.

Keywords: OSCC, CAV1, Tumor immune microenvironment

Introduction

OSCC is the most common malignant tumor of the head and neck region [1], and OSCC-GB is one of its most representative forms. This type predominantly arises in the gingival and buccal areas of the oral cavity, including the buccal vestibule, the buccal mucosa, and the retromolar region [2, 3]. Patients with OSCC-GB have high rates of recurrence and mortality and typically require surgical excision followed by radiotherapy and chemotherapy [4, 5]. Most patients are diagnosed at an advanced stage (stage III or IV), and despite aggressive multimodal treatment, the risks of local recurrence and death remain high [5]. In recent years, the incidence of oral cancer has not declined significantly; the overall 5-year survival rate for advanced oral cancer is approximately 60% and falls further when metastasis is present. This situation underscores the urgent need for in-depth research on OSCC-GB and for new therapeutic strategies [6].

The tumor microenvironment is a complex ecosystem composed of heterogeneous tumor cells, stromal cells, and various immune cells. It plays a crucial role in determining the type of immune response, the immune trajectory of the tumor, and its ultimate fate [7]. scRNA-seq serves as an important tool for studying the composition and heterogeneity of tumor cells and is widely applied in cancer research. scRNA-seq analysis provides high-resolution molecular phenotypes that help reveal cell populations, functional heterogeneity based on specific markers, and the molecular characteristics, signaling pathways, and dynamics involved in cancer progression [8]. Additionally, network analysis of scRNA-seq data can uncover extensive communication networks among immune cell populations [9].

The carcinogenic dynamics of oral cancer result from the interaction between the tumor and the host, particularly the immune system [10]. Cytokines are key regulators of the tumor microenvironment and of chronic pro-tumor inflammation. Research has shown that immunosuppressive cytokines are abundant in the oral cancer microenvironment and that inflammatory cytokines are also dysregulated; for example, TNF-α and IL-8 are markedly overexpressed in OSCC and promote malignant transformation [11, 12]. In the OSCC microenvironment, some tumor cells express high levels of immunosuppressive molecules such as PD-L1, which binds PD-1 on T cells and inhibits T cell activity [13, 14]. Immunosuppressive cytokines further enhance tumor immune evasion by inhibiting effector T cell activation, promoting the expansion of regulatory T cells, and impairing dendritic cell function.

In this study, we aimed to gain a deeper understanding of the cellular heterogeneity and malignancy of tumor epithelial cells by analyzing scRNA-seq data from OSCC-GB. We assessed and identified the malignant features of different cell subcelltypes and utilized spatial transcriptomics to explore the spatial distribution of these subcelltypes within the tumor microenvironment. Furthermore, we focused on the molecular characteristics of high-malignancy tumor cell subcelltypes, particularly the role of specific transcriptional regulators, to elucidate how these factors influence cell status and function. Additionally, we investigated the interactions between high-malignancy tumor cell subcelltypes and immune cells within the tumor immune microenvironment, especially their potential impact on immune responses. This research not only provides significant theoretical foundations for understanding the complex molecular mechanisms and immune regulation in OSCC-GB but also lays the groundwork for the development of future targeted therapeutic strategies.

Methods

Data preprocessing and integration

We downloaded the scRNA-seq dataset GSE215403 from the Gene Expression Omnibus (GEO) database, which includes OSCC-GB tissue samples from 12 patients. We selected this dataset because it contains only OSCC samples from the gingiva and buccal mucosa, matching our focus on these oral cavity subsites, and because raw sequencing data were available for reprocessing. The raw FASTQ files were aligned to the GRCh38 reference genome and quantified with STARsolo to generate a sparse gene-by-cell count matrix. The data were then loaded into Seurat (version 4.4) for preprocessing. Cells were retained if they had between 200 and 6000 detected features (nFeature_RNA) and a mitochondrial transcript proportion below 25%. After filtering, the data were normalized with the LogNormalize method, and 2000 highly variable genes were selected with the variance-stabilizing transformation (VST) method. Based on these highly variable genes, we performed principal component analysis (PCA) and retained the top 30 principal components for subsequent analyses. To remove technical batch effects between samples, we applied the Harmony algorithm with sample identity as the batch variable, which produced a batch-corrected embedding. The integrated data were then visualized with UMAP; the substantial reduction in sample-driven technical bias provided a high-quality foundation for subsequent single-cell clustering and differential analysis.
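The cell filters and LogNormalize step described above can be illustrated with a minimal Python sketch (the authors worked in Seurat/R; this is not their code). It assumes mitochondrial genes carry the human "MT-" prefix, treats the 200/6000 feature bounds as exclusive, and uses Seurat's default scale factor of 10,000; the function names are hypothetical.

```python
import math

SCALE_FACTOR = 10_000  # Seurat's default scale.factor for LogNormalize


def passes_qc(cell_counts, min_features=200, max_features=6000, max_pct_mt=25.0):
    """Apply the nFeature_RNA and mitochondrial-fraction filters to one cell.

    cell_counts maps gene symbol -> raw UMI count. Mitochondrial genes are
    assumed to use the human "MT-" prefix; feature bounds are exclusive here.
    """
    n_features = sum(1 for count in cell_counts.values() if count > 0)
    total = sum(cell_counts.values())
    mt = sum(count for gene, count in cell_counts.items() if gene.startswith("MT-"))
    pct_mt = 100.0 * mt / total if total else 0.0
    return min_features < n_features < max_features and pct_mt < max_pct_mt


def log_normalize(cell_counts, scale_factor=SCALE_FACTOR):
    """LogNormalize: natural log1p of counts scaled to scale_factor per cell."""
    total = sum(cell_counts.values())
    return {gene: math.log1p(count / total * scale_factor)
            for gene, count in cell_counts.items()}
```

Normalizing by each cell's total count before the log transform removes differences in sequencing depth, so that cells can be compared on a common scale before highly variable gene selection and PCA.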

Gene regulatory network (GRN) analysis

We used pySCENIC to infer the gene regulatory networks underlying OSCC cell states. pySCENIC infers transcriptional regulatory networks by combining gene co-expression with the enrichment of transcription factor (TF) binding motifs in cis-regulatory regions such as gene promoters and enhancers, thereby revealing the regulatory mechanisms behind gene expression.

First, we constructed an initial co-expression network with the GRNBoost2 algorithm to identify candidate relationships between TFs and their potential target genes. Next, using cisTarget and a human cis-regulatory motif database, we retained only co-expression modules significantly enriched for the corresponding TF's binding motif and pruned target genes lacking motif support, yielding a refined set of regulons (each TF together with its direct target genes).

We then used the AUCell algorithm to calculate an activity score for each regulon in each cell. AUCell ranks the genes in each cell by expression and computes the area under the recovery curve of the regulon's target genes among the top-ranked genes, thereby quantifying regulon activity at the single-cell level. Additionally, we examined the specificity of individual regulons across cell states and the relationships between regulons.
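A simplified, hypothetical Python version of the AUCell scoring idea is sketched below for one cell and one regulon. Assumptions: ties in expression are broken by input order (AUCell itself breaks ties randomly), and the AUC is normalized by the AUC of an ideal ranking in which all targets sit at the top; the real package's normalization details may differ.

```python
def aucell_score(expression, regulon, max_rank):
    """Simplified AUCell-style score for one regulon in one cell.

    expression: dict gene -> expression value in this cell.
    regulon: set of target genes.
    max_rank: only the top `max_rank` ranked genes contribute to the AUC.
    """
    max_rank = min(max_rank, len(expression))
    # Rank genes from highest to lowest expression (stable sort keeps input order on ties).
    ranked = sorted(expression, key=expression.get, reverse=True)[:max_rank]
    hits = 0
    auc = 0
    for gene in ranked:
        if gene in regulon:
            hits += 1
        auc += hits  # height of the recovery curve at this rank
    # Normalize by the AUC of an ideal ranking (all targets at the top).
    n_targets = min(len(regulon), max_rank)
    max_auc = sum(min(k, n_targets) for k in range(1, max_rank + 1))
    return auc / max_auc if max_auc else 0.0
```

A score near 1 means the regulon's targets are among the most highly expressed genes in that cell; because only ranks are used, the score is insensitive to the cell's overall sequencing depth.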

Reference mapping-based projection analysis of OSCC-GB single-cell data

In this study, we used the single-cell atlas of gingival tissue from the DISCO database as a reference and applied a reference mapping approach to project our OSCC-GB single-cell data onto this gingival atlas. This strategy aimed to establish the correspondence between OSCC-GB cells and healthy gingival cells and its biological significance. The gingival single-cell transcriptomic atlas downloaded from DISCO is preprocessed and annotated and contains gene expression data for the various cell types of gingival tissue, providing a high-quality reference framework for the projection analysis.

We used the highly variable genes of the DISCO atlas to map the normalized OSCC-GB expression matrix into the gene space of the reference, so that both datasets were compared in the same feature space. Using the FindTransferAnchors function in Seurat with default parameters, we identified anchors between the OSCC-GB dataset and the gingival reference atlas; anchors are pairs of query and reference cells with locally similar expression profiles, and together they define the correspondence between the two datasets.

We then used the TransferData function to transfer cell-type labels and associated annotations from the reference atlas to the OSCC-GB dataset based on the identified anchors, predicting the type and state of each OSCC-GB cell. The projection also placed the query cells in an embedding shared with the reference atlas. Finally, we used UMAP to visualize the projection of OSCC-GB cells onto the gingival reference atlas and to assess the correspondence between cell types and its biological relevance.
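To make the idea of label transfer concrete, the Python sketch below assigns each query cell the majority label of its k nearest reference cells in a shared embedding. This is a deliberately simplified stand-in, not Seurat's anchor-weighted TransferData: it assumes the query and reference are already in the same low-dimensional space, and the function name and k = 3 are illustrative choices.

```python
import math
from collections import Counter


def transfer_labels(ref_embedding, ref_labels, query_embedding, k=3):
    """Assign each query cell the majority label of its k nearest reference cells.

    Returns (label, score) pairs, where score is the fraction of the k
    neighbours that agree, loosely analogous to a prediction score.
    """
    k = min(k, len(ref_embedding))
    predictions = []
    for q in query_embedding:
        # Sort reference cells by Euclidean distance to the query cell.
        neighbours = sorted(
            zip((math.dist(q, r) for r in ref_embedding), ref_labels)
        )[:k]
        votes = Counter(label for _, label in neighbours)
        label, count = votes.most_common(1)[0]
        predictions.append((label, count / k))
    return predictions
```

A low agreement score flags query cells that fall between reference populations, which is where a tumor-derived cell may not match any healthy gingival cell type well.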

Spatial transcriptomic data analysis of OSCC based on RCTD

This study used Robust Cell Type Decomposition (RCTD) on spatial transcriptomics data to map the spatial distribution of the cell subcelltypes defined in the single-cell data. The spatial transcriptomics data were obtained from the GSE208253 dataset in GEO, which includes both gene expression matrices and spatial coordinates. The processed OSCC-GB single-cell dataset, which covers all cell populations and includes detailed epithelial subcelltype annotations, served as the reference.

We first performed quality control and normalization on both the spatial transcriptomics data and the single-cell data, and extracted highly variable genes shared with the single-cell dataset to align the feature spaces. We then used RCTD to deconvolve the spatial data with the single-cell cell types as the reference, inferring the relative proportion of each cell type at each spot. We performed independent projection analyses for all epithelial subcelltypes. Additionally, we referred to the annotations available on the original authors' data exploration platform (http://www.pboselab.ca/spatial_OSCC/) to calibrate the projection results and ensure their accuracy and biological relevance.
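As a rough illustration of spot deconvolution, the Python sketch below estimates the cell-type proportions of one spot by non-negative least squares, solved with multiplicative updates, against reference cell-type expression profiles. This is a swapped-in, simpler technique: RCTD itself fits a Poisson-lognormal model that also accounts for platform effects. The function name and iteration count are assumptions.

```python
def deconvolve_spot(spot, profiles, n_iter=5000):
    """Estimate cell-type proportions in one spot by non-negative least squares.

    spot: list of expression values over G genes.
    profiles: dict cell_type -> list of mean expression over the same G genes.
    Multiplicative updates keep every weight non-negative.
    """
    types = list(profiles)
    A = [profiles[t] for t in types]  # K x G reference matrix
    k = len(types)
    aty = [sum(a * y for a, y in zip(A[i], spot)) for i in range(k)]
    ata = [[sum(a * b for a, b in zip(A[i], A[j])) for j in range(k)] for i in range(k)]
    w = [1.0] * k
    for _ in range(n_iter):
        ataw = [sum(ata[i][j] * w[j] for j in range(k)) for i in range(k)]
        w = [w[i] * aty[i] / ataw[i] if ataw[i] > 0 else 0.0 for i in range(k)]
    total = sum(w)
    # Rescale the fitted weights into proportions that sum to 1.
    return {t: wi / total for t, wi in zip(types, w)} if total > 0 else {t: 0.0 for t in types}
```

For example, a spot built as 70% of one profile plus 30% of another is recovered as roughly 0.7 and 0.3, which is the kind of per-spot abundance that the spatial heatmaps display.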

Ultimately, we illustrated the spatial distribution patterns of cell populations and epithelial subcelltypes using spatial heatmaps. We validated the functional characteristics of specific cell populations in the spatial dimension through differential gene analysis. This analysis provides significant insights into the spatial heterogeneity of cells in OSCC and their regulatory mechanisms.

Analysis of chromosomal copy number variations in OSCC based on InferCNV

This study used the R package inferCNV (https://github.com/broadinstitute/inferCNV) to infer chromosomal copy number variations (CNVs) in OSCC-GB cells from single-cell expression data. The scRNA-seq gene expression matrix served as input, with T cells, NK cells, and macrophages defined as the normal reference group and the remaining cells as the test group for inferring chromosomal changes in tumor cells. Gene positions were based on GRCh38, and genes were ordered by their physical chromosomal location.

The analysis object was created with the CreateInfercnvObject function, and the run function was then executed with default parameters. The pipeline centers expression relative to the reference cells and smooths it along each chromosome with a sliding window, highlighting regions with consistent copy number gains or losses. The results are displayed as a heatmap in which gains are shown in red and losses in blue; the x-axis represents the chromosomal positions of genes, and each row corresponds to a single cell or cell population.
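The centering and sliding-window smoothing described above can be sketched in Python for one cell and one chromosome. This is an illustration of the principle, not inferCNV's implementation (which adds further steps such as denoising and optional HMM calling). The default window of 101 genes is intended to mirror inferCNV's window_length default, and the per-cell CNV score as a mean squared deviation is a commonly used summary that we assume here.

```python
def smoothed_cnv_profile(cell_expr, ref_mean, window=101):
    """Reference-centred, sliding-window-smoothed expression along one chromosome.

    cell_expr and ref_mean are log-expression values for the same genes,
    already ordered by genomic position. The window is truncated at the ends.
    """
    centred = [x - r for x, r in zip(cell_expr, ref_mean)]
    half = window // 2
    profile = []
    for i in range(len(centred)):
        segment = centred[max(0, i - half): i + half + 1]
        profile.append(sum(segment) / len(segment))
    return profile


def cnv_score(profile):
    """Mean squared deviation of the smoothed profile (0 means no inferred CNV)."""
    return sum(v * v for v in profile) / len(profile)
```

Smoothing over neighbouring genes is what turns noisy per-gene differences into region-level signals: a single highly expressed gene barely moves the average, whereas a block of consistently elevated genes (as in an amplified arm) produces a sustained positive run.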

By integrating existing tumor genomic research findings, we further validate and interpret the inferred CNV patterns, analyzing key genes involved in these chromosomal alterations and their impact on OSCC progression. This analysis aims to reveal the genomic instability of tumor cells and its biological significance.

GSEA and GO functional enrichment analysis of epithelial cell subcelltypes

To explore the biological functions of the six epithelial cell subcelltypes, we performed Gene Set Enrichment Analysis (GSEA). We first used the FindMarkers function in Seurat to obtain the marker genes of each subcelltype and ranked them by log2 fold change (log2FC). We then used the clusterProfiler R package to run GSEA on each subcelltype's ranked marker list, using the MSigDB Hallmark collection as the gene sets.
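The core GSEA statistic, the running-sum enrichment score over a ranked gene list, can be sketched in Python as follows. This assumes the standard weighted form (weight = 1, hits stepping up in proportion to |log2FC|, misses stepping down uniformly) and omits the permutation-based significance testing and normalization that clusterProfiler performs; the function name is hypothetical.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """GSEA running-sum enrichment score for one gene set.

    ranked: list of (gene, statistic) sorted from most positive to most
    negative (for example, by log2FC).
    """
    in_set = [gene in gene_set for gene, _ in ranked]
    hit_norm = sum(abs(s) ** weight for (_, s), hit in zip(ranked, in_set) if hit)
    n_miss = len(ranked) - sum(in_set)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")
    running = best = 0.0
    for (_, stat), hit in zip(ranked, in_set):
        # Step up for genes in the set, step down for genes outside it.
        running += abs(stat) ** weight / hit_norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A positive score means the set's genes concentrate at the top of the ranking (up-regulated in the subcelltype); a negative score means they concentrate at the bottom.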

Enrichment scores were calculated with the GSEA function, and p-values were adjusted with the Benjamini–Hochberg method to control the false discovery rate (P.adjust < 0.05). Significantly enriched pathways and biological processes were then identified from visualizations of the enrichment results. In addition, for each of the six subcelltypes, we selected marker genes with p < 0.001 (Wilcoxon rank-sum test) and performed Gene Ontology (GO) enrichment analysis.
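The Benjamini–Hochberg adjustment used above (the "BH" method of R's p.adjust) can be written in a few lines of Python. The sketch below returns adjusted p-values in the input order; the step-down running minimum keeps the adjusted values monotone and capped at 1.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Usage: `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` gives `[0.02, 0.04, 0.04, 0.02]`, so with a 0.05 threshold all four pathways would be retained.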

By integrating functional annotations, we further elucidated the potential roles of epithelial cell subcelltypes in the tumor microenvironment, providing a solid basis for understanding the heterogeneity and functional characteristics of epithelial cells in OSCC.

Prognostic analysis of marker genes in epithelial cell subcelltypes

To explore the potential relationship between epithelial cell subcelltypes and clinical prognosis in OSCC patients, we extracted the marker genes for each epithelial cell subcelltype and applied a stringent filter with a threshold of p < 0.001 to retain highly significant marker genes for subsequent analyses. We then utilized the Gene Set Variation Analysis (GSVA) algorithm to calculate the scores for each subcelltype in TCGA-OSCC samples based on the filtered marker genes, thereby quantifying the distribution of each subcelltype’s activity within the samples.
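GSVA itself estimates per-gene cumulative distribution functions before computing a rank-based enrichment statistic. As a simpler, swapped-in illustration of per-sample gene-set scoring, the Python sketch below uses a combined z-score (standardize each marker gene across samples, sum within the set, divide by the square root of the set size); the GSVA package offers a comparable "zscore" option, but this sketch is not the GSVA algorithm and the function name is hypothetical.

```python
import math
import statistics


def combined_zscore(expr, gene_set):
    """Per-sample gene-set score: sum of gene-wise z-scores divided by sqrt(k).

    expr: dict gene -> list of expression values, one per sample.
    Genes absent from expr or with zero variance are skipped.
    """
    n_samples = len(next(iter(expr.values())))
    totals = [0.0] * n_samples
    used = 0
    for gene in sorted(gene_set):
        values = expr.get(gene)
        if values is None:
            continue
        sd = statistics.stdev(values)
        if sd == 0:
            continue
        mean = statistics.fmean(values)
        for i, v in enumerate(values):
            totals[i] += (v - mean) / sd
        used += 1
    return [t / math.sqrt(used) for t in totals] if used else totals
```

Each sample (here, a TCGA tumor) then receives one score per subcelltype, which can be dichotomized or used directly in the survival analyses that follow.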

By combining the computed GSVA scores with the clinical information of patients (including overall survival time and survival status), we performed Kaplan–Meier survival analysis in the TCGA-OSCC cohort to evaluate the impact of varying levels of activity in different epithelial cell subcelltypes on patient prognosis.
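The Kaplan–Meier estimator behind these survival curves can be sketched in Python. At each distinct event time the survival probability is multiplied by (1 − deaths / number at risk), and censored patients leave the risk set without causing a drop. The function name is illustrative.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate; returns [(event_time, survival_probability), ...].

    events: 1 = death observed, 0 = censored at that time.
    """
    records = sorted(zip(times, events))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = removed = 0
        # Group all patients sharing this time point.
        while i < len(records) and records[i][0] == t:
            deaths += records[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

Usage: `kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])` steps down to 0.8 at time 1, about 0.533 at time 3, and about 0.267 at time 4; the censoring at time 2 shrinks the risk set without a drop.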

Prognostic analysis of key genes based on TCGA data

To assess the prognostic value of key genes in OSCC patients, we utilized the TCGA-HNSCC dataset (https://xena.ucsc.edu/) for our analysis. We filtered the dataset to include only samples originating from the oral cavity, acquiring the RNA expression matrix and corresponding clinical information for the patients, including survival time and survival status. The specific analytical workflow is as follows:

First, we extracted the expression of the selected key genes from the chosen samples and calculated a risk score for each patient. Patients were then divided into high-risk and low-risk groups. We used Kaplan–Meier analysis to compare overall survival (OS) between the two groups, plotted survival curves, and assessed the significance of survival differences with the log-rank test (p < 0.05). In addition, we used the Cox proportional hazards model to estimate the hazard ratio (HR) of each key gene and evaluate its impact on patient prognosis.
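The two-group log-rank test used above compares observed and expected deaths in one group at every event time. A minimal Python sketch follows; it assumes the usual hypergeometric variance and converts the statistic to a p-value with the 1-degree-of-freedom chi-square distribution, for which P(X > x) = erfc(sqrt(x / 2)).

```python
import math


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    events are 1 for death and 0 for censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for tt, _, g in data if tt >= t and g == 0)
        at_risk = sum(1 for tt, _, _ in data if tt >= t)
        deaths_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        deaths = sum(1 for tt, e, _ in data if tt == t and e)
        # Expected deaths in group A if both groups shared one hazard.
        observed_minus_expected += deaths_a - deaths * at_risk_a / at_risk
        if at_risk > 1:
            at_risk_b = at_risk - at_risk_a
            variance += (deaths * at_risk_a * at_risk_b * (at_risk - deaths)
                         / (at_risk ** 2 * (at_risk - 1)))
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

When one group consistently dies earlier (for example, deaths at times 1 to 3 versus 4 to 6), the statistic is about 5.05 with p of roughly 0.025.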

Differential expression target gene screening and gene correlation analysis of transcription factors

To identify the overlap between the target genes of the transcription factors and the differentially expressed genes (DEGs) in TCGA-OSCC, we collected lists of target genes for the relevant transcription factors from the AnimalTFDB database. We then applied thresholds of p < 0.05 and log2FC > 1 to identify 5405 DEGs in the TCGA-OSCC dataset.

The overlapping genes were identified with a Venn diagram. To evaluate the correlation between the expression of these overlapping genes and that of other genes in the TCGA-OSCC dataset, we calculated Pearson correlation coefficients, which range from −1 to 1: 1 indicates a perfect positive correlation, −1 a perfect negative correlation, and 0 no linear correlation.

Finally, we used the Pearson correlation coefficients in place of log fold change (logFC) values as the gene-ranking metric and performed GSEA to explore the biological pathways positively or negatively associated with the overlapping genes.
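The correlation-then-rank step can be sketched in Python: compute the Pearson correlation of every gene with the gene of interest across samples, then sort genes by the coefficient to obtain the ranked list that GSEA consumes in place of logFC. The function names are hypothetical, and genes with zero variance are rejected because their correlation is undefined.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(target, others):
    """Rank genes by correlation with `target` (highest first) for preranked GSEA.

    others: dict gene -> expression values across the same samples as target.
    """
    scores = {gene: pearson(target, values) for gene, values in others.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Pathways enriched at the top of this list are positively associated with the gene of interest, and those at the bottom are negatively associated, which is how a negative correlation with immune regulatory pathways would surface.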

Analysis of intercellular signal communication

We used the CellChat R package to analyze signaling between cell populations. First, we constructed a CellChat object from the processed single-cell expression data and the cell-type annotations. We then used CellChat's built-in ligand-receptor database (CellChatDB) to identify the signaling networks and their ligand-receptor pairs.

Furthermore, by calculating the activity of each signaling pathway, we assessed the communication strength among different cell populations and generated network graphs and heatmaps to visually represent the communication patterns and signal distribution. For critical signaling pathways, we extracted relevant ligand-receptor relationships and quantitatively evaluated their specificity among the various cell populations.
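To illustrate how a ligand-receptor pair such as NECTIN1 (ligand) and CD96 (receptor) can be scored between cell groups, the Python sketch below multiplies the sender group's mean ligand expression by the receiver group's mean receptor expression and passes the product through a saturating Hill-type function. This is only loosely modeled on CellChat's mass-action approach: the constant kh = 0.5 is an assumption, CellChat's agonist, antagonist, and co-receptor terms are omitted, and no permutation test is performed. The function names are hypothetical.

```python
def lr_communication(ligand_mean, receptor_mean, kh=0.5):
    """Score every sender -> receiver pair for one ligand-receptor pair.

    ligand_mean: dict cell group -> mean ligand expression (senders).
    receptor_mean: dict cell group -> mean receptor expression (receivers).
    The product L*R is mapped to [0, 1) with x / (kh + x).
    """
    scores = {}
    for sender, lig in ligand_mean.items():
        for receiver, rec in receptor_mean.items():
            x = lig * rec
            scores[(sender, receiver)] = x / (kh + x) if x > 0 else 0.0
    return scores


def strongest(scores, top=3):
    """Return the top sender -> receiver pairs by score."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

The saturating form means that very high expression on one side cannot dominate indefinitely, so a strong interaction requires both the ligand in the sender and the receptor in the receiver to be expressed.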

Statistical analyses were performed in R (version 4.4). Signal intensities and communication frequencies between cell populations were compared with the Wilcoxon test, with significance set at p < 0.05, and all visualizations were generated with R packages such as ggplot2 and pheatmap.
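The Wilcoxon rank-sum comparison can be sketched in Python using average ranks for ties and the large-sample normal approximation. This simplified version omits the tie correction to the variance and the continuity correction that R's wilcox.test may apply, so its p-values can differ slightly from R's output.

```python
import math


def wilcoxon_rank_sum(x, y):
    """Wilcoxon rank-sum (Mann-Whitney U) test; returns (U for x, two-sided p)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        # Tied values share the average of their ranks (ranks start at 1).
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))
```

Because the test uses only ranks, it does not assume normally distributed interaction strengths, which suits the skewed values typical of communication scores.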

Results

Integration and cell type analysis of OSCC-GB single-cell transcriptomic data

This study presents a comprehensive re-analysis of the OSCC-GB single-cell transcriptomic data, comprising 50,809 high-quality cells that passed quality control. To ensure that the clustering was reliable and biologically meaningful, we systematically tested resolution values from 0.2 to 1.2, evaluating the resulting clusters, the quality of the UMAP visualization, and the biological interpretability of each cluster. We selected a resolution of 0.5 because it balanced the number of clusters against the accuracy of downstream analyses (Fig. 1a).

Fig. 1.

Integration Analysis of OSCC-GB Single-Cell Data. a Clustering Tree. The clustering tree shows how cells are partitioned at different resolutions; as the resolution increases, cells are subdivided into more clusters. Each circle represents a cell cluster. b UMAP Plot. The UMAP plot displays the distribution of cells after dimensionality reduction. Each point represents a single cell, colored by cell population: T/NK cells, epithelial cells, macrophages, fibroblasts, B/plasma cells, endothelial cells, mast cells, and dendritic cells (DC). c Heatmap. The heatmap shows the expression of the top five marker genes (TOP5) of each cell population, aiding the identification of each cell type. Cell clusters are indicated on the X-axis and gene names on the Y-axis; color intensity represents average expression, with yellow denoting higher expression. d Proportions of Cell Types. The proportion of each cell type in the single-cell data, with cell types on the X-axis and the proportion of cells on the Y-axis

Based on the expression of known marker genes, we classified the cells into eight major types (Fig. 1c): T/NK cells (marker genes: CD3D, CD2, CD3E, CD3G, TRAC), epithelial cells (PKP1, S100A14, SFN, GJB2, KRT14), macrophages (LYZ, FPR3, FCGR2A, C1QC, AIF1), fibroblasts (PRRX1, COL1A2, DCN, COL3A1, PDGFRB), endothelial cells (SOX18, VWF, CDH5, PCAT19, CLEC14A), B cells/plasma cells (CD79A, MS4A1, BANK1, IGHG1, FCRL5), mast cells (TPSB2, TPSAB1, CPA3, MS4A2, HDC), and dendritic cells (DC) (SCT, PLD4, PLAC8, GZMB, LILRA4). The distribution of each cell type in the UMAP embedding is shown in Fig. 1b.

Analysis of cell composition showed that epithelial cells (35.15%) and T/NK cells (34.42%) were the predominant populations in OSCC-GB, followed by macrophages (11.33%) and B/plasma cells (7.00%). Fibroblasts (4.89%) and endothelial cells (3.98%) were relatively scarce, and mast cells (1.79%) and dendritic cells (1.44%) were the least abundant (Fig. 1d). These findings highlight the cellular heterogeneity of OSCC-GB and its main compositional features, laying a foundation for exploring how these cells function in the tumor microenvironment.

Projection results of OSCC-GB single-cell transcriptomic data based on reference mapping

We projected the OSCC-GB single-cell transcriptomic data onto the gingival reference data from the DISCO database to examine how tumor-derived cell populations correspond to the cell types of healthy gingival tissue. The DISCO gingival reference provides detailed annotations of cell types and their subtypes (Fig. 2a). Through reference mapping, the OSCC-GB data were embedded in the gingival reference space, and the distribution of OSCC-GB cell types agreed closely with the reference annotations, supporting the accuracy of our OSCC-GB cell population annotations.

Fig. 2.

Cell Annotation Results Based on the Gingival Map Projection. a Gingival Single-Cell Transcriptomic Map. The UMAP plot illustrates the distribution of cell clusters in the gingiva, with different colors representing distinct cell types and their subtypes, identifying a total of 33 cell clusters. b Distribution of OSCC-GB Cell Clusters. The UMAP plot displays the distribution of cell clusters within OSCC-GB, where each color represents a unique cell cluster, including T/NK cells, epithelial cells, macrophages, fibroblasts, B/plasma cells, endothelial cells, mast cells, and DC. c Reference Map Projection. This panel presents the projection of data onto the reference map, visually depicting the relative spatial relationships between the query data and the reference data. Different colors indicate various cell types and their subtypes

The projection results showed that the main OSCC-GB cell types, including epithelial cells, T/NK cells, macrophages, B/plasma cells, fibroblasts, endothelial cells, mast cells, and DC, clustered within their corresponding regions of the reference map (Fig. 2b). Notably, OSCC-GB T/NK cells and epithelial cells mapped extensively to their respective regions of the gingival reference, where they showed the highest density (Fig. 2c). OSCC-GB fibroblasts mapped to both the pericyte/vascular smooth muscle cell region and the fibroblast region of the reference, consistent with previous findings, suggesting that this population comprises fibroblast subtypes of different origins that share fibroblast characteristics.

Additionally, the UMAP visualizations (Fig. 2b, c) clearly show the distribution of OSCC-GB cell populations on the reference map, particularly the pronounced clustering of tumor-associated cells such as epithelial cells, T/NK cells, and macrophages, which reflects the complexity and heterogeneity of the tumor microenvironment. These findings place OSCC-GB cells in the context of healthy tissue and support the reliability of our marker-based cell type annotations.

Subdivision of epithelial cell subcelltypes and functional enrichment analysis

We subdivided the epithelial cells of OSCC-GB into six subcelltypes and compared several dimensionality reduction methods for visualizing them: Uniform Manifold Approximation and Projection (UMAP), force-directed graph (FDG), t-distributed stochastic neighbor embedding (t-SNE), and UMAP computed on transcription factor regulatory activity (UMAPRAS) (Fig. 3a). UMAPRAS gave the clearest separation, with the six subcelltypes forming distinct groups in its embedding space, highlighting the heterogeneity of the epithelial cell subgroups.

Fig. 3.

Identification of Malignant Epithelial Cell Subcelltypes in OSCC-GB. a Left Panel 1: UMAP analysis reveals the diversity of epithelial cells in OSCC-GB, categorizing them into six subcelltypes (numbered 0–5), with each color corresponding to a distinct subcelltype, illustrating their distribution within the cellular state space. Left Panel 2: Dimensionality reduction of epithelial cell subcelltypes was performed using force-directed graph analysis. Left Panel 3: The t-SNE algorithm was applied to reduce the dimensionality of the epithelial cell subcelltype data in OSCC-GB. Left Panel 4: UMAP based on a transcription factor regulatory network (umapRAS) was used to analyze the dimensionality reduction of epithelial cell subcelltypes. b The average expression and expression proportions of marker genes across different cell subcelltypes are shown, with the X-axis representing distinct cell subcelltypes, the Y-axis representing different marker genes, the color intensity indicating average expression levels, and the size of the points reflecting the expression proportions. c The umapRAS dimensionality reduction results demonstrate that epithelial cells in OSCC-GB can be categorized into six distinct subcelltypes: IGKC+EIP, CXCL11+EIP, KRT16+EIP, PTPRC+EIP, KRT13+EIP, and CAV1+EIP cells. d The GSEA results indicate that key biological processes are activated in the CAV1+EIP of OSCC-GB. The size of the points represents the number of enriched genes, with deeper red colors indicating lower P.adjust values and higher enrichment significance. e The GO enrichment analysis results reveal specific functions of the CAV1+EIP in OSCC-GB

Further analysis involved calculating the marker genes for each epithelial cell subcelltype and extracting the top five marker genes for each subgroup to illustrate their unique gene expression characteristics (Fig. 3b). Based on the expression patterns of these marker genes, we named the six subcelltypes IGKC+EIP, CXCL11+EIP, KRT16+EIP, PTPRC+EIP, KRT13+EIP, and CAV1+EIP cells. These designations reflect the significant expression characteristics of their key marker genes, such as the high expression of IGKC in IGKC+EIP cells and the notable upregulation of CXCL11 in CXCL11+EIP cells (Fig. 3c).

Overall, the analysis reveals the complexity and functional heterogeneity of epithelial cells in OSCC. The UMAPRAS method provides crucial support for visualizing and interpreting the relationships among subcelltypes. Through systematic exploration of marker genes, we further delineated the characteristics of different subcelltypes, laying a foundation for subsequent investigations into the roles of epithelial cells in tumorigenesis and development.

Functional enrichment analyses further revealed the signaling pathways and molecular functions characteristic of each epithelial subcelltype. GSEA showed that CAV1+EIP had significant activation of pathways associated with epithelial-mesenchymal transition (EMT), hypoxia, coagulation, and complement (Fig. 3d). GO enrichment analysis further linked this subcelltype to cadherin binding, structural constituent of ribosome, and cell adhesion (Fig. 3e).

CXCL11+EIP showed significant activation of IL6/JAK/STAT3 signaling, epithelial-mesenchymal transition, apical junction, and interferon response pathways, and GO analysis associated it with structural constituent of ribosome and cadherin binding (Fig. S1). IGKC+EIP showed activation of the hypoxia and p53 pathways, and GO analysis linked it to rRNA binding (Fig. S1).

Both KRT13+EIP and KRT16+EIP showed activation of KRAS signaling. GO enrichment associated KRT13+EIP with oxidoreductase activity (Fig. S1) and KRT16+EIP with structural constituent of ribosome and epidermal structure formation (Fig. S1). GSEA of PTPRC+EIP showed significant activation of the G2M checkpoint and E2F target pathways, and GO analysis suggested links to single-stranded DNA binding and catalytic activity acting on DNA (Fig. S1).

This series of analyses highlights the functional heterogeneity among different epithelial cell subcelltypes in OSCC, each activating specific signaling pathways and molecular functions that reflect their diverse roles in the tumor microenvironment. These findings provide critical insights for further exploring the potential mechanisms underlying the roles of epithelial cell subcelltypes in tumor progression.

Malignancy analysis of epithelial cell subcelltypes

We used the inferCNV tool to infer chromosomal copy number variation (CNV) in the epithelial cell subcelltypes, with T/NK cells and macrophages as the normal reference. CAV1+EIP cells showed the most pronounced CNV among the epithelial subcelltypes, as reflected by the abundance of red (copy number gain) and blue (copy number loss) signals in the heatmap (Fig. 4a). This suggests that CAV1+EIP cells may have greater genomic instability in OSCC, reflecting a higher burden of copy number alterations, and that this subcelltype may be the most malignant.
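The idea behind the CNV heatmap can be illustrated with a simplified inferCNV-style calculation: express each gene relative to the reference cells, smooth along genomic order within each chromosome, and summarize each cell by its mean squared deviation. The sketch below assumes log-normalized input with genes already sorted by position; it is not inferCNV's implementation (which adds bounding, denoising, and optional HMM steps), and the mean-squared "CNV score" is one common convention, not necessarily the one used here.

```python
import numpy as np


def cnv_scores(expr, chrom, ref_mask, window=5):
    """Simplified inferCNV-style CNV score per cell.

    expr: (cells, genes) log-normalized expression, genes ordered by genomic position.
    chrom: chromosome label for each gene (same order as the columns of expr).
    ref_mask: boolean vector marking reference (normal) cells, e.g. T/NK cells and macrophages.
    Returns (scores, smoothed): per-cell score and the smoothed profile a heatmap would show
    (positive values ~ gain, negative values ~ loss).
    """
    expr = np.asarray(expr, dtype=float)
    chrom = np.asarray(chrom)
    ref_mask = np.asarray(ref_mask, dtype=bool)
    # Express every gene relative to its mean in the reference cells.
    centered = expr - expr[ref_mask].mean(axis=0)
    smoothed = np.empty_like(centered)
    for c in np.unique(chrom):
        cols = np.flatnonzero(chrom == c)
        w = min(window, len(cols))  # the window cannot exceed the chromosome's gene count
        kernel = np.ones(w) / w
        # Moving average along genomic order, within one chromosome only.
        smoothed[:, cols] = np.array(
            [np.convolve(row, kernel, mode="same") for row in centered[:, cols]]
        )
    # CNV score: mean squared deviation from the reference baseline.
    scores = (smoothed ** 2).mean(axis=1)
    return scores, smoothed
```

In this toy formulation, reference-like cells score near zero, while cells carrying broad gains or losses along a chromosome score higher, which is the pattern the heatmap shows for CAV1+EIP.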

Fig. 4.

Assessment of Malignancy in Epithelial Cell Subgroups. a CNV heatmap showing copy number variation scores within the epithelial cell subgroups. Red indicates chromosomal amplification and blue indicates chromosomal deletion. The X-axis denotes chromosome numbers, and the Y-axis represents the cell clusters analyzed. T/NK cells and macrophages serve as reference cell populations. b Nomogram showing the contribution of each epithelial cell subgroup score to patient prognosis, including OS rates at 1, 3, and 5 years. A nomogram is a widely used prognostic tool in oncology that predicts the probability of a specific clinical event. For each patient, the value of each predictor is located on its variable axis and projected upward to the Points axis to obtain that variable's score; the scores are summed, and the total is located on the Total Points axis to read off the predicted probability of the event. c Nomogram combining patient age, sex, tumor stage, and epithelial cell subgroup scores to predict the 1-, 3-, and 5-year survival of OSCC-GB patients

Other epithelial cell subcelltypes, such as IGKC+EIP, CXCL11+EIP, KRT16+EIP, PTPRC+EIP, and KRT13+EIP, also showed varying degrees of copy number variation; however, the intensity and range of these variations were lower than those of CAV1+EIP cells. Additionally, we observed that fibroblasts exhibited relatively low chromosome copy number variation, consistent with the stability seen in the reference cells (T/NK and macrophages).

We next investigated the relationship between epithelial cell subcelltypes and clinical prognosis in OSCC patients, using subcelltype scores calculated with the GSVA algorithm. The prognostic nomogram showed that high CAV1+EIP and IGKC+EIP scores carried higher hazard scores, suggesting that these two subcelltypes may be associated with poorer prognosis (Fig. 4b). To assess the combined influence of clinical information and epithelial subcelltypes on prognosis, we constructed a multivariable Cox proportional hazards model including age, sex, tumor stage, and epithelial subcelltype scores. After adjustment for age, sex, and tumor stage, the epithelial subcelltype scores remained independent prognostic factors, with CAV1+EIP contributing most to the risk score and being associated with a significantly higher risk of poor prognosis. The nomogram quantified the contribution of each factor: age and tumor stage dominated the overall risk assessment, while among the epithelial subcelltypes the CAV1+EIP score carried the highest risk (Fig. 4c).
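To make the multivariable Cox step concrete, the sketch below fits a Cox proportional hazards model by Newton-Raphson on the partial likelihood, using only NumPy. It is a generic illustration, not the software or covariate coding used in this study; it assumes no tied event times (standard packages apply Breslow or Efron corrections) and builds an O(n²) risk-set matrix, which is fine for cohort-sized data. Covariates such as age, stage, and a GSVA subcelltype score would be columns of `X`; hazard ratios are `exp(beta)`.

```python
import numpy as np


def fit_cox_ph(X, time, event, max_iter=50, tol=1e-8):
    """Fit a Cox proportional hazards model by Newton-Raphson on the partial likelihood.

    X: (n_subjects, n_covariates) matrix of covariates.
    time: follow-up times; event: True if death observed, False if censored.
    Assumes no tied event times. Returns beta; hazard ratios are np.exp(beta).
    """
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    n, p = X.shape
    beta = np.zeros(p)
    # at_risk[i, j] = 1.0 if subject j is still under observation at subject i's time.
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    for _ in range(max_iter):
        w = np.exp(X @ beta)                      # relative hazard of each subject
        s0 = at_risk @ w                          # total weight in each risk set
        s1 = at_risk @ (w[:, None] * X)           # weighted covariate sums per risk set
        outer = (w[:, None, None] * X[:, :, None] * X[:, None, :]).reshape(n, p * p)
        s2 = (at_risk @ outer).reshape(n, p, p)   # weighted outer-product sums per risk set
        xbar = s1 / s0[:, None]
        # Score vector and observed information, summed over observed events only.
        grad = (X - xbar)[event].sum(axis=0)
        info = (s2 / s0[:, None, None] - xbar[:, :, None] * xbar[:, None, :])[event].sum(axis=0)
        step = np.linalg.solve(info, grad)        # Newton step for a concave log-likelihood
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

On simulated exponential survival data with known coefficients, the estimates recover the true values to within sampling error. A positive coefficient for a subcelltype score corresponds to a hazard ratio above 1, the pattern reported for CAV1+EIP.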

Overall, these results indicate that the distribution of epithelial cell subcelltypes is closely associated with the prognosis of OSCC patients; in particular, the CAV1+EIP may reflect greater genomic instability and a more malignant phenotype, and is associated with poorer survival.

Spatial transcriptomic data reveals tumor cell spatial distribution characteristics in OSCC

In the OSCC spatial transcriptomic data, we mapped the cellular landscape of OSCC-GB onto the spatial tissue architecture of the tumor. Following the zonation scheme in Rohit’s research report, tumor sections were divided into “Tumor Core (TC),” “Transitory,” “Leading Edge (LE),” and “Non-tumor” regions, each representing a different stage of tumor progression. The spatial distribution of cell-type signals showed that epithelial cells were concentrated mainly in the “TC,” “Transitory,” and “LE” regions, whereas B/plasma cells and fibroblasts occupied most of the “Non-tumor” region (Fig. 5a, d). In the “Non-tumor” region, macrophages, endothelial cells, and T/NK cells also showed widespread positive signals, suggesting that these cells may play important roles in the tumor microenvironment (Fig. 5b, e).

Fig. 5.

Identification of Cellular Spatial Relationships Using Spatial Transcriptomics. a, d illustrate the zonation schematic of OSCC-GB tumor locations, including TC, Transitory, LE, and Non-tumor regions. b, e present various cell types found in OSCC-GB samples, including T/NK cells, epithelial cells, macrophages, fibroblasts, B cells/plasma cells, endothelial cells, mast cells, and DC. The intensity of the colors above the figures represents the expression levels of each cell cluster; darker colors indicate higher expression intensity. c, f utilize spatial transcriptomics technology to display the distribution and spatial proximity of the IGKC+EIP, CXCL11+EIP, KRT16+EIP, PTPRC+EIP, KRT13+EIP, and CAV1+EIP within epithelial cells in OSCC-GB

We then analyzed the spatial distribution of the tumor epithelial cell subgroups in depth and found that different subgroups occupy distinct locations within the tumor regions. The KRT16+EIP subgroup was distributed mainly in the “TC” region, which may relate to the proliferative and metabolic characteristics of epithelial cells in that area. In contrast, the CAV1+EIP, previously identified as the most malignant tumor epithelial subgroup, was located primarily in the “Transitory” and “LE” regions and was largely spatially exclusive with the KRT16+EIP subgroup. This relatively independent distribution suggests that the CAV1+EIP subgroup may play a distinct role at the tumor margin and in transition zones.
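This section does not state how spatial exclusivity between subgroups was quantified. One hypothetical way to make the comparison explicit, assuming each spot carries a score per subgroup (for example from deconvolution or module scoring) and a region label (TC, Transitory, LE, Non-tumor), is sketched below. The function names and the 80th-percentile cut-off are illustrative assumptions.

```python
import numpy as np


def spatial_overlap(score_a, score_b, quantile=0.8):
    """Jaccard overlap of the spots where each subgroup score is in its top (1 - quantile) fraction.

    Values near 0 indicate spatially exclusive subgroups; values near 1 indicate co-localization.
    """
    a = np.asarray(score_a, dtype=float)
    b = np.asarray(score_b, dtype=float)
    high_a = a >= np.quantile(a, quantile)
    high_b = b >= np.quantile(b, quantile)
    union = np.logical_or(high_a, high_b).sum()
    return float(np.logical_and(high_a, high_b).sum() / union) if union else 0.0


def region_fractions(high_mask, regions):
    """Fraction of a subgroup's high-score spots falling in each annotated region."""
    hits = np.asarray(regions)[np.asarray(high_mask, dtype=bool)]
    labels, counts = np.unique(hits, return_counts=True)
    return dict(zip(labels.tolist(), (counts / counts.sum()).tolist()))
```

For example, a CAV1+EIP-like score whose high spots fall in LE and a KRT16+EIP-like score whose high spots fall in TC would give a Jaccard overlap near 0, with region fractions concentrated in different zones.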

The tumor margin and transition zones are often regarded as critical to tumor progression and as the principal sites of invasion and metastasis. The enrichment of CAV1+EIP in these areas is consistent with our single-cell findings of higher genomic instability and malignancy in this subgroup, further supporting the notion that the CAV1+EIP subgroup may be an important driver of malignant progression. These spatial characteristics also imply that tumor epithelial subgroups may have relatively independent functional roles in different tumor regions, together contributing to the complexity and heterogeneity of the OSCC tumor microenvironment (Fig. 5c, f).

CAV1+EIP characterization and its clinical significance

To further characterize the CAV1+EIP at the molecular level, we performed a specific transcription factor enrichment analysis based on a gene regulatory network (GRN) constructed for the tumor epithelial cell subcelltypes, focusing on key transcription factors in the CAV1+EIP. The transcription factors LHX1, ATF1, ZBTB25, HOXA7, and PLAG1 showed significant specificity for the CAV1+EIP subgroup (Fig. 6a).
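The section does not name the scoring method behind the "most specific" transcription factors in Fig. 6a. A widely used option in GRN workflows such as SCENIC is the regulon specificity score (RSS), defined as one minus the square root of the Jensen-Shannon divergence between a regulon's activity distribution across cells and a cell-type indicator. The sketch below assumes that definition and per-cell regulon activity values as input; it is an illustration, not a confirmed description of the authors' analysis.

```python
import numpy as np


def _entropy_bits(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def regulon_specificity(activity, is_type):
    """RSS = 1 - sqrt(JSD) between regulon activity and a cell-type indicator.

    activity: non-negative regulon activity per cell (e.g. AUCell-like scores).
    is_type: 1 for cells of the cell type of interest, 0 otherwise.
    Returns a value in [0, 1]; 1 means activity is confined exactly to that cell type.
    """
    p = np.asarray(activity, dtype=float)
    q = np.asarray(is_type, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    # Jensen-Shannon divergence with base-2 logarithms lies in [0, 1].
    jsd = _entropy_bits(m) - 0.5 * (_entropy_bits(p) + _entropy_bits(q))
    return 1.0 - float(np.sqrt(max(jsd, 0.0)))
```

Ranking regulons by this score within each cell population and keeping the top five would produce a table of the kind shown in Fig. 6a.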

Fig. 6.

Identification of Tumor Epithelial Cell Subcelltype-Specific Transcription Factors. a presents a regulatory activity map of key transcription factors across various cell populations, highlighting the top five transcription factors that are most specific to enriched cell types. b depicts a survival analysis based on the TCGA-OSCC dataset, utilizing Kaplan–Meier curves to illustrate the relationship between the key transcription factors LHX1, ATF1, ZBTB25, HOXA7, and PLAG1 in the CAV1+EIP and the survival prognosis of OSCC-GB patients. Patients were divided into two groups: the blue group represents low expression, while the red group indicates high expression. The Y-axis displays the survival probability of patients, and the X-axis represents patient survival time. c shows the ROC curve, which illustrates the diagnostic predictive capabilities of each transcription factor for OSCC-GB patients. This assessment evaluates the performance of these transcription factors in predicting 1-year, 3-year, and 5-year survival rates; a larger AUC indicates stronger predictive capability. d provides a box plot representing the gene expression levels of the key transcription factors LHX1, ATF1, ZBTB25, HOXA7, and PLAG1 across different clinical stages of OSCC. The X-axis denotes the clinical stages, while the Y-axis reflects the expression levels of different transcription factors in log2 (TPM + 1) format. e displays a prognostic nomogram for the five key transcription factors—LHX1, ATF1, ZBTB25, HOXA7, and PLAG1—related to the tumor’s T and N staging, age, and pathological staging at 1, 3, and 5 years

To assess the clinical value of these transcription factors, we examined the prognostic significance of each gene individually in the TCGA-OSCC cohort. Kaplan–Meier survival analysis showed that these transcription factors were associated with the survival of OSCC patients to varying degrees (Fig. 6b). Notably, high expression of LHX1 (hazard ratio [HR] = 1.51, p = 0.018) and ATF1 (HR = 1.53, p = 0.009) was significantly associated with higher risk, suggesting that their high expression may be linked to poorer survival. HOXA7 (HR = 1.48, p = 0.017) and PLAG1 (HR = 1.39, p = 0.045) also showed significant risk associations, further supporting their potential role in the malignant progression of OSCC. Although the hazard ratio for ZBTB25 (HR = 1.32, p = 0.125) did not reach statistical significance, the trend in the same direction suggests it may influence the prognosis of certain patient subgroups.
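A minimal sketch of the two calculations behind a Kaplan–Meier comparison of high- and low-expression groups is given below: the product-limit survival estimate and the two-group log-rank test. The expression cut-off used to define the groups in Fig. 6b is not stated in this section, so any grouping (such as a median split) is an assumption; the hazard ratios quoted above would typically come from a Cox model rather than from these functions.

```python
import math

import numpy as np


def kaplan_meier(time, event):
    """Product-limit estimator; returns (distinct event times, survival just after each)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv, s = [], 1.0
    for t in event_times:
        n_risk = np.sum(time >= t)             # subjects still under observation at t
        deaths = np.sum((time == t) & event)
        s *= 1.0 - deaths / n_risk
        surv.append(s)
    return event_times, np.array(surv)


def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        d = ((time == t) & event).sum()
        d1 = ((time == t) & event & group).sum()
        obs_minus_exp += d1 - d * n1 / n       # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / var
    # Survival function of a chi-square variable with 1 degree of freedom.
    return float(chi2), math.erfc(math.sqrt(chi2 / 2.0))
```

With `group` set to "expression above the cohort median", `logrank_test` gives the p-value for the high- versus low-expression curves.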

To further evaluate the ability of the five key transcription factors (LHX1, ATF1, ZBTB25, HOXA7, and PLAG1) to predict the prognosis of OSCC patients, we performed time-dependent receiver operating characteristic (ROC) curve analysis for 1-, 3-, and 5-year survival (Fig. 6c). The time-dependent area under the curve (AUC) values were modest, ranging from 0.454 to 0.597, indicating limited predictive power for each transcription factor on its own. Specifically, the AUC values for ZBTB25 at 1, 3, and 5 years were 0.507, 0.597, and 0.577, respectively. For PLAG1, the corresponding values were 0.546, 0.532, and 0.545, indicating a stable but weak prognostic effect. ATF1 showed the most consistent performance, with AUC values of 0.555, 0.560, and 0.587 and the highest 5-year AUC of the five factors. HOXA7 yielded AUCs of 0.547 and 0.563 for 3- and 5-year survival but only 0.496 at 1 year. LHX1 had AUCs of 0.578 and 0.533 at 1 and 3 years but 0.454 at 5 years, below the chance level of 0.5, suggesting that any prognostic value may be limited to specific patient groups or time windows.
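For interpreting the AUC values above, it helps to recall what a cumulative/dynamic time-dependent AUC measures: the probability that a patient who died by time t has a higher marker value than a patient still event-free after t. The sketch below is a naive estimator under stated simplifications (subjects censored before t are simply dropped, with no inverse-probability-of-censoring weighting); the estimator actually used for Fig. 6c is not specified in this section.

```python
import numpy as np


def cumulative_dynamic_auc(marker, time, event, t):
    """Naive cumulative/dynamic AUC at horizon t.

    Cases: event observed at or before t. Controls: still event-free after t.
    Subjects censored before t are excluded (no censoring weights).
    """
    marker = np.asarray(marker, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    cases = marker[(time <= t) & event]
    controls = marker[time > t]
    if len(cases) == 0 or len(controls) == 0:
        raise ValueError("need at least one case and one control at horizon t")
    diff = cases[:, None] - controls[None, :]
    # Concordant pairs count 1 and tied pairs count 0.5 (Mann-Whitney form).
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))
```

An AUC of 0.5 corresponds to a marker that ranks cases and controls no better than chance, and values below 0.5 (such as 0.454) indicate that, at that horizon, higher marker values slightly favored survival.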

We also examined the expression of these transcription factors across clinical stages (Fig. 6d). Each of the five transcription factors was significantly upregulated relative to the normal control group in at least two clinical stages. LHX1 was significantly elevated in stages I, II, III, and IV (p < 0.001). ATF1 was significantly higher in stages II and IV (p < 0.001) but not in stages I and III. ZBTB25 was significantly higher in stages II, III, and IV (p < 0.001) but not in stage I. PLAG1 did not differ significantly in stages I and II but was significantly higher in stage III (p < 0.05) and stage IV (p < 0.01). Finally, HOXA7 was significantly higher than in the control group in every stage: stages I and III (p < 0.001), stage II (p < 0.01), and stage IV (p < 0.0001).
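Fig. 6d reports expression as log2(TPM + 1) with stage-versus-normal p-values, but the statistical test is not named in this section. A common choice for such box plots is the two-sided Wilcoxon rank-sum (Mann-Whitney U) test; the sketch below assumes that test, uses the large-sample normal approximation without a tie correction, and is meant only to show the calculation.

```python
import math

import numpy as np


def log2_tpm1(tpm):
    """Transform TPM values to log2(TPM + 1), the scale used on the y-axis."""
    return np.log2(np.asarray(tpm, dtype=float) + 1.0)


def _average_ranks(x):
    """1-based ranks, with tied values sharing their average rank."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Two-sided rank-sum test (normal approximation, no tie correction); returns (U_x, p)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    ranks = _average_ranks(np.concatenate([x, y]))
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mu) / sigma
    return float(u1), math.erfc(abs(z) / math.sqrt(2.0))
```

For small groups an exact test is preferable; the normal approximation is used here only to keep the example dependency-free.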

In addition, we constructed a prognostic nomogram combining the transcription factors LHX1, ATF1, ZBTB25, HOXA7, and PLAG1 with T stage, N stage, age, and pathological stage (Fig. 6e) to evaluate their potential roles in patient prognosis. T4 stage had the highest score, indicating a close association with poor prognosis. High expression of LHX1, ATF1, ZBTB25, HOXA7, and PLAG1 was associated with worse outcomes, with high LHX1 and ATF1 expression typically corresponding to the highest scores among the transcription factors. Because higher total points correspond to lower predicted survival probabilities in the nomogram, these results suggest that elevated expression of these transcription factors is associated with reduced survival.
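The arithmetic behind a Cox-model nomogram can be sketched in a few lines. In the common convention assumed here, each variable's contribution β·x is rescaled so that the variable with the widest effect range spans 0 to 100 points, the points are summed, and survival at time t is predicted as S(t | x) = S0(t)^exp(β·(x − x̄)). The baseline survival S0(t), the coefficients, and the covariate ranges would come from the fitted model; the values used below are toy numbers, and the function names are hypothetical.

```python
import numpy as np


def nomogram_points(beta, x, x_min, x_max):
    """Map each covariate value to 0-100 'points', scaled by the widest effect range."""
    beta = np.asarray(beta, dtype=float)
    x = np.asarray(x, dtype=float)
    x_min = np.asarray(x_min, dtype=float)
    x_max = np.asarray(x_max, dtype=float)
    effect_range = np.abs(beta) * (x_max - x_min)
    scale = 100.0 / effect_range.max()
    # For negative coefficients the upper end of the axis scores zero points.
    low = np.where(beta >= 0, x_min, x_max)
    return scale * beta * (x - low)


def survival_probability(baseline_surv_t, beta, x, x_mean):
    """Cox prediction S(t | x) = S0(t) ** exp(beta . (x - x_mean))."""
    lp = float(np.dot(beta, np.asarray(x, dtype=float) - np.asarray(x_mean, dtype=float)))
    return float(baseline_surv_t ** np.exp(lp))
```

Summing `nomogram_points` gives the Total Points axis; a higher total means a larger linear predictor and therefore a lower predicted 1-, 3-, or 5-year survival.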

IGF2BP1 as a common differentially expressed target gene

Using a Venn diagram to intersect the target genes of the transcription factors ATF1 and LHX1 with the DEGs in the TCGA-OSCC dataset, we identified IGF2BP1 as the only overlapping gene (Fig. 7a). IGF2BP1 expression in tumor tissues was markedly higher than in the normal control group (P < 0.001; Fig. 7b), suggesting potential biological significance in the tumor microenvironment. IGF2BP1 also showed an increasing expression trend across tumor T stages (Fig. 7c): its expression in stages T1 to T4 was significantly higher than in the normal group and tended to rise with increasing stage, suggesting that IGF2BP1 may be involved in tumor progression.
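The Venn-diagram step amounts to a set intersection once DEGs have been defined. The DEG thresholds are not given in this section, so the sketch below uses Benjamini-Hochberg adjusted p-values with |log2FC| ≥ 1 and FDR < 0.05 as placeholder cut-offs; these values, the function names, and the toy gene names are assumptions.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values (FDR)."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def shared_targets(targets_a, targets_b, genes, log2fc, pvals, fc_cut=1.0, fdr_cut=0.05):
    """Genes that are targets of both TFs and pass the DEG thresholds."""
    padj = benjamini_hochberg(pvals)
    degs = {g for g, fc, q in zip(genes, log2fc, padj) if abs(fc) >= fc_cut and q < fdr_cut}
    return set(targets_a) & set(targets_b) & degs
```

With real inputs, `targets_a` and `targets_b` would be the GRN-predicted targets of ATF1 and LHX1, and the returned set corresponds to the central region of the Venn diagram.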

Fig. 7.

Investigation of IGF2BP1 Gene Function in OSCC. a presents an analysis of the potential target gene intersections between the transcription factors ATF1 and LHX1 and the DEGs from TCGA-OSCC. The results indicate that there is only one common target in the intersection. b shows a box plot illustrating the expression differences of the IGF2BP1 gene between the tumor group and the control group, where red represents the tumor group and the other color represents the control group. The x-axis indicates the groups, while the y-axis represents expression values. c displays the gene expression levels of IGF2BP1 across different tumor stages. The x-axis corresponds to clinical staging, and the y-axis presents the expression level of IGF2BP1 in log2 (TPM + 1) format. d presents the results of GSEA based on the correlation coefficients related to the IGF2BP1 gene, indicating that the immune regulatory pathways show a negative correlation with IGF2BP1 expression

In Kaplan–Meier survival analysis, although IGF2BP1 was overexpressed in tumors, its expression showed no significant association with patient survival [P = 0.449; hazard ratio (HR) = 0.88 (0.64–1.22)]. Pearson correlation analysis identified multiple genes whose expression was significantly negatively correlated with IGF2BP1 (Fig. 7d). GSEA based on these correlation coefficients showed that genes negatively correlated with IGF2BP1 were enriched in immune regulatory pathways, such as T cell receptor signaling and cytokine signaling, suggesting that IGF2BP1 may contribute to tumor progression and immune evasion by modulating these immune pathways.
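The correlation-based GSEA above first ranks all genes by their Pearson correlation with IGF2BP1 across samples. A vectorized sketch of that ranking step is shown below, assuming a genes × samples expression matrix; zero-variance genes are dropped because their correlation is undefined. The resulting ranked list can serve as the input to a pre-ranked GSEA. The function name and toy data are illustrative.

```python
import numpy as np


def correlate_with_gene(expr, genes, target):
    """Pearson correlation of every gene with `target` across samples.

    expr: (genes, samples) expression matrix. Returns (genes, r) sorted from the most
    positive to the most negative correlation.
    """
    expr = np.asarray(expr, dtype=float)
    genes = np.asarray(genes)
    centered = expr - expr.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1)
    keep = norms > 0                                   # drop genes with zero variance
    genes, z = genes[keep], centered[keep] / norms[keep, None]
    idx = np.flatnonzero(genes == target)
    if idx.size == 0:
        raise ValueError(f"{target} missing or has zero variance")
    # Dot products of centered, unit-norm profiles are Pearson correlations.
    r = z @ z[idx[0]]
    order = np.argsort(-r, kind="stable")
    return genes[order], r[order]
```

Genes at the bottom of this ranking (negative r) are the ones whose enrichment in T cell receptor and cytokine signaling pathways is described above.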

Construction of CAV1+EIP signal network

To explore potential communication between the CAV1+EIP and immune cells, we performed a systematic signaling network analysis using CellChat. In the constructed signaling network, secreted signaling accounted for 61.8% of interactions, ECM-receptor interactions for 21.7%, and cell-cell contact for 16.5% (Fig. 8a). We analyzed communication among multiple cell populations, including B cells/plasma cells, CAV1+EIP, CXCL11+EIP, DC, endothelial cells, fibroblasts, IGKC+EIP, KRT13+EIP, KRT16+EIP, macrophages, mast cells, PTPRC+EIP, and T/NK cells.
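CellChat scores each ligand-receptor pair between a sender group and a receiver group with a communication probability based on a law-of-mass-action model. The sketch below keeps only the core Hill-function term, P[i, j] = (L_i R_j)^n / (K_h^n + (L_i R_j)^n), with average ligand expression L_i in sender i and average receptor expression R_j in receiver j. It deliberately omits CellChat's co-receptor, agonist/antagonist, and population-size terms and its permutation test, and the default K_h = 0.5 and n = 1 are assumptions, so it illustrates the idea rather than reproducing CellChat's output.

```python
import numpy as np


def communication_probability(ligand_mean, receptor_mean, k_h=0.5, n=1.0):
    """Simplified mass-action communication probability for one ligand-receptor pair.

    ligand_mean[i]: average (normalized) ligand expression in sender group i.
    receptor_mean[j]: average (normalized) receptor expression in receiver group j.
    Returns a (senders, receivers) matrix with values in [0, 1).
    """
    lr = np.outer(np.asarray(ligand_mean, dtype=float),
                  np.asarray(receptor_mean, dtype=float)) ** n
    # Hill function: saturates as the ligand-receptor product grows.
    return lr / (k_h ** n + lr)
```

For a pair such as NECTIN1-CD96, a high ligand level in CAV1+EIP combined with a high receptor level in T/NK cells yields a high probability for that sender-receiver edge.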

Fig. 8.

Identification of Intercellular Communication Relationships in OSCC-GB. a A pie chart depicting the proportion of signaling molecule sources is presented for data visualization in the cell communication analysis. b This illustration shows the number of signaling interactions between different cell types, where circles represent cell clusters and lines indicate communication signals. Thicker lines correspond to more interactions. c The intensity of signaling interactions among different cell types has been analyzed. Circles continue to represent cell clusters, while lines represent communication signals, with line thickness reflecting the strength of interaction signals. d The heatmap illustrates all cellular signaling pathways in OSCC-GB, identifying a total of 92 cell-to-cell signaling pathways. Darker colors on the heatmap indicate stronger communication flow intensity. e An analytical diagram of the NECTIN signaling pathway network in OSCC-GB is presented. f The diagram displays the communication probabilities among different cell types within the NECTIN signaling network, with darker colors representing higher communication intensity. g Each data point corresponds to a specific cell type (or cell subtype). The horizontal position of a data point reflects the strength of outgoing interactions, while the vertical position indicates the strength of incoming interactions. The size of the data point represents the number of enriched cells. h The top-left graph analyzes the relative contributions of the ligand-receptor pairs NECTIN2-TIGIT, NECTIN1-CD96, and NECTIN1-NECTIN4 to signaling pathway communication, represented as the ratio of inferred network communication probability for each ligand-receptor pair to the total pathway communication probability. The top-right graph displays the NECTIN1-CD96 signaling pathway network analysis. The bottom-left graph shows the NECTIN2-TIGIT signaling pathway network analysis, while the bottom-right graph depicts the NECTIN1-NECTIN4 signaling pathway network analysis. i The violin plot illustrates the expression levels of the immune-related genes NECTIN2, TIGIT, CD96, NECTIN1, and NECTIN4 in different cell types (or cell subtypes)

The results indicated that CAV1+EIP, CXCL11+EIP, KRT13+EIP, fibroblasts, and endothelial cells exhibited a high number of interactions, suggesting significant mutual interactions among these cell populations in the tumor microenvironment. In the assessment of communication strength, CAV1+EIP demonstrated the strongest signaling weight with fibroblasts, indicating that the interactions between these cells may play a crucial role in regulating tumor progression (Fig. 8b, c).

To explore the specific networks among these cells further, we visualized the 92 identified signaling pathways in a heatmap showing the relative activity of each cell group in signal transmission (Fig. 8d). Notably, the NECTIN signaling network between CAV1+EIP and T/NK cells showed the strongest and most specific communication (Fig. 8e, f). Within this network, CAV1+EIP was the dominant signal sender (ligand source), while T/NK cells were the dominant signal receivers (Fig. 8g).
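The sender/receiver roles in Fig. 8g can be read off a pathway's communication-probability matrix: summing a row gives a cell group's outgoing strength, and summing a column gives its incoming strength. The sketch below assumes such a senders × receivers matrix for a single pathway (for example NECTIN) and uses toy values; it approximates, rather than reproduces, CellChat's own signaling-role analysis.

```python
import numpy as np


def signaling_roles(prob, labels):
    """Outgoing (row-sum) and incoming (column-sum) strength for each cell group.

    prob: (groups, groups) communication-probability matrix for one pathway,
          rows = senders, columns = receivers, in the same order as labels.
    Returns {label: (outgoing, incoming)}.
    """
    prob = np.asarray(prob, dtype=float)
    outgoing = prob.sum(axis=1)
    incoming = prob.sum(axis=0)
    return {lab: (float(o), float(i)) for lab, o, i in zip(labels, outgoing, incoming)}
```

A group that acts mainly as a ligand source, as described for CAV1+EIP, appears far along the x-axis (outgoing), whereas a dominant receiver such as T/NK cells sits high on the y-axis (incoming).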

We identified three significant ligand-receptor pairs: NECTIN2-TIGIT, NECTIN1-CD96, and NECTIN1-NECTIN4 (Fig. 8h). Within this signaling network, the ligand NECTIN1 was expressed most highly in CAV1+EIP cells, whereas NECTIN2 was expressed predominantly in endothelial cells, fibroblasts, KRT13+EIP cells, and macrophages. CD96 and TIGIT were expressed mainly in PTPRC+EIP cells and T/NK cells, whereas NECTIN4 was detected only in KRT13+EIP cells (Fig. 8i).
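Fig. 8h expresses each ligand-receptor pair's contribution as the ratio of its inferred communication probability to the total probability of the pathway. Assuming one senders × receivers probability matrix per pair, that ratio is computed as follows; the probability values in the example are toy numbers, not results from this study.

```python
import numpy as np


def pair_contributions(prob_by_pair):
    """Share of a pathway's total communication probability carried by each ligand-receptor pair.

    prob_by_pair: dict mapping pair name -> (senders, receivers) probability matrix.
    Returns dict mapping pair name -> fraction; fractions sum to 1.
    """
    totals = {pair: float(np.sum(m)) for pair, m in prob_by_pair.items()}
    grand_total = sum(totals.values())
    return {pair: t / grand_total for pair, t in totals.items()}


# Toy example: three NECTIN-pathway pairs with made-up probabilities.
shares = pair_contributions({
    "NECTIN1-CD96": [[0.3]],
    "NECTIN2-TIGIT": [[0.1]],
    "NECTIN1-NECTIN4": [[0.0]],
})
```

Restricting each matrix to the CAV1+EIP row and the T/NK column before summing would give the pair contributions for that specific sender-receiver edge.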

We then quantified the specificity of these signaling relationships. Among NECTIN2-TIGIT, NECTIN1-CD96, and NECTIN1-NECTIN4, the NECTIN1-CD96 interaction showed the highest specificity between CAV1+EIP cells and T/NK cells, suggesting that it may play an important role in CAV1+EIP-mediated regulation of immune cell function. NECTIN1-CD96 engagement is often considered an inhibitory signal: CD96 activation can suppress the effector functions of T/NK cells and thereby reduce their capacity to kill tumor cells. By expressing NECTIN1 to engage CD96, tumor cells may dampen T/NK cell activity, which could allow them to evade host immune surveillance and to survive and grow within the tumor microenvironment.

Discussion

This study reanalyzes a single-cell dataset of OSCC derived from the gingiva and buccal mucosa, with a detailed focus on tumor epithelial cells. Analysis of the epithelial cell subcelltypes showed that the CAV1+EIP has the highest copy number variation score and is significantly associated with poor patient prognosis, suggesting that CAV1+EIP may play a critical role in the progression of OSCC-GB.

Cell fate trajectories and functional states are shaped largely by gene regulatory networks built around transcription factors. We therefore concentrated on CAV1+EIP-specific transcriptional regulators, namely LHX1, ATF1, ZBTB25, HOXA7, and PLAG1. Expression of these transcription factors, which may govern the state and function of CAV1+EIP cells, was significantly increased in tumors and positively correlated with tumor progression; integrating prognostic information from OSCC further showed that higher expression was associated with poor patient outcomes.

We also found that IGF2BP1 is highly expressed in OSCC tissues and is a common target gene of the transcription factors LHX1 and ATF1. IGF2BP1 expression was negatively correlated with immune regulatory pathways, suggesting that elevated IGF2BP1 may dampen the immune response in the tumor microenvironment. We therefore explored the relationship between CAV1+EIP and immune cells using the single-cell communication network. The NECTIN signaling network between CAV1+EIP and T cells drew particular attention because it showed the strongest and most specific communication. CAV1+EIP may influence the tumor immune microenvironment through binding of its NECTIN1 ligand to the CD96 receptor on T cells.

It is worth noting that CD96 is an immune regulatory receptor with a complex role in modulating immune responses. In certain contexts, the activation of CD96 may be associated with immune suppression, diminishing the immune response. However, its specific mechanisms in oral cancer require further investigation.

Because we focused on specific states of tumor epithelial cells, we subdivided and functionally characterized the epithelial cells in OSCC-GB, identifying six subcelltypes and their functions. Each subcelltype was named according to its marker gene expression pattern, and among them, the CAV1+EIP exhibited the most malignant characteristics.

Caveolin-1 (CAV-1) is a protein of the caveolar structures of the cell membrane involved in membrane transport and intracellular cholesterol trafficking [15]. Previous studies have closely associated CAV-1 with cell transformation and tumor progression. In OSCC, early studies indicated that CAV-1 could function as an oncogene promoting tumorigenesis [16, 17]. Specifically, CAV-1 activates MAPK/ERK signaling, which has carcinogenic effects in OSCC, and can promote OSCC tumorigenesis and metastasis by upregulating ERK1/2 signaling [18].

Through an in-depth analysis of the specific transcription factors in the CAV1+EIP, we found that LHX1, ATF1, ZBTB25, HOXA7, and PLAG1 exhibited significant specificity within the CAV1+EIP. Furthermore, subsequent analyses indicated that these key transcription factors are important indicators of survival prognosis in OSCC patients.

LHX1 is a nuclear transcription factor of the LIM homeodomain (LIM-HD) family, whose members are key transcriptional regulators of embryonic development and organogenesis. LHX1 expression is first detected during gastrulation, where it regulates cell movement, and is subsequently found in both the intermediate and lateral mesoderm [19]. LHX1 has been shown to play important regulatory roles in various cellular processes, cytoskeletal organization, and tumorigenesis [20–22], acting as a positive regulator of tumor cell motility, migration, and invasiveness. Recent evidence also suggests that LHX1 may be overexpressed or reactivated in certain cancers [22, 23], and numerous studies indicate that LHX1 can serve as a tumor biomarker [22]. Activating transcription factor 1 (ATF1), a member of the ATF/CREB family [24], is an important cellular regulator associated with the development of various cancers. Data from multiple databases consistently show that ATF1 is overexpressed in many cancers and is linked to poor prognosis and aggressive phenotypes. ATF1 plays a key role in regulating genes involved in cell proliferation, differentiation, and survival [25, 26]. ATF1 overexpression has been reported in both esophageal squamous cell carcinoma and nasopharyngeal carcinoma (NPC). In NPC, high ATF1 expression promotes tumorigenesis, and its overexpression and hyperphosphorylation correlate with clinical stage [27, 28]. ZBTB25 is a transcriptional repressor; several ZBTB proteins play important roles in tumor progression and chromatin remodeling and inhibit T cell differentiation and activation [29, 30]. In addition, ZBTB25 can suppress interferon production, thereby enhancing influenza virus replication [31].
Previous studies have indicated that transcriptional repressors play significant roles in the progression of various diseases [32]. HOXA7, a homeobox gene in the HOXA cluster on chromosome 7p15–p14, plays a vital role in promoting cell differentiation and morphogenesis. A recent study noted that HOXA7 is crucial for normal cell proliferation and differentiation during development and that its overexpression may lead to malignancy [33]. HOXA7 expression is significantly increased at both the mRNA and protein levels in OSCC, and its overexpression correlates positively with clinical stage and tumor differentiation [34]. At the molecular level, elevated HOXA7 expression can activate AKT and canonical EGFR signaling, leading to increased cell proliferation and chromosomal instability. In addition, inactivation of tumor suppressor genes such as p16 and p53, together with overexpression of oncogenes such as EGFR, c-myc, and PRAD-1, also contributes to OSCC development [35].

PLAG1 is a transcription factor involved in various cellular processes related to growth and development. In studies related to liver cancer, PLAG1 expression levels were found to be significantly higher in liver cancer tissues compared to normal tissues, and high PLAG1 expression in liver cancer patients has been significantly associated with reduced survival rates [36]. In summary, through the analysis of the specific transcription factors LHX1, ATF1, ZBTB25, HOXA7, and PLAG1, we are better able to define and describe the expression characteristics of the CAV1+EIP and their impact on tumor progression, which will help deepen our understanding of the role of this cell group in tumor biology.

Our results showed that LHX1 and ATF1 carry significant high-risk characteristics. To investigate the genes they co-regulate, we focused on IGF2BP1, which is both co-regulated by LHX1 and ATF1 and significantly overexpressed in OSCC. IGF2BP1 expression is elevated in various human cancers, and high levels are closely associated with tumor metastasis and poor prognosis [37]. In cultured tumor-derived cells, IGF2BP1 has been shown to regulate the formation of lamellipodia and invadopodia [38]. IGF2BP1 is currently considered to play a crucial role in tumorigenesis and in the development of therapy resistance, both in vitro and in vivo. Furthermore, IGF2BP1 is closely related to cisplatin resistance in oral squamous cell carcinoma, promoting tumor cell resistance by activating downstream Akt signaling [39]. Silencing IGF2BP1 has also been reported to induce apoptosis in cancer cells, enhance infiltration of immune cells such as CD4+ and CD8+ T cells, and reduce PD-L1 expression [40, 41].

In this study, we conducted a GSEA and found that genes negatively correlated with IGF2BP1 expression are involved in immune regulatory pathways, such as the T cell receptor signaling pathway and the cytokine signaling pathway. These results suggest that IGF2BP1 may influence tumor progression and immune evasion by regulating these immune pathways.

We therefore focused on the regulatory effect of the CAV1+EIP on immune cells and found that the NECTIN1-CD96 interaction is highly specific between CAV1+EIP cells and T/NK cells. Notably, CD96 binds the V domain of NECTIN1, which contains the canonical interface that nectin junctional proteins use for cell adhesion [42]. In our data, CD96 was expressed mainly on T/NK cells, consistent with previous reports that CD96 is primarily expressed on CD4+ and CD8+ T cells and NK cells [43].

CD96 expression correlates positively with the number of tumor-infiltrating CD8+ T cells, suggesting that CD96 may be linked to functional impairment of CD8+ T cells [44]. CD96 is highly expressed in various malignancies, including brain cancer, breast cancer, and head and neck squamous cell carcinoma, but shows low expression in others, such as lung and rectal cancer [43, 44]. CD96 is known to be abundantly expressed on exhausted, functionally inactive NK cells, and its expression is associated with T cell dysfunction. In patients with OSCC, CD96 is markedly overexpressed, and T cell and NK cell exhaustion is substantially increased compared with healthy individuals, which is closely related to the immunosuppressive signals present in OSCC [45, 46].

Research indicates that human NECTIN1 can interact directly with CD96. Because CD96 is regarded as an immune checkpoint on CD8+ T cells, blocking CD96 activity in combination with other immune checkpoint inhibitors has been shown to be an effective strategy for enhancing T cell activity and inhibiting tumor growth [47]. Our findings highlight the NECTIN1-CD96 interaction as a potential immune checkpoint mechanism in OSCC that may contribute to T cell dysfunction and tumor immune evasion. Notably, CD96 shares functional similarities with PD-1: both receptors are associated with T cell exhaustion and immunosuppression in the tumor microenvironment [47]. Although PD-1/PD-L1 inhibitors (e.g., pembrolizumab, nivolumab) have shown clinical benefit in head and neck cancers, a substantial subset of patients does not respond, possibly because of alternative immune evasion pathways such as the NECTIN1-CD96 axis. Because CD96 belongs to the TIGIT/CD96/CD226 family and competes with the activating receptor CD226 for ligand binding, its inhibition might reverse T/NK cell exhaustion [43, 47].

Recent studies suggest that co-blockade of multiple immune checkpoints (e.g., PD-1 + CTLA-4 or PD-1 + TIGIT) can synergistically enhance anti-tumor immunity [48–50]. In OSCC, where PD-L1 expression is heterogeneous and resistance to PD-1/PD-L1 therapy is common, targeting the NECTIN1-CD96 axis could provide a complementary strategy. For instance, CD96 monoclonal antibodies or inhibitors of the NECTIN1-CD96 interaction might reinvigorate exhausted T cells, particularly in PD-1-resistant cases. Further studies are needed to validate the therapeutic potential of CD96 modulation in OSCC and its interplay with existing immunotherapies.

These findings provide new insight into how tumor cells evade immune surveillance and point to future therapeutic strategies: blocking the NECTIN1-CD96 interaction warrants further investigation as a route to more effective immunotherapies that strengthen anti-tumor immune responses.

Conclusion

This study reveals the crucial role of CAV1+EIP tumor epithelial cells in OSCC-GB and their close association with poor patient prognosis. Aberrant expression of specific transcription factors, including LHX1, ATF1, ZBTB25, HOXA7, and PLAG1, strengthens the regulatory networks of these cells, which involve the key target gene IGF2BP1. This regulatory axis may shape the tumor microenvironment by suppressing immune responses. Furthermore, the NECTIN1-CD96 signaling network between CAV1+EIP cells and T cells points to a significant role for these cells in tumor immune evasion. In summary, CAV1+EIP cells and their transcriptional regulators are critical to the progression and immune regulation of OSCC, offering new directions for future therapeutic strategies.

Supplementary Information

Additional file 1. (21MB, tif)

Acknowledgements

Not applicable.

Author contributions

Y.A.S. and Y.T.S.: contributed to manuscript writing, conducted literature searches, and participated in data cleaning, processing, and visualization. B.X. and W.Y.Y.: contributed to data processing and assisted in interpreting the results of data analysis. Y.W.: participated in manuscript revision and proofreading. Y.H.Z. and L.F.Y.: made major contributions, including designing the project, determining the research direction, supervising data analysis, and overseeing the entire research process.

Funding

This study was supported by the Basic Research Program of the Science and Technology Department of Yunnan Province (202401AY070001-202).

Data availability

The raw data analyzed in this study were obtained from publicly available databases: The Cancer Genome Atlas (TCGA), Gene Expression Omnibus (GEO), AnimalTFDB, and Disco. All detailed data included in the study are available from the corresponding author upon request.

Declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Yan’an Shi and Yiting Shao have contributed equally.

Contributor Information

Lifu Yu, Email: Dentist_Yut3@163.com.

Yonghui Zhang, Email: zhangyonghui@kmmu.edu.cn.

References

1. Zanoni DK, et al. Survival outcomes after treatment of cancer of the oral cavity (1985–2015). Oral Oncol. 2019;90:115–21.
2. Gupta PC, et al. Smokeless tobacco: a major public health problem in the SEA region: a review. Indian J Public Health. 2011;55(3):199–209.
3. Boffetta P, et al. Smokeless tobacco and cancer. Lancet Oncol. 2008;9(7):667–75.
4. Pathak KA, et al. Advanced squamous cell carcinoma of lower gingivobuccal complex: patterns of spread and failure. Head Neck. 2005;27(7):597–602.
5. Walvekar RR, et al. Squamous cell carcinoma of the gingivobuccal complex: predictors of locoregional failure in stage III-IV cancers. Oral Oncol. 2009;45(2):135–40.
6. More Y, D’Cruz AK. Oral cancer: review of current management strategies. Natl Med J India. 2013;26(3):152–8.
7. Mellman I, et al. The cancer-immunity cycle: indication, genotype, and immunotype. Immunity. 2023;56(10):2188–205.
8. Zhang X, et al. Single-cell sequencing for precise cancer research: progress and prospects. Cancer Res. 2016;76(6):1305–12.
9. Mundry CS, et al. Single-cell RNA-sequencing of human spleens reveals an IDO-1(+) tolerogenic dendritic cell subset in pancreatic cancer patients that is absent in normal individuals. Cancer Lett. 2024;607:217321.
10. de Oliveira MV, et al. Immunohistochemical expression of interleukin-4, -6, -8, and -12 in inflammatory cells in surrounding invasive front of oral squamous cell carcinoma. Head Neck. 2009;31(11):1439–46.
11. Babiuch K, et al. Evaluation of proinflammatory, NF-kappaB dependent cytokines: IL-1α, IL-6, IL-8, and TNF-α in tissue specimens and saliva of patients with oral squamous cell carcinoma and oral potentially malignant disorders. J Clin Med. 2020;9(3):867.
12. Zhang W, et al. Safety and feasibility of radiotherapy plus camrelizumab for locally advanced esophageal squamous cell carcinoma. Oncologist. 2021;26(7):e1110–24.
13. Chen J, et al. Single-cell transcriptomics reveal the intratumoral landscape of infiltrated T-cell subpopulations in oral squamous cell carcinoma. Mol Oncol. 2021;15(4):866–86.
14. Theivanthiran B, et al. A tumor-intrinsic PD-L1/NLRP3 inflammasome signaling pathway drives resistance to anti-PD-1 immunotherapy. J Clin Invest. 2020;130(5):2570–86.
15. Campos A, et al. Cell intrinsic and extrinsic mechanisms of caveolin-1-enhanced metastasis. Biomolecules. 2019;9(8):314.
16. Hung KF, et al. The biphasic differential expression of the cellular membrane protein, caveolin-1, in oral carcinogenesis. J Oral Pathol Med. 2003;32(8):461–7.
17. Xue J, et al. Expression of caveolin-1 in tongue squamous cell carcinoma by quantum dots. Eur J Histochem. 2010;54(2):e20.
18. Xu C, Gong R, Yang H. Upregulation of LY6K induced by FTO-mediated demethylation promotes the tumorigenesis and metastasis of oral squamous cell carcinoma via CAV-1-mediated ERK1/2 signaling activation. Histol Histopathol. 2024;39(10):1359–70.
19. Tian Y, et al. LHX1 as a potential biomarker regulates EMT induction and cellular behaviors in uterine corpus endometrial carcinoma. Clinics (Sao Paulo). 2022;77:100103.
20. Tam PP, et al. Regionalization of cell fates and cell movement in the endoderm of the mouse gastrula and the impact of loss of Lhx1(Lim1) function. Dev Biol. 2004;274(1):171–87.
21. Wan Y, Szabo-Rogers HL. Chondrocyte polarity during endochondral ossification requires protein-protein interactions between Prickle1 and Dishevelled2/3. J Bone Miner Res. 2021;36(12):2399–412.
22. Hamaidi I, et al. The Lim1 oncogene as a new therapeutic target for metastatic human renal cell carcinoma. Oncogene. 2019;38(1):60–72.
23. Dormoy V, et al. LIM-class homeobox gene Lim1, a novel oncogene in human renal cell carcinoma. Oncogene. 2011;30(15):1753–63.
24. Zucman J, et al. EWS and ATF-1 gene fusion induced by t(12;22) translocation in malignant melanoma of soft parts. Nat Genet. 1993;4(4):341–5.
25. Aktar S, et al. ATF1 restricts human herpesvirus 6A replication via beta interferon induction. J Virol. 2022;96(19):e0126422.
26. Chen M, et al. Emerging roles of activating transcription factor (ATF) family members in tumourigenesis and immunity: implications in cancer immunotherapy. Genes Dis. 2022;9(4):981–99.
27. Su B, et al. Stage-associated dynamic activity profile of transcription factors in nasopharyngeal carcinoma progression based on protein/DNA array analysis. OMICS. 2011;15(1–2):49–60.
28. Huang GL, et al. The protein level and transcription activity of activating transcription factor 1 is regulated by prolyl isomerase Pin1 in nasopharyngeal carcinoma progression. Cell Death Dis. 2016;7(12):e2571.
29. Benita Y, et al. Gene enrichment profiles reveal T-cell development, differentiation, and lineage-specific transcription factors including ZBTB25 as a novel NF-AT repressor. Blood. 2010;115(26):5376–84.
30. Lee SU, Maeda T. POK/ZBTB proteins: an emerging family of proteins that regulate lymphoid development and function. Immunol Rev. 2012;247(1):107–19.
31. Chen SC, Jeng KS, Lai MMC. Zinc finger-containing cellular transcription corepressor ZBTB25 promotes influenza virus RNA transcription and is a target for zinc ejector drugs. J Virol. 2017;91(20).
32. Madhavan A, et al. Transcription repressor protein ZBTB25 associates with HDAC1-Sin3a complex in mycobacterium tuberculosis-infected macrophages, and its inhibition clears pathogen by autophagy. mSphere. 2021;6(1).
33. Gomes AR, Zhao F, Lam EW. Role and regulation of the forkhead transcription factors FOXO3a and FOXM1 in carcinogenesis and drug resistance. Chin J Cancer. 2013;32(7):365–70.
34. Duan X, et al. The expression and significance of the HOXA7 gene in oral squamous cell carcinoma. J Oral Sci. 2017;59(3):329–35.
35. Scully C, Field JK, Tanzawa H. Genetic aberrations in oral or head and neck squamous cell carcinoma (SCCHN): 1. Carcinogen metabolism, DNA repair and cell cycle control. Oral Oncol. 2000;36(3):256–63.
36. Li J, et al. PLAG1 interacts with GPX4 to conquer vulnerability to sorafenib induced ferroptosis through a PVT1/miR-195-5p axis-dependent manner in hepatocellular carcinoma. J Exp Clin Cancer Res. 2024;43(1):143.
37. Elcheva IA, et al. RNA-binding protein IGF2BP1 maintains leukemia stem cell properties by regulating HOXB4, MYB, and ALDH1A1. Leukemia. 2020;34(5):1354–63.
38. Huang X, et al. Insulin-like growth factor 2 mRNA-binding protein 1 (IGF2BP1) in cancer. J Hematol Oncol. 2018;11(1):88.
39. Xie F, et al. Insight into the role of IGF2BP1 in drug resistant mechanism of oral squamous cell carcinoma. Shanghai Kou Qiang Yi Xue. 2021;30(5):456–61.
40. Liu Y, et al. Allosteric regulation of IGF2BP1 as a novel strategy for the activation of tumor immune microenvironment. ACS Cent Sci. 2022;8(8):1102–15.
41. Tang H, Zhao J, Liu J. Comprehensive analysis of the expression of the IGF2BPs gene family in head and neck squamous cell carcinoma: association with prognostic value and tumor immunity. Heliyon. 2023;9(10):e20659.
42. Holmes VM, et al. Interaction between nectin-1 and the human natural killer cell receptor CD96. PLoS ONE. 2019;14(2):e0212443.
43. Feng S, et al. CD96 as a potential immune regulator in cancers. Int J Mol Sci. 2023;24(2):1303.
44. Ye W, et al. CD96 correlates with immune infiltration and impacts patient prognosis: a pan-cancer analysis. Front Oncol. 2021;11:634617.
45. Sun H, et al. Human CD96 correlates to natural killer cell exhaustion and predicts the prognosis of human hepatocellular carcinoma. Hepatology. 2019;70(1):168–83.
46. Thommen DS, Schumacher TN. T cell dysfunction in cancer. Cancer Cell. 2018;33(4):547–62.
47. Mittal D, et al. CD96 is an immune checkpoint that regulates CD8(+) T-cell antitumor function. Cancer Immunol Res. 2019;7(4):559–71.
48. Zong Y, Deng K, Chong WP. Regulation of Treg cells by cytokine signaling and co-stimulatory molecules. Front Immunol. 2024;15:1387975.
49. Zhou J, et al. The clinical significance of T cell infiltration and immune checkpoint expression in central nervous system germ cell tumors. Front Immunol. 2025;16:1536722.
50. Zhu M, et al. Single-cell transcriptomic and spatial analysis reveal the immunosuppressive microenvironment in relapsed/refractory angioimmunoblastic T-cell lymphoma. Blood Cancer J. 2024;14(1):218.

Articles from Discover Oncology are provided here courtesy of Springer
