Abstract
This study examines the function of chromatin-remodeling genes (CRGs) in nasopharyngeal carcinoma (NPC), with an emphasis on their potential as prognostic and diagnostic biomarkers. We analyzed gene expression data from multiple datasets (GSE12452, GSE53819, GSE61218, and GSE102349) using a multi-stage methodology that combined differential expression analysis, weighted gene co-expression network analysis, and functional enrichment analysis to identify pathogenic CRGs. A prognostic signature of six key genes (CDC6, EZH2, PHF14, PRC1, RAD54B, and UHRF1) was developed through machine learning methods and further validated in independent datasets. The identified genes were used to build a diagnostic model, which performed well (AUC > 0.8) in both training and validation cohorts. This model was further refined using a nomogram and demonstrated high clinical utility, as confirmed by decision curve analysis and calibration curves. Furthermore, immune infiltration analysis showed a strong correlation between immune cell types and diagnostic genes, while single-cell RNA sequencing highlighted functional differences across epithelial subpopulations in NPC. Notably, experimental validation of PHF14 indicated its involvement in NPC malignancy, with PHF14 downregulation suppressing cell migration, invasion, and proliferation. These findings offer fresh perspectives on the molecular processes of NPC and potential biomarkers for clinical diagnosis and prognosis.
Supplementary Information
The online version contains supplementary material available at 10.1007/s10238-025-01953-z.
Keywords: Nasopharyngeal carcinoma, Chromatin-remodeling genes, Machine learning, Single-cell RNA sequencing, PHF14
Introduction
Nasopharyngeal carcinoma (NPC) is a malignant tumor originating from epithelial cells, primarily occurring in the roof and lateral walls of the nasopharynx, particularly in the fossa of Rosenmüller [1]. Although NPC shares a biological origin with other head and neck epithelial malignancies, it has distinctive features. First, it is relatively rare. According to data from the International Agency for Research on Cancer, there are approximately 133,000 new cases globally each year, with a predominance in male patients [2]. Second, its incidence demonstrates significant geographical clustering, with much higher rates in endemic regions such as Southeast Asia and North Africa compared to other areas [2, 3].
Due to the deep-seated location and unique anatomical structure of the nasopharynx, radiation therapy serves as the primary treatment modality [4]. Notably, over 95% of NPC cases are of the non-keratinizing subtype, which is strongly associated with Epstein–Barr virus (EBV) infection [4, 5]. Despite advances in immunotherapy and chemoradiotherapy, patients with advanced NPC continue to have poor survival because of tumor recurrence and/or distant metastases [6–8]. Therefore, identifying molecular biomarkers capable of predicting the prognosis of NPC patients is crucial. Such biomarkers would not only facilitate personalized monitoring of clinical outcomes but also potentially guide the development of novel therapeutic targets.
Numerous studies in recent years have demonstrated that chromatin-remodeling genes (CRGs) are important for the initiation and spread of a variety of human cancers [9, 10]. Chromatin remodeling is a key process regulating gene expression, achieved through alterations in chromatin structure [9]. This process is coordinately mediated by various proteins, including chromatin-remodeling complexes, histone-modifying enzymes, and non-coding RNAs [10, 11]. These mechanisms collectively exert precise control over gene expression in both normal and malignant cells [12–14]. Studies have reported that dysregulation of chromatin-related processes is implicated in the pathogenesis and progression of NPC [15, 16]. Consequently, CRGs hold promise as biomarkers for predicting clinical outcomes in NPC and as novel therapeutic targets.
However, key challenges remain: Although the potential roles of CRGs have been extensively studied in various other tumors [9–14], research in the context of NPC remains relatively scarce. NPC exhibits high heterogeneity at the cellular and molecular levels [1, 4], which makes it challenging to identify consistent biomarkers across different patient cohorts [1, 17]. Research paradigms integrating single-cell resolution technologies with machine learning approaches based on CRG features have not been sufficiently explored. This study intends to overcome these obstacles by combining machine learning, multi-omics data integration, and single-cell analysis technologies in a synergistic manner to create and verify a CRG-based diagnostic and prognostic signature model for NPC. In addition to advancing our knowledge of the pathophysiology of NPC, this work will be very helpful in determining its clinical prognosis.
Methods
Data acquisition
Gene expression profiles and clinical information for NPC and normal samples were obtained from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/), comprising five datasets (GSE150430, GSE12452, GSE53819, GSE61218, and GSE102349). The GSE150430 dataset comprises single-cell data from 15 NPC samples and 1 normal nasopharyngeal epithelial tissue sample. The GSE12452 dataset includes 31 NPC samples and 10 normal healthy nasopharyngeal tissue samples; this dataset served as the training set. The GSE53819 dataset includes 18 NPC samples and 18 normal healthy nasopharyngeal tissue samples, whereas the GSE61218 dataset includes 10 NPC samples and 6 normal healthy nasopharyngeal tissue samples; these two datasets served as the validation sets. The GSE102349 dataset, containing 88 NPC samples with prognostic information, was used as the prognostic analysis set.
All 132 chromatin-remodeling genes (CRGs), with the exception of six factors from the chromatin Y family, were obtained from the Human Transcription Factor Database (HumanTFDB; http://bioinfo.life.hust.edu.cn/HumanTFDB#!/). The FACER database (http://bio-bigdata.hrbmu.edu.cn/FACER/) yielded a total of 870 chromatin regulators. The two gene lists were then merged to yield 904 chromatin-remodeling genes (Supplementary Table 1).
Differential gene expression analysis
Differential expression analysis was conducted between NPC and normal samples from the GSE12452 dataset using the limma package [18] (version 3.58.1) in R. Genes with an adjusted p value < 0.05 and |log2 fold change| > 1 were defined as significantly differentially expressed genes (DEGs).
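The thresholding rule can be illustrated with a minimal Python sketch. It is not the limma workflow used in the study; the function name, the record layout, and the gene names are hypothetical and serve only to show how the two cutoffs combine.

```python
def classify_deg(log2_fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Label one gene as 'up', 'down', or 'ns' using the stated thresholds."""
    if adj_p < p_cutoff and abs(log2_fc) > fc_cutoff:
        return "up" if log2_fc > 0 else "down"
    return "ns"


# Hypothetical results table: gene -> (log2 fold change, adjusted p value)
results = {
    "GENE_A": (2.3, 0.001),
    "GENE_B": (-1.7, 0.020),
    "GENE_C": (0.4, 0.001),  # significant, but the effect is too small
    "GENE_D": (3.0, 0.200),  # large effect, but not significant
}

labels = {gene: classify_deg(fc, p) for gene, (fc, p) in results.items()}
```

A gene must pass both cutoffs to be called a DEG, so GENE_C and GENE_D are both labeled "ns" in this toy example.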
WGCNA
We applied WGCNA [19] (version 1.61) to the top 5000 genes ranked by median absolute deviation (MAD) in the GSE12452 expression matrix. Modules of highly co-expressed genes were identified, and Pearson's correlation was used to determine module–trait relationships between module eigengenes and the disease phenotype (NPC vs. normal). The module showing the strongest positive correlation with disease status (R2 > 0.6, p value < 0.05) in the training set was chosen as the key disease-associated module.
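The MAD-based pre-filtering step can be sketched in Python as follows. This is an illustration with a toy expression table, not the code used in the study; the helper names `mad` and `top_by_mad` are hypothetical. The sketch uses the unscaled MAD, whereas R's `mad()` multiplies by a constant (1.4826) by default, which does not change the ranking.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation of a list of numbers."""
    center = median(values)
    return median([abs(v - center) for v in values])


def top_by_mad(expr, n):
    """Return the n gene names with the largest MAD across samples."""
    ranked = sorted(expr, key=lambda gene: mad(expr[gene]), reverse=True)
    return ranked[:n]


# Toy expression table: gene -> expression values across four samples
expr = {
    "GENE_A": [1.0, 1.1, 0.9, 1.0],
    "GENE_B": [2.0, 6.0, 1.0, 9.0],
    "GENE_C": [3.0, 3.5, 2.5, 4.0],
}

most_variable = top_by_mad(expr, 2)
```

In the study the same idea is applied with n = 5000, so that network construction focuses on the most variable genes.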
pCRGs identification and enrichment analysis
The overlap of DEGs, key module genes, and CRGs was identified and termed pathogenic chromatin-remodeling differentially expressed genes (pCRGs). We performed functional enrichment analysis of Gene Ontology (GO) [20] and Kyoto Encyclopedia of Genes and Genomes (KEGG) [21] pathways on pCRGs using the clusterProfiler [22] package (version 4.10.0). P values were adjusted for multiple testing using the Benjamini–Hochberg method, and terms with adjusted p value < 0.05 were considered statistically significant.
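For readers unfamiliar with the Benjamini–Hochberg adjustment, a minimal Python sketch of the procedure follows. It is a generic implementation with made-up p values, not the clusterProfiler code; the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value to the smallest so that the adjusted
    # values never increase as the raw p values decrease.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p values
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
```

With these four toy values only the first term remains below 0.05 after adjustment, although three of the raw p values were below 0.05.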
Prognosis-related gene identification
We performed univariate Cox proportional hazards regression analysis (p value < 0.05) within the GSE102349 dataset to identify pCRGs significantly associated with patient survival outcomes.
Machine learning-based feature gene selection
We applied lasso-logistic regression [23, 24] (glmnet version 4.1–8) and the Boruta algorithm [25] (version 8.0.0) to prognosis-related genes in the training set. Genes identified by both algorithms formed the final feature gene set.
Diagnostic model construction and validation
We trained eighteen machine learning algorithms, including SVM, Ridge, Enet (α = 0.1–0.9), glmBoost, Lasso, Stepglm (forward), plsRglm, LDA, XGBoost, and NaiveBayes, on feature genes using the GSE12452 analysis set. We evaluated model performance using the area under the receiver operating characteristic curve (AUC) in both the analysis and validation sets (GSE53819, GSE61218). The optimal diagnostic model was determined to be the one with the highest mean AUC.
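The selection rule (highest mean AUC across cohorts) can be sketched in Python. The AUC is computed here from its rank interpretation, that is, the probability that a diseased sample receives a higher score than a normal one. All scores, AUC values, and model names below are hypothetical placeholders and not results from the study.

```python
def auc(scores_pos, scores_neg):
    """AUC as P(positive score > negative score); ties count as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def best_by_mean_auc(auc_table):
    """auc_table maps a model name to its AUCs, one value per cohort."""
    return max(auc_table, key=lambda m: sum(auc_table[m]) / len(auc_table[m]))


# Hypothetical predicted scores for three tumor and two normal samples
example_auc = auc([0.9, 0.8, 0.4], [0.3, 0.5])

# Hypothetical AUCs in one training and two validation cohorts
auc_table = {
    "model_a": [0.95, 0.70, 0.90],
    "model_b": [0.90, 0.88, 0.92],
}
best_model = best_by_mean_auc(auc_table)
```

Ranking by the mean AUC favors a model that performs consistently across cohorts over one that excels only in the training data.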
Prognostic model evaluation
The GSE102349 dataset was partitioned into training (60%) and test (40%) sets. A Ridge Cox regression model was built using diagnostic genes in the training set to calculate risk scores. The median risk scores for the training, test, and combined cohorts were used to stratify patients into high-risk and low-risk categories. Kaplan–Meier survival analysis was used to assess prognostic differences, and the timeROC package [26] (version 0.4) was used to create time-dependent receiver-operating-characteristic (ROC) curves at 1, 2, and 3 years.
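The Kaplan–Meier estimator used to compare the risk groups can be sketched in a few lines of Python. This is a generic product-limit implementation run on made-up follow-up data, not the code used in the study; the function name is illustrative.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up times (months) and event indicators for one group
km_curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
```

Censored subjects lower the number at risk without producing a step in the curve, which is why the toy curve has three steps for five subjects.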
Cox proportional hazards analysis
To evaluate the independent prognostic value of the six-gene signature, we performed both univariate and multivariate Cox proportional hazards regression analyses. The risk score calculated based on key genes was treated as a continuous variable. Clinical covariates, including TNM staging and tumor mutation burden (TMB), were included in the model. Hazard ratio (HR) and 95% confidence interval (CI) were used to quantify the degree of association between each variable and the overall survival rate. A P value < 0.05 was established as the threshold for statistical significance.
Key genes validation
Differential expression of key genes was confirmed in the analysis and test sets. Diagnostic performance was assessed through AUC calculation using the pROC package [27] (version 1.18.4). Genes exhibiting a p value < 0.05 and an AUC > 0.8 were designated as diagnostic genes.
Immune cell correlation analysis
Infiltration scores for 28 immune cell types [28] were quantified in the analysis set using Single-sample Gene Set Enrichment Analysis (ssGSEA) via the GSVA package (version 1.50.5, https://www.bioconductor.org/packages/release/bioc/html/GSVA.html). Infiltration differences between NPC and normal groups were evaluated by Wilcoxon rank-sum tests. Pearson correlations were used to evaluate relationships between immune cell scores and diagnostic gene expression.
Functional annotation of diagnostic genes and pathway correlation
Interaction networks of diagnostic genes were explored using GeneMANIA [29] (https://genemania.org/). GO, KEGG, and HALLMARK gene sets were obtained via msigdbr (version 7.5.1, https://github.com/igordot/msigdbr). Gene Set Variation Analysis (GSVA) was used to generate pathway enrichment scores in NPC samples, which were then correlated with diagnostic scores. Samples were split at the median diagnostic score into high- and low-score groups, and differential expression analysis between the groups and GSEA on HALLMARK pathways were then conducted.
Diagnostic nomogram
A ridge regression-based nomogram (penalty λ = 0.1) for predicting NPC status was created with the rms package [30] (version 6.8–0). Prediction accuracy was evaluated by calibration curves, and clinical utility was assessed through decision curve analysis (DCA) [31] using the rmda package (version 1.6, https://cran.r-project.org/web/packages/rmda/index.html). Diagnostic performance was validated by ROC curves in both the analysis and validation sets.
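The quantity plotted in a decision curve is the net benefit at a given threshold probability. A minimal Python sketch of the standard net-benefit formula follows; the counts are hypothetical and the function names are illustrative, so this is not a reproduction of the rmda output.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit = TP/n - (FP/n) * pt / (1 - pt) at threshold probability pt."""
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(n_pos, n, threshold):
    """Net benefit of classifying every sample as positive."""
    return net_benefit(n_pos, n - n_pos, n, threshold)


# Hypothetical cohort: 100 samples, 45 of them true NPC cases.
# At a threshold of 0.2 the model flags 40 true and 10 false positives.
nb_model = net_benefit(tp=40, fp=10, n=100, threshold=0.2)
nb_all = net_benefit_treat_all(n_pos=45, n=100, threshold=0.2)
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).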
GSEA analysis
Samples were dichotomized by median expression of each diagnostic gene. We performed GSEA against KEGG pathways, identifying terms with significant enrichment (adjusted p value < 0.05, |NES| > 1).
scRNA-seq processing and annotation
The GSE150430 scRNA-seq data were preprocessed and normalized using the Seurat package. We identified the top 3,000 highly variable genes (HVGs) using the 'FindVariableFeatures' function for principal component analysis (PCA). The elbow plot of variance was used to determine which principal components (PCs) were significant. Given that the dataset has 16 independent samples, we used the Harmony algorithm for batch correction. Before further analysis, the PCA embeddings of all samples were combined. We chose a resolution of 0.2 for the 'FindClusters' function after evaluating resolutions from 0.1 to 1.0 and used it to cluster cells. The 'clustree' package was used to visualize how cluster assignments changed across resolutions. Cell types were annotated using published NPC markers [32]. Diagnostic gene scores across cell types were calculated using the AddModuleScore algorithm, with score differences between disease and control groups compared for each cell type.
Cellular subpopulation analysis
The cell type exhibiting maximal diagnostic score differences was sub-clustered through re-embedding using PCA and Harmony integration, followed by clustering with the 'FindClusters' function (resolution = 0.1). Subcluster-specific marker genes were found (thresholds: avg_log2FC > 2, p_val < 0.05), with subsequent GO and KEGG enrichment analyses performed.
Pseudotime trajectory analysis
Pseudotime ordering was performed on sub-clustered cells using Monocle [33] (version 2.28.0). The trajectory was ordered based on the top 2,000 most differentially expressed genes across subclusters. Co-upregulated gene dynamics were visualized through heatmaps, and diagnostic gene expression patterns along pseudotime were examined.
UMAP-based spatial mapping of diagnostic genes
Uniform Manifold Approximation and Projection (UMAP) embeddings were generated from the integrated single-cell data using the 'RunUMAP' function in Seurat. Diagnostic gene expression patterns across single cells were visualized on the resulting UMAP coordinates.
Cell lines and culture
Human NPC cell lines (C666-1, HK-1, HNE3, and TW03) and the NP69 normal nasopharyngeal epithelial cell line were obtained from the laboratory of the School of Oncology, Guangxi Medical University. Cells were cultured at 37 °C in 5% CO2. NPC cells were cultured in RPMI-1640 medium (C11875500BT, Gibco, USA) supplemented with 10% fetal bovine serum (10099141C, Gibco, USA). The NP69 cell line was cultured in keratinocyte serum-free medium (10744019, Gibco, USA) supplemented with recombinant human epidermal growth factor.
PHF14 knockdown and validation
PHF14 knockdown was performed using small interfering RNAs (siRNAs) synthesized by Sangon Biotech (Shanghai, China), with sequences provided in Supplementary Table 2. The siRNAs were transfected into NPC cells at a concentration of 50 nM using Lipofectamine 3000 reagent (Invitrogen, USA) and incubated for 48 h. Total RNA was extracted, and cDNA was synthesized using a reverse transcription kit (Takara Bio, Japan). Quantitative real-time PCR (qPCR) was conducted in triplicate (n = 3) with SYBR Premix Ex Taq II (Takara Bio, Japan) and PHF14-specific primers (Supplementary Table 2). Data are presented as mean ± standard deviation (SD), and statistical significance was assessed using one-way ANOVA.
Western blot validation was performed as follows: We lysed cells using RIPA buffer for total protein extraction. We separated proteins by SDS–polyacrylamide gel electrophoresis (SDS-PAGE) and transferred them to PVDF membranes. Rabbit anti-PHF14 primary antibody (Proteintech, 24787-1-AP, China) was then used to incubate the membranes overnight at 4℃; the membranes were subsequently incubated with goat anti-rabbit IgG (H + L) secondary antibody (Invitrogen, SA5-35571, California) for 1 h at room temperature. We acquired blot images using the LI-COR Odyssey CLx Imager dual-color infrared imaging system.
Proliferation assay
We seeded cells in 6-well plates at a density of 2000 cells per well for colony formation. After a 14-day culture period, cells were fixed and stained with 0.5% crystal violet solution, and colony numbers were quantified per group. To assess proliferation activity, cells seeded in 96-well plates at 5 × 10^3 cells per well were incubated with 5-ethynyl-2′-deoxyuridine (EdU) (RiboBio, China) for 4 h. Nuclei were counterstained with Hoechst 33342 (RiboBio, China). Following fixation, proliferative status was captured using laser scanning confocal microscopy (Leica, Germany). All proliferation assays were independently repeated three times.
Transwell assay
We performed Transwell invasion and migration assays using 24-well chambers (Costar, USA). For migration assays, we seeded cells in the upper chamber at 5 × 10^4 cells per well and cultured them for 24 h. Migrated cells on the lower membrane surface were fixed and stained with 0.5% crystal violet solution. For invasion assays, the upper chamber was coated with Matrigel (BD Biosciences, USA) prior to cell seeding. After 48 h of culture, invaded cells were similarly stained. The number of migrated or invaded cells was quantified per group by counting five random fields per membrane under a light microscope. All assays were performed in three independent replicates.
Statistical analysis
All in vitro experiments were performed in three independent replicates. Correlations and expression differences were examined using the Spearman and Wilcoxon rank-sum tests, respectively. R (version 4.4.2) and GraphPad Prism 10 were used for statistical analysis, and p < 0.05 was deemed statistically significant.
Results
Identification and enrichment analysis of DEGs–WGCNA–CRGs
The experimental workflow is illustrated in Fig. 1.
Fig. 1.
The flowchart of this study
Differential expression analysis employed thresholds of adjusted p value < 0.05 and |log2FC|> 1, identifying 271 upregulated and 478 downregulated DEGs. Figure 2A presents the corresponding volcano plot. Based on the scale-free fit index and average connectivity, we selected a soft threshold power of 10. At this threshold, the R2 value in Fig. 2B (left) exceeded 0.8, indicating that the network approximates a scale-free topology. Simultaneously, the mean adjacency function in Fig. 2B (right) approached 0, demonstrating a plateau trend. We set the minimal module size at 50 genes, initially obtaining 14 modules. We then merged modules with eigengene correlations ≥ 0.75, resulting in eight distinct modules (Fig. 2C). We integrated module eigengenes with clinical traits to analyze module–trait associations, thereby linking co-expression modules to disease status. As shown in Fig. 2D, we identified the green, yellow, blue, and black modules as exhibiting statistically significant correlations with the disease phenotype.
Fig. 2.
Identification and enrichment analysis of DEGs-WGCNA-CRGs genes. A Volcano plot depicting DEGs in the training set. The green and orange dots represent significantly downregulated and upregulated DEGs, respectively. The black horizontal line indicates a threshold of −log10(0.05), and the black vertical lines denote a threshold of |log2FC|= 1. B, C Soft threshold selection and dynamic clustering dendrogram from the WGCNA. D The module–trait heatmap illustrates the association between the modules and traits. E Overlap between DEGs, key module genes, and chromatin-remodeling genes. F, G GO and KEGG enrichment analysis for overlapping genes
We uncovered 21 pathogenic chromatin-remodeling genes based on the combination of WGCNA key module genes, DEGs, and chromatin-remodeling genes (Fig. 2E). We conducted GO and KEGG enrichment analyses on these genes, revealing 277 significantly enriched GO terms (p.adjust < 0.05, count ≥ 2), comprising 200 biological process terms (e.g., cell cycle), 34 cellular component terms (e.g., chromosomal region), and 43 molecular function terms (e.g., ATP binding), along with two significantly enriched KEGG pathways (e.g., cell cycle). We visualized the top 5 most significant GO terms per category and the top 10 KEGG pathways (ranked by p.adjust) in dot plots (Fig. 2F, G), respectively.
Identification of genes associated with prognosis
Prognostic screening using univariate Cox regression (p < 0.05) identified nine high-risk chromatin-remodeling genes (CRGs) with hazard ratios > 1 (Fig. 3A). Feature selection was applied to these nine prognostically significant CRGs. Lasso regression analysis (Fig. 3B) determined an optimal lambda value (λ = 0.0307) that minimized binomial deviance during threefold cross-validation, yielding six feature genes: CDC6, EZH2, PHF14, PRC1, RAD54B, and UHRF1. The Boruta algorithm (Fig. 3C) independently selected nine genes (AURKA, CDC6, EZH2, PBK, PHF14, PRC1, RAD54B, TOP2A, UHRF1) with importance scores significantly exceeding the maximum shadow variable. Intersection analysis of both gene sets (Fig. 3D) established a consensus signature of six diagnostic genes: CDC6, EZH2, PHF14, PRC1, RAD54B, and UHRF1.
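The intersection step can be reproduced directly from the two gene lists reported above. The short Python sketch below uses only those lists; the variable names are illustrative.

```python
# Gene lists as reported in the text for each selection method
lasso_genes = {"CDC6", "EZH2", "PHF14", "PRC1", "RAD54B", "UHRF1"}
boruta_genes = {
    "AURKA", "CDC6", "EZH2", "PBK", "PHF14",
    "PRC1", "RAD54B", "TOP2A", "UHRF1",
}

# Genes selected by both methods form the consensus signature
consensus = sorted(lasso_genes & boruta_genes)

# Genes retained by Boruta only
boruta_only = sorted(boruta_genes - lasso_genes)
```

Because every Lasso-selected gene is also retained by Boruta, the consensus set coincides with the Lasso selection, and AURKA, PBK, and TOP2A are the genes dropped by the intersection.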
Fig. 3.
Identification of key genes associated with prognosis. A Univariate Cox regression analysis of chromatin-remodeling differentially expressed genes (CR-DEGs). B, C The process of selecting key genes using the LASSO and Boruta algorithms. D Venn diagram of candidate genes
Diagnostic model construction and validation
Comparative evaluation of 18 machine learning algorithms across validation datasets identified plsRglm (Partial Least Squares Regression with Generalized Linear Models) as optimal, achieving the highest mean AUC as shown in Fig. 4A and Table 1. This algorithm exhibited robust performance consistency with minimum AUC values > 0.861 across validation sets, leading to its selection as the final diagnostic model.
Fig. 4.
Diagnostic model construction and validation. A The AUC values of 18 machine learning algorithm models on various datasets. B Univariate and multivariate Cox regression analyses of clinical and genomic prognostic factors in NPC. C Prognosis-related genes input for ridge Cox analysis. D–I KM curve and ROC curve of training set, test set, and total set
Table 1.
Performance metrics of machine learning classification models on different gene expression datasets
| Dataset | TP | FN | FP | TN | Accuracy | Sensitivity | Specificity | Precision | F1-score | Recall |
|---|---|---|---|---|---|---|---|---|---|---|
| GSE12452 | 29 | 2 | 2 | 8 | 0.9024 | 0.9355 | 0.8000 | 0.9355 | 0.9355 | 0.9355 |
| GSE53819 | 12 | 6 | 6 | 12 | 0.6667 | 0.6667 | 0.6667 | 0.6667 | 0.6667 | 0.6667 |
| GSE61218 | 10 | 0 | 1 | 5 | 0.9375 | 1.0000 | 0.8333 | 0.9091 | 0.9524 | 1.0000 |
TP, true positive: the number of samples that are actually positive and correctly predicted as positive; FN, false negative: the number of samples that are actually positive but incorrectly predicted as negative; FP, false positive: the number of samples that are actually negative but incorrectly predicted as positive; TN, true negative: the number of samples that are actually negative and correctly predicted as negative. Accuracy represents the proportion of correctly predicted samples among all samples. Sensitivity (recall) measures the model's ability to identify positive samples. Specificity assesses the model's ability to identify negative samples. Precision indicates the proportion of actually positive samples among those predicted as positive. The F1-score is the harmonic mean of precision and recall, comprehensively reflecting the model's performance
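The derived columns of Table 1 follow from the four confusion-matrix counts. The Python sketch below recomputes them from the TP, FN, FP, and TN values listed in the table; the function and variable names are illustrative.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the Table 1 metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Counts taken from Table 1, in the order (TP, FN, FP, TN)
table1_counts = {
    "GSE12452": (29, 2, 2, 8),
    "GSE53819": (12, 6, 6, 12),
    "GSE61218": (10, 0, 1, 5),
}

table1_metrics = {
    name: {k: round(v, 4) for k, v in classification_metrics(*counts).items()}
    for name, counts in table1_counts.items()
}
```

Rounded to four decimals, the recomputed values agree with the table entries, for example an accuracy of 0.9024 for GSE12452 and an F1-score of 0.9524 for GSE61218.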
Univariate and multivariate Cox proportional hazards regression analyses were used to evaluate the independent prognostic value of the six-gene signature, and the results indicated that the signature may provide prognostic information beyond traditional clinical factors (Fig. 4B). In the univariate analysis, the six-gene risk score was strongly associated with overall survival (p = 0.002; HR = 588.767, 95% CI 9.468–36,612.850). TNM stage showed a trend toward prognostic relevance that did not reach significance (p = 0.085; HR = 2.193, 95% CI 0.896–5.367), and TMB was not significantly associated with survival (p = 0.142; HR = 0.997, 95% CI 0.992–1.001). In the multivariate Cox model, which adjusted for TNM stage and TMB, the six-gene risk score remained a significant independent predictor of survival (p = 0.005; HR = 1020.758, 95% CI 8.546–121,928.774). TNM stage (p = 0.385; HR = 1.516, 95% CI 0.593–3.874) and TMB (p = 0.380; HR = 0.998, 95% CI 0.993–1.003) did not attain statistical significance.
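The reported hazard ratios and confidence limits for the risk score can be cross-checked against each other. A Wald-type interval is symmetric on the log scale, so the geometric mean of its bounds should recover the hazard ratio, and the width of the interval gives the standard error of the log hazard ratio. The Python sketch below assumes the reported intervals are Wald intervals, which the text does not state explicitly; the function name is illustrative.

```python
import math


def log_hr_and_se(ci_low, ci_high, z=1.959964):
    """Recover the log hazard ratio and its standard error from a 95% Wald CI."""
    log_hr = (math.log(ci_low) + math.log(ci_high)) / 2.0
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)
    return log_hr, se


# Confidence limits for the six-gene risk score as reported in the text
uni_log_hr, uni_se = log_hr_and_se(9.468, 36612.850)
multi_log_hr, multi_se = log_hr_and_se(8.546, 121928.774)

uni_hr = math.exp(uni_log_hr)      # reported univariate HR: 588.767
multi_hr = math.exp(multi_log_hr)  # reported multivariate HR: 1020.758
```

Under this assumption the recovered hazard ratios match the reported values to within rounding. The standard errors on the log scale exceed 2, which is consistent with the very wide intervals for a continuous risk score.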
Subsequent Ridge Cox regression incorporating the six feature genes determined an optimal tuning parameter (λ = 1.7525) maximizing the concordance index (C-index; Fig. 4C), yielding the prognostic risk model: Risk score = 0.025*CDC6 + 0.793*EZH2 + 0.100*PHF14 + 0.038*PRC1 + 0.065*RAD54B + 0.069*UHRF1.
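The risk model is a weighted sum of gene expression values, which can be written as a short Python sketch. The coefficients are those given in the formula above; the patient identifiers and expression values are hypothetical, and assigning scores equal to the median to the low-risk group is an assumption, since the text does not specify how ties are handled.

```python
from statistics import median

# Coefficients from the reported Ridge Cox risk model
COEFFICIENTS = {
    "CDC6": 0.025, "EZH2": 0.793, "PHF14": 0.100,
    "PRC1": 0.038, "RAD54B": 0.065, "UHRF1": 0.069,
}


def risk_score(expression):
    """Weighted sum of the six gene expression values."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def stratify(scores):
    """Split patients at the median score (ties go to 'low' by assumption)."""
    cutoff = median(scores.values())
    return {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}


def flat_profile(level):
    """Hypothetical profile with the same expression level for every gene."""
    return {gene: level for gene in COEFFICIENTS}


patients = {
    "P1": {**flat_profile(1.0), "EZH2": 2.0},
    "P2": flat_profile(1.0),
    "P3": flat_profile(0.5),
    "P4": flat_profile(2.0),
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = stratify(scores)
```

EZH2 carries by far the largest weight, so in this toy example doubling EZH2 alone (P1) is enough to move a patient from the low-risk to the high-risk group.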
When stratifying training, testing, and full cohorts by median risk score, significant survival disparities (p < 0.05) consistently emerged between risk-stratified subgroups. Kaplan–Meier curves in Fig. 4D (training cohort), Fig. 4E (testing cohort), and Fig. 4F (full cohort) confirmed pronounced survival separation. Concurrently, ROC analysis demonstrated discriminative capacity with AUC values of 0.70 in the training set (Fig. 4G), exceeding 0.70 in both testing (Fig. 4H) and full cohorts (Fig. 4I).
Experimental validation and diagnostic model construction of characteristic genes
Experimental validation confirmed five diagnostic genes (CDC6, EZH2, PHF14, PRC1, and RAD54B) exhibiting differential expression patterns visualized in raincloud plots (Fig. 5A–F). Subsequent evaluation of diagnostic efficacy demonstrated robust performance through ROC analysis: the discovery cohort GSE12452 (Fig. 5G), validation cohort GSE53819 (Fig. 5H), and independent validation cohort GSE61218 (Fig. 5I).
Fig. 5.
Expression validation and performance evaluation of characteristic genes. A–F Raincloud plots of characteristic genes. The scatter points on the left side depict the gene expression values in each sample. On the right side, box and violin plots illustrate the distribution of these gene expression values. G–I ROC curves of characteristic genes in GSE12452, GSE53819, and GSE61218 datasets. *P < 0.05, **P < 0.01
A diagnostic nomogram incorporating the identified genes was constructed using the discovery cohort (Fig. 6A). Calibration curves showed close agreement between predicted and observed outcomes, indicating good calibration (Fig. 6B). DCA revealed superior net benefit of the nomogram compared to treat-all strategies across threshold probabilities, as indicated by its curve exceeding the reference gray line (Fig. 6C). Clinical impact curves further confirmed concordance of predicted versus actual clinical outcomes (Fig. 6D). Model discrimination remained robust across datasets, with ROC curves yielding AUC values consistently exceeding 0.8 in the discovery cohort (Fig. 6E) and independent validation cohorts GSE53819 (Fig. 6F) and GSE61218 (Fig. 6G).
Fig. 6.
Diagnostic model construction of characteristic genes. A Nomogram of characteristic gene. B Calibration curve of diagnostic model. C DCA of diagnostic model. D Clinical impact curve of diagnostic model. E–G ROC curves of diagnostic models in training and test sets
Immune infiltration analysis of key genes
The ssGSEA quantified enrichment scores for 28 immune cell populations across samples in the discovery cohort. Subsequent Wilcoxon rank-sum tests revealed significant disease-associated alterations in immune infiltration (Fig. 7A), with certain immune cell populations being significantly upregulated in disease samples, while others were downregulated. Significant correlations between diagnostic genes (CDC6, EZH2, PHF14, PRC1, RAD54B) and immune infiltrates were visualized through a heatmap (Fig. 7B) and correlation dot plots (Fig. 7C). Representative scatterplots confirmed correlations between Type 2 T helper cells and CDC6 (Fig. 7D) and EZH2 (Fig. 7E).
Fig. 7.
Immune infiltration analysis of key genes. A Immune infiltration distribution between disease group and control group. B Correlation heatmap between key genes and immune cells. C The lollipop plots of the correlation between immune cells and the key genes. D The correlation between Type 2 T helper cells and CDC6. E The correlation between Type 2 T helper cells and EZH2. *P < 0.05, **P < 0.01
Potential functions and regulatory networks of prognostic genes
The gene–gene interaction (GGI) network revealed predominant physical interactions among diagnostic genes (Fig. 8A), with edge colors denoting relationship types: genetic interactions (red), co-expression (blue), co-localization (yellow), predicted interactions (purple), pathway involvement (green), physical interactions (black), and shared protein domains (orange). Gene set variation analysis (GSVA) quantified pathway activity scores for GO and KEGG terms, with correlation analysis against risk scores identifying the top five most positively and negatively correlated gene sets for biological processes (GOBP), molecular functions (GOMF), cellular components (GOCC), and KEGG pathways (Fig. 8B). Finally, GSEA of Hallmark gene sets based on log2FC between risk groups identified 30 significantly dysregulated pathways: 20 upregulated and 10 downregulated, with the top ten enriched pathways per direction visualized in Fig. 8C (upregulated) and Fig. 8D (downregulated).
Fig. 8.
Potential functions and regulatory networks of prognostic genes. A GGI network analysis of key genes. B The association between GSVA scores and risk scores of GO and KEGG pathways. C, D GSEA of the top 10 hallmark pathways with the most significant upregulation and downregulation. E, F GSEA of PHF14
Single-gene GSEA demonstrated distinct pathway associations: CDC6 showed enrichment in 126 pathways, comprising 74 upregulated and 52 downregulated, EZH2 in 103 pathways (27 upregulated; 76 downregulated), PHF14 in 122 pathways (56 upregulated; 66 downregulated), PRC1 in 126 pathways (69 upregulated; 57 downregulated), and RAD54B in 79 pathways (43 upregulated; 36 downregulated), with PHF14-specific pathway enrichment visualized in Fig. 8E, F.
Single-cell resolution reveals functionally distinct epithelial subpopulations in nasopharyngeal carcinoma
We performed PCA dimensionality reduction based on the top 3000 HVGs. The cumulative variation explained by principal components plateaued upon reaching 30 components, as illustrated in Fig. 9A. After evaluating resolutions from 0.1 to 1.0, we selected 0.2 for cell clustering. Biologically, resolution 0.2 yielded 14 clusters matching known NPC markers and functions; lower resolutions merged distinct subsets, and higher resolutions split clusters into biologically redundant subgroups. After calculating UMAP coordinates and clustering cells on the Harmony-integrated data, 14 distinct cell clusters were found. The clusters were annotated using marker genes reported in the literature. The annotation results, as presented in Fig. 9B, C, defined five major cell types: Fibroblasts (Fib), Epithelial cells (Epi), Myeloid cells (Myel), B cells (B), and T cells (T).
Fig. 9.
Integrated single-cell profiling reveals compartment-specific heterogeneity in NPC. A Principal components selection scree plot. B UMAP visualization annotated by cell type and disease status composition. C Heatmap of marker gene expression for cell type annotation. D Boxplots of key gene signature scores across cell types and sample groups. E Bar plot of inter-group differential signature score analysis. F UMAP visualization partitioned by cell clusters. G Heatmap of the top 100 marker genes per cluster. *P < 0.05, **P < 0.01
Subsequently, the AddModuleScore function was used to compute diagnostic gene signature scores for each cell. Inter-group differences in these signature scores were assessed for each cell type. As illustrated in Fig. 9D, notable differences were observed across all cell types between the groups. The statistical significance of these differences, evaluated using the −log10(p value), is presented in Fig. 9E. Among the cell types analyzed, epithelial cells (Epi) exhibited both the highest level of statistical significance and the largest effect size.
Further subclustering of the epithelial (Epi) compartment was performed through subset-specific HVG selection, dimensionality reduction, clustering, and biological annotation. Figure 9F presents the UMAP visualization colored by the resulting annotated epithelial subcluster identities (Epi0-Epi5).
Cluster-specific marker genes were identified for each epithelial subcluster: Epi0 (152 markers), Epi1 (68 markers), Epi2 (497 markers), Epi3 (1509 markers), Epi4 (481 markers), and Epi5 (3534 markers). Functional enrichment analysis was carried out on the marker gene sets for each subcluster. Figure 9G integrates the expression heatmap of the top 100 marker genes per subcluster with the top 5 most significantly enriched biological pathways associated with each subcluster. Epi0 exhibited predominant enrichment for cell cycle processes; Epi1 was characterized by enrichment in pathways linked to cell migration, extracellular organization, and associated signaling cascades; Epi2 showed significant involvement in organ development, tissue development, and response to stimuli; Epi3 demonstrated strong associations with extracellular signaling pathways and intercellular transport mechanisms; Epi4 was linked to pathways governing organismal development and nervous system development processes; and the marker genes of Epi5 were enriched in pathways related to the structure of cilia and their associated cellular functions.
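Over-representation of a pathway among a marker gene set is commonly assessed with a one-sided hypergeometric test. The sketch below illustrates that calculation under stated assumptions; the exact enrichment settings used for Fig. 9G are not restated here, and the gene counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_markers, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    X is the number of pathway genes found when `n_markers` genes are drawn
    without replacement from a universe of `n_universe` genes, of which
    `n_pathway` belong to the pathway.
    """
    upper = min(n_pathway, n_markers)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_markers - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_markers)


# Hypothetical counts: 20 background genes, 5 in the pathway,
# 4 marker genes, 3 of which fall in the pathway.
p_value = hypergeom_enrichment_p(n_universe=20, n_pathway=5,
                                 n_markers=4, n_overlap=3)
print(p_value)
```

With many pathways tested per subcluster, such p-values would normally be adjusted for multiple testing before the top pathways are ranked.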
Single-cell dissection of epithelial trajectories and oncogenic signatures in nasopharyngeal carcinoma
Epithelial (Epi) cells were isolated for pseudotime analysis. The trajectory inference results are presented in Fig. 10A–C. Epi0 subpopulations were predominantly localized to State 2, while Epi2 subpopulations primarily occupied State 3. Epi1 subpopulations exhibited broad distribution across all trajectory branches. Figure 10D displays expression patterns of the top 100 branch-dependent genes with the most significant divergence across the two differentiation trajectories. These genes segregated into five distinct clusters, and the resulting expression modules delineate key regulatory programs underpinning epithelial differentiation trajectories. Figure 10E illustrates the branch-specific expression dynamics of key diagnostic genes. Significant differential expression was observed across trajectory branches. CDC6, EZH2, and PRC1 exhibited correlated expression dynamics, while PHF14 and RAD54B displayed a distinct, shared expression pattern. We also examined the expression patterns of the diagnostic genes in the major cell types. All five diagnostic genes exhibited elevated expression levels within the epithelial compartment, with PHF14 demonstrating the highest cellular detection frequency (proportion of expressing cells). Figure 10F–J presents gene-specific UMAP visualizations delineating the spatial expression patterns of each diagnostic gene across the epithelial subclusters.
Fig. 10.
Integrated spatiotemporal atlas of epithelial differentiation and key gene dynamics in NPC. A Pseudotime-embedded dimensionality reduction plot. B State assignment visualization. C Subcluster distribution along the trajectory. D Pseudotemporal expression patterns of top 100 branch-dependent genes. E Pseudotemporal expression patterns of diagnostic genes. F–J Gene-specific UMAP visualizations delineating the spatial expression patterns of each diagnostic gene across the epithelial subclusters
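The cellular detection frequency mentioned above is the proportion of cells in which a gene is detected. A minimal sketch of that calculation is given below; the detection threshold of zero is an assumption, and the gene names and expression values are placeholders rather than data from this study.

```python
def detection_frequency(values, threshold=0.0):
    """Fraction of cells whose expression value exceeds `threshold`."""
    if not values:
        raise ValueError("no cells supplied")
    expressing = sum(1 for v in values if v > threshold)
    return expressing / len(values)


# Toy per-cell expression values for two placeholder genes.
toy_expression = {
    "GENE_A": [0.0, 0.0, 1.2, 3.4],
    "GENE_B": [0.5, 0.0, 2.1, 0.9],
}

frequencies = {gene: detection_frequency(vals)
               for gene, vals in toy_expression.items()}
most_detected = max(frequencies, key=frequencies.get)
print(frequencies, most_detected)
```

Detection frequency is independent of expression magnitude, so a gene can be detected in most cells while still having a modest average expression level.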
The impact of PHF14 on nasopharyngeal carcinoma malignancy
Given the prominent role of PHF14 identified in the preceding analyses, we selected this candidate gene for further investigation. As an initial experimental validation, both Western blot and qPCR analyses confirmed significantly elevated PHF14 expression in NPC cell lines versus NP69 cells, as shown in Fig. 11A.
Fig. 11.
The impact of PHF14 on NPC malignancy. A Expression of PHF14 in normal nasopharyngeal epithelial cells and nasopharyngeal carcinoma cells. B Validation for PHF14 knockdown in NPC cells. C, D The migration and invasion ability of NPC cells was reduced by downregulation of PHF14. E, F Knockdown of PHF14 reduced the proliferation of NPC cells as indicated by clone formation and EdU assay. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001
We next asked whether PHF14 downregulation influences NPC malignancy. We used siRNAs and validated the knockdown of PHF14 expression by qPCR and Western blot. siRNA#1 and siRNA#2 were selected for the phenotype experiments because of their efficient knockdown at both the mRNA and protein levels in C666-1 and TW03 cells (Fig. 11B). In Transwell assays with and without Matrigel, we observed reduced migration and invasion of NPC cells upon PHF14 downregulation (Fig. 11C, D). PHF14 downregulation also suppressed the growth of NPC cells, as indicated by the clone formation assay (Fig. 11E). Similarly, reduced proliferation of NPC cells upon PHF14 knockdown was detected in the EdU assay (Fig. 11F).
Discussion
The goal of this study was to develop diagnostic and prognostic biomarkers by combining multi-omics data, machine learning, and single-cell analysis, and to understand the role of chromatin-remodeling genes (CRGs) in nasopharyngeal carcinoma (NPC). We identified six key genes (CDC6, EZH2, PHF14, PRC1, RAD54B, and UHRF1) as potential biomarkers for NPC. The machine learning-based diagnostic model achieved good performance (AUC > 0.8) and was validated in independent datasets. Moreover, immune infiltration analysis showed significant associations between several immune cell types and these key genes, suggesting their clinical relevance in NPC. Experimental results further indicate that PHF14 promotes malignancy in NPC, advancing our understanding of the molecular mechanisms of NPC pathogenesis.
Recent investigations have explored the functions of CRGs across multiple cancers, demonstrating their significant role in gene regulation and cancer progression. Many earlier studies have shown that aberrant expression of CDC6 contributes to the oncogenic activity of various malignancies [34–36], but research in nasopharyngeal carcinoma remains lacking. PRC1 also affects various tumors through its involvement in multiple biological processes [37–39], including nasopharyngeal carcinoma [40]. In cancer, EZH2 influences tumor progression through various pathways, including cell cycle processes [41], autophagy and cell death [42], and DNA damage repair [43]. EZH2 overexpression has been reported in many other cancer types, including triple-negative breast cancer [44], diffuse-type gastric cancer [45], and endometrial cancer [46]. In nasopharyngeal carcinoma, EZH2 knockdown attenuates the effect of Akt overexpression on cell survival and apoptosis [47]. RAD54B is involved in the regulation of homologous recombination repair and the DNA damage checkpoint response [48, 49], but no research has been reported in nasopharyngeal carcinoma. Our findings are consistent with these results, further indicating that EZH2, CDC6, and other key CRGs are associated with the malignancy of nasopharyngeal carcinoma.
Among them, PHF14 is a key regulatory factor in this study. According to several studies, PHF14 generally plays an active part in the development and progression of tumors. PHF14 knockdown appears to suppress the proliferation of bladder cancer cells [50]. Furthermore, PHF14 has been shown to promote glioblastoma cell invasion and growth through the Wnt signaling pathway [51] and to enhance gastric cancer cell migration and growth via the AKT pathway [52]. Taken together, PHF14 appears to be important for the growth and spread of tumors. However, its significance and molecular basis in NPC had remained unexplored. Our experimental validation showed that PHF14 downregulation significantly inhibited cell migration, invasion, and proliferation, consistent with findings from other cancers. These results indicate that PHF14 contributes to the development of nasopharyngeal carcinoma and has potential as a therapeutic target for its management. Although our study emphasizes the role of PHF14 in promoting the malignancy of NPC, no small-molecule inhibitors of PHF14 are currently available. Therefore, extensive drug development and testing are still required before clinical application.
The results of this study have certain clinical significance. Notably, the direct translational potential of several identified genes strengthens the clinical applicability of our findings. For instance, EZH2 inhibitors, such as tazemetostat, have received FDA approval for specific cancers and are under clinical development for others [53]. The prominent role of EZH2 in our models suggests that NPC patients with high EZH2 expression might benefit from existing epigenetic therapies. However, although EZH2 inhibitors may eventually be used clinically to treat NPC, they may have complex effects on EBV-driven tumors, so preclinical testing in EBV-positive NPC models is crucial before such therapies are proposed. Similarly, the roles of CDC6 and UHRF1 in the DNA replication and methylation machinery suggest that they may also be druggable. This link makes our biomarker panel not only a diagnostic tool but also a possible means of predicting the response to targeted therapy, such as current epigenetic drugs.
Additionally, the immune cell types significantly associated with the key diagnostic genes offer insight into the immunological landscape of nasopharyngeal carcinoma. Critical CRGs are linked to immune cell infiltration, suggesting that they may be implicated in immune evasion. Since EZH2 suppresses tumor cell immunogenicity and T-cell infiltration in other cancers [54], its overexpression in NPC may contribute to a resistant microenvironment. In colon cancer, EZH2-mediated epigenetic regulation affects PD-L1 protein levels, and EZH2 inhibitors have been reported to reduce anti-tumor immunity, which supports combining epigenetic drugs with immunotherapy [55].
This study highlights the role of chromatin modification in NPC and shows how integrating multi-omics data, machine learning, and single-cell analysis can facilitate cancer biomarker discovery. Epigenetic dysregulation is also common in other malignancies, such as hepatocellular carcinoma (HCC) [56]. The CRG signature identified in our study may therefore have relevance beyond NPC, and its evaluation in other cancer types, such as HCC, could uncover pan-cancer mechanisms of tumorigenesis and immune evasion. Similar to NPC, skeletal metastases of unknown primary (SMUP) and other bone metastatic cancers also urgently require reliable biomarkers to predict disease progression and therapeutic response. Bone turnover markers, such as those reflecting bone resorption and the response to bisphosphonates, have drawn much research attention and have improved our understanding of how tumors interact with the bone environment [57]. Studying how immune and stromal cells interact in the tumor microenvironment can reveal how tumors adapt to and reshape their surroundings, regardless of whether the primary tumor site is known. This broader perspective could accelerate the development of universal biomarkers and therapeutic strategies targeting epigenetic machinery.
Despite these advantages, our study has limitations. One significant limitation is the reliance on public datasets, which could introduce biases because of variations in sample collection, preparation, or evaluation techniques. The small sample size of the datasets also limits the generalizability of the conclusions. Furthermore, although the machine learning models demonstrated robust performance, the validation sample size was modest; the conclusions remain preliminary until tested in prospective patient cohorts, and future studies should include larger and more diverse cohorts to further validate the clinical utility of these models. Finally, although PHF14 has been recognized as a significant contributor to nasopharyngeal carcinoma, its exact molecular mechanism remains unclear. Future research should focus on the in vivo functional evaluation of PHF14, using animal models to further clarify its role in tumor progression and metastasis. Additionally, preclinical research should assess the effectiveness of PHF14 as a possible therapeutic target for NPC.
Conclusions
In conclusion, this study highlights the important part that chromatin-remodeling genes play in the initiation and progression of nasopharyngeal carcinoma. By integrating machine learning, multi-omics data, and scRNA-seq, we have created new diagnostic and prognostic models that may improve clinical decision-making. Furthermore, our results underscore the potential of PHF14 as a therapeutic target in nasopharyngeal carcinoma.
Supplementary Information
Below is the link to the electronic supplementary material.
Acknowledgements
For the data used in this work, we are thankful to the Gene Expression Omnibus.
Author contributions
YYC and XDZ designed the research; YYC and YX analyzed the data; YYC and WTQ performed the research; YYC, WLC, and WTQ wrote the paper; SHM and WLC contributed new reagents or analytic tools; ZYY developed the software necessary to perform and record experiments. All authors reviewed the paper and approved the submission.
Funding
The Joint Project on Regional High-Incidence Diseases Research of Guangxi Natural Science Foundation (2025GXNSFDA069051) and the Guangxi Science and Technology Program (GKE-ZZ202519) provided funding for this study.
Data availability
The datasets analyzed in this study can be found in online repositories. The names of the repositories and the accession numbers can be found in the article.
Declarations
Conflict of interest
The authors declare no competing interests.
Ethics statement
Not applicable.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Chen YP, Chan AT, Le QT, Blanchard P, Sun Y, Ma J. Nasopharyngeal carcinoma. Lancet. 2019;394:64–80. 10.1016/S0140-6736(19)30956-0.
2. Liu Q, Wang H, Chen Z, et al. Global, regional, and national epidemiology of nasopharyngeal carcinoma in middle-aged and elderly patients from 1990 to 2021. Ageing Res Rev. 2025;104:102613. 10.1016/j.arr.2024.102613.
3. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74:229–63. 10.3322/caac.21834.
4. Lee AWM, Ng WT, Chan JYW, et al. Management of locally recurrent nasopharyngeal carcinoma. Cancer Treat Rev. 2019;79:101890. 10.1016/j.ctrv.2019.101890.
5. Yuan L, Zhong L, Krummenacher C, Zhao Q, Zhang X. Epstein-Barr virus-mediated immune evasion in tumor promotion. Trends Immunol. 2025;46:386–402. 10.1016/j.it.2025.03.007.
6. Xue F, Ou D, Xie C, et al. Sequential vs induction plus concurrent chemoradiotherapy in nasopharyngeal carcinoma: a randomized clinical trial. JAMA Oncol. 2025;11:1011–20. 10.1001/jamaoncol.2025.2191.
7. Xu M, Yang S, Cui E, Wang Z, Li X. Induction chemotherapy for nasopharyngeal carcinoma. BMJ (Clin Res Ed). 2025;389:r652. 10.1136/bmj.r652.
8. Cai M, Wang Y, Ma H, Yang L, Xu Z. Advances and challenges in immunotherapy for locally advanced nasopharyngeal carcinoma. Cancer Treat Rev. 2024;131:102840. 10.1016/j.ctrv.2024.102840.
9. Zhang X, Liu D, Chen X, Li T, Wu G. Chromatin and epigenetic regulation in malignant tumors: a comprehensive review. Ann N Y Acad Sci. 2025;1551:33–51. 10.1111/nyas.70005.
10. Malone HA, Roberts CWM. Chromatin remodellers as therapeutic targets. Nat Rev Drug Discov. 2024;23:661–81. 10.1038/s41573-024-00978-5.
11. Zhang X, Liu D, Yin S, Gao Y, Li X, Wu G. Metabolism and epigenetics in cancer: toward personalized treatment. Front Endocrinol. 2025;16:1530578. 10.3389/fendo.2025.1530578.
12. Goyal H, Kaur J. Long non-coding RNAs and autophagy: dual drivers of hepatocellular carcinoma progression. Cell Death Discov. 2025;11:376. 10.1038/s41420-025-02667-7.
13. Marei HE. Epigenetic regulators in cancer therapy and progression. NPJ Precis Oncol. 2025;9:206. 10.1038/s41698-025-01003-7.
14. McClellan BL, Haase S, Nunez FJ, et al. Impact of epigenetic reprogramming on antitumor immune responses in glioma. J Clin Invest. 2023;133:e163450. 10.1172/JCI163450.
15. He X, Yan B, Liu S, et al. Chromatin remodeling factor LSH drives cancer progression by suppressing the activity of fumarate hydratase. Cancer Res. 2016;76:5743–55. 10.1158/0008-5472.CAN-16-0268.
16. Xie Y, Wang H, Wang S, et al. Clinicopathological significance of ATRX expression in nasopharyngeal carcinoma patients: a retrospective study. J Cancer. 2021;12:6931–6. 10.7150/jca.63333.
17. Jiang J, Ying H. Revealing the crosstalk between nasopharyngeal carcinoma and immune cells in the tumor microenvironment. J Exp Clin Cancer Res. 2022;41:244. 10.1186/s13046-022-02457-4.
18. Ritchie ME, Phipson B, Wu D, et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47. 10.1093/nar/gkv007.
19. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 2008;9:559. 10.1186/1471-2105-9-559.
20. Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. The gene ontology consortium. Nat Genet. 2000;25:25–9. 10.1038/75556.
21. Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30. 10.1093/nar/28.1.27.
22. Yu G, Wang L-G, Han Y, He Q-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012;16:284–7. 10.1089/omi.2011.0118.
23. McEligot AJ, Poynor V, Sharma R, Panangadan A. Logistic LASSO regression for dietary intakes and breast cancer. Nutrients. 2020;12:2652. 10.3390/nu12092652.
24. Meier L, Van De Geer S, Bühlmann P. The group lasso for logistic regression. J R Stat Soc Ser B Stat Methodol. 2008;70:53–71. 10.1111/j.1467-9868.2007.00627.x.
25. Kursa MB, Jankowski A, Rudnicki WR. Boruta - a system for feature selection. Fundam Inform. 2010;101:271–85. 10.3233/FI-2010-288.
26. Blanche P, Dartigues J-F, Jacqmin-Gadda H. Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Stat Med. 2013;32:5381–97. 10.1002/sim.5958.
27. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77. 10.1186/1471-2105-12-77.
28. Huang J, Zhang J, Wang F, Zhang B, Tang X. Comprehensive analysis of cuproptosis-related genes in immune infiltration and diagnosis in ulcerative colitis. Front Immunol. 2022;13:1008146. 10.3389/fimmu.2022.1008146.
29. Warde-Farley D, Donaldson SL, Comes O, et al. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res. 2010;38:W214–220. 10.1093/nar/gkq537.
30. Wu J, Zhang H, Li L, et al. A nomogram for predicting overall survival in patients with low-grade endometrial stromal sarcoma: a population-based analysis. Cancer Commun. 2020;40:301–12. 10.1002/cac2.12067.
31. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Mak Int J Soc Med Decis Mak. 2006;26:565–74. 10.1177/0272989X06295361.
32. Chen Y-P, Yin J-H, Li W-F, et al. Single-cell transcriptomics reveals regulators underlying immune cell diversity and immune subtypes associated with prognosis in nasopharyngeal carcinoma. Cell Res. 2020;30:1024–42. 10.1038/s41422-020-0374-x.
33. Trapnell C, Cacchiarelli D, Grimsby J, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol. 2014;32:381–6. 10.1038/nbt.2859.
34. Kim Y-H, Byun YJ, Kim WT, et al. CDC6 mRNA expression is associated with the aggressiveness of prostate cancer. J Korean Med Sci. 2018;33:e303. 10.3346/jkms.2018.33.e303.
35. Mahadevappa R, Neves H, Yuen SM, et al. The prognostic significance of Cdc6 and Cdt1 in breast cancer. Sci Rep. 2017;7:985. 10.1038/s41598-017-00998-9.
36. Wang F, Zhao F, Zhang L, et al. CDC6 is a prognostic biomarker and correlated with immune infiltrates in glioma. Mol Cancer. 2022;21:153. 10.1186/s12943-022-01623-8.
37. Huang D, Chu X, Wu C, et al. CCNY-mediated phosphorylation and TET2-BACH1-driven DNA demethylation activate PRC1 to augment NSCLC progression. J Exp Clin Cancer Res. 2025;44:206. 10.1186/s13046-025-03472-x.
38. Brynychova V, Ehrlichova M, Hlavac V, et al. Genetic and functional analyses do not explain the association of high PRC1 expression with poor survival of breast carcinoma patients. Biomed Pharmacother. 2016;83:857–64. 10.1016/j.biopha.2016.07.047.
39. Chen J, Rajasekaran M, Xia H, et al. The microtubule-associated protein PRC1 promotes early recurrence of hepatocellular carcinoma in association with the wnt/β-catenin signalling pathway. Gut. 2016;65:1522–34. 10.1136/gutjnl-2015-310625.
40. Wu M, Yang L, Hou X, Wang Z, Zhang J. Human polycomb protein 2 (hPC2) as a novel independent prognostic marker in nasopharyngeal carcinoma. Cancer Manag Res. 2021;13:5775–84. 10.2147/CMAR.S308884.
41. Nutt SL, Keenan C, Chopin M, Allan RS. EZH2 function in immune cell development. Biol Chem. 2020;401:933–43. 10.1515/hsz-2019-0436.
42. Yao Y, Hu H, Yang Y, et al. Downregulation of enhancer of zeste homolog 2 (EZH2) is essential for the induction of autophagy and apoptosis in colorectal cancer cells. Genes. 2016;7:83. 10.3390/genes7100083.
43. Ito T, Teo YV, Evans SA, Neretti N, Sedivy JM. Regulation of cellular senescence by polycomb chromatin modifiers through distinct DNA damage- and histone methylation-dependent pathways. Cell Rep. 2018;22:3480–92. 10.1016/j.celrep.2018.03.002.
44. Melino M, Tu WJ, Bielefeldt-Ohmann H, et al. Depleting the action of EZH2 through PI3K-mTOR inhibition to overcome metastasis and immunotherapy resistance in triple-negative breast cancer. Mol Cancer Ther. 2025;24:1511–26. 10.1158/1535-7163.MCT-24-0693.
45. Zou G, Huang Y, Zhang S, et al. E-cadherin loss drives diffuse-type gastric tumorigenesis via EZH2-mediated reprogramming. J Exp Med. 2024;221:e20230561. 10.1084/jem.20230561.
46. Krill L, Deng W, Eskander R, et al. Overexpression of enhance of zeste homolog 2 (EZH2) in endometrial carcinoma: an NRG oncology/gynecologic oncology group study. Gynecol Oncol. 2020;156:423–9. 10.1016/j.ygyno.2019.12.003.
47. Yu M, Li Y, Li M, Lu D. Eudesmin exerts antitumor effects by down-regulating EZH2 expression in nasopharyngeal carcinoma cells. Chem Biol Interact. 2019;307:51–7. 10.1016/j.cbi.2019.04.028.
48. Miyagawa K, Tsuruga T, Kinomura A, et al. A role for RAD54B in homologous recombination in human cells. EMBO J. 2002;21:175–80. 10.1093/emboj/21.1.175.
49. Yasuhara T, Suzuki T, Katsura M, Miyagawa K. Rad54B serves as a scaffold in the DNA damage response that limits checkpoint strength. Nat Commun. 2014;5:5426. 10.1038/ncomms6426.
50. Miao L, Liu HY, Zhou C, He X. LINC00612 enhances the proliferation and invasion ability of bladder cancer cells as ceRNA by sponging miR-590 to elevate expression of PHF14. J Exp Clin Cancer Res. 2019;38:143. 10.1186/s13046-019-1149-4.
51. Wu S, Luo C, Li F, Hameed NUF, Jin Q, Zhang J. Silencing expression of PHF14 in glioblastoma promotes apoptosis, mitigates proliferation and invasiveness via wnt signal pathway. Cancer Cell Int. 2019;19:314. 10.1186/s12935-019-1040-6.
52. Zhao Y, He J, Li Y, et al. PHF14 promotes cell proliferation and migration through the AKT and ERK1/2 pathways in gastric cancer cells. BioMed Res Int. 2020;2020:6507510. 10.1155/2020/6507510.
53. Duan R, Du W, Guo W. EZH2: a novel target for cancer treatment. J Hematol Oncol. 2020;13:104. 10.1186/s13045-020-00937-8.
54. Porazzi P, Nason S, Yang Z, et al. EZH1/EZH2 inhibition enhances adoptive T cell immunotherapy against multiple cancer models. Cancer Cell. 2025;43:537–551.e7. 10.1016/j.ccell.2025.01.013.
55. Huang J, Yin Q, Wang Y, et al. EZH2 inhibition enhances PD-L1 protein stability through USP22-mediated deubiquitination in colorectal cancer. Adv Sci. 2024;11:e2308045. 10.1002/advs.202308045.
56. Xu C, Liang L, Liu G, et al. Predicting hepatocellular carcinoma outcomes and immune therapy response with ATP-dependent chromatin remodeling-related genes, highlighting MORF4L1 as a promising target. Cancer Cell Int. 2025;25:4. 10.1186/s12935-024-03629-2.
57. Argentiero A, Solimando AG, Brunetti O, et al. Skeletal metastases of unknown primary: biological landscape and clinical overview. Cancers (Basel). 2019;11:1270. 10.3390/cancers11091270.