Scientific Reports. 2025 Apr 22;15:13901. doi: 10.1038/s41598-025-96683-3

Multi-omics analysis constructs a novel neuroendocrine prostate cancer classifier and classification system

Junxiao Shen 1, Luyuan Lu 2, Zujie Chen 1, Wei Guo 1, Shuwen Wang 1, Ziqiao Liu 1, Xuke Gong 1, Yiming Qi 1, Ruyi Jin 3, Cheng Zhang 1
PMCID: PMC12015331  PMID: 40263498

Abstract

Neuroendocrine prostate cancer (NEPC), a subtype of prostate cancer (PCa) with poor prognosis and high heterogeneity, currently lacks accurate markers. This study aims to identify a robust NEPC classifier and provide new perspectives for resolving intra-tumoral heterogeneity. Multi-omics analysis included 19 bulk transcriptomics, 14 single-cell transcriptomics, 1 spatial transcriptomics, 16 published NE signatures and 10 cellular experiments combined with multiple machine learning algorithms to construct a novel NEPC classifier and classification. A comprehensive single-cell atlas of prostate cancer was created from 70 samples, comprising 196,309 cells, among which 9% were identified as NE cells. Within this framework and in combination with bulk transcriptomics, a total of 100 high-quality NE-specific feature genes were identified and differentiated into NEPup sig and NEPdown sig. The random forest (RF) algorithm proved to be the most effective classifier for NEPC, leading to the establishment of the NEP100 model, which demonstrated robust validation across various datasets. In clinical settings, the use of the NEP100 model can greatly improve the diagnostic and prognostic prediction of NEPC. Hierarchical clustering based on NEP100 revealed four distinct NEPC subtypes, designated VR_O, Prol_N, Prol_P, and EMT_Y, each of which presented unique biological characteristics. This allows us to select different targeted therapeutic strategies for different subtypes of phenotypic pathways. Notably, NEP100 expression correlated positively with neuroendocrine differentiation and disease progression, while the VR-NE phenotype dominated by VR_O cells indicated a propensity for treatment resistance. Furthermore, AMIGO2, a component of the NEP100 signature, was associated with chemotherapy resistance and a poor prognosis, indicating that it is a pivotal target for future therapeutic strategies. 
This study used multi-omics analysis combined with machine learning to construct a novel NEPC classifier and classification system. NEP100 provides a clinically actionable framework for NEPC diagnosis and subtyping.

Keywords: Neuroendocrine prostate cancer (NEPC), Multi-omics, Computational biology and bioinformatics, Tumor biomarkers, Tumor heterogeneity

Subject terms: Bioinformatics, Tumour biomarkers, Tumour heterogeneity, Urological cancer, Prostate cancer

Introduction

Prostate cancer (PCa) is one of the most prevalent forms of cancer in men globally1. There is substantial evidence that the androgen receptor (AR) plays a pivotal role in regulating PCa by controlling the cellular program that promotes PCa cell survival and proliferation2. Consequently, androgen deprivation therapy is frequently employed as the primary treatment for prostate cancer, with the objective of prolonging overall survival. However, the majority of patients who receive this therapy develop castration-resistant prostate cancer (CRPC) within a median of 16–18 months3. Moreover, a subset of CRPC patients exhibit a neuroendocrine (NE) phenotype with reduced or no dependence on AR signaling and a poor prognosis4. This subtype is the most lethal, presents initial symptoms similar to those of CRPC, and is often mixed in clinical samples; the lack of appropriate, unique identifying markers therefore presents a significant challenge for accurate diagnosis and appropriate treatment5.

The immunohistochemical characterization of classical NEPC markers, including SYP, CHGA, NCAM1 and ENO2, can be employed to clinically characterize NEPC samples6. However, owing to the significant intra-tumor heterogeneity of NEPC, the applicability of IHC for these markers may be limited. Furthermore, the set of NEPC feature genes generated from previous studies is limited by its dependence on data from bulk tumors with small sample sizes, resulting in low specificity. The development of NEPC features with greater generalizability is a pressing necessity.

The advent of single-cell sequencing technology has enabled the analysis of gene expression and intra-tumor heterogeneity at the single-cell level, thereby providing crucial raw data for elucidating the characteristics and heterogeneity of NEPC7. Concurrently, machine learning algorithms furnish the requisite technical support for revealing significant biological information obscured within extensive datasets8. Consequently, this study initially constructed an NEPC single-cell atlas through the analysis of a substantial number of samples and identified 100 high-quality NE-specific feature genes by integrating bulk transcriptome data and multiple methodologies. A robust NEPC classifier, NEP100, was subsequently developed via machine learning algorithms, and the classifier’s ability to predict NEPC diagnosis, prognosis, and treatment responsiveness was subsequently validated via multiple external validation sets of bulk and single-cell transcriptomics data. Furthermore, the clarification of the four NEPC subtypes in relation to NEP100 and the function of the potential target AMIGO2 offers significant insights into the personalized treatment of NEPC patients (Fig. 1).

Fig. 1.


Flow diagram of this research. Image created with BioRender.com, with permission.

Methods

Collection and processing of single-cell transcriptomics data

To elucidate the tumor biological features and heterogeneity of NEPC, in silico sorting of prostate cancer was performed on nine publicly available datasets comprising 70 tumor samples9–16 (Table S1-1). These 70 single-cell transcriptomics samples were systematically curated from cohorts published in high-impact journals (e.g., Science, Cancer Cell) between 2020 and 2024. The cohorts were selected based on rigorous diagnostic criteria, including histopathological confirmation, clinical staging (e.g., CSPC vs. CRPC), and molecular subtyping (e.g., NEPC defined by Syn/CgA positivity and neuroendocrine marker overexpression). Cell quality thresholds were determined according to established single-cell transcriptomics protocols17 and thresholds used in previous publications10. Cells with fewer than 250 detected genes or fewer than 500 transcripts were excluded to remove empty droplets and damaged cells, and a mitochondrial content threshold of 20% was selected to exclude apoptotic cells. Mitochondrial genes (n = 50), heat shock protein (HSP) genes (n = 178), and ribosomal genes (n = 1,253) were excluded to reduce technical noise (Table S2-1): mitochondrial genes indicate apoptosis, HSPs reflect dissociation artifacts, and ribosomal genes dominate RNA content, masking biological signals. This aligns with liver cancer single-cell studies18, where similar filtering improved tumor heterogeneity resolution. A total of 196,309 cells passed stringent quality control in this study, and 12 cell types were identified by manual labelling (for the corresponding biomarkers, see Table S2-2), with NE cells accounting for 9% of all cells. The ‘Seurat’ package was employed to normalize the data, identify variably expressed genes and perform a principal component analysis (PCA). The results of these analyses were integrated, and batch effects were corrected. Subsequently, neighborhood graphs were calculated from the PCA results. 
Louvain clustering was performed at a resolution of 0.3, and uniform manifold approximation and projection (UMAP) was used for visualization. Finally, cell types were annotated to each cluster via known marker genes (Table S2-2).
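The cell-level quality control thresholds described above can be illustrated with a minimal sketch. This is not the authors' pipeline (which uses the ‘Seurat’ package in R); the `CellQC` record, the function names, and the choice to treat exactly 20% mitochondrial content as passing are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Hypothetical per-cell QC summary (not a Seurat object)."""
    barcode: str
    n_genes: int         # number of detected genes
    n_transcripts: int   # total transcript count
    pct_mito: float      # percentage of counts from mitochondrial genes


def passes_qc(cell, min_genes=250, min_transcripts=500, max_pct_mito=20.0):
    """Keep cells with at least 250 genes, at least 500 transcripts,
    and at most 20% mitochondrial content (boundary handling is assumed)."""
    return (
        cell.n_genes >= min_genes
        and cell.n_transcripts >= min_transcripts
        and cell.pct_mito <= max_pct_mito
    )


def filter_cells(cells, **thresholds):
    """Return only the cells that pass all QC thresholds."""
    return [c for c in cells if passes_qc(c, **thresholds)]
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the retained cell count is to each cutoff.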

The remaining five single-cell datasets19–21 (Table S1-3) were employed for validation, and the process of identifying NE subtypes was analogous to that described above.

To infer the differentiation trajectory of NE cells, we used monocle3 (V1.3.4; https://cole-trapnell-lab.github.io/monocle3/) to infer the pseudotime of each cell via the default parameters and the DDR-Tree method.

Collection and processing of spatial transcriptomics data

The spatial transcriptome data (GSE230282) were downloaded from GEO22 (Table S1-3). Data were normalized and corrected using the ‘Sctransform’ method23, which robustly addresses technical variance in single-cell data (e.g., sequencing depth) while preserving biological heterogeneity. This approach is particularly suited for NEPC datasets due to its ability to mitigate over-normalization in high-sparsity, high-ambient RNA contexts. Additionally, cellular gene expression patterns at spatial resolution were plotted via ‘SpatialDimPlot’.

Collection and processing of bulk transcriptomics

The dataset comprised publicly available studies encompassing human and xenograft prostate cancer gene expression data, pathological typing, and clinical outcome information. A total of 19 cohorts were selected based on disease relevance (mCRPC/NEPC histopathology or molecular subtyping), data quality (≥ 50 samples, transcriptomics, clinical annotations), technical consistency (Illumina platforms, standardized bioinformatics pipelines), and public availability (open-access raw data). Cohorts lacking a clear definition of tumor tissue type or with incomplete metadata were excluded (Tables S1-5). The SU2C-2019 (n = 266)24, WCM (n = 49)25, MDA (n = 88)26 and MCTP (n = 94)27 data were downloaded from the cBioPortal for Cancer Genomics (https://www.cbioportal.org). TCGA-PRAD (n = 497)28 and WCDT-MCRPC29 source data from the GDC were generated in July 2024 and January 2023, respectively, via ISB-CGC BigQuery tables. The PCaProfiler dataset (n = 1365)30 was obtained from www.PCaProfiler.com, whereas the CPGEA dataset (n = 134)31 was downloaded from https://github.com/nationstrong/CPGEA. The UWRA-CRPC dataset (n = 98)32, MSKCC dataset (n = 150)33, and other datasets were sourced from the indicated references. The datasets CamCap (n = 220)34, GSE54460 (n = 106)35, GSE197780 (n = 94)36, GSE32571 (n = 59)37, GSE134051 (n = 164)38, and GSE19959 and the datasets GSE6 (n = 112)39, GSE211856 (n = 46), GSE79021 (n = 403)40, and GSE84042 (n = 73)41 were downloaded from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/).

The diagnostic validation incorporated 6 independent cohorts (total n = 2,035) with neuroendocrine pathology data, while the prognostic validation included 9 independent cohorts (total n = 1,079) with clinical outcome records. Cohort characteristics are detailed in Table S5 and Supplementary data 1. The remaining data were analyzed to ascertain the correlation between clinical information.

The PCaProfiler cohort comprises data from multiple platform sources; therefore, the ‘ComBat’ function, as implemented in the ‘sva’ package, was employed to eliminate batch effects and facilitate downstream analyses. Integrated meta-cohort construction involved cohort-wise Z-score and quantile normalization, followed by ComBat batch correction (‘sva’ package) using dataset source as the batch variable. Biological variance associated with disease status was preserved in the model. Validation results are provided in Figure S3E.

Furthermore, gene expression data were gathered from nine cohorts of ten cellular experiments, which included a variety of treatments, to validate the differences in target expression42–49. Gene expression validation utilized prostate cancer cell lines (LNCaP, 22RV1, VCaP) treated with R1881 or enzalutamide; vehicle-treated cells served as controls. Differential gene expression was compared under experimental conditions that activate or suppress AR to validate the relevance of target genes to AR. Experimental designs and statistical outcomes are documented in Tables S1-6 and Fig. 8J.

Fig. 8.


The key NEP100 gene AMIGO2 has potential as a new NEPC marker. (A), Comparison of DEGs in VR-NE vs. EMT-NE (y-axis) with DEGs in VR_O vs. EMT_R (x-axis). NEP100 genes are indicated by filled circles; (B), Box plot showing the expression of AMIGO2 (Z score) among the three sample-level phenotypes (left panel). UMAP plot showing the expression of AMIGO2 in the four subtypes of NE cells (middle panel). Heatmap showing the spatial distribution of AMIGO2 in multiple regions (right panel). (C) Box plots showing the expression of AMIGO2 (normalization) between NEPC and ARPC in 7 bulk transcriptomic cohorts. (D) Heatmap showing the correlation of AMIGO2 with NE- and AR-related genes in 18 bulk transcriptomic cohorts. (E), Representative IHC staining of AR and AMIGO2 in tissues from patients with CSPC, CRPC or NEPC (scale bar in the left panel: 800 μm; scale bar in the right panel: 200 μm). (F), Expression patterns of AR and AMIGO2 among CSPC, CRPC, and NEPC. Shades of color indicate the intensity scores; (G), Heatmap showing the IHC scores of AR and AMIGO2 among different stages of PCa progression; (H), Scatter plot showing the relationships among the IHC scores of AR and AMIGO2; (I), K‒M survival curves for disease-free survival (DFS) between the high and low AMIGO2 expression groups; (J), Bar plot showing the changes in AMIGO2 expression after AR activation or inhibition in different datasets. The red bars indicate the upregulation of AMIGO2, and the red text indicates activated AR. (K), Bubble heatmap showing the correlation between sensitivity to drugs and the expression of AMIGO2. The dot size indicates the degree of statistical significance, which is colored on the basis of the correlation. Lollipop plot showing the predicted binding energy of the drug molecule to the AMIGO2 protein. 3D structure showing the key interactions and their respective types in the drug with AMIGO2. Cyan indicates hydrogen bonds; white indicates hydrophobic bonds; yellow indicates π-stacking; magenta indicates salt bridges; yellow spheres indicate aromatic ring centers; and magenta spheres represent metal ions.

Collection and verification of previous NE signatures

A total of 16 NE signatures were obtained from previous studies (Table S1-7). These include Vashchenko.200550, Lin.201451, Zhang.201552, Beltran.201625, Dasari.201753, Tsai.201754, Bluemn.201755, Henry.201856, Aggarwal.201857, Labrecque.201932, Alshalalfa.201958, Cheng.201959, Ostano.202060, Sarkar.202361, Zhang.202462, and Liu.202463. Some of these signatures contain NE-positive and NE-negative genes, which have been used for the following analyses: enrichment of previous common NE-associated genes at single-cell resolution; performance of previous signatures for pathological diagnosis of NEPC; and prediction of the prognosis of clinical outcomes in prostate cancer, including BCR and OS. Specifically, the NE risk score was calculated as the difference between the mean expression of NE-positive genes and that of NE-negative genes. The NE risk score served as a diagnostic classifier for NEPC (evaluated by the area under the receiver operating characteristic (ROC) curve, AUC) and as a prognostic risk stratifier (using an ROC curve-derived cutoff). Higher AUC values indicate better classification performance.
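As a minimal sketch of the scoring just described, the NE risk score and its AUC could be computed as follows. The function names and the dictionary-based expression format are assumptions for illustration; the AUC is computed with the rank-based (Mann-Whitney) formulation, which may differ from the implementation used in the study.

```python
from statistics import mean


def ne_risk_score(expr, positive_genes, negative_genes):
    """Mean expression of NE-positive genes minus mean of NE-negative genes.

    `expr` maps gene symbol -> normalized expression for one sample.
    Genes missing from `expr` are skipped (an assumption of this sketch).
    """
    pos = [expr[g] for g in positive_genes if g in expr]
    neg = [expr[g] for g in negative_genes if g in expr]
    pos_mean = mean(pos) if pos else 0.0
    neg_mean = mean(neg) if neg else 0.0
    return pos_mean - neg_mean


def auc(scores, labels):
    """Probability that a random positive sample outscores a random negative.

    `labels` uses 1 for NEPC and 0 for non-NEPC; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

For signatures that contain only NE-positive genes, passing an empty negative list reduces the score to the mean expression of the positive genes.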

Differential expression analysis

Differential gene expression was analyzed via several methods, for purposes including the identification of the NE signature and the characterization of markers of different NEPC subtypes. For the bulk transcriptomics count data, the ‘DESeq2’ package was used, whereas for the single-cell transcriptomics data, the ‘FindMarkers’ function in the ‘Seurat’ package was utilized. Differential expression thresholds were tailored to data type: bulk transcriptomics (adj.p < 0.05, |log2FC| > 1) accommodated lower resolution signals, whereas pseudobulk/single-cell transcriptomics (adj.p < 0.01, |log2FC| > 2) mitigated sparsity-driven false positives. Thresholds were empirically calibrated to retain 2,000–3,000 high-confidence differentially expressed genes (DEGs). In the case of single-cell transcriptomics, the genes upregulated in NE cells were identified through a comparison between the NE cell clusters and all other cell clusters, with a minimum difference of 20% in the percentage of expressing cells between the two groups. Furthermore, NE-downregulated genes were identified through a comparison of NE cell clusters with luminal cell clusters, with a minimum difference of 25% (Table S3). To identify DEGs in the NE subtypes, we employed Memento64, a tool developed by researchers at the University of California, San Francisco, who were among the first to utilize the method of moments statistical framework. The ‘limma’ package was used to analyze DEGs between VR-NE and EMT-NE tumors in integrated NEPC bulk normalized data.
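The data-type-specific thresholds stated above can be expressed as a small filter. This is only a sketch of the cutoff logic applied to already-computed statistics; the `THRESHOLDS` table and the function name are hypothetical and not part of ‘DESeq2’ or ‘Seurat’.

```python
# (maximum adjusted p value, minimum absolute log2 fold change) per data type,
# taken from the thresholds stated in the text.
THRESHOLDS = {
    "bulk": (0.05, 1.0),
    "single_cell": (0.01, 2.0),  # also used for pseudobulk data
}


def is_deg(adj_p, log2fc, data_type):
    """Return True if a gene passes the DEG cutoffs for the given data type."""
    max_adj_p, min_abs_log2fc = THRESHOLDS[data_type]
    return adj_p < max_adj_p and abs(log2fc) > min_abs_log2fc
```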

Co-expression network analysis

The WGCNA package65 was employed to generate co-expression mRNA networks for the PCaProfiler cohort. The optimal soft threshold (β) was calculated to create a scale-free network. The weighted adjacency matrix was subsequently transformed into a topological overlap matrix (TOM), and the corresponding dissimilarity (1-TOM) was generated. Module identification was conducted via the dynamic tree-cutting method. Ultimately, three modules highly correlated with NE were selected as NE-upregulated candidate genes, and those highly correlated with the luminal phenotype were selected as NE-downregulated candidate genes.

Gene set scoring and enrichment analysis

A variety of gene set scoring and enrichment techniques were employed in this study, including gene set enrichment analysis (GSEA), gene set variation analysis (GSVA), single-sample gene set enrichment analysis (ssGSEA), the AUCell algorithm, and the ‘AddModuleScore’ function of the ‘Seurat’ package66. In particular, the AUCell algorithm and the ‘AddModuleScore’ function were applied to the normalized expression values of the various gene sets to obtain a score for each signature. Furthermore, to functionally annotate the NEPC subtypes, we initially computed ssGSEA scores and Z scores on the normalized single-cell transcriptome data of the different subtypes. Genes with p < 0.05 and de_coef > 0.5 (de_coef > 1 for subtype 3) were subsequently identified as highly differentially expressed genes (HDEGs) for each subtype and further analyzed for enrichment via GSEA. Additionally, ssGSEA was employed for correlation analysis between pathway activity and the NEP100 or pseudotime score. The ssGSEA implementation from the ‘IOBR2’ package67 was utilized.

A function to quantify intra-tumoral heterogeneity

Shannon entropy can be used as a measure of the complexity of a system: the more complex the system and the more distinct states it can occupy, the greater the Shannon entropy. On the basis of Shannon entropy, we developed a function designated THEnt to quantify intra-tumoral heterogeneity at the transcriptional level. For bulk transcriptomics data with raw expression counts $x_i$ of $N$ genes:

Expression probability, forming a normalized transcriptome-wide distribution, is computed as:

$$p_i = \frac{x_i}{\sum_{j=1}^{N} x_j}$$

Tumor heterogeneity entropy is derived as:

$$\mathrm{THEnt} = -\sum_{i=1}^{N} p_i \log p_i$$

THEnt ranges from 0 (monoclonal dominance) to $\log N$ (maximal complexity). Higher values reflect expression distributed more evenly across genes, indicative of subclonal diversity.
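The entropy calculation described above can be sketched in a few lines. This is an illustrative reimplementation, not the authors' THEnt function; the name `thent` and the default logarithm base of 2 are assumptions, since the base is not recoverable from the text. Genes with zero counts contribute nothing, following the usual convention that 0 log 0 = 0.

```python
import math


def thent(counts, base=2.0):
    """Shannon entropy of the expression distribution of one sample.

    `counts` holds the raw expression counts of all genes in the sample.
    The logarithm base is an assumption of this sketch.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no expression counts")
    entropy = 0.0
    for x in counts:
        if x > 0:
            p = x / total  # expression probability of this gene
            entropy -= p * math.log(p, base)
    return entropy
```

A sample whose counts all fall on a single gene yields 0, while counts spread evenly over N genes yield the maximum value, the logarithm of N in the chosen base.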

Machine learning benchmark

A robust NE signature comprising 100 genes was derived through integration of: (i) differential expression in NEPC vs. ARPC and CRPC vs. CSPC, (ii) WGCNA module association with NE diagnosis, and (iii) single-cell differential expression in NE-transdifferentiated clusters and pseudobulk differential expression in NEPC (Fig. 3F). Details can be found in the Results section. To construct a robust model for predicting NEPC, six machine learning classifiers were benchmarked: random forests (RF), the cancer class, the AdaBoost classification tree (AdaBoost), boosted logistic regression (LogitBoost), the weighted k-nearest neighbor classifier (KKNN), and support vector machines with class weights (SVMRadialWeights). The SU2C-2019 cohort was used as the training set, and the PCaProfiler, WCDT-MCRPC, UWRA-CRPC, WCM, MDA and pseudobulk datasets were used as validation sets, on which model performance was assessed by AUC. The RF algorithm, which achieved the highest mean AUC, was deemed the best. Consequently, the importance of each feature in the RF model was extracted as a weight to fit an NEPC risk score, designated NEP100, as outlined below:

graphic file with name d33e596.gif
graphic file with name d33e601.gif
graphic file with name d33e606.gif
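The model selection step described above (choosing the classifier with the highest mean AUC across the validation cohorts) can be sketched as follows. The function name and the nested-dictionary input format are assumptions for illustration, and any AUC values passed in would come from the benchmark itself; none are reproduced here.

```python
from statistics import mean


def best_by_mean_auc(auc_table):
    """Pick the classifier with the highest mean AUC over validation cohorts.

    `auc_table` maps classifier name -> {cohort name: AUC}.
    Returns the winning name and the per-classifier mean AUCs.
    """
    if not auc_table:
        raise ValueError("no classifiers to compare")
    means = {
        name: mean(per_cohort.values())
        for name, per_cohort in auc_table.items()
    }
    best = max(means, key=means.get)
    return best, means
```

Averaging over cohorts rather than pooling samples gives each validation cohort equal weight regardless of its size, which is one reasonable reading of "mean AUC".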

Fig. 3.


Integration of bulk and single-cell transcriptomics screening for NE signatures. (A), Bar plot showing the sample composition of the PCaProfiler cohort; (B), definition of tumor heterogeneous entropy (THEnt) based on Shannon entropy; (C), box plots showing THEnt across sample types; (D), dot plot showing the distribution of PCa samples in the PCaProfiler cohort after principal component analysis (PCA), with a single dot being a sample. The bar plot shows the average gene expression of the normalized samples. (E), Heatmap showing the relationships of genetic modules with clinical traits according to WGCNA. Spearman correlation test; All heatmaps shown in this article are provided by the ‘ComplexHeatmap’ package (version 2.22.0, https://bioconductor.org/packages/ComplexHeatmap/). (F), Venn plot showing 5 methods to identify up-regulated NE signature in this study (NEPup sig); (G), Volcano plot showing differentially expressed genes (DEGs) in NEPC vs. ARPC. (H) Bar plot showing the results of pathway enrichment analysis of DEGs. (I), Violin plot showing NEPup sig across cell types; lines inside represent the mean ± SD. (J), Box plot showing the NE signature identified in this study (NEPup sig and NEPdown sig) across sample types. Unpaired two-sided Wilcoxon test; (K), Heatmap showing all NE signatures (both from the previous study and this study) across cell types. Bar plot highlighting differences between luminal and NE cells. (L), Heatmap showing the expression (Z score) of 90 upregulated and 10 downregulated NE feature genes identified in this study.

To validate the prognostic value of NEP100, NEP100 scores were calculated for each sample in the nine cohorts containing clinical outcome information. The optimal threshold was determined from the NEP100 scores of each cohort via the ‘surv_cutpoint’ function in the ‘survminer’ R package, and the samples were then divided into high- and low-NEP100 groups on the basis of this threshold. Furthermore, we used 100 combinations of 12 classical machine learning algorithms to benchmark the features included in NEP100: the elastic net (Enet), least absolute shrinkage and selection operator (Lasso), ridge, stepwise Cox (StepCox), CoxBoost, partial least squares regression for Cox (PLSRCox), supervised principal components (SuperPC), gradient boosting machine (GBM), survival support vector machine (survivalSVM), eXtreme gradient boosting (XGBoost), conditional inference forests (CForest), and classification trees (CTree). The combined algorithms were trained on WCDT-MCRPC, while the remaining SU2C-2019, MCTP, CamCap, CPGEA, GSE54460, MSKCC, TCGA, and GSE197780 datasets were used as validation sets. Performance was estimated by comparing the average concordance index (C-index).
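For readers unfamiliar with the C-index used to compare the models above, a minimal sketch of Harrell's concordance index is given below. It is a simplified illustration rather than the implementation used in the study: pairs with tied survival times are ignored, and the function name and argument layout are assumptions.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:       follow-up time per sample
    events:      1 if the event was observed, 0 if censored
    risk_scores: higher score means higher predicted risk

    A pair (i, j) is comparable when sample i had the event strictly
    before the follow-up time of sample j. Tied times are skipped,
    which is a simplification of this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.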

Meta-analysis of cox regression and survival analysis

Hazard ratios (HRs) with 95% confidence intervals (CIs), log-rank P values, and Kaplan‒Meier (KM) curves were calculated via the ‘survival’ package. The ‘ggsurvfit’ package was employed for visualizing the data. A meta-analysis was performed using the ‘MIME’ package68, which implements a random effects model with the DerSimonian-Laird estimator for heterogeneity adjustment. Effect sizes (HRs) were pooled via inverse-variance weighting. Heterogeneity was quantified using the I2 statistic, with sensitivity analyses conducted by excluding outlier studies when I2 > 50%. Publication bias was assessed via Egger’s test and funnel plot asymmetry. Studies were included if they (i) provided raw or processed transcriptomic data, (ii) reported the status of resistance to castration treatment, and (iii) had ≥ 20 samples to ensure statistical power. Exclusion criteria encompassed (i) non-human studies, (ii) case reports/series, and (iii) datasets with incomplete prognostic outcomes and follow-up time.
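The random effects pooling described above can be sketched with the standard DerSimonian-Laird formulas. This is an illustration of the general method, not the code path of the package used in the study; the function name, the input format (log hazard ratios with their standard errors), and the normal-approximation 95% CI are assumptions.

```python
import math


def dersimonian_laird(log_hrs, ses):
    """Pool log hazard ratios with a DerSimonian-Laird random effects model.

    log_hrs: per-study log(HR)
    ses:     per-study standard error of log(HR)
    Returns the pooled HR, its 95% CI, tau^2 and I^2 (in percent).
    """
    k = len(log_hrs)
    if k == 0 or k != len(ses):
        raise ValueError("need matching, non-empty inputs")
    w = [1.0 / se ** 2 for se in ses]                 # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_hrs)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_hrs))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]   # random effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_hrs)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "hr": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se_pooled),
        "ci_high": math.exp(pooled + 1.96 * se_pooled),
        "tau2": tau2,
        "i2": i2,
    }
```

When the studies agree closely, tau^2 collapses to zero and the result coincides with the fixed effect inverse-variance estimate.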

Inference of NEPC subtype master regulons

Candidate Master Transcription Factors (MTFs) for each NEPC subtype were identified using the Cancer Core Transcription Factor Specificity (CaCTS) framework69. Briefly, we selected TFs meeting both criteria: (i) Top 5% by expression rank within the subtype, (ii) Top 5% by Jensen–Shannon divergence (JSD)-based CaCTS score (measuring subtype-specific expression). This dual-threshold approach ensures MTFs are both highly expressed and subtype-specific. The JSD-based CaCTS score quantifies the cancer type-specific expression of genes by measuring the similarity between their expression profile and an idealized cancer-specific distribution. Initially, normalized expression values are shifted to eliminate zeros and transformed into unit vectors (divided by their Euclidean norm). For each gene i and cancer type j, the JSD is computed between the gene’s unit vector (representing its pan-cancer expression distribution) and a unit vector where only cancer type j has a value of 1 (idealized specificity). The CaCTS score is derived as log10(JSD), ensuring higher scores reflect stronger cancer type specificity (lower JSD indicates greater alignment with the idealized distribution). This metric prioritizes genes with both high expression and subtype-restricted activity, enabling robust identification of master transcriptional regulators.

Sample-level tumor subtyping analysis of NEPC

To analyze tumor subtypes at the sample level, all NEPC samples were selected, and the NE cell subtype composition of each sample was calculated via the ssGSEA algorithm. This was performed using the top 50 DEGs as a marker for each NE cell subtype (Table S10). Hierarchical clustering was subsequently conducted for unsupervised classification. Sample-level tumor subtypes were defined by comparing the percentages of the four subtypes in different clusters72. Additionally, the stability of sample-level tumor subtypes was verified by constructing a ternary plot using the marker genes of each subtype. The ternary plot was constructed according to the proportion of the marker genes of each subtype expressed by cells (greater than the average expression) and generated via the ‘vcd’ package.

Drug response prediction

The chemotherapeutic responses across the various NEPC phenotypes were imputed via the ‘oncoPredict’ package. Transcriptome profiling and drug screening data for the cell lines were obtained from the Cancer Cell Line Encyclopedia (CCLE) and the Cancer Therapeutics Response Portal (CTRP), respectively. These data were used as a training dataset to construct linear regression models for each drug. After the drug response models were fitted, they were applied to the expression matrices of the NEPC samples to derive drug sensitivity (IC50) predictions for each sample.

Furthermore, the online database GSCA (https://guolab.wchscu.cn/GSCA)70 was employed to establish a correlation between target genes and drug sensitivity, utilizing two drug sensitivity databases, namely, Genomics of Drug Sensitivity in Cancer (GDSC) and CTRP. A total of 12 drugs were identified as potential candidates because their sensitivity was positively correlated with the expression of the target genes, as determined by the intersection of the results from the two databases.

To ascertain the binding affinity and interaction pattern of the target genes with the drug candidates, molecular docking was employed as the verification technique. The drug candidates were extracted from the PubChem Compound Database (https://pubchem.ncbi.nlm.nih.gov). For the purposes of the docking analysis, all protein and molecular files were converted to PDBQT format, with the exclusion of all water molecules and the addition of polar hydrogen atoms. Subsequently, molecular docking simulations were performed using the protein‒ligand docking software AutoDock Vina 1.2.2 (https://autodock.scripps.edu)71 to extract the minimum binding energy for each docking. Binding energy represents the energy released during the formation of a stable ligand-receptor complex, with more negative values indicating stronger binding stability. To prioritize candidates, we ranked all ligands by ascending binding energy and selected the two compounds with the lowest binding energy values. The most favorable predicted binding sites were subsequently visualized via PyMOL (https://pymol.org).
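The candidate prioritization step described above (rank ligands by ascending binding energy and keep the two lowest) amounts to a simple sort. The sketch below assumes the docking energies have already been collected into a dictionary keyed by ligand name; the function name is hypothetical and it does not call AutoDock Vina.

```python
def top_binders(binding_energies, n=2):
    """Return the n ligands with the lowest (most negative) binding energy.

    `binding_energies` maps ligand name -> minimum binding energy (kcal/mol).
    """
    ranked = sorted(binding_energies.items(), key=lambda item: item[1])
    return ranked[:n]
```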

Collection of clinical PCa samples

Tumor paraffin-embedded specimens were collected from 10 patients with PCa between June 2023 and June 2024 with the permission of the Department of Pathology. The specimens were independently reviewed by two pathologists to confirm the diagnosis of PCa and to determine whether it was NEPC. NEPC diagnosis required (i) small cell morphology with AR/PSA loss, (ii) Syn/CgA positivity, and (iii) prior ADT exposure.

The collection of patient samples was conducted in accordance with ethical standards set forth in the Declaration of Helsinki, and written informed consent was obtained from all participants. The study was approved by the Ethics Committee of the Fourth Affiliated Hospital of Zhejiang University Medical College, Yiwu, Zhejiang, China.

Immunohistochemistry

The tissue slides were deparaffinized in xylene, rehydrated in graded ethanol, and incubated in citrate buffer (10 mM trisodium citrate, pH 6.0) at 95 °C for 20 min to retrieve the antigen. The sections were subsequently blocked for 30 min at room temperature with 10% rabbit serum, after which they were incubated with primary antibodies at 4 °C overnight. The antibodies used included AMIGO2 (1:100, catalog number: 821607, Zenbio) and AR (1:100, catalog number: R380686, Zenbio). The tissue sections were subsequently incubated with the secondary antibody (at 37 °C for 30 min), stained with DAB for 30 s, and finally counterstained with hematoxylin as a nuclear indicator. Images were acquired via Pannoramic Scan II. Protein expression was quantified using the IHC Profiler plug-in in ImageJ 1.57j software (NIH, USA). IHC images were first subjected to color deconvolution to isolate the 3,3’-diaminobenzidine (DAB) chromogen signal. The average gray value (representing staining intensity) and percentage of positively stained area were automatically calculated by the plug-in. These metrics were combined to classify protein expression into four categories: High Positive (3 +), Positive (2 +), Low Positive (1 +), and Negative (0), based on predefined thresholds within the IHC Profiler algorithm72. All analyses were performed on three non-overlapping fields per slide to ensure representativeness. Furthermore, while the IHC Profiler plug-in reduces subjectivity through automated scoring, this method assumes uniform staining and does not account for potential regional heterogeneity in tissue sections. Additionally, inter-observer variability was not assessed due to the fully automated workflow. Future studies would benefit from manual validation by pathologists and inter-laboratory reproducibility testing.

Statistical analysis

All statistical analyses and data visualizations were conducted in R (version 4.3.2) or Python (version 3.13). Spearman’s rank correlation coefficient was used to assess the correlation between continuous variables. Continuous variables that had been Z score-normalized were compared via the unpaired t test, whereas the remaining continuous variables were compared between groups via the Wilcoxon rank sum test. In all box plots, the lower and upper ends of the boxes indicate the 25th and 75th percentiles, respectively, and the centerline indicates the median. All statistical tests were two-sided, and a p value of less than 0.05 was considered statistically significant. All methods were performed in accordance with relevant guidelines and regulations. All heatmaps shown in this article were generated with the ‘ComplexHeatmap’ package (version 2.22.0, https://bioconductor.org/packages/ComplexHeatmap/).

Results

Identification of a robust NE signature

To accurately identify and establish a robust NE signature at single-cell resolution, 70 PCa single-cell samples were initially collected, comprising castration-sensitive prostate cancer (CSPC), metastatic CSPC (mCSPC), CRPC, mCRPC, and NEPC samples derived from nine publicly available cohorts. Following rigorous quality control, a single-cell atlas of human prostate cancer was constructed, comprising 196,309 cells, of which 9% were identified as NE cells (Fig. 2A). Additionally, 12 cell types were discerned via the corresponding dual markers (Fig. 2B). In parallel, we obtained 16 NE signatures from previous studies. However, scoring with the ‘AddModuleScore’ function of the ‘Seurat’ package revealed that these signatures lacked the capacity to identify NE cells at single-cell resolution (Fig. 2C; Figure S1A) and NEPC at bulk resolution (Fig. 2D; Figure S2B). To address the methodological limitations of any single scoring method, we further validated these findings using AUCell, an independent algorithm that quantifies gene set activity across cell clusters (Extended Fig. 1). Both methods consistently showed poor performance of the existing signatures. Furthermore, these signatures were not significantly correlated with the ISUP grade (Fig. 2D). Next, the 50 genes that recurred most often across the 16 signatures were selected (Figure S2A). Some of these genes presented low expression levels (< 20% in all NE cells) and lacked specificity for NEPC at bulk resolution (Fig. 2E).

Fig. 2.

Large-scale comprehensive single-cell atlas of human prostate cancer (PCa). (A), A comprehensive analysis of 196,309 cells from 70 PCa tissues; (B), Violin plot showing the expression levels of selected signature genes in PCa tissues; (C), Violin plot showing previous neuroendocrine (NE) signatures across cell types. The lines inside represent the means ± SDs; (D), Box plots showing previous NE signatures across sample types. Unpaired two-sided Wilcoxon test. Correlation plots showing the correlation of previous NE signatures with the ISUP. Spearman correlation test; (E), Bubble heatmap showing the expression of the top 50 previously common genes in PCa tissues. The dot size indicates the fraction of expressing cells, which are colored on the basis of average normalized expression levels.

The PCaProfiler cohort (n = 1365, NEPC = 34, Fig. 3A) was selected as the primary source of bulk transcriptomics data. To gain greater insight into the intra-tumoral heterogeneity of PCa at various stages of progression, we developed a function based on Shannon entropy for the quantification of tumor heterogeneity (Fig. 3B). The results demonstrated that the intra-tumoral heterogeneity of NEPC is markedly greater than that of other PCa subtypes (Fig. 3C). Accordingly, we devised a pipeline for the identification of 100 high-quality NE-specific feature genes. First, following batch-effect removal and normalization (Fig. 3D), the CRPC samples were analyzed via WGCNA with the parameters detailed in the methodology section. This allowed the selection of gene modules that were highly associated with NE or Luminal (cor > 0.25, with the exception of module 0, which was not classified into any clusters) (Fig. 3E; Figure S2B). Furthermore, the differentially expressed genes (DEGs) of CRPC vs. CSPC (Figure S2C) and NEPC vs. ARPC (Fig. 3G) in the PCaProfiler cohort were integrated. Moreover, our findings revealed that the DEGs in both NEPC and CRPC patients were significantly enriched in pathways associated with neuronal differentiation (Fig. 3H). We subsequently conducted a comprehensive screening for NE cell-specific markers at single-cell resolution, as well as in pseudobulk profiles, on the basis of the prostate cancer single-cell atlas that we previously established (Table S3). A total of 100 high-quality NE cell gene features were ultimately identified, of which 90 NE-specific upregulated genes were designated NEPup sig (Fig. 3F) and 10 downregulated genes were designated NEPdown sig (Figure S2D). We subsequently observed that these 100 features were capable of effectively distinguishing between NE cells and other cell types at the single-cell level (Fig. 3I; Figure S2E) and between NEPC and ARPC at the bulk sample level (Fig. 3J). Moreover, in comparison with the 16 previously published NE signatures, both the NEPup sig and the NEPdown sig exhibited more pronounced NE-specific differences in the module score (Fig. 3K). At the level of gene expression, all 100 features exhibited high specificity for NE cells, and all the NEPup sig genes were expressed in a relatively high percentage of cells compared with some of the genes in the previously published signatures (Fig. 3L). In conclusion, a multi-omics approach was employed to identify a robust NE signature that is more specific than previous signatures.
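The exact form of the Shannon entropy-based heterogeneity function mentioned above is not given in this section. As a minimal sketch, assuming heterogeneity is measured as the Shannon entropy of the proportions of categorical cell states within one tumor, the idea could be expressed as follows. The function name `shannon_heterogeneity` and the example labels are hypothetical and are not taken from the authors' code.

```python
import math
from collections import Counter


def shannon_heterogeneity(labels, base=2.0):
    """Shannon entropy H = -sum(p_i * log(p_i)) of the label proportions.

    `labels` is any iterable of categorical labels, for example the cell
    state assigned to each cell of one tumor (an assumption made for this
    sketch). Higher values mean a more even mixture of states.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log(p, base)
    return entropy


# Hypothetical tumors: one made of a single state, one evenly mixed.
homogeneous = ["luminal"] * 10
mixed = ["luminal", "NE", "basal", "club"] * 5

print(shannon_heterogeneity(homogeneous))
print(shannon_heterogeneity(mixed))
```

Under this definition a tumor made of a single state has zero entropy, and entropy grows as the states become more numerous and more evenly represented, which matches the qualitative reading that NEPC is more heterogeneous than other PCa subtypes.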

Establishment and validation of an NEPC classifier based on machine learning

To construct a powerful NEPC classifier, six classical machine learning algorithms were trained on the NEPup sig and NEPdown sig, with SU2C-2019 used as the training set and the remaining six NEPC-containing cohorts used as validation sets. The RF algorithm was identified as the optimal model on the basis of the mean AUC, which was employed as the evaluation criterion for model performance (Fig. 4A). Interestingly, the model based on THEnt also performed well in predicting NEPC (AUC = 0.931). The importance of each feature gene generated by the RF algorithm (Fig. 4B; Figure S3A; Table S4) was subsequently extracted, and a novel classifier, designated NEP100, was constructed as a weighted sum of the feature genes (Fig. 4C). Additionally, NEP100 was compared with the 16 previous NEPC signatures on the basis of the AUC. The results demonstrated that NEP100 outperformed the previous NEPC signatures (Fig. 4A).
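To make the two ingredients of this paragraph concrete, the sketch below scores samples with a weighted sum of feature genes and evaluates the scores with a rank-based AUC. It is an illustration under stated assumptions, not the authors' implementation: the gene names, the weights, and the use of negative weights for downregulated genes are all hypothetical, and the actual NEP100 weights are those reported in Table S4.

```python
def weighted_signature_score(expression, weights):
    """Weighted sum of expression values for one sample.

    expression: dict mapping gene -> normalized expression value.
    weights:    dict mapping gene -> weight (here hypothetical; in the
                paper the weights derive from RF feature importance).
    Genes absent from the sample contribute zero.
    """
    return sum(w * expression.get(gene, 0.0) for gene, w in weights.items())


def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative.

    Equivalent to the Mann-Whitney U statistic divided by the number of
    positive-negative pairs; tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical weights: two upregulated genes and one downregulated gene.
weights = {"GENE_UP1": 0.5, "GENE_UP2": 0.3, "GENE_DOWN1": -0.2}

# Hypothetical samples: label 1 = NEPC, label 0 = ARPC.
samples = [
    {"GENE_UP1": 4.0, "GENE_UP2": 3.0, "GENE_DOWN1": 1.0},
    {"GENE_UP1": 3.0, "GENE_UP2": 2.0, "GENE_DOWN1": 0.0},
    {"GENE_UP1": 1.0, "GENE_UP2": 1.0, "GENE_DOWN1": 4.0},
    {"GENE_UP1": 2.0, "GENE_UP2": 0.0, "GENE_DOWN1": 3.0},
]
labels = [1, 1, 0, 0]

scores = [weighted_signature_score(s, weights) for s in samples]
print(scores, roc_auc(scores, labels))
```

A plain weighted sum keeps the classifier transparent: once the weights are fixed, a new sample can be scored without access to the trained RF model.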

Fig. 4.

Establishment and validation of the NEPC classifier. (A), The area under the curve (AUC) of the 6 algorithms and 17 NE signatures in the 7 validation cohorts. The error bars denote the SDs. (B), Bar plot showing the importance of the 100 NE feature genes inferred via random forest (RF). Greater importance suggests greater contributions to the RF model when predicting NEPC diagnosis. (C), Definition of the NEPC classifier (NEP100) based on the RF algorithm. (D-F), Violin plots showing NEPup sig across cell types in external single-cell validation datasets. Lines inside represent the mean ± SD; box plot showing NE vs. luminal cells for NEP100. Unpaired two-sided Wilcoxon test; (G), Box plot showing the distribution of NEP100 among different NE features in PDX tumors (n = 112). Unpaired two-sided Wilcoxon test; (H), Box plot showing the distribution of NEP100 among different types of PCa progression in organoid tumors. Unpaired two-sided Wilcoxon test; (I), H&E staining and heatmaps of the spatial distributions of NEPup, NEPdown and NEP100 in multiple regions.

To validate the external performance of NEP100, we selected six additional datasets from a multi-omics perspective. These included three single-cell transcriptomics datasets containing NEPC samples: Natmed.He.2021, GSE206962, and GSE266955. In all three datasets, the module scores of the NEPup sig exhibited greater reliability in terms of NE specificity. Interestingly, the NEP100 score was significantly different between NE cells and luminal cells (Fig. 4D-F). At the level of bulk transcriptomics, the dataset GSE199596 comprises 112 PDXs (Fig. 4G), whereas GSE237602 contains 10 organoids (Fig. 4H). Our findings indicate a strong positive correlation between the NEP100 score and the progression of prostate cancer from an androgen-dependent state to a resistant state, ultimately leading to neuroendocrine spectrum transformation. Finally, at the spatial transcriptomics level, GSE230282, a dataset in which a pathologist identified coexisting NEPC and CSPC, was selected for analysis. The results demonstrated that the NEP100 score was effective in differentiating between NE cells and luminal cells (Fig. 4I). While NEP100 demonstrates superior robustness compared to existing signatures and machine learning models, several limitations warrant consideration: (i) cohort bias: despite the integration of multi-cohort data (n = 9), the underrepresentation of non-Western populations (e.g., Asian/African ancestry) may limit global applicability; (ii) overfitting risk: the RF algorithm underlying NEP100 might be prone to overfitting, in which a model performs well on the training data but poorly on unseen external validation data, potentially affecting the generalizability of NEP100; and (iii) biological heterogeneity: NEP100 performance varied across NEPC molecular subtypes, necessitating subtype-specific adaptation. Future studies will prioritize multi-ancestry validation and single-cell spatial validation to resolve tumor heterogeneity.

Predictive value of the NEP100 for prognosis and clinical indicators

To further investigate the potential prognostic value of NEP100 in prostate cancer, we assembled a dataset comprising three CRPC and seven CSPC cohorts, including 1079 samples with associated clinical outcome data. Notably, in the CRPC cohorts, we utilized overall survival (OS) as the clinical outcome, whereas in the CSPC cohorts, we employed biochemical recurrence (BCR) or disease-free survival (DFS)21. We calculated NEP100 scores for each sample individually. The optimal threshold was determined via the ‘survminer’ package on the basis of the NEP100 score of each cohort (Table S5). The samples were then divided into two groups, designated high and low NEP100, on the basis of the optimal threshold. A comparison with 100 combinations of machine learning algorithms (excluding the RF algorithm) revealed that the NEP100 score (NEP100: c-index = 0.607) and the NEP100 risk grouping (NEP100_group: c-index = 0.617) presented the highest C-index values (Fig. 5A). In addition, NEP100 had greater prognostic predictive strength than previously published prognostic models did (Fig. 5A; Table S6). Furthermore, a meta-analysis of Cox regression data was conducted, which demonstrated that NEP100 is a risk factor for poor prognosis in PCa patients across all cohorts (meta-HR (95% CI): 3.03 (2.38–3.85)) (Fig. 5B). All individual cohorts provided multivariable-adjusted HRs (covariates: age, Gleason grade, etc.), with results visualized in Extended Fig. 2. The consistency of adjusted HRs across cohorts further mitigates confounding concerns. A higher NEP100 was significantly correlated with shorter OS in patients with CRPC and shorter BCR or DFS in CSPC patients (Fig. 5C; Figure S3B).
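For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is a textbook formulation written for illustration, not the authors' code, and it makes one simplifying assumption: pairs with tied survival times are skipped. The function name and the toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    times:       follow-up time for each subject.
    events:      1 if the event (e.g. death or BCR) was observed, 0 if censored.
    risk_scores: higher score = higher predicted risk (e.g. an NEP100 score).

    A pair is comparable when the subject with the shorter time had an
    observed event. It is concordant when that subject also has the higher
    risk score; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of four patients; the third is censored at t = 15.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]

print(concordance_index(times, events, risk))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the values of roughly 0.61 to 0.62 reported for NEP100 indicate modest but above-chance discrimination.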

Fig. 5.

Prognostic validation of NEP100 in multiple human PCa cohorts. (A), C-indexes of the top 50 algorithmic combinations (excluding RF) and 17 NE signatures in the 9 validation cohorts. The error bars denote the SDs. (B), Meta-analysis of univariate Cox analysis results for the NEP100_group among different cohorts. (C), K‒M survival curves for overall survival (OS) or biochemical recurrence (BCR) among different cohorts. (D), Box plots showing the NEP100 between pre- and post-castration treatment in PCa patients. ENZ, enzalutamide; (E), Box plots showing NEP100 between different groups of Gleason scores (GSs) in PCa; (F), Heatmap showing the correlation of NEP100 with the activities of multiple signaling pathways in 18 bulk transcriptomic cohorts.

In the GSE197780, GSE211856 and GSE240056 cohorts, we observed a notable elevation in NEP100 scores in patients and mice receiving androgen deprivation therapy (both surgical and pharmacologic) compared with those not exposed to the therapy. These findings suggest that NEP100 may be associated with drug resistance (Fig. 5D). Furthermore, we confirmed that NEP100 scores are associated with higher Gleason scores (GSs) in the PCaProfiler, GSE32571, and GSE134051 cohorts (Fig. 5E).

Interestingly, our findings across 18 cohorts, including both CRPC and CSPC, revealed a positive correlation between NEP100 and several biomolecular characteristics associated with unfavorable prognostic outcomes. Specifically, these characteristics include cell proliferation, immunosuppression, genomic instability, and cell cycle progression. Detailed information regarding these pathways can be found in Fig. 5F, and the specific methods, including the use of the ssGSEA algorithm for scoring and the Spearman rank correlation coefficient for quantifying correlations, are described in the methodology section.
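The correlation step described above relies on Spearman's rank correlation, which is the Pearson correlation computed on ranks. The sketch below shows that calculation in plain Python, with tied values given their average rank. The helper names are hypothetical; in practice an established routine from R or a Python statistics library would normally be used.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx = sum(rx) / len(rx)
    my = sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Hypothetical per-sample values: a signature score and a pathway score.
signature = [0.1, 0.4, 0.5, 0.9]
pathway = [1.0, 8.0, 27.0, 64.0]

print(spearman_rho(signature, pathway))
```

Because only ranks enter the calculation, any monotonic relationship between a signature score and a pathway activity score yields a coefficient of 1 or -1, which makes the measure robust to the differing scales of the two scores.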

In conclusion, NEP100 demonstrates robust prognostic value for prostate cancer progression and survival outcomes across diverse cohorts (Fig. 5A-C). Its association with post-ADT progression (Fig. 5D) further suggests potential utility in identifying therapy-resistant trajectories, warranting prospective validation.

Novel classification of NEPC based on NEP100

In view of the considerable heterogeneity observed within NE cells and the objective of assessing the typing value of NEP100 for NEPC, hierarchical clustering analysis was conducted on the basis of the varying expression patterns of the NEP100 feature genes observed in all NE cells. This resulted in the formation of four subtypes75 (Fig. 6A). The aforementioned clusters were designated subtype 1 (g0), subtype 2 (g2, g3, g5, g6, g7), subtype 3 (g4), and subtype 4 (g1, g8) (Fig. 6B).

Fig. 6.

Single-cell heterogeneity landscape of the four subtypes of NE cells. (A), Heatmap showing the correlation of NEP100 sig expression between NE subtypes. (B), UMAP plot showing the four subtypes of NE cells. (C), GSVA enrichment analysis showing the activation status of biological pathways among the four subtypes. (D), Differential gene expression analysis showing up- and downregulated genes across all four subtypes. An adjusted p value < 0.05 is indicated in red. (E), UMAP plots showing the expression of marker genes of the four subtypes in NE cells. (F), Bubble heatmap showing the expression of three key biological pathway genes of the four subtypes in NE cells. The dot size indicates the fraction of expressing cells, which are colored on the basis of average normalized expression levels. (G), Bar plot showing the GO enrichment of specific biological processes, which is based on the highly differentially expressed genes (HDEGs) of three subtypes of tumor cells. (H), Top master TF regulators of each subtype inferred via CaCTS. The color and size of each point were correlated with the normalized values of the CaCTS score and TF expression, respectively. (I), Differential expression profiles of subtype markers, AR-regulated genes (AR panel), REST-repressed genes (NEURO I panel), NE-associated TFs (NEURO II panel), and mesenchymal differentiation genes (MES panel) among the four subtypes of NE cells. Red and blue indicate high and low expression, respectively. (J) Violin plots showing the expression patterns of each classic NE marker and NEPup sig among the four subtypes. The lines inside represent the means ± SDs; (K), Box plot showing the differences in NEP100 among the four subtypes.

To gain further insight into the four NE cell subtypes, a functional analysis was conducted on the four subclusters. GSVA revealed that subtype 1 exhibited elevated protein secretion scores, whereas subtypes 2 and 3 presented increased scores for the cell cycle and cell proliferation-related pathways, and subtype 4 presented a higher EMT score76 (Fig. 6C). To further increase the stability of these functional enrichment results, we proceeded to identify the HDEGs of the four subtypes77 (Fig. 6D; Table S7). The majority of the DEGs in subtype 1 were found to be related predominantly to vesicle release-related genes, including CHGB. The top HDEGs of subtypes 2 and 3 included genes that instruct proliferation, such as STMN1. The top HDEG of subtype 4 included VIM, which actively promotes tumor invasion and metastasis (Fig. 6E-F; Figure S3B). A GSEA based on the GO database was performed on these genes, and the results revealed similar gene function enrichment (Fig. 6G).

We employed CaCTS to identify the master regulators of each subtype and thereby ascertain its master regulons78 (Fig. 6H; Table S8). Furthermore, the expression of markers characteristic of the classical subtypes of small cell lung carcinoma was investigated79, including NEUROD1, POU2F3, and YAP1. Ultimately, subtype 1 was identified as VR_O because of its activation of vesicle release (VR) and ONECUT280. Both subtype 2 and subtype 3 exhibited many HDEGs associated with cell proliferation, together with high expression and transcriptional activity of NEUROD1 and POU2F3, respectively. Consequently, subtype 2 was designated Prol_N, and subtype 3 was designated Prol_P. Additionally, subtype 4, which showed high expression of the anti-neuroendocrine factor YAP181 and members of the inhibitor of differentiation family (ID1, ID3) and whose HDEGs are more focused on cell migration and EMT, was defined as EMT_Y. Thus, four regulators (ONECUT2, NEUROD1, POU2F3, and YAP1) and three main biological characteristics (VR, proliferation, and EMT) were identified as the most important for NEPC. Interestingly, our categorization was compared with that of previous PCa typing methods32, in which Prol_P was identified as an amphicrine phenotype (AR + ; NE +) of interest because of its expression of AR-associated genes and REST-repressed neuronal factors (NEURO I) but lack of expression of NE-associated transcription factors (NEURO II). Furthermore, EMT_Y exhibited characteristics of double-negative PCa (NE-, AR-); in general, these cells lack AR, NEURO I, and NEURO II panel genes and exhibit a mesenchymal phenotype. In addition, VR_O and Prol_N display features of small-cell NEPCs (AR-, NE +) (Fig. 6I).

To verify the general applicability of NEP100 in different NEPC subtypes, a comparative analysis of classical NEPC markers, including SYP, CHGA, NCAM1, and ENO2 6 (Fig. 6J), was subsequently conducted. The heterogeneity of classical NE markers underscores the limitations of relying solely on markers such as SYP and CHGA for NEPC diagnosis. Used as a subtype-spanning molecular profile, NEP100 may mitigate false-negative risks (Fig. 6J-K). Additionally, EMT_Y presented relatively low NEP100 levels, which correlated with its lower degree of NE differentiation81 (Fig. 6K).

Trajectories of NEPC subtypes and sample-level heterogeneity

To determine the value of NEP100 in the prediction of NEPC developmental trajectories, a pseudotime analysis of the four subtypes was conducted, and the heterogeneity entropy of each cell was calculated via the THEnt function (Fig. 7A). The results of the correlation analysis indicated a positive correlation between the NEP100 and THEnt scores and tumor progression (Fig. 7B). Moreover, a positive correlation was identified between the NEP100 and THEnt scores, indicating that the NEP100 may serve as a valuable tool for predicting NEPC progression and heterogeneity (Fig. 7B). Notably, a positive correlation was observed between genes associated with hypoxia and cell proliferation and pseudotime scores, whereas a negative correlation was evident with cell migration. In contrast, there was no significant correlation between VR and pseudotime scores (Fig. 7C; Figure S3D). These findings indicate that NEPC progression is marked by dynamic shifts in proliferative activation and EMT inhibition, with subtype-specific vesicle release phenotypes requiring further mechanistic validation. A comparison of the pseudotime scores of the four subtypes with PSA levels revealed that VR_O may occur early in the disease and be insidious due to low PSA levels (Fig. 7E-G; Table S9). Additionally, low THEnt values suggest internal compositional stability (Fig. 7D).

Fig. 7.

Trajectories of NEPC subtypes and sample-level heterogeneity. (A), UMAP plot showing the distribution of NEP100, THEnt and pseudotime analysis of NEPC subtypes inferred by Monocle3; (B), Scatter plot showing the relationships among NEP100, THEnt and pseudotime score; (C), Bar chart showing the relationships between pseudotime score and certain biological processes. Red and blue indicate positive and negative correlations, respectively. (D-E), Box plot showing the THEnt and pseudotime scores among the four subtypes. (F), Fan chart showing the percentages of the four subtypes in the NEPC sample. (G), Box plots showing the PSA levels among the four subtypes. (H), Heatmap of the single-sample gene set enrichment analysis (ssGSEA) scores of the four subtypes in the NEPC sample. Red indicates a greater proportion of certain subtypes. (I), Ternary plot in which each point is positioned according to the proportion of marker genes of the different subtypes that are expressed (greater than the average expression); the three vertices of the graph correspond to cells that express marker genes of only one subtype. Cells expressing the same number of marker genes of each subtype are located in the center of the plot. (J), Box plot showing the expression of marker genes (Z score) among the three sample-level phenotypes. (K), Bubble heatmap showing the comparison among the three sample-level phenotypes in terms of relative sensitivity to platinum and topoisomerase (TOP) inhibitors, DNA damage agents, antimetabolites, and inhibitors of BCL2, Aurora kinase (AURK), and PARP. The dot size indicates the proportion of samples with drug resistance, colored on the basis of the half maximal inhibitory concentration (IC50, Z score); (L), Box plots showing indicators related to immunotherapy among the three sample-level phenotypes.

To elucidate the composition of the four subtypes in the bulk NEPC samples, we first integrated NEPC samples from multiple cohorts and, after removing batch effects, applied the ssGSEA algorithm to calculate the NE cell subtype composition in each NEPC sample (Figure S3E). Notably, the majority of the samples exhibited a dominant mixed-NE phenotype (n = 26) or a low-specification phenotype (low-spec) (Fig. 7H). In contrast, the VR-NE phenotype was characterized as NEPC dominated by VR_O cells (n = 17). The remaining samples were classified as NEPC dominated by EMT_Y cells (EMT-NE; n = 13). The low-spec phenotype was excluded from further analysis because of its low specificity. To validate the reliability of this classification, we quantified the elevated expression of NE subtype markers in each NEPC sample (Fig. 7I). Furthermore, VR-NE presented high expression of CHGB and ONECUT2, whereas EMT-NE presented high expression of VIM and YAP1. These findings corroborate the single-cell classification of subtypes, thereby substantiating the consistency of NEPC classification at the single-cell and bulk sample levels (Fig. 7J).

Notably, drug sensitivity analysis revealed that VR-NEs were resistant to a range of chemotherapeutic agents, mainly topoisomerase (TOP) inhibitors and DNA damage agents (Fig. 7K; Table S11). Moreover, immune infiltration analysis revealed that the VR-NE phenotype was associated with a lower degree of immune cell infiltration and inflammatory response, as well as lower expression of PTPRC and PD-L1 (Fig. 7L; Figure S3F). These findings indicate that patients with the VR-NE phenotype may exhibit a worse immunotherapeutic response. In conclusion, the VR-NE phenotype, which is dominated by VR_O cells and can develop early in the development of NEPC, has poor drug responses and therefore requires a novel target.

The key NEP100 gene shows promise as a novel marker for NEPC

To identify novel markers of NEPC, particularly the VR-NE phenotype, we undertook a screening process at both the single-cell and bulk transcriptomics levels. This entailed intersecting the VR_O subtype-specific genes with the NEP100 signature genes that were highly differentially expressed in the VR-NE phenotype (Fig. 8A; Table S12). Among the top three of these genes was AMIGO2, which has been identified as a cytoplasmic membrane adhesion molecule that is important for cell adhesion and synapse assembly82,83. Accordingly, AMIGO2 was selected for assessment as a potential marker for NEPC (Fig. 8A). First, the marker specificity of AMIGO2 in the VR-NE phenotype was validated at the sample, single-cell and spatial transcriptomics levels (Fig. 8B). Moreover, in seven external cohorts, NEPC exhibited higher AMIGO2 expression than ARPC did (Fig. 8C). Interestingly, an analysis of 18 PCa datasets revealed a general positive correlation between AMIGO2 and NE-related genes, as well as a negative correlation with AR-related genes (Fig. 8D). To validate this phenomenon, 10 samples from patients undergoing radical prostatectomy were collected and assessed for AR and AMIGO2 expression in CSPC, CRPC, and NEPC via immunohistochemistry (IHC) (Fig. 8E). ImageJ software was then used to analyze the staining intensity and the range of positive staining, from which the IHC score was calculated (Fig. 8F). A comparison of PCa at the three different stages revealed that the CSPC group presented with AR + AMIGO2-, whereas the NEPC group presented with AR- AMIGO2 + (Fig. 8G). Spearman correlation analysis revealed a negative correlation between AR and AMIGO2 (Fig. 8H), whereas KM survival curves indicated that high AMIGO2 expression was associated with a poor prognosis (Fig. 8I). To further elucidate the relationship between AR-related genes and AMIGO2, a meta-dataset comprising ten cellular experiments was assembled. The results demonstrated that AMIGO2 expression was downregulated in cells treated with AR activators and upregulated in cells subjected to AR inhibition (Fig. 8J). The available evidence suggests that PCa may exhibit high expression of AMIGO2 following treatment with androgen deprivation therapy. These findings suggest that AMIGO2 may represent a promising target for the treatment of NEPC, particularly the VR-NE phenotype, with the potential to improve prognosis. Thus, a sensitivity analysis was conducted to identify potential drugs for AMIGO2. This analysis revealed 12 drugs whose sensitivities were positively correlated with AMIGO2 expression (Table S13). Subsequently, molecular docking was employed to model the binding of these drugs, with the objective of exploring their potential therapeutic value (Fig. 8K). The results demonstrated that PIK-93 and OSI-930 form visible hydrogen bonds, exhibit strong electrostatic interactions, and possess low binding energies with the AMIGO2 protein, indicating their potential therapeutic efficacy (Fig. 8K). While molecular docking offers predictive binding hypotheses, the lack of experimental validation (e.g., SPR, ITC) remains a limitation. Future studies will prioritize synthesis and biophysical validation of top candidates to confirm these computational predictions.

Discussion & conclusion

As a highly aggressive and heterogeneous type of PCa, NEPC lacks sensitive and specific markers for early diagnosis as well as therapeutic targets. Furthermore, the advent of single-cell sequencing technology has ushered in a new era of single-cell resolution in the analysis of tumor heterogeneity, whereas previous studies were generally based on bulk transcriptomics data25,32,5061. To this end, we combined multi-omics data, including both single-cell and bulk transcriptomics, and applied multiple machine learning algorithms to identify a high-quality NE signature and construct a novel NEPC classifier and classification.

In this comprehensive study, we first collected a total of 70 PCa single-cell samples from nine publicly available cohorts. Following rigorous quality control procedures, a single-cell atlas comprising 196,309 cells was created, of which 9% were identified as NE cells. Within this framework and in combination with bulk transcriptomics, a total of 100 high-quality NE-specific feature genes were identified and differentiated into NEPup sig and NEPdown sig. These signatures demonstrated superior performance in distinguishing NE cells from other cell types at both the single-cell and bulk transcriptomics levels, exceeding the capabilities of previously established NE signatures.

To develop an effective classifier for NEPC, six classical machine learning algorithms were employed. The RF algorithm, which demonstrated a high AUC score, was ultimately identified as the most effective model. The robustness of this RF-based model, designated NEP100, was rigorously assessed through cross-validation and external testing across six cohorts representing distinct disease states (Methods 3.9, Fig. 4). While this multi-cohort validation underscores generalizability, further prospective studies are needed to confirm clinical utility.

This study examined the prognostic significance of NEP100 in PCa and revealed that elevated NEP100 scores were associated with poorer clinical outcomes in CRPC and CSPC cohorts. Furthermore, the prognostic predictive power of the NEP100 was found to be stronger than that of published models (Results 4.3). Furthermore, NEP100 was found to be significantly correlated with responses to androgen deprivation therapy, Gleason scores, and multiple pathways associated with prostate cancer progression, including proliferation and the cell cycle73,74. These results indicate that NEP100 may serve as a valuable biomarker for identifying patients at high risk of disease progression and poor prognosis. While NEP100 correlates with post-ADT progression, direct mechanistic links to drug resistance require experimental validation in preclinical models.

Furthermore, the study identified four distinct subtypes through hierarchical cluster analysis of NEPC, characterized by varying expression patterns of the NEP100 feature genes in NE cell clusters. The subtypes were subsequently annotated via functional enrichment76 and the identification of master transcription factors (MTFs)78. These findings enabled the definition of the subtypes as VR_O, Prol_N, Prol_P, and EMT_Y, each exhibiting specific gene expression and biological features. While subtype classifications provide mechanistic insights into NEPC heterogeneity, prospective trials are needed to validate their prognostic and therapeutic relevance. Current analyses prioritize biological plausibility over direct clinical endpoints. Furthermore, this study investigated the developmental trajectory of these subtypes, with NEP100 being found to be positively correlated with the degree of neuroendocrine differentiation, disease progression, and heterogeneity. In contrast to the heterogeneity of classical NEPC markers, NEP100 was ubiquitously expressed in all NEPC subtypes. In sample-level NEPC, the VR-NE phenotype, which consists predominantly of VR_O cells, was observed to exhibit resistance to chemotherapy and an attenuated response to immunotherapy. This presents a significant challenge for the treatment of patients with this trait. While computational models suggest VR-NE phenotypes may resist chemotherapy (e.g., topoisomerase inhibitors) and respond poorly to immunotherapy, these hypotheses require experimental validation, as detailed in Results.

These findings underscore the need for targeted therapeutic strategies designed specifically for NEPC, especially the VR-NE phenotype. AMIGO2, a marker for VR_O, is highly expressed in the VR-NE phenotype. Previous research has linked AMIGO2 to the progression and metastasis of a range of tumors82,83. This study revealed a correlation between elevated AMIGO2 expression and resistance to androgen deprivation therapy in PCa, as well as a poorer prognosis. Prioritizing AMIGO2 and its associated pathways could facilitate the development of therapies that improve the prognosis of NEPC patients. However, the small cohort size (n = 10) limits the generalizability of the association between AMIGO2 and ADT resistance, and studies with expanded cohorts are warranted to validate these preliminary findings. Moreover, while AMIGO2 expression correlates with NEPC progression and ADT resistance, its functional role remains uncharacterized. Future studies integrating experimental models are needed to determine whether AMIGO2 is a driver or a passenger of therapeutic resistance.

In conclusion, this study elucidates the role of NEP100 and the potential target AMIGO2, thereby providing valuable insights for the personalized treatment of NEPC patients and offering promise for more effective treatment of this challenging disease.

Acknowledgements

We want to acknowledge the participants and investigators of all the datasets cited by the study.

Author contributions

Conceptualization, methodology, validation, writing—original, visualization, Junxiao Shen, Ruyi Jin, Luyuan Lu, Zujie Chen; conceptualization, methodology, Wei Guo, Shuwen Wang, Ziqiao Liu, Xuke Gong, Yiming Qi; supervision, project administration, Cheng Zhang; All authors have read and agreed to the published version of the manuscript.

Funding

This research received funding from the Zhejiang Provincial Natural Science Foundation of China (Grant No. LZ22H160008).

Data availability

The public datasets analysed during the current study are available from the TCGA Research Network portal, GEO (including GSE141445, GSE221603, GSE137829, GSE143791, GSE264573, GSE210358, GSE262624, GSE206962, GSE266955, GSE237602, GSE230282, GSE21034, GSE70770, GSE54460, GSE197780, GSE32571, GSE134051, GSE199596, GSE211856, GSE79021, GSE84042, GSE135879, GSE220097, GSE244024, GSE236441, GSE229805, GSE211638, GSE150807, GSE138939, GSE119598), cBioPortal, or the corresponding datasets. Details can be found in the methods and supplementary files. All data generated during the current study are included in the manuscript and supplementary files. All analysis scripts are available from the corresponding author on request, as are other data that support the findings of this study.

Declarations

All the authors have approved the manuscript and agree with its submission.

Competing interests

The authors declare no competing interests.

Ethics approval

Patient specimens were collected in accordance with ethical standards set forth in the Declaration of Helsinki, and written informed consent was obtained from all participants. The study was approved by the Ethics Committee of the Fourth Affiliated Hospital of Zhejiang University Medical College, Yiwu, Zhejiang, China.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-96683-3.

References

  • 1. Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68(6), 394–424 (2018).
  • 2. Aurilio, G. et al. Androgen receptor signaling pathway in prostate cancer: From genetics to clinical applications. Cells https://doi.org/10.3390/cells9122653 (2020).
  • 3. Desai, K., McManus, J. M. & Sharifi, N. Hormonal therapy for prostate cancer. Endocr. Rev. 42(3), 354–373 (2021).
  • 4. Liu, S., Alabi, B. R., Yin, Q. & Stoyanova, T. Molecular mechanisms underlying the development of neuroendocrine prostate cancer. Semin. Cancer Biol. 86(Pt 3), 57–68 (2022).
  • 5. Fei, X. et al. Promising therapy for neuroendocrine prostate cancer: Current status and future directions. Ther. Adv. Med. Oncol. 16, 17588359241269676 (2024).
  • 6. Wang, Y. et al. Molecular events in neuroendocrine prostate cancer development. Nat. Rev. Urol. 18(10), 581–596 (2021).
  • 7. Wang, J. et al. Advances and applications in single-cell and spatial genomics. Sci. China Life Sci. https://doi.org/10.1007/s11427-024-2770-x (2024).
  • 8. Shehab, M. et al. Machine learning in medical applications: A review of state-of-the-art methods. Comput. Biol. Med. 145, 105458 (2022).
  • 9. Tuong, Z. K. et al. Resolving the immune landscape of human prostate at a single-cell level in health and cancer. Cell Rep. 37(12), 110132 (2021).
  • 10. Chen, S. et al. Single-cell analysis reveals transcriptomic remodelings in distinct cell types that contribute to human prostate cancer progression. Nat. Cell Biol. 23(1), 87–98 (2021).
  • 11. Dong, B. et al. Single-cell analysis supports a luminal-neuroendocrine transdifferentiation in human prostate cancer. Commun. Biol. 3(1), 778 (2020).
  • 12. Kfoury, Y. et al. Human prostate cancer bone metastases have an actionable immunosuppressive microenvironment. Cancer Cell 39(11), 1464-1478.e1468 (2021).
  • 13. Wang, Z. et al. Single-cell transcriptional regulation and genetic evolution of neuroendocrine prostate cancer. iScience 25(7), 104576 (2022).
  • 14. Zaidi, S. et al. Single-cell analysis of treatment-resistant prostate cancer: Implications of cell state changes for cell surface antigen-targeted therapies. Proc. Natl. Acad. Sci. U S A 121(28), e2322203121 (2024).
  • 15. Chan, J. M. et al. Lineage plasticity in prostate cancer depends on JAK/STAT inflammatory signaling. Science 377(6611), 1180–1191 (2022).
  • 16. Cejas, P. et al. Subtype heterogeneity and epigenetic convergence in neuroendocrine prostate cancer. Nat. Commun. 12(1), 5775 (2021).
  • 17. Luecken, M. D. & Theis, F. J. Current best practices in single-cell RNA-seq analysis: A tutorial. Mol. Syst. Biol. 15(6), e8746 (2019).
  • 18. Xue, R. et al. Liver tumour immune microenvironment subtypes and neutrophil heterogeneity. Nature 612(7938), 141–147 (2022).
  • 19. He, M. X. et al. Transcriptional mediators of treatment resistance in lethal prostate cancer. Nat. Med. 27(3), 426–433 (2021).
  • 20. Chen, C. C. et al. Temporal evolution reveals bifurcated lineages in aggressive neuroendocrine small cell prostate cancer trans-differentiation. Cancer Cell 41(12), 2066-2082.e2069 (2023).
  • 21. Beshiri, M. L. et al. Stem cell dynamics and cellular heterogeneity across lineage subtypes of castrate-resistant prostate cancer. Stem Cells 42(6), 526–539 (2024).
  • 22. Watanabe, R. et al. Spatial gene expression analysis reveals characteristic gene expression patterns of de novo neuroendocrine prostate cancer coexisting with androgen receptor pathway prostate cancer. Int. J. Mol. Sci. https://doi.org/10.3390/ijms24108955 (2023).
  • 23. Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 20(1), 296 (2019).
  • 24. Abida, W. et al. Genomic correlates of clinical outcome in advanced prostate cancer. Proc. Natl. Acad. Sci. U S A 116(23), 11428–11436 (2019).
  • 25. Beltran, H. et al. Divergent clonal evolution of castration-resistant neuroendocrine prostate cancer. Nat. Med. 22(3), 298–305 (2016).
  • 26. Anselmino, N. et al. Integrative molecular analyses of the MD Anderson prostate cancer patient-derived xenograft (MDA PCa PDX) series. Clin. Cancer Res. 30(10), 2272–2285 (2024).
  • 27. Grasso, C. S. et al. The mutational landscape of lethal castration-resistant prostate cancer. Nature 487(7406), 239–243 (2012).
  • 28. The Molecular Taxonomy of Primary Prostate Cancer. Cell 163(4), 1011–1025 (2015).
  • 29. Quigley, D. A. et al. Genomic hallmarks and structural variation in metastatic prostate cancer. Cell 174(3), 758-769.e759 (2018).
  • 30. Bolis, M. et al. Dynamic prostate cancer transcriptome analysis delineates the trajectory to disease progression. Nat. Commun. 12(1), 7033 (2021).
  • 31. Li, J. et al. A genomic and epigenomic atlas of prostate cancer in Asian populations. Nature 580(7801), 93–99 (2020).
  • 32. Labrecque, M. P. et al. Molecular profiling stratifies diverse phenotypes of treatment-refractory metastatic castration-resistant prostate cancer. J. Clin. Invest. 129(10), 4492–4505 (2019).
  • 33. Taylor, B. S. et al. Integrative genomic profiling of human prostate cancer. Cancer Cell 18(1), 11–22 (2010).
  • 34. Ross-Adams, H. et al. Integration of copy number and transcriptomics provides risk stratification in prostate cancer: A discovery and validation cohort study. EBioMedicine 2(9), 1133–1144 (2015).
  • 35. Long, Q. et al. Global transcriptome analysis of formalin-fixed prostate cancer specimens identifies biomarkers of disease recurrence. Cancer Res. 74(12), 3228–3237 (2014).
  • 36. Linder, S. et al. Drug-induced epigenomic plasticity reprograms circadian rhythm regulation to drive prostate cancer toward androgen independence. Cancer Discov. 12(9), 2074–2097 (2022).
  • 37. Kuner, R. et al. The maternal embryonic leucine zipper kinase (MELK) is upregulated in high-grade prostate cancer. J. Mol. Med. (Berl.) 91(2), 237–248 (2013).
  • 38. Friedrich, M. et al. The role of lncRNAs TAPIR-1 and -2 as diagnostic markers and potential therapeutic targets in prostate cancer. Cancers (Basel) https://doi.org/10.3390/cancers12051122 (2020).
  • 39. Coleman, I. M. et al. Therapeutic implications for intrinsic phenotype classification of metastatic castration-resistant prostate cancer. Clin. Cancer Res. 28(14), 3127–3140 (2022).
  • 40. Sinnott, J. A. et al. Prognostic utility of a new mRNA expression signature of Gleason score. Clin. Cancer Res. 23(1), 81–87 (2017).
  • 41. Fraser, M. et al. Genomic hallmarks of localized, nonindolent prostate cancer. Nature 541(7637), 359–364 (2017).
  • 42. Nyquist, M. D. et al. Molecular determinants of response to high-dose androgen therapy in prostate cancer. JCI Insight https://doi.org/10.1172/jci.insight.129715 (2019).
  • 43. Suominen, M. I. et al. Enhanced antitumor efficacy of radium-223 and enzalutamide in the intratibial LNCaP prostate cancer model. Int. J. Mol. Sci. https://doi.org/10.3390/ijms24032189 (2023).
  • 44. Qian, C. et al. ONECUT2 activates diverse resistance drivers of androgen receptor-independent heterogeneity in prostate cancer. bioRxiv 2(3), 202 (2023).
  • 45. Ruoff, R. et al. MED19 encodes two unique protein isoforms that confer prostate cancer growth under low androgen through distinct gene expression programs. Sci. Rep. 13(1), 18227 (2023).
  • 46. White, R. E. 3rd. et al. Saracatinib synergizes with enzalutamide to downregulate AR activity in CRPC. Front. Oncol. 13, 1210487 (2023).
  • 47. Labaf, M. et al. Increased AR expression in castration-resistant prostate cancer rapidly induces AR signaling reprogramming with the collaboration of EZH2. Front. Oncol. 12, 1021845 (2022).
  • 48. Verma, S., Shankar, E., Chan, E. R. & Gupta, S. Metabolic reprogramming and predominance of solute carrier genes during acquired enzalutamide resistance in prostate cancer. Cells https://doi.org/10.3390/cells9122535 (2020).
  • 49. Chatterjee, P. et al. Supraphysiological androgens suppress prostate cancer growth through androgen receptor-mediated DNA damage. J. Clin. Invest. 129(10), 4245–4260 (2019).
  • 50. Vashchenko, N. & Abrahamsson, P. A. Neuroendocrine differentiation in prostate cancer: Implications for new treatment modalities. Eur. Urol. 47(2), 147–155 (2005).
  • 51. Lin, D. et al. High fidelity patient-derived xenografts for accelerating prostate cancer discovery and drug development. Cancer Res. 74(4), 1272–1283 (2014).
  • 52. Zhang, X. et al. SRRM4 expression and the loss of REST activity may promote the emergence of the neuroendocrine phenotype in castration-resistant prostate cancer. Clin. Cancer Res. 21(20), 4698–4708 (2015).
  • 53. Dasari, A. et al. Trends in the incidence, prevalence, and survival outcomes in patients with neuroendocrine tumors in the United States. JAMA Oncol. 3(10), 1335–1342 (2017).
  • 54. Tsai, H. K. et al. Gene expression signatures of neuroendocrine prostate cancer and primary small cell prostatic carcinoma. BMC Cancer 17(1), 759 (2017).
  • 55. Bluemn, E. G. et al. Androgen receptor pathway-independent prostate cancer is sustained through FGF signaling. Cancer Cell 32(4), 474-489.e476 (2017).
  • 56. Henry, G. H. et al. A cellular anatomy of the normal adult human prostate and prostatic urethra. Cell Rep. 25(12), 3530-3542.e3535 (2018).
  • 57. Aggarwal, R. et al. Clinical and genomic characterization of treatment-emergent small-cell neuroendocrine prostate cancer: A multi-institutional prospective study. J. Clin. Oncol. 36(24), 2492–2503 (2018).
  • 58. Alshalalfa, M. et al. Characterization of transcriptomic signature of primary prostate cancer analogous to prostatic small cell neuroendocrine carcinoma. Int. J. Cancer 145(12), 3453–3461 (2019).
  • 59. Cheng, S. & Yu, X. Bioinformatics analyses of publicly available NEPCa datasets. Am. J. Clin. Exp. Urol. 7(5), 327–340 (2019).
  • 60. Ostano, P. et al. Gene expression signature predictive of neuroendocrine transformation in prostate adenocarcinoma. Int. J. Mol. Sci. https://doi.org/10.3390/ijms21031078 (2020).
  • 61. De Sarkar, N. et al. Nucleosome patterns in circulating tumor DNA reveal transcriptional regulation of advanced prostate cancer phenotypes. Cancer Discov. 13(3), 632–653 (2023).
  • 62. Zhang, T. et al. Integrated analysis of single-cell and bulk transcriptomics develops a robust neuroendocrine cell-intrinsic signature to predict prostate cancer progression. Theranostics 14(3), 1065–1080 (2024).
  • 63. Liu, S. et al. CDHu40: A novel marker gene set of neuroendocrine prostate cancer. Brief. Bioinform. https://doi.org/10.1093/bib/bbae471 (2024).
  • 64. Kim, M. C. et al. Method of moments framework for differential expression analysis of single-cell RNA sequencing data. Cell 187(22), 6393-6410.e6316 (2024).
  • 65. Langfelder, P. & Horvath, S. WGCNA: An R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).
  • 66. Hao, Y. et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 42(2), 293–304 (2024).
  • 67. Zeng, D. et al. Enhancing immuno-oncology investigations through multidimensional decoding of tumor microenvironment with IOBR. Cell Rep. Methods https://doi.org/10.1016/j.crmeth.2024.100910 (2024).
  • 68. Liu, H. et al. Mime: A flexible machine-learning framework to construct and visualize models for clinical characteristics prediction and feature selection. Comput. Struct. Biotechnol. J. 23, 2798–2810 (2024).
  • 69. Reddy, J. et al. Predicting master transcription factors from pan-cancer expression data. Sci. Adv. https://doi.org/10.1126/sciadv.abf6123 (2021).
  • 70. Liu, C. J. et al. GSCA: An integrated platform for gene set cancer analysis at genomic, pharmacogenomic and immunogenomic levels. Brief. Bioinform. https://doi.org/10.1093/bib/bbac558 (2023).
  • 71. Eberhardt, J., Santos-Martins, D., Tillack, A. F. & Forli, S. AutoDock Vina 1.2.0: New docking methods, expanded force field, and python bindings. J. Chem. Inf. Model. 61(8), 3891–3898 (2021).
  • 72. Varghese, F., Bukhari, A. B., Malhotra, R. & De, A. IHC Profiler: An open source plugin for the quantitative evaluation and automated scoring of immunohistochemistry images of human tissue samples. PLoS ONE 9(5), e96801 (2014).
  • 73. Alumkal, J. J. et al. Transcriptional profiling identifies an androgen receptor activity-low, stemness program associated with enzalutamide resistance. Proc. Natl. Acad. Sci. U S A 117(22), 12315–12323 (2020).
  • 74. Kim, D. H. et al. BET bromodomain inhibition blocks an AR-repressed, E2F1-activated treatment-emergent neuroendocrine prostate cancer lineage plasticity program. Clin. Cancer Res. 27(17), 4923–4936 (2021).
  • 75. Gao, Z. J. et al. Single-cell analyses reveal evolution mimicry during the specification of breast cancer subtype. Theranostics 14(8), 3104–3126 (2024).
  • 76. Yang, J. et al. Single-cell transcriptomic landscape deciphers olfactory neuroblastoma subtypes and intratumoral heterogeneity. Nat. Cancer https://doi.org/10.1038/s43018-024-00855-5 (2024).
  • 77. Guo, D. Z. et al. Single-cell tumor heterogeneity landscape of hepatocellular carcinoma: Unraveling the pro-metastatic subtype and its interaction loop with fibroblasts. Mol. Cancer 23(1), 157 (2024).
  • 78. Wang, Z. et al. Molecular subtypes of neuroendocrine carcinomas: A cross-tissue classification framework based on five transcriptional regulators. Cancer Cell 42(6), 1106-1125.e1108 (2024).
  • 79. Baine, M. K. et al. SCLC subtypes defined by ASCL1, NEUROD1, POU2F3, and YAP1: A comprehensive immunohistochemical and histopathologic characterization. J. Thorac. Oncol. 15(12), 1823–1835 (2020).
  • 80. Guo, H. et al. ONECUT2 is a driver of neuroendocrine prostate cancer. Nat. Commun. 10(1), 278 (2019).
  • 81. Kawai, H., Matsuoka, R., Ito, T. & Matsubara, D. Molecular subtypes of high-grade neuroendocrine carcinoma (HGNEC): What is YAP1-positive HGNEC?. Front. Biosci. (Landmark Ed) 27(3), 108 (2022).
  • 82. Liu, Y. et al. In vivo selection of highly metastatic human ovarian cancer sublines reveals role for AMIGO2 in intra-peritoneal metastatic regulation. Cancer Lett. 503, 163–173 (2021).
  • 83. Yong, Y. et al. AMIGO2 characterizes cancer-associated fibroblasts in metastatic colon cancer and induces the release of paracrine active tumorigenic secretomes. J. Pathol. 265(1), 14–25 (2025).

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group