Diagnostics. 2025 Dec 26;16(1):85. doi: 10.3390/diagnostics16010085

Biomarker-Based Precision Prediction of Immunotherapy Response in Hepatocellular Carcinoma

Hsu-Wen Chao 1,2,3, Yi-Mei Joy Lin 4, Chen-Shiou Wu 5,*
Editors: Tudor Drugan, Daniel Leucuta
PMCID: PMC12786039  PMID: 41515582

Abstract

Background: Hepatocellular carcinoma (HCC) remains a major global health challenge with limited treatment options for advanced disease. Although immune checkpoint inhibitors (ICIs) have shown clinical benefits, response rates remain low, emphasizing the need for reliable biomarkers to guide patient selection. Given the critical role of metabolic reprogramming in immune modulation, this study aimed to identify a metabolic gene signature predictive of immunotherapy response in HCC. Methods: Three independent transcriptomic datasets (GSE279750, GSE215011, and GSE235863) comprising 35 ICI-treated HCC samples were integrated after quality control and ComBat batch correction. Differentially expressed genes were identified using DESeq2 and limma, followed by integration of the meta-analysis results. Machine learning models, including LASSO regression and random forest algorithms, were applied for feature selection, and a logistic regression model was developed for predictive scoring. Results: A five-gene metabolic signature (PLPPR1, CNTN3, HOXA10, HAGLR, and ENPP3) demonstrated good discriminative ability between responders and non-responders, with consistent performance observed across internal validation analyses. Functional enrichment analysis revealed significant involvement of metabolic pathways, with HOXA10 linked to immune evasion and CNTN3 associated with immune activation. Conclusions: This five-gene signature represents a biologically interpretable biomarker panel with potential utility for immunotherapy response stratification in HCC. The integrative analytical framework provides preliminary evidence supporting its value, warranting further validation in larger, independent clinical cohorts before clinical translation.

Keywords: hepatocellular carcinoma, immunotherapy response prediction, metabolic gene signature, biomarker discovery, machine learning

1. Introduction

Hepatocellular carcinoma (HCC) represents a major global health challenge with limited therapeutic options for advanced disease [1,2]. While immune checkpoint inhibitors have emerged as promising treatments, clinical outcomes remain highly variable, with response rates of only 15–30% in HCC patients [3,4]. This therapeutic heterogeneity underscores the critical need for robust predictive biomarkers to guide treatment selection. Current biomarker approaches, including PD-L1 expression and tumor mutational burden, demonstrate insufficient predictive accuracy in HCC due to its unique molecular complexity and diverse etiology [5,6,7]. The intricate tumor microenvironment, characterized by chronic inflammation and metabolic dysfunction, creates distinct challenges for immunotherapy efficacy prediction [2,8].

Emerging evidence highlights the pivotal role of metabolic reprogramming in determining immunotherapy response [9,10]. Specifically, dysregulated pathways such as glycolysis, fatty acid oxidation, and amino acid metabolism have been implicated in modulating T-cell exhaustion and immune evasion in HCC [11,12]. These metabolic dependencies represent an underexplored dimension for biomarker development. Advanced computational approaches, particularly machine learning techniques combining Least Absolute Shrinkage and Selection Operator (LASSO) regression and random forest algorithms, offer powerful methodologies for identifying clinically relevant gene signatures from high-dimensional transcriptomic data [13,14]. Such approaches can handle complex genomic datasets while reducing the risk of overfitting. To address this unmet need, we integrated transcriptomic profiles from three independent cohorts of HCC patients receiving immune checkpoint inhibitor therapy and applied machine learning models to identify a metabolic gene signature predictive of immunotherapy response. Through systematic application of meta-analysis, LASSO regularization, and random forest feature selection, we identified a five-gene metabolic signature (PLPPR1, CNTN3, HOXA10, HAGLR, ENPP3) that distinguished treatment responders from non-responders within the integrated cohort. This integrative approach may facilitate patient stratification and guide precision immunotherapy in clinical practice.

2. Materials and Methods

2.1. Data Acquisition and Preprocessing

The three Gene Expression Omnibus (GEO) datasets [15] were selected to represent independent and clinically relevant cohorts of HCC patients treated with immune checkpoint inhibitors. GSE279750 (n = 10) includes patients receiving first-line anti–PD-L1–based combination immunotherapy, with surgical tumor specimens collected after more than three months of treatment and classified as responders or non-responders according to modified RECIST criteria [5]. GSE215011 (n = 10) comprises tumor RNA-seq data from patients treated with nivolumab monotherapy (anti–PD-1), enabling comparison of transcriptional profiles between responders and non-responders [16]. GSE235863 (n = 15) represents HBV-positive HCC patients receiving anti–PD-1 plus lenvatinib combination therapy (pembrolizumab or sintilimab) and includes paired samples collected before or after treatment initiation; responders were defined as patients achieving complete or partial response (CR/PR), whereas non-responders were defined as those with stable or progressive disease (SD/PD) [17]. All datasets were generated using next-generation sequencing–based transcriptomic profiling and provided clinical response annotations. Across the three datasets, a total of 35 patients were included, comprising 22 responders and 13 non-responders based on the original clinical annotations. For consistency, only tumor-derived gene expression data with available response information were included in the integrative analysis. Raw count data underwent quality control, excluding samples with low sequencing depth (<1 million reads) or excessive missing values (>20%). Gene expression values were log2-transformed after adding a pseudocount of 1. Principal component analysis (PCA) was performed using the prcomp() function in R (https://www.R-project.org/) with center = TRUE and scale = TRUE parameters to visualize sample distribution and assess separation between response groups.
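
The quality-control filter and log2 transformation described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the function names are hypothetical, and representing missing values as None is an assumption made only for illustration.

```python
import math

def passes_qc(counts, min_depth=1_000_000, max_missing_frac=0.20):
    """Return True if a sample meets the depth and missing-value criteria.

    counts: raw read counts per gene; None marks a missing value (assumption).
    """
    n_missing = sum(1 for c in counts if c is None)
    depth = sum(c for c in counts if c is not None)
    # Exclude samples with < 1 million reads or > 20% missing values.
    return depth >= min_depth and n_missing / len(counts) <= max_missing_frac

def log2_pseudocount(counts):
    """Apply log2(count + 1); missing values are passed through unchanged."""
    return [None if c is None else math.log2(c + 1) for c in counts]
```

The pseudocount keeps zero counts finite: a count of 0 maps to 0 and a count of 3 maps to 2.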

2.2. Batch Effect Correction and Data Integration

The ComBat algorithm from the sva package (version 3.50.0) was applied for batch effect correction. The ComBat() function implemented parametric empirical Bayes adjustments, treating dataset origin as the batch variable and treatment response as the protected biological variable. This approach removed technical variation while maintaining biological differences between groups. Post-correction quality was confirmed through integrated principal component analysis. The ComBat-corrected expression matrix served as the foundation for subsequent analyses.

2.3. Differential Expression Analysis

Individual dataset analyses were conducted using DESeq2 (version 1.42.0) with default parameters. Genes with an absolute log2 fold-change greater than 0.5 and a false discovery rate (FDR) less than 0.1 were considered differentially expressed [18]. The top differentially expressed genes from each dataset were visualized using pheatmap (version 1.0.12) with hierarchical clustering and Z-score normalization. Integrated analysis of the ComBat-corrected data used limma (version 3.58.1) to identify high-confidence differentially expressed genes meeting the same thresholds (|log2FC| > 0.5, FDR < 0.1). Volcano plots were generated using EnhancedVolcano (version 1.20.0).
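
As a minimal sketch of how the stated thresholds classify genes, the Python functions below label each gene as up, down, or non-significant. The function names and the input layout (a dictionary of log2 fold-change and FDR pairs) are illustrative assumptions; in the study these statistics come from DESeq2 and limma.

```python
def classify_gene(log2fc, fdr, lfc_cutoff=0.5, fdr_cutoff=0.1):
    """Return 'up', 'down', or 'ns' using |log2FC| > 0.5 and FDR < 0.1."""
    if fdr < fdr_cutoff and abs(log2fc) > lfc_cutoff:
        return "up" if log2fc > 0 else "down"
    return "ns"

def summarize(results):
    """Count up/down/ns genes from a {gene: (log2fc, fdr)} mapping."""
    counts = {"up": 0, "down": 0, "ns": 0}
    for log2fc, fdr in results.values():
        counts[classify_gene(log2fc, fdr)] += 1
    return counts
```

Both conditions must hold: a gene with a large fold-change but FDR of 0.2, or a very small FDR but a fold-change of 0.3, is reported as non-significant.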

2.4. Meta-Analysis Integration

MetaVolcano (version 1.16.0) combined statistical evidence from three independent analyses through permutation testing (n_permutations = 10,000). Genes achieving significance in at least two of three datasets were considered high-confidence candidates. Correlation analysis used Pearson coefficients with hierarchical clustering to identify co-expression modules. Venn diagram analysis [19] was performed to identify dataset-specific and shared differentially expressed genes.
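
The "at least two of three datasets" criterion amounts to a vote count over per-dataset DEG lists. The Python sketch below shows only that counting step under simplified assumptions; it does not reproduce MetaVolcano or its permutation testing, and the function name is hypothetical.

```python
from collections import Counter

def recurrent_genes(deg_sets, min_datasets=2):
    """Return genes that are significant in at least `min_datasets` datasets.

    deg_sets: one collection of significant gene names per dataset.
    """
    # set() guards against a gene being listed twice within one dataset.
    votes = Counter(gene for degs in deg_sets for gene in set(degs))
    return sorted(gene for gene, n in votes.items() if n >= min_datasets)
```

Setting min_datasets to the number of datasets returns the strict intersection, which corresponds to the central region of a Venn diagram.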

2.5. Machine Learning-Based Feature Selection

LASSO logistic regression [20] was performed using the glmnet package (version 4.1-8) with an L1 penalty (α = 1). The regularization parameter λ was selected by 10-fold cross-validation using cv.glmnet, and the value minimizing cross-validated deviance (λmin) was applied. Genes with non-zero coefficients were retained as candidate features. Random forest analysis was conducted using the randomForest package (version 4.7-1.1) [21] as an independent feature-ranking approach, with default parameters including 500 trees (ntree = 500) and internally optimized mtry values. Feature importance was quantified using the mean decrease in Gini index. LASSO-selected genes were further ranked by random forest importance, and genes with the highest combined scores were selected for signature development. The combined score was defined as the product of the absolute LASSO coefficient and the random forest importance (mean decrease in Gini). Model performance was evaluated using ROC analysis with predefined train/test splits and repeated stratified cross-validation. To minimize overfitting and information leakage, feature selection and model coefficients were fixed before validation, and only predefined gene signatures were evaluated during resampling.
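
The combined score defined above is a simple product and can be sketched as follows. This Python example assumes the LASSO coefficients and random forest importances have already been computed elsewhere (for example by glmnet and randomForest); the function names are hypothetical and the example values in the usage note are toy numbers, not results from the study.

```python
def combined_scores(lasso_coef, rf_importance):
    """Combined score = |LASSO coefficient| x RF importance (mean decrease in Gini).

    Genes with a zero LASSO coefficient were not retained and are skipped.
    """
    return {
        gene: abs(beta) * rf_importance[gene]
        for gene, beta in lasso_coef.items()
        if beta != 0 and gene in rf_importance
    }

def top_genes(scores, k=5):
    """Return the k genes with the highest combined score (ties broken by name)."""
    return sorted(scores, key=lambda gene: (-scores[gene], gene))[:k]
```

For instance, with toy inputs {"g1": -0.5, "g2": 0.25, "g3": 0.0} and importances {"g1": 2.0, "g2": 8.0, "g3": 10.0}, g3 is dropped despite its high importance because LASSO shrank it to zero, and g2 ranks above g1.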

2.6. Signature Score Development and Validation

A logistic regression model was constructed using the glm() function with family = binomial(link = "logit"). Regression coefficients were estimated by maximum likelihood to model discrimination between responders and non-responders. The multi-gene signature score was calculated as the weighted sum of normalized expression values using regression coefficients (βi) derived from the logistic regression model:

Signature Score = Σ βi × Xi

where βi represents the regression coefficient for gene i, and Xi denotes its normalized expression level. Model performance was evaluated through receiver operating characteristic (ROC) curve analysis using pROC (version 1.18.5) [22], calculating area under the curve (AUC) values for the signature score and individual genes. Bootstrap validation with 1000 iterations assessed model stability and generated 95% confidence intervals [23]. To further mitigate potential bias from a single data split, additional internal validation was conducted using repeated stratified cross-validation (5-fold cross-validation repeated 100 times). Importantly, feature selection and coefficient estimation were fixed prior to cross-validation, and only the predefined Signature Score was evaluated during resampling, thereby minimizing the risk of information leakage. Differences in Signature Score distributions between responders and non-responders were assessed using the Wilcoxon rank-sum test.
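
The ROC and bootstrap steps can be sketched in Python as below. This is an illustrative stand-in for the pROC-based analysis, not a reproduction of it: the AUC is computed from its rank-based definition, the interval is a simple percentile bootstrap, labels are assumed to be coded 1 (responder) and 0 (non-responder), and the function names are hypothetical.

```python
import random

def auc(scores, labels):
    """Rank-based AUC: probability that a responder outscores a non-responder."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    # Count pairwise wins; a tie counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    if len(set(labels)) < 2:
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        aucs.append(auc([scores[i] for i in idx], ys))
    aucs.sort()
    k = int(n_boot * alpha / 2)
    return aucs[k], aucs[n_boot - k - 1]
```

One property worth noting: when the scores separate the two classes completely, every resample is also completely separated, so the percentile interval collapses to a single point regardless of sample size.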

2.7. Pathway Enrichment Analysis

KEGG pathway enrichment analysis was performed using clusterProfiler (version 4.10.0) [24]. The enrichKEGG() function identified significantly enriched pathways (adjusted p-value less than 0.05) among the signature genes.

2.8. Statistical Analysis

All analyses were performed in R (version 4.5.0) (https://www.R-project.org/). Continuous variables were compared using the Wilcoxon rank-sum test. Multiple testing correction applied the Benjamini–Hochberg method [25] to control the FDR. All tests were two-sided, with a p-value less than 0.05 considered statistically significant. Data visualization used ggplot2 (version 3.5.0) [26].
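
For readers unfamiliar with the Benjamini–Hochberg adjustment, a minimal Python sketch of the standard step-up procedure is given below. The function name is hypothetical; in R the same adjustment is available through p.adjust with method = "BH".

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene is then called significant when its adjusted value falls below the chosen FDR threshold.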

3. Results

3.1. Data Integration Reveals Consistent Gene Expression Patterns Across Cohorts

To evaluate gene expression patterns and identify predictive biomarkers associated with immunotherapy response, we integrated gene expression data from three independent GEO cohorts comprising 35 patients with HCC treated with immune checkpoint inhibitors (GSE279750, n = 10; GSE215011, n = 10; GSE235863, n = 15). Across the integrated cohort, 22 patients were classified as responders and 13 as non-responders based on the original clinical annotations. These cohorts encompassed distinct but clinically relevant immunotherapy settings, including anti–PD-L1–based and anti–PD-1–based regimens, administered as monotherapy or in combination with lenvatinib, with tumor samples collected at different clinical time points. The integration workflow encompassed data acquisition, quality control, batch effect correction, differential expression analysis, meta-analysis integration, machine learning-based feature selection, signature score development, and biological pathway enrichment analysis (Figure 1). Principal component analysis (PCA) was performed on log2-transformed expression data to assess sample distribution and batch effects. Before correction, samples from the three datasets exhibited distinct clustering patterns, indicating significant batch effects (Figure 2A–C). PCA plots showed clear separation between Responder (blue) and Non-Responder (red) samples within each dataset. After applying ComBat batch effect correction, the integrated PCA plot demonstrated effective removal of technical variation while maintaining biological signal integrity (Figure 2D). The first two principal components explained 18.7% and 10.5% of total variance, respectively. Samples from different datasets (represented by circles, triangles, and squares) were homogeneously distributed, indicating successful data integration. Importantly, batch correction preserved the biological separation between response groups, supporting successful multi-cohort harmonization while retaining signals essential for robust biomarker identification.

Figure 1.

Integrated workflow for biomarker discovery and validation in immunotherapy response prediction. This study integrated three GEO datasets (GSE279750, GSE215011, GSE235863) to identify predictive biomarkers for immunotherapy response. The workflow comprised: (1) data acquisition and quality control; (2) batch effect correction and normalization; (3) differential expression analysis; (4) meta-analysis integration; (5) feature selection using LASSO and Random Forest; (6) signature score development and ROC validation; (7) model stability assessment via cross-validation, bootstrap, and external validation; (8) KEGG pathway enrichment analysis.

Figure 2.

Principal component analysis of three datasets with batch effect correction. (A–C) PCA plots of log2-transformed expression data showing distribution of Responder (blue) and Non-Responder (red) samples across GSE279750 (n = 10), GSE215011 (n = 10), and GSE235863 (n = 15). Dashed ellipses represent 68% confidence intervals. (D) Integrated PCA of all 35 samples after ComBat batch correction, with different shapes indicating dataset origin (circle/triangle/square). Batch effects were effectively removed while maintaining clear separation between response groups (PC1: 18.7%, PC2: 10.5%).

3.2. Differential Expression Analysis Identifies Response-Related Genes with Consistent Expression Patterns

We performed differential gene expression analysis independently on each dataset using DESeq2, identifying the top 20 differentially expressed genes (DEGs) in each cohort (Figure 3A–C). Heatmap visualizations revealed clear and consistent expression patterns between Responder and Non-Responder groups across all three datasets. In GSE279750, hierarchical clustering successfully separated samples by response status (Figure 3A); GSE215011 demonstrated similarly robust differential expression patterns (Figure 3B); while the largest cohort, GSE235863 (n = 15), further validated the consistency of these expression profiles (Figure 3C). Concordant results across three independent datasets strengthen DEG credibility, demonstrating biologically meaningful and reproducible expression patterns. Integrated analysis on ComBat-corrected data using limma identified 146 high-confidence DEGs (33 upregulated and 113 downregulated) meeting stringent thresholds (|log2FC| > 0.5, FDR < 0.1), as shown in the volcano plot (Figure 3D). These DEGs with consistent cross-cohort patterns provide a robust foundation for subsequent meta-analysis and machine learning-based biomarker identification.

Figure 3.

Differential gene expression analysis reveals distinct molecular signatures between treatment responders and non-responders. Comprehensive visualization of differentially expressed genes (DEGs) across three independent datasets and integrated analysis. (A) GSE279750 (n = 10) heatmap displays the top 20 DEGs with clear separation between Responder (green annotation) and Non-Responder (blue annotation) groups. Green and purple colors represent relative gene expression levels (Z-score normalized), with hierarchical clustering successfully distinguishing samples by treatment response status. (B) GSE215011 (n = 10) validates consistent differential expression patterns using its top 20 DEGs, where red indicates upregulation and blue indicates downregulation in responders versus non-responders. (C) GSE235863 (n = 15), the largest individual cohort, further substantiates reproducibility with its top 20 DEGs displayed in brown and teal colors. (D) Volcano plot of integrated analysis from ComBat-corrected data (n = 35 total) identifies 146 high-confidence DEGs (33 upregulated in red, 113 downregulated in blue) meeting stringent criteria (|log2FC| > 0.5, FDR < 0.1, indicated by dashed lines). Gray dots represent non-significant genes (NS).

3.3. Meta-Analysis Reveals Consistent High-Confidence Core Genes Across Datasets

To identify genes with robust cross-dataset reproducibility, we employed MetaVolcano meta-analysis integrating statistical evidence from three independent cohorts. The MetaVolcano plot (Figure 4A) displays average log2 fold-change versus cross-dataset significance, identifying 15 genes achieving statistical significance in at least two of three datasets (red dots). Cross-dataset log2FC heatmaps (Figure 4B) of these 15 significant genes validated directional consistency across GSE215011, GSE235863, and GSE279750, with most genes maintaining uniform expression patterns. Venn diagram analysis (Figure 4C) revealed dataset-specific contributions across the three cohorts. GSE279750 identified the largest total number of DEGs (n = 225), with 212 unique DEGs specific to this dataset. GSE235863 identified 162 DEGs, and GSE215011 identified 37 DEGs. Notably, only one core gene (LINC01554) achieved statistical significance across all three datasets, highlighting the importance of meta-analysis for identifying reproducible markers. The correlation heatmap (Figure 4D) revealed co-expression patterns among the 15 significant genes, with hierarchical clustering identifying distinct gene modules suggesting coordinated regulatory mechanisms. Collectively, these results establish a robust set of cross-validated candidate genes, providing a reliable foundation for subsequent machine learning-based feature selection and predictive model development.

Figure 4.

Meta-analysis identifies robust cross-dataset differentially expressed genes and their correlation patterns. Comprehensive meta-analysis integrating differential gene expression results from three independent datasets using the MetaVolcano approach. (A) MetaVolcano plot displays genes based on average log2 fold change (x-axis) and cross-dataset significance (y-axis). Red dots (n = 15) indicate genes achieving statistical significance in at least two of three datasets, while gray dots represent non-significant genes. (B) Heatmap visualization of these 15 significant genes showing log2FC values across GSE215011, GSE235863, and GSE279750. Red colors indicate upregulation and blue colors indicate downregulation, with most genes showing concordant expression patterns across cohorts. (C) Venn diagram illustrating the overlap of significant genes among datasets. GSE279750 identified the largest number of DEGs (n = 225, of which 212 were unique to this dataset), while one core gene (LINC01554) was consistently significant across all three datasets. (D) Correlation heatmap of the 15 significant genes reveals co-expression patterns, with red indicating positive correlations and blue indicating negative correlations. Hierarchical clustering identifies distinct gene modules with coordinated expression changes, suggesting shared regulatory mechanisms underlying treatment response.

3.4. Machine Learning-Based Feature Selection Identifies Five Core Predictive Genes

Importantly, genes identified by cross-dataset meta-analysis were used as candidates for feature selection; however, final model features were selected based on predictive contribution in multivariate machine learning models rather than overlap frequency alone. To identify the most informative features for predicting immunotherapy response, we employed two complementary machine learning approaches on the integrated expression matrix. LASSO regression with L1 penalty performs automatic feature selection by shrinking coefficients of less important variables toward zero (Figure 5A). Following cross-validation optimization, the coefficient plot displays the magnitude and direction of coefficients for 14 candidate genes selected based on predictive performance. Blue bars represent positive coefficients, indicating genes positively associated with immunotherapy response, while negative coefficients (PLPPR1, HAGLR, HOXA10) are associated with non-responders. Random forest analysis provides an independent assessment of feature importance through ensemble learning (Figure 5B). Green bars show the mean Gini decrease for each gene, with higher values indicating greater importance in classification decisions. ENPP3, PLPPR1, and CHI3L1 exhibited the highest random forest importance scores. To integrate insights from both algorithms, we calculated combined scores by multiplying the absolute LASSO coefficients by random forest importance (Figure 5C). Specifically, the combined score for each gene was defined as: Combined score = |βLASSO| × RF importance (mean decrease in Gini). Combined score ranking identified the top five genes: PLPPR1 (1.55), CNTN3 (1.51), HOXA10 (1.36), HAGLR (1.16), and ENPP3 (1.11), which were selected for further validation. Box plot analysis demonstrated significant expression differences between responders (blue) and non-responders (red) for all five genes (p < 0.05) (Figure 5D). 
To evaluate predictive performance, we constructed a logistic regression model based on the five-gene signature and first assessed its discriminative ability using a predefined training and testing split. ROC analysis demonstrated strong discriminative performance in the initial training/testing evaluation (Figure S1A). Given the limited sample size, we further evaluated the stability of this result using repeated stratified cross-validation with a fixed Signature Score. Across resampling iterations, the model yielded consistently high AUC values (Figure S1B,C). Collectively, these results establish a five-gene predictive signature derived from dual-algorithm feature selection, supported by consistent performance across the held-out test split and repeated resampling, underscoring the internal robustness of the integrated machine learning framework for predicting immunotherapy response within the current cohort.

Figure 5.

Machine learning-based feature selection identifies five core predictive genes for immunotherapy response. We employed two complementary machine learning approaches to identify the most informative features for predicting immunotherapy response. (A) LASSO (Least Absolute Shrinkage and Selection Operator) regression with L1 penalty performs automatic feature selection by shrinking coefficients of less important variables toward zero. The coefficient plot displays the magnitude and direction of coefficients for 14 candidate genes after cross-validation optimization. Blue bars represent positive coefficients, indicating genes positively associated with immunotherapy response, while negative coefficients (such as PLPPR1, HAGLR, HOXA10) are associated with Non-Responders. (B) Random forest analysis provides an independent assessment of feature importance through ensemble learning. Green bars indicate the mean decrease in Gini index for each gene across the forest, with higher values reflecting greater importance in classification. ENPP3, PLPPR1, and CHI3L1 exhibited the highest importance scores. (C) Combined score ranking integrates insights from both algorithms by multiplying the absolute LASSO coefficients by random forest importance. The top five genes (PLPPR1: 1.55, CNTN3: 1.51, HOXA10: 1.36, HAGLR: 1.16, ENPP3: 1.11) were selected for further validation. (D) Box plots demonstrate significant expression differences between Responders (blue) and Non-Responders (red) for all five genes (p < 0.05).

3.5. The Five-Gene Signature Score Demonstrates Consistent Predictive Performance with Internal Stability

Based on the machine learning–based feature selection results, the top five genes (PLPPR1, CNTN3, HOXA10, HAGLR, and ENPP3) were incorporated into a logistic regression model. Using the integrated transcriptomic dataset, regression coefficients were optimized through maximum likelihood estimation to model discrimination between responders and non-responders. The resulting five-gene signature score was defined as follows:

Signature Score = (−194.836 × PLPPR1) + (147.927 × CNTN3) − (326.820 × HOXA10) − (2.582 × HAGLR) − (41.937 × ENPP3)

In this model, the positive coefficient of CNTN3 indicates a positive association between its expression and immunotherapy response, whereas HOXA10 exhibited the largest negative coefficient, identifying it as a major contributor to non-response. PLPPR1 and ENPP3 also showed negative associations with response, while HAGLR contributed a smaller effect. The absolute coefficient magnitudes reflect the relative contributions of individual genes to the composite score. Receiver operating characteristic (ROC) analysis showed that the five-gene signature score achieved complete separation between responders and non-responders in the integrated cohort, yielding an observed AUC of 1.0 (Figure 6A), exceeding individual gene performance (AUC range: 0.675–0.846). Consistent with this, box plot visualization demonstrated distinct signature score distributions between groups without overlap (p = 1.35 × 10^−9, Wilcoxon rank-sum test; Figure 6B). To assess internal stability, bootstrap validation with 1000 resampling iterations was performed (Figure S1D). The bootstrap AUC distribution showed no variation, with a mean AUC of 1.0 and a 95% confidence interval of [1.0, 1.0], indicating stable performance within the current dataset. Given the limited sample size, these results should be interpreted as evidence of internal consistency rather than definitive generalizability.
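
To make the formula concrete, the reported coefficients can be applied to a sample as in the Python sketch below. The function names are hypothetical, and the input is assumed to hold normalized expression values on the same scale used to fit the model, which is not restated here; no intercept is included because the score as reported has none.

```python
# Coefficients as reported for the five-gene Signature Score.
COEFFICIENTS = {
    "PLPPR1": -194.836,
    "CNTN3": 147.927,
    "HOXA10": -326.820,
    "HAGLR": -2.582,
    "ENPP3": -41.937,
}

def five_gene_score(expression):
    """Weighted sum of normalized expression values for the five genes."""
    missing = sorted(set(COEFFICIENTS) - set(expression))
    if missing:
        raise KeyError(f"missing genes: {missing}")
    return sum(beta * expression[gene] for gene, beta in COEFFICIENTS.items())

def contributions(expression):
    """Per-gene terms of the score, largest absolute contribution first."""
    terms = {gene: beta * expression[gene] for gene, beta in COEFFICIENTS.items()}
    return sorted(terms.items(), key=lambda item: -abs(item[1]))
```

Because four of the five coefficients are negative, higher expression of PLPPR1, HOXA10, HAGLR, or ENPP3 lowers the score, whereas higher CNTN3 expression raises it.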

Figure 6.

Development, validation, and biological interpretation of the five-gene signature prediction model. (A) ROC comparison of single vs. combined models. ROC analysis demonstrated that the combined five-gene model (AUC = 1.0) outperformed individual genes (PLPPR1: 0.846; CNTN3: 0.724; HOXA10: 0.841; HAGLR: 0.773; ENPP3: 0.675). (B) Signature Score distribution shows complete separation between Responder and Non-Responder groups (p = 1.35 × 10^−9, Wilcoxon rank-sum test), with Responders exhibiting significantly higher scores than Non-Responders. (C) KEGG pathway enrichment analysis shows significant enrichment in metabolism-related pathways (pantothenate and CoA biosynthesis, nicotinate and nicotinamide metabolism, nucleotide metabolism) and transcriptional misregulation in cancer (p < 0.05), suggesting these core genes may influence immunotherapy response through metabolic reprogramming.

3.6. Pathway Enrichment Analysis Reveals Metabolic Reprogramming as a Key Mechanism

To elucidate biological mechanisms underlying the five-gene signature’s predictive power, we performed KEGG pathway enrichment analysis using clusterProfiler (Figure 6C). Overrepresentation analysis identified seven significantly enriched pathways (adjusted p < 0.05, Benjamini–Hochberg correction). Core genes were significantly enriched in six metabolism-related pathways: pantothenate and CoA biosynthesis, nicotinate and nicotinamide metabolism, starch and sucrose metabolism, pyrimidine metabolism, nucleotide metabolism, and purine metabolism. The seventh pathway, transcriptional misregulation in cancer, suggests involvement in oncogenic networks. These findings suggest that metabolic reprogramming is a key determinant of immunotherapy response, with the five core genes potentially influencing cellular energy metabolism, nucleotide biosynthesis, and redox homeostasis, and thereby the tumor microenvironment and immune cell function.

4. Discussion

Our study developed and internally evaluated a five-gene signature (PLPPR1, CNTN3, HOXA10, HAGLR, and ENPP3) for predicting immunotherapy response through integrative analysis of three independent GEO transcriptomic datasets. Within the combined cohort, the signature showed high discriminative ability and internally stable performance across resampling-based validation analyses. However, given the limited sample size and retrospective design, these findings should be interpreted with caution. Further validation in larger, independent, and prospectively collected clinical cohorts is required to establish robustness, generalizability, and clinical applicability.

The identification of metabolic reprogramming as a key mechanism underlying immunotherapy response aligns with emerging evidence highlighting the critical role of metabolism in immune function and tumor microenvironment modulation [27,28]. Our pathway enrichment analysis revealed significant involvement of nucleotide metabolism, coenzyme biosynthesis, and energy metabolism pathways, consistent with findings that metabolic alterations profoundly impact immune cell infiltration and activation [29,30]. At the gene level, our integrated LASSO-random forest approach achieved near-perfect classification in internal cross-validation while yielding a compact, biologically interpretable gene set. Within this framework, HOXA10 emerged as the strongest negative predictor of immunotherapy response (coefficient: −326.820), consistent with its role as a master transcriptional regulator implicated in immune evasion and treatment resistance [31,32]. High HOXA10 levels may promote immunosuppressive microenvironments by regulating immune cell differentiation [33,34]. CNTN3, with a positive coefficient (+147.927), encodes a cell adhesion molecule that may facilitate beneficial immune cell infiltration or enhance immune recognition mechanisms [35]. PLPPR1 encodes a membrane-associated phospholipid phosphatase–related protein involved in lipid signaling and membrane dynamics, processes increasingly recognized as critical for immune receptor signaling, immune cell activation, and metabolic fitness within the tumor microenvironment [36,37]. HAGLR (also known as HOXD-AS1), a long non-coding RNA, has been implicated in transcriptional regulation linked to tumor progression, epithelial–mesenchymal transition, and immune suppression, suggesting a role in shaping immunosuppressive tumor states that may limit immunotherapy efficacy [38,39]. ENPP3 has recently been identified as an extracellular cGAMP hydrolase that acts as an innate immune checkpoint by attenuating cGAMP–STING signaling [40]. Genetic or functional disruption of ENPP3 enhances antitumor immunity in a STING-dependent manner, implicating ENPP3 in immune regulation within the tumor microenvironment [40]. Collectively, these genes converge on metabolic regulation, transcriptional control, and immune modulation, key biological processes underlying immunotherapy responsiveness.

Our meta-analysis integrating three independent cohorts further examined biomarker reproducibility across distinct populations. Notably, only LINC01554 achieved statistical significance across all three datasets, underscoring the critical importance of meta-analytical approaches for the discovery of robust biomarkers. Previous studies have identified LINC01554 as a liver-enriched tumor suppressor lncRNA that regulates glucose metabolism by promoting PKM2 degradation and inhibiting the Akt/mTOR pathway, thereby suppressing HCC progression [41]. Its downregulation correlates with larger tumor size, advanced TNM stage, and poorer prognosis in HCC [41]. In addition, recent integrative models have recognized LINC01554 as a protective, metabolism-related lncRNA linked to favorable immunotherapy response [42]. These findings support LINC01554 as a metabolic–immune regulatory biomarker with translational potential in HCC. Pathway analysis revealed metabolic enrichment alongside regulatory insights from HOXA10 and CNTN3, highlighting the integrated contribution of metabolic and transcriptional regulation to immunotherapy outcomes in HCC. These results also suggest candidate therapeutic targets for combination strategies in which metabolic modulators could enhance immunotherapy efficacy.

Study Limitations and Future Directions

Several limitations should be acknowledged. First, the integrated cohort size was relatively small (n = 35), which may increase the risk of optimistic performance estimates despite internal validation. Although repeated stratified cross-validation with a fixed Signature Score was applied to mitigate overfitting and information leakage, the high AUC values observed should be interpreted as evidence of internal stability rather than generalizability. Therefore, external validation in larger, independent, and prospectively collected cohorts is required to confirm clinical utility. In addition, while bulk transcriptomic analysis enabled identification of a metabolically relevant signature, future single-cell or spatial transcriptomic studies may provide deeper mechanistic insights and enhance translational relevance for precision immunotherapy.
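The idea of evaluating a fixed score by repeated stratified cross-validation can be sketched as follows. This is an illustrative Python sketch using only the standard library, not the authors' R workflow; the number of folds, the number of repeats, the seed, and the toy scores and labels are all assumptions. Because the score is fixed, nothing is refitted inside the folds, so the spread of fold-level AUC values describes internal stability rather than performance in new patients.

```python
import random
from statistics import mean


def auc(scores, labels):
    """AUC as the fraction of responder/non-responder pairs ranked correctly (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds, dealing out each class separately."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        if len(idx) < k:
            raise ValueError("each class needs at least k samples")
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def repeated_cv_auc(scores, labels, k=5, repeats=100, seed=0):
    """Return (mean, min, max) of held-out fold AUCs for a fixed, precomputed score."""
    rng = random.Random(seed)
    fold_aucs = []
    for _ in range(repeats):
        for fold in stratified_folds(labels, k, rng):
            fold_aucs.append(auc([scores[i] for i in fold],
                                 [labels[i] for i in fold]))
    return mean(fold_aucs), min(fold_aucs), max(fold_aucs)


# Toy data (hypothetical): 1 = responder, 0 = non-responder.
toy_labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [2.1, 1.7, 1.2, 0.9, 0.4, -0.3, 0.6, -0.2, -0.8, -1.1, -1.5, -2.0]
mean_auc, lowest, highest = repeated_cv_auc(toy_scores, toy_labels, k=3, repeats=50, seed=1)
print(f"mean AUC {mean_auc:.3f} (range {lowest:.3f} to {highest:.3f})")
```

With very small folds, as in a cohort of 35 samples, individual fold AUCs can take only a few distinct values, which is one reason the range across repeats is worth reporting alongside the mean.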

5. Conclusions

In conclusion, this study identified and internally evaluated a five-gene metabolic signature (PLPPR1, CNTN3, HOXA10, HAGLR, and ENPP3) associated with immunotherapy response in hepatocellular carcinoma through integrative analysis of multiple transcriptomic datasets. By combining meta-analysis with complementary machine learning approaches, we demonstrated internally consistent discrimination between responders and non-responders and provided biologically interpretable insights into metabolic and immune-related mechanisms. While these findings offer preliminary evidence supporting the potential utility of this signature for response stratification, external validation in larger, independent, and prospectively collected cohorts will be essential before clinical translation. Our study highlights the value of integrative multi-cohort and systems-level approaches for biomarker discovery in immuno-oncology.

Acknowledgments

The authors sincerely thank Jia-Ni Chen for her valuable assistance with the statistical analyses performed in R.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics16010085/s1, Figure S1: Robust internal validation of the five-gene Signature Score for predicting immunotherapy response.

Author Contributions

Conceptualization, H.-W.C., Y.-M.J.L. and C.-S.W.; Methodology, H.-W.C. and C.-S.W.; Formal analysis, C.-S.W.; Writing—original draft, C.-S.W.; Writing—review and editing, H.-W.C. and Y.-M.J.L.; Project administration, C.-S.W.; Funding acquisition, H.-W.C. and C.-S.W. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable. This study used publicly available, de-identified data from the Gene Expression Omnibus (GEO) database and did not involve human or animal subjects.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in GEO at https://www.ncbi.nlm.nih.gov/geo/ (accessed on 1 August 2025) under accession numbers GSE215011, GSE235863, and GSE279750.

Conflicts of Interest

All the authors report no relevant conflicts of interest for this article.

Funding Statement

This study was supported by the Taichung Veterans General Hospital (TCVGH-1137304A; TCVGH-1143910B; TCVGH-1153904B) and the National Science and Technology Council of Taiwan (NSTC 114-2320-B-038-048).

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

  • 1. Rumgay H., Arnold M., Ferlay J., Lesi O., Cabasag C.J., Vignat J., Laversanne M., McGlynn K.A., Soerjomataram I. Global burden of primary liver cancer in 2020 and predictions to 2040. J. Hepatol. 2022;77:1598–1606. doi: 10.1016/j.jhep.2022.08.021.
  • 2. Hwang S.Y., Danpanichkul P., Agopian V., Mehta N., Parikh N.D., Abou-Alfa G.K., Singal A.G., Yang J.D. Hepatocellular carcinoma: Updates on epidemiology, surveillance, diagnosis and treatment. Clin. Mol. Hepatol. 2025;31:S228–S254. doi: 10.3350/cmh.2024.0824.
  • 3. Finn R.S., Qin S., Ikeda M., Galle P.R., Ducreux M., Kim T.Y., Kudo M., Breder V., Merle P., Kaseb A.O., et al. Atezolizumab plus Bevacizumab in Unresectable Hepatocellular Carcinoma. N. Engl. J. Med. 2020;382:1894–1905. doi: 10.1056/NEJMoa1915745.
  • 4. Yau T., Kaseb A., Cheng A.L., Qin S., Zhu A.X., Chan S.L., Melkadze T., Sukeepaisarnjaroen W., Breder V., Verset G., et al. Cabozantinib plus atezolizumab versus sorafenib for advanced hepatocellular carcinoma (COSMIC-312): Final results of a randomised phase 3 study. Lancet Gastroenterol. Hepatol. 2024;9:310–322. doi: 10.1016/S2468-1253(23)00454-5.
  • 5. Xie P., Guo L., Yu Q., Zhao Y., Yu M., Wang H., Wu M., Xu W., Xu M., Zhu X.D., et al. ACE2 Enhances Sensitivity to PD-L1 Blockade by Inhibiting Macrophage-Induced Immunosuppression and Angiogenesis. Cancer Res. 2025;85:299–313. doi: 10.1158/0008-5472.CAN-24-0954.
  • 6. Xu W., Zhao Y., Weng J., Yu M., Yu Q., Xie P., Liu S., Guo L., Zhang B., Xu Y., et al. Galectin-4 drives anti-PD-L1/BVZ resistance by regulating metabolic adaptation and tumour-associated neutrophils in hepatocellular carcinoma. Gut. 2025. doi: 10.1136/gutjnl-2025-336374.
  • 7. Taherifard E., Tran K., Saeed A., Yasin J.A., Saeed A. Biomarkers for Immunotherapy Efficacy in Advanced Hepatocellular Carcinoma: A Comprehensive Review. Diagnostics. 2024;14:2054. doi: 10.3390/diagnostics14182054.
  • 8. Shen K.Y., Zhu Y., Xie S.Z., Qin L.X. Immunosuppressive tumor microenvironment and immunotherapy of hepatocellular carcinoma: Current status and prospectives. J. Hematol. Oncol. 2024;17:25. doi: 10.1186/s13045-024-01549-2.
  • 9. Zhang X., Lao M., Sun K., Yang H., He L., Liu X., Liu L., Zhang S., Guo C., Wang S., et al. Sphingolipid synthesis in tumor-associated macrophages confers immunotherapy resistance in hepatocellular carcinoma. Sci. Adv. 2025;11:eadv0558. doi: 10.1126/sciadv.adv0558.
  • 10. Xia L., Oyang L., Lin J., Tan S., Han Y., Wu N., Yi P., Tang L., Pan Q., Rao S., et al. The cancer metabolic reprogramming and immune response. Mol. Cancer. 2021;20:28. doi: 10.1186/s12943-021-01316-8.
  • 11. Feng J., Li J., Wu L., Yu Q., Ji J., Wu J., Dai W., Guo C. Emerging roles and the regulation of aerobic glycolysis in hepatocellular carcinoma. J. Exp. Clin. Cancer Res. 2020;39:126. doi: 10.1186/s13046-020-01629-4.
  • 12. Hu B., Lin J.Z., Yang X.B., Sang X.T. Aberrant lipid metabolism in hepatocellular carcinoma cells as well as immune microenvironment: A review. Cell Prolif. 2020;53:e12772. doi: 10.1111/cpr.12772.
  • 13. Ou Y., Xia C., Ye C., Liu M., Jiang H., Zhu Y., Yang D. Comprehensive scRNA-seq analysis to identify new markers of M2 macrophages for predicting the prognosis of prostate cancer. Ann. Med. 2024;56:2398195. doi: 10.1080/07853890.2024.2398195.
  • 14. Zhang J., Wang H., Tian Y., Li T., Zhang W., Ma L., Chen X., Wei Y. Discovery of a novel lipid metabolism-related gene signature to predict outcomes and the tumor immune microenvironment in gastric cancer by integrated analysis of single-cell and bulk RNA sequencing. Lipids Health Dis. 2023;22:212. doi: 10.1186/s12944-023-01977-y.
  • 15. Barrett T., Wilhite S.E., Ledoux P., Evangelista C., Kim I.F., Tomashevsky M., Marshall K.A., Phillippy K.H., Sherman P.M., Holko M., et al. NCBI GEO: Archive for functional genomics data sets—Update. Nucleic Acids Res. 2013;41:D991–D995. doi: 10.1093/nar/gks1193.
  • 16. Liu C., Zhou C., Xia W., Zhou Y., Qiu Y., Weng J., Zhou Q., Chen W., Wang Y.N., Lee H.H., et al. Targeting ALK averts ribonuclease 1-induced immunosuppression and enhances antitumor immunity in hepatocellular carcinoma. Nat. Commun. 2024;15:1009. doi: 10.1038/s41467-024-45215-0.
  • 17. Guo X., Nie H., Zhang W., Li J., Ge J., Xie B., Hu W., Zhu Y., Zhong N., Zhang X., et al. Contrasting cytotoxic and regulatory T cell responses underlying distinct clinical outcomes to anti-PD-1 plus lenvatinib therapy in cancer. Cancer Cell. 2025;43:248–268.e9. doi: 10.1016/j.ccell.2025.01.001.
  • 18. Korthauer K., Kimes P.K., Duvallet C., Reyes A., Subramanian A., Teng M., Shukla C., Alm E.J., Hicks S.C. A practical guide to methods controlling false discoveries in computational biology. Genome Biol. 2019;20:118. doi: 10.1186/s13059-019-1716-1.
  • 19. Chen H., Boutros P.C. VennDiagram: A package for the generation of highly-customizable Venn and Euler diagrams in R. BMC Bioinform. 2011;12:35. doi: 10.1186/1471-2105-12-35.
  • 20. Friedman J., Hastie T., Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Softw. 2010;33:1–22. doi: 10.18637/jss.v033.i01.
  • 21. Speiser J.L., Miller M.E., Tooze J., Ip E. A Comparison of Random Forest Variable Selection Methods for Classification Prediction Modeling. Expert Syst. Appl. 2019;134:93–101. doi: 10.1016/j.eswa.2019.05.028.
  • 22. Robin X., Turck N., Hainard A., Tiberti N., Lisacek F., Sanchez J.C., Muller M. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform. 2011;12:77. doi: 10.1186/1471-2105-12-77.
  • 23. Steyerberg E.W., Harrell F.E., Jr., Borsboom G.J., Eijkemans M.J., Vergouwe Y., Habbema J.D. Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis. J. Clin. Epidemiol. 2001;54:774–781. doi: 10.1016/S0895-4356(01)00341-9.
  • 24. Wu T., Hu E., Xu S., Chen M., Guo P., Dai Z., Feng T., Zhou L., Tang W., Zhan L., et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation. 2021;2:100141. doi: 10.1016/j.xinn.2021.100141.
  • 25. Gerovska D., Larrinaga G., Solano-Iturri J.D., Marquez J., Garcia Gallastegi P., Khatib A.M., Poschmann G., Stuhler K., Armesto M., Lawrie C.H., et al. An Integrative Omics Approach Reveals Involvement of BRCA1 in Hepatic Metastatic Progression of Colorectal Cancer. Cancers. 2020;12:2380. doi: 10.3390/cancers12092380.
  • 26. Chen Y.C., Li D.B., Wang D.L., Peng H. Comprehensive analysis of distal-less homeobox family gene expression in colon cancer. World J. Gastrointest. Oncol. 2023;15:1019–1035. doi: 10.4251/wjgo.v15.i6.1019.
  • 27. Li J., Huang K., Thakur M., McBride F., Sadagopan A., Gallant D.S., Khanna P., Laimon Y.N., Li B., Mohanna R., et al. Oncogenic TFE3 fusions drive OXPHOS and confer metabolic vulnerabilities in translocation renal cell carcinoma. Nat. Metab. 2025;7:478–492. doi: 10.1038/s42255-025-01218-9.
  • 28. Liu J., Zhang X., Fan X., Liu P., Mi Z., Tan H., Rong P. Liensinine reshapes the immune microenvironment and enhances immunotherapy by reprogramming metabolism through the AMPK-HIF-1alpha axis in hepatocellular carcinoma. J. Exp. Clin. Cancer Res. 2025;44:208. doi: 10.1186/s13046-025-03477-6.
  • 29. Li L., Chao Z., Waikeong U., Xiao J., Ge Y., Wang Y., Xiong Z., Ma S., Wang Z., Hu Z., et al. Metabolic classifications of renal cell carcinoma reveal intrinsic connections with clinical and immune characteristics. J. Transl. Med. 2023;21:146. doi: 10.1186/s12967-023-03978-y.
  • 30. Vijayanathan Y., Ho I.A.W. The Impact of Metabolic Rewiring in Glioblastoma: The Immune Landscape and Therapeutic Strategies. Int. J. Mol. Sci. 2025;26:669. doi: 10.3390/ijms26020669.
  • 31. Nie S., Zhang L., Liu J., Wan Y., Jiang Y., Yang J., Sun R., Ma X., Sun G., Meng H., et al. ALKBH5-HOXA10 loop-mediated JAK2 m6A demethylation and cisplatin resistance in epithelial ovarian cancer. J. Exp. Clin. Cancer Res. 2021;40:284. doi: 10.1186/s13046-021-02088-1.
  • 32. Li J., Chang J., Wang J., Xu D., Yang M., Jiang Y., Zhang J., Jiang X., Sun Y. HOXA10 promote pancreatic cancer progression via directly activating canonical NF-kappaB signaling pathway. Carcinogenesis. 2022;43:787–796. doi: 10.1093/carcin/bgac042.
  • 33. Wang T., Liu M., Jia M. Integrated Bioinformatic Analysis of the Correlation of HOXA10 Expression with Survival and Immune Cell Infiltration in Lower Grade Glioma. Biochem. Genet. 2023;61:238–257. doi: 10.1007/s10528-022-10258-9.
  • 34. Ge F., Tie W., Zhang J., Zhu Y., Fan Y. Expression of the HOXA gene family and its relationship to prognosis and immune infiltrates in cervical cancer. J. Clin. Lab. Anal. 2021;35:e24015. doi: 10.1002/jcla.24015.
  • 35. Zhou J., Xie Z., Cui P., Su Q., Zhang Y., Luo L., Li Z., Ye L., Liang H., Huang J. SLC1A1, SLC16A9, and CNTN3 Are Potential Biomarkers for the Occurrence of Colorectal Cancer. Biomed. Res. Int. 2020;2020:1204605. doi: 10.1155/2020/1204605.
  • 36. Fuchs J., Bareesel S., Kroon C., Polyzou A., Eickholt B.J., Leondaritis G. Plasma membrane phospholipid phosphatase-related proteins as pleiotropic regulators of neuron growth and excitability. Front. Mol. Neurosci. 2022;15:984655. doi: 10.3389/fnmol.2022.984655.
  • 37. Tilve S., Iweka C.A., Bao J., Hawken N., Mencio C.P., Geller H.M. Phospholipid phosphatase related 1 (PLPPR1) increases cell adhesion through modulation of Rac1 activity. Exp. Cell Res. 2020;389:111911. doi: 10.1016/j.yexcr.2020.111911.
  • 38. Zhang F., Chen X., Xi K., Qiu Z., Wang Y., Gui Y., Hou Y., Chen K., Zhang X. Long noncoding RNA HOXD-AS1 in various cancers: A meta-analysis and TCGA data review. OncoTargets Ther. 2018;11:7827–7840. doi: 10.2147/OTT.S184303.
  • 39. Liu Q., Li Y., Tan B., Zhao Q., Fan L., Zhang Z., Wang D., Zhao X., Liu Y., Liu W. LncRNA HAGLR regulates gastric cancer progression by regulating the miR-20a-5p/E2F1 axis. Aging. 2024;16:11843–11856. doi: 10.18632/aging.206039.
  • 40. Mardjuki R., Wang S., Carozza J., Zirak B., Subramanyam V., Abhiraman G., Lyu X., Goodarzi H., Li L. Identification of the extracellular membrane protein ENPP3 as a major cGAMP hydrolase and innate immune checkpoint. Cell Rep. 2024;43:114209. doi: 10.1016/j.celrep.2024.114209.
  • 41. Zheng Y.L., Li L., Jia Y.X., Zhang B.Z., Li J.C., Zhu Y.H., Li M.Q., He J.Z., Zeng T.T., Ban X.J., et al. LINC01554-Mediated Glucose Metabolism Reprogramming Suppresses Tumorigenicity in Hepatocellular Carcinoma via Downregulating PKM2 Expression and Inhibiting Akt/mTOR Signaling Pathway. Theranostics. 2019;9:796–810. doi: 10.7150/thno.28992.
  • 42. He M., Gu W., Gao Y., Liu Y., Liu J., Li Z. Molecular subtypes and a prognostic model for hepatocellular carcinoma based on immune- and immunogenic cell death-related lncRNAs. Front. Immunol. 2022;13:1043827. doi: 10.3389/fimmu.2022.1043827.


Articles from Diagnostics are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)
