Scientific Reports. 2025 Apr 14;15:12833. doi: 10.1038/s41598-025-97333-4

Identification and validation of LY6H and GRM3 as candidate biomarkers for Glioma-related epilepsy

Zhenpan Zhang 1,#, Jianhuang Huang 2,#, Caihou Lin 1, Risheng Liang 1
PMCID: PMC11997038  PMID: 40229486

Abstract

Gliomas are the most common primary tumors of the central nervous system, and epilepsy is a frequent clinical manifestation. Glioma-related epilepsy (GRE) significantly affects patients’ quality of life and prognosis. In this study, we integrated bioinformatics and multiple machine learning methods to perform a proteomic analysis of tumor samples from patients with GRE and from patients with glioma without epilepsy (GNE). This analysis identified LY6H and GRM3 as potential signature proteins of GRE. Expression of both proteins was markedly reduced in GRE samples, and ROC curve analyses indicated favorable diagnostic performance. Finally, an independent external validation using the bulk transcriptomic dataset GSE199759 corroborated these findings. This work provides candidate biomarkers for the early detection of GRE and offers insights into its molecular mechanisms and potential therapeutic strategies.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-97333-4.

Keywords: Glioma-related epilepsy, Proteomic, Biomarkers, Lasso, Machine learning

Subject terms: Cancer genomics, Cancer screening, CNS cancer, Tumour biomarkers, Diagnostic markers, Predictive markers, Prognostic markers, Proteomics, Diseases of the nervous system, Computational biology and bioinformatics, Biomarkers, Oncology

Introduction

Gliomas are the most common primary intracranial tumors, accounting for approximately 80% of malignant central nervous system (CNS) tumors1. According to the latest World Health Organization (WHO) classification of CNS tumors, gliomas range from Grade 1 to 4: Grades 1 and 2 are defined as low-grade gliomas (LGG), whereas Grades 3 and 4 are categorized as high-grade gliomas (HGG)2. Seizures are among the most common clinical manifestations of gliomas, and this condition is clinically defined as glioma-related epilepsy (GRE). The prevalence of GRE approaches 50% in patients with HGG and reaches 90% in patients with LGG3. Most GRE cases begin with focal seizures that evolve into bilateral tonic–clonic seizures, which can be life-threatening in severe cases4. Moreover, because GRE patients often require long-term treatment with one or more antiepileptic drugs, the resulting cognitive impairment substantially undermines their quality of life and work capacity5. Recurrent seizures may also accelerate the histological or radiological progression of LGG to HGG, thereby profoundly influencing the overall survival of patients with glioma4,6,7. Elucidating the molecular mechanisms underlying GRE is therefore crucial for accurate prognostic assessment and the development of effective treatment strategies.

Previous studies indicate that dysregulated glutamate metabolism8, reduction in GABAergic neurons9, and aberrant ion channel activation10 are closely involved in GRE. However, the molecular profiles within gliomas that precipitate GRE remain poorly understood. Most existing studies focus on the mechanisms of a limited number of proteins13,14, and comprehensive analyses of the proteomic landscape are scarce, leaving significant gaps in our understanding of GRE pathogenesis9. Identifying additional key proteins would shed new light on GRE pathophysiology and could pave the way for precision therapies.

Proteomic technologies offer a robust approach to examining the molecular networks that underlie GRE15. Liquid chromatography–tandem mass spectrometry (LC-MS/MS), in particular, can rapidly quantify thousands of proteins, enabling investigation of tumor–epilepsy interactions within the tumor microenvironment16. In this study, LC-MS/MS-based proteomic analysis was used to compare tumor samples from patients with GRE and patients with glioma without epilepsy (GNE). By systematically identifying proteins crucial to GRE pathogenesis, this work aims to provide novel insights for clinical diagnosis and to inform the development of individualized therapeutic strategies17,18.

Materials and methods

Pathological sample collection

This study is a retrospective case-control analysis (Fig. 1). The subjects were patients with glioma, with epilepsy (GRE) or without epilepsy (GNE), who underwent surgery in the Department of Neurosurgery at Fujian Medical University Union Hospital between January 2022 and June 2024. Only newly diagnosed patients were included; cases of recurrent glioma were not included. Collected data included age, sex, initial symptoms, tumor laterality, and primary anatomical location of the tumor. The study was approved by the Ethics Committee of Fujian Medical University Union Hospital, and all methods were conducted in accordance with relevant guidelines and regulations.

Fig. 1.

Technical flowchart of the study. (A) Study Cohorts and Experimental Workflow. (Created by Figdraw) (B) Proteomic Analysis and Key Protein Identification Strategy. GRE, glioma-related epilepsy. DEPs, differentially expressed proteins.

GRE diagnosis and inclusion criteria: (1) pathological diagnosis of diffuse glioma according to the WHO CNS5 classification19; (2) EEG confirmation of epileptiform discharges associated with the diffuse glioma. Exclusion criteria: age > 70 or < 18 years; severe psychiatric disorders; hereditary or idiopathic epilepsy. Epilepsy was attributed to the glioma when the following conditions were met: (1) the patient had no prior history of epilepsy; (2) seizures were accompanied by symptoms of increased intracranial pressure (such as headache and vomiting) or by focal neurological deficits (such as hemiplegia or aphasia); and (3) ambulatory EEG or long-term video EEG recording of interictal and ictal activity clearly defined the spatial relationship between the epileptic focus and the tumor.

Ethical review

The pathological sample testing and data utilization in this study have been approved by the Ethics Review Committee of Fujian Medical University Union Hospital. The ethical approval number is 2023 KY163.

Proteomic sequencing and analysis

Sample acquisition

Glioma resection and epileptic focus excision were performed according to the treatment protocols, and tumor samples were obtained intraoperatively from patients with GRE (experimental group) and GNE (control group). All samples were pathologically diagnosed by two senior neuropathologists.

Protein extraction and trypsin digestion

Pathological samples were mixed with four volumes of lysis buffer (1% SDS, 1% protease inhibitors) and sonicated. After centrifugation at 12,000 × g for 10 min at 4 °C, the supernatant was transferred to a new tube, and protein concentration was determined using a BCA assay. Equal amounts of protein from each sample were taken for digestion, the volume was adjusted with lysis buffer, and one volume of pre-chilled acetone was added. After vortex mixing, four volumes of pre-chilled acetone were added, and proteins were precipitated at −20 °C for 2 h. After centrifugation at 4,500 × g for 5 min, the supernatant was discarded and the pellet was washed two to three times with pre-chilled acetone. The air-dried pellet was resuspended in 200 mM TEAB and dispersed by sonication, and trypsin was added at a 1:50 protease:protein ratio (m/m) for overnight digestion. Dithiothreitol (DTT) was then added to a final concentration of 5 mM, and the samples were reduced at 56 °C for 30 min. Iodoacetamide (IAA) was then added to a final concentration of 11 mM, and the samples were incubated in the dark at room temperature for 15 min.
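The reagent amounts in this protocol follow from two simple relations: the trypsin mass is the protein mass divided by 50, and the volume of a stock needed to reach a given final concentration must account for the dilution that the addition itself causes. The sketch below is illustrative only; the stock concentrations (1 M DTT, 500 mM IAA) and the 100 µg / 100 µL digest are hypothetical values, not values reported in the study.

```python
# Reagent arithmetic for the digestion protocol described above.
# All sample amounts and stock concentrations are hypothetical examples.


def trypsin_mass_ug(protein_ug: float, ratio: float = 50.0) -> float:
    """Trypsin mass (ug) for a protease:protein ratio of 1:ratio by mass."""
    return protein_ug / ratio


def stock_volume_ul(sample_ul: float, stock_mm: float, final_mm: float) -> float:
    """Volume of stock (uL) to add so the mixture reaches final_mm.

    Solves stock_mm * v = final_mm * (sample_ul + v) for v, i.e. it accounts
    for the dilution caused by the added volume itself.
    """
    if not 0 < final_mm < stock_mm:
        raise ValueError("need 0 < final concentration < stock concentration")
    return final_mm * sample_ul / (stock_mm - final_mm)


if __name__ == "__main__":
    protein_ug = 100.0  # hypothetical protein input per sample
    digest_ul = 100.0   # hypothetical digest volume
    print(f"trypsin: {trypsin_mass_ug(protein_ug):.2f} ug")

    # DTT to 5 mM from a hypothetical 1 M (1000 mM) stock
    dtt_ul = stock_volume_ul(digest_ul, stock_mm=1000.0, final_mm=5.0)
    print(f"DTT stock to add: {dtt_ul:.3f} uL")

    # IAA to 11 mM from a hypothetical 500 mM stock, added after the DTT
    iaa_ul = stock_volume_ul(digest_ul + dtt_ul, stock_mm=500.0, final_mm=11.0)
    print(f"IAA stock to add: {iaa_ul:.3f} uL")
```

Solving for the added volume, rather than using C1V1 = C2V2 on the sample volume alone, keeps the final concentrations exact even when stocks are not highly concentrated.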

LC-MS/MS analysis

Peptides were dissolved in mobile phase A and separated on an EASY-nLC 1000 ultra-high-performance liquid chromatography system. Mobile phase A was an aqueous solution containing 0.1% formic acid and 2% acetonitrile; mobile phase B was an acetonitrile–water solution containing 0.1% formic acid. The gradient was: 0–9 min, 6–24% B; 9–11 min, 24–35% B; 11–13 min, 35–90% B; 13–15 min, 90% B, at a constant flow rate of 500 nL/min. After separation, peptides were ionized in a capillary ion source and analyzed on a timsTOF Pro mass spectrometer. The ion source voltage was 1.75 kV, and both precursor ions and their fragments were detected and analyzed by TOF. Data were acquired in data-independent parallel accumulation–serial fragmentation (dia-PASEF) mode with an MS1 scan range of 300–1500 m/z. Each MS1 scan was followed by 20 PASEF acquisitions, with MS/MS scans spanning 400–850 m/z in 7 m/z isolation windows.
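To make the acquisition settings concrete, the following sketch interpolates the percentage of mobile phase B at any time in the 15-min gradient and lists the 7 m/z MS/MS isolation windows implied by the 400–850 m/z range. Non-overlapping windows and a truncated final window are assumptions; how the windows are distributed across the 20 PASEF frames depends on ion-mobility settings that are not reported here.

```python
# Gradient breakpoints (time in min, %B) taken from the Methods text.
GRADIENT = [(0.0, 6.0), (9.0, 24.0), (11.0, 35.0), (13.0, 90.0), (15.0, 90.0)]


def percent_b(t_min, gradient=GRADIENT):
    """Linearly interpolate the %B delivered at time t_min."""
    if not gradient[0][0] <= t_min <= gradient[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable for a sorted gradient")


def dia_windows(start=400.0, stop=850.0, width=7.0):
    """Consecutive, non-overlapping (low, high) windows covering start..stop m/z.

    Non-overlap and clipping the last window at `stop` are assumptions."""
    windows, low = [], start
    while low < stop:
        high = min(low + width, stop)
        windows.append((low, high))
        low = high
    return windows


if __name__ == "__main__":
    for t in (0, 4.5, 10, 12, 14):
        print(f"t = {t:>4} min -> {percent_b(t):.1f}% B")
    w = dia_windows()
    print(len(w), "windows; first", w[0], "last", w[-1])
```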

Database search

DIA data were processed with the DIA-NN search engine20 (v1.8). Tandem mass spectra were searched against the Homo_sapiens_9606_SP_20231220.fasta database (20,429 entries) and a reverse decoy database. Trypsin/P was specified as the cleavage enzyme, allowing up to one missed cleavage. N-terminal methionine excision was enabled, and carbamidomethylation of cysteine was set as a fixed modification. The false discovery rate (FDR) was controlled at < 1%.
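The reverse decoy database is what makes FDR control possible. The sketch below shows the generic target-decoy idea: decoys are reversed sequences, and each identification receives a q-value, the smallest FDR at which it would be accepted. It is a textbook illustration, not DIA-NN's internal scoring and FDR procedure, which is more elaborate; the scores are hypothetical.

```python
def reverse_decoy(sequence: str) -> str:
    """Decoy protein made by reversing the target sequence."""
    return sequence[::-1]


def q_values(hits):
    """hits: list of (score, is_decoy) with higher scores better.

    Returns one q-value per hit (input order). FDR at each score threshold is
    estimated as decoys / targets; the q-value is the running minimum of that
    estimate taken from the lowest score upward."""
    order = sorted(range(len(hits)), key=lambda i: hits[i][0], reverse=True)
    fdr = [0.0] * len(hits)
    targets = decoys = 0
    for i in order:
        if hits[i][1]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = min(1.0, decoys / max(targets, 1))
    q = [0.0] * len(hits)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdr[i])
        q[i] = running_min
    return q


if __name__ == "__main__":
    print(reverse_decoy("MKTAYIAK"))
    # hypothetical identifications: (score, is_decoy)
    hits = [(9.1, False), (8.7, False), (8.2, True), (7.9, False), (7.5, False), (6.0, True)]
    qs = q_values(hits)
    kept = [score for (score, decoy), q in zip(hits, qs) if not decoy and q < 0.01]
    print(qs, kept)
```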

Quality control

After proteomic data acquisition, stringent quality control was applied. First, the total ion current (TIC) and base peak chromatograms (BPC) were inspected to evaluate data quality. Software such as Mascot and Proteome Discoverer was then used for peptide and protein identification and quantification. Peak areas of labeled peptides were normalized, and low-abundance and poorly reproducible peptides were removed to ensure data accuracy and reliability.
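A minimal sketch of a normalization-and-filtering step of this kind is shown below. It assumes log2 transformation, median centering of each sample, and a missing-value filter; the median-centering choice, the missingness threshold, and the toy intensities are illustrative assumptions, since the exact normalization used in the study is not specified beyond the description above.

```python
import math
import statistics


def log2_median_normalize(samples):
    """samples: dict sample -> dict feature -> raw intensity (None = missing).

    Returns log2 intensities shifted so that every sample's median equals the
    median of all sample medians. Zero or missing intensities become None."""
    logged = {
        s: {f: math.log2(v) if v else None for f, v in row.items()}
        for s, row in samples.items()
    }
    medians = {
        s: statistics.median(v for v in row.values() if v is not None)
        for s, row in logged.items()
    }
    target = statistics.median(medians.values())
    return {
        s: {f: None if v is None else v - medians[s] + target for f, v in row.items()}
        for s, row in logged.items()
    }


def keep_features(samples, max_missing=0.5):
    """Features whose missing fraction across samples is at most max_missing
    (the 0.5 default is a hypothetical threshold)."""
    features = sorted({f for row in samples.values() for f in row})
    n = len(samples)
    return [
        f for f in features
        if sum(samples[s].get(f) is None for s in samples) / n <= max_missing
    ]


if __name__ == "__main__":
    raw = {  # hypothetical peak areas; None = not detected
        "S1": {"P1": 4.0, "P2": 16.0, "P3": None},
        "S2": {"P1": 8.0, "P2": 32.0, "P3": 64.0},
    }
    norm = log2_median_normalize(raw)
    print(norm)
    print(keep_features(norm, max_missing=0.4))  # ['P1', 'P2']
```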

Domain annotation

Protein domains are regions of conserved sequence that usually perform independent functions and represent structural units of molecular function, typically 25 to 500 amino acids long. They are spatially compact and structurally stable, and can fold independently into functional structures. A protein may contain multiple domains, and a given domain may occur in many different proteins. In this study, protein domains were annotated using the Pfam database (http://pfam.xfam.org) and the associated PfamScan tool.

Differential expression analysis

After quality control, differential expression analysis was performed with the “limma” package21 to compare the experimental and control groups and identify differentially expressed proteins (DEPs). DEPs were defined as proteins with Padj < 0.05 and |log2 fold change| ≥ 1. Results were visualized with volcano plots and heatmaps.
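The DEP criteria combine a Benjamini–Hochberg-adjusted p-value with a fold-change cut-off. The sketch below assumes that per-protein log2 fold changes and raw p-values are already available (in the study these come from limma's moderated t-test, which is not reproduced here); the protein names and values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """BH-adjusted p-values in input order (same rule as R's p.adjust(method = "BH"))."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, enforcing monotonicity and a cap of 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_deps(results, padj_cut=0.05, lfc_cut=1.0):
    """results: list of (protein, log2_fold_change, p_value).

    Returns (upregulated, downregulated) lists using padj < padj_cut and
    |log2FC| >= lfc_cut, matching the thresholds stated in the Methods."""
    padj = benjamini_hochberg([p for _, _, p in results])
    up, down = [], []
    for (protein, lfc, _), q in zip(results, padj):
        if q < padj_cut and abs(lfc) >= lfc_cut:
            (up if lfc > 0 else down).append(protein)
    return up, down


if __name__ == "__main__":
    # hypothetical proteins: (name, log2FC GRE vs GNE, raw p-value)
    results = [("P1", 1.5, 0.01), ("P2", -2.0, 0.04), ("P3", 0.5, 0.03), ("P4", -1.2, 0.005)]
    print(call_deps(results))  # (['P1'], ['P2', 'P4'])
```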

Weighted gene co-expression network analysis (WGCNA)

To further investigate protein expression patterns, we performed weighted gene co-expression network analysis with the R package “WGCNA”22. Preprocessing included selecting the top 2000 proteins by median absolute deviation (MAD), checking for missing values, and identifying outlier samples. The soft-thresholding power was then chosen as the smallest power for which the scale-free topology fit index (R²) exceeded 0.85 and the mean connectivity was below 100. The co-expression network was constructed with the one-step method, with minModuleSize and mergeCutHeight set to 30 and 0.25, respectively, and the default unsigned network type. Proteins were clustered into modules by dynamic tree cutting. Finally, Pearson correlation coefficients and p-values were calculated between modules and traits, and Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed with the “clusterProfiler” package23 to identify the functional modules and pathways most relevant to GRE.
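The soft-threshold rule can be sketched as follows: for each candidate power β, the adjacency is |r|^β (unsigned network), each protein's connectivity k is its summed adjacency, and the scale-free fit is the R² of log10 p(k) regressed on log10 k over binned connectivities, signed so that only a decreasing relationship counts. This mirrors the idea behind WGCNA's pickSoftThreshold but is a simplified re-implementation (10 equal-width bins, no truncated-exponential fit), not the package code; the input data are simulated.

```python
import math
import random
import statistics


def abs_correlations(profiles):
    """|Pearson r| for every pair of feature profiles (rows = features)."""
    n = len(profiles)
    cent = []
    for row in profiles:
        m = statistics.fmean(row)
        d = [v - m for v in row]
        cent.append((d, math.sqrt(sum(v * v for v in d))))
    cor = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            (di, si), (dj, sj) = cent[i], cent[j]
            r = sum(a * b for a, b in zip(di, dj)) / (si * sj) if si and sj else 0.0
            cor[i][j] = cor[j][i] = abs(r)
    return cor


def connectivity(abs_cor, power):
    """Unsigned network: a_ij = |r_ij|**power, k_i = sum of a_ij over j != i."""
    return [sum(a ** power for j, a in enumerate(row) if j != i)
            for i, row in enumerate(abs_cor)]


def scale_free_fit(k, n_bins=10):
    """Signed R^2 of log10 p(k) against log10 k after equal-width binning of k."""
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins or 1.0
    bins = [[] for _ in range(n_bins)]
    for v in k:
        bins[min(int((v - lo) / width), n_bins - 1)].append(v)
    xs, ys = [], []
    for b in bins:
        if b and statistics.fmean(b) > 0:
            xs.append(math.log10(statistics.fmean(b)))
            ys.append(math.log10(len(b) / len(k)))
    if len(xs) < 3:
        return 0.0
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    r2 = sxy * sxy / (sxx * syy)
    return r2 if sxy < 0 else -r2  # positive only when p(k) decreases with k


def pick_soft_threshold(profiles, powers=range(1, 21), r2_cut=0.85, max_mean_k=100):
    """Smallest power with fit R^2 > r2_cut and mean connectivity < max_mean_k, else None."""
    abs_cor = abs_correlations(profiles)
    for p in powers:
        k = connectivity(abs_cor, p)
        if scale_free_fit(k) > r2_cut and statistics.fmean(k) < max_mean_k:
            return p
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # simulated data: 30 proteins x 12 samples of pure noise
    profiles = [[rng.gauss(0, 1) for _ in range(12)] for _ in range(30)]
    print(pick_soft_threshold(profiles))  # may be None: noise is rarely scale-free
```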

GRE-associated differentially expressed proteins (GRE-associated DEPs)

The intersection of proteins from the most relevant modules obtained via WGCNA with DEPs yielded GRE-associated DEPs, visualized using a Venn diagram.

Protein–protein interaction (PPI) network construction

To examine interactions among the GRE-associated DEPs, we constructed a PPI network using the STRING database (https://cn.string-db.org/). The list of GRE-associated DEPs was submitted to STRING to retrieve interaction data, and the network was visualized in Cytoscape24 (version 3.10.2). The “CytoHubba” plugin25 was then used to score all proteins and identify the top 20 GRE-associated DEPs under the Degree and Maximal Clique Centrality (MCC) algorithms.
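CytoHubba's Degree score is a node's number of neighbours, while MCC, as described by the cytoHubba authors, sums (|C| − 1)! over all maximal cliques C that contain the node. The sketch below computes both on a hypothetical toy network, enumerating maximal cliques with the Bron–Kerbosch algorithm; giving isolated nodes a score of 0 is an assumption.

```python
from math import factorial


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj):
    """All maximal cliques via Bron-Kerbosch with pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def degree_scores(adj):
    return {v: len(nbrs) for v, nbrs in adj.items()}


def mcc_scores(adj):
    """MCC(v) = sum of (|C| - 1)! over maximal cliques C containing v."""
    scores = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        if len(clique) < 2:  # isolated nodes keep a score of 0 (assumption)
            continue
        for v in clique:
            scores[v] += factorial(len(clique) - 1)
    return scores


if __name__ == "__main__":
    # hypothetical network: a 4-clique A-B-C-D plus a tail D-E-F
    edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
             ("C", "D"), ("D", "E"), ("E", "F")]
    adj = build_adjacency(edges)
    mcc = mcc_scores(adj)
    print(degree_scores(adj))
    print(mcc)  # {'A': 6, 'B': 6, 'C': 6, 'D': 7, 'E': 2, 'F': 1}
    print(sorted(mcc, key=mcc.get, reverse=True)[:20])  # top-ranked hubs
```

Degree rewards many contacts; MCC additionally rewards membership in large, dense cliques, which is why D outranks A, B, and C here.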

Protein functional enrichment analysis

Functional annotation was first performed using databases such as UniProt and NCBI, and functional domains were predicted with InterProScan. GO and KEGG pathway enrichment analyses26 (www.kegg.jp/kegg/kegg1.html) were then performed with the R package “clusterProfiler”23, with significance thresholds of p < 0.05 and fold enrichment > 1.5. Enrichment significance was assessed with Fisher’s exact test, and p-values were adjusted with the Benjamini–Hochberg method to control the false discovery rate. Results were visualized as bar or bubble plots using “ggplot2”27.
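Over-representation of a GO term or KEGG pathway can be tested with the hypergeometric upper tail, which is equivalent to a one-sided Fisher's exact test; fold enrichment compares the observed proportion of term members in the protein list with their proportion in the background. The background and term sizes below are hypothetical, and the resulting p-values would still need Benjamini–Hochberg adjustment across terms.

```python
from math import comb


def overrep_pvalue(k, M, n, N):
    """P(X >= k) for X ~ hypergeometric: N proteins drawn from a background of
    M, of which n are annotated to the term (one-sided Fisher's exact test)."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def fold_enrichment(k, M, n, N):
    """(k / N) / (n / M), computed as k*M / (N*n) to avoid rounding."""
    return k * M / (N * n)


if __name__ == "__main__":
    M, N = 1000, 100  # hypothetical background size and protein-list size
    terms = {"term_1": (15, 50), "term_2": (6, 60), "term_3": (2, 40)}  # (overlap k, term size n)
    for name, (k, n) in terms.items():
        p = overrep_pvalue(k, M, n, N)
        fold = fold_enrichment(k, M, n, N)
        print(f"{name}: p = {p:.2e}, fold = {fold:.2f}, passes = {p < 0.05 and fold > 1.5}")
```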

Lasso regression model

To further screen for key proteins associated with GRE, we used the R packages “glmnet”28 and “caret”29 to construct the Lasso model and select proteins. The phenotype (GRE or GNE) was used as the response variable (y) and protein expression data as the predictor variables (x). A random seed was set for reproducibility, and the createFolds function was used to partition the data into five folds for cross-validation. Within this cross-validation, a grid search over the elastic-net mixing parameter alpha (0 to 1 in steps of 0.1, where alpha = 1 corresponds to the Lasso and alpha = 0 to ridge regression) was used to identify the optimal alpha and lambda values. The final model was then fitted with the parameters that gave the best overall cross-validated performance, and results were visualized with the “ggplot2” package27. The candidate proteins selected by Lasso regression were subjected to further feature selection with machine learning methods.
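The alpha grid spans the elastic-net family, in which alpha = 1 is the Lasso and alpha = 0 is ridge regression. The sketch below implements glmnet-style coordinate descent with soft-thresholding and a cross-validated search over (alpha, lambda). Assumptions: squared-error loss for brevity (for a binary GRE/GNE outcome, glmnet would normally be called with family = "binomial"), plain random folds rather than caret's stratified createFolds, a hypothetical lambda grid, and simulated data.

```python
import math
import random


def standardize(X):
    """Center each column and scale to unit variance (1/n convention, like glmnet)."""
    cols = []
    for col in zip(*X):
        m = sum(col) / len(col)
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        cols.append([(v - m) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*cols)]


def soft_threshold(z, g):
    return math.copysign(max(abs(z) - g, 0.0), z)


def elastic_net(X, y, lam, alpha, n_iter=500, tol=1e-8):
    """Coordinate descent for
    (1/2n)||y - b0 - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    Returns (b0, b)."""
    n = len(X)
    means = [sum(c) / n for c in zip(*X)]
    cols = [[v - m for v in c] for c, m in zip(zip(*X), means)]  # centered columns
    scale = [sum(v * v for v in c) / n for c in cols]
    y_mean = sum(y) / n
    r = [v - y_mean for v in y]  # residuals while every coefficient is 0
    beta = [0.0] * len(cols)
    for _ in range(n_iter):
        biggest = 0.0
        for j, col in enumerate(cols):
            if scale[j] == 0:
                continue  # constant column: coefficient stays 0
            rho = sum(x * ri for x, ri in zip(col, r)) / n + scale[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (scale[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta:
                r = [ri - delta * x for ri, x in zip(r, col)]
                beta[j] = new
                biggest = max(biggest, abs(delta))
        if biggest < tol:
            break
    b0 = y_mean - sum(m * b for m, b in zip(means, beta))
    return b0, beta


def cv_mse(X, y, lam, alpha, k=5, seed=1):
    """k-fold cross-validated mean squared error with plain random folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    err = 0.0
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        b0, beta = elastic_net([X[i] for i in train], [y[i] for i in train], lam, alpha)
        err += sum((y[i] - b0 - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in test)
    return err / len(y)


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]  # simulated proteins
    y = [1.0 if row[0] + 0.5 * rng.gauss(0, 1) > 0 else 0.0 for row in X]  # label driven by feature 0
    Xs = standardize(X)
    grid = [(a / 10, lam) for a in range(11) for lam in (0.02, 0.05, 0.1)]  # hypothetical lambdas
    alpha, lam = min(grid, key=lambda g: cv_mse(Xs, y, g[1], g[0]))
    b0, beta = elastic_net(Xs, y, lam, alpha)
    print(f"alpha={alpha}, lambda={lam}, nonzero: {[j for j, b in enumerate(beta) if b]}")
```

For brevity X is standardized once before cross-validation; a stricter pipeline would standardize within each training fold.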

Recursive feature elimination with support vector machines (SVM-RFE)

We combined Recursive Feature Elimination (RFE) with a Support Vector Machine (SVM) to identify key proteins associated with GRE30,31, assessing model performance through cross-validation. This analysis was implemented with the R packages “caret”29 and “e1071”32. The SVM-RFE algorithm recursively eliminated the least important features under five-fold cross-validation until a specified number of features remained. The retained features were recorded, and error-rate and accuracy curves were plotted to evaluate the feature selection. Finally, an SVM model was constructed with the selected features; feature coefficients were obtained from an SVM with a linear kernel33, and feature importance was visualized with the “ggplot2” package27.
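SVM-RFE ranks features by repeatedly training a linear SVM and discarding the feature with the smallest squared weight. The sketch below uses a simple Pegasos-style subgradient solver in place of the libsvm backend wrapped by e1071, assumes centered features and ±1 labels, and omits the cross-validated choice of how many features to keep (six in this study); the data are simulated.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge
    loss, without a bias term (features assumed centered). y in {-1, +1}."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    w = [0.0] * p
    t = 0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the L2 term
            if margin < 1.0:  # hinge loss active: step toward the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, **svm_kwargs):
    """Rank features best-first by repeatedly dropping the smallest w_j^2."""
    remaining = list(range(len(X[0])))
    ranking = []
    while remaining:
        sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(sub, y, **svm_kwargs)
        worst = min(range(len(remaining)), key=lambda j: w[j] ** 2)
        ranking.insert(0, remaining.pop(worst))  # eliminated last = ranked first
    return ranking


if __name__ == "__main__":
    rng = random.Random(42)
    y = [1 if i % 2 else -1 for i in range(60)]
    # simulated data: feature 0 strongly informative, feature 1 weakly, 2-3 noise
    X = [[1.5 * yi + rng.gauss(0, 1), 0.5 * yi + rng.gauss(0, 1),
          rng.gauss(0, 1), rng.gauss(0, 1)] for yi in y]
    print("features ranked best first:", svm_rfe(X, y))
```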

Random forest (RF) model

To address class imbalance, we used the Synthetic Minority Over-sampling Technique (SMOTE) from the “DMwR” package34 for data augmentation. A Random Forest classification model was then constructed with the “randomForest” package35, using 500 trees (ntree) and, at each split, a number of candidate features (mtry) equal to the square root of the total number of features. Variable importance was assessed by the Mean Decrease in Gini index, and importance bar plots were generated with the “ggplot2” package27. To further interpret the model’s predictions, SHAP (SHapley Additive exPlanations) values were calculated with the “fastshap” package36 to quantify each protein’s contribution to the model output, and SHAP summary plots were created. Finally, the top 10 proteins by variable importance were identified.
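SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The sketch below is a plain re-implementation of that idea, not DMwR's SMOTE function; k = 5, Euclidean distance, and the simulated data are assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples for the minority class.

    Each one lies on the segment between a randomly chosen minority sample and
    one of its k nearest minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    rng = random.Random(1)
    # simulated: 10 minority-class samples (the smaller group) with 3 proteins each
    minority = [[rng.gauss(5, 1) for _ in range(3)] for _ in range(10)]
    extra = smote(minority, n_new=10)
    print(len(minority) + len(extra), "minority samples after oversampling")
```

Because synthetic points are interpolations of real ones, generating them only within training folds keeps them out of the cross-validation test folds.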

Gradient boosting machine (GBM)

To further classify GRE and select key proteins, we used a GBM. To mitigate class imbalance, the data were first augmented with SMOTE from the “DMwR” package34. A hyperparameter grid search for the GBM model was then performed with the “caret” package29, tuning the interaction depth (interaction.depth), number of trees (n.trees), learning rate (shrinkage), and minimum number of observations per terminal node (n.minobsinnode). Five-fold cross-validation was used to determine the optimal parameter combination, with which the final GBM model was trained. Protein importance was assessed by relative influence, and the top 10 proteins were extracted and visualized as horizontal bar charts with the “ggplot2” package27. To further interpret the GBM model, SHAP values were calculated with the “fastshap” package36 to evaluate each protein’s impact on the model output, and SHAP summary plots were generated with “ggplot2”27 to show each protein’s overall contribution to the predictions.
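SHAP values are Shapley values of the model output. The fastshap package estimates them by Monte Carlo sampling; for a handful of features they can be computed exactly by enumerating all feature coalitions, as in the sketch below. Filling absent features from a single baseline point (rather than averaging over a background dataset) and the toy models are simplifying assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature for model(x) relative to model(baseline).

    Features outside a coalition take their baseline value. Cost grows as 2**p,
    so this is only practical for a handful of features."""
    p = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(p)]
        return model(z)

    phi = [0.0] * p
    for j in range(p):
        others = [f for f in range(p) if f != j]
        for size in range(p):  # coalition size without feature j
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


if __name__ == "__main__":
    # hypothetical "model": a fixed linear score plus one interaction term
    def model(z):
        return 3.0 * z[0] - 2.0 * z[1] + 0.5 * z[2] + z[0] * z[2]

    x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
    phi = shapley_values(model, x, base)
    print(phi)
    # efficiency property: contributions add up to model(x) - model(baseline)
    print(sum(phi), model(x) - model(base))
```

The interaction term is split evenly between the two features involved, which is how SHAP summary plots can attribute joint effects to individual proteins.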

Identification and internal validation of GRE signature proteins

First, we intersected the top 20 proteins ranked by the EPC and Degree algorithms, the top 10 proteins by importance in the RF and GBM models, and the key proteins identified by the SVM-RFE model to obtain the GRE signature proteins. The Mann–Whitney U test was then used to compare expression levels of the GRE signature proteins between the two groups, with p < 0.05 considered statistically significant. Next, their diagnostic performance was evaluated with the “ggplot2”27 and “pROC”37 packages. Finally, a multivariable model incorporating the GRE signature proteins and clinical factors was built with the “rms” package38 to assess the value of the signature proteins for predicting GRE at the individual level.
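The integration step amounts to a set intersection across the candidate lists, and the ROC AUC of a single protein equals the Mann–Whitney U statistic divided by the number of case-control pairs. The sketch below illustrates both with hypothetical protein names and expression values; for a marker that is lower in GRE, the GNE group is passed as the group expected to be higher.

```python
from functools import reduce


def intersect_all(*candidate_lists):
    """Proteins present in every candidate list, sorted for stable output."""
    return sorted(reduce(lambda a, b: a & b, map(set, candidate_lists)))


def mann_whitney_u(x, y):
    """U statistic for x versus y: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def auc(higher_group, lower_group):
    """ROC AUC for separating the groups, assuming higher_group tends to be larger."""
    return mann_whitney_u(higher_group, lower_group) / (len(higher_group) * len(lower_group))


if __name__ == "__main__":
    # hypothetical candidate lists from the different selection methods
    hubs = ["PROT_A", "PROT_B", "PROT_C", "PROT_D"]
    rf_top = ["PROT_B", "PROT_C", "PROT_E"]
    gbm_top = ["PROT_C", "PROT_B", "PROT_F"]
    svm_rfe_keep = ["PROT_B", "PROT_C"]
    print(intersect_all(hubs, rf_top, gbm_top, svm_rfe_keep))  # ['PROT_B', 'PROT_C']

    # hypothetical expression of a marker that is lower in GRE than in GNE
    gne = [5.1, 4.8, 5.6, 5.0]
    gre = [4.2, 4.9, 3.8, 4.0, 4.4]
    print(f"AUC = {auc(gne, gre):.3f}")
```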

External validation using bulk transcriptomic data

Bulk transcriptomic data for GRE were obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo). The search strategy was: (1) keyword search for "glioma-related epilepsy"; (2) study type "Expression profiling by array"; (3) samples derived from Homo sapiens; and (4) datasets containing control samples. Among the datasets meeting these criteria, the one with the largest sample size was selected. Dataset GSE199759 comprised 9 GRE samples and 16 GNE samples, with mRNA expression profiled on the GPL19072 platform.

Preliminary analysis of the transcriptomic dataset was conducted in R (version 4.3.1). Before analysis, the data were cleaned, including log2 transformation and between-array normalization with the “normalizeBetweenArrays” function. Differentially expressed genes (DEGs) were identified with the “limma” package21, using |log2 fold change| > 1 and adjusted p-value < 0.05 as thresholds. Expression levels of the GRE signature genes were compared with the Mann–Whitney U test, with p < 0.05 indicating statistical significance, and their diagnostic performance was evaluated with “ggplot2”27 and “pROC”37.
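For single-channel expression matrices, limma's normalizeBetweenArrays is documented to default to quantile normalization, which forces every sample to share the same value distribution. The sketch below shows log2 transformation followed by a simple quantile normalization; the pseudocount of 1, the handling of ties (no rank averaging), and the toy matrix are assumptions.

```python
import math


def log2_transform(columns, pseudocount=1.0):
    """log2(x + pseudocount) for every value; the pseudocount is an assumption."""
    return [[math.log2(v + pseudocount) for v in col] for col in columns]


def quantile_normalize(columns):
    """columns: one list per sample, all the same length (same genes, same order).

    Each value is replaced by the mean, across samples, of the values holding
    the same rank, so every sample ends up with an identical distribution.
    Ties simply keep their sort order (no rank averaging)."""
    n_genes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[r] for col in sorted_cols) / len(columns) for r in range(n_genes)]
    normalized = []
    for col in columns:
        order = sorted(range(n_genes), key=col.__getitem__)
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    raw = [[31.0, 3.0, 7.0], [15.0, 1.0, 63.0]]  # hypothetical: 2 samples x 3 genes
    print(quantile_normalize(log2_transform(raw)))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```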

Effects of GRE signature genes on gliomas

To further explore the expression profiles and prognostic implications of the GRE signature genes in gliomas, we used data from the Chinese Glioma Genome Atlas (CGGA)39 (https://www.cgga.org.cn/) and The Cancer Genome Atlas Program (TCGA)40 (https://www.cancer.gov/ccg/research/genome-sequencing/tcga). The CGGA contains comprehensive genomic and clinical data from Chinese cohorts, enabling detailed analyses of gene expression patterns and patient outcomes, whereas TCGA provides large-scale multi-omic datasets across many cancer types, including gliomas, with robust clinical annotations.

Statistical analysis

Statistical analyses for this study were performed using R software version 4.3.1. The Mann-Whitney U test was utilized to compare protein or mRNA expression levels, with p < 0.05 signifying statistical significance.

Results

General information

A total of 20 GRE patients and 10 GNE patients were included in the proteomic study; their baseline clinical characteristics are presented in Table 1.

Table 1.

Baseline clinical characteristics of all patients in proteomic studies.

Characteristic | Category | GNE (N = 10) | GRE (N = 20) | p-value
Gender, n (%) | Female | 5 (50.0) | 9 (45.0) | 1
 | Male | 5 (50.0) | 11 (55.0) |
Age, mean (± SD) | | 40.40 (± 10.45) | 39.55 (± 10.81) | 0.241
Side, n (%) | Left | 3 (30.0) | 11 (55.0) | 0.365
 | Right | 7 (70.0) | 9 (45.0) |
Site, n (%) | Frontal lobe | 6 (60.0) | 10 (50.0) | 0.897
 | Temporal lobe | 4 (40.0) | 10 (50.0) |
Pathology, n (%) | Diffuse astrocytoma | 5 (50.0) | 12 (60.0) | 0.896
 | Oligodendroglioma | 5 (50.0) | 8 (40.0) |
WHO grade, n (%) | Grade 2 | 9 (90.0) | 15 (75.0) | 0.628
 | Grade 3 | 1 (10.0) | 5 (25.0) |

GRE, glioma-related epilepsy. GNE, glioma without epilepsy.

Differential protein expression analysis

A total of 6,784 proteins were identified by TMT labeling-based quantification (Sum PEP Score < 0.01, FDR < 1%). All proteomic samples met the quality control and reproducibility criteria and were therefore included in the study (Fig. 2A-B). Using the defined thresholds, 1,376 differentially expressed proteins were identified, of which 740 were upregulated and 636 downregulated (Fig. 2C-D). GO enrichment analysis of these DEPs revealed significant involvement in biological processes such as oxidative phosphorylation, mitochondrial electron transport from NADH to ubiquinone, and regulation of neuron projection development. KEGG pathway enrichment primarily highlighted pathways of neurodegeneration (multiple diseases), dopaminergic synapse, and cholinergic synapse (Fig. 2E).

Fig. 2.

Differential Protein Expression Analysis between GRE and GNE Proteomes. (A-B) Quality control results of proteomic expression data. (C-D) Identification of 1,376 DEPs through differential expression analysis. (E) Bar chart illustrating the GO and KEGG enrichment analysis results of DEPs. GRE, glioma-related epilepsy. GNE, glioma without epilepsy. DEPs, differentially expressed proteins.

WGCNA

Prior to analysis, quality control was performed, and no obvious outlier samples were identified (Fig. 3A). With the R² threshold set to 0.85, the optimal soft-thresholding power was 4 (Fig. 3B). A weighted co-expression network was constructed with the one-step approach, yielding seven modules; the proteins in each module are listed in Supplementary Table 1. Pearson correlation analysis showed that the Brown module (r = 0.98, p = 6 × 10⁻²⁰) was most strongly associated with GRE (Fig. 3C). The Brown module contains 1,765 proteins (Supplementary Table 1), some of which, such as SLC1A2, SYN1, and GRM2, have been reported to be associated with GRE, supporting the relevance of this module. Functional enrichment analysis of the Brown module proteins (Fig. 3D) identified GO biological processes such as modulation of chemical synaptic transmission, regulation of synaptic plasticity, and glutamatergic synaptic transmission, and KEGG pathways including synaptic vesicle cycle, glutamatergic synapse, and cholinergic synapse. We therefore considered the Brown module to be the protein set most relevant to GRE.

Fig. 3.

WGCNA of GRE and GNE Proteomic Data. (A) Quality control before analysis revealed no obvious outlier samples. (B) Optimal soft-thresholding power selected with R² set to 0.85. (C) Heatmap displaying Pearson correlations between different modules and various traits. (D) Bar chart showing functional enrichment results of proteins within the Brown module. GRE, glioma-related epilepsy. GNE, glioma without epilepsy. WGCNA, Weighted Gene Co-expression Network Analysis.

PPI construction

First, the proteins in the Brown module were intersected with the DEPs, yielding 125 GRE-related DEPs (Fig. 4A). These were imported into the STRING database to construct a PPI network comprising 123 nodes and 142 edges, with a PPI enrichment p-value of 4.33 × 10⁻⁶ (Fig. 4B). Functional enrichment analysis of the GRE-related DEPs (Fig. 4C) highlighted pathways including oxidative phosphorylation, glutamatergic synapse, and dopaminergic synapse. Finally, the CytoHubba plugin was used to score the proteins, and the top 20 proteins under the Degree and EPC algorithms were identified (Supplementary Table 2).

Fig. 4.

Construction of the PPI Network for GRE-Related DEPs. (A) Identification of GRE-related DEPs. (B) PPI network of GRE-related DEPs constructed using the STRING database. (C) Bar chart illustrating the functional enrichment analysis results of GRE-related DEPs. GRE, glioma-related epilepsy. GNE, glioma without epilepsy. DEPs, Differentially expressed proteins. PPI, Protein-Protein Interaction.

Lasso model

To further extract important protein features, we applied Lasso regression to the GRE-related DEPs. The optimal point on the cross-validation curve corresponded to an intermediate regularization parameter (alpha value) (Fig. 5A-B), guarding against both overfitting and underfitting. This approach yielded 20 key protein features, whose importance coefficients are shown in Fig. 5C.

Fig. 5.

Selection of Key Proteins for GRE Using the Lasso Regression Model. (A) Lasso coefficient trajectory plot. (B) Lasso cross-validation curve. (C) Ranking of important proteins and their coefficients. GRE, glioma-related epilepsy. Lasso, Least Absolute Shrinkage and Selection Operator.

Machine learning

First, an RF model was fitted to the protein expression data. The RF model assessed the importance of each protein by the Mean Decrease in Gini index, indicating each protein’s relative contribution (Fig. 6A). SHAP analysis was then used to interpret the model’s predictions by quantifying the impact of individual proteins on the model output (Fig. 6B).

Fig. 6.

Machine Learning Selection of Key GRE Proteins. (A) Ranking of protein importance under the RF model. (B) SHAP value summary plot under the RF model, illustrating the differential contributions of proteins to model predictions. (C) Ranking of protein importance under the GBM model. (D) SHAP value summary plot under the GBM model, illustrating the differential contributions of proteins to model predictions. (E-F) Curves showing that when the number of features is reduced to six, the model’s error rate reaches its lowest point and its accuracy reaches its highest point. (G) Bar chart displaying the feature coefficients of six key GRE proteins under the SVM-RFE model.

Next, a grid search was performed to optimize the GBM hyperparameters; the optimal combination was an interaction depth of 5, 200 trees, a learning rate of 0.01, and a minimum of 10 observations per node. The importance of each protein feature was then extracted from the optimized model (Fig. 6C). To examine the interpretability of the model’s predictions, SHAP analysis was used to evaluate the contribution of each protein to the model output; Fig. 6D shows the impact of each protein on the predictions and its distribution.

Next, the SVM model was trained iteratively, progressively eliminating the least important protein features, which ultimately identified a subset of GRE proteins critical to classification performance. When the number of features was reduced to six, the error rate reached its minimum and the accuracy peaked (Figs. 6E-F). The ranked importance of these six key GRE proteins is shown in Fig. 6G.

GRE signature proteins

We intersected the top 20 proteins ranked by the EPC and Degree algorithms, the top 10 proteins by importance in the RF and GBM models, and the key proteins identified by the SVM-RFE model (Supplementary Table 2). This integrative analysis identified two GRE signature proteins, LY6H and GRM3 (Fig. 7A). ROC curve analysis indicated good diagnostic performance for LY6H (AUC = 0.850) and GRM3 (AUC = 0.795) (Fig. 7B). Comparison of expression levels showed that LY6H and GRM3 were significantly downregulated in GRE (Fig. 7C). We then used a multivariable model to assess the contribution of the GRE signature proteins and various clinical factors to GRE; the results indicated that the GRE signature proteins added value for individual-level risk assessment (Fig. 7D). Finally, functional enrichment analyses were conducted separately for LY6H and GRM3 (Fig. 7E-F). LY6H was primarily enriched in vesicle-mediated transport in synapses, regulation of synaptic plasticity, and signal release from synapse, among other processes, whereas GRM3 was primarily enriched in signal release from synapse, exocytosis, and neurotransmitter transport, among other processes.

Fig. 7.

Acquisition and Internal Validation of GRE Signature Proteins. (A) Venn diagram illustrating the acquisition of GRE signature proteins (LY6H and GRM3). (B) ROC curves demonstrating the excellent diagnostic performance of the two GRE signature proteins. (C) Violin plots showing the expression differences of the two GRE signature proteins. (D) Nomogram illustrating the benefits of using GRE signature proteins for individual-level GRE risk prediction. (E-F) Bubble plots displaying the functional enrichment results of LY6H and GRM3.

External validation using transcriptomic data

Quality control of dataset GSE199759 revealed no obvious outlier samples (Fig. 8A). Differential expression analysis identified 381 differentially expressed genes (Fig. 8B). The Mann–Whitney U test showed that LY6H and GRM3 mRNA expression was significantly downregulated in GRE patients (Fig. 8C), and ROC curve analysis indicated excellent diagnostic performance for GRM3 (AUC = 0.903) and LY6H (AUC = 0.896) (Fig. 8D).

Fig. 8.

External Validation of GRE Signature Genes Using Transcriptomic Data. (A) Quality control results of dataset GSE199759 were satisfactory. (B) Differential expression analysis identified 381 differentially expressed genes. (C) Violin plots showing significant downregulation of LY6H and GRM3 mRNA expression in GRE. (D) ROC curves demonstrating the diagnostic performance of GRE signature genes.

Impact of GRE signature genes on glioma

We analyzed the mRNAseq_693 dataset from the CGGA database. The expression levels of LY6H and GRM3 decreased significantly with increasing pathological grade, and lower LY6H and GRM3 expression was significantly associated with shorter overall survival in glioma patients. Similar findings were observed in the TCGA dataset (Supplementary Figure S1). These results suggest that LY6H and GRM3 may be promising targets for glioma-specific therapies.
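Survival associations of this kind are commonly visualized with Kaplan-Meier curves that compare high- and low-expression groups. The source does not state how patients were grouped, so the median split below is an assumption. The toy survival times are hypothetical. At each distinct event time, the estimator multiplies the running survival by the fraction of at-risk patients who did not have the event. Censored patients leave the risk set without contributing an event.

```python
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: returns [(t, S(t))] at each distinct event time.

    events[i] is 1 for death (event) and 0 for censoring.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every subject observed at this time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # events and censored subjects both leave the risk set
    return curve


def split_by_median(expr, times, events):
    """Hypothetical grouping rule: high = expression >= median, low = below median."""
    cut = median(expr)
    high = [(t, e) for x, t, e in zip(expr, times, events) if x >= cut]
    low = [(t, e) for x, t, e in zip(expr, times, events) if x < cut]
    return high, low


# Hypothetical toy cohort (months of follow-up; 0 = censored)
expr = [5, 6, 7, 8, 1, 2, 3, 4]
times = [10, 12, 15, 20, 3, 4, 6, 8]
events = [1, 0, 1, 1, 1, 1, 1, 1]
high, low = split_by_median(expr, times, events)
km_high = kaplan_meier(*zip(*high))
km_low = kaplan_meier(*zip(*low))
print(km_high)  # [(10, 0.75), (15, 0.375), (20, 0.0)]
```

In practice, the two curves would then be compared with a log-rank test or a Cox model, neither of which is shown here.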

Discussion

This study used LC-MS/MS to compare the proteomic profiles of GRE and GNE patients. By integrating bioinformatics and machine learning methods, we systematically identified signature proteins associated with GRE. We then validated these findings in an independent external dataset, which further supports the reliability of our results. These findings improve our understanding of the molecular profile of GRE and provide candidate biomarkers for its clinical diagnosis and treatment.

Through this multi-step analysis, we identified LY6H and GRM3 as signature proteins of GRE. First, GRM3 encodes metabotropic glutamate receptor 3 (mGluR3), which is localized at presynaptic axon terminals and inhibits neurotransmitter release41. A meta-analysis integrating four large-scale genome-wide association studies of epilepsy, encompassing 800,869 participants, identified GRM3 as a potential epilepsy risk gene42. Further studies have suggested that modulating or activating mGluR may be a promising therapeutic strategy for epilepsy43–45. However, some studies indicate that GRM3 expression alone is insufficient to distinguish individuals with and without epileptic seizures46. We hypothesize that this discrepancy may reflect selection bias arising from the inclusion of infratentorial lesions in those studies. Notably, mGluR3 expression in gliomas is higher than in most other solid tumors, including lung, colon, and breast cancers, and is negatively correlated with patient survival47. In our study, both GRM3 mRNA and protein expression levels were significantly reduced in GRE patients, and reduced GRM3 expression was also associated with poorer glioma prognosis. Taken together, this evidence suggests that GRM3 is a viable biomarker for GRE, although its precise mechanism of action remains to be elucidated.

Second, LY6H belongs to the lymphocyte antigen 6 (LY6) gene family. In recent years, LY6H has attracted increasing attention for its roles in cancer development, stem cell maintenance, and immune regulation, as well as its association with more aggressive and treatment-resistant cancers. LY6H has been reported to be significantly upregulated in gliomas and closely linked to patient prognosis48. Structurally, LY6H retains a characteristic three-finger motif required for binding to α7 nicotinic acetylcholine receptors (α7 nAChR)49. Through this binding, it inhibits nicotinic acetylcholine receptor currents in the central nervous system and promotes epileptogenic processes50. Notably, in animal models of acquired epilepsy, LY6H knockout has been observed to increase the amplitude of miniature excitatory postsynaptic currents (mEPSCs) in glutamatergic neurons, a change closely linked to remodeling processes within the nervous system51. Our study revealed significant downregulation of LY6H mRNA and protein expression in GRE, consistent with findings from previous research. However, the precise role of LY6H in the epileptogenic mechanisms of GRE requires further investigation. Although the pathways through which reduced LY6H and GRM3 expression might trigger GRE remain unclear, our results support an important association between these two proteins and GRE.

Furthermore, LY6H and GRM3 show substantial potential for clinical application as diagnostic biomarkers for GRE. They achieved AUC values of 0.850 and 0.795, respectively, in ROC curve analyses, indicating good diagnostic accuracy. Validation in the bulk RNA-seq dataset further supported their stability and generalizability. Clinically, these biomarkers could be measured with routine RNA assays, which are cost-effective and straightforward and would facilitate widespread adoption. Combining these markers with other clinical factors could also yield more precise diagnostic models that support early identification of high-risk individuals and personalized treatment planning. Moreover, their underlying biological functions may present new targets for future therapeutic interventions. Overall, given their diagnostic performance and clinical feasibility, LY6H and GRM3 hold promise as tools to help clinicians promptly identify and manage GRE patients. This may ultimately improve patient outcomes and advance precision medicine.

The primary strength of this study lies in its integration of multiple approaches, including proteomics, high-throughput data analysis, and machine learning models, to systematically uncover potential pathogenic molecules associated with GRE. The findings point to roles for LY6H and GRM3 in both glioma progression and tumor-associated epileptic seizures, suggesting that targeting these proteins may offer a promising “one-drug-multiple-targets” therapeutic strategy. Nevertheless, certain limitations should be acknowledged. First, the relatively small sample size may limit the generalizability of the results, and future studies with larger cohorts are needed to further validate the stability and generalizability of these signature proteins. That said, the current sample size was rigorously assessed and deemed to provide sufficient statistical power. Second, although the association of LY6H and GRM3 with GRE is supported by proteomic analyses and external validation using bulk RNA sequencing, their precise biological functions and underlying mechanisms remain to be fully elucidated. Future research should delineate their specific roles in the initiation and progression of GRE through functional experiments, such as gene knockout or overexpression analyses.

Conclusion

In this study, we combined bioinformatics analyses and machine learning methods to identify LY6H and GRM3 as signature proteins of GRE and validated their good diagnostic performance. This discovery provides new candidate biomarkers for the early diagnosis of GRE. It also offers insights into the molecular mechanisms underlying GRE and the development of novel therapeutic strategies. Future research should further validate these findings and thoroughly explore their potential clinical applications.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1 (649.5KB, jpg)
Supplementary Material 2 (517.3KB, docx)
Supplementary Material 3 (38.5KB, xlsx)

Author contributions

Z.Z. and J.H. contributed equally as co-first authors. C.L. and R.L. contributed equally as co-corresponding authors. Z.Z.: Conceptualization, software development, formal analysis, validation, and drafting of the manuscript. J.H.: Study conceptualization, formal analysis, project administration, and manuscript drafting. C.L.: Study design, funding acquisition, and manuscript review and editing. R.L.: Funding acquisition and manuscript quality control. All authors reviewed and approved the final manuscript.

Funding

This work was supported by the Joint Funds for the Innovation of Science and Technology, Fujian Province [grant number 2019Y9052] and the Joint Funds for the Innovation of Science and Technology, Fujian Province [grant number 2023Y9198].

Data availability

The datasets used and analyzed during the current study are available from the following sources: GSE199759, a bulk RNA sequencing dataset available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE199759. The proteomic data used in this study are available from the corresponding author upon reasonable request. We will provide relevant metadata and documentation to facilitate the proper use of these data.

Declarations

Ethics approval and consent to participate

This study was conducted in accordance with ethical guidelines and regulations, including the Declaration of Helsinki. Ethical approval for the study was obtained from the Ethics Committee of Fujian Medical University Union Hospital (Approval No.: 2023 KY163). Written informed consent was obtained from all participants and/or their legal guardians before participation. Participants were fully informed of the study objectives, procedures, risks, and benefits, and were assured of their right to withdraw from the study at any time without any consequences.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Zhang Zhenpan and Huang Jianhuang: Co-first authors.

Contributor Information

Caihou Lin, Email: grouplin@fjmu.edu.cn.

Risheng Liang, Email: doctorlr123@126.com.

References

  • 1. Xu, S. et al. Immunotherapy for glioma: current management and future application. Cancer Lett. 476, 1–12 (2020).
  • 2. Louis, D. N. et al. The 2021 WHO classification of tumors of the central nervous system: a summary. Neuro Oncol. 23(8), 1231–1251 (2021).
  • 3. You, G. & Sha, Z. Clinical diagnosis and perioperative management of Glioma-Related epilepsy. Front. Oncol. 10, 550353 (2020).
  • 4. Pauletto, G. et al. Pre- and Post-surgical poor seizure control as hallmark of malignant progression in patients with glioma? Front. Neurol. 13, 890857 (2022).
  • 5. Yogendran, L. & Rudolf, M. Navigating disability insurance in the American healthcare system for the low-grade glioma patient. Neurooncol. Pract. 10(1), 5–12 (2023).
  • 6. Mazzucchi, E. et al. The persistence of seizures after tumor resection negatively affects survival in low-grade glioma patients: a clinical retrospective study. J. Neurol. 269(5), 2627–2633 (2022).
  • 7. Santos-Pinheiro, F. et al. Seizure burden pre- and postresection of low-grade gliomas as a predictor of tumor progression in low-grade gliomas. Neurooncol. Pract. 6(3), 209–217 (2019).
  • 8. Buckingham, S. C. et al. Glutamate release by primary brain tumors induces epileptic activity. Nat. Med. 17(10), 1269–1274 (2011).
  • 9. Saviuk, M. et al. Unexplained causes of Glioma-Associated epilepsies: a review of theories and an area for research. Cancers (Basel) 15(23) (2023).
  • 10. Takayasu, T. et al. Ion channels and their role in the pathophysiology of gliomas. Mol. Cancer Ther. 19(10), 1959–1969 (2020).
  • 11. Avila, E. K. et al. Brain tumor-related epilepsy management: a Society for Neuro-Oncology (SNO) consensus review on current management. Neuro Oncol. 26(1), 7–24 (2024).
  • 12. van Dellen, E. et al. MEG network differences between low- and high-grade glioma related to epilepsy and cognition. PLoS One 7(11), e50122 (2012).
  • 13. Shen, S. et al. Correlation of preoperative seizures with a wide range of tumor molecular markers in gliomas: an analysis of 442 glioma patients from China. Epilepsy Res. 166, 106430 (2020).
  • 14. Feyissa, A. M. et al. Potential influence of IDH1 mutation and MGMT gene promoter methylation on glioma-related preoperative seizures and postoperative seizure control. Seizure 69, 283–289 (2019).
  • 15. Bludau, I. Proteomic and interactomic insights into the molecular basis of cell functional diversity. Nat. Rev. Mol. Cell Biol. 21(6), 327–340 (2020).
  • 16. Zhang, Z. et al. High-throughput proteomics. Annu. Rev. Anal. Chem. (Palo Alto Calif.) 7, 427–454 (2014).
  • 17. Park, M. et al. The role of extracellular vesicles in optic nerve injury: neuroprotection and mitochondrial homeostasis. Cells 11(23) (2022).
  • 18. Li, Y. et al. Proteomic characterization of gastric cancer response to chemotherapy and targeted therapy reveals new therapeutic strategies. Nat. Commun. 13(1), 5723 (2022).
  • 19. Sejda, A. & Grajkowska, W. WHO CNS5 2021 classification of gliomas: a practical review and road signs for diagnosing pathologists and proper patho-clinical and neuro-oncological cooperation. Folia Neuropathol. 60(2), 137–152 (2022).
  • 20. Lou, R. et al. Benchmarking commonly used software suites and analysis workflows for DIA proteomics and phosphoproteomics. Nat. Commun. 14(1), 94 (2023).
  • 21. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43(7), e47 (2015).
  • 22. Langfelder, P. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 9, 559 (2008).
  • 23. Wu, T. et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innov. (Camb.) 2(3), 100141 (2021).
  • 24. Majeed, A. Protein-protein interaction network exploration using Cytoscape. Methods Mol. Biol. 2690, 419–427 (2023).
  • 25. Chin, C. H. et al. cytoHubba: identifying hub objects and sub-networks from complex interactome. BMC Syst. Biol. 8(Suppl 4), S11 (2014).
  • 26. Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28(1), 27–30 (2000).
  • 27. Villanueva, R. A. M. & Chen, Z. J. ggplot2: Elegant Graphics for Data Analysis (Taylor & Francis, 2019).
  • 28. Tay, J. K., Narasimhan, B. & Hastie, T. Elastic net regularization paths for all generalized linear models. J. Stat. Softw. 106 (2023).
  • 29. Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 28, 1–26 (2008).
  • 30. Guyon, I. An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003).
  • 31. Guyon, I. & Weston, J. Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389–422 (2002).
  • 32. Dimitriadou, E. et al. Package ‘e1071’. R software package, available at http://cran.r-project.org/web/packages/e1071/index.html (2009).
  • 33. Weston, J. et al. Use of the zero norm with linear models and kernel methods. J. Mach. Learn. Res. 3, 1439–1461 (2003).
  • 34. Torgo, L. Package ‘DMwR’. Comprehensive R Archive Network (2013).
  • 35. Biau, G. & Scornet, E. A random forest guided tour. Test 25, 197–227 (2016).
  • 36. Jethani, N. et al. FastSHAP: real-time Shapley value estimation. In Proceedings of the International Conference on Learning Representations (2021).
  • 37. Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform. 12, 77 (2011).
  • 38. Harrell, F. E. Regression modeling strategies. R package version 6.2-0 (2012).
  • 39. Zhao, Z. et al. Chinese glioma genome atlas (CGGA): a comprehensive resource with functional genomic data from Chinese glioma patients. Genomics Proteom. Bioinf. 19(1), 1–12 (2021).
  • 40. Weinstein, J. N. et al. The cancer genome atlas Pan-Cancer analysis project. Nat. Genet. 45(10), 1113–1120 (2013).
  • 41. Dogra, S. et al. Activating mGlu(3) metabotropic glutamate receptors rescues Schizophrenia-like cognitive deficits through metaplastic adaptations within the hippocampus. Biol. Psychiatry 90(6), 385–398 (2021).
  • 42. Song, M. et al. Genome-wide meta-analysis identifies two novel risk loci for epilepsy. Front. Neurosci. 15, 722592 (2021).
  • 43. Monn, J. A. et al. Synthesis and pharmacological characterization of C4(β)-amide-substituted 2-aminobicyclo[3.1.0]hexane-2,6-dicarboxylates. Identification of (1S,2S,4S,5R,6S)-2-amino-4-[(3-methoxybenzoyl)amino]bicyclo[3.1.0]hexane-2,6-dicarboxylic acid (LY2794193), a highly potent and selective mGlu(3) receptor agonist. J. Med. Chem. 61(6), 2303–2328 (2018).
  • 44. Celli, R. et al. mGlu3 metabotropic glutamate receptors as a target for the treatment of absence epilepsy: preclinical and human genetics data. Curr. Neuropharmacol. 21(1), 105–118 (2023).
  • 45. Peterson, A. R. & Binder, D. K. Astrocyte glutamate uptake and signaling as novel targets for antiepileptogenic therapy. Front. Neurol. 11, 1006 (2020).
  • 46. Lange, F. et al. A glutamatergic biomarker panel enables differentiating grade 4 gliomas/astrocytomas from brain metastases. Front. Oncol. 14, 1335401 (2024).
  • 47. Wirsching, H. G. et al. Negative allosteric modulators of metabotropic glutamate receptor 3 target the stem-like phenotype of glioblastoma. Mol. Ther. Oncolytics 20, 166–174 (2021).
  • 48. Qin, H. et al. Pan-cancer analysis suggests that LY6H is a potential biomarker of diagnosis, immunoinfiltration, and prognosis. J. Cancer 15(17), 5515–5539 (2024).
  • 49. Fruchart-Gaillard, C. et al. Experimentally based model of a complex between a snake toxin and the alpha 7 nicotinic receptor. Proc. Natl. Acad. Sci. USA 99(5), 3216–3221 (2002).
  • 50. Becchetti, A. et al. Nicotinic acetylcholine receptors and epilepsy. Pharmacol. Res. 189, 106698 (2023).
  • 51. Puddifoot, C. A. et al. Ly6h regulates trafficking of alpha7 nicotinic acetylcholine receptors and nicotine-induced potentiation of glutamatergic signaling. J. Neurosci. 35(8), 3420–3430 (2015).
