Summary
The clinical diagnosis of epilepsy is predominantly based on history taking, morbidity records, and imaging during seizures. The emergence of proteomics has enhanced disease marker detection and potential drug target identification. We perform a longitudinal survival analysis of 2,920 plasma proteins and epilepsy onset, utilizing plasma proteome data from 52,372 UK Biobank participants (440 incident cases). We identify 103 proteins with significant associations with epilepsy, with neurofilament light polypeptide (NEFL) (hazard ratio [HR] [95% confidence interval (CI)]: 2.13 [1.85–2.46]) and growth differentiation factor 15 (GDF15) (1.82 [1.60–2.07]) exhibiting the strongest associations. Enrichment and network analyses uncovered the pivotal role of the immune response and pinpointed four central hubs. Furthermore, the 103 screened proteins are significantly associated with brain regions implicated in epileptogenesis and show stronger correlation with stress-related events than with genetic predisposition. We investigate the predictive ability of top-ranked proteins for future epilepsy risk and their potential as drug targets. These findings are crucial for identifying early biomarkers and optimizing therapeutic strategies.
Keywords: epilepsy, plasma proteins, machine learning, prediction
Highlights
• Multiple plasma protein levels are associated with the risk of developing epilepsy
• Abnormal trajectory changes in epilepsy-associated protein levels before disease onset
• Pathway analyses highlight the pivotal role of the immune response
• Plasma proteins are predictive of future epilepsy risk to some extent
Zhang et al. utilize large-scale plasma proteomic data to investigate the association between 2,920 plasma proteins and the risk of incident epilepsy. They characterize the temporal trajectories of epilepsy-associated protein levels preceding disease onset and perform clustering analysis. Ultimately, they develop a protein-based predictive model for assessing future epilepsy risk.
Introduction
Epilepsy, a common spectrum disorder with multiple risk factors and strong genetic predispositions,1,2 is usually accompanied by the presence of at least one comorbidity.3 The etiology of epilepsy includes inherited genetic mutations, structural lesions, infections, metabolic derangements, immune abnormalities, and other unknown factors.4 A variety of psychiatric disorders5 (e.g., anxiety and depression) and somatic symptom disorders6 (e.g., type 1 diabetes and arthritis) have been strongly associated with epilepsy. Until now, the clinical diagnosis of epilepsy has mostly relied on meticulous history taking, reliable morbidity records, and imaging during seizures, as there is no readily available gold standard. Although the efficacy of anti-seizure medication in treating epileptic symptoms is clear,7,8 there is still a lack of cause-specific treatment options. The most widely used and readily available clinical analyses are blood tests, and most therapeutic decisions are based on blood biomarkers, with protein being the most common analyte in clinical laboratories.9 Moreover, proteomic approaches are promising in successfully identifying biomarkers with comparable efficacy, particularly for certain diseases.9
Most previous studies in epilepsy proteomics have primarily focused on brain protein alterations in animal models and human brain tissue, mostly obtained from postmortem autopsies and surgical specimens.10,11,12,13,14,15 However, studies relying on human brain tissue for proteomic analysis10,11,13 have been constrained by the limited availability of samples and the difficulty of acquiring them, the analysis of single regions, cross-sectional designs, or insufficiently sensitive methods. Large-scale plasma proteomic studies offer a relatively comprehensive approach to overcoming these limitations. They empower researchers to discern protein expression variations among individuals and to elucidate the effects of environmental, lifestyle, and genetic factors on disease pathogenesis.16 Our study bridges the gap in research on plasma proteomic profiling of epilepsy risk in adults using large-scale plasma proteomic data. Prospective analysis of plasma protein dynamics in a longitudinal cohort may help elucidate the underlying pathological processes preceding the clinical onset of epilepsy.
Using proteomic data from the UK Biobank (UKB), we performed longitudinal survival analyses to identify associations between 2,920 plasma protein levels and incident epilepsy risk followed over 17 years. Then, we selected proteins significantly associated with epilepsy and showed the trajectory changes of these screened proteins over the 15 years before epilepsy diagnosis. Furthermore, we correlated the epilepsy-associated proteins with neuroimaging data, polygenic risk scores (PRSs) for epilepsy, and environmental variables and searched for potential biological pathways and drug targets. Finally, we examined the contribution of these proteins to predict future epilepsy risk and the predictive ability of the highly ranked proteins.
Results
Characteristics of the study population
In the current study, we included a total of 52,372 individuals who had proteomic data and were not diagnosed with epilepsy at baseline. Over a mean follow-up period of 13.90 years (SD = 2.50), 440 individuals were diagnosed with incident epilepsy (Table 1). In addition, of the 2,923 proteomic biomarkers tested, 2,920 proteins were retained for our analysis after quality control (STAR Methods).
Table 1.
Demographic characteristics of participants included in the study
| | Overall | Incident epilepsy | Healthy controls | p values |
|---|---|---|---|---|
| N | 52,372 | 440 | 51,932 | |
| Age (mean, SD) | 56.81 (8.21) | 59.15 (7.64) | 56.79 (8.21) | <0.001 |
| Female, n (%) | 28,275 (54.0%) | 226 (51.4%) | 28,049 (54.0%) | 0.289 |
| BMI (mean, SD) | 27.46 (4.80) | 27.80 (5.08) | 27.46 (4.79) | 0.165 |
| Ethnicity, white, n (%) | 48,830 (93.2%) | 411 (93.4%) | 48,419 (93.2%) | 0.348 |
| Education level, n (%) | | | | <0.001 |
| College degree | 24,363 (46.5%) | 154 (35.0%) | 24,209 (46.6%) | |
| Other degree | 27,121 (51.8%) | 273 (62.0%) | 26,848 (51.7%) | |
| Smoking status, n (%) | | | | 0.001 |
| Current | 5,512 (10.5%) | 64 (14.5%) | 5,448 (10.5%) | |
| Previous | 18,274 (34.9%) | 169 (38.4%) | 18,105 (34.9%) | |
| Never | 28,336 (54.1%) | 205 (46.6%) | 28,131 (54.2%) | |
| Alcohol drinker status, n (%) | | | | 0.055 |
| Current | 47,778 (91.2%) | 385 (87.5%) | 47,393 (91.3%) | |
| Previous | 1,995 (3.8%) | 24 (5.5%) | 1,971 (3.8%) | |
| Never | 2,461 (4.7%) | 27 (6.1%) | 2,434 (4.7%) | |
| Follow-up years (mean, SD) | 13.90 (2.50) | 7.15 (3.85) | 13.96 (2.40) | <0.001 |
p values were estimated by t tests or chi-squared tests. All tests were two-sided. BMI, body mass index; SD, standard deviation.
Association between protein levels and incident epilepsy risk
We investigated the association of 2,920 plasma proteins with incident epilepsy using Cox proportional hazard models adjusted for age, gender, ethnicity, qualification, body mass index (BMI), socioeconomic status, smoking status, and alcohol drinker status. Following Bonferroni correction, 103 proteins were significantly associated with incident epilepsy during the follow-up periods (p < 0.05/2,920, Figure 1A). Among the screened proteins, neurofilament light polypeptide (NEFL, hazard ratio [95% confidence interval (CI)] = 2.13 [1.85–2.46]) and growth differentiation factor 15 (GDF15, 1.82 [1.60–2.07]) showed the strongest association with incident epilepsy risk, with p values as low as 3.36 × 10⁻²⁵ and 5.93 × 10⁻²⁰, respectively (Table S2). To minimize the impact of comorbidities, we conducted an additional analysis adjusting for diabetes history and estimated glomerular filtration rate (eGFR) stages. Forty-nine proteins were associated with incident epilepsy, 48 of which were consistent with the primary results (Table S3).
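As a rough illustration of the significance criterion above, the following sketch computes the Bonferroni threshold for 2,920 tests and back-calculates an approximate two-sided Wald p value from a reported hazard ratio and its 95% CI. It assumes the CI was built as log(HR) ± 1.96 × SE, which is the usual construction but is not stated in the text; the function names are illustrative only. Because the published HRs and CIs are rounded, the recovered p values are only expected to land near the reported ones.

```python
import math

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def wald_p_from_hr(hr: float, ci_low: float, ci_high: float) -> float:
    """Approximate two-sided Wald p value from a hazard ratio and its 95% CI.

    Assumption: the CI is symmetric on the log scale, log(HR) +/- 1.96 * SE.
    """
    log_hr = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    z = log_hr / se
    # Two-sided normal tail probability: P(|Z| > z) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

threshold = bonferroni_threshold(0.05, 2920)
p_nefl = wald_p_from_hr(2.13, 1.85, 2.46)   # NEFL, HR and CI from the text
p_gdf15 = wald_p_from_hr(1.82, 1.60, 2.07)  # GDF15, HR and CI from the text

print(f"Bonferroni threshold: {threshold:.3e}")
print(f"NEFL  approx. p = {p_nefl:.2e}, passes: {p_nefl < threshold}")
print(f"GDF15 approx. p = {p_gdf15:.2e}, passes: {p_gdf15 < threshold}")
```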
Figure 1.
Association of 2,920 plasma proteins with incident epilepsy
(A–E) Volcano plots displaying the hazard ratios (x axis) and statistical significance (−log10 of two-sided p values, y axis) for the associations of plasma proteins with incident epilepsy (A), incident epilepsy occurring in participants below (B) and above (C) the age of 60 years, and incident epilepsy occurring in males (D) and females (E). Cox proportional hazard regression models were adjusted for age, gender, ethnicity, qualification, body mass index, socioeconomic status, smoking status, and alcohol drinker status. Proteins above the horizontal dotted line had Bonferroni-corrected p < 0.05. The red dots represent risk proteins, while the purple dots represent protective proteins.
(F) Volcano plot showing the results of replication analysis using linear regression models adjusted for the same covariates as the survival analysis. The x axis indicates the regression coefficient, and the y axis indicates the −log10 of the p value for each association. The blue dots represent risk proteins, and the orange dots represent protective proteins.
We further categorized epilepsy into focal epilepsy and generalized epilepsy for subgroup analysis. Of the 440 patients diagnosed with epilepsy during the follow-up period, 27 were identified as having focal epilepsy and 39 as having generalized epilepsy, while the majority of the remaining cases were categorized as unspecified epilepsy. The results showed that four proteins (leucine-rich alpha-2-glycoprotein [LRG1], hepatitis A virus cellular receptor 2 [HAVCR2], V-set and immunoglobulin domain-containing protein 4 [VSIG4], and serine protease inhibitor Kazal-type 1 [SPINK1]) were specifically associated with focal epilepsy, whereas four distinct proteins (branched-chain-amino-acid aminotransferase, cytosolic [BCAT1], GDF15, superoxide dismutase [Mn], mitochondrial [SOD2], and G antigen 2A [GAGE2A]) showed associations with generalized epilepsy (Tables S4 and S5). Among these biomarkers, all four focal epilepsy-linked proteins and generalized epilepsy-linked GDF15 demonstrated concordance with the main analysis.
After stratifying the study population by baseline age, gender, diabetes, and eGFR stages, we conducted Cox regression analyses to evaluate the effect of demographic factors and comorbidities on the relationship between protein levels and epilepsy risk. We categorized the participants into two groups based on age, using 60 years as the dividing line. We discovered that 33 proteins (Figure 1B; Table S6) and 21 proteins (Figure 1C; Table S7) were significantly linked to epilepsy in the respective age groups after applying the Bonferroni correction. In addition, we found 17 proteins associated with epilepsy in males, 15 of which were identified in the preliminary analyses (Figure 1D; Table S8). In contrast, 37 proteins were associated with epilepsy in females, 32 of which were consistent with findings in the overall population (Figure 1E; Table S9). In the nondiabetic population, 27 proteins were identified as associated with epilepsy (Table S10), compared with 2 (insulin-like growth factor-binding protein 7 [IGFBP7] and BMP-binding endothelial regulator protein [BMPER]) in the diabetic population (Table S11). Due to population limitations, we divided the participants into two groups: normal or high kidney function (G1) and mildly decreased kidney function (G2). In group G1, we identified 36 epilepsy-associated proteins (Table S12), while only 2 (receptor-type tyrosine-protein phosphatase C [PTPRC] and complement component C7 [C7]) were found in group G2 (Table S13). Significant correlations between NEFL and GDF15 proteins and the risk of epilepsy were found in most subgroup analyses.
We used discrete follow-up intervals to distinguish proteins associated with short-term epilepsy risk (epilepsy onset ≤5 years after protein measurement) from those associated with long-term risk (>5 years).17 The analysis identified 38 and 13 proteins associated with short-term and long-term epilepsy risk, respectively (Tables S14 and S15). Among them, GDF15 and NEFL showed consistently significant associations across both time frames (p < 0.001). Notably, 13 proteins were newly identified as being specifically associated with short-term epilepsy risk, including leukocyte-associated immunoglobulin-like receptor 1 (LAIR1), cadherin-2 (CDH2), tumor necrosis factor ligand superfamily member 13B (TNFSF13B), vascular endothelial growth factor receptor 1 (FLT1), IGFBP-like 1 (IGFBPL1), prosaposin (PSAP), 6-pyruvoyl tetrahydrobiopterin synthase (PTS), tumor necrosis factor receptor superfamily member 16 (NGFR), pro-adrenomedullin (ADM), retinal dehydrogenase 1 (ALDH1A1), thimet oligopeptidase (THOP1), histamine N-methyltransferase (HNMT), and fatty acid-binding protein, liver (FABP1).
Sensitivity analyses and cross-sectional validation
To assess the robustness of the main findings, we conducted a sensitivity analysis excluding epilepsy cases diagnosed within 2 years of baseline. After Bonferroni correction, 81 proteins consistently remained significantly associated with epilepsy risk, as observed in the main analysis (Table S16). To validate the results found in the survival analysis, we included previously excluded participants diagnosed with epilepsy at baseline for cross-sectional validation. We used linear regression analysis to compare 2,920 plasma protein levels in 574 individuals with baseline epilepsy diagnoses and 51,932 controls who had never developed epilepsy. After Bonferroni correction, we identified 192 proteins associated with epilepsy, of which 29 showed consistent direction and significance with the longitudinal results (Figure 1F; Tables S17 and S18). This limited overlap may reflect differences in the causal relationship between protein levels and epilepsy in prevalent versus incident cases.
Temporal evolution and trajectory clustering of epilepsy-associated proteins before epilepsy diagnosis
We then compared the trajectories of plasma protein levels before epilepsy diagnosis using locally estimated scatterplot smoothing (LOESS). We found that 102 epilepsy-associated plasma proteins, all except renin (REN), reached abnormal levels before diagnosis (Table S19). Figure 2A shows the chronological order in which protein levels reached abnormality (absolute Z score > 0.25) 11 years before diagnosis, from top to bottom. We observed that 30 plasma proteins exhibited abnormalities around 15 years before diagnosis and five proteins remained at abnormally high levels throughout the disease, including TNFRSF10B, glial fibrillary acidic protein (GFAP), phosphoinositide-3-kinase-interacting protein 1 (PIK3IP1), T-lymphocyte surface antigen Ly-9 (LY9), and ephrin-A4 (EFNA4). In contrast to most proteins, the levels of uromodulin (UMOD), ras-related protein Rab-6A (RAB6A), and contactin-5 (CNTN5) were abnormally low before epilepsy diagnosis. In addition, NEFL and GDF15 levels, which are strongly associated with epilepsy, became abnormal 14.2 and 11.2 years before diagnosis, respectively, and both remained abnormally high throughout the disease.
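The abnormality criterion can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes the case trajectory has already been smoothed (the LOESS step is not reproduced), standardizes it against the control mean and SD, and reports the earliest time point at which the absolute Z score exceeds 0.25. All numbers and function names are hypothetical toy values.

```python
from statistics import mean, stdev

def z_scores(case_values, control_values):
    """Standardize case values against the control mean and SD."""
    mu = mean(control_values)
    sd = stdev(control_values)
    return [(v - mu) / sd for v in case_values]

def first_abnormal_time(years_before_dx, z_trajectory, cutoff=0.25):
    """Earliest time point (largest years-before-diagnosis) with |Z| > cutoff.

    Returns None if the trajectory never crosses the cutoff.
    """
    abnormal = [t for t, z in zip(years_before_dx, z_trajectory) if abs(z) > cutoff]
    return max(abnormal) if abnormal else None

# Hypothetical toy data for a single protein (not taken from the study)
controls = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
years = [15, 12, 9, 6, 3, 0]                           # years before diagnosis
case_levels = [10.0, 10.02, 10.06, 10.1, 10.2, 10.3]   # assumed already smoothed

z = z_scores(case_levels, controls)
onset = first_abnormal_time(years, z)
print(f"First abnormal at {onset} years before diagnosis")
```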
Figure 2.
Temporal evolutions of plasma proteins before the diagnosis of epilepsy and their trajectory clustering
(A) Z score changes of epilepsy-associated proteins. Protein levels were Z-scored using the mean and the standard deviation of that plasma protein in matched controls as the reference. The trajectories were estimated by LOESS. Heat represents absolute Z scores greater than 0.25 (red, Z score > 0; blue, Z score < 0).
(B–D) Protein trajectories of the three identified clusters. Clusters were grouped using unsupervised hierarchical clustering, with the thicker lines reflecting the average trajectory in each cluster. The number of proteins is displayed by n.
To reduce proteomic complexity, we used unsupervised hierarchical clustering to group epilepsy-associated plasma proteins into three clusters of protein trajectories ranging in size from 3 to 54 proteins (Table S20). All three clusters of proteins tended to show non-linear trajectories of change. NEFL and GDF15 plasma proteins were included in cluster 1. The protein trajectories of clusters 1 and 2 increased throughout the disease and reached maximum Z score levels 5 years before and at the time of disease diagnosis, respectively (Figures 2B and 2C). In contrast, the protein levels in cluster 3 remained abnormally low before disease onset (Figure 2D).
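To make the clustering step concrete, here is a small agglomerative clustering sketch on toy trajectories. The distance metric (Euclidean) and linkage (average) are assumptions, since the text does not specify them, and the trajectories are hypothetical values chosen to mimic rising, peaking, and persistently low patterns.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length trajectories."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(data[i], data[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)

def hierarchical_clusters(data, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], data)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical Z-score trajectories (rows = proteins, columns = time points)
trajectories = [
    [0.1, 0.3, 0.6, 0.9],      # rising
    [0.0, 0.2, 0.5, 1.0],      # rising
    [0.2, 0.5, 0.6, 0.4],      # rises, then falls
    [0.3, 0.6, 0.7, 0.3],      # rises, then falls
    [-0.3, -0.4, -0.5, -0.6],  # persistently low
    [-0.2, -0.4, -0.6, -0.5],  # persistently low
]
clusters = hierarchical_clusters(trajectories, n_clusters=3)
print(clusters)
```

With real data, the number of clusters would normally be chosen by inspecting the dendrogram or a cluster-quality index rather than fixed in advance.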
Biological function of epilepsy-associated proteins
We further explored the functional pathways of the screened epilepsy-associated proteins using enrichment analysis to gain insights into their biological features and regulatory mechanisms. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment results showed that most proteins were mainly enriched in extracellular regions, cytokine-cytokine receptor interactions, signal transduction, and protein binding (Figure 3A; Table S21). In addition, the phenotype enrichment analysis using the Mouse Genome Informatics (MGI) database18 showed that most epilepsy-associated proteins were significantly enriched in the immune system phenotype (p_Fisher = 9.56 × 10⁻¹²), the hematopoietic system phenotype (p_Fisher = 9.11 × 10⁻⁷), and the neoplastic phenotype (p_Fisher = 7.89 × 10⁻⁴) (Figure 3B; Table S22). This implies that the immune system may play a crucial role in the development of epilepsy.
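The Fisher exact test behind such enrichment p values can be illustrated with a one-sided hypergeometric tail. This sketch is not the enrichment tool used in the study; the background definition and all counts below are hypothetical, and in the study's setting the selected set would be the 103 proteins drawn from a chosen background.

```python
from math import comb

def fisher_enrichment_p(hits_in_set, set_size, hits_in_background, background_size):
    """One-sided Fisher exact (hypergeometric) p value for over-representation.

    Probability of seeing at least `hits_in_set` annotated proteins when
    drawing `set_size` proteins from a background of `background_size`
    proteins, of which `hits_in_background` carry the annotation.
    """
    total = comb(background_size, set_size)
    upper = min(set_size, hits_in_background)
    tail = sum(
        comb(hits_in_background, k)
        * comb(background_size - hits_in_background, set_size - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / total

# Hypothetical toy counts: 3 of 4 selected proteins carry a phenotype
# annotation that 5 of 20 background proteins carry.
p_enrich = fisher_enrichment_p(3, 4, 5, 20)
print(f"Enrichment p = {p_enrich:.4f}")
```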
Figure 3.
Biological function analysis of epilepsy-associated proteins
(A) Results of the pathway enrichment analysis. The x axis represents the −log10 of the p value for each term. Only 12 significant results are shown, and the full results are shown in Table S21. The y axis shows different terms, each indicated by different colors representing their sources. Abbreviations: BP, biological process; CC, cellular component; MF, molecular function; KEGG, Kyoto Encyclopedia of Genes and Genomes.
(B) Results of phenotype enrichment analysis. The x axis represents the −log10 of the p value for each term. The y axis indicates different phenotypes from the Mouse Genome Informatics platform. The p value was obtained from the Fisher exact test. Phenotypes significantly enriched by FDR correction are shown in the figure, and the full results are shown in Table S22.
(C) Results of the transcription factor enrichment analysis. Transcription factors that were significantly enriched after FDR correction are shown. Each enriched transcription factor target is linked to an associated gene. The color bar indicates the significance of the association between proteins and incident epilepsy, with darker colors denoting stronger significance.
(D) The protein-protein interaction network of epilepsy-associated proteins. The nodes represent proteins, with darker shades indicating higher importance. The edges connecting the nodes vary in width, representing the combined score value between the two connected proteins, with thicker lines indicating larger values of interaction strength.
Furthermore, we explored potential transcription factor targets using the transcriptional regulatory relationships unraveled by sentence-based text mining (TRRUST) database from Metascape. A total of 15 transcription factors were identified as key regulators of the biological effects, with top significance attributed to jun proto-oncogene (JUN), specificity protein 1 (SP1), nuclear factor κB p65 (RELA), GATA binding protein 2 (GATA2), and nuclear factor κB subunit 1 (NFKB1) (Figure 3C; Table S23). These transcription factors exert pivotal roles in various signaling pathways, including inflammatory responses, immune regulation, vascular pathology, and stress response pathways.19,20,21,22
Next, we performed protein-protein interaction (PPI) analyses using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database23 to gain insight into the complex interactions among the 103 epilepsy-associated proteins and to discover disease-related signaling pathways. Our analysis revealed a complex network of 236 PPIs among these proteins, underscoring a profound level of interconnectivity and hinting at substantial underlying biological associations. Utilizing the cytoHubba plugin in Cytoscape,24 we ranked the proteins and identified key hub proteins within the network. Notably, we discovered that tumor necrosis factor receptor superfamily member 1A (TNFRSF1A), CD274 molecule (CD274), HAVCR2, and metallopeptidase inhibitor 1 (TIMP1) exhibited the highest degree scores (Figure 3D; Table S24). Congruent with a preceding study,25 our research has identified TNFRSF1A as a central hub gene in epilepsy, with a marked increase in its expression levels observed in individuals diagnosed with the condition.
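Ranking hubs by degree amounts to counting each protein's distinct interaction partners. The sketch below shows that calculation on a toy edge list; it does not call STRING or cytoHubba, and the node names and edges are hypothetical placeholders rather than interactions from the study.

```python
def rank_by_degree(edges):
    """Rank nodes by degree (number of distinct interaction partners)."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    degree = {node: len(partners) for node, partners in neighbors.items()}
    # Highest degree first; ties broken alphabetically for reproducibility
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical interaction list (placeholder names, duplicate edge included)
toy_edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P4", "P5"), ("P2", "P1"),
]
ranking = rank_by_degree(toy_edges)
for protein, deg in ranking:
    print(protein, deg)
```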
Association between epilepsy-associated proteins and brain structure
The development of epilepsy is closely linked to cortico-subcortical neural networks, and changes in brain structure may reflect the widespread network effects of epilepsy. To explore how these proteins might relate to epilepsy risk, we tested the association between epilepsy-associated proteins and brain structure using linear regression models. After false discovery rate (FDR) correction, 36 and 46 of the 103 epilepsy-associated plasma proteins were associated with lower whole-brain cortical and subcortical gray matter volumes, respectively (Figure 4A; Tables S25 and S26). Additionally, 4 proteins were associated with white matter hyperintensities (WMHs, Figure 4A; Table S27).
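For readers comparing the FDR and Bonferroni criteria used in this section, the following sketch implements the Benjamini-Hochberg adjustment. The text does not name the FDR procedure, so Benjamini-Hochberg is an assumption here, and the p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p values for five protein-brain associations
toy_p = [0.001, 0.012, 0.02, 0.041, 0.6]
q_values = benjamini_hochberg(toy_p)

n_fdr = sum(q < 0.05 for q in q_values)
n_bonferroni = sum(p < 0.05 / len(toy_p) for p in toy_p)
print(f"FDR-significant: {n_fdr}, Bonferroni-significant: {n_bonferroni}")
```

On this toy input the FDR criterion retains more associations than Bonferroni, which is the usual pattern when several tests carry moderate signal.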
Figure 4.
Association between epilepsy-associated proteins and brain structure
(A) The association of the epilepsy-associated proteins with brain volume measures, including cortical gray matter volume (CGV), subcortical gray matter volume (SGV), and white matter hyperintensity (WMH). Associations were assessed using linear regression models, and the standardized coefficients were obtained. Statistical significance was determined using two-tailed p values. Significant associations after the FDR correction are indicated by “∗” symbols, while those after the Bonferroni correction are represented by “∗∗” symbols. The color bar illustrates the direction of the associations, with red indicating a positive correlation and blue indicating a negative correlation.
(B–E) Associations between epilepsy-associated proteins and different regional brain structural measures (e.g., cortical surface area in B, gray matter volume in C and D, and white matter index in E). Linear regression models were used. Statistical significance is based on two-tailed p values, with a significant association being FDR_Q < 0.05. Color bars indicate the number of significantly associated proteins, with darker colors indicating a higher number of significant associations.
We further investigated the relationship of selected plasma proteins with cortical and subcortical volumes, cortical surface area, and fractional anisotropy (FA) and mean diffusivity (MD) of 27 major white matter tracts. After FDR correction, a total of 77 proteins were found to be significantly associated with at least one brain region measure. It is worth noting that 16 proteins were significantly associated with at least ten brain structures, including NEFL, GDF15, GFAP, glutathione hydrolase 1 proenzyme (GGT1), asialoglycoprotein receptor 1 (ASGR1), prosaposin receptor GPR37 (GPR37), bone marrow stromal antigen 2 (BST2), chitinase-3-like protein 1 (CHI3L1), galectin-9 (LGALS9), LGALS4, TIMP1, paired immunoglobulin-like type 2 receptor alpha (PILRA), HAVCR1, follistatin-related protein 3 (FSTL3), angiopoietin-2 (ANGPT2), and WAP four-disulfide core domain protein 2 (WFDC2). For cortical surface area, the left inferior parietal, bilateral medial orbitofrontal, right fusiform, left insula, left middle temporal, left lateral orbitofrontal, and left lateral occipital regions were associated with at least five proteins (Figure 4B; Table S28). For the cortical volumes in the different brain hemispheres, the left middle temporal gyrus, bilateral insula, left medial orbitofrontal, right superior frontal gyrus, and left lateral orbitofrontal were associated with at least three plasma protein levels, and most of the associations were negative except that between UMOD and the left middle temporal gyrus (Figure 4C; Table S29). For the volumes of subcortical regions, the bilateral hippocampus, bilateral thalamus, bilateral lateral ventricle, bilateral pallidum, and right caudate were significantly associated with at least ten proteins. Subcortical volumes of the bilateral hippocampus, bilateral pallidum, bilateral thalamus, and right caudate were negatively associated with most protein levels, while the volumes of the bilateral lateral ventricles were positively associated (Figure 4D; Table S30). In addition, several major white matter fiber tracts including left superior longitudinal fasciculus, bilateral anterior thalamic radiation, left inferior fronto-occipital fasciculus, bilateral inferior longitudinal fasciculus, bilateral posterior thalamic radiation, and bilateral superior longitudinal fasciculus were related to at least three plasma proteins (Figure 4E; Table S31). These findings emphasize the potentially complex relationship between epilepsy-associated plasma proteins and brain regions involved in the pathogenesis of epilepsy.
Association of epilepsy-associated proteins with PRS for epilepsy and stress-related events
We then calculated the PRS for epilepsy at five p value thresholds (pT_0.005, pT_0.05, pT_0.1, pT_0.5, and pT_1) to explore the role of genetic factors in the association between plasma protein levels and epilepsy. The results showed that 15 epilepsy-associated plasma proteins were nominally significantly associated with PRS, but none of these associations survived correction for multiple testing (Figure 5A; Table S32).
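The idea of scoring at several p value thresholds can be sketched as a weighted sum of risk-allele dosages restricted to variants whose GWAS p value falls at or below each threshold. This is a simplified illustration that assumes a thresholding-style PRS and omits clumping and quality control; the dosages, effect sizes, and p values are hypothetical, and the actual PRS software and settings are not given in this section.

```python
def polygenic_risk_score(dosages, effect_sizes, gwas_p_values, p_threshold):
    """Sum of dosage * effect size over variants with GWAS p <= p_threshold."""
    return sum(
        d * beta
        for d, beta, p in zip(dosages, effect_sizes, gwas_p_values)
        if p <= p_threshold
    )

# Hypothetical data for one individual and four variants
dosages = [0, 1, 2, 1]                  # risk-allele counts
effect_sizes = [0.10, -0.05, 0.20, 0.30]  # per-allele log odds (assumed)
gwas_p = [0.001, 0.04, 0.3, 0.9]

thresholds = [0.005, 0.05, 0.1, 0.5, 1.0]
scores = {pt: polygenic_risk_score(dosages, effect_sizes, gwas_p, pt) for pt in thresholds}
for pt, score in scores.items():
    print(f"pT_{pt}: {score:.2f}")
```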
Figure 5.
Association of epilepsy-associated proteins with genetic and stress-related events
(A) Association of epilepsy-associated proteins with genetic risk for epilepsy under five p thresholds (pT_0.005, pT_0.05, pT_0.1, pT_0.5, and pT_1). Linear regression models were adjusted for the same covariates as the survival analysis and the top 20 genetic principal components. Nominally significant associations are indicated by an asterisk (∗).
(B) Association of epilepsy-associated proteins with stress-related events. The associations were examined using linear regression models. Statistical significance was determined based on two-tailed p values, and multiple testing was corrected using FDR.
(C–E) Mediation by protein levels of the association between stress-related events of different periods and incident epilepsy. (C) adulthood stress; (D) childhood stress; (E) lifetime stress. Latent variables including stress-related events, top 3 stress-related proteins, and epilepsy status were estimated in the model. Wald tests were used in the structural equation modeling analyses to obtain the two-tailed p values. ∗p < 0.05, ∗∗p < 0.01, and ∗∗∗p < 0.001. Abbreviations: std, standardized coefficients; DE, direct effect; IE, indirect effect.
Furthermore, we investigated the influence of environmental factors on the relationship between plasma protein levels and epilepsy using stress-related events. According to the life period in which they occurred, stressful life events were grouped into childhood stress, adulthood stress, and lifetime stress. After FDR correction, 56 epilepsy-associated proteins were significantly associated with at least one stress type (Figure 5B; Table S33). Of these, 39 proteins were associated with adulthood stress (top 3 proteins: GDNF family receptor alpha-1 [GFRA1], layilin [LAYN], and protein AMBP [AMBP]), 35 with childhood stress (top 3 proteins: polymeric immunoglobulin receptor [PIGR], Coxsackievirus and adenovirus receptor [CXADR], and LAYN), and 44 with lifetime stress (top 3 proteins: LAYN, PIGR, and GFRA1). Nineteen proteins were associated with all three types of stressful life events. Notably, childhood stress (β = 0.027, p = 0.012) and lifetime stress (β = 0.029, p = 0.003) were associated with higher GDF15 levels. In addition, the structural equation model (SEM) analyses suggested that several significant plasma proteins may partly mediate the effects of stressful life events on incident epilepsy (Figure 5C).
Predictive performance of identified proteins for future epilepsy risk
To predict the onset of epilepsy, we analyzed the baseline levels of 2,920 proteins in an initially healthy population, covering all incident cases over a span of 10 years. Our machine learning model identified the top 100 proteins based on their predictive importance, including several of the epilepsy-associated proteins identified above. In the subsequent computational phase, these 100 proteins were incrementally integrated into the model and ranked according to their weights.
To highlight the most significant proteins, we focused on the top 50 based on their influence on prediction performance, as shown in the supplemental information (Figures 6A, S1, and S2; Tables S34–S36). More than 30 proteins were selected for each time condition, as additional evaluations did not significantly enhance the area under the curve (AUC) after four iterations. For all epilepsy incidents, the model demonstrated robust performance, with the 95% CI for AUC ranging from 0.654 to 0.708 (Figure 6B; Table S37). Specifically, for predictions within 10 years, the AUC ranged from 0.666 to 0.736 (95% CI). For predictions over 10 years, the model achieved an AUC between 0.719 and 0.789 (95% CI) (Figures 6C and 6D). Moreover, we performed subtype-specific predictive analyses for focal and generalized epilepsy to investigate whether each subtype exhibits a distinct protein profile. The results showed that the sets of top-ranked predictive proteins for the two subtypes were largely distinct, with minimal overlap (Figures S3A and S3B). The predictive model based on the protein panel yielded an AUC of 0.68 for generalized epilepsy, whereas its performance for focal epilepsy was lower, with an AUC of 0.62 (Figures S3C and S3D).
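As a reminder of what the AUC values above measure, the sketch below computes the AUC as the probability that a randomly chosen incident case receives a higher predicted risk than a randomly chosen control, counting ties as one half. It is independent of the machine learning model used in the study, and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random case > score of a random control).

    Ties count as one half, which matches the Mann-Whitney U formulation.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Hypothetical predicted risks: 1 = incident epilepsy, 0 = control
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.4]
auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported intervals should be read.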
Figure 6.
Prediction of epilepsy occurrence and ROC curves for predicting future epilepsy
(A) The importance of the predictors in the incident epilepsy model. The bar chart shows the ordering of the predictor variables based on their importance to the model classification. The line graph shows the area under the receiver operating characteristic (ROC) curve (AUC) obtained as predictors are added at each iteration. In the end, nearly 30 predictors (highlighted in red) were chosen to construct the final machine learning model.
(B–D) Receiver operating characteristic curves show the accuracy of the levels of proteins used to predict future epilepsy. (B) Epilepsy at any time from baseline; (C) epilepsy occurring within 10 years from baseline; (D) epilepsy occurring after 10 years from baseline.
To further explore the protein panel’s specificity for predicting the risk of all epilepsy incidents, the same plasma protein panel was applied to predict the risk of other common neurological conditions. Only dementia (AUC = 0.77) and Parkinson’s disease (PD, AUC = 0.76) exhibited higher predictive performance than epilepsy (Figure S4). This may be because the panel includes key proteins involved in dementia26 and PD17 risk prediction, such as NEFL and GFAP. Further evaluation of the effectiveness of proteins in predicting epilepsy incidence within and beyond 10 years demonstrated improved performance.
Drug target implications of epilepsy-associated proteins
Finally, we investigated whether the genes encoding the 103 epilepsy-associated proteins were enriched for currently approved drug targets. A total of 17 genes were identified as drug targets for treating 15 diseases, mainly malignant neoplasms and respiratory diseases (Table S38). Of these, 12 genes were concurrently related to different types of malignant neoplasms, including serine/threonine-protein kinase receptor R3 (ACVRL1), ANGPT2, carbonic anhydrase 12 (CA12), CD274, ephrin type-A receptor 2 (EPHA2), ephrin type-B receptor 4 (EPHB4), interleukin-4 receptor subunit alpha (IL4R), tumor necrosis factor receptor superfamily member 3 (LTBR), placenta growth factor (PGF), TGF-beta receptor type-2 (TGFBR2), TNFRSF10A, and TNFRSF10B.
Discussion
The search for signature proteins associated with epilepsy risk has helped to provide insights into the pathophysiological mechanisms and time-varying characteristics and to identify blood biomarkers predictive of epilepsy. By analyzing the most comprehensive plasma proteomics data to date, 103 proteins have been identified as strongly associated with incident epilepsy, of which NEFL and GDF15 have the strongest correlations. Notably, these selected protein levels were altered in vivo many years before the onset of epilepsy. These epilepsy-associated proteins were inextricably linked to pathways involved in the immune system response, highlighting the close link between the immune system and epileptic seizures. In addition, the finding of significant associations of protein levels with neuroimaging, genetic predisposition, and stress-related events supports and to some extent explains their effect on epilepsy. Parsimonious models constructed solely on protein levels achieved an AUC of more than 0.7 in predicting future epilepsy, especially for long-term risk.
Previous large-scale proteomic studies on epilepsy in plasma have been relatively rare, yet they can partially validate our identified associations. Consistent with prior research, changes in the expression levels of plasma proteins such as mannose-binding protein C (MBL2),27 FYN-binding protein 1 (FYB1),27,28 apolipoprotein C-I (APOC1),27 nectin-2 (NECTIN2),27 afamin (AFM),29 C4b-binding protein beta chain (C4BPB),29 and alpha-2-antiplasmin (SERPINF2)29 have been reported to be associated with epilepsy. Moreover, in exploring the mechanisms underlying the associations between proteins and epilepsy, we found that immune inflammation and oxidative stress may both be involved. This is consistent with previous studies, which suggested that neuroinflammation,29,30,31 intracellular signaling,29 and oxidative stress32 are the main processes involved in epilepsy.
In recent years, the burgeoning field of mass spectrometry-based plasma proteomics has steered researchers toward using protein assays from human plasma to seek therapeutic targets for diseases. The present study is the largest and most comprehensive proteomic analysis of epilepsy to date, and it supports some of the findings of previous studies, such as the identification of GDF15 as an early predictive biomarker of poor prognosis in status epilepticus associated with fever33 and the association of GFAP with incident epilepsy.34 GFAP is a key marker for astrocytes, and the glial scar formed after astrocyte activation may be directly or indirectly epileptogenic.35 Increased levels of GFAP in patients with epilepsy may reflect the activation state of astrocytes. After comparing the 103 epilepsy-associated proteins we identified with a previous study, we found 7 proteins encoded by the epilepsy-associated genes summarized by Wang et al.,36 including collagen alpha-1(IV) chain (COL4A1), collagen alpha-3(VI) chain (COL6A3), GFAP, NPC intracellular cholesterol transporter 2 (NPC2), occludin (OCLN), ribonuclease T2 (RNASET2), and lysosome membrane protein 2 (SCARB2). Notably, SCARB2 is encoded by a gene whose mutations are specifically linked to epilepsy or epilepsy syndromes.37 Mutations in the SCARB2 gene can lead to neurophysiological abnormalities as well as myoclonic epilepsy with infrequent seizures. Moreover, Banote et al.27 conducted a quantitative proteomic analysis of human plasma samples, revealing 41 proteins with significant differential expression. Among these, four proteins were strongly associated with epilepsy in our study, including MBL2, FYB1, and APOC1 in the cross-sectional analysis and NECTIN2 in the survival analysis.
FYB1 is a key adapter of the FYN signaling network,38 which is closely linked to epileptogenesis.28 NECTIN2, a type of immunoglobulin-like cell adhesion molecule, is expressed in both neurons and astrocytes in mice,39,40,41 while interactions between neurons, astrocytes, and endothelial cells collectively form the neurovascular unit (NVU).42 Thus, NECTIN2 may contribute to the development of neurodegenerative diseases by affecting NVU function.43 In addition, Sun et al.44 performed plasma protein differential expression assays in children with Rolandic epilepsy and identified 217 differentially expressed proteins, of which C7, COL6A3, LRG1, and UMOD were significantly associated with epilepsy in our study. However, these previous epilepsy plasma proteomic studies27,44,45 had much smaller sample sizes and detected fewer proteins than the present study.
NEFL and GDF15, identified as the risk proteins with the most substantial association to epilepsy in our study, have consistently emerged in prior research as strongly correlated with a spectrum of neurological disorders.26,33,46,47,48 Recently, NEFL has been predominantly recognized for its robust association with the progression of Alzheimer’s disease (AD)26,48 and Charcot-Marie-Tooth disease49,50; however, it has been observed that NEFL levels are elevated in individuals with epilepsy as well.51 This observation is congruent with our findings, suggesting a broader role of NEFL in neurological conditions beyond AD. GDF15, a transforming growth factor β superfamily member,52 can be expressed in multiple tissues and cells and is upregulated by cellular stressors (e.g., hypoxia and mitochondrial dysfunction).53 Yamaguchi et al.33 found that serum GDF15 levels within 6 h of seizure onset could be used for early prediction of poor prognosis and that neuroimaging abnormalities were present in most patients with poor prognosis. We found that higher levels of plasma GDF15 were associated with a higher risk of developing epilepsy, with levels gradually increasing from 11 years before the onset of epilepsy. Our study also found that GDF15 was significantly negatively associated with brain structures such as the frontotemporal lobe and the hippocampus, which is in line with the findings of Andersson et al.54 A previous study55 mentioned that GDF15 is highly expressed during stress and starvation, which is consistent with our findings that stress-related events increase GDF15 levels, leading to an increased risk of developing epilepsy.
Furthermore, the genes encoding proteins implicated in epilepsy were significantly enriched for immune-related phenotypes in the MGI. Our PPI network analysis revealed that TNFRSF1A plays a key role in regulating epilepsy-related molecular pathways and biological processes. A previous study has shown that TNFRSF1A is involved in the modulation of cell survival, apoptosis, and inflammatory responses, underscoring its pivotal role in inflammation and immune regulation.56 A study conducted by Gledhill et al.31 specifically investigated the relationship between epilepsy and plasma proteins linked to neuroinflammation. They discovered an association between tumor necrosis factor alpha-related plasma proteins, which are connected to the immune response and similar to those identified in our study, and epilepsy. Additionally, a separate PPI analysis conducted by Hou et al.25 has also identified TNFRSF1A, along with CSF1R, as central hub genes in epilepsy, findings that are congruent with our own results. Collectively, these findings highlight the significance of inflammatory processes in the initiation and progression of epilepsy, and the interplay between inflammatory mediators and anti-inflammatory molecules may influence the trajectory of pathological mechanisms within epilepsy.57
Previous studies17,26,31,58,59 have consistently demonstrated the significant predictive efficacy of plasma proteins for future disease risk, particularly for neurodegenerative diseases. Several plasma proteomic studies based on the UKB have shown that plasma proteomes can predict the risk of AD,58 dementia,26 and PD17 with accuracies of 83.5%, 84.1%, and 80.9%, respectively. In our study, we selected a suitable protein panel for predicting epilepsy risk, which achieved an AUC of approximately 0.7. For these neurological disorders, including epilepsy, we found that NEFL and GFAP were frequently included in various protein prediction panels, with NEFL being the most important predictive protein. This partly explains why our protein panels achieved AUCs of 0.77 and 0.76 in predicting dementia and PD, respectively.
In conclusion, this large epilepsy proteomic study reveals extensive changes in plasma proteins and identifies epilepsy-associated proteins. The underlying mechanisms of epileptogenesis are further elucidated by exploring the biological functions of the relevant proteins. In addition, we demonstrated the temporal evolution of proteins before the onset of epilepsy, illuminating the progression of pathological changes in plasma proteins throughout the disease. This study is important for finding early predictive markers for epilepsy and for refining therapeutic approaches.
Limitations of the study
This study has several limitations. First, the population we studied was predominantly Caucasian, which may limit the generalizability of the findings and the exploration of differential protein expression between races. Second, because of the possible delay in the time of epilepsy diagnosis, there may be a discrepancy between the estimated time before epilepsy diagnosis and the time before the onset of epilepsy. This difference may have introduced some bias into the results regarding the dynamics of plasma proteins. Third, the constraints imposed by the raw data limited our capacity to delve into proteomic analyses of epilepsy subtypes classified by etiology. The small number of cases within each subtype may compromise the robustness of the predictive models. Moreover, the limited number of participants with overlapping proteomics and brain structure data precludes a meaningful analysis of brain structure changes in these specific epilepsy syndromes. Future research, with a more nuanced categorization of epilepsy types and access to comprehensive and larger datasets, could significantly enhance and refine our current findings. Fourth, while our research has illuminated the role of plasma protein levels in the progression of epilepsy, it has not fully explored the protein expression levels in other tissues, which is crucial for understanding epilepsy pathogenesis. Finally, due to the absence of a large-scale prospective cohort with incident epilepsy and plasma protein data for result replication, we resorted to cross-sectional analysis to validate the findings from our longitudinal study.
Resource availability
Lead contact
Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Jintai Yu (jintai_yu@fudan.edu.cn).
Materials availability
This study did not generate new materials.
Data and code availability
This study did not generate new original data. The data used in this study were obtained from the UKB database (https://www.ukbiobank.ac.uk/); access requires an approved application. The study was conducted following the Declaration of Helsinki and was approved by the UKB under application number 19542.
This paper does not report original code. All the software and code used in this study are publicly available. The code we used is publicly accessible at https://github.com/ddzhang877/epilepsy_proteomics. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
Acknowledgments
The data used in this study were obtained from the UK Biobank database. We thank all the researchers and participants in the UKB. J.Y. was supported by grants from the Science and Technology Innovation 2030 Major Projects (2022ZD0211600); National Natural Science Foundation of China (82071201); Shanghai Municipal Science and Technology Major Project (2018SHZDZX01); Research Start-up Fund of Huashan Hospital (2022QD002); Excellence 2025 Talent Cultivation Program at Fudan University (3030277001); Shanghai Talent Development Funding for The Project (2019074); ZHANGJIANG LAB; Tianqiao and Chrissy Chen Institute; the State Key Laboratory of Neurobiology and Frontiers Center for Brain Science of Ministry of Education; Shanghai Center for Brain Science and Brain-Inspired Technology, China; Fudan University, China; Zhangjiang Lab, China; Huashan Hospital, China; National Key R&D Program of China, China; 111 Project, China; and National Natural Science Foundation of China, China. W.C. was supported by grants from the National Key R&D Program of China (no. 2023YFC3605400) and the National Natural Science Foundation of China (no. 82472055 and no. 82071997). J.F. was supported by 111 Project (B18015).
Author contributions
J.Y., J.F., and W.C. designed the study. D.Z., Z.W., and Q.H. drafted the manuscript and accessed and verified the underlying data reported in the manuscript. D.Z., Z.W., Q.H., Z.L., and X.H. implemented model development and statistical analyses. P.G. and Y. Zhao contributed to visualization. D.Z., Z.W., Q.H., and Y. Zhang supported the study and contributed to the discussion of the results. Y. Zhang, L.T., and J.Y. were involved in revising the article. All authors had full access to all the study data and accepted the responsibility to submit it for publication.
Declaration of interests
The authors declare no competing interests.
STAR★Methods
Key resources table
Experimental model and study participant details
Human subject characteristics
The UKB is a large-scale prospective cohort including about 500,000 participants with comprehensive information on baseline characteristics, disease diagnoses, genetic information, brain imaging, metabolomics, and proteomics. The cohort recruited participants aged 37–73 (field 21022) between 2006 and 2010 and has been in continuous follow-up for 17 years. Other basic characteristics of the participants used in this study included sex (field 31), ethnicity (field 21000), qualification (field 6138), BMI (field 21001), socioeconomic status (field 22189, Townsend deprivation index at recruitment), smoking status (field 20116), and alcohol drinker status (field 20117). The UKB has research tissue bank approval from the North West Multi-Center Research Ethics Committee (11/NW/0382). All participants provided informed consent. This study was conducted under approved UKB application number 19542.
Epilepsy diagnoses were ascertained according to the initial occurrence records of the International Classification of Diseases version 10 (ICD-10), codes G40 and G41 (fields 131048 and 131050). Participants with epilepsy at baseline, with only self-reported epilepsy, or with missing proteomic information were excluded from this study. In the primary analysis, a total of 52,372 individuals (average age 56.8 years) were included, of whom 28,275 participants (54.0%) were female, and 48,830 participants were White (Table 1). During follow-up, 440 individuals were identified with new-onset epilepsy and categorized into the case group. To reduce the potential impact of gender on the findings, we incorporated gender as a covariate and conducted subgroup analyses stratified by gender.
Method details
Blood proteomics
The UKB collected baseline blood samples from participants and characterized the plasma proteome profiles of 54,306 subjects through a collaboration with thirteen biopharmaceutical companies. Blood samples were collected in EDTA-coated tubes and centrifuged at 2,500 × g for 10 min at 4°C to separate the plasma for each participant. The plasma was divided evenly into aliquots and stored at −80°C. Plasma samples were then quantified by Olink Analytical Services in Sweden using the antibody-based Olink Explore 3072 Proximity Extension Assay, measuring 2,941 protein analytes and capturing 2,923 unique proteins across four panels encompassing cardiometabolic, inflammation, neurology, and oncology proteins. The inter-plate and intra-plate coefficients of variation for all Olink panels were less than 20% and 10%, respectively. Protein levels were quantified and reported as Normalized Protein Expression (NPX) values. Stringent quality control of the Olink NPX dataset was implemented, and the detailed procedure is described at https://biobank.ndph.ox.ac.uk/showcase/ukb/docs/PPP_Phase_1_QC_dataset_companion_doc.pdf. A total of 2,920 proteins were included in our study after removing proteins with more than 30% missing values. Missing values of the remaining proteins were handled in subsequent analyses by the statistical functions in R (e.g., `lm()`, `coxph()`), which by default omit any rows with missing values in the variables included in the model.
Brain structure
Our study used structural magnetic resonance imaging (MRI) data of the brain, acquired on a standard Siemens Skyra 3T scanner equipped with a 32-channel head coil. Details of the quality control, acquisition, and image processing pipelines are available in previous studies66,67 and on the UKB official website (https://biobank.ndph.ox.ac.uk/showcase/showcase/docs/brain_mri.pdf). Cortical and subcortical gray matter volumes,68,69 T1 image, and imaging-derived phenotypes (IDPs)70 were extracted and processed using FreeSurfer software. Gray matter microstructure was measured by four metrics in gray matter regions of the brain, including cortical surface area, volume, and subcortical volume. A total of 68 cortical regions (category 192) and 16 subcortical regions (category 190) defined by the Desikan-Killiany and the ASEG atlas were employed in this study.70 The total volume of deep white matter hyperintensities (WMH; field 24486) was calculated based on T1 and T2 FLAIR images. The diffusion tensor imaging (DTI) data were preprocessed and analyzed using the FMRIB Software Library. The AutoPtx plugin was then used to map 27 major white matter fiber tracts.71,72 Fractional anisotropy (FA) and mean diffusivity (MD) were obtained for each fiber bundle using the DTIFIT tool.
Assessment of stress-related events
Stress-related events were assessed based on several online follow-up questionnaires from the UKB according to previous studies, categorized mainly into childhood stress and adulthood stress.73,74,75 Childhood stress was estimated primarily by asking participants about the extent to which they were loved and cared for while growing up (fields 20487–20489). Adulthood stress was then assessed by asking participants about potentially abusive relationships experienced after the age of sixteen (fields 20521–20525). The detailed questions on childhood and adulthood stress are shown in Table S1. Responses to the survey questions were measured on a five-point scale, and we reverse-coded some items to ensure consistency in the directionality (Table S1). The scores for childhood and adulthood stress were the sum of the five questions answered at different growth periods. We further derived a lifetime stress score by aggregating the childhood and adult stress scores.
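The scoring described above can be sketched as follows. This is a minimal illustration, not the study's code: the 0–4 item coding, the function name `stress_score`, the choice of reverse-coded items, and the example responses are all assumptions made for the example (the actual items and their direction are listed in Table S1).

```python
from typing import Collection, Sequence

MAX_ITEM_SCORE = 4  # assumption: each five-point item is coded 0..4


def stress_score(responses: Sequence[int], reverse_items: Collection[int] = ()) -> int:
    """Sum item responses after reverse-coding the items at the given positions."""
    total = 0
    for index, value in enumerate(responses):
        if not 0 <= value <= MAX_ITEM_SCORE:
            raise ValueError(f"response {value} is outside 0..{MAX_ITEM_SCORE}")
        # Reverse-coded items are flipped so that higher always means more stress.
        total += (MAX_ITEM_SCORE - value) if index in reverse_items else value
    return total


# Hypothetical participant: items 0 and 1 are positively worded and reverse-coded.
childhood = stress_score([4, 3, 0, 1, 0], reverse_items={0, 1})
adulthood = stress_score([1, 0, 0, 2, 0])
lifetime = childhood + adulthood  # lifetime score aggregates both periods
```

Keeping the reverse-coded positions as an explicit argument makes the direction of each item visible at the call site, which is where coding mistakes usually occur.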
Quantification and statistical analysis
Trajectory modeling and clustering
Plasma protein trajectories associated with epilepsy onset were modeled as a function of follow-up years. Individuals with epilepsy were matched to controls in a 1:10 ratio58,76 using the “MatchIt” package in R, based on all covariates included in the survival analysis (age, gender, ethnicity, education, socioeconomic status, BMI, smoking status, and alcohol consumption). Using the matched controls, the mean and standard deviation (SD) were determined for each protein level at different time points, and the protein levels were then converted to z-scores. We then applied the LOESS regression model77,78 to model the association between each protein and epilepsy progression and selected proteins above the 0.25 z-score level.58 Finally, the hclust function was applied to hierarchically cluster these proteins and classify the predicted trajectories into three groups. Pairwise differences between LOESS predictions were assessed using the Euclidean distance, while the Ward method was used for hierarchical clustering.
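The z-scoring and thresholding steps can be illustrated with a short sketch. It covers only the standardization against matched controls and the 0.25 cutoff; LOESS smoothing and Ward clustering are deliberately omitted. The function names, the yearly binning, the use of the absolute z-score, and the example values are assumptions made for illustration.

```python
from statistics import mean, stdev
from typing import Dict, List

Z_THRESHOLD = 0.25  # cutoff stated in the text


def case_z_by_time(cases: Dict[int, List[float]],
                   controls: Dict[int, List[float]]) -> Dict[int, float]:
    """Mean z-score of cases per time bin, standardized by matched controls."""
    result = {}
    for time_bin, case_values in cases.items():
        reference = controls[time_bin]
        mu, sd = mean(reference), stdev(reference)
        result[time_bin] = mean((value - mu) / sd for value in case_values)
    return result


def exceeds_threshold(z_by_time: Dict[int, float], threshold: float = Z_THRESHOLD) -> bool:
    """Assumption: a protein is retained if |z| exceeds the cutoff in any bin."""
    return any(abs(z) > threshold for z in z_by_time.values())


# Hypothetical protein levels, keyed by years relative to diagnosis.
controls = {-10: [0.0, 1.0, 2.0], -5: [-1.0, 0.0, 1.0]}
cases = {-10: [1.5, 2.5], -5: [0.1, 0.3]}
z = case_z_by_time(cases, controls)
keep = exceeds_threshold(z)
```

In a fuller Python pipeline, the smoothing and clustering steps could be handled by `statsmodels.nonparametric.smoothers_lowess.lowess` and `scipy.cluster.hierarchy.linkage(..., method="ward")`; the study itself used R.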
Functional enrichment analysis and PPI network
The KEGG and GO analyses were performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID)63,79 to evaluate the biological properties of epilepsy-associated proteins. The significance cutoff was set at p < 0.05 after FDR correction.
We also performed phenotypic enrichment analysis using the MGI database18,64 (http://www.informatics.jax.org/) to investigate the biological significance of epilepsy-associated proteins. There are 19,326 genes with phenotypic annotations in the database, and we used 2,406 genes encoding the tested proteins as the background gene set. Fisher’s exact test was used to evaluate whether the proportion of genes associated with a given phenotype in the gene set differed significantly from the proportion of related genes in the background gene set. In addition, we investigated the upstream transcriptional regulators of the epilepsy-associated proteins using TRRUST from Metascape,65,80 which contains data on 8,444 transcription factor-target regulatory relationships.
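As an illustration of the FDR step, the sketch below applies the Benjamini-Hochberg procedure. The text states only that an FDR correction was used, so the choice of Benjamini-Hochberg, the function name, and the example p-values are assumptions.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four enrichment terms.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
```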
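The enrichment test can be written out directly from the hypergeometric distribution. The sketch below computes a one-sided (over-representation) Fisher's exact p-value; the sidedness, the function name, and all counts in the example are assumptions, with only the background size of 2,406 taken from the text.

```python
from math import comb


def enrichment_p_value(hits_in_set: int, set_size: int,
                       hits_in_background: int, background_size: int) -> float:
    """One-sided Fisher's exact test: P(X >= hits_in_set) under the hypergeometric null."""
    max_hits = min(set_size, hits_in_background)
    non_hits = background_size - hits_in_background
    tail = sum(
        comb(hits_in_background, k) * comb(non_hits, set_size - k)
        for k in range(hits_in_set, max_hits + 1)
    )
    return tail / comb(background_size, set_size)


# Hypothetical counts: 20 of 103 associated genes carry a phenotype annotated
# for 200 of the 2,406 background genes.
p_enrichment = enrichment_p_value(20, 103, 200, 2406)
```

Using exact integer binomial coefficients avoids the rounding problems that factorial-based floating-point formulas run into at this background size.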
The PPI network was meticulously assembled utilizing STRING23 and graphically represented through Cytoscape software (v3.10.1).61 The physical subnetwork was selected by STRING. The cytoHubba24 plugin in Cytoscape was used to pinpoint the hub genes within the PPI network.
Polygenic risk scoring
Genotype data for about 500,000 participants were obtained from the official UKB website. The detailed genotyping and the quality control process of the UKB cohort have been described in a previous study.81 We excluded SNPs with call rates <95%, minor allele frequency <0.01, or deviation from the Hardy-Weinberg equilibrium with p < 1 × 10⁻⁶. Participants were retained if they were not outliers in heterozygosity and missing rates, had no sex chromosome aneuploidy, were of British ancestry, and had no more than ten putative third-degree relatives in the kinship table; these participants were used for the calculation of genetic principal components (PCs). To exclude the effect of sample overlap, we used summary statistics from genome-wide association analysis from the FinnGen study.82 PRSice60 software was used to generate polygenic risk scores with computational parameters set to --clump-kb 250kb, --clump-p 1.0, and --clump-r2 0.1. We calculated PRS at five p-value thresholds (pT), including p < 0.005, p < 0.05, p < 0.1, p < 0.5, and p < 1.
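The p-value thresholding behind the five scores can be sketched as a weighted sum of allele dosages. This is a simplified illustration and not a reimplementation of PRSice: clumping is omitted, the score is an unnormalized sum, and the function name, SNP identifiers, effect sizes, and dosages are hypothetical.

```python
from typing import Dict, Sequence, Tuple

P_THRESHOLDS = (0.005, 0.05, 0.1, 0.5, 1.0)  # thresholds listed in the text


def polygenic_scores(dosages: Dict[str, float],
                     effects: Dict[str, Tuple[float, float]],
                     thresholds: Sequence[float] = P_THRESHOLDS) -> Dict[float, float]:
    """Sum beta * dosage over SNPs whose GWAS p-value is below each threshold."""
    scores = {}
    for p_t in thresholds:
        scores[p_t] = sum(
            beta * dosages[snp]
            for snp, (beta, p_value) in effects.items()
            if p_value < p_t and snp in dosages
        )
    return scores


# Hypothetical summary statistics: SNP -> (effect size, p-value).
example_effects = {"snp_a": (0.2, 0.001), "snp_b": (-0.1, 0.03), "snp_c": (0.05, 0.4)}
# Hypothetical effect-allele dosages (0, 1, or 2) for one participant.
example_dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 1}
example_scores = polygenic_scores(example_dosages, example_effects)
```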
Statistical analyses
The multivariable Cox proportional hazards regression model was used to explore the association between 2,920 plasma protein NPX values and incident epilepsy. The Cox model was adjusted for age, gender, ethnicity, qualification, BMI, socioeconomic status, smoking status, and alcohol drinker status. To reduce potential confounding from comorbidities in protein associations, we further adjusted for diabetes and kidney function. Kidney function was assessed using the eGFR, which was calculated from serum creatinine (field 30700) and cystatin C (field 30720) using the method described by Inker et al.83 According to the latest grouping criteria of KDIGO 2024,84 eGFR (mL/min per 1.73 m²) was categorized into six groups: G1 (≥90, normal or high), G2 (60–89, mildly decreased), G3a (45–59, mildly to moderately decreased), G3b (30–44, moderately to severely decreased), G4 (15–29, severely decreased), and G5 (<15, kidney failure). Subsequently, we conducted subgroup analyses by age, gender, diabetes, eGFR stages, follow-up periods, and epilepsy subtypes. The classification of epilepsy subtypes is based on the ICD-10 (field 41270). To address potential delayed diagnosis and reverse causation,85 we performed a sensitivity analysis by removing epilepsy cases diagnosed within two years after baseline. In addition, we performed linear regression analysis to explore the association between proteins and epilepsy diagnosed at baseline to validate the longitudinal results. The covariates were consistent with those in the Cox model. The Bonferroni correction method was used to test significant associations of survival analyses (p < 0.05/2920). For the Cox regression model, we calculated hazard ratios (HRs), 95% confidence intervals (CIs), and p-values.
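Three of the quantities in this paragraph follow from simple arithmetic and can be made concrete: the Bonferroni threshold, the eGFR staging, and the conversion of a Cox coefficient into an HR with a 95% CI. The sketch below is illustrative; the function names are assumptions, and the Wald interval with z = 1.96 is the standard construction rather than something the text specifies.

```python
import math
from typing import Tuple

N_PROTEINS = 2920
BONFERRONI_ALPHA = 0.05 / N_PROTEINS  # about 1.7e-5


def egfr_stage(egfr: float) -> str:
    """Map eGFR (mL/min per 1.73 m^2) to the six stages listed in the text."""
    if egfr >= 90:
        return "G1"
    if egfr >= 60:
        return "G2"
    if egfr >= 45:
        return "G3a"
    if egfr >= 30:
        return "G3b"
    if egfr >= 15:
        return "G4"
    return "G5"


def hazard_ratio_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a Cox log-hazard coefficient and its standard error to HR and Wald 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def is_significant(p_value: float) -> bool:
    """Apply the Bonferroni-corrected threshold used for the survival analyses."""
    return p_value < BONFERRONI_ALPHA
```

For example, `hazard_ratio_ci(math.log(2.0), 0.1)` returns an HR of 2.0 with an interval that is symmetric on the log scale.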
The linear regression model was applied to investigate the associations of selected epilepsy-associated proteins with brain structure, adjusted for assessment center (field 54), total intracranial volume (TIV, field 26521), and the same covariates as in survival analysis. However, for DTI and brain cortical surface, the linear model was not adjusted for TIV. Brain structure was considered as dependent variables, while proteins were considered as independent variables.
The linear model was also used to examine the associations between proteins and stress-related events, adjusting for the same covariates as in the survival analysis. In addition, the associations between selected proteins and PRS of epilepsy were examined using the linear regression model. The model was adjusted for the top 20 PCs and the same covariates as in the Cox model except for ethnicity. The stress-related events and the PRS of epilepsy were separately considered as the independent variable, with protein levels treated as the dependent variable. Finally, we conducted structural equation modeling (SEM) analyses to evaluate the mediating role of the top 3 protein levels in the association between stress-related events and the propensity for epilepsy onset. Statistical significance was set at a two-tailed p-value <0.05 after correction for multiple tests. The regression analyses were performed using R statistical software (version 4.2.2) and the R package ‘survival’.
Epilepsy prediction
The study investigated the potential of plasma proteins to predict epilepsy incidence through a two-stage machine learning approach, encompassing protein selection and model development. In our method, the training set (80% of the data) was used for both feature selection and model development to prevent data leakage and ensure independence from the testing set. We constructed models to determine whether baseline healthy participants would develop epilepsy within an approximately ten-year follow-up. Additionally, the analysis included predictions within ten years, over ten years, and the entire dataset.
Candidate plasma proteins were first ranked by their contribution to prediction using the built-in feature importance of the LightGBM86 model. The top 100 proteins were chosen, reordered, and subjected to a progressive forward protein selection procedure. We employed the LightGBM classifier for model implementation and conducted a randomized hyperparameter search over 1,000 candidate configurations. The optimal parameter set was selected based on the area under the receiver operating characteristic (ROC) curve (AUC). The model was implemented with the LightGBM and Scikit-learn87 libraries in Python, and the selected hyperparameters included ‘max_depth’: 12, ‘n_estimators’: 685, and ‘num_leaves’: 12.
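The forward selection step can be sketched independently of the classifier. In the illustration below, proteins are added in ranked order and a protein is retained only if it improves the score by more than a tolerance. The retention rule, the function name, and the toy scoring function are assumptions; in practice `score_fn` would train a LightGBM model on the candidate panel and return its validation AUC.

```python
from typing import Callable, List, Sequence, Tuple


def forward_selection(ranked_features: Sequence[str],
                      score_fn: Callable[[List[str]], float],
                      min_gain: float = 1e-9) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Add features in ranked order, keeping those that raise the score by more than min_gain."""
    selected: List[str] = []
    history: List[Tuple[str, float]] = []
    best = float("-inf")
    for feature in ranked_features:
        candidate = selected + [feature]
        score = score_fn(candidate)
        history.append((feature, score))
        if score - best > min_gain:
            selected, best = candidate, score
    return selected, history


# Toy scoring function with hypothetical per-protein gains (not real AUCs).
TOY_GAINS = {"protein_a": 0.10, "protein_b": 0.05, "protein_c": 0.0, "protein_d": 0.02}


def toy_score(features: List[str]) -> float:
    return 0.5 + sum(TOY_GAINS[name] for name in features)


panel, trace = forward_selection(list(TOY_GAINS), toy_score)
```

The `trace` list holds the score obtained when each protein was tried, which corresponds to the kind of cumulative curve plotted alongside an importance ranking.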
The development and assessment of the model were executed by a randomized data split, allocating 80% for training and reserving 20% for testing using the StratifiedKFold function from the scikit-learn package (random state = 2022) to ensure balanced class distribution and reproducibility. The AUC quantified the efficacy of the model, and confidence intervals were estimated via a bootstrap technique over 1,000 iterations. To further explore the specificity of the predictive model’s protein panel for epilepsy risk prediction, we applied the panel to predict the risk of several other common neurological (dementia, PD, sleep disorders, and stroke) and psychiatric disorders (anxiety, bipolar, major depressive disorder, and schizophrenia). Additionally, we conducted predictive modeling of proteins for focal epilepsy and generalized epilepsy risks to determine whether different etiologies and syndromes exhibit unique patterns.
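The bootstrap interval for the AUC can be sketched in plain Python. The AUC is computed from its rank interpretation (the probability that a case scores higher than a non-case), and the interval uses the percentile method. The percentile method, the function names, and the reuse of 2022 as the seed are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of case/non-case pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute the AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_boot: int = 1000, alpha: float = 0.05,
                     seed: int = 2022) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    estimates: List[float] = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = [labels[i] for i in idx]
        if len(set(resampled)) < 2:
            continue  # skip resamples that contain only one class
        estimates.append(auc(resampled, [scores[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(round(alpha / 2 * n_boot))]
    upper = estimates[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper
```

The pairwise AUC is quadratic in sample size, which is acceptable for a sketch; at cohort scale a rank-based implementation such as `sklearn.metrics.roc_auc_score` would be the practical choice.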
To further assess the generalizability and robustness of our model, we conducted additional analyses for predictions within 10 years and over 10 years. For the within-10-year analysis, individuals who developed epilepsy more than 10 years after baseline were classified as healthy. Conversely, individuals who developed epilepsy within 10 years were excluded from the over-10-year analysis to investigate the predictive role of these proteins in those developing epilepsy more than 10 years from baseline.
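The two labeling rules can be stated compactly in code. In this sketch, `years_to_onset` is a hypothetical variable holding the time from baseline to epilepsy diagnosis, with `None` for participants who never developed epilepsy; assigning an onset at exactly 10 years to the within-10-year group is an assumption.

```python
from typing import Optional

HORIZON_YEARS = 10.0


def label_within_horizon(years_to_onset: Optional[float]) -> int:
    """Within-10-year analysis: later cases and non-cases are both labeled 0."""
    return int(years_to_onset is not None and years_to_onset <= HORIZON_YEARS)


def label_beyond_horizon(years_to_onset: Optional[float]) -> Optional[int]:
    """Over-10-year analysis: early cases are excluded (None), later cases are 1."""
    if years_to_onset is None:
        return 0
    if years_to_onset <= HORIZON_YEARS:
        return None
    return 1


# Hypothetical participants: never diagnosed, diagnosed at 4 years, diagnosed at 12 years.
onsets = [None, 4.0, 12.0]
within_labels = [label_within_horizon(t) for t in onsets]
beyond_labels = [label_beyond_horizon(t) for t in onsets]
```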
Druggability analyses
The Genome for Repositioning (GREP)62 software was used to investigate the enrichment of genes encoding plasma proteins in the ICD-10 category and to capture potentially repositionable drugs targeting genes. For enrichment analysis, we chose target genes associated with proteins exhibiting FDR-corrected p-value <0.05, and then enriched these genes for approved or under-investigated drugs in DrugBank and the Therapeutic Target Database.
Published: November 17, 2025
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.xcrm.2025.102330.
Contributor Information
Lan Tan, Email: dr.tanlan@163.com.
Jintai Yu, Email: jintai_yu@fudan.edu.cn.
References
- 1. GWAS meta-analysis of over 29,000 people with epilepsy identifies 26 risk loci and subtype-specific genetic architecture. Nat. Genet. 2023;55:1471–1482. doi: 10.1038/s41588-023-01485-w.
- 2. Reid C.A., Berkovic S.F., Petrou S. Mechanisms of human inherited epilepsies. Prog. Neurobiol. 2009;87:41–57. doi: 10.1016/j.pneurobio.2008.09.016.
- 3. World Health Organization. Epilepsy: A Public Health Imperative. World Health Organization; 2019.
- 4. Scheffer I.E., Berkovic S., Capovilla G., Connolly M.B., French J., Guilhoto L., Hirsch E., Jain S., Mathern G.W., Moshé S.L., et al. ILAE classification of the epilepsies: Position paper of the ILAE Commission for Classification and Terminology. Epilepsia. 2017;58:512–521. doi: 10.1111/epi.13709.
- 5. Verrotti A., Carrozzino D., Milioni M., Minna M., Fulcheri M. Epilepsy and its main psychiatric comorbidities in adults and children. J. Neurol. Sci. 2014;343:23–29. doi: 10.1016/j.jns.2014.05.043.
- 6. Gaitatzis A., Sisodiya S.M., Sander J.W. The somatic comorbidity of epilepsy: a weighty but often unrecognized burden. Epilepsia. 2012;53:1282–1293. doi: 10.1111/j.1528-1167.2012.03528.x.
- 7. Duncan J.S., Sander J.W., Sisodiya S.M., Walker M.C. Adult epilepsy. Lancet (London, England) 2006;367:1087–1100. doi: 10.1016/s0140-6736(06)68477-8.
- 8. Motika P.V., Spencer D.C. Treatment of Epilepsy in the Elderly. Curr. Neurol. Neurosci. Rep. 2016;16:96. doi: 10.1007/s11910-016-0696-8.
- 9. Geyer P.E., Holdt L.M., Teupser D., Mann M. Revisiting biomarker discovery by plasma proteomics. Mol. Syst. Biol. 2017;13:942. doi: 10.15252/msb.20156297.
- 10. Pires G., Leitner D., Drummond E., Kanshin E., Nayak S., Askenazi M., Faustin A., Friedman D., Debure L., Ueberheide B., et al. Proteomic differences in the hippocampus and cortex of epilepsy brain tissue. Brain Commun. 2021;3. doi: 10.1093/braincomms/fcab021.
- 11. He S., Wang Q., He J., Pu H., Yang W., Ji J. Proteomic analysis and comparison of the biopsy and autopsy specimen of human brain temporal lobe. Proteomics. 2006;6:4987–4996. doi: 10.1002/pmic.200600078.
- 12. Walker A., Russmann V., Deeg C.A., von Toerne C., Kleinwort K.J.H., Szober C., Rettenbeck M.L., von Rüden E.L., Goc J., Ongerth T., et al. Proteomic profiling of epileptogenesis in a rat model: Focus on inflammation. Brain Behav. Immun. 2016;53:138–158. doi: 10.1016/j.bbi.2015.12.007.
- 13. Leitner D.F., Kanshin E., Askenazi M., Faustin A., Friedman D., Devore S., Ueberheide B., Wisniewski T., Devinsky O. Raphe and ventrolateral medulla proteomics in epilepsy and sudden unexpected death in epilepsy. Brain Commun. 2022;4. doi: 10.1093/braincomms/fcac186.
- 14. Gürol G., Demiralp D.Ö., Yılmaz A.K., Akman Ö., Ateş N., Karson A. Comparative Proteomic Approach in Rat Model of Absence Epilepsy. J. Mol. Neurosci. 2015;55:632–643. doi: 10.1007/s12031-014-0402-8.
- 15. Danış O., Demir S., Günel A., Aker R.G., Gülçebi M., Onat F., Ogan A. Changes in intracellular protein expression in cortex, thalamus and hippocampus in a genetic rat model of absence epilepsy. Brain Res. Bull. 2011;84:381–388. doi: 10.1016/j.brainresbull.2011.02.002.
- 16. Sun B.B., Chiou J., Traylor M., Benner C., Hsu Y.H., Richardson T.G., Surendran P., Mahajan A., Robins C., Vasquez-Grinnell S.G., et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature. 2023;622:329–338. doi: 10.1038/s41586-023-06592-6.
- 17. Gan Y.H., Ma L.Z., Zhang Y., You J., Guo Y., He Y., Wang L.B., He X.Y., Li Y.Z., Dong Q., et al. Large-scale proteomic analyses of incident Parkinson's disease reveal new pathophysiological insights and potential biomarkers. Nat. Aging. 2025;5:642–657. doi: 10.1038/s43587-025-00818-0.
- 18. Blake J.A., Baldarelli R., Kadin J.A., Richardson J.E., Smith C.L., Bult C.J., Mouse Genome Database Group. Mouse Genome Database (MGD): Knowledgebase for mouse-human comparative biology. Nucleic Acids Res. 2021;49:D981–D987. doi: 10.1093/nar/gkaa1083.
- 19. Kim K.M., Zhang Y., Kim B.Y., Jeong S.J., Lee S.A., Kim G.D., Dritschilo A., Jung M. The p65 subunit of nuclear factor-kappaB is a molecular target for radiation sensitization of human squamous carcinoma cells. Mol. Cancer Ther. 2004;3:693–698.
- 20.Vizcaíno C., Mansilla S., Portugal J. Sp1 transcription factor: A long-standing target in cancer chemotherapy. Pharmacol. Ther. 2015;152:111–124. doi: 10.1016/j.pharmthera.2015.05.008. [DOI] [PubMed] [Google Scholar]
- 21.Hashimoto R., Ohi K., Yasuda Y., Fukumoto M., Yamamori H., Takahashi H., Iwase M., Okochi T., Kazui H., Saitoh O., et al. Variants of the RELA Gene are Associated with Schizophrenia and their Startle Responses. Neuropsychopharmacology. 2011;36:1921–1931. doi: 10.1038/npp.2011.78. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Li Y., Gao J., Kamran M., Harmacek L., Danhorn T., Leach S.M., O’Connor B.P., Hagman J.R., Huang H. GATA2 regulates mast cell identity and responsiveness to antigenic stimulation by promoting chromatin remodeling at super-enhancers. Nat. Commun. 2021;12:494. doi: 10.1038/s41467-020-20766-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Szklarczyk D., Gable A.L., Lyon D., Junge A., Wyder S., Huerta-Cepas J., Simonovic M., Doncheva N.T., Morris J.H., Bork P., et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 2019;47:D607–D613. doi: 10.1093/nar/gky1131. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Chin C.H., Chen S.H., Wu H.H., Ho C.W., Ko M.T., Lin C.Y. cytoHubba: identifying hub objects and sub-networks from complex interactome. BMC Syst. Biol. 2014;8:S11. doi: 10.1186/1752-0509-8-s4-s11. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Hou Y., Chen Z., Wang L., Deng Y., Liu G., Zhou Y., Shi H., Shi X., Jiang Q. Characterization of Immune-Related Genes and Immune Infiltration Features in Epilepsy by Multi-Transcriptome Data. J. Inflamm. Res. 2022;15:2855–2876. doi: 10.2147/jir.S360743. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Guo Y., You J., Zhang Y., Liu W.S., Huang Y.Y., Zhang Y.R., Zhang W., Dong Q., Feng J.F., Cheng W., Yu J.T. Plasma proteomic profiles predict future dementia in healthy adults. Nat. Aging. 2024;4:247–260. doi: 10.1038/s43587-023-00565-0. [DOI] [PubMed] [Google Scholar]
- 27.Banote R.K., Larsson D., Berger E., Kumlien E., Zelano J. Quantitative proteomic analysis to identify differentially expressed proteins in patients with epilepsy. Epilepsy Res. 2021;174 doi: 10.1016/j.eplepsyres.2021.106674. [DOI] [PubMed] [Google Scholar]
- 28.Kojima N., Ishibashi H., Obata K., Kandel E.R. Higher seizure susceptibility and enhanced tyrosine phosphorylation of N-methyl-D-aspartate receptor subunit 2B in fyn transgenic mice. Learn. Mem. 1998;5:429–445. [PMC free article] [PubMed] [Google Scholar]
- 29.Glazyrin Y.E., Veprintsev D.V., Timechko E.E., Minic Z., Zamay T.N., Dmitrenko D.V., Berezovski M.V., Kichkailo A.S. Comparative Proteomic Profiling of Blood Plasma Revealed Marker Proteins Involved in Temporal Lobe Epilepsy. Int. J. Mol. Sci. 2024;25 doi: 10.3390/ijms25147935. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Panina Y.S., Timechko E.E., Usoltseva A.A., Yakovleva K.D., Kantimirova E.A., Dmitrenko D.V. Biomarkers of Drug Resistance in Temporal Lobe Epilepsy in Adults. Metabolites. 2023;13 doi: 10.3390/metabo13010083. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Gledhill J.M., Brand E.J., Pollard J.R., St Clair R.D., Wallach T.M., Crino P.B. Association of Epileptic and Nonepileptic Seizures and Changes in Circulating Plasma Proteins Linked to Neuroinflammation. Neurology. 2021;96:e1443–e1452. doi: 10.1212/wnl.0000000000011552. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.García-Peral C., Ledesma M.M., Herrero-Turrión M.J., Gómez-Nieto R., Castellano O., López D.E. Proteomic and Bioinformatic Tools to Identify Potential Hub Proteins in the Audiogenic Seizure-Prone Hamster GASH/Sal. Diagnostics. 2023;13 doi: 10.3390/diagnostics13061048. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Yamaguchi H., Nishiyama M., Tomioka K., Hongo H., Tokumoto S., Ishida Y., Toyoshima D., Kurosawa H., Nozu K., Maruyama A., et al. Growth and differentiation factor-15 as a potential prognostic biomarker for status-epilepticus-associated-with-fever: A pilot study. Brain Dev. 2022;44:210–220. doi: 10.1016/j.braindev.2021.10.003. [DOI] [PubMed] [Google Scholar]
- 34.Sitovskaya D., Zabrodskaya Y., Parshakov P., Sokolova T., Kudlay D., Starshinova A., Samochernykh K. Expression of Cytoskeletal Proteins (GFAP, Vimentin), Proapoptotic Protein (Caspase-3) and Protective Protein (S100) in the Epileptic Focus in Adults and Children with Drug-Resistant Temporal Lobe Epilepsy Associated with Focal Cortical Dysplasia. Int. J. Mol. Sci. 2023;24 doi: 10.3390/ijms241914490. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Robel S., Buckingham S.C., Boni J.L., Campbell S.L., Danbolt N.C., Riedemann T., Sutor B., Sontheimer H. Reactive astrogliosis causes the development of spontaneous seizures. J. Neurosci. 2015;35:3330–3345. doi: 10.1523/jneurosci.1574-14.2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Wang J., Lin Z.J., Liu L., Xu H.Q., Shi Y.W., Yi Y.H., He N., Liao W.P. Epilepsy-associated genes. Seizure. 2017;44:11–20. doi: 10.1016/j.seizure.2016.11.030. [DOI] [PubMed] [Google Scholar]
- 37.Rubboli G., Franceschetti S., Berkovic S.F., Canafoglia L., Gambardella A., Dibbens L.M., Riguzzi P., Campieri C., Magaudda A., Tassinari C.A., Michelucci R. Clinical and neurophysiologic features of progressive myoclonus epilepsy without renal failure caused by SCARB2 mutations. Epilepsia. 2011;52:2356–2363. doi: 10.1111/j.1528-1167.2011.03307.x. [DOI] [PubMed] [Google Scholar]
- 38.Zhang K., Lu J., Fang F., Zhang Y., Yu J., Tao Y., Liu W., Lu L., Zhang Z., Chu X., et al. Super Enhancer Regulatory Gene FYB1 Promotes the Progression of T Cell Acute Lymphoblastic Leukemia by Activating IGLL1. J. Immunol. Res. 2023;2023 doi: 10.1155/2023/3804605. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Miyata M., Mandai K., Maruo T., Sato J., Shiotani H., Kaito A., Itoh Y., Wang S., Fujiwara T., Mizoguchi A., et al. Localization of nectin-2δ at perivascular astrocytic endfoot processes and degeneration of astrocytes and neurons in nectin-2 knockout mouse brain. Brain Res. 2016;1649:90–101. doi: 10.1016/j.brainres.2016.08.023. [DOI] [PubMed] [Google Scholar]
- 40.Shiotani H., Miyata M., Itoh Y., Wang S., Kaito A., Mizoguchi A., Yamasaki M., Watanabe M., Mandai K., Mochizuki H., Takai Y. Localization of nectin-2α at the boundary between the adjacent somata of the clustered cholinergic neurons and its regulatory role in the subcellular localization of the voltage-gated A-type K(+) channel Kv4.2 in the medial habenula. J. Comp. Neurol. 2018;526:1527–1549. doi: 10.1002/cne.24425. [DOI] [PubMed] [Google Scholar]
- 41.Shiotani H., Miyata M., Kameyama T., Mandai K., Yamasaki M., Watanabe M., Mizutani K., Takai Y. Nectin-2α is localized at cholinergic neuron dendrites and regulates synapse formation in the medial habenula. J. Comp. Neurol. 2021;529:450–477. doi: 10.1002/cne.24958. [DOI] [PubMed] [Google Scholar]
- 42.Segarra M., Aburto M.R., Hefendehl J., Acker-Palmer A. Neurovascular Interactions in the Nervous System. Annu. Rev. Cell Dev. Biol. 2019;35:615–635. doi: 10.1146/annurev-cellbio-100818-125142. [DOI] [PubMed] [Google Scholar]
- 43.Mizutani K., Miyata M., Shiotani H., Kameyama T., Takai Y. Nectin-2 in general and in the brain. Mol. Cell. Biochem. 2022;477:167–180. doi: 10.1007/s11010-021-04241-y. [DOI] [PubMed] [Google Scholar]
- 44.Sun J., Jiang T., Gu F., Ma D., Liang J. TMT-Based Proteomic Analysis of Plasma from Children with Rolandic Epilepsy. Dis. Markers. 2020;2020 doi: 10.1155/2020/8840482. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Saengow V.E., Chiangjong W., Khongkhatithum C., Changtong C., Chokchaichamnankit D., Weeraphan C., Kaewboonruang P., Thampratankul L., Manuyakorn W., Hongeng S., et al. Proteomic analysis reveals plasma haptoglobin, interferon-γ, and interleukin-1β as potential biomarkers of pediatric refractory epilepsy. Brain Dev. 2021;43:431–439. doi: 10.1016/j.braindev.2020.11.001. [DOI] [PubMed] [Google Scholar]
- 46.Fang Y., Jiang Q., Li S., Zhu H., Xu R., Song N., Ding X., Liu J., Chen M., Song M., et al. Opposing functions of β-arrestin 1 and 2 in Parkinson's disease via microglia inflammation and Nprl3. Cell Death Differ. 2021;28:1822–1836. doi: 10.1038/s41418-020-00704-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Agrawal P.B., Joshi M., Marinakis N.S., Schmitz-Abe K., Ciarlini P.D.S.C., Sargent J.C., Markianos K., De Girolami U., Chad D.A., Beggs A.H. Expanding the phenotype associated with the NEFL mutation: neuromuscular disease in a family with overlapping myopathic and neurogenic findings. JAMA Neurol. 2014;71:1413–1420. doi: 10.1001/jamaneurol.2014.1432. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Quiroz Y.T., Zetterberg H., Reiman E.M., Chen Y., Su Y., Fox-Fuller J.T., Garcia G., Villegas A., Sepulveda-Falla D., Villada M., et al. Plasma neurofilament light chain in the presenilin 1 E280A autosomal dominant Alzheimer's disease kindred: a cross-sectional and longitudinal cohort study. Lancet Neurol. 2020;19:513–521. doi: 10.1016/s1474-4422(20)30137-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Kotaich F., Caillol D., Bomont P. Neurofilaments in health and Charcot-Marie-Tooth disease. Front. Cell Dev. Biol. 2023;11 doi: 10.3389/fcell.2023.1275155. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Reilly M.M. NEFL-related Charcot-Marie-tooth disease: an unraveling story. Ann. Neurol. 2009;66:714–716. doi: 10.1002/ana.21848. [DOI] [PubMed] [Google Scholar]
- 51.Akel S., Asztely F., Banote R.K., Axelsson M., Zetterberg H., Zelano J. Neurofilament light, glial fibrillary acidic protein, and tau in a regional epilepsy cohort: High plasma levels are rare but related to seizures. Epilepsia. 2023;64:2690–2700. doi: 10.1111/epi.17713. [DOI] [PubMed] [Google Scholar]
- 52.Lockhart S.M., Saudek V., O’Rahilly S. GDF15: A Hormone Conveying Somatic Distress to the Brain. Endocr. Rev. 2020;41 doi: 10.1210/endrev/bnaa007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Wang D., Day E.A., Townsend L.K., Djordjevic D., Jørgensen S.B., Steinberg G.R. GDF15: emerging biology and therapeutic applications for obesity and cardiometabolic disease. Nat. Rev. Endocrinol. 2021;17:592–607. doi: 10.1038/s41574-021-00529-7. [DOI] [PubMed] [Google Scholar]
- 54.Ji F., Chai Y.L., Liu S., Kan C.N., Ong M., Richards A.M., Tan B.Y., Venketasubramanian N., Pasternak O., Chen C., et al. Associations of Blood Cardiovascular Biomarkers With Brain Free Water and Its Relationship to Cognitive Decline: A Diffusion-MRI Study. Neurology. 2023;101:e151–e163. doi: 10.1212/wnl.0000000000207401. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Tsygankova P.G., Itkis Y.S., Krylova T.D., Kurkina M.V., Bychkov I.O., Ilyushkina A.A., Zabnenkova V.V., Mikhaylova S.V., Pechatnikova N.L., Sheremet N.L., Zakharova E.Y. Plasma FGF-21 and GDF-15 are elevated in different inherited metabolic diseases and are not diagnostic for mitochondrial disorders. J. Inherit. Metab. Dis. 2019;42:918–933. doi: 10.1002/jimd.12142. [DOI] [PubMed] [Google Scholar]
- 56.McDermott M.F., Aksentijevich I., Galon J., McDermott E.M., Ogunkolade B.W., Centola M., Mansfield E., Gadina M., Karenko L., Pettersson T., et al. Germline mutations in the extracellular domains of the 55 kDa TNF receptor, TNFR1, define a family of dominantly inherited autoinflammatory syndromes. Cell. 1999;97:133–144. doi: 10.1016/s0092-8674(00)80721-7. [DOI] [PubMed] [Google Scholar]
- 57.Miller J.W. Inflammation as a Target for Epilepsy Therapy. Neurology. 2021;97:845–846. doi: 10.1212/WNL.0000000000012768. [DOI] [PubMed] [Google Scholar]
- 58.Zhang Y., Guo Y., He Y., You J., Zhang Y., Wang L., Chen S., He X., Yang L., Huang Y., et al. Large-scale proteomic analyses of incident Alzheimer's disease reveal new pathophysiological insights and potential therapeutic targets. Mol. Psychiatry. 2025;30:2347–2361. doi: 10.1038/s41380-024-02840-x. [DOI] [PubMed] [Google Scholar]
- 59.You J., Guo Y., Zhang Y., Kang J.J., Wang L.B., Feng J.F., Cheng W., Yu J.T. Plasma proteomic profiles predict individual future health risk. Nat. Commun. 2023;14:7817. doi: 10.1038/s41467-023-43575-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Choi S.W., O'Reilly P.F. PRSice-2: Polygenic Risk Score software for biobank-scale data. GigaScience. 2019;8 doi: 10.1093/gigascience/giz082. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Shannon P., Markiel A., Ozier O., Baliga N.S., Wang J.T., Ramage D., Amin N., Schwikowski B., Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–2504. doi: 10.1101/gr.1239303. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Sakaue S., Okada Y. GREP: genome for REPositioning drugs. Bioinformatics. 2019;35:3821–3823. doi: 10.1093/bioinformatics/btz166. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Huang D.W., Sherman B.T., Lempicki R.A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009;4:44–57. doi: 10.1038/nprot.2008.211. [DOI] [PubMed] [Google Scholar]
- 64.Baldarelli R.M., Smith C.M., Finger J.H., Hayamizu T.F., McCright I.J., Xu J., Shaw D.R., Beal J.S., Blodgett O., Campbell J., et al. The mouse Gene Expression Database (GXD): 2021 update. Nucleic Acids Res. 2021;49:D924–D931. doi: 10.1093/nar/gkaa914. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Zhou Y., Zhou B., Pache L., Chang M., Khodabakhshi A.H., Tanaseichuk O., Benner C., Chanda S.K. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 2019;10:1523. doi: 10.1038/s41467-019-09234-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Alfaro-Almagro F., Jenkinson M., Bangerter N.K., Andersson J.L.R., Griffanti L., Douaud G., Sotiropoulos S.N., Jbabdi S., Hernandez-Fernandez M., Vallee E., et al. Image processing and Quality Control for the first 10,000 brain imaging datasets from UK Biobank. Neuroimage. 2018;166:400–424. doi: 10.1016/j.neuroimage.2017.10.034. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Miller K.L., Alfaro-Almagro F., Bangerter N.K., Thomas D.L., Yacoub E., Xu J., Bartsch A.J., Jbabdi S., Sotiropoulos S.N., Andersson J.L.R., et al. Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nat. Neurosci. 2016;19:1523–1536. doi: 10.1038/nn.4393. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Fischl B., Salat D.H., Busa E., Albert M., Dieterich M., Haselgrove C., van der Kouwe A., Killiany R., Kennedy D., Klaveness S., et al. Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron. 2002;33:341–355. doi: 10.1016/s0896-6273(02)00569-x. [DOI] [PubMed] [Google Scholar]
- 69.Dale A.M., Fischl B., Sereno M.I. Cortical surface-based analysis. I. Segmentation and surface reconstruction. Neuroimage. 1999;9:179–194. doi: 10.1006/nimg.1998.0395. [DOI] [PubMed] [Google Scholar]
- 70.Desikan R.S., Ségonne F., Fischl B., Quinn B.T., Dickerson B.C., Blacker D., Buckner R.L., Dale A.M., Maguire R.P., Hyman B.T., et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage. 2006;31:968–980. doi: 10.1016/j.neuroimage.2006.01.021. [DOI] [PubMed] [Google Scholar]
- 71.de Groot M., Vernooij M.W., Klein S., Ikram M.A., Vos F.M., Smith S.M., Niessen W.J., Andersson J.L.R. Improving alignment in Tract-based spatial statistics: evaluation and optimization of image registration. Neuroimage. 2013;76:400–411. doi: 10.1016/j.neuroimage.2013.03.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Cox S.R., Ritchie S.J., Tucker-Drob E.M., Liewald D.C., Hagenaars S.P., Davies G., Wardlaw J.M., Gale C.R., Bastin M.E., Deary I.J. Ageing and brain white matter structure in 3,513 UK Biobank participants. Nat. Commun. 2016;7 doi: 10.1038/ncomms13629. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Glaesmer H., Schulz A., Häuser W., Freyberger H.J., Brähler E., Grabe H.J. [The childhood trauma screener (CTS) - development and validation of cut-off-scores for classificatory diagnostics] Psychiatr. Prax. 2013;40:220–226. doi: 10.1055/s-0033-1343116. [DOI] [PubMed] [Google Scholar]
- 74.Gheorghe D.A., Li C., Gallacher J., Bauermeister S. Associations of perceived adverse lifetime experiences with brain structure in UK Biobank participants. J. Child Psychol. Psychiatry. 2021;62:822–830. doi: 10.1111/jcpp.13298. [DOI] [PubMed] [Google Scholar]
- 75.McManus E., Haroon H., Duncan N.W., Elliott R., Muhlert N. The effects of stress across the lifespan on the brain, cognition and mental health: A UK biobank study. Neurobiol. Stress. 2022;18 doi: 10.1016/j.ynstr.2022.100447. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Simonet C., Bestwick J., Jitlal M., Waters S., Ben-Joseph A., Marshall C.R., Dobson R., Marrium S., Robson J., Jacobs B.M., et al. Assessment of Risk Factors and Early Presentations of Parkinson Disease in Primary Care in a Diverse UK Population. JAMA Neurol. 2022;79:359–369. doi: 10.1001/jamaneurol.2022.0003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Lehallier B., Gate D., Schaum N., Nanasi T., Lee S.E., Yousef H., Moran Losada P., Berdnik D., Keller A., Verghese J., et al. Undulating changes in human plasma proteome profiles across the lifespan. Nat. Med. 2019;25:1843–1850. doi: 10.1038/s41591-019-0673-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Piehl N., van Olst L., Ramakrishnan A., Teregulova V., Simonton B., Zhang Z., Tapp E., Channappa D., Oh H., Losada P.M., et al. Cerebrospinal fluid immune dysregulation during healthy brain aging and cognitive impairment. Cell. 2022;185:5028–5039.e13. doi: 10.1016/j.cell.2022.11.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Dennis G., Jr., Sherman B.T., Hosack D.A., Yang J., Gao W., Lane H.C., Lempicki R.A. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol. 2003;4:P3. [PubMed] [Google Scholar]
- 80.Han H., Cho J.W., Lee S., Yun A., Kim H., Bae D., Yang S., Kim C.Y., Lee M., Kim E., et al. TRRUST v2: an expanded reference database of human and mouse transcriptional regulatory interactions. Nucleic Acids Res. 2018;46:D380–D386. doi: 10.1093/nar/gkx1013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O'Connell J., et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Kurki M.I., Karjalainen J., Palta P., Sipilä T.P., Kristiansson K., Donner K.M., Reeve M.P., Laivuori H., Aavikko M., Kaunisto M.A., et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature. 2023;613:508–518. doi: 10.1038/s41586-022-05473-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Inker L.A., Eneanya N.D., Coresh J., Tighiouart H., Wang D., Sang Y., Crews D.C., Doria A., Estrella M.M., Froissart M., et al. New Creatinine- and Cystatin C-Based Equations to Estimate GFR without Race. N. Engl. J. Med. 2021;385:1737–1749. doi: 10.1056/NEJMoa2102953. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.KDIGO 2024 Clinical Practice Guideline for the Evaluation and Management of Chronic Kidney Disease. Kidney Int. 2024;105:S117–S314. doi: 10.1016/j.kint.2023.10.018. [DOI] [PubMed] [Google Scholar]
- 85.Clifton L., Liu X., Collister J.A., Littlejohns T.J., Allen N., Hunter D.J. Assessing the importance of primary care diagnoses in the UK Biobank. Eur. J. Epidemiol. 2024;39:219–229. doi: 10.1007/s10654-023-01095-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Ke G., Meng Q., Finley T., Wang T., Chen W., Ma W., Ye Q., Liu T.-Y. In: Guyon I., Luxburg U.V., Bengio S., Wallach H., Fergus R., Vishwanathan S., Garnett R., editors. Vol. 30. Curran Associates, Inc.; 2017. LightGBM: A Highly Efficient Gradient Boosting Decision Tree; pp. 3146–3154. (Advances in Neural Information Processing Systems). Retrieved from http://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf. [Google Scholar]
- 87.Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Louppe G., Prettenhofer P., Weiss R., et al. Scikit-learn: Machine Learning in Python. arXiv. 2011 abs/1201.0490 Preprint at. [Google Scholar]
Data Availability Statement
This study does not generate new original data. The data used in this study were obtained from the UK Biobank (UKB) database (https://www.ukbiobank.ac.uk/), and access requires an approved application. The study was conducted in accordance with the Declaration of Helsinki and was approved by the UKB under application number 19542.
This paper does not report original code. All software used in this study is publicly available, and the analysis code is accessible at https://github.com/ddzhang877/epilepsy_proteomics. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.






