Abstract
Parkinson’s disease (PD) remains incurable, with a long preclinical phase currently undetectable by existing methods. In the largest proteomic study in neurodegenerative diseases to date, we analyzed blood samples from ~74,000 individuals across discovery and validation cohorts. In the EPIC4PD discovery case-cohort, large-scale profiling of 7,285 proteins (SomaScan-7K) in 4,538 initially unaffected participants (574 incident cases) identified 17 proteins that predict PD up to 28 years before diagnosis. Additional proteins revealed sex-specific effects and time-dependent effect trajectories, capturing disease progression before symptom onset. Replication in three prospective cohorts (n=64,856; 1,034 incident cases) confirmed at least 12 key pre-diagnostic biomarkers with strong evidence, including TPPP2, HPGDS, ALPL, MFAP5, OGFR, ACAD8, TCL1A, GPC4, GSTA3, LCN2, KRAS, and GJA1. Preclinical biomarkers showed 86% concordant effect directions in independent prevalent PD cases (n=2,592; p=1.6×10−19). Furthermore, in the longitudinal Tracking PD cohort (n=794), HPGDS and MFAP5 also predicted cognitive decline. Notably, several of the identified PD biomarkers overlapped with those for incident Alzheimer’s disease and amyotrophic lateral sclerosis, indicating shared molecular signatures. A machine learning-derived protein score improved PD risk prediction in external validation. This extensive proteomics effort identified novel, actionable biomarkers opening new avenues for early PD risk stratification and precision medicine.
Keywords: Parkinson’s disease, neurodegeneration, prediction, proteomics, protein risk score, SomaLogic, aptamers, EPIC
INTRODUCTION
Parkinson’s disease (PD) is the most common movement disorder and the second most common neurodegenerative disease after Alzheimer’s disease (AD)1. It is characterized by progressive, disabling motor and non-motor symptoms, substantially reducing quality of life and life expectancy2,3. By the time PD is clinically diagnosed, significant neuronal loss has already occurred, and available treatments are only symptomatic, with diminishing benefits over time1. Consequently, the disease poses a significant burden not only on patients but also on their families and caregivers. Furthermore, its prevalence is projected to rise sharply, particularly in aging societies, leading to increasing financial and social costs2.
Importantly, non-specific symptoms may begin up to 20 years before diagnosis, suggesting that PD pathophysiology starts even earlier4. This time course points to a critical window for preventive or early therapeutic interventions that could delay or halt disease onset. However, identifying at-risk but clinically unaffected individuals remains a significant challenge. Developing a reliable PD risk score, such as one based on molecular biomarkers, will be essential to address this gap. Such a score would be especially valuable for risk stratification in vulnerable populations, including workers in rural areas with pesticide exposure or individuals with a family history of PD. Identifying those at highest risk would allow targeted monitoring and early clinical and preventive trials to be implemented more effectively. In this context, high-throughput proteomic platforms such as SomaScan® (SomaLogic, Inc.) and Olink® (Olink Proteomics AB) offer exciting new opportunities to identify novel biological signatures of PD development. These methods can quantify thousands of proteins simultaneously and hold strong potential for identifying biomarkers that can detect diseases in their earliest or even preclinical stages5–7. In neurodegenerative diseases, interest in blood-based biomarkers has grown, owing to their minimally invasive collection, easier clinical integration, and the strong performance of candidate blood biomarkers in AD8. In PD, such markers could enhance early diagnosis, enable risk stratification in clinical trials, and support preventive strategies. However, given PD’s comparatively low prevalence (2–3% in individuals aged 65+)1, prospective cohorts with long-term follow-up are rare. Even rarer are cohorts with baseline blood samples collected before PD onset, which are required to identify predictive biomarkers. This is why most prior biomarker studies have compared prevalent PD patients to controls, risking reverse causation and limiting predictive value9.
To our knowledge, only one prospective cohort, the UK Biobank (UKB), has so far been used to discover predictive PD biomarkers. Based on this resource, a recent study analyzed 2,920 plasma proteins measured with Olink technology in up to 859 incident PD patients10. Although 38 candidate biomarkers were identified, the findings of that study remain unvalidated in independent prospective cohorts. In addition to closing this gap, our study represents a major advance by nearly tripling the number of protein markers (n=7,285; SomaScan 7K array) analyzed in pre-diagnostic blood samples from ~4,600 participants in a large, well-characterized case-cohort (‘European Prospective Investigation into Cancer and Nutrition for Parkinson’s disease’, EPIC4PD)11 within the EPIC study12,13. EPIC4PD features up to 28 years of follow-up time and 574 specialist-confirmed incident PD diagnoses. Crucially, we externally validated our top biomarkers in ~65,000 individuals, including >1,000 incident PD cases, across three independent prospective cohorts. Overall, this resulted in the largest and most comprehensive investigation of predictive biomarkers for PD to date, both in terms of sample size and proteomic resolution. To assess these markers across the disease timeline (including diagnosis and progression), we also incorporated independent SomaScan 7K data from the Global Neurodegenerative Proteomics Consortium (GNPC; ~300 prevalent cases and ~2,300 controls) and the Tracking PD cohort (~800 cases with progression data). Using the top results from these analysis arms, we also performed enrichment and network analyses to gain insights into biological pathways, and evaluated PD biomarker profiles in AD and amyotrophic lateral sclerosis (ALS) to assess their potential as shared markers of neurodegeneration (Figure 1). Finally, using machine learning, we constructed a predictive protein risk score (PPRS) in EPIC4PD and validated it externally. This score improved risk prediction decades before clinical PD onset. Combined with traditional risk factors and markers from other molecular domains, the biomarkers identified in this study have the potential to substantially advance ultra-early PD risk stratification and diagnosis, paving the way for early treatment and prevention.
Figure 1. Visual summary of study design.
This study leveraged plasma proteomics from 4,538 individuals in the ‘European Prospective Investigation into Cancer and Nutrition for Parkinson’s disease’ case-cohort (EPIC4PD) to discover preclinical protein biomarkers of Parkinson’s disease (PD). Using the SomaScan 7K platform, 7,285 aptamers were analyzed across multiple time windows. Identified biomarkers were externally validated in three large cohorts (‘Age, Gene/Environment Susceptibility–Reykjavik Study’ [AGES], ‘Atherosclerosis Risk in Communities’ study [ARIC], and ‘United Kingdom Biobank’ [UKB]) and assessed in clinical PD cases from the ‘Global Neurodegenerative Proteomics Consortium’ (GNPC) and the longitudinal ‘Tracking Parkinson’s disease’ cohort. Functional characterization included pathway and cell-type enrichment, protein-protein networks, and cross-trait analyses with Alzheimer’s disease (AD) and amyotrophic lateral sclerosis (ALS). Predictive protein risk scores were calculated in EPIC4PD and externally validated in AGES.
RESULTS
EPIC4PD case-cohort
In the discovery phase of this study (Figure 1), we leveraged data from the EPIC cohort to build a large case-cohort (EPIC4PD) of 4,538 participants (including 574 incident PD cases; Table 1) across 10 European centers in Spain, Italy, the Netherlands, the UK, and Germany. This design combines deep proteomic profiling (6,382 proteins via 7,285 aptamers) in baseline, i.e., pre-disease, plasma samples with long-term prospective follow-up. EPIC4PD participants had a median baseline age of 53 years (range: 35–79), comprised 63% women (57% excluding the Utrecht center, which recruited only women), and were followed for up to 28 years (median: 16 years). Among PD cases, the median time from blood sampling to diagnosis was 11 years (range: 2–25), enabling biomarker discovery long before clinical onset. Consistent with known PD risk profiles, PD cases were older at baseline (median: 59 vs. 52 years), more likely male (58% vs. 40%, excluding Utrecht), less likely to be current smokers (15% vs. 25%), and slightly more often overweight (64% vs. 60% with body mass index (BMI) ≥25) compared to non-cases (Supplementary Tables 1–2).
Table 1.
Overview of cohort datasets included in this study
| Cohort | Population | Phenotype | Platform | Cases: number (% women) | Cases: AAE (range) | Cases: AAD (range) | Non-cases: number (% women) | Non-cases: AAE (range) | N total |
|---|---|---|---|---|---|---|---|---|---|
| EPIC4PD | Europe | incident PD | SomaScan 7K | 574 (45%) | 59 (38–77) | 71 (45–89) | 3,964 (66%) | 52 (35–79) | 4,538 |
| ARIC | USA | incident PD | SomaScan 5K | 164 (41%) | 59 (48–68) | 75 (56–87) | 11,041 (56%) | 57 (46–70) | 11,205 |
| AGES | Iceland | incident PD | SomaScan 7K | 121 (38%) | 78 (69–88) | 84 (72–93) | 4,969 (58%) | 78 (68–98) | 5,090 |
| GNPC | Europe, USA | prevalent PD | SomaScan 7K | 335 (37%) | 75 (49–90) | - | 2,257 (60%) | 71.5 (27–90) | 2,592 |
| TRACKING PD | UK | PD progression | SomaScan 7K | 794 (34%) | 66 (38–85)* | 66 (38–85) | - | - | 794 |
| EPIC4AD | Europe | incident AD | SomaScan 7K | 652 (71%) | 59 (36–69) | 78 (53–93) | 1,274 (63%) | 49 (35–69) | 1,926 |
| EPIC4ALS | Europe | incident ALS | SomaScan 7K | 187 (57%) | 58 (38–76) | 73 (48–95) | 4,528 (66%) | 52 (35–79) | 4,715 |
| UK Biobank | UK | incident PD | Olink Explore 3072 | 749 (37%) | 64 (41–70) | 71 (44–83) | 47,812 (54%) | 58 (40–70) | 48,561 |
| TOTAL | | | | 3,572 | | | 70,523 | | 74,095 |
This table presents the cohort datasets included in the study. AAE = median age at examination (baseline), AAD = median age at diagnosis. Note that the individual EPIC-derived case-cohorts (EPIC4PD, EPIC4AD, EPIC4ALS) share the vast majority of non-cases; thus, the total number of non-cases is not the sum of the non-cases of the individual datasets. See Methods for full cohort names.
*Mean age at visit 1.
Association analyses in the full EPIC4PD dataset
We used Cox proportional hazards regression models across 7,285 aptamers to identify proteins associated with future PD onset (Supplementary Table 3). In the basic model (stratified by age in 5-year intervals, sex, and center) covering the full follow-up period (2–28 years), 17 proteins were significantly associated with PD risk after false-discovery rate (FDR) control at 0.05 (Table 2, Supplementary Table 4). We applied two approaches in the discovery phase to address outliers: one excluded extreme values as potential noise, and the other retained them via capping, as they may represent strong PD signals. Nine proteins were FDR-significant in both approaches; for the other eight (six upon exclusion, two upon capping), the alternative model still showed consistent, nominally significant associations (mean ΔHR=0.03, maximum ΔHR=0.13; Supplementary Figure 1), supporting the robustness of our findings. For simplicity, we report only one result (from the better-performing model) in the main text, with capped data labeled “cap”; full results are available in Supplementary Table 3. Top proteins associated with PD incidence included VAMP3 (hazard ratio [HR; per 1 standard deviation (SD) increase in log10-transformed protein levels]=1.26, p=1.46E-08), TPPP2 (HR=0.70, p=8.52E-08), HPGDS (HR=0.73, p=8.59E-08), ALPLcap (HR=1.25, p=1.30E-07), FCAR (HR=1.34, p=3.48E-07), and CERS5 (HR=1.23, p=5.64E-06; Figure 2, Table 2). All FDR-significant associations remained stable after additional adjustment for key demographic and lifestyle factors (education level, BMI, smoking, and physical activity), with minimal impact on effect sizes (mean ΔHR=0.02, maximum ΔHR=0.07; Supplementary Figure 1, Supplementary Table 4).
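To make the FDR step concrete, the following minimal sketch shows a Benjamini-Hochberg adjustment. It assumes this is the procedure behind the reported q-values; the text above states only that FDR was controlled at 0.05. The function name and the toy p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0  # q-values are capped at 1 and kept monotone
    # Walk from the largest p-value down so each q is the minimum of
    # p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Toy input (hypothetical values, not study data).
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_q = benjamini_hochberg(toy_p)
significant = [q <= 0.05 for q in toy_q]
```

In the setting described above, the input would be the aptamer-level p-values of one model (7,285 tests), and proteins with q≤0.05 would be reported as FDR-significant.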
Table 2.
Plasma protein biomarkers significantly associated with future onset of Parkinson’s disease
| Protein | HR (95% CI) | p | q | Replication | Cross-disease | Prior evidence |
|---|---|---|---|---|---|---|
| 2–28 years of follow-up | | | | | | |
| VAMP3 | 1.26 (1.16–1.36) | 1.46E-08 | 1.07E-04 | moderate | | |
| TPPP2 | 0.70 (0.61–0.80) | 7.06E-08 | 3.15E-04 | strong | | |
| HPGDS | 0.73 (0.65–0.82) | 8.59E-08 | 2.09E-04 | strong | AD | |
| ALPLcap | 1.25 (1.15–1.36) | 1.30E-07 | 3.16E-04 | strong | | |
| FCAR | 1.34 (1.20–1.50) | 3.48E-07 | 6.34E-04 | suggestive | | Irmady et al., 202344 |
| CERS5 | 1.23 (1.12–1.34) | 5.64E-06 | 6.85E-03 | suggestive | ALS | |
| MFAP5cap | 0.73 (0.64–0.84) | 1.14E-05 | 1.38E-02 | strong | | |
| RSPO2 | 1.19 (1.10–1.29) | 2.09E-05 | 2.10E-02 | suggestive | AD, ALS | |
| CD3G | 1.19 (1.10–1.29) | 2.30E-05 | 2.10E-02 | suggestive | | |
| GTF2A2cap | 1.22 (1.11–1.34) | 2.30E-05 | 2.40E-02 | moderate | | |
| LTF | 1.28 (1.14–1.43) | 3.21E-05 | 2.34E-02 | | AD | |
| NAMPTcap | 1.28 (1.14–1.44) | 4.51E-05 | 3.65E-02 | | ALS | Santiago et al., 201645 |
| OGFR | 0.77 (0.68–0.87) | 5.04E-05 | 3.67E-02 | strong | AD | |
| FGF8 | 1.21 (1.10–1.33) | 5.75E-05 | 3.46E-02 | moderate | | |
| PSG6 | 1.25 (1.12–1.39) | 6.17E-05 | 3.46E-02 | | | |
| ACAD8 | 1.18 (1.09–1.28) | 7.05E-05 | 3.65E-02 | strong | AD | |
| CRYGD | 1.20 (1.10–1.32) | 7.52E-05 | 3.65E-02 | | | |
| 2–10 years of follow-up | | | | | | |
| TIMP1cap | 0.72 (0.62–0.83) | 4.52E-06 | 1.65E-02 | moderate | | |
| IL11 | 1.30 (1.15–1.47) | 2.04E-05 | 4.23E-02 | | | |
| TCL1A | 0.71 (0.61–0.83) | 2.23E-05 | 4.23E-02 | strong | | Kedmi et al., 201146 |
| 10–20 years of follow-up | | | | | | |
| GUCA1Bcap | 1.31 (1.18–1.45) | 3.95E-07 | 1.44E-03 | | | |
| UBE2L3\|UBBcap | 1.29 (1.16–1.42) | 6.08E-07 | 1.48E-03 | | | |
| DCAF12 | 1.24 (1.12–1.37) | 2.63E-05 | 2.47E-02 | | | |
| GPC4cap | 1.26 (1.13–1.40) | 2.80E-05 | 3.33E-02 | strong | | Tatenhorst et al., 202447 |
| BECN1 | 1.26 (1.13–1.40) | 2.92E-05 | 2.47E-02 | | | Miki et al., 201848 |
| HEPACAM2cap | 1.25 (1.12–1.39) | 3.05E-05 | 3.33E-02 | suggestive | | |
| F13A1\|F13Bcap | 1.34 (1.17–1.54) | 3.20E-05 | 3.33E-02 | | | |
| ABHD12 | 1.25 (1.12–1.38) | 3.37E-05 | 2.47E-02 | moderate | | |
| IRAG2 | 1.24 (1.12–1.37) | 3.38E-05 | 2.47E-02 | | AD | |
| TAFA4cap | 1.25 (1.12–1.39) | 5.93E-05 | 3.93E-02 | | | |
| GSTA3 | 1.23 (1.11–1.37) | 6.08E-05 | 3.69E-02 | strong | AD | |
| SPASTcap | 1.24 (1.11–1.38) | 7.94E-05 | 4.45E-02 | | AD, ALS | |
| CUL4B | 1.32 (1.15–1.51) | 9.85E-05 | 4.22E-02 | | | Yao et al., 202233 |
| OSTF1 | 1.30 (1.14–1.49) | 9.92E-05 | 4.22E-02 | | | |
| PYDC1 | 1.24 (1.11–1.38) | 1.07E-04 | 4.22E-02 | | AD, ALS | D. Torshizi et al., 202418 |
| LCN2 | 1.30 (1.14–1.49) | 1.10E-04 | 4.22E-02 | strong | | Fan et al., 202430 |
| OTX2 | 1.27 (1.13–1.44) | 1.10E-04 | 4.22E-02 | | AD | |
| MAGEA8 | 0.66 (0.54–0.82) | 1.16E-04 | 4.22E-02 | | | |
| ZNF41 | 1.22 (1.10–1.35) | 1.32E-04 | 4.57E-02 | | | |
| Men | | | | | | |
| RS1 | 1.42 (1.22–1.65) | 4.34E-06 | 9.10E-03 | | AD | |
| KRAScap | 1.34 (1.17–1.53) | 1.30E-05 | 1.89E-02 | strong | | |
| LYZcap | 1.37 (1.19–1.59) | 2.06E-05 | 2.50E-02 | | | |
| GBA3cap | 1.33 (1.16–1.52) | 3.91E-05 | 3.17E-02 | | AD | |
| GJA1 | 1.32 (1.16–1.50) | 3.70E-05 | 3.37E-02 | strong | AD | |
| AMN | 1.40 (1.19–1.64) | 5.07E-05 | 3.69E-02 | moderate | ALS | Elango et al., 202331 |
| Women | | | | | | |
| ITGA6 | 1.27 (1.14–1.42) | 1.87E-05 | 4.29E-02 | suggestive | | |
| CXCL11 | 0.69 (0.58–0.82) | 2.34E-05 | 5.67E-02 | moderate | | Hepp et al., 202332 |
This table presents all plasma proteins showing significant (false discovery rate [FDR]=0.05) association with incident Parkinson’s disease (PD) in Cox proportional hazards regressions in the EPIC4PD case-cohort, stratified by center, sex (in non-sex-stratified analyses), and 5-year age categories. Analyses were conducted for the entire follow-up period (2–28 years) and for proximal (2–10 years) and distal (10–20 years) time windows before disease onset or censoring, and were additionally run per sex stratum over the full follow-up. Strong evidence for replication: at least one nominally significant replication (for SomaScan in the same effect direction; for Olink, direction not required), and all additional SomaScan-based datasets showing effect estimates in the same direction. Moderate evidence for replication: effects observed in the same direction in all (at least two) SomaScan datasets, but with non-significant results, or alternatively, one nominally significant SomaScan-based replication result (same direction), another SomaScan dataset showing the same direction, and one showing the opposite direction. Suggestive evidence for replication: one nominally significant replication result (same direction for SomaScan; for Olink, direction not required) but heterogeneous effect directions among the remaining datasets, or only two SomaScan datasets with consistent (but non-significant) effects. Cross-disease=at least nominally significant (uncorrected p<0.05) associations with instantaneous risk of Alzheimer’s disease (AD) or amyotrophic lateral sclerosis (ALS) based on EPIC4AD and EPIC4ALS, respectively. Prior evidence: publications describing differential protein and/or mRNA levels in blood in (always prevalent) PD patients compared to controls. Note that these were always in line with effect directions in EPIC4PD. HR=hazard ratio; CI=confidence interval; q=FDR-adjusted p-value (FDR=0.05). Bolded words for ‘cross-disease’ indicate significance at FDR=0.05.
Note that from both outlier handling approaches, i.e., excluding outliers beyond 5 standard deviations from the log-mean or capping these outliers at the value of 5 standard deviations (indicated as “cap”), only the more significant result (smaller q) is shown. However, in all instances, at least nominally significant evidence was also observed for the alternative outlier removal method.
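The two outlier-handling approaches described in this note can be sketched as follows. This is a minimal illustration, assuming that the mean and SD are computed on the log10 scale across all samples of one aptamer; the function name, its parameters, and the use of the sample SD are assumptions and are not taken from the study's pipeline.

```python
import math
import statistics


def handle_outliers(levels, mode="cap", n_sd=5.0):
    """Return log10 protein levels with extreme values capped or excluded.

    levels: positive raw measurements for one aptamer (hypothetical input).
    mode:   "cap" clips values to mean +/- n_sd SDs on the log10 scale;
            "exclude" drops values outside that range.
    """
    logs = [math.log10(v) for v in levels]
    mean = statistics.fmean(logs)
    sd = statistics.stdev(logs)  # sample SD; needs at least two values
    lower, upper = mean - n_sd * sd, mean + n_sd * sd
    if mode == "cap":
        return [min(max(x, lower), upper) for x in logs]
    if mode == "exclude":
        return [x for x in logs if lower <= x <= upper]
    raise ValueError("mode must be 'cap' or 'exclude'")
```

Capping keeps every participant in the model while limiting the leverage of extreme values, whereas exclusion reduces the sample size for the affected aptamer; this is why the two approaches can yield slightly different hazard ratios for the same protein.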
Figure 2. Cox proportional hazard regression analyses across different time-windows in EPIC4PD.
Association results of preclinical protein biomarkers with incident Parkinson’s disease across different follow-up windows in EPIC4PD. Panels A–C: effect estimates for three time windows: full follow-up (2–28 years), proximal (2–10 years), and distal (10–20 years). (D) summary of associations of significant biomarkers (false-discovery rate of 0.05 in at least one time-window) across all follow-up periods.
External replication of predictive biomarkers in the full follow-up period
For the 17 FDR-significant biomarkers identified in EPIC4PD, replication data were available in a total of 64,856 participants (including 1,034 incident PD cases) from three prospective studies: the Age, Gene/Environment Susceptibility–Reykjavik Study (AGES), the Atherosclerosis Risk in Communities study (ARIC), and UKB. Specifically, all 17 proteins were available in AGES (5,090 participants, 121 cases; SomaScan 7K), 15 in ARIC (11,205 participants, 164 cases; SomaScan 5K), and 8 in UKB (48,561 participants, up to 749 cases; Olink Explore 3072; Supplementary Table 5, Figure 3, Supplementary Figure 2). The other 9 proteins were not measured in UKB, as the current Olink platform captures only ~30% of the proteins assayed by SomaScan 7K. We also included independent SomaScan 7K-based data from prevalent cases as ‘indirect’ replication. To this end, we re-analyzed proteomic profiles from 2,592 participants (335 prevalent PD cases and 2,257 controls) across three GNPC case-control datasets (see also below for the comparison of incident and prevalent PD), using methodology similar to that of our discovery-phase analyses.
Figure 3. Validation and characterization of top preclinical biomarkers.
(A) Forest plot of results of preclinical biomarkers for incident PD in the EPIC4PD case-cohort and in the ‘Age, Gene/Environment Susceptibility–Reykjavik Study’ (AGES), ‘Atherosclerosis Risk in Communities’ study (ARIC), and ‘United Kingdom Biobank’ (UKB) cohorts. (B) Comparison of top preclinical PD biomarkers with their associations in prevalent PD within the ‘Global Neurodegenerative Proteomics Consortium’ (GNPC); left: false-discovery rate (FDR)-significant proteins from EPIC4PD; right: nominally significant proteins. Color-coding refers to risk (red) or protective (blue) effects in GNPC. (C) Cross-disease analysis of preclinical PD-associated proteins compared to incident Alzheimer’s disease (AD) and amyotrophic lateral sclerosis (ALS) in EPIC4AD and EPIC4ALS. The top panels display all FDR-significant preclinical PD proteins from the full follow-up analysis, while the bottom panels show proteins with nominal significance (p<0.05). Protein names in orange indicate nominal significance in AD (top left) or ALS (top right). Orange bolded names indicate FDR-significant proteins. For each disease, the bottom right plot illustrates the correlation of effect estimates using ‘mirrored’ beta values (see Methods). (D) Protein-protein interaction network of FDR-significant predictive proteins (black border) and GWAS-identified hits (no border). Eight clusters were identified using igraph’s greedy optimisation of modularity. Node size increases for nodes with 10 or more edges, proportional to the number of edges. (E) Integrated overview of datasets for validation and characterization of FDR-significant biomarkers across different proteomic and transcriptomic layers.
When interpreting the replication results, platform-related differences between Olink and SomaScan must be considered. In line with the previous UKB study10, our re-analyses of the UKB Olink data showed a global protein downregulation in incident PD cases (87% of nominally significantly associated proteins were downregulated). However, this pattern was not seen in the SomaScan-based datasets from EPIC4PD, AGES, or ARIC (proportion of downregulated proteins: 43% to 56%; Supplementary Table 6), underscoring the challenges of cross-platform comparisons.
Given the higher comparability of results from datasets generated using the same technology, we first considered replication evidence in the SomaScan-based datasets (AGES, ARIC, and GNPC). Overall, six of the 17 proteins (TPPP2, HPGDS, ALPL, MFAP5, OGFR, and ACAD8) showed strong evidence for replication, i.e., nominally significant evidence for association in at least one dataset and, moreover, fully consistent effect estimates across all independent studies with available data (Figure 3). Furthermore, VAMP3, GTF2A2, and FGF8 showed moderate evidence for replication, i.e., highly consistent effect directions across all available datasets, albeit without any replication dataset individually reaching nominal significance (Figure 3, Table 2, Supplementary Table 5). In the UKB data, the evidence of association was further substantiated for TPPP2, HPGDS, MFAP5, and OGFR, while ALPL, ACAD8, VAMP3, GTF2A2, and FGF8 were not measured by Olink. Several additional proteins showed suggestive replication evidence in individual datasets, but the overall replication evidence was more heterogeneous (Table 2, Supplementary Figure 2, Supplementary Table 6).
In summary, nine of the 17 biomarkers (53%; ACAD8, HPGDS, TPPP2, OGFR, MFAP5, ALPL, FGF8, VAMP3, GTF2A2) were replicated across different technologies, populations, and study designs, reinforcing their specific predictive potential and, more generally, supporting the generalizability of our biomarker signals.
Biomarker associations relative to disease onset
Stratifying analyses by temporal proximity to PD diagnosis revealed remarkable biomarker dynamics. In the window closest to disease onset (2–10 years; n=4,230 participants including 245 incident cases), three additional proteins — IL11, TCL1A, and TIMP1cap — emerged as FDR-significant. In the more distal window (10–20 years; n=4,023 including 307 incident cases), 19 additional proteins showed strong associations with PD incidence (Figure 2, Table 2, Supplementary Table 4). These proteins had not been identified as significant overall due to negligible effects in the other time window (Supplementary Table 3, Supplementary Figure 2). Dynamic biomarker patterns were further supported by HR trajectories derived from 5-year sliding windows (Figure 2, Supplementary Figure 3). Together, these findings demonstrate that key biomarkers for PD may emerge at distinct preclinical phases, underscoring the value of deep proteomic profiling with extended follow-up to reveal dynamic, stage-specific signatures of preclinical disease.
External replication of the temporal analysis results
Of the three newly emerging biomarkers from the proximal follow-up window (2–10 years before diagnosis), TCL1A showed strong independent replication evidence and TIMP1 showed moderate support, while IL11 did not replicate (Figure 3, Supplementary Figure 2, Supplementary Table 5). Among the 19 biomarkers emerging from the 10–20 year window, GPC4, GSTA3, and LCN2 exhibited strong evidence of replication, and ABHD12 showed moderate support (Figure 3). For LCN2, we observed nominally significant evidence for replication in ARIC with the same effect direction as in EPIC4PD. In UKB, LCN2 also reached nominal significance (p=3.80E-03) but showed an inverse direction of effect (Figure 3, Supplementary Table 5). This discrepancy likely reflects methodological differences between the SomaScan and Olink proteomic platforms. In line with this and the general trend mentioned above, all but one of these biomarkers (MAGEA8) were upregulated in EPIC4PD but, with one exception (HEPACAM2), downregulated in UKB. Generally, replication datasets were fewer in number and had smaller sample sizes within the shorter time windows (Methods and Supplementary Table 5), which may partly explain the more modest replication evidence observed.
Sex-stratified analyses
In EPIC4PD, 18% (3 of 17) of proteins significantly associated with PD risk in the full follow-up period showed clear sex-specific effect size differences and significant interaction with sex after FDR control. These proteins included NAMPTcap (ΔHR = 48%), FCAR (ΔHR = 42%), and ALPLcap (ΔHR = 34%; Supplementary Figure 1, Supplementary Table 7).
Furthermore, sex-stratified proteome-wide analyses uncovered six additional proteins (RS1, KRAScap, LYZcap, GBA3cap, GJA1, AMN) FDR-significantly associated with PD risk in men and two (CXCL11, ITGA6) in women, none of which showed strong associations in the opposite sex (HRs 0.92–1.10, all p≥0.05; Supplementary Figure 1, Supplementary Figure 4; Supplementary Table 4). Interaction analyses confirmed FDR-significant sex-based risk modifications for all male-specific biomarkers except GJA1. However, the latter and CXCL11 still showed nominally significant evidence for interaction (Supplementary Table 7). The results of these 11 FDR-significant proteins in the sex-stratified analyses remained stable after adjusting for lifestyle variables and, in women, for menopausal status and postmenopausal hormone therapy (Supplementary Figure 1, Supplementary Table 4). KRAScap and GJA1 demonstrated strong evidence of replication in men, while AMN (men) and CXCL11 (women) provided moderate evidence for replication (Figure 3). The limited sample sizes in the sex strata of the AGES and ARIC cohorts precluded analyses of interaction with sex. In the UKB, 5 of the 11 candidate sex-specific biomarkers had been quantified; however, none showed statistically significant associations in the sex strata, possibly reflecting methodological differences between SomaScan and Olink platforms (see Discussion). Collectively, these findings underscore the importance of accounting for sex-specific effects in biomarker discovery, as such effects may reveal associations that remain hidden in sex-combined analyses.
Inter-correlations of predictive biomarkers in EPIC4PD
Several FDR-significant biomarkers showed moderate inter-correlations (r=0.4–0.6) in EPIC4PD, suggesting shared regulation or involvement in the same pathway. Correlations were observed between MFAP5 and both HPGDS and TPPP2 (r=0.54 each) and between LCN2 and LTF (r=0.60). FCAR also featured in several correlation clusters, underscoring its broader connectivity (detailed in Supplementary Figure 5 and Supplementary Table 8).
Characterization of preclinical biomarkers in prevalent PD and PD progression
Next, we investigated whether the effects of the identified preclinical biomarkers extended beyond the risk phase into the symptomatic stage of PD by leveraging the SomaScan 7K proteomic data from 2,592 independent GNPC participants (335 prevalent cases and 2,257 controls; Figure 3, Supplementary Table 9; see the companion manuscript of Imam et al.14 for full dataset description). Of all 47 preclinical biomarkers detectable in GNPC, 7 (15%; TPPP2, HPGDS, ALPL, NAMPT, GSTA3, KRAS, GJA1) were nominally significantly associated with prevalent PD, and 4 (TPPP2, HPGDS, NAMPT, GSTA3) remained significant after FDR control. All but NAMPT showed consistent effect directions. Overall, 38 (81%) of the 47 top proteins and 125 (86%) of the 146 nominally significant aptamer results shared between both traits showed concordant directions of effect, significantly more than expected by chance (p=1.25E-5 and p=1.64E-19, respectively; one-sided exact binomial test; Figure 3; Supplementary Table 9). In serum samples from the Tracking Parkinson’s disease cohort (n=794), we assessed associations between FDR-significant preclinical biomarkers and clinical outcomes related to disease progression, using motor (Unified Parkinson’s Disease Rating Scale Part III, UPDRS-III) and cognitive (Montreal Cognitive Assessment, MoCA) scores in cross-sectional and longitudinal mixed-effects models (Figure 3, Supplementary Table 10). Two biomarkers, HPGDS (β=2.61, p=6.82E-04) and MFAP5 (β=1.31, p=1.49E-03), were FDR-significantly associated with higher MoCA scores over time, indicating that lower levels of these proteins predict both increased PD risk and faster cognitive decline. Associations remained robust after adjusting for BMI and disease duration (data not shown). These results demonstrate that a considerable fraction of preclinical biomarkers remain dysregulated in prevalent PD, and that a small subset is also associated with progression.
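The concordance test reported above can be illustrated with a short sketch. It assumes the null hypothesis that each effect direction agrees by chance with probability 0.5, which is the natural reading of a one-sided exact binomial test on concordance counts; the function name is hypothetical.

```python
from math import comb


def concordance_p_value(n_concordant, n_total):
    """One-sided exact binomial p-value: P(X >= n_concordant), X ~ Bin(n_total, 0.5)."""
    # Sum the upper tail of the binomial distribution using exact integers,
    # then divide once by 2**n_total.
    tail = sum(comb(n_total, k) for k in range(n_concordant, n_total + 1))
    return tail / 2 ** n_total


# Concordance counts as reported in the text above.
p_top = concordance_p_value(38, 47)        # FDR-significant proteins
p_nominal = concordance_p_value(125, 146)  # nominally significant aptamers
print(f"{p_top:.2e}  {p_nominal:.2e}")
```

Under this null, both calls should land close to the p-values reported in the text for the 47 top proteins and the 146 shared nominally significant aptamers.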
Comparing incident and prevalent disease settings, such as between EPIC4PD and GNPC, is essential for determining which biomarkers are robust across the disease continuum and which are specific to particular stages. This distinction enables identification of biomarkers suitable for early detection versus those better suited for diagnosis or disease monitoring.
Comprehensive assessment of previously described biomarker candidates
Of 291 unique proteins previously linked to PD across methodologically diverse sources10,15–17, 191 proteins were assessable in the EPIC4PD data (Supplementary Table 11). Most notably, 38 proteins had recently been reported as putative predictive PD biomarkers in the UKB Olink study10, with 27 also covered in our independent dataset. Of these, 11 (41%; HPGDS, CD209, TPPP3, EPS8L2, FCRL1, BAG3, TNXB, IL13RA1, NEFL, PPY, RET) showed at least nominally significant associations in one or more follow-up intervals. HPGDS also represents a top biomarker in EPIC4PD (Table 2). The next two most significant proteins were CD209 (best model for follow-up of 10–20 years: HR=0.75, p=1.32E-04) and TPPP3 (best result for follow-up of 2–28 years: HR=0.80, p=1.58E-03); the latter aligns well with TPPP2, one of the lead findings in EPIC4PD (see Discussion). All but one (PPY) of the at least nominally significantly associated biomarkers showed the same effect direction as in the previous study10 despite platform differences. Of the 127 candidate neurological biomarkers from the CNS NULISA panel (Alamar, Inc), 106 were tested in EPIC4PD; of these, UBE2L3|UBB and CCL11 represented top biomarkers in EPIC4PD (Table 2). Other nominally significant findings included, for instance, SNCA, DDC, NEFL, TREM2, GFAP, and TIMP3. Furthermore, PYDC1, nominated by a recent Mendelian randomization (MR) and colocalization study18, was among the top results in EPIC4PD, suggesting a causative link to PD risk. Several other candidates from an additional MR16 or GWAS17 showed nominal associations in EPIC4PD (Supplementary Table 11).
Conversely, 5 of the 39 FDR-significant PD biomarkers identified in our non–sex-stratified EPIC4PD analyses (CRYGD, LTF, HPGDS, TCL1A, and LCN2) were also included in the MR study based on SomaScan data16. Of these, two biomarkers (CRYGD and LTF) showed nominal evidence of a causal association with PD, suggesting that higher circulating levels may increase disease risk, in line with the EPIC4PD findings (Supplementary Table 12). The limited overlap in assessed proteins likely reflects the comparatively stringent analysis criteria applied in that study16.
Cross-disease comparisons
The 47 FDR-significant PD biomarkers identified in EPIC4PD were also evaluated in incident AD (n=1,926, including 652 cases) and ALS (n=4,715, including 187 cases) using Cox regression analyses in the EPIC4AD and EPICALS case-cohorts (Supplementary Table 1), respectively. Sixteen (34%) preclinical PD biomarkers showed at least nominally significant associations with AD and/or ALS, all with concordant effect directions. Correlation plots of hazard ratios showed alignment in effect size estimates (Figure 3). Notably, three PD proteins — RSPO2, SPASTcap, PYDC1 — were FDR-significant for AD and also showed nominally significant associations with ALS (Table 2, Figure 3, Supplementary Table 13).
At the proteome-wide level, 17% of proteins nominally significantly associated with PD over the full follow-up period (non-sex-stratified) also showed associations with AD and/or ALS, whereas 83% appeared to be PD-specific. Among the shared proteins, the majority (68%) overlapped exclusively with AD, 25% with ALS, and 7% were common to all three neurodegenerative diseases (Supplementary Figure 6). Nominally significant preclinical AD biomarkers were significantly enriched among nominally significant PD biomarkers (p=1.19E-03; Fisher’s exact test), while ALS biomarkers were not (p=0.107). Notably, 85 of 92 PD–AD and 33 of 39 PD–ALS shared proteins showed concordant effect directions (p=1.92E-18 and p=7.15E-06; exact binomial test; Figure 3). Additionally, among proteins nominally associated with PD, effect sizes modestly correlated with AD and ALS (Pearson’s r=0.18 [95% CI: 0.11–0.25] and r=0.21 [0.14–0.28], respectively; Figure 3). While these comparisons highlight a largely PD-specific biomarker signature, they also point to partially shared preclinical pathways.
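The concordance tests reported above can be illustrated with a minimal sketch of an exact binomial test. It assumes a one-sided upper-tail test against a null probability of 0.5 (equal chance of concordant and discordant directions); the function name is hypothetical and this is not necessarily the exact procedure used in the study. Under this assumption, the tail probabilities for 85 of 92 and 33 of 39 concordant proteins come out close to the reported p-values.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p_null).

    Assumption: a one-sided test of 'more concordant directions than
    expected by chance'; the study may have used a different alternative.
    """
    return sum(
        comb(trials, k) * p_null**k * (1.0 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Counts taken from the text: concordant / shared proteins.
p_pd_ad = binomial_upper_tail(85, 92)   # PD-AD shared proteins
p_pd_als = binomial_upper_tail(33, 39)  # PD-ALS shared proteins
print(f"PD-AD: {p_pd_ad:.2e}, PD-ALS: {p_pd_als:.2e}")
```

A two-sided version would simply double these values for a symmetric null of 0.5.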
Functional characterization of preclinical PD biomarkers
Gene ontology (GO) enrichment analysis of suggestive PD biomarkers (n=154 proteins with p<1.0E-03 and consistent effect directions; see Methods) did not yield FDR-significant results but revealed 68 gene sets with strong nominal enrichment (p<1.0E-02), highlighting numerous biologically plausible and likely disease-relevant pathways based on prior evidence. Strikingly, despite the long time between blood draw and clinical diagnosis, we observed several well-established mechanisms in PD among the top ‘biological processes’ categories, including vesicle trafficking (e.g., endoplasmic-reticulum-to-Golgi transport, vesicle docking), dopaminergic neuron differentiation, microtubule dynamics, and immune-related responses (e.g., to interleukin-6, cytokines, neutrophil-mediated immunity). Lastly, several vitamin-related processes, including response to vitamin D, also emerged, pointing to molecular pathways that were especially prominent in earlier PD research. Enrichments in ‘cellular component’ and ‘molecular function’ categories reinforced the role of vesicle-mediated activity, while Reactome analysis further emphasized immune system involvement, including antimicrobial peptides (Supplementary Table 14).
Expression heatmaps across 50 human bulk tissues and single brain cell types based on RNA-seq data revealed relatively high expression of several FDR-significant biomarkers in peripheral and central immune-related tissues and cells compared to other tissue types (Supplementary Figures 7–8). Protein-level data from the Human Protein Atlas supported this observation (Supplementary Figure 9). Consistent with these findings, single-cell RNA-seq–based enrichment analyses of FDR-significant and suggestive biomarkers in mice and humans supported a role for immune cells in preclinical PD. In mouse cortex, suggestive biomarkers showed nominally significant enrichment in microglia (Supplementary Table 15). Analyses across 23 representative human cell types (including immune, neuronal, intestinal, muscle, liver, and fibroblast cells) demonstrated nominally significant enrichment in fibroblasts, monocytes and Kupffer cells, with borderline significance in granulocytes and macrophages (Supplementary Table 15). Collectively, these single-cell RNA-seq–based findings align with our Gene Ontology and Reactome pathway results, highlighting early immune system involvement as a key feature of preclinical PD.
Interestingly, protein–protein interaction (PPI) analysis revealed that FDR-significant biomarkers were significantly more likely to interact with GWAS-nominated PD risk genes17 than expected by chance (p=1.0E-04; permutation test; Figure 3, Supplementary Table 16). Within this network, SNCA emerged as a central hub, exhibiting several direct edges with FDR-significant biomarkers (Figure 3). Consistently, SNCA expression in both brain and immune cells was significantly correlated with several of these biomarkers (Supplementary Table 17). In brain, strong correlations (r≥0.5) were observed with RS1, RSPO2, and OSTF1, and in immune cells, with LYZ, VAMP3, F13A1, and FCAR. Furthermore, SNCA showed significant correlations with FDR-significant preclinical biomarkers more often than expected by chance (p=1.12E-02 for immune cells; p=9.1E-02 for brain cells; permutation t test). These findings provide converging evidence that the identified preclinical biomarkers capture early pathophysiological processes consistent with independent lines of prior research.
Finally, a semi-systematic literature search revealed prior functional evidence for a major part of the 47 preclinical PD biomarkers (details in the Supplementary Material [Supplementary Table 18]).
Druggability assessment of preclinical biomarkers
Given their potential pathophysiological implications, we next evaluated whether any of the 47 FDR-significant preclinical PD biomarkers represent druggable targets using the Open Targets database19. Twenty-one proteins (45%) showed at least one supporting line of evidence for interaction with small-molecule drugs (Supplementary Table 19). Notably, four proteins — NAMPT, KRAS, CD3G, and F13A1 — were associated with compounds currently undergoing clinical trials in non-neurological diseases (mostly cancer), underscoring their translational potential. These findings indicate that a substantial share of preclinical PD biomarkers may be pharmacologically modifiable.
Disease prediction
To assess predictive utility, we applied least absolute shrinkage and selection operator (LASSO)-based machine learning in EPIC4PD with repeated subsampling and regularization. Across 100 subsamples, the most frequently selected proteins were extracted with their corresponding average weights (Supplementary Table 20). As external validation, we calculated weighted predictive protein risk scores (PPRS) in the AGES cohort (n=5,090; 121 incident PD cases), leveraging the shared SomaScan 7K platform. Risk scores were based on EPIC4PD results, varying by the number of proteins and using average LASSO-derived coefficients. Given the marked sex differences in EPIC4PD biomarker signatures, primary models were developed separately for men and women, with additional combined models adjusted for sex for comparison (Supplementary Tables 20–21). In men (n=2,136; 75 PD cases), the best-performing PPRS (50 proteins+age, C-index: 0.63, SE: 0.03) outperformed the age-only model with ΔC=0.08. Nine proteins that were FDR-significant in the univariate Cox models in EPIC4PD were also included in this PPRS (including GSTA3, RS1, and HPGDS; Supplementary Table 20). In women (n=2,954; 46 cases), the best PPRS (20 proteins+age, C-index: 0.61; SE: 0.04) improved prediction similarly with ΔC=0.07. This PPRS included three proteins (CXCL11, TPPP2, and HPGDS) that were FDR-significant in the univariate analyses in EPIC4PD. As expected, improvement of risk prediction in the combined cohort was attenuated (ΔC=0.03 for best score). These findings emphasize the necessity of incorporating sex-stratified analyses into disease prediction research to enable the development of high-precision risk models.
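As an illustration of how such a score can be evaluated, the following minimal sketch computes a weighted protein risk score and Harrell's concordance index. The protein names, weights, and toy data are hypothetical placeholders, not values from Supplementary Table 20, and the sketch ignores age and the standard error estimation used in the actual analysis.

```python
def protein_risk_score(z_levels: dict, weights: dict) -> float:
    """Weighted sum of standardized protein levels for one participant.

    `weights` stands in for averaged LASSO coefficients (hypothetical values).
    """
    return sum(weights[name] * z_levels[name] for name in weights)


def harrell_c_index(times, events, scores) -> float:
    """Harrell's C: share of comparable pairs ranked correctly by the score.

    A pair (i, j) is comparable if participant i had the event and a
    shorter follow-up time than j; ties in the score count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical weights and participants, for illustration only.
weights = {"PROT_A": 0.2, "PROT_B": -0.1}
participants = [
    {"PROT_A": 1.5, "PROT_B": -1.0},
    {"PROT_A": 0.5, "PROT_B": 0.0},
    {"PROT_A": -0.5, "PROT_B": 0.5},
    {"PROT_A": -1.0, "PROT_B": 1.0},
]
scores = [protein_risk_score(p, weights) for p in participants]
times = [2.0, 5.0, 8.0, 10.0]   # years to diagnosis or censoring
events = [1, 1, 0, 0]           # 1 = incident PD, 0 = censored
print(harrell_c_index(times, events, scores))
```

The quantity ΔC reported above would then correspond to the difference between the C-index of a model with the score plus age and that of an age-only model.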
DISCUSSION
In this large, prospective, and multi-center proteomic study, we identified and independently validated novel plasma biomarkers predictive of PD onset up to two decades or more before clinical diagnosis. Using the high-resolution SomaScan 7K platform in pre-diagnostic samples from over 4,500 EPIC4PD participants, we discovered 47 proteins significantly associated with incident PD. Several top markers (TPPP2, HPGDS, ALPL, MFAP5, OGFR, ACAD8, TCL1A, GPC4, GSTA3, LCN2, KRAS, and GJA1) showed strong replication across incident (AGES, ARIC, UKB) and prevalent (GNPC) PD cases, underscoring their robustness across platforms and populations. Importantly, owing to the much larger proteomic resolution, our study uncovered numerous novel PD biomarkers not assayed in the previous analyses in the UKB10. In addition, while we observed that preclinical protein biomarker profiles are partially shared between PD and other neurodegenerative diseases (i.e., AD and ALS), the delineated PD biomarker profile appears to be largely disease-specific. Overall, our study delivers the most comprehensive proteomic resource for preclinical PD to date. It is unique in incorporating diverse preclinical validation cohorts, in spanning the full disease course, and in demonstrating some overlap with the preclinical phase of other neurodegenerative disorders. As such, it sets a new benchmark for biomarker discovery in neurodegeneration.
Our findings offer several new key insights: First, predictive PD biomarkers can be reliably identified in blood many years, even decades, before diagnosis. This relates to signals such as HPGDS, MFAP5, OGFR, and ACAD8, which remain stable across different platforms and populations. Second, we also observed dynamic, stage-dependent biomarker patterns: while some proteins persist throughout all stages, others appear only within certain time windows before diagnosis (such as TCL1A, GPC4, GSTA3, and LCN2), possibly reflecting underlying pathophysiological shifts. This temporal resolution is critical for distinguishing genuine early biomarker signals from late-stage or treatment-related changes. Third, we observed evidence for sex-specific effects for several biomarkers in EPIC4PD, an observation not seen in our re-analysis of the Olink-based UKB data. This suggests that some sex-dependent protein isoforms — such as specific glycosylation patterns or cleavage products — may be captured by SomaScan but not by Olink’s dual-antibody format, which relies on simultaneous binding to two defined epitopes20. In this context, it is striking that in the previous UKB study10, all top preclinical biomarkers were decreased in incident PD cases.
While we confirmed this pattern in our re-analysis of the UKB data, it was not observed in any of the SomaScan-based datasets. At the same time, the substantial concordance in effect directions across the independent SomaScan-7K-based data from EPIC4PD and GNPC supports the validity of our findings. Thus, the observed discrepancies between this study and UKB10 likely reflect methodological differences between proteomic platforms used. While recent evidence suggests that the latest SomaScan platforms provide higher measurement precision in comparison to Olink21–25, importantly, the two platforms capture partly distinct protein features, underscoring their complementary strengths21. A comprehensive understanding of these differences will require benchmarking against mass spectrometry. Until then, it is important to recognize that genuine protein associations may fail to replicate or even show opposite effects when comparing results across these two technologically different platforms.
Fourth, we observed high concordance (>80%) in effect size directions between preclinical and clinical biomarkers across two independent datasets (EPIC4PD and GNPC, both SomaScan 7K), supporting the robustness of our findings and providing indirect replication. These comparisons are essential to identify which biomarkers are relevant across all stages vs. those that are stage-specific, informing their use for early detection, diagnosis, or monitoring. Fifth, we were in the unique position to probe for an overlap of preclinical biomarker profiles of PD and two other neurodegenerative diseases (AD and ALS). About one-third of PD biomarkers also showed consistent associations with incident AD and/or ALS, including RSPO2, SPAST, and PYDC1. While effect sizes correlated modestly across diseases, the clear enrichment of shared biomarkers, especially between PD and AD, suggests partially overlapping preclinical mechanisms. Notwithstanding these overlaps, we emphasize that most identified biomarkers were specific to PD.
Sixth, this latter conclusion is confirmed by a wide array of in silico analyses supporting the biological plausibility of many of our biomarker findings. For instance, we saw an overrepresentation of key PD-relevant processes, including vesicle trafficking, dopaminergic neuron biology, and immune signaling. Thus, it appears that these disease-related mechanisms are already active and detectable in blood over two decades before diagnosis. Furthermore, cell-type enrichment analyses implicated microglia and peripheral immune cells already in the earliest disease phases, aligning with current neuroinflammation models of PD26,27. Moreover, our PPI analyses suggest that the preclinical biomarkers interact more frequently than expected with known PD proteins and correlate significantly with SNCA expression, strongly emphasizing their functional relevance to the disease. Along these lines, several of our newly detected preclinical biomarkers had been implicated in biomarker as well as in functional studies in PD before. For example, TCL1A28, GPC429, LCN230, AMN31, CXCL1132, CUL4B33, FCAR34, BECN135, and NAMPT36 have previously been reported as potential blood biomarkers in prevalent PD in studies with smaller sample sizes. In addition, some of the preclinical PD biomarkers were reported to show differential expression in post-mortem brain tissue from PD patients compared to controls (e.g., LCN237, GJA138, TIMP139, AMN31). Lastly, several of our top biomarker proteins are implicated in microtubule dynamics (TPPP2), vesicle trafficking in the presynapse (VAMP3), mitochondrial functions (ACAD8), dopaminergic neuronal function and maturation (GPC4, FGF8) and immune functioning/neuroinflammation (TCL1A, CXCL12), which are all highly relevant in PD pathophysiology40. A detailed summary of key experimental findings and supporting evidence can be found in the Supplementary Material.
Seventh, our druggability assessments revealed that nearly half of the FDR-significant biomarkers show evidence for small-molecule interactions. While blood-based biomarkers of brain diseases (like PD) do not automatically translate into therapeutic targets, this overlap may help prioritize candidates for future functional studies and repurposing efforts.
Last, we developed sex-specific protein risk scores (50 proteins in men, 20 proteins in women), which showed robust predictive value in external validation within the AGES cohort and outperformed sex-combined models. These findings underscore both the potential of proteomic risk profiling for early PD risk stratification and the importance of considering sex differences. Incorporating genetic, lifestyle, and environmental factors is expected to further improve prediction. Along these lines, existing tools, such as the updated MDS criteria (specialist settings) and the PREDICT-PD algorithm (broader screening)41,42, show promise but remain insufficient, particularly for detecting preclinical PD more than five years before diagnosis (e.g., ref.43). These tools are continuously evolving, and integrating our protein biomarkers alongside sex-stratified analyses offers a logical next step to improve predictive accuracy. Our study has several strengths, including representing the largest collection of incident PD proteomic datasets to date, an exceedingly long-term follow-up, rigorous case ascertainment, the inclusion of several external datasets ensuring population and platform cross-validations, as well as extension to other disease phenotypes and diseases. Notwithstanding, our study also has several potential limitations. First, the use of distinct proteomic platforms with different technologies (particularly SomaScan versus Olink) and coverages, comparatively limited case numbers despite large replication cohorts, the use of different blood sample types (citrate plasma, EDTA plasma, or serum), as well as differences in study design (e.g., recruitment, case ascertainment, follow-up) may have contributed to non-replication of some biomarkers. Second, as outlined above, while the SomaScan 7K assay offers extensive coverage, it is semi-quantitative; signal changes may reflect different proteoforms rather than absolute protein levels, complicating interpretation.
Mass spectrometry-based approaches are needed to clarify the underlying biology. Third, although we adjusted for key covariates in sensitivity analyses, residual confounding remains possible. Fourth, lacking neuropathological confirmation, some PD cases may have been misclassified — a common problem in large cohorts and a general limitation in epidemiological studies. However, given that medical charts of PD cases were reviewed by clinical specialists, the risk of misclassification in EPIC4PD is likely lower than in cohorts such as UKB. Finally, all datasets consisted exclusively or predominantly of non-Hispanic White participants. While the ARIC study also included a subset of Black participants (n=2,625) and their exclusion did not affect effect estimates, the sample size was too limited to allow for dedicated ethnicity-specific analyses of this subset. Thus, while our findings are likely generalizable to populations of European ancestry, their applicability to other ethnicities requires further assessment.
In conclusion, our study identifies and validates robust blood-based biomarkers that predict PD several decades before clinical onset. Our results underscore the importance of applying prospective cohort designs with incident case ascertainment to detect truly preclinical biomarkers. By capturing early molecular changes, our work lays the foundation for developing prevention strategies using precision medicine approaches in PD and potentially other neurodegenerative diseases.
METHODS
Discovery cohort and study design
The EPIC cohort is a large multi-center prospective study that recruited 519,978 participants (70.5% women, mostly aged 35–70 years at baseline) from 23 centers across ten European countries. With approximately 30 years of follow-up, EPIC was originally established to explore the relationships between diet and cancer, but it has since expanded to study other chronic conditions, including neurodegenerative diseases12,13. Briefly, baseline data including information on diet, physical activity, alcohol and tobacco use, past and current illnesses, reproductive history, and anthropometric measurements were collected between 1991 and 2000. Citrate plasma samples from ~75% of all participants were obtained using standardized protocols, then aliquoted and stored long-term in liquid nitrogen (−196 °C) at a central biobank housed at the International Agency for Research on Cancer (IARC)/World Health Organization in Lyon, France. Since baseline recruitment, participants have been followed up regularly by data linkage and/or questionnaires at each center. The EPIC study was approved by the ethical committee of IARC and by the ethical review boards of each participating study center. All participants provided written informed consent.
The ‘EPIC for Parkinson’s disease’ case-cohort (EPIC4PD) analyzed in this study is embedded within EPIC and aims to prospectively identify molecular biomarkers in PD11. The case ascertainment protocol has been described previously11,49. It was based on a source population of 157,288 subjects from ten EPIC centers across five European countries including Spain (Murcia, Navarra, and San Sebastian), Italy (Florence, Turin, and Varese), the Netherlands (Utrecht and Bilthoven), the UK (Norfolk/Cambridge), and Germany (Heidelberg). Potential cases were ascertained based on data linkage through a combination of center-specific methods using hospital records, general practitioner records, mortality records, drug repositories, and questionnaire data, followed by re-review of medical records by expert neurologists for PD. Since the initial ascertainment, 163 incident PD cases were added following an update of the case ascertainment in selected centers (Heidelberg, Florence, Varese, Turin, Utrecht, Bilthoven)11. The subcohort for EPIC4PD was randomly drawn from these centers from EPIC-interact, which is a random subcohort within EPIC50. The data freeze for the current project was set to the time point of diagnosis of the last patient per center. To identify predictive PD biomarkers while limiting reverse causation, we applied a two-year washout phase after blood draw, which led to the exclusion of 23 PD patients diagnosed within two years of baseline.
Generation and quality control of proteomic data in EPIC4PD
Protein abundances were measured in citrate plasma samples using the SomaScan 7K (v4.1) platform (SomaLogic, Inc.). SomaScan is a high-throughput proteomic platform that uses modified nucleotides (Slow Off-rate Modified Aptamers, SOMAmers), which bind specifically to protein targets, and quantifies them in relative fluorescence units (RFUs) using DNA microarrays51. The 7K version quantifies 7,596 aptamers targeting 6,432 unique proteins52. Measurements for a total of 17,841 EPIC participants were performed at SomaLogic (Boulder, CO, USA) as previously described. These participants also included individuals who were selected for other endpoints (such as cancer). All samples were processed in a single procedure. SomaScan assays were performed on 96-well plates, with 11 control wells per plate used to monitor batch effects, accuracy, precision, and background. Each plate included five pooled calibrator replicates, three quality control (QC) replicates, and three buffer replicates. In addition, 233 replicates of one EPIC QC sample were included. Readout was conducted using Agilent hybridization and scanning technology. To control readout variability, twelve hybridization control aptamers were added during elution. Initial QC was conducted by SomaLogic and included normalization of protein RFUs based on the following steps: hybridization normalization, intraplate median normalization, plate scaling and calibration, and adaptive normalization to a population reference.
We computed intraclass correlation coefficients (ICCs) for all 7,596 aptamers based on the 233 EPIC replicate measurements. Aptamers were removed if they were not a protein (n=46) [Hybridization Control Elution (n=12), non-biotin (n=10), non-cleavable (n=4), spuriomer (n=20)] or non-human (n=261). Aptamers were further excluded if both (i) the ICC was below 0.5 and (ii) the aptamer levels were below the limit of detection (LOD) in more than 90% of the samples (n=4). If the measured value of an aptamer was below the LOD, it was set to LOD/2. The final number of quality-controlled aptamers was 7,285 (targeting 6,381 unique proteins). Samples for which any SomaLogic normalization scale factor was outside [0.4–2.5] (n=247) and those detected as outliers using an approach based on PCA and the local outlier factor statistic (https://privefl.github.io/blog/detecting-outlier-samples-in-pca/) were excluded (n=0 for the method excluding outliers and n=5 for capping outliers). Plate correction was further applied through a residual approach: for each SOMAmer, its measurements were corrected for plate effect estimated in linear mixed effect models adjusted for center, age, sex, BMI, smoking status, and incidence of multiple cancer types, CVD, T2D, neurodegenerative diseases and death, to preserve possible biological variation due to these factors. Finally, only the samples belonging to the EPIC4ND case-cohort were retained (n=6,543 for both outlier handling methods).
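The aptamer-level filters described above can be sketched as follows. This is a minimal illustration with hypothetical function names; the order in which the LOD substitution and the filter were applied is an assumption, and the final lines only re-derive the reported aptamer count from the numbers given in the text.

```python
def impute_below_lod(values, lod):
    """Replace measurements below the limit of detection by LOD/2."""
    return [lod / 2 if v < lod else v for v in values]


def fails_aptamer_qc(icc, values, lod, icc_min=0.5, max_frac_below_lod=0.9):
    """Exclude an aptamer only if BOTH criteria hold:
    (i) ICC below 0.5 and (ii) more than 90% of samples below the LOD.
    """
    frac_below = sum(v < lod for v in values) / len(values)
    return icc < icc_min and frac_below > max_frac_below_lod


# Consistency check of the counts reported in the text.
n_total = 7596
n_non_protein = 12 + 10 + 4 + 20   # control elution, non-biotin, non-cleavable, spuriomer
n_non_human = 261
n_low_quality = 4
n_final = n_total - n_non_protein - n_non_human - n_low_quality
print(n_non_protein, n_final)
```

The subtraction reproduces the 46 non-protein aptamers and the 7,285 quality-controlled aptamers stated above.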
Measurements of each SOMAmer were log10-transformed and further centered and scaled so that their mean was 0 and their standard deviation was 1 in the subcohort. Measurements deviating more than 5 SD from the mean of the normalized log10-transformed data were either excluded or ‘capped’, i.e., set to the value of 5 SD from the normalized log10-transformed mean. The first approach is based on the assumption that outliers represent technical or biological variation unrelated to PD (e.g., outliers due to blood type or sex or genetic variation not associated with PD) obscuring any association signal nearer to the mean. However, as we also acknowledge that outliers may represent strong PD biomarker signals, in the discovery phase, we equally ran all statistical analyses by capping these outliers. For all statistically significant findings, log10-transformed values were visually inspected (Supplementary Figure 10) to verify the appropriateness of the data transformation and the cutoff for the outlier removal. In the subsequent text and main tables, protein names based on results using capped data are labeled with ‘cap’.
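The two outlier-handling strategies can be sketched in a few lines. The function name and toy values are hypothetical, and the use of the sample standard deviation of the subcohort is an assumption; the sketch only illustrates the transformation logic, not the study's actual implementation.

```python
import math
import statistics


def normalize_aptamer(rfu, in_subcohort, mode="cap", limit=5.0):
    """Log10-transform, then center and scale using subcohort mean and SD.

    mode="cap":     values beyond +/- `limit` SD are set to +/- `limit`
    mode="exclude": values beyond +/- `limit` SD are set to None (missing)
    """
    logged = [math.log10(v) for v in rfu]
    reference = [x for x, keep in zip(logged, in_subcohort) if keep]
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)  # assumption: sample SD of the subcohort
    z = [(x - mean) / sd for x in logged]
    if mode == "cap":
        return [max(-limit, min(limit, v)) for v in z]
    if mode == "exclude":
        return [v if abs(v) <= limit else None for v in z]
    raise ValueError("mode must be 'cap' or 'exclude'")


# Toy RFU values: five subcohort members plus one extreme non-subcohort case.
rfu = [100.0, 110.0, 90.0, 105.0, 95.0, 1e6]
in_subcohort = [True, True, True, True, True, False]
capped = normalize_aptamer(rfu, in_subcohort, mode="cap")
excluded = normalize_aptamer(rfu, in_subcohort, mode="exclude")
print(capped[-1], excluded[-1])
```

Running both modes side by side mirrors the parallel analyses described above, in which results based on capped data carry the ‘cap’ label.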
Statistical analysis in EPIC4PD
The final effective EPIC4PD case-cohort included 4,538 EPIC4PD participants including 574 incident PD cases with 7,285 aptamers (corresponding to 6,381 proteins) available for statistical analysis. We used Cox proportional hazard regression analyses to examine the association of 7,285 aptamers with instantaneous risk of PD using age as time scale and Prentice weights to take into account the case-cohort setting53. Our basic model was stratified (i.e., separate baseline hazard functions estimated) for 5-year age intervals, sex, and center. Cox proportional hazards regression was performed using three distinct time windows relative to disease onset: the full follow-up time period (2–28 years from baseline to diagnosis/censoring), as well as two shorter periods therein: 2–10 and 10–20 years. Furthermore, we performed sex-specific analyses after stratification for men and women using the full follow-up time period. Differences between sex-stratified effect estimates were tested by interaction analyses as previously described54,55.
Proteomic results were defined as significant at an FDR of 0.05. Furthermore, we defined “suggestively significant biomarker signals” as those findings that did not withstand FDR control in the proteomic analyses but had a p-value <1.0E-03 and demonstrated consistency across all EPIC4PD analyses, specifically at least two nominally significant (p<0.05) results and all effect estimates pointing in the same direction of effect.
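These significance tiers can be sketched as follows. The sketch assumes the Benjamini-Hochberg procedure for FDR control, which the text does not specify, and interprets the p<1.0E-03 criterion as applying to at least one of the EPIC4PD analyses; function names and inputs are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here for illustration; the text only states
    that an FDR of 0.05 was applied.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def is_suggestive(results, fdr_significant):
    """results: list of (log_hazard_ratio, p_value), one per EPIC4PD analysis."""
    if fdr_significant:
        return False
    p_vals = [p for _, p in results]
    same_direction = all(b > 0 for b, _ in results) or all(b < 0 for b, _ in results)
    return (
        min(p_vals) < 1e-3                        # assumed: in at least one analysis
        and sum(p < 0.05 for p in p_vals) >= 2    # at least two nominal results
        and same_direction                        # consistent effect direction
    )


q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(q_values)
print(is_suggestive([(-0.20, 5e-4), (-0.10, 0.03), (-0.15, 0.2)], False))
```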
In sensitivity analyses for all FDR-significant results, we tested a ‘full’ adjustment model including the baseline covariates school education (no education/primary school vs higher education), smoking status (dummy-coded as ‘never’, ‘former’, ‘current’), body mass index (BMI; weight/height² [kg/m²]), and physical activity (dummy-coded as ‘inactive’, ‘moderately inactive’, ‘moderately active/active’). Furthermore, for FDR-significant results derived from analyses stratified for women, we adjusted for postmenopausal status (dummy-coded as premenopausal, postmenopausal, perimenopausal, and surgical postmenopausal status) and use of hormones for menopause (yes/no).
HR trajectories were computed using 5-year sliding windows in one-year steps. For the analyses of the sliding windows close to age at diagnosis, the 23 participants diagnosed with PD within two years of baseline and previously excluded from the main analyses were re-included. Due to more limited numbers of cases toward the end of follow-up, the final window covered diagnoses occurring 16 to 25 years after baseline. Curves were fitted using the geom_smooth() function from the ggplot2 package in R.
Independent replication of predictive biomarkers in prospective cohorts
Replication analyses of FDR-significant predictive biomarkers were performed on 64,856 individuals (1,034 incident PD cases) across three different prospective cohorts (AGES56, ARIC57, UKB58) based on Olink (UKB) and SomaScan (AGES, ARIC) data, respectively.
The first replication cohort, the Age Gene/Environment Susceptibility-Reykjavik (AGES-Reykjavik) Study59, is a population-based study of adults aged 65 years and older in Iceland, designed to investigate genetic susceptibility and gene-environment interactions related to age-associated diseases. Between 2002 and 2006, 5,764 participants, who were survivors of the original Reykjavik Study, underwent comprehensive interviews, examinations, and measurements. The study was approved by the Icelandic National Bioethics Committee (VSN: 00–063), the Icelandic Data Protection Authority, and the NIA IRB; all participants provided written informed consent. Proteomic profiling of baseline serum samples from the AGES cohort has been described previously60. After quality control and exclusion of prevalent PD cases, 5,308 participants had available proteomic data, further reduced to 5,090 when applying a 2-year washout period. Over the follow-up, 121 participants developed incident PD. In AGES, aptamer levels were log10-transformed. Further details on recruitment can be found in ref.61, and prevalent PD case ascertainment in ref.62. Medical and mortality records (ICD-10: G20) up to 2019-03-19 were reviewed to ascertain incident PD cases.
The second replication cohort, ARIC57, is an ongoing community-based cohort study that enrolled participants from four communities across the United States: Washington County, MD; Forsyth County, NC; northwestern suburbs of Minneapolis, MN; and Jackson, MS. The ARIC participants with SomaScan 5K data available from EDTA plasma samples (4,953 aptamers passing quality control) comprised 11,205 middle-aged participants at baseline, of whom 164 developed incident PD after applying a washout period of two years. For details on the cohort recruitment and ascertainment of incident PD cases therein, please see ref.63 and the Supplementary Material. All participants provided appropriate written consent.
The third replication cohort, UKB58, is a general population cohort study, which performed proteomic profiling of blood plasma samples using the Olink Explore 3072 platform (n=2,923 proteins, cohort size analyzed in this study: n=48,561 UKB participants including 749 incident PD cases). The plasma samples were collected from volunteers aged 40 to 69 years at their first attendance at one of 22 UKB assessment centers across the United Kingdom between 2006 and 2010. Follow-up began from the date of attendance at an assessment center (field 53) and ended at the first recorded date of diagnosis (ICD-10 code G20) in the patient-linked hospital episode statistics (fields 41270 and 41280) or death registration (fields 40001 and 40002), whichever occurred first, with the last recorded date in January 2023. Due to small case numbers in other strata, the analysis was restricted to White participants (field 21000). Participants with missing proteomics data were removed prior to each model.
Due to smaller sample sizes in the restricted time windows, we did not analyze top EPIC4PD biomarkers in AGES for the time window of 2–10 years of follow-up (n cases=26) and in ARIC for the time window of 10–20 years of follow-up (n cases=32). Consistent with the EPIC4PD analyses, outliers in AGES and UKB were either capped at or excluded beyond 5 SD, depending on the approach used for the specific biomarker to be replicated. Outliers in ARIC were capped at 5 SD. Cox proportional hazards regression models were harmonized for AGES, ARIC and UKB with EPIC4PD: The Cox model was based on age as time scale and stratified (separate baseline hazard functions estimated) for age in 5-year categories, race (in ARIC), and sex. Significance for the replication arm was defined at α=0.05. As sensitivity analysis, we also performed the same analyses only on White participants in ARIC (excluding 2,625 Black participants). However, as results did not change substantially, we only report results on all participants in this study.
Indirect replication and characterization of preclinical EPIC4PD biomarkers for clinical phenotypes in GNPC
The FDR-significant preclinical biomarkers identified in EPIC4PD across the three follow-up periods and after stratification for sex were assessed for an association with prevalent PD leveraging a large case-control dataset of the GNPC (335 prevalent cases and 2,257 controls from 3 contributors) by re-analyzing the data described in the companion manuscript of Imam et al.14 All individuals included had undergone SomaScan 7K proteomic profiling as part of the GNPC initiative. For this analysis, we restricted inclusion to datasets that contained both PD cases and controls. Control participants were those without a documented clinical diagnosis of mild cognitive impairment, dementia, AD, frontotemporal dementia, PD, or ALS. To minimize bias from population stratification and ensure comparability with EPIC4PD, we only included white participants in datasets where the variable “race” was available. If no race information was provided, all participants were included. The protein data was log10-transformed and standardized and values above or below five standard deviations from the mean were excluded. Then, the case-control datasets were residualized using the removeBatchEffect function from the R package limma to regress out the effects of age, sex, and contributor, thereby controlling for confounding sources of variation in the high-dimensional data. A principal component (PC) analysis was then performed on the residuals; based on the scree plot, the first two PCs were retained as additional covariates. Finally, logistic regression analyses were conducted on the original data, adjusting for age, sex, dummy-coded contributor, and the first two PCs (all results from the re-analyses with p<0.05 can be found in Supplementary Table 22). Analogous to the EPIC4PD analyses, preclinical biomarkers that were FDR-significant in one sex stratum were analyzed for the GNPC data within that specific sex group.
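The transformation and outlier-exclusion step described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the analyses themselves were run in R); the function name and the use of the sample standard deviation are assumptions, not details taken from the analysis code.

```python
import math
import statistics


def log10_standardize_exclude(values, max_sd=5.0):
    """Log10-transform raw protein values, z-score them, and mark values
    lying more than `max_sd` standard deviations from the mean as missing.

    Assumptions (illustrative only): all values are positive, at least two
    distinct values are present, and the sample SD is used for scaling.
    """
    logged = [math.log10(v) for v in values]
    mean = statistics.fmean(logged)
    sd = statistics.stdev(logged)
    z_scores = [(x - mean) / sd for x in logged]
    # Excluded outliers are returned as None so that positions stay aligned
    # with the participant order of the input.
    return [z if abs(z) <= max_sd else None for z in z_scores]


# Hypothetical relative fluorescence units for one protein.
example = [90.0] * 50 + [110.0] * 50 + [1e12]
cleaned = log10_standardize_exclude(example)
n_excluded = sum(z is None for z in cleaned)
```

Returning `None` rather than dropping entries keeps the vector aligned with the covariate data used in the subsequent regression step.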
To assess the concordance of effect estimates between EPIC4PD and GNPC (for all 47 significant preclinical biomarkers as well as for nominally significant biomarkers shared by both studies), we applied a one-sided exact binomial test. All analyses were performed in R.
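As an illustration of the concordance test, the following Python sketch computes the one-sided exact binomial p-value for observing at least k concordant effect directions among n biomarkers under a null probability of 0.5. The counts in the usage line are hypothetical and are not results from this study.

```python
from math import comb


def binomial_test_greater(k, n, p=0.5):
    """One-sided exact binomial test: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 18 of 20 biomarkers share the same effect direction.
p_value = binomial_test_greater(18, 20)
```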
We analyzed serum proteomic data from the Tracking Parkinson’s disease cohort using the SomaScan v4.1 (7K) assay (SomaLogic). Our cohort included 794 PD patients with serum samples and clinical progression data across the three study visits (at 0 months, 18 months, 36 months). To correct for batch effects introduced by collection site, we used the ComBat function from the sva R package (v3.50.0), with site as the batch variable, followed by log10-transformation. For association analyses with PD progression, we investigated associations between protein levels and two clinical outcomes: the Unified Parkinson’s Disease Rating Scale Part III (UPDRS-III) as a measure of motor function, and the Montreal Cognitive Assessment (MoCA) as a measure of cognitive function.
Cross-sectional associations were assessed using generalized linear models, adjusting for age, sex, and Levodopa Equivalent Daily Dose (LEDD) as a measure of treatment exposure. For longitudinal analyses across up to three visits per participant, we applied linear mixed-effects models using the lmer() function from the lme4 R package. These models included a random intercept for participant ID to account for repeated measures, and fixed effects for protein level (log10 RFU), visit, age, sex, and LEDD. In additional sensitivity analyses, we included BMI and disease duration (time since diagnosis at baseline) as covariates. Protein associations were evaluated separately for each outcome, and p-values were adjusted for multiple testing, controlling the FDR at 0.05.
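The multiple-testing adjustment can be sketched as follows. The text does not name the FDR procedure, so the Benjamini-Hochberg step-up method used here is an assumption; the function name and the example p-values are illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the FDR procedure is Benjamini-Hochberg; the source text
    only states that the FDR was controlled at 0.05.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that adjusted values
    # are monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four proteins tested against one outcome.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [q <= 0.05 for q in adj]
```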
Comparison to previous studies
We compiled a list of 291 unique candidate biomarkers previously linked to PD based on findings from studies that employed a range of different methodologies (several proteins were nominated more than once): These included i) 38 proteins from the only previous proteomic study on future PD performed on 2,920 proteins based on Olink measurements in the UKB10, ii) 10 of ~2,700 analyzed blood plasma proteins (7 derived from Olink and 3 from SomaScan studies) recently proposed as potential predictive biomarkers for PD by a comprehensive MR study16, iii) 18 proteins nominated by a recent MR and colocalization study64, iv) 109 proteins corresponding to nominated functional candidate and/or the nearest genes located in 78 genetic risk loci described in the most recent Caucasian genome-wide association study for PD17, and v) 127 preselected neurodegenerative disease candidate biomarkers included in the ‘CNS NULISA panel’ (Alamar, Inc.). Proteins available in our SomaScan 7K data from this list (n=180) were tested for association in EPIC4PD. Results were controlled applying an FDR=0.05 across all 180 candidate proteins available for testing in our SomaScan 7K data.
Cross-disease comparisons
For cross-disease comparisons between PD and AD as well as between PD and ALS, we analyzed all FDR-significant PD proteins in EPIC4AD (1,926 EPIC participants including 652 incident AD cases during a follow-up of 2–29 years) and EPIC4ALS (4,715 EPIC participants including 187 incident ALS cases during a follow-up of 2–30 years) using Prentice-weighted Cox proportional hazards regression stratified by 5-year age groups, sex, and center, as described above. Results were controlled at an FDR=0.05 in the context of all tested top biomarkers per disease. Furthermore, we assessed whether significant associations in incident AD and ALS were enriched among all proteins that were at least nominally significant in PD (based on the full follow-up period, outliers excluded), using a one-sided Fisher’s exact test. To assess whether overlapping nominally significant proteins exhibited concordant effect directions more often than expected by chance, we performed a one-sided binomial test. In addition, we assessed the correlation of effect sizes between traits by calculating Pearson’s correlation coefficients based on the beta estimates. To prevent bias in the correlation due to the discontinuity introduced by limiting the analysis to nominally significant results, we aligned effect directions by mirroring the beta coefficients: for proteins with negative betas in PD, we multiplied all corresponding betas in EPIC4PD and GNPC by −1 (generating ‘mirrored beta values’).
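The mirroring step and the subsequent correlation can be illustrated with a short Python sketch. The function names and beta values are hypothetical; the sketch only shows the sign alignment (flipping both betas wherever the reference beta is negative) followed by Pearson’s correlation.

```python
import math


def mirror_betas(beta_ref, beta_other):
    """Multiply both betas by -1 wherever the reference beta is negative."""
    ref_m, other_m = [], []
    for b_ref, b_other in zip(beta_ref, beta_other):
        sign = -1.0 if b_ref < 0 else 1.0
        ref_m.append(sign * b_ref)
        other_m.append(sign * b_other)
    return ref_m, other_m


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical beta estimates for four proteins in two traits.
beta_pd = [-0.20, 0.30, -0.10, 0.25]
beta_ad = [-0.15, 0.10, 0.05, 0.20]
pd_m, ad_m = mirror_betas(beta_pd, beta_ad)
r = pearson_r(pd_m, ad_m)
```

After mirroring, all reference betas are non-negative, so the correlation reflects agreement in magnitude and direction rather than the gap around zero left by the significance filter.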
Brain expression data and protein abundance of top biomarkers
Gene expression and protein abundance of top PD biomarkers across tissues and in the brain were assessed based on RNA-Seq and immunohistochemistry (IHC) data obtained from the Human Protein Atlas (HPA v24.0, based on Ensembl version 109, accessed May 6th, 2025 at https://www.proteinatlas.org/about/download)65. FDR-significant proteins were probed for tissue expression using bulk RNA sequencing (RNA-Seq) data across 50 human tissues available as gene-level ‘Consensus RNA data’ (rna_tissue_consensus.tsv.zip). Furthermore, brain expression was assessed based on gene-level bulk RNA-Seq data across 13 brain regions based on samples from the Human Brain Tissue Bank (Semmelweis University, Budapest; rna_brain_region_hpa.tsv.zip) and based on single nucleus RNA-Seq (snRNA-Seq) data across 11 brain regions (rna_single_nuclei_cluster_type.tsv.zip). Protein measurements for the FDR-significant biomarkers across 45 human tissues were obtained from HPA based on IHC using tissue microarrays (normal_ihc_data.tsv.zip). To visualize expression patterns of the top biomarkers, heatmaps were generated using hierarchical clustering, with expression values scaled per gene to highlight tissue-specific variation.
Gene set, cell type enrichment and network analyses
Gene set enrichment analyses were performed based on the GO classification system using the PANTHER resource (Protein ANalysis THrough Evolutionary Relationships; version 19.0; released 2024-06-20; https://geneontology.org/). A list of all unique proteins that showed a suggestive signal in EPIC4PD and were uniquely mapped to a gene by PANTHER (n=154) was tested for enrichment for ‘biological process’, ‘molecular function’, and ‘cellular component’ GO categories as well as ‘Reactome pathways’. The statistical overrepresentation test was conducted using Fisher’s exact test with the reference list set to all analyzed SomaScan 7K proteins uniquely mapped by PANTHER (n=6,332). To avoid spurious associations, only GO categories with at least ten proteins present in the background list were considered.
Cell enrichment analyses were performed using the Expression Weighted Cell Type Enrichment (EWCE) method to predict the likely primary disease-associated cells based on our prioritised proteins66. This approach generates a probability distribution of gene expression for specific cell types by comparing a target gene list to a background gene set. We applied EWCE to our protein lists and to either single-cell mouse brain expression data67 or RNA expression data from 23 selected cell types sourced from the Human Protein Atlas (dataset: rna_single_cell_type.tsv.zip)65. Each analysis was controlled at an FDR=0.05.
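The overrepresentation test can be illustrated with a Python sketch of the one-sided Fisher’s exact test, expressed as the upper tail of the hypergeometric distribution. The list sizes (154 target proteins, 6,332 background proteins) are taken from the text; the category size and hit count in the usage line are hypothetical, and the sketch does not reproduce PANTHER’s own implementation.

```python
from math import comb


def overrepresentation_p(k, n, K, N):
    """One-sided Fisher's exact test for overrepresentation.

    k: target-list proteins annotated to the category
    n: size of the target list
    K: background proteins annotated to the category
    N: size of the background list
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    # Exact integer arithmetic is used until the final division.
    return tail / total


# Hypothetical category: 50 background proteins, 6 of them in the target list.
p_enrich = overrepresentation_p(k=6, n=154, K=50, N=6332)
```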
CorrelationAnalyzeR was used to investigate the coexpression of all FDR-significant preclinical biomarkers. Coexpression with SNCA was specifically examined in 12,712 normal GEO samples from immune cells and 3,198 normal GEO samples from brain. Permutation t tests against random gene lists of the same size were used to determine correlation significance with SNCA.
Protein-protein interaction network analysis
Furthermore, protein-protein interaction (PPI) networks were built using the STRING database (v12.0, accessed May 6th, 2025 at https://string-db.org)68 based on all FDR-significant EPIC4PD biomarkers and plotted using igraph (v2.1.4). Interactions with a minimum interaction score of 0.4 were based on the following sources: ‘textmining’, ‘experiments’, ‘databases’, ‘co-expression’, ‘neighborhood’, ‘gene fusion’, and ‘co-occurrence’. To examine the list of suggestive preclinical proteins in relation to known PD genes, we utilized 109 proteins described in the most recent genome-wide association study for PD (see above)17. P values were calculated based on the observed interactions compared to 10,000 random permutations. Nodes not connected to the largest component were removed from the PPI network shown in Figure 3.
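One way to obtain such a permutation-based p-value is sketched below in Python. The text does not specify what was permuted, so this sketch makes explicit assumptions: random protein sets of the same size are drawn from a background list, interactions within each set are counted, and the empirical p-value uses a +1 correction. All names and the toy network are hypothetical.

```python
import random


def count_edges(nodes, edges):
    """Count interactions whose two endpoints both lie in `nodes`."""
    node_set = set(nodes)
    return sum(1 for a, b in edges if a in node_set and b in node_set)


def permutation_p_value(target, background, edges, n_perm=10_000, seed=0):
    """Empirical p-value for the number of interactions within `target`.

    Assumption: the null distribution comes from random protein sets of the
    same size sampled without replacement from `background`.
    """
    rng = random.Random(seed)
    observed = count_edges(target, edges)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(background, len(target))
        if count_edges(sample, edges) >= observed:
            hits += 1
    # The +1 correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy network: 50 hypothetical proteins, with P0..P4 fully interconnected.
background = [f"P{i}" for i in range(50)]
edges = [(f"P{i}", f"P{j}") for i in range(5) for j in range(i + 1, 5)]
observed, p_perm = permutation_p_value(background[:5], background, edges, n_perm=1000)
```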
Drug Repurposing
We leveraged the Open Targets platform19 to explore druggability and drug-repurposing potential for all FDR-significant, putative preclinical PD biomarkers. Open Targets integrates multiple data sources to systematically link genes and proteins with drug molecules and disease associations. We queried the database focusing on small-molecule evidence, excluding other therapeutic modalities such as biologics or gene therapies. Specifically, approximately 17,000 small-molecule drugs and 21,087 protein targets were screened for evidence of interaction with the 47 FDR-significant biomarkers. Evidence of druggability was defined by reported direct binding or predicted high-quality pharmacological modulation supported by curated experimental, 3D protein structure or clinical data within the Open Targets resource.
Prediction
To assess the proteins’ performance in risk stratification for PD, we performed variable selection and regularization in a Cox proportional hazards model using the Least Absolute Shrinkage and Selection Operator (LASSO). LASSO was implemented via the R package glmnet, using the function cv.glmnet(family="cox", type.measure="C"). As the univariate Cox models had suggested substantial differences between sex strata, we developed the PPRS for each sex stratum separately and accounted for center by stratified Cox models, using the stratifySurv() function69. For robust results, we employed stability selection70. Specifically, for each stratum individually, we fitted LASSO-penalized Cox models to 100 resampled datasets, each comprising a 63.2% subsample of individuals. Complete follow-up in each resampling dataset was considered. Missing protein abundance was imputed using k-nearest neighbor imputation (package: impute, function impute.knn()). Imputation was performed within each training subset to avoid data leakage. In each subsample, the optimal amount of regularization (lambda) was inferred through 5-fold cross-validation using the C-index as performance criterion. The model coefficients (weights) from all 100 runs were retained for downstream analysis. We computed the inclusion frequency and average coefficient for each protein across the 100 resampling runs. To reduce the risk of overfitting, we evaluated sparse protein subsets, using the 5, 10, 20, 50 and 100 most frequently selected proteins from the initial LASSO selection (Supplementary Table 20). For each EPIC4PD participant, a multivariate risk score (PPRS) was computed as the sum of the protein values multiplied by the respective average weights across resampling runs. Cox proportional hazards regression models were then fitted for each sex stratum with the sex-specific risk score as the predictor, stratified by center and adjusted for age at examination.
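The aggregation step after the resampled LASSO fits can be sketched as follows. This Python example does not fit any Cox model; it assumes the per-run coefficients are already available and shows how inclusion frequencies, average weights, a sparse top-k subset, and a participant’s score could be derived. Averaging over all runs (counting non-selected proteins as zero) is an assumption, and the protein labels and values are hypothetical.

```python
def summarize_selection(coef_runs):
    """Summarize LASSO coefficients across resampling runs.

    coef_runs: list of dicts {protein: coefficient}, one per run; a missing
    or zero coefficient means the protein was not selected in that run.
    Returns {protein: (inclusion_frequency, average_coefficient)}.
    """
    proteins = sorted({p for run in coef_runs for p in run})
    n_runs = len(coef_runs)
    summary = {}
    for protein in proteins:
        coefs = [run.get(protein, 0.0) for run in coef_runs]
        frequency = sum(c != 0.0 for c in coefs) / n_runs
        summary[protein] = (frequency, sum(coefs) / n_runs)
    return summary


def top_k_weights(summary, k):
    """Keep the k most frequently selected proteins and their average weights."""
    ranked = sorted(summary, key=lambda p: (-summary[p][0], p))
    return {p: summary[p][1] for p in ranked[:k]}


def risk_score(protein_values, weights):
    """Weighted sum of a participant's protein values."""
    return sum(w * protein_values[p] for p, w in weights.items())


# Three hypothetical resampling runs over proteins labeled A, B, and C.
runs = [{"A": 0.5, "B": 0.0}, {"A": 0.3, "C": -0.2}, {"A": 0.4, "B": 0.1}]
summary = summarize_selection(runs)
weights = top_k_weights(summary, k=1)
score = risk_score({"A": 2.0, "B": 1.0, "C": 1.0}, weights)
```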
In order to quantify the incremental benefit of the proteins, the model was also calculated without the risk score as a predictor. Model performance was quantified using the concordance index (C-index) obtained from summary(model)$concordance. Accuracy was determined in AGES as an external validation cohort (n=5,090, including 121 incident PD cases). For comparison purposes, PPRS were also explored without stratifying by sex. In this case, the basic model contained age at examination and sex.
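For orientation, the concordance index can be sketched in a few lines of Python. This is a simplified version of Harrell’s C that ignores tied event times and stratification, so it will not reproduce the values returned by the R survival package in every case; the data in the usage lines are hypothetical.

```python
def concordance_index(times, events, scores):
    """Simplified Harrell's C-index.

    A pair (i, j) is comparable when participant i had an event and a
    shorter follow-up time than participant j. The pair is concordant when
    i also has the higher risk score; tied scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical follow-up times (years), event indicators, and risk scores.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
c_with_score = concordance_index(times, events, [4.0, 3.0, 2.0, 1.0])
c_uninformative = concordance_index(times, events, [1.0, 1.0, 1.0, 1.0])
increment = c_with_score - c_uninformative
```

The incremental benefit of a protein score corresponds to the difference between the C-index of the model that includes the score and that of the basic model.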
Acknowledgements
The authors thank all EPIC participants and the countless scientists who have contributed to the EPIC cohort over the past 30 years. We thank the National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands, for contributing cases and ongoing support to the EPIC study. We sincerely thank Bertrand Hemon for his assistance with data management, and Prof. Paolo Vineis, Prof. Beate Ritz, Prof. Martin Wolkewitz, Prof. Klaus Berger, Prof. Valentina Gallo, Dr. Mark Frasier, and Dr. Lazaros Belbasis for helpful discussions.
Funding
Updates of the PD case ascertainments, generation and processing of proteomic data were funded by the Michael J Fox Foundation (#008994 to C.M.L. and E.R.). Part of the proteomic data were also generated with funding from the Cure Alzheimer’s Fund (to C.M.L. and L.B.), and the ‘CReATe- Clinical Research in ALS and Related Disorders for Therapeutic Development’ Consortium (to C.M.L. and L.B.). CReATe (U54 NS092091) is part of the Rare Diseases Clinical Research Network (RDCRN), an initiative of the Office of Rare Diseases Research (ORDR), NCATS and funded through collaborations between NCATS, NINDS, and the ALS Association. Intramural funding was provided by the Interdisciplinary Centre for Clinical Research, University Münster (to C.M.L.). Additional funding was supplied by the Interdisciplinary Centre for Clinical Research, University Münster (to C.M.L., Lil3/001/25). C.M. Lill was supported by the Heisenberg program of the DFG (DFG; LI 2654/4-1).
The coordination of EPIC-Europe is financially supported by International Agency for Research on Cancer (IARC) and also by the Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London which has additional infrastructure support provided by the NIHR Imperial Biomedical Research Centre (BRC). The national cohorts are supported by Associazione Italiana per la Ricerca sul Cancro-AIRC-Italy, Italian Ministry of Health, Italian Ministry of University and Research (MUR), Compagnia di San Paolo (Italy); Dutch Ministry of Public Health, Welfare and Sports (VWS), the Netherlands Organisation for Health Research and Development (ZonMW), World Cancer Research Fund (WCRF), (The Netherlands); Instituto de Salud Carlos III (ISCIII), Regional Governments of Andalucía, Asturias, Basque Country, Murcia and Navarra, and the Catalan Institute of Oncology - ICO (Spain); Cancer Research UK (C864/A14136 to EPIC-Norfolk; C8221/A29017 to EPIC-Oxford), Medical Research Council (MR/N003284/1, MC-UU_12015/1 and MC_UU_00006/1 to EPIC-Norfolk; MR/Y013662/1 to EPIC-Oxford) (United Kingdom). Previous support has come from “Europe against Cancer” Programme of the European Commission (DG SANCO). SomaScan® data were generated under Master Research Agreement, 14th December 2021, between Imperial College London and SomaLogic Inc. SomaLogic were not involved in analyzing or interpreting the data; or in writing or submitting the manuscript for publication.
National Institute on Aging (NIA) contracts N01-AG-12100 and HHSN271201200022C (for Vi. G.) and Althingi (the Icelandic Parliament) financed the AGES-Reykjavik study. IHA and Novartis have collaborated on proteomics research since 2012. This study was also funded by the NIA (1R01AG065596-01A1 to Vi. G., 2R01AG065596-03 to Va.G.), the University of Iceland Postdoctoral Fund (to E.A.F.) and the Icelandic Research Fund (2511008-051 to Va.G.). This research was supported, in part, by the Intramural Research Program (IRP) of the NIA. K.A.W. and F.L. are supported by the NIA IRP. ARIC is carried out as a collaborative study supported by National Heart, Lung, and Blood Institute (NHLBI) contracts (75N92022D00001, 75N92022D00002, 75N92022D00003, 75N92022D00004, 75N92022D00005). The ARIC Neurocognitive Study was additionally supported by U01HL096812, U01HL096814, U01HL096899, U01HL096902, and U01HL096917 from the NIH (NHLBI, NIA, National Institute of Neurological Disorders and Stroke [NINDS], and National Institute on Deafness and Other Communication Disorders [NIDCD]).
Footnotes
Competing interests
N.F. is an employee and stockholder of Novartis. K.A.W. is an Associate Editor at Alzheimer’s & Dementia, a member of the Editorial Board of Annals of Clinical and Translational Neurology, and on the Board of Directors of the National Academy of Neuropsychology. K.A.W. and I.W. have given unpaid seminars on behalf of SomaLogic. None of the other authors reports any competing interest.
Disclaimer: Where authors are identified as personnel of the International Agency for Research on Cancer / World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy or views of the International Agency for Research on Cancer / World Health Organization.
Data access
Data can be accessed by external researchers after approval by the EPIC4ND working group and the EPIC steering committee. Potential collaborators can contact the EPIC4ND working group chair, Christina Lill (christina.lill@uni-muenster.de and clill@ic.ac.uk) and EPIC administrator Sherry Morris (epicadmin@imperial.ac.uk).
References
- 1.Poewe W. et al. Parkinson disease. Nat Rev Dis Primers 3, 1–21 (2017). [DOI] [PubMed] [Google Scholar]
- 2.Ray Dorsey E. et al. Global, regional, and national burden of Parkinson’s disease, 1990–2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet Neurol 17, 939–953 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Ishihara L. S., Cheesbrough A., Brayne C. & Schrag A. Estimated life expectancy of Parkinson’s patients compared with the UK population. J Neurol Neurosurg Psychiatry 78, 1304–1309 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Postuma R. B. & Berg D. Advances in markers of prodromal Parkinson disease. Nature Reviews Neurology vol. 12 622–634 Preprint at 10.1038/nrneurol.2016.152 (2016). [DOI] [PubMed] [Google Scholar]
- 5.Carrasco-Zanini J. et al. Proteomic signatures improve risk prediction for common and rare diseases. Nat Med 30, 2489–2498 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Deng Y.-T. et al. Atlas of the plasma proteome in health and disease in 53,026 adults. Cell (2024) doi: 10.1016/j.cell.2024.10.045. [DOI] [PubMed] [Google Scholar]
- 7.Helgason H. et al. Evaluation of Large-Scale Proteomics for Prediction of Cardiovascular Events. JAMA 330, 725–735 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Zetterberg H. Clinically relevant factors for blood-based Alzheimer biomarker interpretation. Alzheimer’s & Dementia 19, (2023). [Google Scholar]
- 9.Gail M. H. et al. Design choices for observational studies of the effect of exposure on disease incidence. BMJ Open 9, (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Gan Y.-H. et al. Large-scale proteomic analyses of incident Parkinson’s disease reveal new pathophysiological insights and potential biomarkers. Nat Aging 5, 642–657 (2025). [DOI] [PubMed] [Google Scholar]
- 11.Lill C. M. et al. EPIC4ND: European Prospective Investigation into Cancer and Nutrition follow-up for neurodegenerative diseases. Preprint at 10.1101/2025.01.29.25321340 (2025). [DOI] [Google Scholar]
- 12.Riboli E. & Kaaks R. The EPIC Project: rationale and study design. European Prospective Investigation into Cancer and Nutrition. Int J Epidemiol 26, 6S–14 (1997). [DOI] [PubMed] [Google Scholar]
- 13.Riboli E. et al. European Prospective Investigation into Cancer and Nutrition (EPIC): study populations and data collection. Public Health Nutr 5, 1113–1124 (2002). [DOI] [PubMed] [Google Scholar]
- 14.Imam F. et al. The Global Neurodegeneration Proteomics Consortium: Biomarker and Drug Target Discovery Across ~35,000 Biosamples Analyses for AD, PD, ALS, FTD, and Aging. [#NMED-A141443]. Niklas Mattsson-Carlgren vol. 15. [Google Scholar]
- 15.Doostparast Torshizi A. et al. Proteogenomic network analysis reveals dysregulated mechanisms and potential mediators in Parkinson’s disease. Nat Commun 15, 6430 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Belbasis L., Morris S., van Duijn C., Bennett D. & Walters R. Mendelian randomization identifies proteins involved in neurodegenerative diseases. Brain awaf018 (2025) doi: 10.1093/brain/awaf018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Nalls M. A. et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol 18, 1091–1102 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Doostparast Torshizi A. et al. Proteogenomic network analysis reveals dysregulated mechanisms and potential mediators in Parkinson’s disease. Nat Commun 15, 6430 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Buniello A. et al. Open Targets Platform: facilitating therapeutic hypotheses building in drug discovery. Nucleic Acids Res 53, D1467–D1475 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Lundberg M., Eriksson A., Tran B., Assarsson E. & Fredriksson S. Homogeneous antibody-based proximity extension assays provide sensitive and specific detection of low-abundant proteins in human blood. Nucleic Acids Res 39, (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Pietzner M. et al. Synergistic insights into human health from aptamer- and antibody-based proteomic profiling. Nat Commun 12, (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Eldjarn G. H. et al. Large-scale plasma proteomics comparisons through genetics and disease associations. Nature 622, 348–358 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Eldjarn G. H. et al. Correction to: Large-scale plasma proteomics comparisons through genetics and disease associations (Nature, (2023), 622, 7982, (348–358), 10.1038/s41586-023-06563-x). Nature vol. 630 E3 Preprint at https://doi.org/10.1038/s41586-024-07549-z (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Rooney M. R. et al. Correlations Within and Between Highly Multiplexed Proteomic Assays of Human Plasma. Clin Chem 71, 677–687 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Wang B. et al. Comparative studies of 2168 plasma proteins measured by two affinity-based platforms in 4000 Chinese adults. Nat Commun 16, 1869 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Lill C. M. et al. (Dys)regulation of the Immune System in Parkinson’s Disease: Methodologies, Techniques, and Key Findings from Human Studies. Aging Dis (2025) doi: 10.14336/AD.2024.1163. [DOI] [PubMed] [Google Scholar]
- 27.Tansey M. G. et al. Inflammation and immune dysfunction in Parkinson disease. Nat Rev Immunol 22, 657–673 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Kedmi M., Bar-Shira A., Gurevich T., Giladi N. & Orr-Urtreger A. Decreased expression of B cell related genes in leukocytes of women with Parkinson’s disease. Mol Neurodegener 6, 66 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Tatenhorst L. et al. Glypican-4 serum levels are associated with cognitive dysfunction and vascular risk factors in Parkinson’s disease. Sci Rep 14, 5005 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Fan Y. et al. Increased plasma lipocalin-2 levels are associated with nonmotor symptoms and neuroimaging features in patients with Parkinson’s disease. J Neurosci Res 102, e25303 (2024). [DOI] [PubMed] [Google Scholar]
- 31.Elango R. et al. Potential Biomarkers for Parkinson Disease from Functional Enrichment and Bioinformatic Analysis of Global Gene Expression Patterns of Blood and Substantia Nigra Tissues. Bioinform Biol Insights 17, 11779322231166214 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Hepp D. H. et al. Inflammatory Blood Biomarkers Are Associated with Long-Term Clinical Disease Severity in Parkinson’s Disease. Int J Mol Sci 24, (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Yao L. et al. Bioinformatic Analysis of Genetic Factors from Human Blood Samples and Postmortem Brains in Parkinson’s Disease. Oxid Med Cell Longev 2022, 9235358 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Irmady K. et al. Blood transcriptomic signatures associated with molecular changes in the brain and clinical outcomes in Parkinson’s disease. Nat Commun 14, 3956 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Miki Y. et al. Alteration of autophagy-related proteins in peripheral blood mononuclear cells of patients with Parkinson’s disease. Neurobiol Aging 63, 33–43 (2018). [DOI] [PubMed] [Google Scholar]
- 36.Santiago J. A., Littlefield A. M. & Potashkin J. A. Integrative transcriptomic meta-analysis of Parkinson’s disease and depression identifies NAMPT as a potential blood biomarker for de novo Parkinson’s disease. Sci Rep 6, 34579 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Kim B.-W. et al. Pathogenic Upregulation of Glial Lipocalin-2 in the Parkinsonian Dopaminergic System. J Neurosci 36, 5608–22 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Hastings N. et al. Connexin 43 is downregulated in advanced Parkinson’s disease in multiple brain regions which correlates with symptoms. Sci Rep 15, 10250 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Lorenzl S., Albers D. S., Narr S., Chirichigno J. & Beal M. F. Expression of MMP-2, MMP-9, and MMP-1 and their endogenous counterregulators TIMP-1 and TIMP-2 in postmortem brain tissue of Parkinson’s disease. Exp Neurol 178, 13–20 (2002). [DOI] [PubMed] [Google Scholar]
- 40.Morris H. R., Spillantini M. G., Sue C. M. & Williams-Gray C. H. The pathogenesis of Parkinson’s disease. Lancet 403, 293–304 (2024). [DOI] [PubMed] [Google Scholar]
- 41.Heinzel S. et al. Update of the MDS research criteria for prodromal Parkinson’s disease. Movement Disorders vol. 34 1464–1470 Preprint at 10.1002/mds.27802 (2019). [DOI] [PubMed] [Google Scholar]
- 42.Bestwick J. P. et al. Improving estimation of Parkinson’s disease risk—the enhanced PREDICT-PD algorithm. NPJ Parkinsons Dis 7, (2021).
- 43.Marini K. et al. Comparison of different risk scores for Parkinson disease in a population-based 10-year study. Eur J Neurol 30, 3347–3352 (2023).
- 44.Irmady K. et al. Blood transcriptomic signatures associated with molecular changes in the brain and clinical outcomes in Parkinson’s disease. Nat Commun 14, 3956 (2023).
- 45.Santiago J. A., Littlefield A. M. & Potashkin J. A. Integrative transcriptomic meta-analysis of Parkinson’s disease and depression identifies NAMPT as a potential blood biomarker for de novo Parkinson’s disease. Sci Rep 6, 34579 (2016).
- 46.Kedmi M., Bar-Shira A., Gurevich T., Giladi N. & Orr-Urtreger A. Decreased expression of B cell related genes in leukocytes of women with Parkinson’s disease. Mol Neurodegener 6, 66 (2011).
- 47.Tatenhorst L. et al. Glypican-4 serum levels are associated with cognitive dysfunction and vascular risk factors in Parkinson’s disease. Sci Rep 14, 5005 (2024).
- 48.Miki Y. et al. Alteration of autophagy-related proteins in peripheral blood mononuclear cells of patients with Parkinson’s disease. Neurobiol Aging 63, 33–43 (2018).
- 49.Gallo V. et al. Parkinson’s Disease Case Ascertainment in the EPIC Cohort: The NeuroEPIC4PD Study. Neurodegener Dis 15, 331–338 (2015).
- 50.The InterAct Consortium. Design and cohort description of the InterAct Project: an examination of the interaction of genetic and lifestyle factors on the incidence of type 2 diabetes in the EPIC Study. Diabetologia 54, 2272–2282 (2011).
- 51.Gold L. et al. Aptamer-based multiplexed proteomic technology for biomarker discovery. PLoS One 5, (2010).
- 52.Candia J., Daya G. N., Tanaka T., Ferrucci L. & Walker K. A. Assessment of variability in the plasma 7k SomaScan proteomics assay. Sci Rep 12, (2022).
- 53.Prentice R. L. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73, 1–11 (1986).
- 54.Altman D. G. & Bland J. M. Interaction revisited: the difference between two estimates. BMJ 326, 219 (2003).
- 55.Lill C. M. et al. Impact of Parkinson’s disease risk loci on age at onset. Movement Disorders 30, 847–850 (2015).
- 56.Harris T. B. et al. Age, Gene/Environment Susceptibility-Reykjavik Study: Multidisciplinary Applied Phenomics. Am J Epidemiol 165, 1076–1087 (2007).
- 57.Wright J. D. et al. The ARIC (Atherosclerosis Risk In Communities) Study. J Am Coll Cardiol 77, 2939–2959 (2021).
- 58.Sudlow C. et al. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLoS Med 12, e1001779 (2015).
- 59.Harris T. B. et al. Age, Gene/Environment Susceptibility-Reykjavik Study: multidisciplinary applied phenomics. Am J Epidemiol 165, 1076–87 (2007).
- 60.Bankier S. et al. Circulating causal protein networks linked to future risk of myocardial infarction. medRxiv (2025) doi: 10.1101/2025.02.07.25321789.
- 61.Harris T. B. et al. Age, Gene/Environment Susceptibility-Reykjavik Study: Multidisciplinary Applied Phenomics. Am J Epidemiol 165, 1076–1087 (2007).
- 62.Scher A. I. et al. Midlife migraine and late-life parkinsonism: AGES-Reykjavik study. Neurology 83, 1246–52 (2014).
- 63.Huang X. et al. Statins, plasma cholesterol, and risk of Parkinson’s disease: a prospective study. Mov Disord 30, 552–9 (2015).
- 64.Doostparast Torshizi A. et al. Proteogenomic network analysis reveals dysregulated mechanisms and potential mediators in Parkinson’s disease. Nat Commun 15, (2024).
- 65.Karlsson M. et al. A single–cell type transcriptomics map of human tissues. Sci Adv 7, eabh2169 (2021).
- 66.Skene N. G. & Grant S. G. N. Identification of Vulnerable Cell Types in Major Brain Disorders Using Single Cell Transcriptomes and Expression Weighted Cell Type Enrichment. Front Neurosci 10, (2016).
- 67.Zeisel A. et al. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347, 1138–1142 (2015).
- 68.Szklarczyk D. et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res 51, D638–D646 (2023).
- 69.Simon N., Friedman J., Hastie T. & Tibshirani R. Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent. J Stat Softw 39, (2011).
- 70.Meinshausen N. & Bühlmann P. Stability Selection. J R Stat Soc Series B Stat Methodol 72, 417–473 (2010).
Data Availability Statement
External researchers can access the data after approval by the EPIC4ND working group and the EPIC steering committee. Potential collaborators can contact the EPIC4ND working group chair, Christina Lill (christina.lill@uni-muenster.de and clill@ic.ac.uk), and the EPIC administrator, Sherry Morris (epicadmin@imperial.ac.uk).

