Author manuscript; available in PMC: 2025 Dec 1.
Published in final edited form as: Nat Genet. 2024 Nov 11;56(12):2672–2684. doi: 10.1038/s41588-024-01972-8

Proteogenomic analysis of human cerebrospinal fluid identifies neurologically relevant regulation and implicates causal proteins for Alzheimer’s disease

Daniel Western 1,2,3, Jigyasha Timsina 1,2, Lihua Wang 1,2, Ciyang Wang 1,2,3, Chengran Yang 1,2, Bridget Phillips 1,2,3, Yueyao Wang 1,2, Menghan Liu 1,2, Muhammad Ali 1,2, Aleksandra Beric 1,2, Priyanka Gorijala 1,2, Pat Kohlfeld 1,2, John Budde 1,2, Allan I Levey 4, John C Morris 5, Richard J Perrin 5,6,7, Agustin Ruiz 8,9,10, Marta Marquié 8,9, Mercè Boada 8,9, Itziar de Rojas 8,9, Jarod Rutledge 11, Hamilton Oh 11, Edward N Wilson 11,12, Yann Le Guen 12,13, Lianne M Reus 14,15, Betty Tijms 14,15, Pieter Jelle Visser 14,15,16, Sven J van der Lee 14,15,17, Yolande AL Pijnenburg 14,15, Charlotte E Teunissen 18, Marta del Campo Milan 19,20, Ignacio Alvarez 21, Miquel Aguilar 21; Dominantly Inherited Alzheimer Network (DIAN)22,*; the Alzheimer’s Disease Neuroimaging Initiative (ADNI)22,*, Michael D Greicius 11,12, Pau Pastor 21, David J Pulford 23, Laura Ibanez 1,2,5, Tony Wyss-Coray 11, Yun Ju Sung 1,2,24, Carlos Cruchaga 1,2,7,25,
PMCID: PMC11831731  NIHMSID: NIHMS2053206  PMID: 39528825

Summary

The integration of quantitative trait loci (QTL) with disease genome-wide association studies (GWAS) has proven successful at prioritizing candidate genes at disease-associated loci. QTL mapping has been focused on multi-tissue expression QTL or plasma protein QTL (pQTL). We generated a cerebrospinal fluid (CSF) pQTL atlas by measuring 6,361 proteins in 3,506 samples. We identified 3,885 associations for 1,883 proteins, including 2,885 novel pQTLs, demonstrating unique genetic regulation in CSF. We identified CSF-enriched pleiotropic regions on chr3q28 near OSTN and chr19q13.32 near APOE that were enriched for neuron-specificity and neurological development. We integrated our associations with Alzheimer’s disease (AD) through PWAS, colocalization and Mendelian Randomization and identified 38 putative causal proteins, 15 of which have drugs available. Finally, we developed a proteomics-based AD prediction model that outperforms genetics-based models. These findings will be instrumental to further understand the biology and identify causal and druggable proteins for brain and neurological traits.

Introduction

Genome-wide association studies (GWAS) have become increasingly common over the last 15 years. Many traits and diseases have been studied in hundreds of thousands of individuals, identifying robust disease-associated loci1–3. However, translation of the associations to pathways and treatments is challenging, as identifying the causal genes and how they interact requires additional downstream analyses and integration of several types of omic data.

Recently, studies analyzing the genetic regulation of gene expression (GTEx consortium, eQTLGen, and MetaBrain4–6) have described loci affecting mRNA levels. These resources have been integrated with disease GWAS analyses to prioritize genes at previously uncharacterized loci7. However, analyses focusing on mRNA miss disease-relevant biology. Many of the studies focus solely on genetic variants that are near the gene encoding the mRNA (cis-variants), preventing the identification of cross-genome regulatory effects. Additionally, the correlation between levels of mRNA and their encoded proteins is weak8,9, while proteins are typically the molecules that act most directly on disease. Because of this weak correlation, the overlap between expression (mRNA) quantitative trait loci (QTL) and protein QTL is also low10.

Large studies have investigated genetic association of protein levels, but they have overwhelmingly focused on plasma11–15; previous work has shown that plasma proteogenomics shares little overlap with the brain10. Small studies have analyzed the proteogenomic signature of cerebrospinal fluid (CSF)10,16–19. Targeting CSF proteins has proven successful at elucidating causal genes at some disease GWAS loci, including an association for the TREM2 protein at the MS4A locus, potentially an Alzheimer’s Disease (AD) risk locus7,20, and an association for GRN protein levels near LRRK2, potentially relevant for Parkinson’s Disease (PD)21. However, current studies are limited by comparatively small sample sizes or number of proteins analyzed.

AD is the most common highly heritable neurodegenerative disease, with heritability estimated to be approximately 70%22. GWAS have identified over 75 loci associated with AD23, but the genes driving the associations for most of them are still unknown, and little research has been done to understand how these genes interact in specific pathways and disease mechanisms.

Here, we present an investigation of the genomic signature of the human CSF proteome. We analyze the genetic regulation of 6,361 proteins using CSF from 3,506 healthy and neurologically impaired individuals. We classify proteogenomic hotspots that regulate multiple proteins, then integrate our results with an AD risk GWAS to identify novel causal and druggable proteins relevant to AD. Finally, we use our AD-associated proteins to build a prediction model that excels at classifying AD status.

Results

Unique genetic architecture of CSF proteomics

We performed a proteogenomic analysis of CSF (Fig. 1) using proteomic (aptamer-based assay: SOMAscan 7k (refs. 24,25); Supp. Fig. 1; Supp. Table S1) and genetic data from 3,506 unrelated individuals of European ancestry (EUR, Supp. Fig. 2). We included 1,243 cognitively normal controls, 1,021 late-onset AD cases, and 1,242 individuals with other neurodegenerative disorders (Table 1). We performed a joint analysis on 7,008 aptamers (6,361 proteins) and samples from eight cohorts (see Methods, Fig. 2A, Supp. Fig. 3, Supp. Tables S1&S2), using stringent significance thresholds (cis: P<5×10−8; trans: P<3.45×10−11; see Methods) to define protein-variant associations (pQTLs).
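As an illustration of how the two significance thresholds quoted above might be applied, the following minimal sketch labels a variant-protein pair as cis or trans and checks it against the matching cutoff. The 1 Mb cis window, the function name, and the argument names are assumptions made for illustration only; the window actually used is defined in the Methods.

```python
CIS_P = 5e-8        # cis threshold quoted in the text
TRANS_P = 3.45e-11  # trans threshold quoted in the text
CIS_WINDOW_BP = 1_000_000  # ASSUMPTION: hypothetical cis window, not stated in this section


def classify_pqtl(variant_chrom, variant_pos, gene_chrom, gene_tss, p_value):
    """Return 'cis', 'trans', or None when the association is not significant."""
    # A variant is treated as cis when it lies on the same chromosome as the
    # protein-coding gene and within the assumed window around its TSS.
    is_cis = (variant_chrom == gene_chrom
              and abs(variant_pos - gene_tss) <= CIS_WINDOW_BP)
    threshold = CIS_P if is_cis else TRANS_P
    if p_value < threshold:
        return "cis" if is_cis else "trans"
    return None
```

The same P-value can therefore pass as a cis association but fail as a trans association, which reflects the stricter genome-wide burden placed on distal signals.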

Fig. 1: Study Design.

PPMI: Parkinson’s Progression Markers Initiative; ADNI: Alzheimer’s Disease Neuroimaging Initiative; DIAN: Dominantly Inherited Alzheimer’s Network; FACE: Fundació Ace; Knight-ADRC: Knight-ADRC Memory and Aging Project; MARS: Washington University Movement Disorder clinic; VEP: Variant Effect Predictor

Table 1: Demographics of post-QC samples used in the joint analysis.

Cohort | # Samples | Avg. Age | % Male | % APOE4+ | #CO | #AD | #ADAD | #ADRD | #PD | #OT
ADNI | 689 | 73.7 (SD 7.5) | 58.2 | 50.2 | 149 | 521 | 0 | 0 | 0 | 19
Barcelona-1 | 198 | 68.9 (SD 7.4) | 52.5 | 40.4 | 4 | 63 | 0 | 56 | 1 | 74
DIAN | 195 | 38.6 (SD 10.9) | 48.4 | 27.2 | 76 | 2 | 116 | 0 | 0 | 1
FACE | 439 | 71.9 (SD 8.3) | 41.0 | 35.8 | 128 | 239 | 0 | 0 | 0 | 72
Knight-ADRC (Oct 2021) | 805 | 71.3 (SD 8.7) | 46.6 | 39.3 | 566 | 175 | 2 | 13 | 3 | 46
Knight-ADRC (June 2023) | 47 | 72.9 (SD 7.6) | 59.6 | 38.3 | 40 | 7 | 0 | 0 | 0 | 0
MARS | 176 | 66.4 (SD 8.9) | 54.0 | 20.5 | 0 | 0 | 0 | 0 | 0 | 176
PPMI | 785 | 61.8 (SD 9.4) | 57.7 | 22.2 | 157 | 0 | 0 | 0 | 627 | 1
Stanford | 172 | 69.5 (SD 6.2) | 40.7 | 40.7 | 123 | 14 | 0 | 2 | 15 | 18
Total | 3506 | 67.5 (SD 11.8) | 51.4 | 35.7 | 1243 | 1021 | 118 | 71 | 646 | 407

ADNI: Alzheimer’s disease Neuroimaging Initiative; PPMI: Parkinson’s Progression Markers Initiative; FACE: Ace Alzheimer Center Barcelona; DIAN: Dominantly-Inherited Alzheimer’s Network; Knight-ADRC: Knight ADRC Memory and Aging Project; MARS: Washington University Movement Disorder Clinic; Stanford: Stanford Iqbal Farrukh and Asad Jamal ADRC and Aging Memory Study

%APOE4+: Percentage of individuals who are carriers of at least one C allele at rs429358.

#CO: Number of cognitively normal individuals

#AD: Number of individuals affected with late-onset Alzheimer’s disease, as determined by clinical status

#ADAD: Number of individuals affected with early-onset Alzheimer’s disease

#ADRD: Number of individuals affected with non-AD dementia (including frontotemporal dementia, Lewy body dementia, etc.)

#PD: Number of individuals affected with Parkinson’s disease

#OT: Number of individuals with unclear pathology

Fig. 2: Cerebrospinal fluid pQTLs are consistent across disease and largely tissue and molecule-specific.

a. Combined Manhattan plot of the pQTL associations identified from linear regression analysis of 3,506 CSF samples and 7,008 aptamers. X-axis: genome position of an associated variant; y-axis: −log10(p-value) for association of each SNP with an aptamer. b. 2D Manhattan plot of 2,477 index pQTLs for 2,042 aptamers that were significant in the joint analysis. X-axis: genome position of the pQTL signal; y-axis: location of the protein-coding gene corresponding to the aptamer with a pQTL. Color represents cis or trans status of the index variant (cis, blue; trans, green). The top panel maps the pleiotropic regions of the genome (limited to 100 proteins with an association in the same region). c. Scatter plot of index pQTL variant absolute effect size (y-axis) vs effect allele frequency (x-axis). Color represents cis or trans status of the index variant (cis, blue; trans, green). Correlation was calculated using the Pearson method with two-sided P-values (cis P=1.36×10−73, trans P=2.97×10−115). d. Scatter plot of cis index pQTL −log10(P) (y-axis) vs distance from the transcription start site of the gene encoding the associated protein (x-axis). Color represents the minor allele frequency (darker=less common). e. Scatter plot of effect size of index pQTL variants in dichotomized amyloid/tau positive samples (x-axis) vs dichotomized amyloid/tau negative samples (y-axis). Color represents minor allele frequency (darker=less common). Correlation was calculated using the Pearson method with two-sided P estimated to be below the underflow value. f. Colocalization of CSF pQTL associations with plasma pQTL associations from Ferkingstad et al.11. Plasma + CSF: pQTL associations that colocalized across tissues. CSF-specific: pQTL associations that did not colocalize with plasma pQTLs for the same protein. New Proteins: pQTL associations for proteins measured in CSF but not plasma. Color represents cis or trans status of the index CSF pQTL (cis, blue; trans, green). g. Colocalization of cis CSF pQTL associations with various QTL types. Green bars represent brain-relevant tissues, while blue bars represent other tissues. Bar labels represent the number of colocalizing QTLs.

We identified 2,477 index pQTL associations for 2,042 aptamers (1,883 proteins; Fig. 2B, Supp. Table S2). Correlations for aptamers binding the same protein are in Supp. Fig. 4 and Supp. Table S3. Of the 2,477 associations, 1,272 (51.4%) were cis and 1,205 (48.6%) were trans-pQTLs. We performed cohort-specific analyses and a cross-cohort meta-analysis to obtain heterogeneity estimates (Supp. Table S4) and observed highly concordant results (Pearson R=1, P<2.2×10−16, Supp. Fig. 5) between the joint and meta-analysis effect sizes. Cohort-specific effect sizes and standard error values are in Supp. Table S4. Comparing 388 index variants with overlapping proteins on an antibody-based (Olink) platform (Amsterdam Dementia Cohort, ADC; N=502), we observed good correlation (R=0.69, P<2.2×10−16, Supp. Fig. 6, Supp. Table S5), which is consistent with protein level correlations between SOMAscan and Olink in CSF26 and supports the pQTLs identified in this study (see Supplementary Notes). For pathway enrichment analysis results for all proteins with a pQTL, see Supp. Fig. 7 and Supp. Table S6.

Given the prevalence of neurological disease in the dataset, we determined if the pQTLs were consistent across cognitively healthy and affected individuals. We classified individuals into AD-relevant biomarker groups using the amyloid/tau/neurodegeneration (A/T/N) framework27 (see Methods, A+T+ n=798, A−T− n=945) and performed association analysis in each group. We observed strong correlation between all groups (R>0.97, P<2.2×10−16; Fig. 2E, Supp. Table S7, Supp. Fig. 8), indicating that our pQTLs are consistent across disease states. A small proportion of associations showed differences in their effects (see Supplementary Notes).

To identify independent signals in a locus, we performed conditional analyses on all index SNPs (Supp. Fig. 3G, Supp. Table S8). In total, we identified 3,885 conditionally independent associations (see Supplementary Notes, Supp. Table S8). Most proteins (n=1,025, 54.4%) had a single association, but we observed up to 16 independent cis associations for two proteins (SIRPB1 & GSTM1). We annotated each association by identifying variants in linkage disequilibrium (LD, R2>0.8) with each pQTL and used the Ensembl Variant Effect Predictor (VEP)28 to determine the most severe annotation for each variant set (Supp. Tables S9–12, Supp. Fig. 9). Higher pQTL effect size was correlated with more severe annotation (R=0.26, P<2.2×10−16, Supp. Fig. 10). QTLs were enriched for protein-altering variants (PAVs, see Supplementary Notes, Supp. Fig. 9B, Supp. Table S12, P<0.001, Fold change=9.78), a trend that persisted after removing potential binding artifacts (P<0.001, FC=6.39), indicating that pQTL are enriched for coding variants in comparison with non-pQTL variants. Due to decreased power to detect rare variants with small effect sizes, common variants had lower effect sizes on average than rare variants (Fig. 2C). We also confirmed previous evidence10,13 that cis-pQTL effect sizes correlate with proximity to the protein coding gene (Fig. 2D; Supp. Fig. 3H).

We and others have reported limited pQTL overlap across tissues and with other molecular QTLs10,12. We analyzed the overlap of CSF pQTL with a plasma pQTL resource derived from 35,000 individuals and 5,000 proteins (SOMAscan5k; Supp. Table S13)11. In total, 4,735 aptamers overlapped between the plasma resource and this study, potentially covering 1,821 (73.5%, out of 2,477) CSF pQTLs. Of these, 1,232 (67.6%, PP.H4<0.8) failed to colocalize, representing CSF-specific signals (Fig. 2F). The colocalization results were consistent with LD overlap between plasma and CSF (Supp. Fig. 11, Supp. Table S14). Trans associations were more likely to be CSF-specific than cis (655/862 trans vs 577/959 cis, P=4.19×10−13), consistent with previous work10. We also identified 656 novel pQTL (26.4% of 2,477 total, 313 cis and 343 trans) for proteins only analyzed in CSF. A total of 1,888 CSF pQTLs (76.2%) were observed only in CSF.
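The cis versus trans comparison above is a test on a 2×2 table of counts. The sketch below runs a Pearson chi-square test without continuity correction on the counts quoted in the text. The section does not state which test the authors used, so the choice of test is an assumption and the resulting P-value need not match the reported one exactly.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 d.f., no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 d.f.:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Counts quoted in the text: CSF-specific associations out of all testable ones.
trans_specific, trans_total = 655, 862
cis_specific, cis_total = 577, 959

stat, p = chi_square_2x2(
    trans_specific, trans_total - trans_specific,
    cis_specific, cis_total - cis_specific,
)
print(f"chi2 = {stat:.1f}, p = {p:.2e}")
```

The same function can be reused for other proportion comparisons in this section, provided both groups are expressed as counts rather than percentages.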

Second, we assessed colocalization between the CSF pQTL and previous brain (n=380) and plasma (n=529) pQTL (SOMAscan1.3k; Supp. Tables S15&16)10. In total, 1,004 aptamers and 520 associations overlapped with brain and 863 aptamers and 428 associations with plasma. We observed colocalization (PP.H4≥0.8) for 16.8% (72/428) of associations in plasma and 6.7% (35/520) in brain (Fig. 2G). The low overlap with brain may be driven by small sample sizes. We identified 1,861 index CSF pQTL (75.1% of all, 866 cis, 995 trans) that failed to colocalize with any pQTL across tissues (CSF-specific, 1,280) or were associated with newly-measured proteins (true novel, 581). When analyzing conditional associations, 2,885 were novel to CSF, of which 2,007 were CSF-specific and 878 were novel.
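The three categories used above (shared, CSF-specific, and novel) follow from a simple decision rule on the colocalization posterior PP.H4. A minimal sketch of that rule follows; the function name and the input layout (a mapping from comparison resource to PP.H4) are hypothetical and chosen only for illustration.

```python
PP_H4_THRESHOLD = 0.8  # colocalization cutoff quoted in the text


def classify_association(pp_h4_by_resource):
    """Classify one CSF pQTL from its PP.H4 values in other pQTL resources.

    pp_h4_by_resource maps a resource name (for example "plasma" or "brain")
    to PP.H4. An empty mapping means the protein was not measured elsewhere.
    """
    if not pp_h4_by_resource:
        return "novel"          # protein only measured in CSF
    if any(v >= PP_H4_THRESHOLD for v in pp_h4_by_resource.values()):
        return "shared"         # colocalizes with at least one resource
    return "CSF-specific"       # measured elsewhere but never colocalizes
```

Applying the function to every index association and tallying the labels would give counts of the same kind as those reported in this paragraph.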

We next compared our cis-pQTL associations to eQTL from whole blood4,5 and neurologically-relevant tissues4,6,29 (Supp. Tables S17–S26). We observed the largest overlap with cortex and cerebellum eQTLs (Fig. 2G). In total, 624 (49.1%) cis-pQTL did not colocalize with any eQTL. Of the 648 that overlap, 78.9% (511) colocalize with a neurologically-relevant tissue and 50.8% (329) are unique to neurological tissues. Only 21.1% (137) colocalize between CSF and whole blood. Integrating the results for all comparisons, 428 (33.6%) CSF cis-pQTL were novel, failing to colocalize with any eQTL analyzed.

In summary, and consistent with previous research10, we observed a high degree of both tissue and molecule specificity for our CSF pQTL. For cis-pQTL specifically, the highest overlap was with plasma pQTLs (382/1,272, 30.0% shared) and cortex eQTLs (288/1,272, 22.6%, Fig. 2G, Supp. Table S26), pointing to similarities at the protein level and between CSF and brain. These findings emphasize that proteins are regulated differently than RNA, including through post-translational mechanisms (phosphorylation, glycosylation, cleavage, etc.) that affect protein localization, excretion, and conformational changes, all factors that can affect CSF protein levels.

Pleiotropic regions regulate neurological pathways

Pleiotropic genomic regions may represent key drivers of biological pathways but are often ignored in QTL studies. To identify regions that regulated multiple proteins, we grouped the index pQTL variants from each association using linkage disequilibrium (LD). We identified 166 genomic regions associated with at least two proteins (Fig. 2B, top; Supp. Fig. 3, Supp. Table S27), with three regions (chr3q28, chr6p22.2-21.32, and chr19q13.32) harboring associations for more than 50 proteins. While we focus on these here, more regions are discussed in the Supplementary Notes. Due to the complexity of the HLA region on chr6p22.2-21.32, we grouped all variants in this locus regardless of LD (see Methods). For these regions, we performed pathway and cell-type enrichment analyses to identify the cellular context of the regulated proteins. We then performed a pheWAS using the GWAS Catalog30 to determine other traits regulated by the pQTL variants in each region.
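The LD-based grouping described above can be sketched as single-linkage clustering of index variants, followed by a filter on the number of distinct proteins per group. This is a sketch under stated assumptions: the single-linkage rule, the default R2 threshold of 0.5, the pairwise `r2` dictionary, and all function names are illustrative, and the authors' exact procedure is given in the Methods.

```python
def group_by_ld(variants, r2, threshold=0.5):
    """Single-linkage grouping: variants joined by any pair with r2 > threshold."""
    parent = {v: v for v in variants}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Every variant named in r2 must also appear in `variants`.
    for (a, b), value in r2.items():
        if value > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for v in variants:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


def pleiotropic_regions(proteins_by_variant, r2, threshold=0.5, min_proteins=2):
    """Return (variants, proteins) for LD groups tied to >= min_proteins proteins."""
    regions = []
    for group in group_by_ld(list(proteins_by_variant), r2, threshold):
        proteins = set()
        for v in group:
            proteins.update(proteins_by_variant[v])
        if len(proteins) >= min_proteins:
            regions.append((sorted(group), sorted(proteins)))
    return regions
```

Single linkage can chain distant variants together through intermediates, so a stricter rule (for example, LD with one lead variant) may be preferable in regions with long-range LD.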

The chr3q28 region is intergenic, located between GMNC and OSTN, and consists of twelve index variants corresponding to one LD block (R2>0.5, Supp. Fig. 12I) associated with 208 unique proteins (Fig. 3A, Supp. Table S27), all of them trans. Only one pQTL was observed in this region in plasma11, suggesting this is a CSF-specific pQTL hotspot. Proteins regulated by this region included five members of the syntaxin family involved in synapse function31 and five ephrin family members involved in neural development and memory32,33. For each of the 208 proteins, we analyzed brain-relevant cell-type expression data34 (Supp. Table S28) to determine cell specificity and observed a significant enrichment in neurons (Fig. 3B, Supp. Fig. 13, Supp. Table S29), supporting relevance of this region for brain-related traits. In addition, we observed 303 pathways enriched for these proteins. The most-significant were almost exclusively neuronal and cell surface pathways (Fig. 3C, Supp. Fig. 12, Supp. Table S30), including neuron projection development (GO:0031175), cell junction (GO:0030054), axon development (GO:0061564), and SNAP receptor activity (GO:0048812). A PheWAS of the index variants in this region identified twelve associated traits, eleven of which were measures of brain morphology, volume, or surface area (Fig. 3D, Supp. Table S31). Importantly, we also observed an association between rs9877502 and CSF levels of phosphorylated tau (P=1×10−36), a key biomarker for AD35. We note that OSTN, the gene flanking the 3’ end of this region, is highly expressed in human neurons, regulates dendritic growth in the human brain and is proposed as a candidate gene at this locus17. GMNC, a gene involved in DNA replication, flanks the 5’ end of the region.
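The cell-type enrichment reported for this region relies on a one-sided hypergeometric test and a fold change relative to the full panel. A minimal sketch of both quantities follows. The argument names are hypothetical, and the fold change is computed here as a ratio of proportions, which is one reasonable reading of the description rather than a confirmed formula.

```python
from math import comb


def hypergeom_enrichment(k, n, K, N):
    """One-sided hypergeometric enrichment.

    k: region proteins assigned to the cell type
    n: all proteins associated with the region
    K: panel proteins assigned to the cell type
    N: all proteins on the panel
    Returns (fold_change, p_value), where p_value = P(X >= k).
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    p_value = tail / comb(N, n)
    # ASSUMPTION: fold change as (k / n) divided by (K / N).
    fold_change = (k / n) / (K / N)
    return fold_change, p_value
```

Exact integer arithmetic through `math.comb` avoids the underflow that can affect floating-point implementations when the panel contains several thousand proteins.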

Fig. 3: Three pleiotropic regions make up hotspots of protein regulation involved in neurological processes.

a. Circos plot showing the genomic locations of all protein-coding genes whose proteins are regulated by the chr3q28 pleiotropic region. Colors represent the predominant brain-relevant cell type corresponding to that protein, as detailed in b. b. Enrichment of proteins associated with the chr3 region in brain-relevant cell types (based on classification shown in a). Fold change was calculated based on the number of cell type-specific proteins in the region compared to the number in the entire SOMAscan7k panel. Colors match those shown in a. Enrichment p-value was calculated using a one-sided hypergeometric test. c. Selected pathways enriched for proteins associated with the chr3 region. Gene Ratio represents the proportion of all proteins associated with the region that are part of each pathway. d. PheWAS of regression p-values of index pQTL SNPs located in the chr3 region, as found in the GWAS catalog30. P-values were directly obtained from the GWAS catalog. e. Circos plot showing the genomic locations of all protein-coding genes whose proteins are associated with the chr6p22.2-21.32 pleiotropic region. f. Enrichment of proteins associated with the chr6 region in brain-relevant cell types. Enrichment p-value was calculated using a one-sided hypergeometric test. g. Selected pathways enriched for proteins associated with the chr6 region. h. Selected regression p-values for traits and diseases associated with index pQTL SNPs located in the chr6 region, as determined by the GWAS catalog. P-values were directly obtained from the GWAS catalog. i. Circos plot showing the genomic locations of all protein-coding genes whose proteins are regulated by the chr19q13.32 pleiotropic region. j. Enrichment of proteins associated with the chr19 pleiotropic region in brain-relevant cell types. Enrichment p-value was calculated using a one-sided hypergeometric test. k. Selected pathways enriched for proteins associated with the chr19 region. l. Selected regression p-values for traits and diseases associated with index pQTL SNPs located in the chr19 region, as determined by the GWAS catalog. P-values were directly obtained from the GWAS catalog.

The chr6p22.2-21.32 pleiotropic region spanned about 8 Mb in the HLA region and included 161 associations in 63 LD blocks (R2>0.5) for 70 unique proteins (Fig. 3E, Supp. Table S27). The same region in plasma contained 1,756 associations11, significantly more than CSF (9.71% of all plasma associations vs 6.50% of all CSF associations; P=3.10×10−7). We identified 32 cis associations for 23 unique proteins and 129 trans associations for 47 unique proteins. These included multiple complement components such as complement C2 (C2), complement factor B (CFB), and C4a anaphylatoxin (C4A|C4B). The proteins with pQTLs in this region were enriched in microglia (Fig. 3F, Supp. Fig. 13, Supp. Table S29). Pathway analysis identified 86 enriched pathways using the 70 proteins (Fig. 3G, Supp. Fig. 14, Supp. Table S32). These were highly immune-specific, including regulation of immune response (GO:0050776), leukocyte mediated immunity (GO:0002443), and antigen processing and presentation (KEGG:hsa04612). Through a pheWAS of the index pQTL variants in this region, we identified 169 associations with traits and diseases (Fig. 3H, Supp. Table S33). The shared associations in this region with both immunological and neurodegenerative traits add evidence to the known relationship between these two processes36.

In CSF, the APOE region at chr19q13.32 was associated with the largest number of proteins, with eleven unique index variants in three LD blocks (R2>0.5, Supp. Fig. 15) associated with the levels of 335 proteins (Fig. 3I, Supp. Table S27), of which four (three aptamers tagging APOE and APOC2) are in cis. The two variants that determine the APOE genotype, the strongest genetic risk factors for Alzheimer’s Disease (rs429358; APOE ε4, and rs7412; APOE ε2), were the index variants for 205 and 49 associations respectively. After conditioning on both, 25 aptamers (7.3%) still had significant associations in the region (Supp. Table S34, Supp. Fig. 15G, see Supplementary Notes). These results were consistent in APOE ε33 carrier-specific results (Supp. Table S35), suggesting that additional variants in this region are relevant outside of APOE ε2 & ε4. These associations were consistent between controls and AD cases, suggesting APOE has an important role regardless of disease status (see Supplementary Notes). We observed significantly more associations in this region in CSF than in plasma11 (13.8% of all CSF pQTL vs 0.67% of all plasma pQTL, P<2.2×10−16). Proteins with associations in this region featured known AD biomarkers, including four members of the 14-3-3 protein family and neurofilament light and heavy chains37,38. Calcineurin (PPP3R1), associated with both phosphorylated tau levels and rate of decline in AD39, is also genetically regulated by this region. Interestingly, although we observed an enrichment of proteins with APOE-region associations in neurons (Fig. 3J, Supp. Fig. 13, Supp. Table S29), APOE is mainly expressed in astrocytes (Supp. Table S28). This suggests potential cell-to-cell communication between astrocytes and neurons. Pathway analysis results support this hypothesis (Fig. 3K, Supp. Fig. 15), with enrichment for neuronal pathways including long-term potentiation (Reactome:R-HSA-9620244) and assembly and cell surface presentation of NMDA receptors (Reactome:R-HSA-9609736). We also observe apoptotic pathways, including activation of BH3-only proteins (Reactome:R-HSA-114452) and apoptosis (Reactome:R-HSA-109581, Supp. Table S36), suggesting a potential role of APOE (refs. 40–42) in cell death.
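The APOE genotype referred to in this paragraph is determined by the two variants named above: each C allele at rs429358 marks an ε4 haplotype and each T allele at rs7412 marks an ε2 haplotype, with ε3 carrying neither. The sketch below derives the genotype from unphased allele counts. It assumes the rare ε1 haplotype (both alleles on one chromosome) is absent, and the function names are hypothetical.

```python
def apoe_genotype(rs429358_c_count, rs7412_t_count):
    """Infer the APOE genotype from allele counts (0, 1, or 2) at the two SNPs.

    ASSUMPTION: the rare e1 haplotype is ignored, so a double heterozygote
    is called e2/e4.
    """
    e4 = rs429358_c_count   # each C allele at rs429358 marks an e4 haplotype
    e2 = rs7412_t_count     # each T allele at rs7412 marks an e2 haplotype
    e3 = 2 - e4 - e2        # remaining haplotypes are e3
    if e4 < 0 or e2 < 0 or e3 < 0:
        raise ValueError("allele counts inconsistent with e2/e3/e4 haplotypes")
    return "/".join(["e2"] * e2 + ["e3"] * e3 + ["e4"] * e4)


def is_apoe4_carrier(rs429358_c_count):
    """APOE4+ as defined in Table 1: at least one C allele at rs429358."""
    return rs429358_c_count >= 1
```

Conditioning on both variants, as described in the text, amounts to including these two allele counts as covariates in the association model.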

Besides AD43, APOE has been associated with at least eighteen other diseases, including heart disease and high cholesterol44. PheWAS of the index variants in this locus identified 1,337 associated traits, highlighting the highly pleiotropic nature of this region and its involvement in diverse biological processes (Fig. 3L, Supp. Table S37).

Novel proteins associated with Alzheimer’s disease

Approaches integrating QTL with disease have helped prioritize genes at GWAS loci. We sought to expand this in the context of AD by utilizing three complementary approaches: correlation of genetically-predicted protein levels with AD through proteome-wide association study (PWAS)45,46, prioritization of causal proteins for AD through Mendelian randomization (MR)47, and identification of shared associations with both protein and AD through colocalization (COLOC)48,49.
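To make the Mendelian randomization step concrete, the sketch below computes a Wald ratio, the simplest single-instrument MR estimator: the variant's effect on the outcome divided by its effect on the protein. The estimator actually used in the study is not stated in this section, so the Wald ratio is shown only as an assumed, common form; the function and argument names are hypothetical.

```python
import math


def wald_ratio(beta_pqtl, beta_gwas, se_gwas):
    """Single-instrument MR estimate of a protein's effect on disease risk.

    beta_pqtl: effect of the variant on protein level (exposure)
    beta_gwas: effect of the same allele on the outcome (for example, log odds of AD)
    se_gwas:   standard error of beta_gwas
    """
    if beta_pqtl == 0:
        raise ValueError("instrument has no effect on the exposure")
    estimate = beta_gwas / beta_pqtl
    # First-order approximation that ignores uncertainty in beta_pqtl.
    se = se_gwas / abs(beta_pqtl)
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value
    return estimate, se, p_value
```

Both effect sizes must refer to the same effect allele; a sign mismatch from unharmonized alleles would flip the inferred direction of the protein's effect.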

We performed a PWAS using FUSION45. After FDR correction and pleiotropy filtering50,51, 125 pQTL for 108 proteins were significantly associated with AD. Details of PWAS-specific findings are in the Supplementary Notes, Supp. Figs. 16–21, and Supp. Tables S38–39. We next performed MR47 and identified 17 unique proteins as putatively causal for AD (Fig. 4A, Supp. Table S40&41, FDR-corrected P<0.05) with confirmed directionality using the Steiger test52 (Supp. Table S42). Finally, we assessed colocalization48,49 between each of our 2,477 significant pQTL associations (cis and trans) and the AD GWAS. After exclusion of pleiotropic regions, 32 proteins had QTL that colocalized with AD risk (PP.H4>0.8, Fig. 4A, Supp. Tables S43&44). We further analyzed proteins associated with AD through at least two approaches (PWAS, MR, and colocalization).
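The FDR correction mentioned above can be illustrated with the Benjamini-Hochberg step-up procedure, the method named in the Fig. 4 legend. The following is a minimal, dependency-free sketch; the function name is hypothetical and the code is not taken from the authors' pipeline.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A protein would then be called significant when its adjusted P-value falls below the chosen FDR level, for example 0.05.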

Fig. 4: AD-related proteins are enriched in microglia and immune-relevant pathways.

a. Upset plot of protein overlap between PWAS, Colocalization, and Mendelian Randomization (MR) after removal of all associations in pleiotropic regions. b. Miami plot of proteins significant in at least two of PWAS, COLOC, and MR. X-axis: chromosome position of the transcription start site of the protein coding gene; y-axis: PWAS −log10(P) for association with AD calculated using FUSION. Red line: B&H FDR-corrected p-value threshold. Top plot: Positive association with AD; Bottom plot: negative association with AD. Triangle-shaped points correspond to the 38 proteins prioritized in (a). Color: predominant brain-relevant cell type for that protein. c. Circos plot showing trans associations linked to AD through two or more methods. Proteins and links labeled in red or blue are associated with AD through a trans association (red, positively associated; blue, negatively associated). Links start at the TSS of the associated protein-coding gene and end at the index pQTL variant. Associations are labeled in black by the proposed gene at the pQTL. d. Details of the 38 proteins that overlap between at least two of PWAS, colocalization, and MR. Major cell type: Predominant brain-relevant cell type for proteins of interest. Cis/trans pQTL: type of pQTL driving the AD association. Drug Target: Proteins targeted by a molecule described in the DrugBank database. e. Enrichment using a one-sided hypergeometric test of 38 proteins from d in brain-relevant cell types. ***: P=1.38×10−4; *: P=0.013. f. Selected gene sets enriched for proteins from d. Gene Ratio: proportion of the 38 proteins/genes that are part of the pathway. g. Prediction of case/control status in testing dataset through protein-based (ProtRS), PRS-based (PRS + Age + Sex), and covariate-based (APOE + Age + Sex) models & prediction of amyloid/tau positivity using the ProtRS model.

Of all proteins prioritized by PWAS, COLOC and MR, eight (all cis) were significant in all three and 38 (24 cis, 13 trans, and one cis and trans) were significant through at least two methods (Figs. 4A–D; Supp. Table S45). We observed an overrepresentation in microglia/macrophages (P=1.38×10−4, Fig. 4E, Supp. Fig. 13, Supp. Table S29), astrocytes (P=0.012), and neurodegenerative pathways, including late-onset AD and brain atrophy. Consistent with the cell type findings, immune pathways were also enriched, including immune response and leukocyte proliferation (Fig. 4F, Supp. Fig. 22, Supp. Table S46). These proteins 1) provide additional evidence supporting nominated AD genes, 2) nominate alternative genes in certain loci, and 3) prioritize novel genes for AD.
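The prioritization rule described above, keeping proteins supported by at least two of the three methods, is a set-overlap count. A minimal sketch follows; the function name and the placeholder protein labels in the usage example are hypothetical and do not represent study results.

```python
from collections import Counter


def prioritized_by_multiple(method_hits, min_methods=2):
    """Map each protein to the number of methods supporting it, keeping
    only proteins supported by at least `min_methods` methods.

    method_hits: dict of method name -> iterable of significant proteins.
    """
    counts = Counter(
        protein for hits in method_hits.values() for protein in set(hits)
    )
    return {p: c for p, c in counts.items() if c >= min_methods}


# Placeholder labels only, not actual findings.
example = {
    "PWAS": ["P1", "P2", "P3"],
    "MR": ["P1", "P3"],
    "COLOC": ["P1", "P4"],
}
support = prioritized_by_multiple(example)
```

Proteins with a count of three correspond to the intersection of all methods, and those with a count of two correspond to the pairwise overlaps shown in an upset plot.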

Our findings support the nominated functional gene for ten GWAS23 loci: APOE, ACE, CR1, CTSH, EGFR, GRN, IL34, SHARPIN, TMEM106B, and TREM2. Among these we observed inverse associations (higher protein, decreased risk) for ACE, TREM2, and GRN, while CTSH, SHARPIN, and CR1 were associated with increased risk. Our findings reaffirm established candidate genes like TREM2 and CR1 and support genes at newly-identified loci (SHARPIN, EGFR, CTSH, among others).

Our analyses also nominate alternative candidates at three AD risk loci. First, the association at 1q32.2 has been linked to CR1 (refs. 53–55), but we observed a colocalized association for both CR1 and CR2 levels with AD risk (Supp. Fig. 23). Given the similarities between CR1 and CR2 (ref. 56), both may act in neuroinflammation implicated in AD pathogenesis57. At the 7q22.1 AD locus, the nominated genes were ZCWPW1 and NYAP1. Our analyses indicate that PILRA is a functional gene in this locus, identifying a negative association between PILRA levels and AD risk (Supp. Fig. 23). Previous research suggests altered PILRA ligand binding driven by the index pQTL confers a protective effect in AD58. The index variant has also been fine-mapped as likely causal in another GWAS59. At the 16p11.2 KAT8 locus, we nominate PRSS8 as the candidate gene. PRSS8 levels were associated with increasing risk of AD through MR, driven by a secondary pQTL (Supp. Fig. 23). This protein was also associated with AD in plasma12.

We also identified 14 proteins with AD associations driven by trans-pQTL including TMEM132C, LRP6, and CRADD (Supp. Tables S38, S40, S43, & S44, Fig. 4C). We found that the BIN1 AD GWAS locus was also associated with TMEM132C levels. TMEM132 family members are linked to traits including panic disorders60 and insomnia61, suggesting important brain-relevant function. We also observed a trans-pQTL for CRADD near ADAMTS8, driving an inverse association between CRADD and AD risk. Mutations in CRADD cause intellectual disability62 and the protein contributes to neuronal apoptosis63, potentially a mechanism for protection against AD. Both TMEM132C and CRADD were also associated with the APOE locus discussed above. Additionally, levels of LRP6 were associated with variants near ADAM10. ADAM10 contributes to amyloid precursor protein cleavage64 and overexpression reduces amyloid plaque burden65 and memory deficits66. LRP6 functions in Wnt signaling vital to synapse function, while variants in LRP6 have been associated with synapse degeneration in AD67. The integration of trans-pQTL with AD identifies novel distal regulatory mechanisms that may explain disrupted pathways in AD. All other prioritized proteins are discussed in the Supplementary Notes.

The AD-associated proteins were enriched in both immune cell types and pathways, so we next assessed their roles in these mechanisms. Immune-relevant proteins included TREM2, which regulates microglial response to neurodegeneration68. TREM2 complexes with DAP12, which modulates activation of microglia through its immunoreceptor tyrosine-based activation motif (ITAM)69. Several of our identified proteins, including PILRA (ref. 70), CD33 (ref. 71), SIRPA (ref. 72), and SIGLEC9 (ref. 73), also contain ITAMs or their inhibitory counterparts (ITIMs, Fig. 5A). ITAM- and ITIM-containing proteins regulate processes including microglial phagocytosis and apoptosis74, suggesting convergent AD-relevant pathways mediated through microglial surface receptors.

Fig. 5: Cellular localization of immune and lysosomal proteins.

a. Involvement of 16 AD-associated immune-related proteins (IL34, PILRA, TREM2, LGALS3, APOE, CD33, SHARPIN, C1S, CR1, CR2, SIRPA, CD72, LILRB1, IL12A, FCGR3B, and SIGLEC9) in microglia signaling pathways. AD-prioritized proteins are highlighted in black text. b. Involvement of six lysosomal proteins (CLN5, EGFR, TMEM106B, GRN, CTSH, CST8) in various components of the lysosomal processing system. AD-prioritized proteins are highlighted in black text. Cell structure images in b were obtained from Servier via the open-source resource BioIcons.

Another highly significant pathway, the endolysosomal pathway, included novel AD-relevant proteins (CTSH, CST8, CLN5; Fig. 5B). Increased CTSH mRNA levels were observed in AD, while microglia with CTSH knockouts showed increased Aβ phagocytosis75. Cathepsins function in proteolysis of lysosomal proteins76. CST8 is part of a protein family that inhibits the activity of cathepsins77, offering a potential mechanism for its inverse AD association. Mutations in CLN5 contribute to a form of neuronal ceroid lipofuscinosis78, while CLN5-deficient neurons showed impaired lysosomal activity and movement78. CLN5 is processed through the ER and Golgi before being transported through the early endosome to lysosomes79, supporting enrichment for endosome proteins (Fig. 4F). We also identified two lysosomal proteins, GRN and TMEM106B80,81, that are associated with frontotemporal dementia (FTD)82,83 but were also identified in the latest AD risk GWAS23.

Finally, we searched DrugBank84 to identify therapeutic compounds targeting AD-associated proteins. Of the 38 proteins, 15 had reported molecules (Fig. 4D). FCGR3B is targeted by cetuximab (DrugBank accession number DB00002), a chemotherapy agent. Cetuximab is associated with decreased risk of AD85 (Supp. Table S47). Glutamic acid (DB00142) targets SLC25A18, a glutamate transporter86 and is involved in excitatory signaling in neurons87. CASQ1 is a ligand for calcium (DB11093) and studies suggest that targeting calcium receptors may be beneficial in AD88. Captopril (DB01197), an ACE inhibitor, showed a neuroprotective effect in an AD mouse model89. Finally, CA12 is targeted by carbonic anhydrase inhibitors like acetazolamide (DB00819), which was shown to reduce Aβ-induced mitochondrial toxicity90. Through our analyses, we identified numerous causal and druggable proteins that may offer promising targets for AD treatment in the future.

Identified proteins successfully predict AD status

The identification of accurate panels to prioritize individuals at risk of AD is a focus of the field. We sought to determine if proteogenomics (agnostic to disease status) could prioritize proteins as AD biomarkers. We developed a proteomic risk score91 by selecting PWAS-significant aptamers (n=456) and performing LASSO regression to select predictors of AD status (Npredictors=100) in a training dataset (n=1,325). We assessed the predictive ability of the model in an independent testing dataset (n=567). We also analyzed the association between each of the 100 predictors individually with AD status (Supp. Fig. 24, Supp. Table S48).

The model accurately classified participants in both the training (AUC=0.949; Supp. Fig. 25, Supp. Table S49) and testing datasets (AUC=0.902, Fig. 4G). The model performed better than a PRS with covariates or a model incorporating APOE genotype with covariates (P<2.2×10−16, Supp. Table S50). The model developed using clinical status was significantly better at classifying samples by amyloid/tau biomarker status (AUC 0.963, P=8.89×10−5), suggesting the model may be able to accurately classify individuals who have preclinical signatures of AD. The performance of the model was consistent across ages and APOE genotypes (Supp. Fig. 25, Supp. Table S50) and the independent Stanford dataset (ProtRS AUC=0.932, Supp. Fig. 26), which only included 65% of the proteins. We note that the individuals used in the prediction models were also used when identifying the PWAS-significant proteins, so further analysis is needed in independent cohorts to validate the models.

Discussion

Proteins, due to their active participation in biological processes, often directly contribute to disease. However, little is understood about the genetic underpinnings regulating protein levels across tissues, especially in the context of neurological traits. CSF is neurologically relevant as the fluid that surrounds the brain and spinal cord, and many CSF proteins are derived from the central nervous system92. The ability to extract CSF from living individuals and its shared biology with the brain highlight its relevance for the study of neurological disorders and diseases. Here, we present a proteogenomic analysis of CSF, aiming to understand the regulators of proteins in a privileged body system. This dataset represents an expansion of previous CSF pQTL work10,16–19 in both sample size (n=3,506) and number of proteins (n=6,361). We identified 3,885 independent study-wide significant pQTL associations (2,605 cis, 1,280 trans) for 1,883 proteins and confirmed that they are highly CSF- and protein-specific, supporting the analysis of tissue-relevant proteomic datasets to understand complex traits. We confirmed many of our findings using a complementary proteomic platform, highlighting the robustness of our findings across cohorts and measurement techniques. We observed highly pleiotropic, CSF-biased genomic regions located on chromosome 3q28 and 19q13.32 (near APOE) that were enriched for neuronal proteins, demonstrating the importance of these oft-overlooked pQTL hotspots.

In AD, many genetic loci have been identified, but for some the functional gene is not clear23,59. We integrated our pQTL with AD GWAS and prioritized 38 proteins, 24 of which were not nominated by previous studies. Two of them (ACE, CTSH) were also significant in brain using a different proteomics method, reinforcing the relevance of CSF to brain biology46. The proteins were enriched in microglia and immune pathways, supporting a strong immunological role in AD93–95, including TREM2 (refs. 7,96), CD33, and IL34. We also highlighted a lysosomal role, with GRN and TMEM106B among others implicated in endosome-lysosome fusion. We identified drug repurposing candidates for AD, including cetuximab, acetazolamide, and calcium receptor-targeting drugs. Further, we used the AD-associated proteins to develop a predictive model that improved upon PRS in all facets, including in age- and APOE-stratified contexts, highlighting the proximity of proteins to disease compared to genetics.

We note several limitations warranting further investigation. We included only individuals of European ancestry, limiting applicability to other populations. Additionally, protein measurement using aptamers can introduce binding artifacts that affect the pQTL associations. The binding characteristics of each aptamer are not available, so we cannot determine which isoform the aptamers are binding to. This includes an instance where aptamers do not correlate with the expected genotype (see Supplementary Notes, Supp. Figs. 27&28). Nonetheless, we note that SOMAscan-based measurements correlate well with complementary approaches26,97. While we performed AD-specific analyses using a large AD GWAS to date23, this involved proxy-AD cases, potentially clouding the results due to inaccurate classification. In addition, the prioritization was limited to proteins in the panel, excluding genes from many AD-associated regions and limiting the search space for those where a protein is included. Our cell-type analyses were derived from healthy individuals and RNA levels, which may correlate poorly with proteins. Investigation of proteins in a cell-type specific and disease-stratified manner is an important direction. Additionally, while we attempted to control for horizontal pleiotropy in our MR analyses by utilizing only cis-pQTL and removing any variants associated with multiple proteins, we are limited to the proteins in the panel. With larger panels and sample sizes, it may become difficult to identify variants that satisfy MR’s assumptions. As pQTLs are already limited in the instrument variables that are robust enough to be included in the analysis, MR may not be an effective prioritization method. Therefore, validation of these proteins is necessary. We nonetheless believe that our findings will point to mechanisms through which the variants may act to affect disease development.

Evidence suggests that candidate proteins can be identified for many neurological traits, even with smaller sample sizes10,13, and this dataset has been applied to identify proteogenomic links between the LRRK2 locus and Parkinson’s disease21. To ensure the results can be used to study other traits, we have made joint analysis summary statistics available through the GWAS Catalog and easy to search in a PheWeb browser (ontime.wustl.edu).

Materials and Methods

Ethics declarations

The ethics committee of Washington University School of Medicine in St. Louis approved this study.

All participants provided informed consent for their data and specimens to be used for this study. The study was approved by the institutional review board of Washington University School of Medicine in St. Louis.

Cohorts

This study analyzed 4,989 cerebrospinal fluid (CSF) proteomics samples from 4,968 unique individuals in eight cohorts (Alzheimer’s Disease Neuroimaging Initiative [ADNI], Dominantly Inherited Alzheimer’s Network [DIAN], Knight-ADRC Memory and Aging Project [Knight-ADRC (Oct 2021) & Knight-ADRC (June 2023)], Ace Alzheimer Center Barcelona (FACE), Barcelona-1, Parkinson’s Progression Markers Initiative [PPMI], Stanford Iqbal Farrukh and Asad Jamal ADRC & Aging Memory Study [Stanford], and Washington University Movement Disorders clinic [MARS]). Details about each cohort are available in the Supplementary Notes.

Proteomic Data Processing

CSF samples were collected through lumbar puncture from participants after an overnight fast. Samples were processed and stored at −80 °C until they were sent for protein measurement.

Samples from ADNI, DIAN, Knight-ADRC (Oct 2021), Knight-ADRC (June 2023), MARS, Ace Alzheimer Center Barcelona (FACE), and Barcelona-1 were sent for protein measurement using the SOMAscan7k platform (Supp. Table S51)25. Samples from PPMI were sent for protein measurement on the SOMAscan5k platform25. Samples from the Stanford ADRC and SAMS were sent for protein measurement on a separate version of the SOMAscan5k assay. All panels reported aptamer levels in relative fluorescent units (RFU).

The proteomics data from all cohorts underwent initial normalization by SomaLogic (see Supplementary Notes). Further quality control was performed on the normalized proteomics data provided by SomaLogic, according to an in-house protocol, for samples from all cohorts except PPMI. This identical protocol was performed on ADNI, DIAN, Knight-ADRC (Oct 2021), FACE, and Barcelona-1 as one group, Knight-ADRC (June 2023) and MARS as a second group, and Stanford as a third group, due to differences in proteomics panel and time of proteomics measurement. Details of scale factor, coefficient of variation (CV), IQR, and call rate-based filtering are in the Supplementary Notes. A summary of the QC steps and aptamers/samples removed at each step is found in Supp. Fig. 1A–C & Supp. Table S1. A summary of the samples removed by step is found in Supplementary Table S52.

Quality control was also performed on the PPMI SOMAscan 5k data. Scale factor and CV were not available for the PPMI cohort, so those filtering steps were not performed. IQR-based and call-rate based quality control were performed using the same methods as with the SOMAscan7k dataset. A summary of the QC steps for the SOMAscan5k dataset is found in Supp. Fig. 1D.

To account for sample size differences across the cohorts, only aptamers present in the largest dataset (that consisting of ADNI, DIAN, Knight-ADRC (Oct 2021), FACE, and Barcelona-1, Naptamers=7,008) were kept for analysis. To determine the number of unique proteins in each dataset, we used information provided by SomaLogic to map the aptamer ID to a UniProt ID. If the aptamer was present in both the SOMAscan 5k and 7k datasets, we prioritized the mapping in 7k. We then determined the number of unique UniProt IDs included in each dataset after QC.

Genomic Data QC

All proteomics samples were matched to genomic data (if available) pre-proteomics QC. Genomic datasets from ADNI, DIAN, Knight-ADRC (Oct 2021), Knight-ADRC (June 2023), MARS, FACE, and Barcelona-1 were genotyped on multiple arrays at different times and were imputed individually using the GRCh38 Version R2 reference panel on the TOPMed imputation server. Before imputation, high-quality directly sequenced variants were filtered based on the following criteria: (1) genotyping rate ≥98% per SNP or individual; (2) MAF≥0.01; and (3) Hardy-Weinberg Equilibrium (HWE) P≥1×10−6. Genomic datasets from PPMI and Stanford were whole-genome sequenced and aligned to GRCh38. All datasets were merged and variants with genotyping rate ≥90%, minor allele count (MAC) ≥10, and HWE P≥1×10−6 were kept. Ambiguous SNPs were removed from the analyses to avoid false-positive results due to genotyping array strand differences.

Genomic samples were matched to the cleaned proteomic dataset. Principal component analysis (PCA) was performed using PLINK1.9 (ref. 98) to account for population stratification. Samples were defined as European in ancestry first using loose thresholds of >−0.01 for both gPC1 and gPC2 (Supplementary Figure 2A). Further filtering was done by removing samples that were outside of three standard deviations from the mean of the remaining samples. Filtering for cryptic relatedness through identity by descent (IBD) was performed using PLINK1.9 (ref. 98) including variants with MAF≥0.15 or HWE P≥0.001 and R2 of 0.2 (Supplementary Figure 2B). Pairs with PI_HAT≥0.25 were considered related and one sample from each pair was removed to minimize samples lost (n=230). A summary of the samples removed in each step is available in Supp. Table S52.

pQTL Identification

We performed a joint analysis of 3,506 samples using PLINK2 (ref. 98). Details about the cohorts included for each aptamer are in Supp. Table S1. Sample sizes ranged from 2,326 for aptamers only assayed in ADNI, Barcelona-1, DIAN, FACE, and Knight-ADRC (Oct 2021) to 3,506 for aptamers assayed in all cohorts. Per proteomics run (ADNI, DIAN, Knight-ADRC (Oct 2021), FACE, Barcelona-1 together; PPMI alone; Stanford alone; Knight-ADRC (June 2023) alone; MARS alone), aptamer levels were z-score normalized by transforming to log10-scale and normalizing using the scale() function in R with options scale and center set to true. All datasets were then combined for analysis. For each aptamer, age, self-reported sex, ten genetic principal components, and cohortArray variables (for example ADNI_OmniEx) were included as covariates in an additive linear model. The overall model was as follows:

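$$\mathrm{Zscore}(\text{aptamer level}) \sim \beta_0 + \beta_1 \cdot \text{SNP dosage} + \beta_2 \cdot \text{age} + \beta_3 \cdot \text{sex} + \sum_{j=4}^{13} \beta_j \cdot \text{gPC}_{j-3} + \sum_{k=14}^{n} \beta_k \cdot \text{cohortArray} + \varepsilon$$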
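The per-run normalization described above (log10 transform followed by centering and scaling) can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study; the function names and the example RFU values are hypothetical, and the sample standard deviation (n − 1 denominator) is used to match the behavior of R's scale().

```python
import math
import statistics


def zscore_log10(rfu_values):
    """Log10-transform raw RFU values, then center and scale them.

    Uses the sample standard deviation (n - 1 denominator), which is
    what R's scale(x, center=TRUE, scale=TRUE) divides by.
    """
    logged = [math.log10(v) for v in rfu_values]
    mean = statistics.mean(logged)
    sd = statistics.stdev(logged)
    return [(v - mean) / sd for v in logged]


def normalize_by_run(levels_by_run):
    """Normalize each proteomics run separately before pooling."""
    return {run: zscore_log10(values) for run, values in levels_by_run.items()}


# Hypothetical RFU values for one aptamer in two runs (illustrative only).
example = {
    "run_A": [100.0, 1000.0, 10000.0],
    "run_B": [50.0, 500.0, 5000.0],
}
normalized = normalize_by_run(example)
```

Normalizing within each run before combining puts all runs on a common scale, so differences in absolute RFU between panels or measurement dates do not enter the pooled regression.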

Cis-pQTL identification

We defined cis-pQTL as all variant-aptamer associations with raw P<5×10−8 that were within 1MB in either direction of the transcription start site (TSS) of the corresponding protein-coding gene based on hg38 coordinates.

Trans-pQTL identification

To define trans-pQTL, we calculated the number of proteomics principal components necessary to account for 95% of the variance in protein levels, based only on the samples from the largest dataset (ADNI, DIAN, Knight-ADRC (Oct 2021), FACE, and Barcelona-1). In total, 1,450 proteomic PCs were necessary to account for that variance, so a study-wide significance threshold of 5×10−8/1,450 (approximately P<3.45×10−11) was used to define trans-pQTL. We considered all study-wide significant variant-aptamer associations farther than 1MB in either direction from the TSS to be trans-pQTL.
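The threshold arithmetic can be checked with a short sketch. The function names and the toy eigenvalues below are hypothetical; in practice the eigenvalues would come from a PCA of the protein-level matrix, which is not reproduced here.

```python
def n_components_for_variance(eigenvalues, target=0.95):
    """Smallest number of leading PCs whose cumulative share of the
    total variance reaches `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(ordered)


def study_wide_threshold(n_effective_tests, genome_wide=5e-8):
    """Bonferroni-style correction of the genome-wide threshold."""
    return genome_wide / n_effective_tests


# Toy eigenvalues (hypothetical); cumulative shares are 0.5, 0.8, 0.9, 0.96.
toy_eigenvalues = [5.0, 3.0, 1.0, 0.6, 0.4]
n_pcs = n_components_for_variance(toy_eigenvalues)

# With the 1,450 PCs reported in the text:
threshold = study_wide_threshold(1450)
```

Using the number of PCs rather than the number of aptamers as the divisor treats correlated proteins as fewer effective tests, which gives a less conservative threshold than dividing by every aptamer measured.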

Identification of associations

We identified index variants used to define a pQTL through a distance-based method. For each aptamer, we scanned the genome to identify the most significant variant-aptamer association that passed the study-wide multiple testing correction criteria and defined it as the index variant for that association. We then grouped and removed all associations within 1MB of the index variant. We performed the same procedure for the next most significant variant-aptamer association (if applicable) until no associations reached study-wide significance threshold.
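The distance-based procedure above can be sketched as a greedy loop. This is an illustrative reimplementation, not the study's code; the tuple layout and the example associations are assumptions made for the sketch.

```python
def select_index_variants(associations, threshold, window=1_000_000):
    """Greedy distance-based selection of index variants for one aptamer.

    associations: list of (chrom, position, p_value) tuples.
    Repeatedly takes the most significant remaining association as an
    index variant and drops everything within `window` bp of it on the
    same chromosome, until nothing passes `threshold`.
    """
    remaining = [a for a in associations if a[2] < threshold]
    index_variants = []
    while remaining:
        lead = min(remaining, key=lambda a: a[2])
        index_variants.append(lead)
        remaining = [
            a for a in remaining
            if not (a[0] == lead[0] and abs(a[1] - lead[1]) <= window)
        ]
    return index_variants


# Hypothetical associations for a single aptamer.
example_associations = [
    ("chr1", 1_000_000, 1e-20),
    ("chr1", 1_500_000, 1e-12),  # within 1MB of the first lead, so grouped
    ("chr1", 3_000_000, 1e-15),
    ("chr2", 1_000_000, 1e-9),
    ("chr1", 9_000_000, 1e-6),   # does not pass the threshold
]
index_hits = select_index_variants(example_associations, threshold=5e-8)
```

In the study the threshold differs for cis (5×10−8) and trans (3.45×10−11) associations, so a real implementation would pass the threshold appropriate to each variant-aptamer pair.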

Cohort-specific analysis

We performed cohort-specific analyses to obtain effect sizes and standard error estimates for the index variants in each cohort. Using the same covariates as in the joint model, we used PLINK2 (ref. 98) to analyze the association of the corresponding index variant with protein levels in each cohort.

Meta-analysis

We utilized METAL99 to perform a fixed-effect, inverse-variance meta-analysis for each association. We input the summary statistics for the index SNP-aptamer pair from each of the eight cohorts into METAL. To assess effect size heterogeneity across the cohorts, we included the ANALYZE HETEROGENEITY flag.
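For readers unfamiliar with the method, a fixed-effect, inverse-variance weighted meta-analysis with Cochran's Q and I² heterogeneity statistics can be sketched as below. This illustrates the general technique only; it is not METAL's implementation, and the per-cohort effect sizes are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns the pooled effect, its standard error, the z-score, and
    Cochran's Q with its degrees of freedom and the I^2 statistic (%).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"beta": beta, "se": se, "z": beta / se, "Q": q, "df": df, "I2": i2}


# Hypothetical per-cohort estimates for one index SNP-aptamer pair.
consistent = fixed_effect_meta([0.5, 0.5], [0.1, 0.1])
heterogeneous = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is dominated by the largest cohorts; Q and I² indicate whether the per-cohort effects are more dispersed than sampling error alone would predict.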

Disease-specific analyses

Using PLINK2 (ref. 98), we analyzed the index variants in neurologically healthy controls and AD cases based on biomarker status. We considered age, sex, genetic principal components 1-10, and cohortArray as covariates in the model. We used Alzheimer’s disease-specific biomarkers amyloid beta 42 (Aβ42) and phosphorylated tau-181 (pTau, both measured in CSF) to determine amyloid/tau classification27 for each sample with both measurements. Details of the classification per cohort are in the Supplementary Notes.

Amyloid- and tau-positive (A+T+, n=798) samples were treated as cases and amyloid- and tau-negative (A−T−, n=945) samples were treated as controls. We stratified our samples by biomarker status and used models consistent with the joint analysis to test A+T+ and A−T− samples separately for association of the index SNPs with their corresponding aptamer levels. We then correlated effect sizes between the disease-stratified and joint analyses. Correlation was calculated using the Pearson method.

Conditional Analysis

We performed conditional analysis using PLINK2 (ref. 98). Using the same covariates as the full model, we conditioned on each index variant using the --condition-list flag and determined if any variant was still significant in the conditional model. If so, we added that variant to the --condition-list file to run again. We continued this iteration until no variants reached the study-wide threshold (5×10−8 for cis, 3.45×10−11 for trans).

We further performed conditional analysis for all aptamers with an association in the APOE pleiotropic region (chr19:44888997-chr19:44919689). We conditioned on rs7412 and rs429358 separately using the --condition-list flag. We then conditioned on both variants. We further performed association analysis including only APOE ε33 carriers using the same covariates as the full model.

Variant Annotation

We utilized the Variant Effect Predictor (VEP)28 to annotate the most severe consequence for each pQTL. For each association, we identified all variants in high LD (R2>0.8, calculated with PLINK1.9 (ref. 98) using only EUR samples) with the index pQTL in that association. We then used those as input to VEP and obtained all annotation information for each variant. We then grouped each set of correlated variants and determined the most severe annotation for the association based on VEP’s order of severity.

Enrichment of annotation categories

Annotation enrichment was performed by matching the annotations of the index variants to an equal number of randomly-selected variants, matched for allele frequency and gene distance. Details are in the Supplementary Notes.

Identification of Drug Targets

We queried the DrugBank84 database (go.drugbank.com) using their advanced search feature, where we used as input every UniProtID corresponding to a protein measured on the SOMAscan 7k panel (Supp. Table S1). For output, we selected the “Target uniprot”, “DrugBank ID”, “Name”, “Biotech”, “Approved”, “Experimental”, “Investigational”, “Nutraceutical”, “Target name”, and “Target gene name” descriptors. We then matched each of these to the SomaID.

External Replication (Olink)

To assay the replicability of our pQTL associations across proteomics measurement approaches, we obtained pQTL statistics for matching proteins measured using the Olink platform from the Amsterdam Dementia Cohort (ADC).

Amsterdam Dementia Cohort (ADC)

CSF (ref. 100) and genetic (refs. 101,102) samples (n=502) were selected from the ADC. The ADC (2000-present) is an ongoing, observational follow-up study of patients who visit the memory clinic of the Alzheimer Center VUmc. In total, 502 samples with both proteomic and genomic data were included. These consisted of 146 controls, 37 patients with mild cognitive impairment (MCI), 176 with AD, 50 with dementia with Lewy bodies (LBD), and 93 with frontotemporal dementia (FTD). Details of the cohort and quality control are in the Supplementary Notes.

Statistical Analysis

Associations between genetic variants and CSF protein levels were tested using a linear regression model in PLINK2 (ref. 98). Analyses were corrected for age, sex, and population structure (gPC 1-4). Proteins were then matched across platforms using their UniProt ID (ref. 103) in order to compare association effect sizes.

Tissue-specificity of pQTL Associations

We compared the pQTL associations from our joint analysis to those identified in a large aptamer-based plasma pQTL study11 and to plasma and brain pQTL10. We performed colocalization using coloc.abf (ref. 48), including all variants within 1MB of each index CSF pQTL variant that were present in each study, ensuring that the aptamer ID was consistent between each dataset analyzed. We considered an association to be shared between tissues if the posterior probability of H4 (shared association) was greater than or equal to 0.80. We lifted over the plasma and brain summary statistics published by Yang et al. to GRCh38 coordinates using the UCSC Genome Browser’s LiftOver tool104. Due to limited aptamer overlap, we performed colocalization for 1,821 associations with Ferkingstad et al.11, 428 with in-house plasma10, and 520 with in-house brain10 (out of 2,477). We determined if trans associations were more likely to be CSF-specific than cis using the prop.test() function in R (ref. 105).

We also analyzed the LD overlap between the CSF index variants and those observed in the large plasma pQTL analysis. To do this, we first determined whether any variant in a 1MB region surrounding the index CSF variant reached the study-wide threshold for significance (P<1.8×10−9) in Ferkingstad et al. We then selected the most-significant variant in plasma that not only passed the significance threshold but was also present in our dataset. We tested for LD between the CSF index variant and the plasma variant using PLINK 1.9’s --r2 flag, using only the genetic information from the individuals determined to be of European descent in our dataset.

Molecule-specificity of pQTL associations

We analyzed whole blood eQTL summary statistics from eQTLGen (ref. 5) and GTEx (ref. 4), cortex eQTL from GTEx, microglia-specific eQTL29 and cortex, hippocampus, basal ganglia, cerebellum, and spinal cord eQTL from MetaBrain (ref. 6). The summary statistics from eQTLGen were supplied using GRCh37, so we lifted them to GRCh38 coordinates using LiftOver104. For the eQTL datasets, if Ensembl transcript ID was supplied, we matched using only the Ensembl gene ID and ignored the transcript info. Ensembl ID was matched to the Entrez Gene Symbol using the biomaRt R package106, then the Entrez Gene Symbols for each protein aptamer were used to match summary statistics between pQTL and eQTL datasets. Because most of these datasets were limited to cis-acting variants only, we performed colocalization using cis pQTL only. For each pQTL, we kept all SNPs present in both pQTL and eQTL datasets that were within 1MB of the index pQTL variant’s location. We utilized the coloc.abf function (ref. 48) to perform colocalization. We determined shared associations between pQTL and eQTL to be those with PP.H4 ≥ 0.80.

Heritability mediated through QTLs

To analyze the relevance of CSF pQTLs to AD, we utilized MESC107 to estimate the fraction of AD heritability mediated by our results and by cortex eQTLs derived from MetaBrain (a large brain eQTL dataset)6. Details of this analysis are in the Supplementary Notes.

Identification and Analysis of Pleiotropic Regions

Pleiotropy was defined based on LD. For each index pQTL variant identified, we used PLINK1.9 (ref. 98) to determine all other index pQTL variants in LD (R2 ≥ 0.5 using only EUR samples). We grouped all sets of variants in LD with each other and defined the region as all base pairs between the two farthest-apart index variants of each LD group. We then grouped all aptamers into the appropriate region and determined the number of aptamers and number of corresponding EntrezGeneSymbols with index variants in each pQTL region. Because of the complexity of the human leukocyte antigen (HLA) region located on chromosome six, we manually curated this region to encompass all index variants either from the start of the HIST1H2AA gene to the end of the RPL12P1 gene or in LD (R2 ≥ 0.5) with either of those variants, producing a pQTL region spanning from chr6:25693046 to chr6:33304176 (ref. 108).
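One way to read the grouping step is as finding connected components of index variants linked by LD, then taking the span between the outermost variants of each component. The sketch below makes that assumption explicit (transitive grouping via union-find); the variant IDs, positions, and LD pairs are hypothetical, and the manual HLA curation is not modeled.

```python
def pleiotropic_regions(index_variants, ld_pairs):
    """Group index variants linked by LD and return one region per group.

    index_variants: dict of variant_id -> (chrom, position).
    ld_pairs: iterable of (variant_a, variant_b) pairs with R2 >= 0.5.
    Grouping is transitive (an assumption of this sketch): if A-B and
    B-C are in LD, all three fall in one region.
    """
    parent = {v: v for v in index_variants}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in ld_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for v in index_variants:
        groups.setdefault(find(v), []).append(v)

    regions = []
    for members in groups.values():
        chrom = index_variants[members[0]][0]
        positions = [index_variants[m][1] for m in members]
        regions.append((chrom, min(positions), max(positions), sorted(members)))
    return sorted(regions)


# Hypothetical index variants and LD pairs.
variants = {
    "rs1": ("chr19", 100),
    "rs2": ("chr19", 250),
    "rs3": ("chr19", 400),
    "rs4": ("chr3", 900),
}
regions = pleiotropic_regions(variants, [("rs1", "rs2"), ("rs2", "rs3")])
```

Counting the aptamers whose index variants fall in each returned region then gives the per-region pleiotropy described in the text.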

For each of the three highly pleiotropic regions, we first determined the localization of the gene encoding each regulated protein. We matched each gene to a corresponding brain-relevant cell type (see Cell-Type Specificity Analysis). Using the circlize R package109, we obtained circos plots showing the associations for each region. Next, using the gene IDs as determined by SomaLogic, we performed pathway analysis for the proteins regulated by each pleiotropic region (see Pathway Enrichment Analysis). We then performed cell-type enrichment analysis (see below). For each of the three regions, we used the index pQTL variants identified in each and queried the GWAS Catalog30 to identify all genome-wide significant associations for those variants. We performed LD pruning using PLINK2 (ref. 98) using a window size of 500KB, step size of 50, and R2 cutoff of 0.5 or 0.8 to define independent LD blocks. We utilized Haploview110 to visualize the linkage disequilibrium structure of these pleiotropic regions.

We compared the number of aptamers associated with each region in CSF and plasma11. Using the index associations identified in plasma (n=18,084), we determined the number of associations in each of the three main CSF pleiotropic regions. Using the total number of index associations identified (2,477 in CSF and 18,084 in plasma), we performed a two-sided two-sample z-test for proportions using prop.test() in R to analyze the difference in the proportion of the total associations between tissues in each region.
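A two-sided two-sample z-test for proportions can be sketched as follows. The counts in the example are hypothetical, and the sketch omits the continuity correction that R's prop.test() applies by default, so its p-values will not match prop.test() exactly unless correct=FALSE is used there.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test comparing x1/n1 with x2/n2 using a pooled variance.

    Returns (z, p_value). No continuity correction is applied.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: associations in one region out of all index
# associations in each tissue (totals taken from the text).
z_region, p_region = two_proportion_z_test(100, 2477, 200, 18084)

# Small sanity check with round numbers.
z_small, p_small = two_proportion_z_test(60, 100, 40, 100)
```

Comparing proportions rather than raw counts accounts for the much larger number of index associations available in plasma.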

Because the APOE chr19 region is so intertwined with AD, we tested for a difference in effect size between AT- and AT+ individuals for each of the associations in this region using the following equation:

$$Z_{\mathrm{diff}} = \frac{\beta_{AT-} - \beta_{AT+}}{\sqrt{SE_{AT-}^{2} + SE_{AT+}^{2}}}$$

We then used a Z-statistic cutoff of ±1.960 (corresponding to P=0.05) to determine associations whose strength differed between AT- and AT+ samples.
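As a worked example of this test, the sketch below computes the Z statistic for the difference between two effect sizes and applies the ±1.960 cutoff. The function names and input values are hypothetical.

```python
import math


def z_diff(beta_neg, se_neg, beta_pos, se_pos):
    """Z statistic for the difference between two independent effect sizes
    (biomarker-negative minus biomarker-positive)."""
    return (beta_neg - beta_pos) / math.sqrt(se_neg ** 2 + se_pos ** 2)


def effect_differs(z, cutoff=1.960):
    """True when |Z| exceeds the two-sided P=0.05 cutoff."""
    return abs(z) > cutoff


# Hypothetical effect sizes and standard errors for one association.
z_example = z_diff(beta_neg=0.5, se_neg=0.03, beta_pos=0.3, se_pos=0.04)
z_null = z_diff(beta_neg=0.3, se_neg=0.1, beta_pos=0.3, se_pos=0.1)
```

Here the denominator is sqrt(0.0009 + 0.0016) = 0.05, so a difference of 0.2 in effect size gives a Z statistic of about 4, which would exceed the cutoff.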

Alzheimer’s Disease Proteome-Wide Association Study

To identify proteins potentially involved in the etiology of AD, we utilized a modified version of the FUSION framework45. We calculated weights for each protein based on each region with a significant association and used the weights to associate with AD risk GWAS23. Weights were also generated using control (A−T−) samples only to confirm consistency of the associations. Details about weight generation and association with AD are in the Supplementary Notes.

Alzheimer’s Disease Colocalization Analysis

Colocalization under a single causal variant assumption

To determine shared genetic etiology between aptamers and AD, we performed Bayesian colocalization analysis using coloc.abf from the coloc R package48. We used the default prior probabilities of P1=1×10−4, P2=1×10−4, and P12=1×10−5. For a genetic signal to be considered shared between aptamer level and AD risk23, we required that the posterior probability of H4 (PP.H4) be >0.8. We included all variants within 1MB of the pQTL index variant that were present in both the pQTL and AD risk analyses. For each of these, we harmonized the effect allele to ensure consistent direction of effect between the two analyses. Using PLINK1.9 (ref. 98), we calculated an LD matrix based on the 3,506 samples analyzed. While the matrix is not necessary for this method, for consistency with coloc-SuSiE we used it to filter all variants missing LD info before running colocalization to ensure the same data was used for single- and multi-variant colocalization. Because both the pQTL analyses and AD GWAS report the standard error of the effect size for each variant, we squared the standard error for each variant in each analysis to obtain the variance as required for coloc. For the AD GWAS, the “type” was set to “cc”. Because z-score protein levels were used, “sdY” was set to 1 for the pQTL dataset.
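To illustrate how effect sizes and variances feed into the posterior probabilities, the sketch below reimplements the general approximate-Bayes-factor colocalization calculation in plain Python. It is not the coloc package's code. The prior standard deviations (0.15 for the quantitative trait with sdY=1, 0.2 for the case-control trait) are assumptions of this sketch, and all input values are hypothetical.

```python
import math


def log_abf(beta, se, prior_sd):
    """Wakefield approximate Bayes factor (log scale) for one variant."""
    v = se ** 2           # variance of the estimate (squared standard error)
    w = prior_sd ** 2     # prior variance of the true effect
    r = w / (w + v)
    z = beta / se
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] under the
    single-causal-variant assumption."""
    l1 = log_sum_exp(labf1)
    l2 = log_sum_exp(labf2)
    l12 = log_sum_exp([a + b for a, b in zip(labf1, labf2)])
    # H3 sums over pairs of *different* variants: sum1 * sum2 - sum12.
    both = l1 + l2
    ratio = math.exp(l12 - both)
    if ratio < 1.0:
        l_h3 = math.log(p1) + math.log(p2) + both + math.log1p(-ratio)
    else:
        l_h3 = -math.inf
    log_h = [0.0, math.log(p1) + l1, math.log(p2) + l2, l_h3, math.log(p12) + l12]
    norm = log_sum_exp(log_h)
    return [math.exp(h - norm) for h in log_h]


# Hypothetical region of 20 variants with one strong signal per trait.
n = 20
beta_pqtl = [0.0] * n
beta_pqtl[5] = 0.16
labf_pqtl = [log_abf(b, 0.02, prior_sd=0.15) for b in beta_pqtl]

beta_ad_shared = [0.0] * n
beta_ad_shared[5] = 0.08      # same variant as the pQTL signal
labf_ad_shared = [log_abf(b, 0.01, prior_sd=0.2) for b in beta_ad_shared]

beta_ad_distinct = [0.0] * n
beta_ad_distinct[12] = 0.08   # different variant from the pQTL signal
labf_ad_distinct = [log_abf(b, 0.01, prior_sd=0.2) for b in beta_ad_distinct]

pp_shared = coloc_posteriors(labf_pqtl, labf_ad_shared)
pp_distinct = coloc_posteriors(labf_pqtl, labf_ad_distinct)
```

In this toy setting, the last element of pp_shared (PP.H4) dominates when both traits peak at the same variant, while the fourth element of pp_distinct (PP.H3) dominates when the peaks fall on different variants. Real loci with LD between variants behave less cleanly.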

Colocalization with no single causal variant assumption

To relax the single causal variant assumption inherent in coloc48, we also performed colocalization using coloc-SuSiE, which accounts for LD and can compare multiple independently significant variants at each locus49. This approach first utilizes SuSiE111, which uses an iterative Bayesian stepwise selection approach to identify independent credible sets of variants, each of which contains a variant predicted to be causal. Colocalization is then run on each pairing of credible sets. We used the same variants as in the single-variant analysis above. The LD matrix and the effect size and variance for each variant were supplied to the runsusie function in the coloc R package, which calculated the credible sets for each trait. This information was then passed to the coloc.susie function in the coloc R package to calculate colocalization. A threshold of PP.H4>0.8 was used to identify shared causal variants.

Alzheimer’s Disease Mendelian Randomization

In order to estimate causality of the proteins measured by the aptamers in this analysis, we implemented MR using the R package TwoSampleMR47. We performed clumping on all variants that reached genome-wide significance (P<5×10−8) in cis for each aptamer, using the predefined default clumping thresholds. We extracted the instrument variants then harmonized the two datasets to ensure matching effect alleles. To account for pleiotropy in the analysis, we first removed all variants for each aptamer including any variant in a pleiotropic region (as identified above) that regulated two or more proteins – so only variants associated with a single protein were included. We then performed MR analysis to determine causality of all aptamers with a pQTL for AD risk. For all analyses with one instrumental variable, we utilized the Wald Ratio p-value. For all analyses with two or more instrumental variables, we utilized the inverse variance weighted (IVW) p-value. Results from other methods are summarized in Supp. Table S41. We performed multiple test correction using Benjamini-Hochberg112 false discovery rate, with an adjusted p-value threshold of 0.05. We analyzed directionality using MR-Steiger52. We then determined the aptamers that overlapped between at least two of PWAS, COLOC, and MR and performed pathway and cell-type analyses as described below.
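The two estimators named above, followed by Benjamini-Hochberg adjustment, can be sketched in a few lines. This is a simplified illustration, not the TwoSampleMR implementation: the Wald ratio standard error uses a first-order approximation that ignores uncertainty in the exposure estimate, the IVW estimate is the fixed-effect form, and all numbers are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument MR estimate and first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate across instruments."""
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return estimate, se


def two_sided_p(estimate, se):
    return math.erfc(abs(estimate / se) / math.sqrt(2.0))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical instruments (pQTL effect on protein, effect on AD risk).
wald_est, wald_se = wald_ratio(0.5, 0.1, 0.02)
ivw_est, ivw_se = ivw([0.5, 0.25], [0.1, 0.05], [0.02, 0.02])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

With a single instrument the IVW formula reduces to the Wald ratio, which is why the choice of estimator in the text depends only on the number of instruments available for each aptamer.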

Proteomic Risk Scores

Estimating AD-associated protein levels based on clinical status

Proteomic risk score analysis91 was performed to predict clinical status of the study participants. We restricted our samples to clinical AD controls (excluding DIAN due to early-onset and autosomal-dominant AD) and clinically-defined cases. To calculate the values used for the prediction model, we selected AD-associated aptamers from the PWAS analysis (456 aptamers). If the PWAS-reported effect size for an aptamer was negative (i.e., higher levels of the aptamer associated with decreased risk of AD), the sign of that aptamer's z-score level was flipped, so negative values became positive and vice versa. This ensured higher aptamer levels were always associated with increased AD risk according to PWAS. For each sample, the sum of the z-score levels for the 456 aptamers was calculated. Because samples differ in their levels of missingness, the sum was then divided by the number of non-missing aptamer levels out of 456.
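The score calculation described above (sign alignment, summation, and division by the number of non-missing aptamers) can be sketched in Python as follows. The function name, the dictionary-based inputs, and the aptamer labels are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def proteomic_risk_score(z_levels, pwas_effects):
    """Mean sign-aligned z-score over the non-missing aptamers of one sample.

    z_levels: aptamer -> z-score level (None or NaN when missing).
    pwas_effects: aptamer -> PWAS effect size for the selected aptamers.
    """
    total = 0.0
    n_observed = 0
    for aptamer, effect in pwas_effects.items():
        z = z_levels.get(aptamer)
        if z is None or math.isnan(z):
            continue  # missing aptamers do not contribute to the sum or the count
        # Flip the sign when higher levels are associated with lower AD risk.
        total += -z if effect < 0 else z
        n_observed += 1
    return total / n_observed if n_observed else float("nan")

# Hypothetical example with three aptamers, one of them missing.
effects = {"apt_A": 0.3, "apt_B": -0.2, "apt_C": 0.1}
sample = {"apt_A": 1.0, "apt_B": -2.0, "apt_C": None}
score = proteomic_risk_score(sample, effects)
```

Dividing by the observed count rather than by 456 keeps scores comparable between samples with different amounts of missing data.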

We first determined the difference in proteomic risk score (average z-score across all 456 aptamers per participant) distribution between clinically-defined cases (n=1,019) and controls (n=1,167) using the Wilcoxon rank-sum test as implemented in the compare_means() function in the ggpubr R library.

LASSO regression113 was used to select significant predictors out of the 456 aptamers. We also generated PRS values for each sample and compared the methods. Details of the model generation are in the Supplementary Notes.

Pathway Enrichment Analysis

We performed pathway enrichment analysis in R for multiple groups of aptamers using enrichGO, enrichDGN, enrichKEGG, Reactome, and enrichDO. EnrichGO and enrichKEGG were implemented using the clusterProfiler package114. EnrichDGN and enrichDO were implemented using the DOSE package115. Reactome was implemented using the ReactomePA package116. For each set of aptamers, we used the SomaLogic-annotated EntrezGeneSymbol. For every method, we supplied the list of selected Entrez gene symbols as input (IDs instead of symbols for enrichDGN) and set the universe to all unique genes covered by the SOMAscan7k panel (n=6,113). For all enrichment analyses, an FDR-corrected p-value threshold of 0.05 was used.

Cell-type Specificity Analysis

We downloaded gene expression data from human astrocytes, neurons, oligodendrocytes, microglia/macrophages, and endothelial cells34 to determine the degree of specificity to relevant cell types for the aptamers included in the SOMAscan7k panel. The data contained multiple subtypes of astrocytes; we focused on human mature astrocytes. For each cell type, we determined the average expression level for each gene. We then summed the averages across the five cell types to obtain a total expression level for that gene. We calculated the percentage of the total expression that each cell type contributed. A gene was considered cell-type specific if the percentage of its total expression contributed by the top cell type was 1.5-fold higher than that of the second-ranked cell type.
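The specificity rule can be written out as a short Python sketch. The function name and example values are hypothetical, and the sketch assumes that "1.5x higher" means the top cell type's share is at least 1.5 times the share of the second-ranked cell type.

```python
def cell_type_specificity(mean_expr, fold=1.5):
    """Return (specific_cell_type_or_None, fraction_of_total_by_cell_type).

    mean_expr: cell type -> average expression of one gene in that cell type.
    """
    total = sum(mean_expr.values())
    if total <= 0 or len(mean_expr) < 2:
        return None, {}
    fractions = {ct: value / total for ct, value in mean_expr.items()}
    ranked = sorted(fractions.items(), key=lambda kv: kv[1], reverse=True)
    (top_ct, top_frac), (_, second_frac) = ranked[0], ranked[1]
    # Assumption: the threshold is inclusive (>=) and applied as a ratio.
    specific = top_ct if top_frac >= fold * second_frac else None
    return specific, fractions

# Hypothetical gene: neurons contribute 60% of total expression, astrocytes 20%.
gene = {"neuron": 60.0, "astrocyte": 20.0, "oligodendrocyte": 10.0,
        "microglia": 5.0, "endothelial": 5.0}
label, shares = cell_type_specificity(gene)
```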

To determine enrichment, we matched each of the proteins in the SOMAscan7k platform to its EntrezGeneSymbol using the SomaLogic-provided documentation. We determined the cell-type specificity for each and counted the number of genes that were specific to each cell type. In total, 5,750 proteins had genes that were included in the cell-type expression data. For each protein subset, we determined the number of corresponding genes that were specific to each cell type. We then tested for enrichment using the hypergeometric test.
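A one-sided hypergeometric enrichment test of this kind can be sketched in Python using exact integer arithmetic. The function and argument names are hypothetical, and the counts in the usage example (other than the 5,750-protein background) are invented for illustration.

```python
from math import comb

def hypergeom_enrichment_p(k, background, n_specific, subset_size):
    """Upper-tail probability P(X >= k) under the hypergeometric distribution.

    k: genes in the subset that are specific to the cell type.
    background: genes in the background set.
    n_specific: background genes specific to the cell type.
    subset_size: genes in the tested protein subset.
    """
    upper = min(n_specific, subset_size)
    denominator = comb(background, subset_size)
    numerator = sum(
        comb(n_specific, i) * comb(background - n_specific, subset_size - i)
        for i in range(k, upper + 1)
    )
    return numerator / denominator

# Hypothetical counts: 40 of 200 subset genes are specific to a cell type,
# against 600 specific genes among a background of 5,750.
p_value = hypergeom_enrichment_p(40, 5750, 600, 200)
```

Summing exact binomial coefficients avoids floating-point underflow in the tail; for large screens a statistics library would normally be used instead.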

Supplementary Material

Supp Results
Supp tables

Acknowledgements

We thank all the participants and their families, as well as the many involved institutions and their staff.

We thank Dr. Chloe Robins for her assistance in reviewing the manuscript.

This work was supported by grants from the National Institutes of Health (R01AG044546 (CC), P01AG003991 (CC, JCM), RF1AG053303 (CC), RF1AG058501 (CC), U01AG058922 (CC), RF1AG074007 (YJS), R00AG062723 (LI), P30AG066515 (TWC, MDG)), the Chan Zuckerberg Initiative (CZI), the Michael J. Fox Foundation (LI, CC), the Department of Defense (LI, W81XWH2010849), the Alzheimer’s Association Zenith Fellows Award (ZEN-22-848604, awarded to CC), and the BrightFocus Foundation (A2021033S, LI).

GSK provided funding to support the analyses performed in this study.

The recruitment and clinical characterization of research participants at Washington University were supported by NIH P30AG066444 (JCM), P01AG03991 (JCM), and P01AG026276 (JCM).

This work was supported by access to equipment made possible by the Hope Center for Neurological Disorders, the NeuroGenomics and Informatics Center (NGI: https://neurogenomics.wustl.edu/) and the Departments of Neurology and Psychiatry at Washington University School of Medicine.

DIAN resources:

Data collection and sharing for this project was supported by The Dominantly Inherited Alzheimer Network (DIAN, U19AG032438), funded by the National Institute on Aging (NIA) and the Alzheimer’s Association (SG-20-690363-DIAN).

ADNI acknowledgement:

Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012).

ADC Olink proteomic data is part of the neurodegeneration research program of Amsterdam Neuroscience and was supported by: Alzheimer Nederland (WE.03-2018-05, MC and CT) and Selfridges Group Foundation (NR170065, MC and CT).

Competing interests

CC has received research support from GSK and Eisai. The funders of the study had no role in the collection, analysis, or interpretation of data; in the writing of the report; or in the decision to submit the paper for publication. CC is a member of the advisory board of Circular Genomics and owns stock in this company. DP is an employee of GlaxoSmithKline (GSK) and holds stock in GSK. MC has been an invited speaker at Eisai. MC is an associate editor at Alzheimer Research and Therapy. BMT and PJV are inventors on a patent (#WO2020197399A1; owned by Stichting VUmc). CET has a collaboration contract with ADx Neurosciences, Quanterix and Eli Lilly, and has performed contract research or received grants from AC-Immune, Axon Neurosciences, Bioconnect, Bioorchestra, Brainstorm Therapeutics, Celgene, EIP Pharma, Eisai, Grifols, Novo Nordisk, PeopleBio, Roche, Toyama, Vivoryon. She serves on editorial boards of Medidact Neurologie/Springer, Alzheimer Research and Therapy, Neurology: Neuroimmunology & Neuroinflammation, and is editor of a Neuromethods book (Springer). She had speaker contracts for Roche, Grifols, and Novo Nordisk. The rest of the authors declare no competing interests.

Consortia

Dominantly Inherited Alzheimer’s Network

John C. Morris5, Richard J. Perrin5,6,7, & Laura Ibanez1,2,5

A full list of members and their affiliations appears in the Supplementary Material.

Alzheimer’s Disease Neuroimaging Initiative

Allan I. Levey4, John C. Morris5, & Richard J. Perrin5,6,7

A full list of members and their affiliations appears in the Supplementary Material.

Footnotes

Code Availability

We used publicly-available software for all analyses in this manuscript. These included (for the main conclusions of the manuscript) PLINK1.9 v1.90b6.26 & PLINK2 v2.00a3.6LM for pQTL identification and secondary analyses, METAL (https://csg.sph.umich.edu/abecasis/metal/download/, most recent version (2011-03-25)) for cohort-stratified meta-analysis, FUSION (http://gusevlab.org/projects/fusion/, May 24, 2022 version) to perform proteome-wide association studies, the TwoSampleMR R package (v0.5.8) to perform Mendelian Randomization, and the coloc R package (v5.2.2) to perform colocalization. The prediction models were implemented using the glmnet (v4.1-8) and pROC (v1.18.2) R packages. Pathway analysis was performed using the clusterProfiler (v4.8.2), org.Hs.eg.db (v3.17.0), DOSE (v3.26.1), and ReactomePA (v1.44.0) R packages. Variant annotation was performed using VEP v109. We used LDSC v2.0.0 to format summary statistics. Figures were generated using ggplot2 (v3.5.1), cowplot (v1.1.3), ggbreak (v0.1.2), ggpubr (v0.6.0), qqman (v0.1.9), Biobase (v2.64.0), grid (v4.4.0), gdata (v3.0.0), ggnewscale (v0.4.10), enrichplot (v1.24.0), circlize (v0.4.16), ggrepel (v0.9.5), scales (v1.3.0), PerformanceAnalytics (v2.0.4), hudson (v1.0.0), ComplexHeatmap (v2.20.0), and ComplexUpset (v1.3.3). Matching of IDs across platforms was assisted with biomaRt (v2.56.0). Haploview plots were generated using Haploview v4.1. Heritability comparisons were performed using MESC (version updated Jun 24, 2022). Local GWAS plots were produced using locuszoom (v1.4).

Data availability

Full GWAS summary statistics from the joint analysis (3,506 samples, 7,008 aptamers) have been deposited in NIAGADS at https://www.niagads.org/knight-adrc-collection under accession #NG00130 and are available to approved investigators. Full summary statistics are also publicly available through the NeuroGenomics & Informatics Center website (https://neurogenomics.wustl.edu/open-science/raw-data/), through a PheWeb browser developed by the Cruchaga lab (https://ontime.wustl.edu/), and through the GWAS catalog (https://www.ebi.ac.uk/gwas/) under accession IDs GCST90421033-GCST90428040 (see Supplementary Data 1 for details). The PWAS weights (all samples and controls-only) used in this study are available for public download through the NeuroGenomics & Informatics Center website (https://neurogenomics.wustl.edu/open-science/raw-data/). The proteomics data from the Knight-ADRC cohorts (Oct. 2021 & June 2023) have also been deposited in NIAGADS under the same ID.

The proteomics and individual-level genetic data obtained from the ADNI cohort can be requested through ADNI’s website (adni.loni.usc.edu) after access has been approved.

The proteomics and individual-level genetic data obtained from the Knight-ADRC can be requested through the NIAGADS website (NG00130 for proteomics data, NG00127 for genetic data).

The proteomics and individual-level genetic data obtained from the PPMI cohort can be accessed by approved investigators through the PPMI website (https://www.ppmi-info.org/access-data-specimens/download-data).

The proteomics data from the Stanford ADRC can be accessed by approved investigators through the Stanford ADRC approval process (https://med.stanford.edu/adrc/researcher-resources.html)117.

The individual-level data from DIAN cannot be publicly shared, because the rarity of this disease makes all the data identifiable. The IRB and NIH have confirmed that these data cannot be placed in public repositories. However, the data are available to approved investigators through https://dian.wustl.edu/our-research/for-investigators/diantu-investigator-resources/dian-tu-biospecimen-request-form/.

The FACE and Barcelona-1 individual-level data cannot be shared due to the European Union’s general data protection regulation (GDPR).

The participants from MARS have not given consent to share individual-level data.

The proteomics data from ADC is available at https://www.synapse.org/PRIDE_AD.

References

  • 1. Yengo L et al. A saturated map of common genetic variants associated with human height. Nature 610, 704–712 (2022).
  • 2. Fernandez-Rozadilla C et al. Deciphering colorectal cancer genetics through multi-omic analysis of 100,204 cases and 154,587 controls of European and east Asian ancestries. Nature Genetics (2022).
  • 3. Tcheandjieu C et al. Large-scale genome-wide association study of coronary artery disease in genetically diverse populations. Nature Medicine 28, 1679–1692 (2022).
  • 4. Consortium TG The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 1318–1330 (2020).
  • 5. Võsa U et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nature Genetics 53, 1300–1310 (2021).
  • 6. de Klein N et al. Brain expression quantitative trait locus and network analyses reveal downstream effects and putative drivers for brain-related diseases. Nature Genetics 55, 377–388 (2023).
  • 7. Deming Y et al. The MS4A gene cluster is a key modulator of soluble TREM2 and Alzheimer’s disease risk. Science Translational Medicine 14, eaau2291 (2019).
  • 8. Vogel C & Marcotte EM Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nature Reviews Genetics 13, 227–232 (2012).
  • 9. Johnson ECB et al. Large-scale deep multi-layer analysis of Alzheimer’s disease brain reveals strong proteomic disease-related changes not observed at the RNA level. Nature Neuroscience 25, 213–225 (2022).
  • 10. Yang C et al. Genomic atlas of the proteome from brain, CSF and plasma prioritizes proteins implicated in neurological disorders. Nature Neuroscience 24, 1302–1312 (2021).
  • 11. Ferkingstad E et al. Large-scale integration of the plasma proteome with genetics and disease. Nature Genetics 53, 1712–1721 (2021).
  • 12. Pietzner M et al. Mapping the proteo-genomic convergence of human diseases. Science 374 (2021).
  • 13. Sun BB et al. Genomic atlas of the human plasma proteome. Nature 558, 73–79 (2018).
  • 14. Katz DH et al. Whole Genome Sequence Analysis of the Plasma Proteome in Black Adults Provides Novel Insights Into Cardiovascular Disease. Circulation 145, 357–370 (2022).
  • 15. Sun BB et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622, 329–338 (2023).
  • 16. Sasayama D et al. Genome-wide quantitative trait loci mapping of the human cerebrospinal fluid proteome. Human Molecular Genetics 26, 44–51 (2017).
  • 17. Hansson O et al. The genetic regulation of protein expression in cerebrospinal fluid. EMBO Molecular Medicine 15 (2023).
  • 18. Kaiser S et al. A proteogenomic view of Parkinson’s disease causality and heterogeneity. npj Parkinson’s Disease 9 (2023).
  • 19. Kauwe JSK et al. Genome-Wide Association Study of CSF Levels of 59 Alzheimer’s Disease Candidate Proteins: Significant Associations with Proteins Involved in Amyloid Processing and Inflammation. PLoS Genetics 10, e1004758 (2014).
  • 20. Wang L et al. Proteo-genomics of soluble TREM2 in cerebrospinal fluid provides novel insights and identifies novel modulators for Alzheimer’s disease. Molecular Neurodegeneration 19, 1 (2024).
  • 21. Phillips B et al. Proteome wide association studies of LRRK2 variants identify novel causal and druggable proteins for Parkinson’s disease. npj Parkinson’s Disease 9, 107 (2023).
  • 22. Karlsson IK et al. Measuring heritable contributions to Alzheimer’s disease: polygenic risk score analysis with twins. Brain Communications 4, fcab308 (2022).
  • 23. Bellenguez C et al. New insights into the genetic etiology of Alzheimer’s disease and related dementias. Nature Genetics 54, 412–436 (2022).
  • 24. Gold L et al. Aptamer-based multiplexed proteomic technology for biomarker discovery. PLoS One 5, e15004 (2010).
  • 25. SOMAscan® v4.1 Data Standardization and File Specification Technical Note. (2021).
  • 26. Dammer EB et al. Multi-platform proteomic analysis of Alzheimer’s disease cerebrospinal fluid and plasma reveals network biomarkers associated with proteostasis and the matrisome. Alzheimer’s Research & Therapy 14 (2022).
  • 27. Jack CR et al. A/T/N: An unbiased descriptive classification scheme for Alzheimer disease biomarkers. Neurology 87, 539–547 (2016).
  • 28. McLaren W et al. The Ensembl Variant Effect Predictor. Genome Biology 17, 122 (2016).
  • 29. Lopes KDP et al. Genetic analysis of the human microglial transcriptome across brain regions, aging and disease pathologies. Nature Genetics 54, 4–17 (2022).
  • 30. Sollis E et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Research 51, D977–D985 (2023).
  • 31. Bennett MK, Calakos N & Scheller RH Syntaxin: A Synaptic Protein Implicated in Docking of Synaptic Vesicles at Presynaptic Active Zones. Science 257, 255–259 (1992).
  • 32. Dines M & Lamprecht R The Role of Ephs and Ephrins in Memory Formation. International Journal of Neuropsychopharmacology 19, pyv106 (2016).
  • 33. Washburn HR, Chander P, Srikanth KD & Dalva MB Transsynaptic Signaling of Ephs in Synaptic Development, Plasticity, and Disease. Neuroscience 508, 137–152 (2023).
  • 34. Zhang Y et al. Purification and Characterization of Progenitor and Mature Human Astrocytes Reveals Transcriptional and Functional Differences with Mouse. Neuron 89, 37–53 (2016).
  • 35. Jansen IE et al. Genome-wide meta-analysis for Alzheimer’s disease cerebrospinal fluid biomarkers. Acta Neuropathologica 144, 821–842 (2022).
  • 36. Hammond TR, Marsh SE & Stevens B Immune Signaling in Neurodegeneration. Immunity 50, 955–974 (2019).
  • 37. Bader JM et al. Proteome profiling in cerebrospinal fluid reveals novel biomarkers of Alzheimer’s disease. Molecular Systems Biology 16 (2020).
  • 38. Gaetani L et al. Neurofilament light chain as a biomarker in neurological disorders. Journal of Neurology, Neurosurgery & Psychiatry 90, 870 (2019).
  • 39. Cruchaga C et al. SNPs Associated with Cerebrospinal Fluid Phospho-Tau Levels Influence Rate of Decline in Alzheimer’s Disease. PLoS Genetics 6, e1001101 (2010).
  • 40. Zhao J et al. APOE4 exacerbates synapse loss and neurodegeneration in Alzheimer’s disease patient iPSC-derived cerebral organoids. Nature Communications 11 (2020).
  • 41. Wang C et al. Gain of toxic apolipoprotein E4 effects in human iPSC-derived neurons is ameliorated by a small-molecule structure corrector. Nature Medicine 24, 647–657 (2018).
  • 42. Grainger DJ, Reckless J & McKilligin E Apolipoprotein E Modulates Clearance of Apoptotic Bodies In Vitro and In Vivo, Resulting in a Systemic Proinflammatory State in Apolipoprotein E-Deficient Mice. The Journal of Immunology 173, 6366–6375 (2004).
  • 43. Liu C-C, Kanekiyo T, Xu H & Bu G Apolipoprotein E and Alzheimer disease: risk, mechanisms and therapy. Nature Reviews Neurology 9, 106–118 (2013).
  • 44. Lumsden AL, Mulugeta A, Zhou A & Hyppönen E Apolipoprotein E (APOE) genotype-associated disease risks: a phenome-wide, registry-based, case-control study utilising the UK Biobank. eBioMedicine 59, 102954 (2020).
  • 45. Gusev A et al. Integrative approaches for large-scale transcriptome-wide association studies. Nature Genetics 48, 245–252 (2016).
  • 46. Wingo AP et al. Integrating human brain proteomes with genome-wide association data implicates new proteins in Alzheimer’s disease pathogenesis. Nature Genetics 53, 143–146 (2021).
  • 47. Hemani G et al. The MR-Base platform supports systematic causal inference across the human phenome. eLife 7 (2018).
  • 48. Giambartolomei C et al. Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics. PLoS Genetics 10, e1004383 (2014).
  • 49. Wallace C A more accurate method for colocalisation analysis allowing for multiple causal variants. PLOS Genetics 17, e1009440 (2021).
  • 50. Yang C et al. Mendelian randomization and genetic colocalization infer the effects of the multi-tissue proteome on 211 complex disease-related phenotypes. Genome Medicine 14, 140 (2022).
  • 51. Zheng J et al. Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases. Nature Genetics 52, 1122–1131 (2020).
  • 52. Hemani G, Tilling K & Davey Smith G Orienting the causal relationship between imprecisely measured traits using GWAS summary data. PLOS Genetics 13, e1007081 (2017).
  • 53. Fonseca MI et al. Analysis of the Putative Role of CR1 in Alzheimer’s Disease: Genetic Association, Expression and Function. PLOS ONE 11, e0149792 (2016).
  • 54. Kucukkilic E et al. Complement receptor 1 gene (CR1) intragenic duplication and risk of Alzheimer’s disease. Human Genetics 137, 305–314 (2018).
  • 55. Brouwers N et al. Alzheimer risk associated with a copy number variation in the complement receptor 1 increasing C3b/C4b binding sites. Molecular Psychiatry 17, 223–233 (2012).
  • 56. Vandendriessche S, Cambier S, Proost P & Marques PE Complement Receptors and Their Role in Leukocyte Recruitment and Phagocytosis. Frontiers in Cell and Developmental Biology 9 (2021).
  • 57. Heneka MT et al. Neuroinflammation in Alzheimer’s disease. The Lancet Neurology 14, 388–405 (2015).
  • 58. Rathore N et al. Paired Immunoglobulin-like Type 2 Receptor Alpha G78R variant alters ligand binding and confers protection to Alzheimer’s disease. PLOS Genetics 14, e1007427 (2018).
  • 59. Schwartzentruber J et al. Genome-wide meta-analysis, fine-mapping and integrative prioritization implicate new Alzheimer’s disease risk genes. Nature Genetics 53, 392–402 (2021).
  • 60. Haaker J et al. Higher anxiety and larger amygdala volumes in carriers of a TMEM132D risk variant for panic disorder. Translational Psychiatry 4, e357–e357 (2014).
  • 61. Lane JM et al. Genome-wide association analyses of sleep disturbance traits identify new loci and highlight shared genetics with neuropsychiatric and metabolic traits. Nature Genetics 49, 274–281 (2017).
  • 62. Polla DL et al. Phenotypic spectrum associated with a CRADD founder variant underlying frontotemporal predominant pachygyria in the Finnish population. European Journal of Human Genetics 27, 1235–1243 (2019).
  • 63. Jabado O et al. RAIDD aggregation facilitates apoptotic death of PC12 cells and sympathetic neurons. Cell Death & Differentiation 11, 618–630 (2004).
  • 64. Kögel D, Deller T & Behl C Roles of amyloid precursor protein family members in neuroprotection, stress signaling and aging. Experimental Brain Research 217, 471–479 (2012).
  • 65. Postina R et al. A disintegrin-metalloproteinase prevents amyloid plaque formation and hippocampal defects in an Alzheimer disease mouse model. Journal of Clinical Investigation 113, 1456–1464 (2004).
  • 66. Schmitt U, Hiemke C, Fahrenholz F & Schroeder A Over-expression of two different forms of the α-secretase ADAM10 affects learning and memory in mice. Behavioural Brain Research 175, 278–284 (2006).
  • 67. Jones ME et al. A genetic variant of the Wnt receptor LRP6 accelerates synapse degeneration during aging and in Alzheimer’s disease. Science Advances 9, eabo7421 (2023).
  • 68. Deczkowska A et al. Disease-Associated Microglia: A Universal Immune Sensor of Neurodegeneration. Cell 173, 1073–1081 (2018).
  • 69. Konishi H & Kiyama H Microglial TREM2/DAP12 Signaling: A Double-Edged Sword in Neural Diseases. Frontiers in Cellular Neuroscience 12 (2018).
  • 70. Fournier N et al. FDF03, a Novel Inhibitory Receptor of the Immunoglobulin Superfamily, Is Expressed by Human Dendritic and Myeloid Cells. The Journal of Immunology 165, 1197–1209 (2000).
  • 71. Paul SP, Taylor LS, Stansbury EK & McVicar DW Myeloid specific human CD33 is an inhibitory receptor with differential ITIM function in recruiting the phosphatases SHP-1 and SHP-2. Blood 96, 483–490 (2000).
  • 72. Zen K et al. Inflammation-induced proteolytic processing of the SIRPα cytoplasmic ITIM in neutrophils propagates a proinflammatory state. Nature Communications 4 (2013).
  • 73. Zhang JQ, Nicoll G, Jones C & Crocker PR Siglec-9, a Novel Sialic Acid Binding Member of the Immunoglobulin Superfamily Expressed Broadly on Human Blood Leukocytes. Journal of Biological Chemistry 275, 22121–22126 (2000).
  • 74. Linnartz B, Wang Y & Neumann H Microglial Immunoreceptor Tyrosine-Based Activation and Inhibition Motif Signaling in Neuroinflammation. International Journal of Alzheimer’s Disease 2010, 1–7 (2010).
  • 75. Li Y et al. Functional genomics identify causal variant underlying the protective CTSH locus for Alzheimer’s disease. Neuropsychopharmacology (2023).
  • 76. Brix K, Dunkhorst A, Mayer K & Jordans S Cysteine cathepsins: Cellular roadmap to different functions. Biochimie 90, 194–207 (2008).
  • 77. Haves-Zburof D et al. Cathepsins and their endogenous inhibitors cystatins: expression and modulation in multiple sclerosis. Journal of Cellular and Molecular Medicine 15, 2421–2429 (2011).
  • 78. Basak I, Hansen RA, Ward ME & Hughes SM Deficiency of the Lysosomal Protein CLN5 Alters Lysosomal Function and Movement. Biomolecules 11, 1412 (2021).
  • 79. Basak I et al. A lysosomal enigma CLN5 and its significance in understanding neuronal ceroid lipofuscinosis. Cellular and Molecular Life Sciences 78, 4735–4763 (2021).
  • 80. Paushter DH, Du H, Feng T & Hu F The lysosomal function of progranulin, a guardian against neurodegeneration. Acta Neuropathologica 136, 1–17 (2018).
  • 81. Feng T et al. Loss of TMEM106B and PGRN leads to severe lysosomal abnormalities and neurodegeneration in mice. EMBO reports 21, e50219 (2020).
  • 82. Van Deerlin VM et al. Common variants at 7p21 are associated with frontotemporal lobar degeneration with TDP-43 inclusions. Nature Genetics 42, 234–239 (2010).
  • 83. Baker M et al. Mutations in progranulin cause tau-negative frontotemporal dementia linked to chromosome 17. Nature 442, 916–919 (2006).
  • 84. Wishart DS et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Research 46, D1074–D1082 (2017).
  • 85. Akushevich I, Yashkin AP, Kravchenko J & Kertai MD Chemotherapy and the Risk of Alzheimer’s Disease in Colorectal Cancer Survivors: Evidence From the Medicare System. JCO Oncology Practice 17, e1649–e1659 (2021).
  • 86. Fiermonte G et al. Identification of the Mitochondrial Glutamate Transporter: BACTERIAL EXPRESSION, RECONSTITUTION, FUNCTIONAL CHARACTERIZATION, AND TISSUE DISTRIBUTION OF TWO HUMAN ISOFORMS. Journal of Biological Chemistry 277, 19289–19294 (2002).
  • 87. Wang R & Reddy PH Role of Glutamate and NMDA Receptors in Alzheimer’s Disease. Journal of Alzheimer’s Disease 57, 1041–1048 (2017).
  • 88. Ge M et al. Role of Calcium Homeostasis in Alzheimer’s Disease. Neuropsychiatric Disease and Treatment 18, 487–498 (2022).
  • 89. AbdAlla S, Langer A, Fu X & Quitterer U ACE Inhibition with Captopril Retards the Development of Signs of Neurodegeneration in an Animal Model of Alzheimer’s Disease. International Journal of Molecular Sciences 14, 16917–16942 (2013).
  • 90. Solesio ME et al. Carbonic anhydrase inhibition selectively prevents amyloid β neurovascular mitochondrial toxicity. Aging Cell 17, e12787 (2018).
  • 91. Marigorta UM et al. Transcriptional risk scores link GWAS to eQTLs and predict complications in Crohn’s disease. Nature Genetics 49, 1517–1521 (2017).
  • 92. Regeniter A et al. A modern approach to CSF analysis: Pathophysiology, clinical application, proof of concept and laboratory reporting. Clinical Neurology and Neurosurgery 111, 313–318 (2009).
  • 93. Jones L et al. Genetic Evidence Implicates the Immune System and Cholesterol Metabolism in the Aetiology of Alzheimer’s Disease. PLoS ONE 5, e13950 (2010).
  • 94. Yokoyama JS et al. Association Between Genetic Traits for Immune-Mediated Diseases and Alzheimer Disease. JAMA Neurology 73, 691 (2016).
  • 95. Haage V & De Jager PL Neuroimmune contributions to Alzheimer’s disease: a focus on human data. Molecular Psychiatry 27, 3164–3181 (2022).
  • 96. Jin SC et al. Coding variants in TREM2 increase risk for Alzheimer’s disease. Human Molecular Genetics 23 (2014).
  • 97. Timsina J et al. Comparative Analysis of Alzheimer’s Disease Cerebrospinal Fluid Biomarkers Measurement by Multiplex SOMAscan Platform and Immunoassay-Based Approach. Journal of Alzheimer’s Disease 89, 193–207 (2022).
  • 98. Chang CC et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4 (2015).
  • 99. Willer CJ, Li Y & Abecasis GR METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–1 (2010).
  • 100. del Campo M et al. CSF proteome profiling across the Alzheimer’s disease spectrum reflects the multifactorial nature of the disease and identifies specific biomarker panels. Nature Aging 2, 1040–1053 (2022).
  • 101. Tesi N et al. Centenarian controls increase variant effect sizes by an average twofold in an extreme case–extreme control analysis of Alzheimer’s disease. European Journal of Human Genetics 27, 244–253 (2019).
  • 102. Tesi N et al. Immune response and endocytosis pathways are associated with the resilience against Alzheimer’s disease. Translational Psychiatry 10, 332 (2020).
  • 103. Consortium TU UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Research 51, D523–D531 (2023).
  • 104. Kent WJ et al. The Human Genome Browser at UCSC. Genome Research 12, 996–1006 (2002).
  • 105. Newcombe RG Interval estimation for the difference between independent proportions: comparison of eleven methods. Statistics in Medicine (1998).
  • 106. Durinck S, Spellman PT, Birney E & Huber W Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nature Protocols 4, 1184–1191 (2009).
  • 107. Yao DW, O’Connor LJ, Price AL & Gusev A Quantifying genetic effects on disease mediated by assayed gene expression levels. Nature Genetics 52, 626–633 (2020).
  • 108.Horton R et al. Gene map of the extended human MHC. Nature Reviews Genetics 5, 889–899 (2004). [DOI] [PubMed] [Google Scholar]
  • 109.Gu Z, Gu L, Eils R, Schlesner M & Brors B circlize implements and enhances circular visualization in R. Bioinformatics 30, 2811–2812 (2014). [DOI] [PubMed] [Google Scholar]
  • 110.Barrett JC, Fry B, Maller J & Daly MJ Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2005). [DOI] [PubMed] [Google Scholar]
  • 111.Wang G, Sarkar A, Carbonetto P & Stephens M A simple new approach to variable selection in regression, with application to genetic fine mapping. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82, 1273–1300 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.Benjamini Y & Hochberg Y Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B (Statistical Methodology) (1995). [Google Scholar]
  • 113.Friedman J, Hastie T & Tibshirani R Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw 33(2010). [PMC free article] [PubMed] [Google Scholar]
  • 114.Yu G, Wang L-G, Han Y & He Q-Y clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters. OMICS: A Journal of Integrative Biology 16, 284–287 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115.Yu G, Wang L-G, Yan G-R & He Q-Y DOSE: an R/Bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics 31, 608–609 (2015). [DOI] [PubMed] [Google Scholar]
  • 116.Yu G & He Q-Y ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Molecular BioSystems 12, 477–479 (2016). [DOI] [PubMed] [Google Scholar]
  • 117.Oh HS-H et al. Organ aging signatures in the plasma proteome track health and disease. Nature 624, 164–172 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supp Results
Supp tables

Data Availability Statement

Full GWAS summary statistics from the joint analysis (3,506 samples, 7,008 aptamers) have been deposited in NIAGADS (https://www.niagads.org/knight-adrc-collection) under accession #NG00130 and are available to approved investigators. The full summary statistics are also publicly available through the NeuroGenomics & Informatics Center website (https://neurogenomics.wustl.edu/open-science/raw-data/), through a PheWeb browser developed by the Cruchaga lab (https://ontime.wustl.edu/), and through the GWAS Catalog (https://www.ebi.ac.uk/gwas/) under accession IDs GCST90421033-GCST90428040 (see Supplementary Data 1 for details). The PWAS weights (all samples and controls-only) used in this study are available for public download through the NeuroGenomics & Informatics Center website (https://neurogenomics.wustl.edu/open-science/raw-data/). The proteomics data from the Knight-ADRC cohorts (Oct. 2021 and June 2023) have also been deposited in NIAGADS under the same accession.
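
For readers who want to script downloads, the GWAS Catalog accession range quoted above can be expanded into individual study IDs. The following is a minimal sketch, not part of the original study: the helper name `gwas_catalog_accessions` is hypothetical, and it assumes the range is contiguous and inclusive, with one accession per aptamer. That assumption is consistent with the 7,008 aptamers reported, but the mapping from each accession to a specific aptamer must still be taken from Supplementary Data 1.

```python
def gwas_catalog_accessions(first: str, last: str) -> list[str]:
    """Expand an inclusive GCST accession range into individual IDs.

    Hypothetical helper for illustration; assumes the range is contiguous.
    """
    prefix = "GCST"
    if not (first.startswith(prefix) and last.startswith(prefix)):
        raise ValueError("expected accessions of the form GCST<digits>")
    start = int(first[len(prefix):])
    end = int(last[len(prefix):])
    if end < start:
        raise ValueError("range end precedes range start")
    # Preserve the digit width of the first accession when formatting.
    width = len(first) - len(prefix)
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


# Range as given in the Data Availability Statement.
accessions = gwas_catalog_accessions("GCST90421033", "GCST90428040")
print(len(accessions))  # 7008, matching the number of aptamers analyzed
```

The count check is a quick sanity test that the quoted range and the reported number of aptamers agree; it says nothing about the order in which aptamers were assigned to accessions.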

The proteomics and individual-level genetic data obtained from the ADNI cohort can be requested through ADNI’s website (adni.loni.usc.edu) after access has been approved.

The proteomics and individual-level genetic data obtained from the Knight-ADRC can be requested through the NIAGADS website (NG00130 for proteomics data, NG00127 for genetic data).

The proteomics and individual-level genetic data obtained from the PPMI cohort can be accessed by approved investigators through the PPMI website (https://www.ppmi-info.org/access-data-specimens/download-data).

The proteomics data from the Stanford ADRC can be accessed by approved investigators through the Stanford ADRC approval process (https://med.stanford.edu/adrc/researcher-resources.html)117.

The individual-level data from DIAN cannot be publicly shared because the rarity of this disease makes all the data identifiable. The IRB and NIH have confirmed that these data cannot be deposited in public repositories. However, the data are available to approved investigators through https://dian.wustl.edu/our-research/for-investigators/diantu-investigator-resources/dian-tu-biospecimen-request-form/.

The FACE and Barcelona-1 individual-level data cannot be shared due to the European Union’s General Data Protection Regulation (GDPR).

The participants from MARS have not given consent to share individual-level data.

The proteomics data from ADC are available at https://www.synapse.org/PRIDE_AD.
