Abstract
Robust and accessible biomarkers that can capture the heterogeneity of Alzheimer’s disease and its diverse pathological processes are urgently needed. Here, we undertook an investigation of Alzheimer’s disease cerebrospinal fluid (CSF) and plasma from the same subjects (n=18 control, n=18 AD) using three different proteomic platforms—SomaLogic SomaScan, Olink proximity extension assay, and tandem mass tag-based mass spectrometry—to assess which protein markers in these two biofluids may serve as reliable biomarkers of AD pathophysiology observed from unbiased brain proteomics studies. Median correlation of overlapping protein measurements across platforms in CSF (r~0.7) and plasma (r~0.6) was good, with more variability in plasma. The SomaScan technology provided the most measurements in plasma. Surprisingly, many proteins altered in AD CSF were found to be altered in the opposite direction in plasma, including important members of AD brain co-expression modules. An exception was SMOC1, a key member of the brain matrisome module associated with amyloid-β deposition in AD, which was found to be elevated in both CSF and plasma. Protein co-expression analysis on greater than 7000 protein measurements in CSF and 9500 protein measurements in plasma across all proteomic platforms revealed strong changes in modules related to autophagy, ubiquitination, and sugar metabolism in CSF, and endocytosis and the matrisome in plasma. Cross-platform and cross-biofluid proteomics represents a promising approach for AD biomarker development.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13195-022-01113-5.
Introduction
Alzheimer’s disease (AD) is a growing public health problem with no available disease-modifying therapies. Multi-omic analyses of AD brain have illustrated the varied and complex pathophysiology beyond amyloid-β (Aβ) plaques and tau neurofibrillary tangles [1–5], but how these pathological processes develop over time during the disease course is unclear. Given that AD is a heterogeneous disease, composed of different combinations and degrees of brain pathologies in a given individual, multiple biomarkers beyond Aβ and tau will be required to advance our understanding of the complex disease processes underlying AD. One of the current limitations for advancement of AD research, clinical care, and therapeutic development is the lack of easily accessible biofluid biomarkers for these varied pathological processes.
Protein biomarkers represent a promising class of biomarkers for AD given the large diversity of potential markers, their direct role in subserving biological processes, and the fact that standard protein affinity-based measurement approaches are already deployed in most clinical laboratories around the world. Three main quantitative protein measurement technologies are currently available to conduct proteomic discovery experiments in biofluids at scale: mass spectrometry, multiplexed nucleic acid aptamers, and multiplexed antibodies. Mass spectrometry (MS) has been used most extensively to date in AD biofluid biomarker discovery research and provides a direct measurement of protein identity and abundance through the measurement of peptides [3, 6–8]. Through the use of isobaric tandem mass tags (TMT) or data-independent acquisition (DIA) techniques [9], cohorts of hundreds of subjects can be analyzed at a depth of thousands of proteins in CSF and plasma. Traditionally, depth of analysis by MS is limited by the large dynamic range of protein concentration present in CSF and, especially, in plasma, as well as the problem of missing values that accumulate when analyzing larger cohorts [10–12]. Specificity and accuracy may be affected by ion interference as the complexity of the matrix increases. More recently, two affinity-based proteomic technologies have become available for biofluid measurements that offer different advantages and disadvantages compared to MS for protein measurement in biofluids: the SomaScan® aptamer-based technology from SomaLogic, and proximity extension assay (PEA) technology from Olink®. SomaScan uses modified DNA aptamers (SOMAmers) with slow off-rate binding kinetics to measure relative protein levels in multiplex fashion [13–15]. Multiple SOMAmers can be generated towards a given protein target and included in the microarray-based readout of relative fluorescence intensity. 
PEA uses a sandwich antibody-based approach where the capture and detection antibodies are conjugated to a complementary oligonucleotide probe pair, the levels of which are ultimately measured by quantitative PCR or next-generation sequencing approaches to provide a relative protein abundance value [16, 17]. Measurement specificity is provided both through the dual epitope antibody binding as well as through specific oligonucleotide probe hybridization. As affinity-based approaches, both SomaScan and PEA (subsequently referred to as “Olink”) theoretically suffer less from dynamic range and missing value challenges compared to MS. However, because both are affinity-based, they are indirect measurements of protein identity and abundance. Specificity and accuracy may be affected by protein modifications or off-target binding, and sensitive and specific reagents must be designed for each protein. Comparisons of SomaScan and Olink measurements for the same proteins in various bodily fluids have previously been described [18–22]. However, to our knowledge, no studies have compared these affinity-based measurements with mass spectrometry measurements, particularly across different biofluids.
In order to further the development of robust biofluid biomarkers of AD that can reflect multiple pathological processes in brain, we conducted a proteomic analysis on CSF and plasma by applying each proteomic technology described above to the same discovery set of AD and control samples. By analyzing the same samples with each proteomic technology, we were able to conduct an in-depth cross-platform technical analysis to better understand the current strengths and limitations inherent in each platform for AD biofluid biomarker development in CSF and plasma, and increase confidence in proteins that show promise as potential AD biomarkers. Furthermore, because the CSF and plasma samples we analyzed were matched within subject, we were able to explore the relationship between CSF and plasma levels of promising AD biomarkers within subject. We leveraged the combined proteomic datasets to generate an AD protein co-expression network in CSF and plasma and explored how protein co-expression in these two fluids might be related to each other and to AD brain protein co-expression. We found strong co-expression signals for proteostasis, synaptic biology, sugar metabolism, complement, and TGF-β signaling in AD CSF, and endocytosis, matrisome, and complement in AD plasma. Multi-platform proteomic analysis of AD biofluids holds promise for identification of robust biomarkers for clinical translation.
Results
Pre-processing and technical analyses of proteomic measurements in CSF and plasma
For our discovery experiments and cross-platform proteomic analyses, we used CSF and plasma from a cohort of control (n=18) and AD (n=18) patients in the Emory Goizueta Alzheimer’s Disease Research Center (ADRC) (Fig. 1, Additional file 2: Supplementary Table 1). The CSF and plasma samples were drawn at or near the same time point for each subject. All samples were analyzed by each proteomic platform except for one subject (n=35/36), whose CSF and plasma were analyzed only by SomaScan. This subject was excluded from all direct cross-platform comparisons, and therefore, such analyses were restricted to N=35 subjects. For both CSF and plasma, mass spectrometry measurements were performed using isobaric tandem mass tags (TMT-MS) with pre-fractionation [7], including both with and without prior depletion of highly abundant proteins in each fluid. For PEA analyses, we used all thirteen qPCR-based human biomarker panels available through Olink, encompassing 1196 protein assays (1160 unique proteins). For aptamer-based analyses, we used the SomaScan assay (v4.1) from SomaLogic, which provides 7288 SOMAmers targeting 6596 unique proteins.
Analysis of the signal-to-noise (S:N) properties of each platform showed that a large proportion of the SomaScan measurements in CSF were at or near background noise level (Additional file 1: Supplementary Figure 1A). By contrast, S:N was acceptable for nearly all SOMAmers in plasma (Additional file 1: Supplementary Figure 1B). To address this limitation, we empirically determined a S:N cutoff for SOMAmers in CSF by correlating measurements in common across all three proteomic platforms at different SOMAmer S:N thresholds, and selected a S:N threshold where the correlations were maximized (Additional file 1: Supplementary Figure 1C). This S:N threshold was 0.45. Applying this threshold to the SomaScan data reduced the number of quantified SOMAmers in CSF from 6776 to 3624 (Additional file 1: Supplementary Figure 1D, Additional file 2: Supplementary Table 2). This reduced set of SomaScan CSF measurements was used for most subsequent analyses.
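The threshold selection described above can be illustrated with a minimal sketch. It assumes a single reference platform rather than all three, Pearson correlation, an inclusive cutoff (S:N at or above the threshold), and a minimum number of retained assays; the dictionary keys, the function names, and the toy values are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pick_sn_threshold(assays, candidates, min_assays=2):
    """Return (threshold, median r) maximizing the median cross-platform correlation.

    assays: list of dicts with hypothetical keys
        'sn'        - signal-to-noise of the SOMAmer in CSF
        'somascan'  - per-subject SomaScan values
        'reference' - per-subject values for the same protein on another platform
    """
    best = None
    for t in candidates:
        kept = [a for a in assays if a["sn"] >= t]  # inclusive cutoff is an assumption
        if len(kept) < min_assays:
            continue  # too few assays left to summarize
        med = median(pearson(a["somascan"], a["reference"]) for a in kept)
        if best is None or med > best[1]:
            best = (t, med)
    return best


# Toy values for illustration only
toy_assays = [
    {"sn": 0.1, "somascan": [1, 2, 3, 4], "reference": [4, 3, 2, 1]},
    {"sn": 0.2, "somascan": [1, 3, 2, 4], "reference": [1, 2, 3, 4]},
    {"sn": 0.5, "somascan": [1, 2, 3, 4], "reference": [2, 4, 6, 8]},
    {"sn": 0.9, "somascan": [1, 2, 3, 5], "reference": [1, 2, 3, 4]},
]
best_threshold, best_median_r = pick_sn_threshold(toy_assays, [0.0, 0.15, 0.45, 0.95])
```

The minimum-assay guard matters in a sweep of this kind, because very high thresholds can leave a handful of assays whose median correlation is high but uninformative.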
We also analyzed missing measurements present in each platform across 36 CSF and plasma samples (Additional file 1: Supplementary Figure 2). The Olink and SomaScan platforms had a similar increase in missing values across samples, which was greater in CSF than in plasma. TMT-MS suffered more from missing values in both fluids, particularly in plasma. In plasma undepleted of highly abundant plasma proteins, a maximum of approximately 500 quantified proteins was reached within 2 batches of TMT-MS. In plasma depleted of the top fourteen most abundant proteins, this threshold had nearly been reached at the point when all 36 samples had been analyzed. We decided to set a threshold of <75% missing values (or measurement in at least 9 out of 36 samples) for subsequent individual protein analyses and exclude proteins with higher levels of missing values from consideration. Protein measurements that met this threshold were well balanced across AD and control cases. After applying the S:N and missingness filters, a large proportion (50.6%) of the SomaScan measurements in CSF, and a significant proportion (13.5%) of the TMT-MS measurements in depleted plasma, were removed from consideration for individual protein analyses.
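As a minimal sketch of the missingness filter, the code below keeps a protein when it is measured in at least 9 of 36 samples. This follows the parenthetical definition in the text; note that 9 of 36 measured corresponds to exactly 75% missing, so treating that boundary as inclusive is an assumption. Missing values are represented as None, and the function names and toy matrix are hypothetical.

```python
def passes_missingness(values, min_measured=9):
    """True if a protein is measured (not None) in at least min_measured samples."""
    measured = sum(v is not None for v in values)
    return measured >= min_measured


def filter_proteins(matrix, min_measured=9):
    """Keep entries of a {protein: [value or None per sample]} mapping that pass the filter."""
    return {p: v for p, v in matrix.items() if passes_missingness(v, min_measured)}


# Toy matrix with 36 samples per protein (values are placeholders)
toy_matrix = {
    "P1": [1.0] * 36,                # fully measured
    "P2": [1.0] * 9 + [None] * 27,   # measured in exactly 9 samples
    "P3": [1.0] * 8 + [None] * 28,   # measured in only 8 samples
}
kept_proteins = filter_proteins(toy_matrix)
```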
In order to assess whether highly abundant protein depletion significantly affected TMT-MS measurements in CSF and plasma, we correlated protein values before and after depletion of the top fourteen most abundant proteins within subject (Additional file 1: Supplementary Figure 3A). Median correlation after depletion was excellent in both CSF (r=0.78) and plasma (r=0.69), with greater variability introduced by depletion in plasma. Correlation was also good at the group level when proteins that were significantly altered in AD in either depleted or undepleted CSF (Additional file 1: Supplementary Figure 3B) or plasma (Additional file 1: Supplementary Figure 3C) were correlated with depletion versus no depletion. To avoid setting an arbitrary correlation threshold, we excluded all proteins measured by TMT-MS in CSF and plasma that had correlation values of zero or below (i.e., uncorrelated or anticorrelated) or that had an opposite direction of change in AD after highly abundant protein depletion. This totaled 32 proteins in CSF and 27 proteins in plasma (Additional file 2: Supplementary Table 3). The final protein abundance matrices used for individual protein analyses and cross-platform comparisons therefore included the <75% missingness filter across all platforms and fluids, the S:N filter for SomaScan CSF, and excluded proteins that were strongly affected by highly abundant protein depletion in CSF and plasma from TMT-MS measurements.
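The two exclusion rules for depletion-sensitive proteins can be written as a short sketch. It assumes that each protein already has a depleted versus undepleted correlation value and an AD versus control log2 fold change in each preparation. The function name, the use of log2 fold change to represent direction of change, and the toy values are hypothetical, and a fold change of exactly zero is treated here as not opposite.

```python
def keep_after_depletion_check(r, log2fc_undepleted, log2fc_depleted):
    """Return True if a protein survives both exclusion rules."""
    if r <= 0:
        return False  # uncorrelated or anticorrelated before vs. after depletion
    if log2fc_undepleted * log2fc_depleted < 0:
        return False  # AD vs. control change flips sign after depletion
    return True


# Toy values for illustration only: (r, log2FC undepleted, log2FC depleted)
toy_proteins = {
    "P1": (0.85, 0.40, 0.35),    # kept
    "P2": (-0.10, 0.20, 0.25),   # excluded: anticorrelated
    "P3": (0.60, 0.30, -0.15),   # excluded: opposite direction of change
    "P4": (0.00, -0.20, -0.10),  # excluded: correlation of zero
}
kept_after_depletion = [
    name for name, args in toy_proteins.items() if keep_after_depletion_check(*args)
]
```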
Cross-platform comparisons
Across all three platforms, we were able to measure a total of 4655 unique proteins (as represented by unique gene symbols) in CSF, and 6794 unique proteins in plasma (Fig. 2A, Additional file 2: Supplementary Tables 4-7). The SomaScan platform provided the deepest proteomic coverage in plasma, measuring 4662 proteins not measured by Olink or MS. Most of the proteins that could be measured in plasma could also be measured in CSF by Olink (Fig. 2B, Additional file 2: Supplementary Table 8), whereas due primarily to our S:N filter, only approximately half of the proteins that could be measured in plasma by SomaScan could also be reliably measured in CSF (Additional file 2: Supplementary Table 9). Over twice as many proteins could be measured in CSF compared to plasma on depleted fluid using TMT-MS due to the larger number of highly abundant proteins in plasma and the protein dynamic range limitations of unbiased discovery MS-based approaches. Only marginal improvement in proteomic coverage was observed in MS with depleted versus undepleted fluid in both CSF and plasma (Additional file 1: Supplementary Figure 4A, Additional file 2: Supplementary Tables 10 and 11), with depth of coverage improvement more apparent in CSF than in plasma (Additional file 1: Supplementary Figure 4B). Ontologies of proteins uniquely measured by the SomaScan platform in CSF and plasma included nucleic acid metabolism and binding, and nucleus (Additional file 1: Supplementary Figure 4C), suggesting that the platform was enriched for measurement of nuclear proteins. Ontologies unique to Olink included mitotic cell cycle, immune processes, and plasma membrane, reflecting selection bias of these biological pathways in the Olink platform compared to the other platforms. Ontologies unique to MS included transmembrane transport, complement, and cytoskeleton/structural proteins, representing more highly abundant proteins.
We correlated protein measurements across all three platforms in CSF and plasma (Fig. 3, Additional file 2: Supplementary Tables 12-17, Additional file 1: Extended Data). Median correlation within subject was approximately 0.7 for CSF, with a similar distribution of correlation values among platforms. Median correlation was slightly lower in plasma at approximately 0.6, with more variability between the MS and affinity-based measurements than between SomaScan and Olink affinity-based measurements. Other than slightly lower overall correlation, these correlation patterns were generally similar when analyzed at the group level rather than within subject (Additional file 1: Supplementary Figure 5). Improvement in median correlation was generally observed in plasma when only proteins that were significantly altered in AD in a given platform were used for correlation (Fig. 3B), suggesting that inclusion of proteins with lower S:N led to decreased correlation. Interestingly, in CSF, the improvement in correlation was only observed with MS-based measurements. In summary, median correlation of proteomic measurements between platforms within the same subject was quite good, with better correlation in CSF than in plasma.
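A minimal sketch of a cross-platform comparison of this kind is given below. It assumes that, for each protein measured on both platforms, values are paired by subject and summarized with a Pearson correlation over subjects measured on both platforms, and that the median is then taken across proteins. The correlation metric, the minimum of three paired subjects, the function names, and the toy values are all assumptions made for illustration.

```python
from math import sqrt
from statistics import median


def pearson_pairwise(x, y, min_pairs=3):
    """Pearson r over subjects measured on both platforms; None if undefined."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < min_pairs:
        return None
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return None  # constant values carry no correlation information
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    return sxy / sqrt(sxx * syy)


def median_cross_platform_r(platform_a, platform_b):
    """Return (median r, {protein: r}) over proteins shared by two platforms."""
    shared = sorted(set(platform_a) & set(platform_b))
    rs = {}
    for protein in shared:
        r = pearson_pairwise(platform_a[protein], platform_b[protein])
        if r is not None:
            rs[protein] = r
    return (median(rs.values()) if rs else None), rs


# Toy values for illustration only; subjects are in the same order on both platforms
toy_a = {"P1": [1, 2, 3, 4], "P2": [1, 3, 2, 4], "P3": [1, 2, 3, 4],
         "P4": [1, 1, 2, 2], "P6": [1, None, None, 4]}
toy_b = {"P1": [2, 4, 6, 8], "P2": [1, 2, 3, 4], "P3": [4, 3, 2, 1],
         "P5": [1, 2, 3, 4], "P6": [1, 2, 3, 4]}
toy_median_r, toy_rs = median_cross_platform_r(toy_a, toy_b)
```

Using pairwise-complete subjects keeps proteins with some missing values in the comparison, at the cost of each correlation resting on a different number of subjects.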
To assess how proteomic measurements in our discovery cohort compared with Olink and SomaScan measurements in other AD cohorts, we performed correlation analyses with plasma Olink data from a Hong Kong-based cohort [23], CSF and plasma Olink data from the BioFinder cohort [24], and SomaScan plasma data from the AddNeuroMed cohort [25] (Additional file 1: Supplementary Figure 6). Correlations were restricted to proteins significantly altered in AD in each biofluid to maximize S:N. In AD plasma, correlation of Olink measurements in the Hong Kong cohort with our discovery cohort Olink measurements was excellent (r=0.82), with lower but strong correlation with SomaScan (r=0.57) and MS (r=0.63) measurements in our cohort (Additional file 1: Supplementary Figure 6A). When comparing BioFinder Olink CSF measurements with our CSF measurements, correlation was also excellent across all measurement platforms (r~0.7) (Additional file 1: Supplementary Figure 6B). However, BioFinder Olink plasma measurements did not correlate with our discovery cohort platform plasma measurements (Additional file 1: Supplementary Figure 6C). SomaScan plasma measurements in the AddNeuroMed cohort also did not correlate with any of our platform plasma measurements (Additional file 1: Supplementary Figure 6D). These findings suggest that our cohort was most similar to the Hong Kong cohort and that pre-analytical factors unique to each cohort likely significantly influenced the plasma measurements in each cohort.
Proteins of lower abundance are decreased in AD plasma
To determine which proteins were significantly altered across platforms in AD CSF and plasma, we performed differential abundance analyses within each fluid for each proteomic platform (Fig. 4, Additional file 1: Supplementary Figure 7). The analyses were performed without median normalization of overall protein abundance levels between AD and control cases, given that biomarker measurements in a clinical setting do not undergo median normalization [26, 27]. While a greater number of proteins were found to be decreased in AD CSF across all platforms, the decrease in plasma proteins in AD was much greater than in CSF and was strikingly apparent across all platforms (Fig. 4A). This finding was consistent with the strong bias towards lower protein abundance observed in AD plasma in the Hong Kong cohort, in which the data had undergone some degree of median normalization prior to differential abundance analysis [23]. Overlap of differentially abundant proteins was low to modest across platforms (Fig. 4B), due likely in part to the smaller size of the cohort and less statistical power to observe significant differences. Given the clear abundance differences observed in AD plasma across platforms, we further investigated this phenomenon by exploring which proteins were driving the difference in abundance. We first ranked each platform measurement by its contribution to the overall signal within each fluid and tested the difference between the top 5% strongest signals in each platform compared with the total signal (Additional file 1: Supplementary Figure 8). By this approach, we found that both the top 5% and overall protein levels were decreased in AD plasma by SomaScan and Olink, but not MS. 
However, because SOMAmer relative fluorescence units and Olink normalized protein expression values do not necessarily correlate with absolute protein abundance, we also calibrated measurements in each platform to known absolute protein concentrations in plasma obtained from the Human Protein Atlas [28]. After calibration, we observed that the top 5% most highly abundant proteins were not decreased in AD plasma in any platform, but that the overall decrease in AD plasma proteins was driven by proteins of lower abundance. This was consistently observed across platforms except for undepleted plasma analyzed by MS, in which many fewer proteins of lower abundance were measured. In summary, we observed a decrease in CSF and plasma proteins in AD compared to control, with the striking bias in plasma driven by proteins of lower abundance.
Brain protein network module coverage by platform in CSF and plasma
We recently generated a consensus AD brain protein co-expression network from over 500 brain tissues as part of the Accelerating Medicines Partnership for Alzheimer’s Disease (AMP-AD) initiative that revealed many modules strongly correlated to AD neuropathological traits and cognitive decline [2] (Fig. 5A). To determine the potential for AD-relevant brain modules to be measured by markers in CSF and plasma, we calculated the percent coverage in CSF and plasma for each of the 44 brain network modules by proteomic platform (Fig. 5B). All modules had at least some coverage in CSF, with SomaScan and MS providing the most module coverage in CSF compared to the Olink 1196 platform. In plasma, SomaScan provided the most module coverage. Brain module M26 complement/acute phase was particularly well covered by SomaScan and MS in both CSF and plasma, and M42 matrisome was well covered in both fluids by all platforms. Given that M42 matrisome had the strongest correlation to AD neuropathological traits in brain, we more closely examined this module across platforms in both fluids. M42 hub proteins—or proteins that contribute most to the module eigenprotein and are drivers of module co-expression—were generally well measured by all platforms in CSF (Fig. 6A). In plasma, M42 hub protein coverage was best with SomaScan and Olink, and especially with SomaScan. SMOC1, which was the strongest driver of M42 co-expression, could be measured in CSF and plasma by both Olink and SomaScan, and measurements were well correlated in both fluids between the two platforms (Additional file 1: Supplementary Figure 9A, B). Levels of SMOC1 were elevated in both AD CSF and plasma despite the decrease in lower abundant proteins in AD plasma (Fig. 6B). We leveraged Olink CSF and plasma data from control (n=90) and Parkinson’s disease (PD, n=118) subjects in the Accelerating Medicines Partnership for Parkinson’s Disease (AMP-PD) consortium to test the specificity of SMOC1 for AD (Fig. 1B). 
We did not observe an increase in SMOC1 in PD CSF and observed a weak increase in PD plasma (Fig. 6C). We also tested specificity of SMOC1 for AD by measuring SMOC1 CSF levels using TMT-MS in a separate cohort composed of control, AD, ALS, FTD, and PD subjects as previously described in Higginbotham et al. [7]. Elevation of SMOC1 in CSF was specific to AD. SMOC1 levels were generally well correlated between CSF and plasma within AD subjects but not controls (Fig. 6D). CSF and plasma SMOC1 levels also correlated with Aβ/Tau levels in CSF (Fig. 6E). In plasma, this correlation was driven by group differences and was not significant within group (Additional file 1: Supplementary Figure 9C). SMOC1 levels did not correlate strongly with cognitive function in AD (Additional file 1: Supplementary Figure 9D) or PD (Additional file 1: Supplementary Figure 9E). Interestingly, SMOC1 levels correlated weakly with age in both CSF and plasma (Fig. 6F), an association which has been previously described in plasma [29, 30].
Hub proteins of other AD-relevant brain co-expression modules could also be measured in CSF and plasma (Fig. 7). These included HOMER1 in the M5 Post-Synaptic Density module (Additional file 1: Supplementary Figure 10), NEFL in the M3 Oligodendrocyte/Myelination module (Additional file 1: Supplementary Figure 11), CHI3L1—also known as YKL-40—in the M21 MHC Complex/Immune module (Additional file 1: Supplementary Figure 12), YWHAZ in the M4 Synapse/Neuron module (Additional file 1: Supplementary Figure 13), ENO1 in the M7 MAPK Signaling/Metabolism Module (Additional file 1: Supplementary Figure 14), and PEBP1 in the M25 Sugar Metabolism Module (Additional file 1: Supplementary Figure 15). All of these proteins were increased in AD CSF, yet only NEFL and CHI3L1 were also increased in AD plasma, demonstrating the diversity of potential AD biomarker changes across different fluids. Furthermore, not all proteins within a brain module were found to behave similarly in CSF and plasma. For instance, YWHAZ in the M4 Synapse/Neuron module was observed to be increased in CSF and decreased in plasma, whereas NPTXR, another M4 protein, was found to be decreased in both CSF and plasma (Additional file 1: Supplementary Figure 16). NPTXR also illustrated a discrepancy in platform measurements, where the Olink measurement was significantly negatively correlated with the MS and SomaScan measurements in CSF, but was more similar to the SomaScan measurement than the MS measurement in plasma (Additional file 1: Supplementary Figure 16A). Another example of a protein with discrepancy in measurements across fluids was SPP1 in the M21 MHC Complex/Immune module (Additional file 1: Supplementary Figure 17). SPP1 measurements best correlated between MS and SomaScan in CSF, with the Olink measurement being anticorrelated to the other two platforms. However, in plasma, SomaScan and Olink SPP1 measurements correlated well, whereas MS did not with either affinity-based platform. 
NEFL was also highly correlated among all platforms in CSF, but was anticorrelated between SomaScan and Olink in plasma. As has been previously demonstrated, NEFL levels were strongly correlated with increasing age [31] (Additional file 1: Supplementary Figure 11F). In summary, AD brain co-expression module proteins could be measured by all three platforms in CSF and plasma, but SomaScan had the best coverage in plasma, especially for the M42 matrisome module. SMOC1, a hub of M42, was elevated in both AD CSF and plasma and levels correlated within subject between CSF and plasma. Other AD brain module protein hubs could also be measured in CSF and plasma, but opposite directions of change were often observed between CSF and plasma protein levels of these hubs, and protein measurements did not always positively correlate across proteomic platforms.
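The percent module coverage used in this section can be sketched as the share of a brain module's members that are measured by a given platform in a given fluid. The code below assumes that matching is done on gene symbols; the function name, module labels, and gene symbols are placeholders rather than actual module memberships.

```python
def module_coverage(modules, measured):
    """Percent of each module's members present in a platform's measured set.

    modules:  {module name: iterable of gene symbols}
    measured: iterable of gene symbols quantified by one platform in one fluid
    """
    measured = set(measured)
    coverage = {}
    for name, members in modules.items():
        members = set(members)
        if not members:
            continue  # skip empty modules to avoid division by zero
        coverage[name] = 100.0 * len(members & measured) / len(members)
    return coverage


# Placeholder module memberships and measured symbols
toy_modules = {"M_a": ["G1", "G2", "G3", "G4"], "M_b": ["G5", "G6"], "M_c": []}
toy_measured = ["G1", "G2", "G3", "G6", "G9"]
toy_coverage = module_coverage(toy_modules, toy_measured)
```

Repeating the calculation per platform and per fluid would give a table analogous to the coverage summary described for the 44 brain modules.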
AD CSF co-expression network reveals strong disease-related modules reflecting proteostasis, synaptic, complement, and sugar metabolism pathophysiology
While AD-related protein co-expression modules have been reliably identified in brain, to date it has been unclear whether co-expression modules related to AD are also present in biofluids. To address this question, we leveraged all three proteomic platforms and harmonized their measurements by median normalization into separate protein abundance matrices for CSF and plasma, and used these harmonized abundance matrices to build co-expression networks for each fluid. We were then able to compare these networks to one another, and to the consensus AD brain network (Fig. 1C). Using this approach, we built a CSF co-expression network from 7158 protein assays targeting 4154 unique gene symbols (Fig. 8A, Additional file 2: Supplementary Tables 18 and 19, Additional file 1: Extended Data). The network consisted of 38 modules, with each platform contributing measurements to nearly all modules (Additional file 1: Supplementary Figure 18). Modules that were most strongly correlated to Aβ and tau pathological measures in CSF and/or cognitive function included M15 post-synaptic membrane, M8 autophagy, M7 SNAP receptor/SNARE complex, M32 synaptic membrane/matrisome, M16 sugar metabolism, M29 sugar metabolism/actin depolymerization, M24 ubiquitination, M26 TGF-β signaling, and M3 complement/protein activation cascade. M8 autophagy and M24 ubiquitination modules were particularly strongly correlated to total tau and p-tau181 levels. The M8 autophagy module contained microtubule associated protein tau (MAPT) and SMOC1 as members, as well as other markers previously associated with AD such as NEFL and PEBP1 (Fig. 8B). M8 module eigenprotein levels were strongly negatively correlated with cognitive function (r=–0.67) and Aβ42/tau ratio (r=–0.82), and strongly positively correlated with total tau (r=0.86) and p-tau181 (r=0.78) (Fig. 8C), reflecting its close association to AD brain amyloid-β and tau pathology.
Among the other modules that correlated strongly with traits were M29 sugar metabolism/actin depolymerization, which was most strongly correlated to APOE ε4, and M26 TGF-β signaling, which was strongly positively correlated with age. The M8 autophagy and M15/M32 synaptic modules were enriched in neuronal and oligodendrocyte cell type markers, potentially reflecting the brain cell type origin of these CSF modules. The M24 ubiquitination and M29 sugar metabolism/actin depolymerization modules did not have cell type character, whereas the M3 complement/protein activation cascade module was enriched in endothelial and microglial markers.
We tested whether CSF modules were present in brain or plasma by two different approaches: over-representation analysis (ORA), and network preservation statistics. We also tested how the module eigenproteins changed in CSF between AD and control, and whether the cognate module eigenprotein (or “synthetic” eigenprotein) in plasma and brain were altered in AD (Fig. 8A, Additional file 1: Extended Data). The CSF M3 complement/protein activation cascade module was most strongly preserved in plasma and brain and was decreased in AD CSF. M26 TGF-β signaling was also decreased in AD CSF. Modules that were increased in AD CSF included the M15 and M32 synaptic, and M8 and M24 proteostasis modules, along with M16 sugar metabolism. Most of the synthetic eigenproteins for these modules were decreased in plasma except for the M3 complement/protein activation cascade module, which was increased in both plasma and brain.
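As an illustration of the over-representation idea, the sketch below tests whether two modules share more members than expected by chance using the upper tail of the hypergeometric distribution, which is equivalent to a one-sided Fisher exact test. The exact ORA procedure, the choice of background, and any multiple-testing correction used in the study are not specified here; the function name and the toy gene sets are hypothetical.

```python
from math import comb


def ora_pvalue(module_a, module_b, background):
    """Return (overlap size, one-sided hypergeometric p-value) for two gene sets.

    Only genes present in the background (e.g., those measurable in both
    networks) are counted.
    """
    bg = set(background)
    a = set(module_a) & bg
    b = set(module_b) & bg
    n_bg, n_b, n_a = len(bg), len(b), len(a)
    k = len(a & b)
    total = comb(n_bg, n_a)
    # P(X >= k) when drawing n_a genes from a background containing n_b "hits"
    tail = sum(
        comb(n_b, i) * comb(n_bg - n_b, n_a - i)
        for i in range(k, min(n_b, n_a) + 1)
    )
    return k, tail / total


# Toy gene sets for illustration only
toy_background = [f"g{i}" for i in range(20)]
toy_module_csf = ["g0", "g1", "g2", "g3", "g4"]
toy_module_plasma = ["g0", "g1", "g2", "g3", "g10"]
toy_overlap, toy_p = ora_pvalue(toy_module_csf, toy_module_plasma, toy_background)
```

ORA of this kind asks only whether module membership overlaps. Network preservation statistics instead assess whether the correlation structure of a module is retained in another dataset, which is why the two approaches can disagree.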
In summary, we were able to construct an AD CSF protein co-expression network from >7000 protein assays in CSF which revealed strong disease-associated modules related to proteostasis, synaptic biology, sugar metabolism, and complement. All module eigenproteins were increased in CSF and decreased in plasma except for complement, which was decreased in CSF and increased in plasma.
AD plasma co-expression network reveals strong disease-related modules reflecting endocytosis and matrisome pathophysiology
The plasma network included 9589 protein assays targeting 6614 unique gene symbols, and consisted of 35 modules with good platform measurement representation across the network, similar to the CSF network (Fig. 9A, Additional file 2: Supplementary Tables 20 and 21, Additional file 1: Supplementary Figure 18). The SomaScan platform contributed approximately 80% of the measurements in the network. A striking feature of the plasma network was the number of modules related to extracellular matrix biology and the matrisome that correlated with AD CSF Aβ and tau biomarkers. One such module was the M33 adhesion/ECM/wound response module whose eigenprotein—along with other matrisome-related module eigenproteins—was elevated in AD plasma despite the decrease in lower abundance plasma proteins in AD (Fig. 9B, C). M33 module co-expression was driven by tenascin (TNC), an extracellular matrix protein involved in neuronal migration and regeneration, as well as synaptic plasticity. TNC was measured by ten separate assays across the three platforms, eight of which were SOMAmers. Eight of the ten TNC assays fell within M33, including six SOMAmers and one Olink and one MS measurement, suggesting good correlation for most TNC measurements across platforms. SPP1 as measured by Olink and SomaScan were also members of M33. Another module strongly related to AD was the M24 endocytosis module, which was the module most strongly correlated to total tau levels in CSF. Interestingly, this module was not strongly preserved in CSF or brain, potentially reflecting a more systemic process associated with AD. Plasma modules were generally less well preserved in CSF and brain than CSF modules were preserved in plasma and brain. Brain protein co-expression was generally not strongly preserved in CSF or plasma except for the complement module, which was highly preserved across all tissues (Additional file 1: Supplementary Figure 19 and 20, Additional file 2: Supplementary Table 22).
We compared our 3-platform plasma network using module ORA to a serum network built from approximately 5000 SOMAmers previously reported by Emilsson et al. [32] (Additional file 1: Supplementary Figure 21). The serum network had fewer modules than the plasma network (27 versus 35). Over half of the serum modules had significant overlap in plasma by ORA. One of these modules was serum module 11, a lipid module with many module protein levels affected by variation in the APOE locus. Serum module 11 overlapped with plasma modules M15 Lipid Biosynthesis/Immune Response and M32 Lipoprotein Metabolism. M15 co-expression was driven by ApoE and levels of this module decreased most strongly with increasing number of APOE ε4 alleles, whereas M32 co-expression was strongly driven by ApoB and module levels increased most strongly with the number of APOE ε4 alleles (Fig. 9A). Therefore, the 3-platform plasma network provided sufficient resolution to identify lipoprotein-related protein co-expression modules divergent in their relationship to APOE ε4 genotype.
Given the discrepancy we found in levels of individual AD-related proteins between CSF and plasma, we tested whether CSF and plasma co-expression modules also showed discrepancy in the levels of their eigenproteins and synthetic eigenproteins in the paired fluid (Fig. 10). For CSF modules, as for many individual proteins, we observed an inverse relationship between module levels in plasma and module levels in CSF in AD. One notable exception was the M3 complement/protein activation cascade module, where the within-subject module eigenprotein was increased in AD plasma compared to AD CSF. For plasma modules, the within-subject eigenprotein relationship to CSF was noisier and did not show a strong discrepancy between fluids for most modules. An exception again was the plasma M8 protein activation cascade module, which was increased in AD plasma compared to CSF for most subjects despite the general decrease in protein abundances in AD plasma.
In summary, we were able to build AD CSF and plasma protein co-expression networks using measurements from all three proteomic platforms, providing excellent proteomic depth in each fluid. Synaptic, proteostasis, sugar metabolism, and complement modules were strongly altered in AD CSF, while matrisome, endocytosis, and complement modules were altered in AD plasma. The complement modules were the modules best preserved across brain, CSF, and plasma and were observed to be increased in brain and plasma, but decreased in CSF. AD-related CSF module eigenproteins were generally increased in AD but decreased in plasma, likely due in part to the overall decrease of low abundance plasma proteins in AD. The relationship of plasma modules to CSF was more variable, reflecting the contribution of many tissues other than brain to plasma protein co-expression.
Discussion
In this study, we used three different proteomic technologies to interrogate matched CSF and plasma from a discovery cohort of AD subjects in order to identify and assess promising protein biomarkers for AD. We found that overall correlation among the proteomic platforms was good, with weaker correlation in plasma. We observed a general decreased expression level of lower abundance proteins in AD CSF and plasma, most notably in plasma. However, despite this general decrease, proteins that are important drivers of AD brain co-expression modules such as SMOC1 were increased in AD plasma and show promise as accessible biomarkers of AD brain pathology. Co-expression analysis showed strong changes in AD CSF related to proteostasis, sugar metabolism, synaptic biology, and complement pathways. Analysis of plasma showed matrisome, complement, and endocytosis modules strongly correlated with AD. These modules may themselves represent promising AD biofluid biomarkers potentially more robust to analytical and natural biological variation than individual protein markers.
SOMAmer signals were much lower in CSF compared to plasma, an important consideration when interpreting SomaScan assay data in this fluid. We used an empirically derived threshold to remove assays that did not meet our criteria for acceptable S:N, which removed approximately half of the assays from consideration for most of our analyses. The approach to handle S:N for SomaScan in CSF will depend on the context and needs of the desired analysis. There were no issues with S:N for SomaScan in plasma, reflecting optimization of the platform for this matrix.
Correlation of protein measurements across platforms was good, with a median r of approximately 0.7 in CSF and 0.6 in plasma. The median correlation in plasma was generally higher than what has previously been observed comparing Olink and SomaScan assays [18, 19], perhaps because of the larger number of assays compared between these two platforms in this study and the lack of normalization of the SomaScan data [18]. The correlations improved when considering only proteins that are altered in AD, suggesting that measurement variability within a given assay reduced the correlation when S:N for a given assay was low. The source of this noise is likely biological, particularly in plasma where protein levels are influenced by multiple tissues and organ systems and other factors unrelated to AD. An important consideration when interpreting cross-platform correlation is also protein isoform, or “proteoform,” complexity in plasma versus CSF, including splicing variation and post-translational modifications (PTMs), as well as protein complex formation. Such proteoform complexity is very likely to be a large driver of variation across proteomic measurements within and across platforms, where a given platform is targeting a particular epitope or peptide for measurement of protein levels that could be obscured by splice variation, PTMs, or complex formation [18, 33]. This is likely the source of the variation noted above in measurement of SPP1, which has more than 40 known phosphorylation sites and 5 glycosylation sites. Another source of variation is off-target binding, or ion co-isolation and interference in the case of MS, which is a larger issue in more complex matrices such as plasma. In this context, one current advantage for the SomaScan and MS platforms is the use of multiple aptamers or peptides for the measurement of some proteins, which could allow for consideration of such biological and technical variation. An example of this is the SomaScan measurement of TNC described above, where six of the eight SOMAmers correlated with one another. Multiple assays for other AD-related proteins, such as NEFL, CHI3L1, NPTXR, and SPP1, would be a welcome addition to proteomic platforms. In-depth characterization of the actual protein species being measured by an assay in a given platform will significantly advance our understanding of proteoforms related to disease.
A key observation that arose from our analyses was the strong bias for proteins of medium to low abundance to be decreased in AD plasma. Our AD plasma data were similar to those in a recently described Hong Kong cohort analyzed by Olink where this strong bias was also observed [23]. The same bias has also been observed in a TMT-MS study [34]. The basis of this observation is not currently clear, but could represent general protein translation reduction [35] in a systemic fashion in AD that somewhat spares proteins of higher abundance such as albumin, which are the primary drivers of gross plasma protein level measurements in clinical assays. Alteration of the blood-brain barrier in AD may also contribute to the observed discrepancy between the CSF and plasma levels of some proteins [8, 36]. We chose not to perform median normalization prior to analysis given that we are most interested in actual measured levels for clinical biomarker discovery and translation. Therefore, proteins that remain elevated in plasma without median normalization, such as SMOC1, represent highly promising markers for AD. SMOC1 has been shown in prior studies to be significantly altered in both brain and CSF [7, 34, 37]. SMOC1 may be an excellent marker for the M42 brain matrisome module in CSF and plasma given that it is a key driver of M42 co-expression in brain. M42 is strongly related to amyloid deposition [2, 38], and the fact that CSF and plasma SMOC1 did not correlate with cognitive function is consistent with this association given that amyloid also does not strongly correlate with cognitive function [39] and M42 levels are not an independent driver of cognitive decline after adjustment for AD neuropathology [2]. Other brain module hub proteins as described above may also represent promising biofluid biomarkers for brain processes. The relationship in the levels of many of these markers between CSF and plasma appears more complicated than for SMOC1, with some showing opposite directions of change in AD plasma compared to CSF, consistent with prior observations in a TMT-MS study in which a number of AD markers that were observed to be increased in CSF were decreased in serum [34]. While some of this variation may be due to peripheral sources of the protein, it is also possible that exchange of certain proteins from CSF to the plasma compartment is regulated. Further investigation into such possible regulatory mechanisms is warranted.
One potential way to deal with variation in any one biomarker is to construct panels of markers that reflect a particular biological process, where the composite level of the panel becomes the measurement of the biological process of interest [7]. In this study, we constructed protein co-expression networks in CSF and plasma to illustrate this potential approach to AD biomarker development. We were able to incorporate all three proteomic platform measurements into these networks, which were based on >7100 protein measurements in CSF, and >9500 protein measurements in plasma. To our knowledge, these are the deepest proteomic analyses of these two fluids to date. The CSF network revealed autophagy and ubiquitination modules that were strongly correlated to current AD CSF biomarkers, indicating disruption of proteostasis is a strong disease-related signal in AD CSF. Other strong disease-related signals included alterations in synaptic biology, sugar metabolism, and complement, all of which have been previously described in AD brain [2, 3]. In plasma, these modules were not as well defined. Whether the CSF eigenproteins for these modules will translate into potential plasma biomarkers is currently unknown and will require further study in larger cohorts. One module that will likely translate across tissues is the complement module, which was highly preserved across brain, CSF, and plasma. Complement proteins have previously been shown to be increased in both AD plasma and brain by TMT-MS [40]. Interestingly, while complement module levels were increased in brain and plasma, they were decreased in CSF, suggesting that complement deposition in brain leads to discordance in brain-CSF levels similar to that seen with brain Aβ deposition [41]. However, in contrast to Aβ, the source of complement in the brain may be derived largely from peripheral sources given the observed elevation in plasma. This hypothesis would be consistent with recent findings on peripheral factors that influence brain pathophysiology in AD [42].
Our study was conducted on a small discovery cohort of subjects. At n=36, we were powered to detect a correlation rho of 0.45 at 80% power and p=0.05. For analyses of differential abundance between AD and control groups considering all proteins measured, we were powered to detect a fold change of ~1.2 to ~1.3. For correlation network analyses, it is generally advisable to include at least 20 total samples to avoid spurious correlations according to Oldham et al. [43]. Here, we used 35 independent samples for both CSF and plasma networks. We were therefore sufficiently powered to draw conclusions about differential abundance, cross-platform correlations, and correlation networks. Additional studies in larger cohorts will be required to further validate these findings.
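The stated correlation power can be approximated with the Fisher z transformation. The source does not say how power was computed, so the method below is an assumption, and the function name min_detectable_r is hypothetical. It is a minimal sketch for a two-sided test of a Pearson correlation against zero.

```python
import math
from statistics import NormalDist

def min_detectable_r(n, alpha=0.05, power=0.80):
    """Smallest |r| detectable at the given power (two-sided test).

    Uses the Fisher z approximation, in which atanh(r) has a standard
    error of 1 / sqrt(n - 3). This is an assumed method, not one
    documented in the source.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    z_r = (z_alpha + z_beta) / math.sqrt(n - 3)
    return math.tanh(z_r)

# Full cohort (n=36) and the 35 samples used for the networks
for n in (36, 35):
    print(n, round(min_detectable_r(n), 3))
```

Under these assumptions the function gives a value close to 0.45 for n=36, in line with the figure stated above, and a marginally larger value for n=35.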
In summary, multi-platform proteomic analysis of AD CSF and plasma is a promising approach to further development of biomarkers that can reflect the complex and multifaceted processes that comprise AD and that can enable patient stratification, diagnostics and disease monitoring, and therapeutic development.
Methods
CSF and plasma samples and case classification
All CSF and plasma samples used in this study were collected under the auspices of the Emory Goizueta Alzheimer’s Disease Research Center (ADRC). The cohort consisted of 18 healthy controls and 18 patients with AD. Basic demographic data were obtained from the Goizueta ADRC. Controls and patients with AD received standardized cognitive assessments in the Emory Cognitive Neurology Clinic or Goizueta ADRC. CSF and plasma were collected at or near the same time point in each individual and banked according to the 2014 National Institute on Aging best practice guidelines for Alzheimer’s Disease Centers (https://alz.washington.edu/BiospecimenTaskForce.html). CSF samples were subjected to ELISA Aβ1–42, total tau, and p-tau181 analysis by the INNO-BIA AlzBio3 Luminex Assay [44]. ELISA values were used to support diagnostic classification based on established AD biomarker cutoff criteria [45, 46]. APOE genotype was determined by extracting DNA from the buffy coat using the GenePure kit (Qiagen) following the manufacturer’s recommended protocol, then determining the rs7412 and rs429358 genotypes using either an Affymetrix Precision Medicine Array (Affymetrix) or TaqMan assays (Thermo Fisher Scientific C_904973_10 and C_3084793_20). All samples were analyzed by each proteomic platform except for one subject, whose CSF and plasma were analyzed only by SomaScan. All Emory research participants provided informed consent under protocols approved by the Institutional Review Board at Emory University. Summarized case metadata are provided in Additional file 2: Supplementary Table 1.
Quantification of proteins by Olink proximity extension assay (PEA)
Proteins were quantified by PEA as previously described [17]. Aliquots of CSF and plasma from each subject were sent to Olink (Olink Proteomics, Uppsala, Sweden) for analysis on 13 human Olink Target 96 panels (cardiometabolic, cardiovascular II, cardiovascular III, cell regulation, development, immune response, inflammation, metabolism, neuro exploratory, neurology, oncology II, oncology III, and organ damage). All samples passed quality control measures and were randomized by Olink prior to analysis on single plates. Results were reported as Normalized Protein eXpression (NPX) values in log2 scale for relative quantification of protein abundance.
Quantification of proteins by SomaLogic SomaScan modified aptamers
Proteins were quantified by SomaScan as previously described [13, 14]. Aliquots of CSF and plasma from each subject were sent to SomaLogic (SomaLogic, Boulder, CO) for analysis using the modified aptamer SomaScan assay (v4.1). All samples passed quality control measures and were randomized by SomaLogic prior to analysis on single plates. Results were reported as relative fluorescence units (RFUs) for relative quantification of protein abundance.
CSF protein preparation and digestion for tandem mass tag mass spectrometry (TMT-MS) analysis
CSF undepleted of highly abundant plasma proteins
Equal volumes (50 μl of each sample) of CSF were digested with lysyl endopeptidase (LysC, Wako 125-05061) and trypsin (Thermo Fisher Scientific 90058). Briefly, each sample was reduced and alkylated with 1 μl of 0.5 M tris(2-carboxyethyl)phosphine (TCEP) and 5 μl of 0.4 M chloroacetamide (CAA) at 90°C for 10 min, followed by water bath sonication for 15 min. After the samples had cooled to room temperature, an equal volume of 8 M urea buffer [56 μl, 8 M urea in 10 mM Tris, 100 mM NaH2PO4 (pH 8.5)] was added to each sample, along with LysC (2.5 μg). After overnight digestion, 336 μl of 50 mM ammonium bicarbonate (ABC) was added to each sample to dilute the urea concentration to 1 M, along with trypsin (5 μg). After 12 h, the trypsin digestion was stopped by adding formic acid (FA) and trifluoroacetic acid (TFA) to final concentrations of 1% and 0.1%, respectively.
CSF depleted of highly abundant plasma proteins
To increase the depth of proteome coverage, immunodepletion of highly abundant proteins was performed as previously described [7]. For CSF samples, 130 μl was incubated with an equal volume (130 μl) of High Select Top14 Abundant Protein Depletion Resin (Thermo Fisher Scientific, A36372) at room temperature in centrifuge columns (Thermo Fisher Scientific, A89868). After 15 min of mixing with gentle rotation, the samples were centrifuged at 1000×g for 2 min. Sample flow-through was concentrated with a 3K Ultra Centrifugal Filter Device (Millipore, UFC500396) by centrifugation at 14,000×g for 30 min, and then the immunodepleted samples were diluted to equal volumes of 75 μl with phosphate-buffered saline. Immunodepleted CSF (60 μl) was then digested with LysC and trypsin. Briefly, the samples were reduced and alkylated with 1.2 μl of 0.5 M TCEP and 3 μl of 0.8 M CAA at 90°C for 10 min, followed by water bath sonication for 15 min. Samples were diluted with 193 μl of 8 M urea buffer [8 M urea in 10 mM Tris, 100 mM NaH2PO4 (pH 8.5)] to a final concentration of 6 M urea. LysC (4.5 μg) was used for overnight digestion at room temperature. Samples were then diluted to 1 M urea with 50 mM ABC. Trypsin (4.5 μg) was then added, and the samples were subsequently incubated for 12 h. The digestion was then stopped by adding FA and TFA to final concentrations of 1% and 0.1%, respectively.
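The urea concentrations quoted for the CSF digestions can be checked with simple dilution arithmetic. The sketch below is illustrative only: the function name urea_molarity is hypothetical, and it assumes that volumes are additive and that the sample, TCEP, CAA, and ABC contribute no urea.

```python
def urea_molarity(stock_molar, stock_ul, other_ul):
    """Urea concentration after mixing urea buffer with urea-free liquid.

    Assumes additive volumes, which is an idealization.
    """
    return stock_molar * stock_ul / (stock_ul + other_ul)

# Undepleted CSF, LysC step: 50 ul CSF + 1 ul TCEP + 5 ul CAA, plus 56 ul of 8 M urea buffer
undepleted_lysc = urea_molarity(8, 56, 50 + 1 + 5)

# Undepleted CSF, trypsin step: a further 336 ul of 50 mM ABC
undepleted_trypsin = urea_molarity(8, 56, 50 + 1 + 5 + 336)

# Depleted CSF, LysC step: 60 ul sample + 1.2 ul TCEP + 3 ul CAA, plus 193 ul of 8 M urea buffer
depleted_lysc = urea_molarity(8, 193, 60 + 1.2 + 3)

print(undepleted_lysc, undepleted_trypsin, round(depleted_lysc, 2))
```

Under these assumptions the trypsin step for undepleted CSF works out to 1 M urea and the LysC step for depleted CSF to about 6 M, matching the stated targets; the LysC step for undepleted CSF works out to 4 M, a value the text does not state explicitly.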
Plasma protein preparation and digestion for TMT-MS analysis
Plasma undepleted of highly abundant plasma proteins
Equal volumes (2 μl of each sample) of plasma were digested with LysC and trypsin. Briefly, each sample was diluted 10-fold with 50 mM ABC, followed by reduction and alkylation with 0.4 μl of 0.5 M TCEP and 2 μl of 0.4 M CAA with heating at 90°C for 10 min. The samples were sonicated for 15 min in a water bath to aid solubilization. Then, 8 M urea buffer [22.4 μl, 8 M urea, 10 mM Tris, 100 mM NaH2PO4 (pH 8.5)] was added to each sample after cooling to room temperature, along with LysC (10 μg). After overnight digestion, 134.4 μl of 50 mM ABC was added to each sample to dilute the urea concentration to 1 M, along with trypsin (20 μg). After 12 h, the trypsin digestion was stopped by adding FA and TFA to final concentrations of 1% and 0.1%, respectively.
Plasma depleted of highly abundant plasma proteins
The High Select Top14 Abundant Protein Depletion Resin was also utilized for plasma samples prior to digestion. Following mixing, 500 μl of resin was aliquoted into each spin column. After the resin settled to the bottom of the spin column, 8 μL of each sample was added and depletion was performed by gentle rotation for 15 min at room temperature, followed by centrifugation at 1000×g for 2 min. Sample flow-through was concentrated with a 3K Ultra Centrifugal Filter Device by centrifugation at 14,000×g for 30 min. Immunodepleted samples were diluted to equal volumes of 75 μl with phosphate-buffered saline. Immunodepleted plasma (60 μl) was then digested with LysC and trypsin using the same protocol used for CSF depleted samples.
Isobaric TMT peptide labeling
Before TMT labeling, the digested peptides were desalted using 50 mg Sep-Pak C18 columns (Waters). Briefly, the columns were activated with 1 mL of methanol, then equilibrated with 2 × 1 mL 0.1% TFA. The acidified samples were loaded, followed by washing with 2 × 1 mL 0.1% TFA. Elution was performed with 1 mL 50% acetonitrile. To normalize protein quantification across batches, global internal standard (GIS) samples were generated for each sample set by combining 100 μl aliquots from each sample elution. All individual samples and GIS pooled standards were dried by speed vacuum (Labconco).
Both depleted CSF and depleted plasma samples were divided into five TMT batches, labeled using an 11-plex TMT kit (Thermo Fisher Scientific, A34808, lot number for TMT 10-plex: SI258088, 131C channel SJ258847), and derivatized as previously described [7]. For the sample and channel distribution, please see Additional file 2: Supplementary Table 1. Five milligrams of each channel reagent was dissolved in 256 μL anhydrous acetonitrile. Each peptide sample was resuspended in 50 μl of 100 mM triethylammonium bicarbonate (TEAB) buffer, and 20.5 μl of TMT reagent solution was subsequently added. After 1 h, the reaction was quenched with 4 μl of 5% hydroxylamine (Thermo Fisher Scientific, 90115) for 15 min. The peptide solutions were then combined according to the batch arrangement. Each TMT sample was desalted with 100 mg of Sep-Pak C18 columns and dried by speed vacuum. Notably, there were 9 TMT channels used for depleted CSF samples with one GIS sample on channel 127N, whereas 10 channels were used for depleted plasma samples with two GIS samples included on both 127C and 131C channels. Channel 126 was left empty on both sample sets.
For undepleted CSF and plasma samples, the TMT 16-plex kit (Thermo Fisher Scientific, A44520, lot number VH311511) was used for labeling, which divided both CSF and plasma sample sets into 3 TMT batches with 12 samples plus 1 GIS in each batch. The sample and channel distribution were the same for CSF and plasma samples (Additional file 2: Supplementary Table 1). Five milligrams of each channel reagent was dissolved in 200 μL anhydrous acetonitrile. Each CSF peptide sample was resuspended in 50 μl of 100 mM TEAB buffer, and 10 μl of TMT reagent solution was subsequently added. For plasma samples, each peptide sample was resuspended in 150 μl of 100 mM TEAB buffer, and 30 μl of TMT reagent solution was subsequently added. The labeling was stopped after 1 h with 4 μl of 5% hydroxylamine for CSF and 12 μl of 5% hydroxylamine for plasma, and the peptide solutions were then combined according to the batch arrangement. The combined TMT samples were desalted with 100 mg of Sep-Pak C18 columns except for each undepleted plasma TMT sample, which was split and desalted using 2 × 100 mg of Sep-Pak C18 columns. The elutions were dried under speed vacuum.
High-pH off-line fractionation
CSF and plasma undepleted of highly abundant plasma proteins
Dried samples were resuspended in high pH loading buffer (0.07% vol/vol NH4OH, 0.045% vol/vol FA, 2% vol/vol ACN) and loaded onto a Waters BEH column (2.1 mm × 150 mm with 1.7 μm particles). A Vanquish UPLC system (Thermo Fisher Scientific) was used to carry out the fractionation. Solvent A consisted of 0.0175% (vol/vol) NH4OH, 0.01125% (vol/vol) FA, and 2% (vol/vol) ACN; solvent B consisted of 0.0175% (vol/vol) NH4OH, 0.01125% (vol/vol) FA, and 90% (vol/vol) ACN. The sample elution was performed over a 25-min gradient from 0 to 50% solvent B at a flow rate of 0.6 mL/min. A total of 192 individual equal volume fractions were collected across the gradient. Fractions were concatenated to either 48 or 96 fractions and dried to completeness using vacuum centrifugation.
CSF and plasma depleted of highly abundant plasma proteins
Dried samples were resuspended in high pH loading buffer (0.07% vol/vol NH4OH, 0.045% vol/vol FA, 2% vol/vol ACN) and loaded onto an Agilent ZORBAX 300 Extend-C18 column (2.1 mm × 150 mm with 3.5 μm beads). An Agilent 1100 HPLC system was used to carry out the fractionation. Solvent A consisted of 0.0175% (vol/vol) NH4OH, 0.01125% (vol/vol) FA, and 2% (vol/vol) ACN; solvent B consisted of 0.0175% (vol/vol) NH4OH, 0.01125% (vol/vol) FA, and 90% (vol/vol) ACN. The sample elution was performed over a 60 min gradient with a flow rate of 0.4 mL/min with a gradient from 0 to 60% solvent B. A total of 96 individual equal volume fractions were collected across the gradient and subsequently pooled by concatenation into 30 fractions and dried to completeness under vacuum centrifugation.
TMT mass spectrometry
CSF undepleted of highly abundant plasma proteins
For batch 1, all samples (~1 μg) were loaded and eluted using a Dionex Ultimate 3000 RSLCnano (Thermo Fisher Scientific) on an in-house packed 25 cm, 100 μm internal diameter (i.d.) capillary column with 1.9 μm Reprosil-Pur C18 beads (Dr. Maisch, Ammerbuch, Germany) over a 60 min gradient. Mass spectrometry was performed with a high-field asymmetric waveform ion mobility spectrometry (FAIMS) Pro-equipped Orbitrap Eclipse (Thermo Fisher Scientific) in positive ion mode using data-dependent acquisition with 2-s top speed cycles. Each cycle consisted of one full MS scan followed by as many MS/MS events as could fit within the given 2-s cycle time limit. MS scans were collected at a resolution of 120,000 (410–1600 m/z range, 4×10^5 AGC, 50 ms maximum ion injection time, FAIMS compensation voltage of −50 and −70). All higher-energy collisional dissociation (HCD) MS/MS spectra were acquired at a resolution of 30,000 (0.7 m/z isolation width, 35% collision energy, 1.25×10^5 AGC target, 54 ms maximum ion time, TurboTMT on). Dynamic exclusion was set to exclude previously sequenced peaks for 30 s within a 10-ppm isolation window. For batches 2 and 3, samples were eluted over a 21-min gradient. Mass spectrometry was performed the same as for batch 1 except with a FAIMS compensation voltage of −45, and dynamic exclusion set to exclude previously sequenced peaks for 6 s within a 10-ppm isolation window.
Plasma undepleted of highly abundant plasma proteins
Mass spectrometry was performed the same as for CSF undepleted batches 2 and 3 except FAIMS compensation voltage was set at −40 and −60, and dynamic exclusion was set to exclude previously sequenced peaks for 20 s within a 10-ppm isolation window.
CSF and plasma depleted of highly abundant plasma proteins
All fractions (~1 μg) were loaded and eluted using an Easy-nLC 1200 (Thermo Fisher Scientific) on an in-house packed 30 cm, 750 μm i.d. capillary column with 1.9 μm Reprosil-Pur C18 beads over a 120-min gradient. Mass spectrometry was performed with a Q-Exactive HFX (Thermo Fisher Scientific) in positive ion mode using data-dependent acquisition with a top 10 method. Each cycle consisted of one full MS scan followed by 10 MS/MS events. MS scans were collected at a resolution of 120,000 (400–1600 m/z range, 3×10^6 AGC, 100 ms maximum ion injection time). All higher-energy collisional dissociation (HCD) MS/MS spectra were acquired at a resolution of 45,000 (1.6 m/z isolation width, 35% collision energy, 1×10^5 AGC target, 86 ms maximum ion time). Dynamic exclusion was set to exclude previously sequenced peaks for 20 s within a 10-ppm isolation window.
Database searches and protein quantification
All raw files were searched using Proteome Discoverer (version 2.4.1.15, Thermo Fisher Scientific) with Sequest HT. The spectra were searched against a human UniProt database downloaded April 2015 (90,411 target sequences). We used this database for consistency with our prior brain proteomics study [2]. Search parameters included a 20 ppm precursor mass window, a 0.05 Da product mass window, dynamic modifications for oxidized methionine (+15.995 Da), deamidated asparagine and glutamine (+0.984 Da), and phosphorylated serine, threonine, and tyrosine (+79.966 Da), and static modifications for carbamidomethyl cysteines (+57.021 Da) and N-terminal and lysine-tagged TMT (+229.163 or +304.207 Da depending on the dataset). Percolator was used to filter peptide spectral matches (PSMs) to 1% FDR. Peptides were grouped using strict parsimony, and only razor and unique (i.e., parsimonious) peptides were used for protein-level quantification. Reporter ions were quantified from MS2 scans using an integration tolerance of 20 ppm with the most confident centroid setting.
Protein abundance data processing
Tandem mass tag mass spectrometry (TMT-MS)
Only proteins that were identified and summarized as high confidence (<1% FDR) by Proteome Discoverer (PD) were used for analysis. The 3730 UniProt protein identifier accessions provided by PD were further annotated with HUGO Gene Nomenclature Committee (HGNC) official gene symbols. TMT-MS data were processed identically for both CSF and plasma, including fluid depleted of highly abundant proteins. TMT reporter intensities (abundances) that had not undergone normalization by PD were used for analysis to preserve inherent protein abundance differences between control and AD subjects. Four separate datasets were used for analysis: CSF undepleted, CSF depleted, plasma undepleted, and plasma depleted of highly abundant proteins. For each dataset, batch correction was performed by dividing abundances for each protein within each batch by the global internal standard (GIS). GIS measurements were then removed, and proteins with more than 75% (n=27/36) missing values were excluded from consideration. The number of remaining protein isoforms after missing value control was 1128 in undepleted plasma, 2229 in undepleted CSF, 1385 in plasma depleted of highly abundant proteins, and 2944 in CSF depleted of highly abundant proteins.
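The GIS batch correction and missing-value filter described above can be sketched as follows. This is an illustration rather than the pipeline actually used, which is not given in the source. The data layout (one dict per batch, keyed by channel name, with a single channel named "GIS") and the function names gis_normalize and proteins_passing_filter are hypothetical; batches with two GIS channels would need an extra rule, such as averaging, that the source does not specify.

```python
def gis_normalize(batch, gis_key="GIS"):
    """Divide each protein abundance by the same-batch GIS abundance.

    batch maps channel name -> {protein: abundance or None}.
    The GIS channel is dropped from the returned result.
    """
    gis = batch[gis_key]
    normalized = {}
    for sample, values in batch.items():
        if sample == gis_key:
            continue
        normalized[sample] = {
            prot: (val / gis[prot]) if val is not None and gis.get(prot) else None
            for prot, val in values.items()
        }
    return normalized

def proteins_passing_filter(samples, max_missing_frac=0.75):
    """Keep proteins missing in at most max_missing_frac of samples."""
    n = len(samples)
    proteins = set().union(*(vals.keys() for vals in samples.values()))
    keep = set()
    for prot in proteins:
        missing = sum(1 for vals in samples.values() if vals.get(prot) is None)
        if missing / n <= max_missing_frac:
            keep.add(prot)
    return keep

# Toy example with two batches and two proteins
batch1 = {"GIS": {"A": 2.0, "B": 4.0}, "s1": {"A": 4.0, "B": None}, "s2": {"A": 1.0, "B": None}}
batch2 = {"GIS": {"A": 5.0, "B": 1.0}, "s3": {"A": 5.0, "B": None}, "s4": {"A": None, "B": None}}

combined = {}
for b in (batch1, batch2):
    combined.update(gis_normalize(b))
print(combined["s1"]["A"], sorted(proteins_passing_filter(combined)))
```

Because the filter excludes proteins with more than 75% missing values, a protein missing in exactly 27 of 36 samples is retained in this sketch.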
Olink proximity extension assay (PEA) and SomaLogic SomaScan assay
Olink NPX values were analyzed using the OlinkAnalyze R package v1.2.1. NPX values that were flagged with quality control (QC) warnings were removed from further consideration. SomaScan RFU data and assay metadata for CSF and plasma were analyzed using the SomaDataIO R package v3.1.0. Olink NPX and SomaLogic SomaScan data included blank buffer replicate measurements (noise). Buffer measurements were used to calculate signal-to-noise (S:N) ratios for both unnormalized NPX and RFU abundance data. S:N ratios were calculated by subtracting the within-assay median buffer signal from the unlogged assay signal (2^NPX or RFU), then dividing by the median buffer signal. Protein assay-specific limit of detection (LOD) was defined as median log2 buffer signal plus 3 standard deviations (SD) of the assay’s buffer measurements. NPX background signal SD for PEA was defined from historically recorded background variance of the assays and included as a component of predetermined LOD, whereas SomaScan background SD was calculated from available buffer replicate data for the assays performed. Sample measurements in CSF or plasma that were below LOD were retained but considered as missing values. Olink repeated measurements of the same UniProt protein assayed in different panels (N=36 duplicated assays) were reduced to their representative single best replicate of the assay in one of the panels based on criteria including the highest pairwise correlation to other replicate assays, highest signal, and largest dynamic range.
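The S:N and LOD definitions above translate directly into a few lines of code. The sketch below is illustrative: the function names are hypothetical, and it assumes that the SD in the LOD definition is taken on the log2 scale of the buffer replicates, which the text does not state explicitly. For Olink assays the LOD was predetermined, so lod_log2 applies only to the SomaScan case.

```python
import math
from statistics import median, stdev

def signal_to_noise(signal, buffer_signals):
    """(signal - median buffer) / median buffer, all on the unlogged scale.

    For Olink data, pass 2 ** NPX as the signal and unlogged buffer values.
    """
    noise = median(buffer_signals)
    return (signal - noise) / noise

def lod_log2(buffer_signals):
    """Median log2 buffer signal plus 3 SD of the log2 buffer replicates.

    Taking the SD on the log2 scale is an assumption.
    """
    logged = [math.log2(b) for b in buffer_signals]
    return median(logged) + 3 * stdev(logged)

# Toy SomaScan-style example: three buffer replicates and one sample signal (RFU)
buffers = [100.0, 110.0, 90.0]
sample_rfu = 150.0
print(signal_to_noise(sample_rfu, buffers))
print(math.log2(sample_rfu) < lod_log2(buffers))  # True means below LOD
```

In the toy example the sample signal is 50% above the median buffer signal, giving an S:N of 0.5.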
Both platform assays underwent a first-pass filter allowing up to 75% (n=27/36) missing values. Because SomaScan CSF data had low signal, a second-pass filter step was applied to remove assays that did not meet an empirically derived S:N threshold. This threshold was determined by correlating SomaScan assays with Olink and TMT-MS undepleted assays at varying S:N cutoff values (0, 0.15, 0.25, 0.3, 0.35, 0.4, 0.45, 0.50, 0.625, 0.75, 1, 2, 4, and 8), and selecting the S:N value (≥0.45) that maximized median Pearson correlation with the other platforms. After application of first- and second-pass filters and removal of control aptamers, 3594 CSF and 7284 plasma human SomaScan assays were kept for subsequent analyses. For Olink assays, after applying the first-pass filter, 902 CSF and 1140 plasma proteins were kept for subsequent analyses.
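The empirical choice of the S:N cutoff can be pictured as a sweep over candidate values. The sketch below is a simplified illustration and not the authors' procedure: it assumes that each SomaScan assay already has an S:N ratio and a precomputed Pearson correlation with its matched Olink or TMT-MS assay, and the function name pick_sn_cutoff and the tie-breaking rule (prefer the lowest cutoff reaching the maximum) are hypothetical. The source does not say how ties, or the trade-off against the number of retained assays, were handled.

```python
from statistics import median

# Candidate cutoffs as listed in the text
CUTOFFS = [0, 0.15, 0.25, 0.3, 0.35, 0.4, 0.45, 0.50, 0.625, 0.75, 1, 2, 4, 8]

def pick_sn_cutoff(assays, cutoffs=CUTOFFS, min_assays=1):
    """Return (cutoff, median_r) maximizing median cross-platform correlation.

    assays is a list of (sn_ratio, cross_platform_r) tuples.
    Cutoffs that retain fewer than min_assays assays are skipped.
    """
    best = None
    for cutoff in cutoffs:
        kept = [r for sn, r in assays if sn >= cutoff]
        if len(kept) < min_assays:
            continue
        med = median(kept)
        # Strict comparison keeps the lowest cutoff among ties
        if best is None or med > best[1]:
            best = (cutoff, med)
    return best

# Toy data: low-S:N assays correlate poorly with the other platforms
toy = [(0.1, 0.1), (0.2, 0.2), (0.5, 0.8), (0.6, 0.7), (3.0, 0.75)]
print(pick_sn_cutoff(toy))
```

In practice a minimum number of retained assays (the min_assays parameter here) would likely be needed so that very high cutoffs supported by only a handful of assays are not selected.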
Proteome coverage overlap, ontology enrichment, and missing data analysis
Unique gene symbols measured in each platform were counted, and overlap was visualized using the venneuler R package (v1.1-0) venneuler function. Enrichment of gene ontologies (GO) in different Venn categories was calculated as Fisher’s exact test p value transformed to z score using GO-Elite (v1.2.5) and visualized using a custom in-house R script. The same procedure was used to determine ontology enrichment for network modules. Missing data (Additional file 1: Supplementary Figure 2) in Olink and SomaScan included assays flagged by QC warnings, below LOD, and truly missing measurements. Missing data in TMT-MS was considered at the level of batch, as all measurements within a batch result from the same MS/MS fragmentation.
Censoring of proteins affected by depletion of highly abundant proteins
Proteins considered for analysis were those measured by TMT-MS before and after depletion of highly abundant proteins that had the same UniProt accession, at least 9 paired abundance measurements, and at least 3 measurements per case status group (AD or control; n=1932 CSF proteins, n=852 plasma proteins). Pairwise measurements were correlated using Pearson correlation on the difference in abundance between AD versus control subjects. Proteins that were discordant in their differential abundance, as well as proteins with negative Pearson rho across depleted and undepleted matched protein measurements across the 36 case samples, were considered significantly affected by depletion. In total, 32 proteins in CSF and 27 in plasma were censored from the TMT-MS depleted data due to effects of depletion on their abundance levels (Additional file 2: Supplementary Table 3).
Protein abundance correlation analysis
Proteins measured in common across two platforms within the same biofluid were correlated across all samples using the corAndPvalue function in the WGCNA R package (v1.69) (Additional file 2: Supplementary Tables 12-17, Additional file 1: Extended Data). In the case of multiple SomaScan assays for the same protein, the assay with the identical UniProt protein accession, or secondarily, a SOMAmer measuring an identical gene product, was selected. When multiple cross-platform UniProt accession or gene symbol matches occurred, the SOMAmer with the highest correlation was selected. We constructed a population histogram of all Pearson correlations for distinct gene products or UniProt accessions (representing distinct protein isoforms) and identified the median rho for each population of paired measurements between two platforms (Fig. 3).
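The SOMAmer selection rule can be illustrated with a short Python sketch. The record layout and function names are hypothetical; the logic follows the order stated above: prefer assays with an identical UniProt accession, fall back to assays for the same gene product, and break ties by the highest correlation with the other platform.

```python
def pearson(x, y):
    """Plain Pearson correlation for two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def select_somamer(candidates, accession, symbol, reference):
    """Pick one SomaScan assay to pair with a protein from another platform.

    candidates: list of dicts with hypothetical keys
        'id', 'accession', 'symbol', 'values'
    reference: the other platform's measurements in the same sample order
    Returns (assay_id, rho), or None when nothing matches.
    """
    by_accession = [c for c in candidates if c["accession"] == accession]
    by_symbol = [c for c in candidates if c["symbol"] == symbol]
    pool = by_accession or by_symbol  # accession matches take precedence
    if not pool:
        return None
    scored = [(pearson(c["values"], reference), c["id"]) for c in pool]
    rho, assay_id = max(scored)
    return assay_id, rho
```

The median of the selected rho values over all paired proteins then gives the per-platform-pair summary shown in the histograms.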
Cumulative signal and total protein abundance comparison
Mean NPX, RFU, or ion intensity signal without prior normalization or log transformation for each protein across the 36 samples was ranked for each platform and biofluid from highest to lowest abundance. Curves of incremental median cumulative abundance were constructed for AD and control groups (Additional file 1: Supplementary Figure 8, left uncalibrated panels). Absolute protein abundance differences in plasma were also assessed by calibrating relative platform signals to absolute plasma protein concentrations (n=4226) as provided in the Human Protein Atlas (HPA) on March 5, 2022 (https://www.proteinatlas.org/search). For the calibrated abundance calculations, unlogged abundance (signal minus buffer median, including positive values below LOD) was calibrated so that the geometric mean of control group measurements was set to the known absolute blood concentration with all individual measurements varying relative to this value. Missing values were considered as one-half the minimum assayed nonzero signal for geometric mean calculations. When multiple assays for a gene product were available within a platform, the assay with maximum mean signal was selected for calibration. HPA-calibrated values were plotted for all 4226 proteins regardless of presence in the platform as a ranked absolute abundance curve (non-cumulative, black trace in Additional file 1: Supplementary Figure 8B, D, F, and H). Then, the cumulative log10-transformed abundance of all lesser abundant ranked proteins up to each represented rank of any protein measured within the platform was plotted as the median such value for AD or control (Additional file 1: Supplementary Figure 8). In contrast to the uncalibrated cumulative abundance curves, the value at each point in these left-truncated cumulative sum curves represents the sum (cumulative) abundance of only lesser abundant proteins up to that rank, and not those ranked with higher abundance.
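The calibration step can be sketched as below. This Python example assumes that the half-minimum imputation is taken within each assay and is used only when computing the control geometric mean, so missing measurements stay missing in the output; both choices, along with the function name, are assumptions for illustration.

```python
from math import exp, log

def calibrate_to_reference(signals, is_control, reference_conc):
    """Rescale one assay so the control geometric mean equals reference_conc.

    signals: unlogged signal per sample; None (or non-positive) means missing
    is_control: list of booleans marking control samples
    reference_conc: known absolute concentration for this protein
    """
    observed = [s for s in signals if s is not None and s > 0]
    fill = 0.5 * min(observed)  # half of the minimum nonzero signal
    filled = [s if (s is not None and s > 0) else fill for s in signals]
    controls = [v for v, flag in zip(filled, is_control) if flag]
    geo_mean = exp(sum(log(v) for v in controls) / len(controls))
    scale = reference_conc / geo_mean
    # Individual values vary around the reference in proportion to their signal.
    return [s * scale if (s is not None and s > 0) else None for s in signals]
```

With signals of 2, 4, and a missing value in controls plus 8 in an AD sample, the control geometric mean is 2 after imputation, so a reference concentration of 100 rescales the observed values to 100, 200, and 400.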
Differential expression analysis
Differences between AD and control were assessed on the log2(abundance) measurements over all proteins after data processing as described above, which included signal cleanup, filtering on missingness, removal of proteins affected by top-14 highly abundant protein depletion, and, in the case of SomaScan CSF data, control of excessively low S:N assays. Volcano plots were made using a custom in-house script via the plotly (v4.9.2.1) R package function ggplotly. Individual volcano points were colored by membership in the 44 brain network modules described in Johnson et al. [2].
Comparison to external datasets
External Olink datasets used for correlation included plasma AD effect sizes from a Hong Kong-based cohort provided in Jiang et al. [23] Appendix Table 1, from the BioFinder cohort described in Whelan et al. [24] Table S19, and from the Accelerating Medicines Partnership for Parkinson’s Disease (AMP-PD) 2021 v2-5 (May 10) release. The external SomaScan dataset was obtained from the ANMerge version of the AddNeuroMed study data as described in Birkenbihl et al. [25]. Only sample data collected at the last visit in the AMP-PD and ANMerge datasets were used for correlations.
Data provided in Jiang et al. and Whelan et al. was used directly for correlation without additional processing. AMP-PD Olink raw data from 212 study participants and all four 384-assay panels available (Cardiometabolic, Inflammation, Neurology, and Oncology) was loaded and processed in the same fashion as Emory Olink data. Four participants with a diagnosis of multiple system atrophy were excluded. The final number of subjects analyzed was 118 PD and 90 control. As described above, values below LOD were censored as missing but otherwise retained for correlations, and proteins duplicated in the different panels were reduced to one representative assay. Final assay numbers included 1054 CSF and 1398 plasma assays. Log2 fold change values that remained significant after FDR correction were correlated with log2 fold change values in the current study using Pearson correlation and Student’s p values as implemented in the WGCNA package verboseScatterplot function.
Harmonization of platform protein abundance prior to network analysis
TMT-MS ion counts in fluid depleted of highly abundant proteins, SomaScan RFUs, and Olink unlogged NPX values, totaling 9589 assays in plasma and 7158 assays in CSF, were assembled for the 35 case samples commonly measured on all three platforms. Only truly missing values were considered as unavailable; values below LOD or those subject to S:N threshold-based filtering were retained. Data were transposed prior to removal of platform-specific effects as a batch effect using the TAMPOR algorithm. TAMPOR has been described previously for removal of nuisance batch effects in proteomic data and harmonization of different cohorts [2, 3]. As applied to TMT data, TAMPOR removes batch effects from TMT reporter abundance by using a ratio of TMT reporter signal over the GIS signal within batch. Because the GIS represents the all-sample average, the ratios tend to be near unity, and log2(abundance/GIS) tends to be zero. TAMPOR further enforces the central tendency towards zero for both proteins and sample medians by performing a median polish of the log2 ratios [47]. Output from this step is centered at a log2(abundance) of zero for both proteins and samples (rows and columns). TAMPOR retains the relative protein abundances as row-wise medians, which can be inserted back into the data after completion of TAMPOR to restore the data to the original form as that used as input for median polish. This alternative version of the data (“clean relative abundance”) is all positive, in the same form as TMT reporter intensity or abundance.
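The median polish at the core of this step can be illustrated with a generic version of Tukey's procedure. The sketch below is not the TAMPOR implementation (which is available from the authors' repository and includes the GIS ratio step and handling of missing values); it only shows how alternating row and column median subtraction centers a complete log2-ratio matrix at zero in both directions.

```python
from statistics import median

def median_polish(matrix, max_iter=50, tol=1e-9):
    """Alternately subtract row and column medians until both are ~zero.

    matrix: list of rows (e.g., proteins) of log2 ratios, no missing values
    Returns the residual matrix, whose row and column medians are
    approximately zero once the loop has converged.
    """
    resid = [row[:] for row in matrix]
    for _ in range(max_iter):
        row_meds = [median(row) for row in resid]
        resid = [[v - m for v in row] for row, m in zip(resid, row_meds)]
        col_meds = [median(col) for col in zip(*resid)]
        resid = [[v - m for v, m in zip(row, col_meds)] for row in resid]
        if max(abs(m) for m in row_meds + col_meds) < tol:
            break  # nothing left to remove in either direction
    return resid
```

Because medians are robust, a single aberrant measurement stays in the residuals rather than shifting the row or column it belongs to, which is the property that makes the polish useful for removing batch or platform offsets.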
To harmonize measurements across different proteomic platforms, we employed an adaptation of the TAMPOR algorithm (termed “transposed” TAMPOR) applied to relative abundance. SomaScan RFU, Olink 2^NPX, and TMT reporter abundance ratios were harmonized by considering common proteins across the different platforms as anchoring measurements. The central tendencies (intra-platform) of these common protein measurements are considered as robust as GIS measures within platform and are then normalized across platforms with the same median polish algorithm. The algorithm is therefore applied to samples as though they were proteins, and the proteins as though they were samples, in contrast to the standard TAMPOR algorithm described for batch correction in TMT data.
Common proteins measured across all three platforms were used as the GIS (n=101 in plasma and n=201 in CSF) to calculate the central tendency of data within and across platforms used for the denominators in the TAMPOR algorithm, as previously described [2]. Normalized data used in subsequent network analyses was of the form log2(abundance/central tendency) of the common proteins in all platforms. No missing values were present in the protein abundances used for network construction.
Protein co-expression network analysis
Networks for CSF and plasma were constructed using the harmonized protein abundances for each biofluid. The Weighted Correlation Network Analysis (WGCNA) algorithm (v1.69) was used for network generation. No outliers were detected using the WGCNA sample network connectivity outlier algorithm. The WGCNA blockwiseModules function was run on the CSF and plasma harmonized abundances with the following parameters: power=6.5 (CSF) or 11 (plasma), deepSplit=4, minModuleSize=10, mergeCutHeight=0.07, TOMdenom=“mean”, bicor correlation, signed network type, PAM staging and PAM respects dendro as TRUE, and a maxBlockSize larger than the total number of protein assays. Module memberships were then iteratively reassigned to enforce kME table consistency, as previously described [3]. The resulting network assignments were visualized as modules using the igraph (v1.2.5) package. Module eigenprotein correlations and significance were visualized in circular heatmaps using the circlize (v0.4.10), dendextend (v1.13.4), and dendsort (v0.3.3) R packages. Synthetic eigenproteins for each network (CSF, plasma, and brain [2]) were calculated as previously described [3]. For synthetic eigenproteins translated either from or to the brain, the existing data for 8619 proteins underlying the brain network were mapped to labels in the biofluid network using a mapping rubric to cross-reference protein labels. Specifically, (1) an exact UniProt ID match to that in labels of the form Symbol|UniprotID|platform|biofluid took precedence for labels with MS as the platform, followed by (2) a symbol match with MS as the platform. This was followed by (3) an exact UniProt ID match to an Olink row in a biofluid dataset, then (4) an exact UniProt ID match to a SomaScan row, then (5) a symbol match with Olink as the platform, and finally (6) a symbol match with SomaScan as the platform. In this way, unmatched proteins across pairs of networks were minimized. The same 6-point rubric was used for matching (relabeling) brain network member labels before performing module preservation (below).
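The 6-point rubric amounts to an ordered list of (match field, platform) rules tried in sequence. A minimal Python sketch follows; the platform strings ("MS", "Olink", "SomaScan"), the function name, and the placeholder identifiers in the usage example are assumptions, since the exact label vocabulary is not given in the text.

```python
# Rules in order of precedence: (field to match, platform of the label).
RUBRIC = [
    ("uniprot", "MS"),
    ("symbol", "MS"),
    ("uniprot", "Olink"),
    ("uniprot", "SomaScan"),
    ("symbol", "Olink"),
    ("symbol", "SomaScan"),
]

def match_label(symbol, uniprot, labels):
    """Return the first biofluid label matching a brain protein, else None.

    labels: strings of the form Symbol|UniprotID|platform|biofluid
    """
    fields = ("symbol", "uniprot", "platform", "biofluid")
    parsed = [dict(zip(fields, label.split("|")), label=label) for label in labels]
    query = {"symbol": symbol, "uniprot": uniprot}
    for field, platform in RUBRIC:
        for entry in parsed:
            if entry["platform"] == platform and entry[field] == query[field]:
                return entry["label"]
    return None
```

For example, given the placeholder labels `GENE1|P00001|SomaScan|CSF` and `GENE1|P00001-2|MS|CSF`, a query for `GENE1` / `P00001` resolves to the MS label through rule 2 (symbol match on MS), because MS rules are tried before any SomaScan rule.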
Network module overlap
Over-representation analysis (ORA) of module gene symbols between networks was determined using two-tailed Fisher’s exact test, followed by correction of p values for multiple testing using the Benjamini-Hochberg method. The plasma network was compared to a SomaScan human serum network constructed from 4137 proteins as described in Emilsson et al. [32]. Serum network assignments in 27 modules plus grey were curated from Table S7 in Emilsson et al. [32]. Overlap of module gene symbols between the two networks was determined as described above. Overlap was visualized using a custom in-house script.
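The two statistical steps named above can be sketched in a few lines of Python. The two-tailed Fisher test here sums the probabilities of all 2x2 tables that are no more likely than the observed one (the convention used by R's fisher.test), and the adjustment is the standard Benjamini-Hochberg step-up procedure. Function names and the relative tolerance used for comparing probabilities are assumptions; this is an illustration, not the authors' script.

```python
from math import comb

def fisher_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact p value for the table [[a, b], [c, d]].

    For module overlap: a = symbols in both modules, b and c = symbols in
    only one of them, d = background symbols in neither.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def pmf(x):
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = pmf(a)
    total = sum(p for p in map(pmf, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted
```

In a module-by-module comparison, one p value is computed per pair of modules and the full set is then passed through `benjamini_hochberg` before plotting.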
Network preservation
Pairwise, directional preservation between CSF and plasma, plasma and CSF, and brain to each of the biofluid networks and vice versa was performed using the WGCNA (v1.69) modulePreservation function with 500 permutations after harmonizing protein assay labels as described above. The Zsummary composite z score for 8 underlying network parameters was calculated and visualized by circular heatmap as significance (minus log10 of the Benjamini-Hochberg adjusted p values) corresponding to the Zsummary scores obtained.
Cell type marker enrichment analyses
Cell type-specific enriched marker gene symbol lists were used as previously published to perform Fisher’s exact one-tailed test for enrichment [2]. Benjamini-Hochberg correction was applied to all resulting p values.
Other statistics
All statistical analyses were performed in R (v4.0.2). Boxplots represent the median, 25th, and 75th percentile extremes; thus, hinges of a box represent the interquartile range of the two middle quartiles of data within a group. The farthest data points up to 1.5 times the interquartile range away from box hinges define the extent of whiskers (error bars). Correlations were performed using the biweight midcorrelation function as implemented in the WGCNA R package or Pearson correlation. Comparisons between two groups were performed by a two-sided t test. Comparisons among three or more groups were performed with Kruskal-Wallis nonparametric ANOVA or standard ANOVA with Tukey or post hoc pairwise comparison of significance. P values were adjusted for multiple comparisons by false discovery rate (FDR) correction according to the Benjamini-Hochberg method where indicated. Z score conversion of normalized protein data and normalized protein eigenproteins or synthetic eigenproteins were calculated as fold of standard deviation from the mean. At n=36, we were powered to detect a correlation rho of 0.45 at 80% power and p=0.05. For pairwise comparisons between AD and control groups considering all proteins measured, we were powered to detect a fold change of ~1.2 to ~1.3 depending on platform and fluid based on the power calculation method described by Bi and Liu [48].
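The stated power for correlation can be checked approximately with the Fisher z transformation. The sketch below is a back-of-the-envelope Python calculation under that approximation with a two-sided test; it is not necessarily the method the authors used, and the function name is an assumption.

```python
from math import sqrt, tanh
from statistics import NormalDist

def min_detectable_rho(n, alpha=0.05, power=0.80):
    """Smallest correlation detectable with n samples (Fisher z approximation).

    Uses atanh(rho) * sqrt(n - 3) ~ Normal(mean, 1) and a two-sided test.
    """
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = normal.inv_cdf(power)           # quantile for the desired power
    return tanh((z_alpha + z_beta) / sqrt(n - 3))
```

With n=36, alpha=0.05, and 80% power, this approximation gives a minimum detectable rho of about 0.45, consistent with the value reported above.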
Acknowledgements
We are grateful to those who agreed to donate their CSF and blood for this study. The authors would like to thank SomaLogic and Olink research customer support teams for their consultations on data analysis.
Authors’ contributions
ECBJ, EBD, LP, DMD, and JJL designed the experiments; LP and DMD carried out experiments; EBD and ECBJ analyzed data; LP, DMD, ESM, NTS, JJL, and AIL provided advice on the interpretation of data; ECBJ wrote the manuscript with input from coauthors. All authors read and approved the final manuscript.
Funding
This study was supported by the following National Institutes of Health funding mechanisms: K08AG068604, P30AG066511, and U54AG065187.
Availability of data and materials
Raw data, case traits, and analyses related to this manuscript are available at https://www.synapse.org/3platformEmory. Code in the research compendium for the current study is also available from https://www.synapse.org/3platformEmory. The algorithm used for batch correction is fully documented and available as an R function, which can be downloaded from https://github.com/edammer/TAMPOR. Data used in the preparation of this article were obtained from the Accelerating Medicine Partnership® (AMP®) Parkinson’s Disease (AMP-PD) Knowledge Platform. For up-to-date information on the study, visit https://www.amp-pd.org. The AMP® PD program is a public-private partnership managed by the Foundation for the National Institutes of Health and funded by the National Institute of Neurological Disorders and Stroke (NINDS) in partnership with the Aligning Science Across Parkinson’s (ASAP) initiative; Celgene Corporation, a subsidiary of Bristol Myers Squibb Company; GlaxoSmithKline plc (GSK); The Michael J. Fox Foundation for Parkinson’s Research; Pfizer Inc.; Sanofi US Services Inc.; and Verily Life Sciences. AMP-PD clinical data and biosamples used in preparation of this article were obtained from the Parkinson’s Progression Markers Initiative (PPMI) and the NINDS Parkinson’s Disease Biomarkers Program (PDBP). PPMI is sponsored by The Michael J. Fox Foundation for Parkinson’s Research and supported by a consortium of scientific partners: 4D Pharma, AbbVie Inc., AcureX Therapeutics, Allergan, Amathus Therapeutics, Aligning Science Across Parkinson’s (ASAP), Avid Radiopharmaceuticals, Bial Biotech, Biogen, BioLegend, Bristol Myers Squibb, Calico Life Sciences LLC, Celgene Corporation, DaCapo Brainscience, Denali Therapeutics, The Edmond J. Safra Foundation, Eli Lilly and Company, GE Healthcare, GlaxoSmithKline, Golub Capital, Handl Therapeutics, Insitro, Janssen Pharmaceuticals, Lundbeck, Merck & Co., Inc., Meso Scale Diagnostics, LLC, Neurocrine Biosciences, Pfizer Inc., Piramal Imaging, Prevail Therapeutics, F. Hoffmann-La Roche Ltd and its affiliated company Genentech Inc., Sanofi Genzyme, Servier, Takeda Pharmaceutical Company, Teva Neuroscience, Inc., UCB, Vanqua Bio, Verily Life Sciences, Voyager Therapeutics, Inc., and Yumanity Therapeutics, Inc. The PPMI investigators have not participated in reviewing the data analysis or content of the manuscript. For up-to-date information on the study, visit www.ppmi-info.org. The PDBP consortium is supported by the National Institute of Neurological Disorders and Stroke (NINDS) at the National Institutes of Health. A full list of PDBP investigators can be found at https://pdbp.ninds.nih.gov/policy. The PDBP investigators have not participated in reviewing the data analysis or content of the manuscript.
ACCELERATING MEDICINES PARTNERSHIP and AMP are registered service marks of the U.S. Department of Health and Human Services. Absolute quantitative plasma protein data was obtained from the Human Protein Atlas at proteinatlas.org.
Declarations
Ethics approval and consent to participate
All Emory research participants provided informed consent for this study under protocols approved by the Institutional Review Board at Emory University, in accordance with the Declaration of Helsinki.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Contributor Information
Eric B. Dammer, Email: edammer@emory.edu
Lingyan Ping, Email: lingyan.ping@emory.edu
Duc M. Duong, Email: dduong@emory.edu
Erica S. Modeste, Email: erica.shante.modeste@emory.edu
Nicholas T. Seyfried, Email: nseyfri@emory.edu
James J. Lah, Email: jlah@emory.edu
Allan I. Levey, Email: alevey@emory.edu
Erik C. B. Johnson, Email: erik.c.b.johnson@emory.edu
References
1. Beckmann ND, et al. Multiscale causal networks identify VGF as a key regulator of Alzheimer's disease. Nat Commun. 2020;11(1):3942. doi:10.1038/s41467-020-17405-z.
2. Johnson ECB, et al. Large-scale deep multi-layer analysis of Alzheimer's disease brain reveals strong proteomic disease-related changes not observed at the RNA level. Nat Neurosci. 2022;25(2):213–25. doi:10.1038/s41593-021-00999-y.
3. Johnson ECB, et al. Large-scale proteomic analysis of Alzheimer's disease brain and cerebrospinal fluid reveals early changes in energy metabolism associated with microglia and astrocyte activation. Nat Med. 2020;26(5):769–80. doi:10.1038/s41591-020-0815-6.
4. Mostafavi S, et al. A molecular network of the aging human brain provides insights into the pathology and cognitive decline of Alzheimer's disease. Nat Neurosci. 2018;21(6):811–9. doi:10.1038/s41593-018-0154-9.
5. Wan YW, et al. Meta-analysis of the Alzheimer's disease human brain transcriptome and functional dissection in mouse models. Cell Rep. 2020;32(2):107908. doi:10.1016/j.celrep.2020.107908.
6. Bader JM, et al. Proteome profiling in cerebrospinal fluid reveals novel biomarkers of Alzheimer's disease. Mol Syst Biol. 2020;16(6):e9356. doi:10.15252/msb.20199356.
7. Higginbotham L, et al. Integrated proteomics reveals brain-based cerebrospinal fluid biomarkers in asymptomatic and symptomatic Alzheimer's disease. Sci Adv. 2020;6(43). doi:10.1126/sciadv.aaz9360.
8. Dayon L, et al. Proteomes of paired human cerebrospinal fluid and plasma: relation to blood-brain barrier permeability in older adults. J Proteome Res. 2019;18(3):1162–74. doi:10.1021/acs.jproteome.8b00809.
9. Li KW, Gonzalez-Lozano MA, Koopmans F, Smit AB. Recent developments in data independent acquisition (DIA) mass spectrometry: application of quantitative analysis of the brain proteome. Front Mol Neurosci. 2020;13:564446. doi:10.3389/fnmol.2020.564446.
10. Brenes A, et al. Multibatch TMT reveals false positives, batch effects and missing values. Mol Cell Proteomics. 2019;18(10):1967–80. doi:10.1074/mcp.RA119.001472.
11. Johnson ECB, et al. Deep proteomic network analysis of Alzheimer's disease brain reveals alterations in RNA binding proteins and RNA splicing associated with disease. Mol Neurodegener. 2018;13(1):52. doi:10.1186/s13024-018-0282-4.
12. Geyer PE, et al. Revisiting biomarker discovery by plasma proteomics. Mol Syst Biol. 2017;13(9):942. doi:10.15252/msb.20156297.
13. Gold L, et al. Aptamer-based multiplexed proteomic technology for biomarker discovery. PLoS One. 2010;5(12):e15004. doi:10.1371/journal.pone.0015004.
14. Tin A, et al. Reproducibility and variability of protein analytes measured using a multiplexed modified aptamer assay. J Appl Lab Med. 2019;4(1):30–9. doi:10.1373/jalm.2018.027086.
15. Walker KA, et al. Large-scale plasma proteomic analysis identifies proteins and pathways associated with dementia risk. Nature Aging. 2021;1(5):473–89. doi:10.1038/s43587-021-00064-0.
16. Assarsson E, et al. Homogenous 96-plex PEA immunoassay exhibiting high sensitivity, specificity, and excellent scalability. PLoS One. 2014;9(4):e95192. doi:10.1371/journal.pone.0095192.
17. Lundberg M, Eriksson A, Tran B, Assarsson E, Fredriksson S. Homogeneous antibody-based proximity extension assays provide sensitive and specific detection of low-abundant proteins in human blood. Nucleic Acids Res. 2011;39(15):e102. doi:10.1093/nar/gkr424.
18. Pietzner M, et al. Synergistic insights into human health from aptamer- and antibody-based proteomic profiling. Nat Commun. 2021;12(1):6822. doi:10.1038/s41467-021-27164-0.
19. Raffield LM, et al. Comparison of proteomic assessment methods in multiple cohort studies. Proteomics. 2020;20(12):e1900278. doi:10.1002/pmic.201900278.
20. Katz DH, et al. Whole genome sequence analysis of the plasma proteome in black adults provides novel insights into cardiovascular disease. Circulation. 2022;145(5):357–70. doi:10.1161/CIRCULATIONAHA.121.055117.
21. Finkernagel F, et al. Dual-platform affinity proteomics identifies links between the recurrence of ovarian carcinoma and proteins released into the tumor microenvironment. Theranostics. 2019;9(22):6601–17. doi:10.7150/thno.37549.
22. Graumann J, et al. Multi-platform affinity proteomics identify proteins linked to metastasis and immune suppression in ovarian cancer plasma. Front Oncol. 2019;9:1150. doi:10.3389/fonc.2019.01150.
23. Jiang Y, et al. Large-scale plasma proteomic profiling identifies a high-performance biomarker panel for Alzheimer's disease screening and staging. Alzheimers Dement. 2021;18(1):88–102. doi:10.1002/alz.12369.
24. Whelan CD, et al. Multiplex proteomics identifies novel CSF and plasma biomarkers of early Alzheimer's disease. Acta Neuropathol Commun. 2019;7(1):169. doi:10.1186/s40478-019-0795-2.
25. Birkenbihl C, et al. ANMerge: a comprehensive and accessible Alzheimer's disease patient-level dataset. J Alzheimers Dis. 2021;79(1):423–31. doi:10.3233/JAD-200948.
26. Weiner S, et al. Optimized sample preparation and data analysis for TMT proteomic analysis of cerebrospinal fluid applied to the identification of Alzheimer's disease biomarkers. Clin Proteomics. 2022;19(1):13. doi:10.1186/s12014-022-09354-0.
27. Andreasen N, et al. Cerebrospinal fluid levels of total-tau, phospho-tau and a beta 42 predicts development of Alzheimer's disease in patients with mild cognitive impairment. Acta Neurol Scand Suppl. 2003;179:47–51. doi:10.1034/j.1600-0404.107.s179.9.x.
28. Uhlen M, et al. The human secretome. Sci Signal. 2019;12(609). doi:10.1126/scisignal.aaz0274.
29. Lehallier B, et al. Undulating changes in human plasma proteome profiles across the lifespan. Nat Med. 2019;25(12):1843–50. doi:10.1038/s41591-019-0673-2.
30. Tanaka T, et al. Plasma proteomic biomarker signature of age predicts health and life span. Elife. 2020;9. doi:10.7554/eLife.61073.
31. Benkert P, et al. Serum neurofilament light chain for individual prognostication of disease activity in people with multiple sclerosis: a retrospective modelling and validation study. Lancet Neurol. 2022;21(3):246–57. doi:10.1016/S1474-4422(22)00009-6.
32. Emilsson V, et al. Co-regulatory networks of human serum proteins link genetics to disease. Science. 2018;361(6404):769–73. doi:10.1126/science.aaq1327.
33. Sun BB, et al. Genomic atlas of the human plasma proteome. Nature. 2018;558(7708):73–9. doi:10.1038/s41586-018-0175-2.
34. Wang H, et al. Integrated analysis of ultra-deep proteomes in cortex, cerebrospinal fluid and serum reveals a mitochondrial signature in Alzheimer's disease. Mol Neurodegener. 2020;15(1):43. doi:10.1186/s13024-020-00384-6.
35. Stein KC, et al. Ageing exacerbates ribosome pausing to disrupt cotranslational proteostasis. Nature. 2022;601(7894):637–42. doi:10.1038/s41586-021-04295-4.
36. Sweeney MD, Sagare AP, Zlokovic BV. Blood-brain barrier breakdown in Alzheimer disease and other neurodegenerative disorders. Nat Rev Neurol. 2018;14(3):133–50. doi:10.1038/nrneurol.2017.188.
37. Dayon L, et al. Alzheimer disease pathology and the cerebrospinal fluid proteome. Alzheimers Res Ther. 2018;10(1):66. doi:10.1186/s13195-018-0397-4.
38. Bai B, et al. Deep multilayer brain proteomics identifies molecular networks in Alzheimer's disease progression. Neuron. 2020;105(6):975–991 e7. doi:10.1016/j.neuron.2019.12.015.
39. Nelson PT, et al. Correlation of Alzheimer disease neuropathologic changes with cognitive status: a review of the literature. J Neuropathol Exp Neurol. 2012;71(5):362–81. doi:10.1097/NEN.0b013e31825018f7.
40. Chen M, Xia W. Proteomic profiling of plasma and brain tissue from Alzheimer's disease patients reveals candidate network of plasma biomarkers. J Alzheimers Dis. 2020;76(1):349–68. doi:10.3233/JAD-200110.
41. Mawuenyega KG, et al. Decreased clearance of CNS beta-amyloid in Alzheimer's disease. Science. 2010;330(6012):1774. doi:10.1126/science.1197623.
42. De Miguel Z, et al. Exercise plasma boosts memory and dampens brain inflammation via clusterin. Nature. 2021;600(7889):494–9. doi:10.1038/s41586-021-04183-x.
43. Oldham MC. Transcriptomics: from differential expression to coexpression. In: Coppola G, editor. The OMICs: applications in neuroscience; 2014. p. 85–113.
44. Olsson A, et al. Simultaneous measurement of beta-amyloid(1-42), total tau, and phosphorylated tau (Thr181) in cerebrospinal fluid by the xMAP technology. Clin Chem. 2005;51(2):336–45. doi:10.1373/clinchem.2004.039347.
45. Hulstaert F, et al. Improved discrimination of AD patients using beta-amyloid(1-42) and tau levels in CSF. Neurology. 1999;52(8):1555–62. doi:10.1212/wnl.52.8.1555.
46. Shaw LM, et al. Cerebrospinal fluid biomarker signature in Alzheimer's disease neuroimaging initiative subjects. Ann Neurol. 2009;65(4):403–13. doi:10.1002/ana.21610.
47. Tukey JW. Exploratory data analysis. 1977.
48. Bi R, Liu P. Sample size calculation while controlling false discovery rate for differential expression analysis with RNA-sequencing experiments. BMC Bioinformatics. 2016;17:146. doi:10.1186/s12859-016-0994-9.