Abstract
The blood proteome contains biomarkers of ageing and age-associated diseases, but such markers are rarely validated longitudinally. Here we map the longitudinal proteome in 7,565 serum samples from a cohort of 3,796 middle-aged and elderly adults across three time points over a 9-year follow-up period. We pinpoint 86 ageing-related proteins that exhibit signatures associated with 32 clinical traits and the incidence of 14 major ageing-related chronic diseases. Leveraging a machine-learning model, we pick 22 of these proteins to generate a proteomic healthy ageing score (PHAS), capable of predicting the incidence of cardiometabolic diseases. We further identify the gut microbiota as a modifiable factor influencing the PHAS. Our data constitute a valuable resource and offer useful insights into the roles of serum proteins in ageing and age-associated cardiometabolic diseases, providing potential targets for intervention with therapeutics to promote healthy ageing.
Subject terms: Biomarkers, Ageing, Metabolism, Proteomic analysis
Tang, Yue, Xu and colleagues map the proteome of several thousand individuals over a 9-year period to identify potential biomarkers of ageing and of age-associated diseases.
Main
Ageing is a ubiquitous process characterized by progressive degeneration at molecular, cellular and organic levels, leading to functional decline and heightened susceptibility to diseases. Although physiological measurements, functional tests and sophisticated omics approaches have been used to predict chronological age1,2, comprehending the intricate biology of ageing remains challenging. In addition to the identification of reliable ageing biomarkers, gaining deeper insights into their involvement in ageing-related pathologies would contribute substantially to the advancement of clinical interventions aimed at extending a healthy lifespan.
Protein homeostasis has a critical role in supporting a naturally long lifespan, and loss of proteostasis has been recognized as a key hallmark of ageing3,4. Perturbations in proteome-wide homeostasis across diverse tissues have been closely associated with accelerated ageing5. Blood, a reservoir of proteins from cells, extracellular fluids, tissues and organs, represents an ideal platform for identifying proteomic biomarkers indicative of various health conditions6–10. Previous cross-sectional studies using mass spectrometry (MS)-based, antibody-based (Olink) or modified aptamer-based (SomaScan) proteomics technologies have identified hundreds of blood proteins that are correlated with age11–14. Furthermore, several longitudinal studies have identified personal proteomic and other omics markers15 as well as lipid–cytokine networks16 associated with ageing within small cohorts over time frames of 2–6 years. However, these findings could potentially be limited by small sample sizes and short follow-up durations. Notably, a recent large-scale study across five independent cohorts has developed cross-sectional plasma proteomic signatures for organ ageing that are capable of forecasting long-term risks of heart failure, Alzheimer’s disease and mortality17. Nevertheless, there remains a significant scarcity of longitudinal evidence for ageing-related proteomic biomarkers and their associations with various health outcomes.
Here, we perform a longitudinal proteomic profiling of 7,565 serum samples from a cohort of 3,796 middle-aged and elderly participants over a 9-year follow-up period. We capture the trajectories of serum proteome changes during follow-up visits and identify 86 ageing-related proteins based on longitudinal proteomic data. To gain insights into the biological significance and the clinical implications of these ageing-related proteins, we explore the biological pathways and functional networks represented by these proteins and further investigate the longitudinal association of ageing-related proteins with 32 clinical measures as well as 14 ageing-related chronic diseases. Using a machine-learning algorithm, we identify 22 out of the 86 ageing-related proteins to discriminate the healthy status of the participants. We then establish a PHAS based on the 22 proteins that can predict the incidence of cardiometabolic diseases during the ageing process. We further reveal the potential intrinsic, lifestyle, dietary and multi-omics determinants of the PHAS, highlighting the important role of gut microbiota in the modulation of PHAS. The integration of longitudinal serum proteome and clinical data reveals potential new interventional or therapeutic targets for treating ageing-related cardiometabolic diseases and for monitoring the healthy ageing status among the general population. To maximize the resource value of our study, we share our results through an open-access web server available at https://omics.lab.westlake.edu.cn/resource/aging.html.
Results
Serum proteome profiling of longitudinal cohorts
The present study used data from the Guangzhou Nutrition and Health Study (GNHS), which included 3,796 individuals with 7,565 serum samples across three time points over a 9-year follow-up period. We separated these 3,796 participants into two subcohorts: the discovery cohort, which comprised 4,637 serum samples from 1,939 participants of a multi-omics substudy within GNHS, and the validation cohort, which included 2,928 serum samples from the remaining 1,857 participants (Fig. 1 and Extended Data Fig. 1a). Additionally, we included an external validation cohort of 124 participants, with 200 serum samples collected at two cohort visits during a 4-year follow-up period (Fig. 1 and Extended Data Fig. 1b). Baseline characteristics of participants were similar among the included cohorts, except that participants in the external validation cohort were over one decade older (median age, 70 years) than participants in either the GNHS discovery cohort (median age, 57.6 years) or the GNHS validation cohort (median age, 57.1 years) (Supplementary Table 1).
Fig. 1. Overview of longitudinal cohorts and analysis workflow.
The present study included 3,796 individuals from the GNHS, with 7,565 serum samples collected across three time points over a 9-year follow-up period. We separated the participants into the GNHS discovery cohort, which included 4,637 serum samples of 1,939 participants from a multi-omics subcohort within GNHS, and the GNHS validation cohort, which included 2,928 serum samples from the remaining 1,857 participants. We performed analyses by integrating lifestyle, clinical and multi-omics data into the GNHS. We also recruited 124 participants with 200 serum samples collected at two visits during a 4-year follow-up period, which was set as the external validation cohort.
Extended Data Fig. 1. Serum samples and age distribution of study participants.
a, 4637 serum samples of 1939 participants in the GNHS discovery cohort and 2928 serum samples of 1857 participants in the GNHS validation cohort, and participants’ age distribution over 9 years’ follow-up. b, 200 serum samples of 124 participants and their age distribution over 4 years’ follow-up in the external validation cohort.
We used MS-based proteomics technology for the measurement of the serum proteome of study participants. Using DIA-NN software (v.1.8) and a spectral library containing 5,102 peptides and 819 unique human proteins reviewed by Swiss-Prot, we quantified a total of 438 proteins in the GNHS discovery cohort, 413 proteins in the GNHS validation cohort and 432 proteins in the external validation cohort (see ‘Serum proteome profiling’ in the Methods and Supplementary Table 2).
Longitudinal trajectories of serum proteome during follow-up
To track the global proteomic trajectories during the ageing process, we analysed data from 1,018 participants in the GNHS discovery cohort with serum proteome measurements across three time points (Fig. 2a). k-means clustering was used to classify 438 serum proteins based on their z-scored mean levels across the three time points (Fig. 2b,c). We identified four distinct trajectory clusters: cluster 1 with 32 proteins exhibiting a sharp increase, cluster 2 with 124 proteins showing a slight increase, cluster 3 with 179 proteins remaining relatively constant and cluster 4 with 103 proteins exhibiting a decline over time (Fig. 2d and Supplementary Table 3). We used a linear mixed model to evaluate the statistical significance of the trends of protein trajectories and found that the levels of 7, 37, 34 and 62 proteins from each of the four clusters were significantly changed during the follow-up period (false discovery rate (FDR) < 0.05) (Fig. 2e and Supplementary Table 3).
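The clustering step described above can be illustrated with a minimal sketch. It assumes that each protein is represented by its mean level at the three visits, z-scored across those visits, and it uses a plain Lloyd's k-means with a deterministic farthest-point initialization. The protein names, toy values and initialization scheme are illustrative assumptions, not the settings used in the study. The returned sum of squared errors is the quantity that an elbow plot would track as k varies.

```python
import math

def zscore(values):
    """Standardize one protein's per-visit means to mean 0 and s.d. 1."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]

def sq_dist(a, b):
    """Squared Euclidean distance between two trajectories."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=100):
    """Lloyd's k-means; returns (labels, sum of squared errors)."""
    # Farthest-point initialization (an assumption, chosen for determinism).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, sse

# Hypothetical per-visit mean levels for four toy proteins.
toy = {
    "P_up1": [1.0, 2.0, 3.0],
    "P_up2": [2.0, 4.0, 6.0],
    "P_down1": [3.0, 2.0, 1.0],
    "P_down2": [6.0, 4.0, 2.0],
}
trajectories = [zscore(v) for v in toy.values()]
labels, sse = kmeans(trajectories, k=2)
```

In this toy example the two increasing proteins fall into one cluster and the two decreasing proteins into the other, regardless of their absolute levels, because z-scoring retains only the shape of each trajectory.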
Fig. 2. Longitudinal trajectories of serum proteome during follow-up in the middle-aged and elderly participants.
a, A total of 1,018 participants from the GNHS discovery cohort with serum proteome data at all three cohort visits were included in this analysis. b, Heatmap of the mean z-scored levels of the 438 serum proteins among the 1,018 participants across three cohort visits. c, k-means clustering identified four clusters of the changes in 438 serum proteins across three time points among the 1,018 participants. The optimal number of clusters was determined using the elbow method by calculating the sum of squared errors. d, Line plots of the mean z-scored levels of proteins at three time points within each of the four identified clusters. e, Volcano plot for the trend of changes in 438 serum proteins during follow-up. Proteins with significant trends are highlighted in different colours: red for cluster 1, yellow for cluster 2, purple for cluster 3 and blue for cluster 4. Note that fibronectin (FINC) (β = 0.455, Q = 5.34 × 10^−73) is not displayed owing to its extremely low Q value. Detailed results are presented in Supplementary Table 3. The horizontal dashed line represents the cutoff Q value of 0.05. Dots (proteins) positioned above this line are considered significant, while those below are not significant. f, Top five enriched GO terms or pathways based on Q values for proteins from each of the four clusters. Functional enrichment analysis was conducted using GO, Reactome, KEGG and WikiPathways databases. All enriched GO terms and pathways are listed in Supplementary Table 4.
We then explored whether the protein trajectories in the four clusters were consistent across different subgroups of participants stratified by sex and baseline age (>60 or ≤60 years). We set the age of 60 as the cutoff point, as a previous study has shown that changes in blood proteins often peak at this threshold14. The protein trajectories in clusters 1 and 2 remained consistent across all subgroups. However, slight variations were observed in the trajectories of proteins within clusters 3 and 4, particularly among male participants and those over 60 years of age (Extended Data Fig. 2).
Extended Data Fig. 2. Protein trajectories in subgroups of participants stratified by sex and baseline age.
Line plots of the mean z-scored levels of proteins across three follow-up time points within each of the four clusters by subgroups: a, females, b, males, c, participants with baseline age ≤ 60 years, and d, participants with baseline age > 60 years.
To elucidate the biological implications of the four distinct protein trajectories, we performed functional enrichment analyses using the g:Profiler toolkit18. This analysis revealed a total of 10, 152, 451 and 558 significantly enriched Gene Ontology (GO) terms and pathways (FDR < 0.05) for proteins from clusters 1 to 4, respectively (Fig. 2f and Supplementary Table 4). The diverse enriched GO terms and pathways illustrate the biological trends associated with each protein cluster. For example, the prominent enriched GO terms in cluster 1, such as the actin filament bundle, contractile actin filament bundle and actomyosin, could potentially indicate an imbalance in muscle protein synthesis and breakdown during age-related loss of skeletal muscle mass and function19. Moreover, one of the top enriched GO terms and pathways for proteins in cluster 4 that exhibited a decrease during follow-up was the blood microparticle, which was consistent with previous findings14.
Identifying ageing-related proteins using longitudinal data
To identify ageing-related proteomic biomarkers, we examined the correlation between serially measured levels of serum proteins and the corresponding chronological ages of the participants over follow-up, by using linear mixed models adjusted for sex, measurement batches and instruments. In the GNHS discovery cohort consisting of 1,939 participants with 4,637 serum samples collected during follow-up (Extended Data Fig. 1a), we identified 148 serum proteins significantly associated with age (FDR < 0.05). Among these, 77 displayed a negative association and 71 exhibited a positive association (Fig. 3a and Supplementary Table 5).
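The FDR threshold applied to the per-protein P values can be made concrete with a short sketch. It assumes that the Benjamini–Hochberg procedure named in the Fig. 5 legend is also the correction used for these linear mixed models, and the P values shown are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg Q values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone Q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Hypothetical P values for five proteins.
p_toy = [0.001, 0.008, 0.039, 0.041, 0.6]
q_toy = benjamini_hochberg(p_toy)
significant = [q < 0.05 for q in q_toy]
```

Here the third and fourth proteins have nominal P values below 0.05 but Q values just above it, which is the same distinction the study draws between nominally significant and FDR-significant associations.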
Fig. 3. Identification of ageing-related proteins using longitudinal data.
a,b, Volcano plots showing the associations between age and serum proteins using linear mixed models adjusted for sex, measurement batch and instrument in the GNHS discovery cohort comprising 1,939 participants with 4,637 serum samples (a) and the GNHS validation cohort comprising 1,857 participants with 2,928 serum samples (b). Red (positive) and blue (negative) dots represent proteins significantly associated with age (FDR < 0.05). The top ten positive and negative proteins based on Q values are labelled. Detailed results are presented in Supplementary Table 5. c, Comparison of the associations between age and serum proteins in the GNHS discovery cohort and GNHS validation cohort. Pearson’s correlation coefficient (r) for the associations between the two cohorts is indicated. Red (positive) and blue (negative) dots represent significant proteins (FDR < 0.05) in both cohorts. d, Volcano plot showing the associations between sex and serum proteins using linear mixed models adjusted for age, measurement batch and instrument in the GNHS discovery cohort. Brown (positive) and purple (negative) dots represent proteins significantly associated with sex (male versus female) (FDR < 0.05). The top ten positive and negative proteins based on Q values are labelled. Detailed results are presented in Supplementary Table 5. The horizontal dashed line in a, b and d represents the cutoff Q value of 0.05. Dots (proteins) positioned above this line are considered significant, while those below are not significant. e, Comparison of the associations between sex and serum proteins in the GNHS discovery cohort and GNHS validation cohort. Pearson’s correlation coefficient (r) for the associations between the two cohorts is indicated. Brown (positive) and purple (negative) dots represent significant proteins (FDR < 0.05) in both cohorts. The grey circles represent proteins that were not significantly associated with sex in either the GNHS discovery cohort or the GNHS validation cohort. 
f, Overlap of 41 ageing-related and sex-related proteins. Overlapped proteins are listed in Supplementary Table 6. g, Subgroup analyses for the 41 proteins that were associated with both age and sex. The circle colour scale indicates the degree and direction of the associations between age and proteins, and the circle size scale indicates the significance of the associations. Only significant associations (FDR < 0.05) are presented. Proteins showing a significant interaction by age and sex (FDR < 0.05) in both the GNHS discovery and validation cohorts are marked with an asterisk. F, female; M, male. Detailed results are presented in Supplementary Table 5.
Subsequently, 86 of the 148 proteins were also significantly associated with age in the same direction in the GNHS validation cohort comprising 1,857 participants with 2,928 serum samples collected during follow-up (FDR < 0.05), and the coefficients for these associations were highly correlated (Pearson’s r = 0.96) between the GNHS discovery and validation cohorts (Fig. 3b,c and Supplementary Tables 5 and 6). Additionally, we replicated the associations of the 86 identified proteins with age in the external validation cohort comprising 124 participants with 200 serum samples collected over a 4-year follow-up period. We observed similar associations with age for these 86 proteins between the GNHS discovery cohort and the external validation cohort (Pearson’s r = 0.80; Extended Data Fig. 3a,b). Notably, although 68 out of the 86 proteins exhibited nominally significant associations, only 15 proteins demonstrated significance at FDR < 0.05 in the external validation cohort (Supplementary Tables 5 and 6), which might be partly attributed to its relatively small sample size, shorter follow-up period and different characteristics of the participants (over 10 years older). To further investigate whether the 86 ageing-related proteins can predict age, we fitted a generalized linear mixed model with L1-penalized estimation (GLMMLasso)20 based on the longitudinal serum proteome data. We identified a subset of 83 ageing-related proteins that showed high accuracy in predicting age within the GNHS discovery cohort (Pearson’s r = 0.70), GNHS validation cohort (Pearson’s r = 0.74) and external validation cohort (Pearson’s r = 0.67) (Extended Data Fig. 3c–e).
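The replication logic in this paragraph (significant in both cohorts, same direction of effect, then a Pearson correlation between the two sets of coefficients) can be sketched as follows. The data layout, protein labels and numerical values are hypothetical and serve only to show the rule; they are not taken from Supplementary Tables 5 and 6.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def replicated(discovery, validation, alpha=0.05):
    """Proteins with Q < alpha in both cohorts and coefficients of the same sign.

    Each argument maps a protein label to a (beta, q_value) tuple.
    """
    hits = []
    for protein, (beta_d, q_d) in discovery.items():
        if protein not in validation:
            continue
        beta_v, q_v = validation[protein]
        if q_d < alpha and q_v < alpha and beta_d * beta_v > 0:
            hits.append(protein)
    return hits

# Hypothetical age coefficients and Q values for five proteins.
discovery = {"P1": (0.30, 0.001), "P2": (-0.20, 0.010), "P3": (0.10, 0.020),
             "P4": (0.05, 0.400), "P5": (0.12, 0.002)}
validation = {"P1": (0.25, 0.004), "P2": (-0.15, 0.030), "P3": (-0.08, 0.010),
              "P4": (0.02, 0.700), "P5": (0.20, 0.001)}

hits = replicated(discovery, validation)
r = pearson_r([discovery[p][0] for p in hits], [validation[p][0] for p in hits])
```

In the toy data, P3 is significant in both cohorts but changes sign and is therefore not counted as replicated, while P4 fails the significance threshold in both.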
Extended Data Fig. 3. External validation for ageing-related proteins and performance of ageing-related proteins to predict age.
a, Volcano plot for the associations between age and the 86 serum proteins (identified as significant in both the GNHS discovery cohort and the GNHS validation cohort) by linear mixed models adjusted for sex, measurement batch and instrument in the external validation cohort (124 participants with 200 serum samples). Red (positive) and blue (negative) dots represent serum proteins that are significantly associated with age (FDR < 0.05). b, Comparison of the associations between age and the 86 proteins in the GNHS discovery cohort and external validation cohort. The Pearson correlation coefficient (r) for the associations between the two cohorts is indicated. Red (positive) and blue (negative) dots represent serum proteins that are significantly associated with age in both cohorts (FDR < 0.05). c–e, Performance of the GLMMLasso model using 83 proteins to predict age within the remaining GNHS discovery cohort (1583 observations) (c), the GNHS validation cohort (2928 observations) (d), and the external validation cohort (200 observations) (e). The Pearson correlation coefficient (r) between the chronological and predicted age is indicated.
Given that previous studies suggested the presence of sex differences in the ageing process21,22, we examined whether the 86 ageing-related proteins were also correlated with sex. We identified 108 proteins that showed significant sex differences in both the GNHS discovery and GNHS validation cohorts (FDR < 0.05) (Fig. 3d,e and Supplementary Tables 5 and 6). Notably, this set included several well-known sex hormones such as the pregnancy zone protein, which has consistently displayed sex differences during the ageing process and has been suggested to be associated with Alzheimer’s disease23–25. Furthermore, we observed an overlap of 41 proteins that were associated with both age and sex (Fig. 3f and Supplementary Tables 5 and 6). Among them, we identified a significant interaction between age and sex for seven proteins (Q values for interaction of <0.05 in both the GNHS discovery and validation cohorts), namely prothrombin (THRB), tRNA-splicing endonuclease subunit Sen2 (SEN2), alpha-2-macroglobulin (A2MG), inter-alpha-trypsin inhibitor heavy chain H3 (ITIH3), ras-interacting protein 1 (RAIN), pigment epithelium-derived factor (PEDF) and coagulation factor IX (FA9) (Fig. 3g and Supplementary Table 5). These proteins displayed differential associations with age in males and females. Specifically, four proteins (THRB, A2MG, ITIH3 and RAIN) exhibited stronger associations with age in males than in females, whereas the remaining three proteins (PEDF, FA9 and SEN2) showed robust associations with age only in females, with SEN2 even showing opposite associations in males (Fig. 3g). These findings suggest that the associations of these proteins with age may be sex-dependent.
Biological implications of ageing-related proteins
To understand how ageing-related proteins may interconnect with others and their biological implications, we identified functional networks of protein groups using the Ingenuity Pathway Analysis (IPA), an algorithm that connects proteins based on well-established knowledge26. We identified four protein–protein networks consisting of nine or more ageing-related proteins, each centred around a hub including apolipoprotein B (APOB), complement C3 (C3), nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB) and immunoglobulin, respectively (Fig. 4a). These networks represented specific functional pathways including lipid metabolism, organismal injury and abnormalities, neurological disease and cell-to-cell signalling and interaction, potentially revealing different biological aspects of the ageing process. Network 1, for instance, showed that 18 ageing-related proteins were involved in lipid metabolism, a crucial pathway in regulating the ageing process27. Similarly, network 3 suggested that the hub NF-κB, a nuclear factor that regulates the immune response to infection, might be a potential pathway underlying ageing-related neurological disease, as reported previously7.
Fig. 4. Functional networks of ageing-related proteins and their longitudinal associations with clinical traits.
a, Four functional networks of ageing-related proteins identified by IPA. Red and blue dots represent gene names for proteins (Supplementary Table 2) that were positively and negatively associated with age (FDR < 0.05), respectively. b, Top five enriched GO terms or pathways based on Q values for ageing-related proteins from each of the four functional networks. All enriched GO terms and pathways are listed in Supplementary Table 7. c, Heatmap showing the longitudinal associations between 86 ageing-related proteins and 32 clinical traits in the GNHS discovery cohort (1,939 participants with 4,637 observations during follow-up), analysed using a linear mixed model adjusted for age, sex, measurement batch and instrument. Only significant associations are indicated (FDR < 0.05). The colour scale indicates the degree and direction of the associations. On the right side, the bar plot indicates the number of significant associations that were validated in the GNHS validation cohort (FDR < 0.05), and the line chart indicates the Pearson correlation for the associations between GNHS discovery and validation cohorts. BMI, body mass index; WC, waist circumference; SBP, systolic blood pressure; DBP, diastolic blood pressure; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; TG, triglycerides; TC, total cholesterol; FBG, fasting blood glucose; HbA1c, haemoglobin A1c; IL-1β, interleukin-1β; IL-6, interleukin-6; TNF, tumour necrosis factor; ALT, alanine transaminase; AST, aspartate aminotransferase; SOD, superoxide dismutase; ALP, alkaline phosphatase; UA, uric acid; uCRE, urine creatinine; MMSE, Mini-Mental State Examination; MMSE1, immediate orientation; MMSE2, spatial orientation; MMSE3, temporal memory; MMSE4, attention; MMSE5, delayed recall; MMSE6, naming; MMSE7, verbal repetition; MMSE8, reading; MMSE9, verbal comprehension; MMSE10, writing; MMSE11, constructional praxis.
Detailed results are presented in Supplementary Tables 8 and 9.
Additionally, we performed upstream regulator analysis using IPA and identified hepatocyte nuclear factor 1-alpha (HNF1A) and interleukin-6 (IL-6) as the top two upstream regulators for the 86 ageing-related proteins (Extended Data Fig. 4a,b). We also conducted functional enrichment analysis for proteins from networks using the g:Profiler toolkit18. The analysis revealed 536, 337, 274 and 193 significantly enriched GO terms and pathways for proteins from each of the four networks, respectively (Fig. 4b and Supplementary Table 7), suggesting that the protein networks are probably involved in ageing biology in a multifaceted manner.
Extended Data Fig. 4. Upstream regulators of ageing-related proteins and longitudinal associations between 86 ageing-related proteins and 32 clinical traits in GNHS validation cohort.
a,b, Top two upstream regulators, namely the hepatocyte nuclear factor 1-alpha (HNF1A) (a) and interleukin-6 (IL-6) (b) for ageing-related proteins by Ingenuity Pathway Analysis. Red and blue dots represent gene names for proteins that were positively and negatively associated with age, respectively. c, Heatmap showing the longitudinal associations between 86 ageing-related proteins and 32 clinical traits in the GNHS validation cohort (1857 participants with 2928 observations), analyzed using a linear mixed model adjusted for age, sex, measurement batch and instrument. Only significant associations are indicated (FDR < 0.05). The color scale indicates the degree and direction of the associations. BMI, body mass index; WC, waist circumference; SBP, systolic blood pressure; DBP, diastolic blood pressure; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; TG, triglycerides; TC, total cholesterol; FBG, fasting blood glucose; HbA1c, hemoglobin A1c; IL-1β, interleukin-1β; IL-6, interleukin-6; TNF, tumor necrosis factor; ALT, alanine transaminase; AST, aspartate aminotransferase; SOD, superoxide dismutase; ALP, alkaline phosphatase; UA, uric acid; uCRE, urine creatinine; MMSE, Mini-Mental State Examination; MMSE1, immediate orientation; MMSE2, spatial orientation; MMSE3, temporal memory; MMSE4, attention; MMSE5, delayed recall; MMSE6, naming; MMSE7, verbal repetition; MMSE8, reading; MMSE9, verbal comprehension; MMSE10, writing; MMSE11, constructional praxis.
We used linear mixed models to investigate the longitudinal relationships between ageing-related proteins and 32 clinical traits, including anthropometric, metabolic, inflammatory, cognitive, hepatic and renal parameters. Our analysis revealed a distinct overall pattern of ageing-related proteins associated with clinical traits (Fig. 4c, Extended Data Fig. 4c and Supplementary Tables 8 and 9). Specifically, the protein–trait associations between the GNHS discovery and validation cohorts exhibited high correlations for anthropometric parameters, lipid and glucose metabolic profiles and several hepatic and renal biomarkers. Overall, we identified a total of 320 significant protein–trait associations (FDR < 0.05) in both the GNHS discovery and validation cohorts. Notably, the associations were predominantly significant in anthropometric and cardiometabolic outcomes.
Remarkably, we noted compelling concurrences between the inferred biological implications from network analyses and the actual protein–trait associations. For instance, 12 out of 18 ageing-related proteins from protein network 1, which represents lipid metabolism, were significantly associated with at least one serum lipid biomarker (Fig. 4c). Additionally, several proteins from protein network 2, which signifies organismal injury and abnormalities, exhibited noteworthy associations with functional biomarkers of liver and kidney (Fig. 4c). These alignments strengthen our findings regarding the biological relevance of ageing-related proteins and highlight their strong correlations with health conditions during ageing.
Associations of ageing-related proteins with diseases
To investigate the potential roles of the above-identified proteomic biomarkers in ageing-related pathologies, we used Cox proportional hazards models to examine the prospective associations between baseline levels of 86 ageing-related proteins and the incidence of 14 chronic diseases during a 9-year follow-up period in 3,414 participants from the entire GNHS cohort (that is, pooling the GNHS discovery and validation cohorts to increase the incident disease cases). We observed a total of 131 nominally significant associations for 67 ageing-related proteins, with more than ten proteins being associated with the incidence of dyslipidemia, hypertension, type 2 diabetes (T2D), fatty liver and hepatitis (Supplementary Table 10).
We then clustered these associations into eight hierarchical groups that represented protein signatures for the long-term risk of incident chronic diseases (Extended Data Fig. 5). To illustrate, the proteins within cluster 1 exhibited a positive association with the risk of developing renal diseases, whereas those within cluster 3 were positively associated with the risk of incident hepatitis. Cluster 2 proteins displayed a strong positive association and cluster 4 proteins demonstrated a strong negative association with the risk of incident cirrhosis. Likewise, proteins in cluster 6 were predominantly negatively associated and proteins in cluster 7 were positively associated with the risk of developing T2D and fatty liver. These findings suggest that our identified ageing-related proteins may have a temporal role in the development of age-related pathologies.
Extended Data Fig. 5. Hierarchical clustering for the nominally significant prospective associations between ageing-related proteins and risk of 14 chronic diseases.
The prospective associations were investigated using a Cox proportional hazards model adjusted for age, sex, BMI, subcohort, and presence of chronic diseases at baseline. Hierarchical clustering identified a total of eight protein clusters based on their nominally significant associations with the 14 chronic diseases. T2D: type 2 diabetes; CHD: coronary heart disease; RA: rheumatoid arthritis; HR: hazard ratio.
Furthermore, our analysis revealed that 35 out of the 131 observed protein–disease associations retained significance following correction for multiple testing (FDR < 0.05). Specifically, we identified 13, 11, 5, 1, 3, 1 and 1 proteins that exhibited significant association with incident T2D, fatty liver, hepatitis, renal disease, dyslipidemia, hypertension and rheumatoid arthritis, respectively (Fig. 5 and Supplementary Table 10). Among them, eight proteins, including alpha-1-antitrypsin (A1AT), leucine-rich alpha-2-glycoprotein (A2GL), A2MG, adiponectin (ADIPO), zinc finger protein Gfi-1 (GFI1), ITIH3, RAIN and vitronectin (VTNC), were associated with two or more ageing-related metabolic diseases. For example, a 1 s.d. increase in baseline levels of A1AT and A2GL was associated with a 30% and 29% lower risk of incident T2D and a 17% and 17% lower risk of fatty liver, respectively (Fig. 5 and Supplementary Table 10). Moreover, we found that 16 out of the 25 disease-associated proteins were drug-targetable as indicated by the DrugBank database (Supplementary Table 11). Interestingly, ten disease-associated proteins were targeted by zinc and zinc compounds, indicating the potential benefits of zinc supplementation in promoting healthy ageing.
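The percentages quoted above follow from the usual reading of a Cox hazard ratio (HR) expressed per 1 s.d. increase in a protein: the HR is the exponential of the model coefficient, and the relative change in risk is (HR − 1) × 100%. The sketch below assumes that a "30% lower risk" corresponds to an HR of about 0.70; the exact estimates are those reported in Supplementary Table 10, and the function names are illustrative.

```python
import math

def hazard_ratio(beta):
    """Hazard ratio per 1 s.d. increase, given a Cox coefficient on that scale."""
    return math.exp(beta)

def percent_risk_change(hr):
    """Relative change in hazard, in percent (negative means lower risk)."""
    return (hr - 1.0) * 100.0

# Assumed value for illustration: an HR of 0.70 reads as a 30% lower risk.
hr_example = 0.70
change = percent_risk_change(hr_example)

# The corresponding Cox coefficient is the natural log of the HR.
beta_example = math.log(hr_example)
```

Under the same reading, the 17% lower risk of fatty liver corresponds to an HR of about 0.83.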
Fig. 5. Prospective associations between ageing-related proteins and incident chronic diseases.
Prospective associations between ageing-related proteins and the risk of incident dyslipidemia, hypertension, T2D, fatty liver, hepatitis and renal diseases in the entire GNHS cohort (n = 3,414). The Cox proportional hazards model was used with adjustment for age, sex, BMI, subcohort and presence of chronic diseases at baseline. The dots and horizontal lines represent hazard ratios (HRs) and corresponding 95% confidence intervals (CIs), respectively. Two-sided P values were calculated, and Q values were estimated using the Benjamini–Hochberg approach to control for multiple testing. Detailed results are presented in Supplementary Table 10.
Ageing-related proteins as cardiometabolic health indicators
Next, we used the random forest machine-learning algorithm to examine whether a combination of these proteins could function as a proteomic classifier to discriminate the overall healthy and unhealthy status of participants (defined by the presence of any of the 14 ageing-related diseases). We set 1,785 participants with serum proteome at baseline from the GNHS discovery cohort as the training dataset and set the 1,629 participants with serum proteome at baseline from the GNHS validation cohort as the validation dataset. The random forest model incorporating 86 ageing-related proteins achieved an area under the receiver operating characteristic curve (AUC) of 0.70 in distinguishing between healthy and unhealthy participants, which was comparable to the performance of the model using 408 serum proteins (AUC = 0.72) (Fig. 6a). By performing tenfold cross-validation, we identified a more concise random forest model consisting of the top 22 most important ageing-related proteins (Fig. 6b and Supplementary Table 12). This concise model achieved equivalent accuracy (AUC = 0.70) compared to the model containing 86 ageing-related proteins (Fig. 6a) and demonstrated significantly higher accuracy than ten models using random subsets of 22 proteins selected from the total pool of 408 proteins (all P < 0.05; Extended Data Fig. 6). Additionally, the majority of the top 22 ageing-related proteins exhibited significant differences between healthy and unhealthy participants, as evident in both the training and validation datasets (Fig. 6c and Supplementary Table 12).
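The AUC values reported here summarize how well a classifier ranks unhealthy participants above healthy ones. The sketch below does not reproduce the random forest itself; it only shows, under the standard rank-based definition, how an AUC is obtained from predicted scores and true labels. The labels and scores are toy values.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: 1 for the positive class (for example, unhealthy), 0 otherwise.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute an AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example: four participants with predicted probabilities of being unhealthy.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported values of 0.70 to 0.72 indicate moderate discrimination.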
Fig. 6. Ageing-related proteins as indicators of cardiometabolic health.
a, Performance of the random forest models using 408 serum proteins, 86 ageing-related proteins and the top 22 ageing-related proteins in discriminating between healthy and unhealthy participants in the validation dataset (n = 1,629). b, Top 22 important proteins identified for the final random forest model by tenfold cross-validation, with feature importance measured by mean decrease accuracy. c, Associations between the top 22 proteins and healthy status (yes or no) by linear models in the training (n = 1,785) and validation (n = 1,629) datasets. *Q value < 0.05; **Q value < 0.01; ***Q value < 0.001. See Supplementary Table 12 for details. d, Performance of the random forest models using intrinsic factors (age, sex and BMI), top 22 ageing-related proteins and their combination in discriminating between healthy and unhealthy participants in the validation dataset (n = 1,629). e, Longitudinal association between PHAS and 32 clinical traits in the GNHS discovery cohort (1,939 participants with 4,637 observations during follow-up) by linear mixed model adjusted for age and sex. Significant associations (FDR < 0.05) are labelled. f, Comparison of associations between PHAS and 32 clinical traits in the GNHS discovery cohort and GNHS validation cohort (1,857 participants with 2,928 observations during follow-up). Pearson correlation coefficient (r) for the associations between the two cohorts is indicated. Significant associations (FDR < 0.05) in both cohorts are labelled. Detailed results are presented in Supplementary Table 13. g, Prospective associations between baseline PHAS (per 1 s.d. increase) and incidences of chronic diseases in the entire GNHS cohort (n = 3,414) using Cox proportional hazards model 1 adjusted for age, sex, BMI and subcohort, and model 2 further adjusted for baseline disease status. The dots and horizontal lines represent HRs and corresponding 95% CIs, respectively. Detailed results are presented in Supplementary Table 14.
CHD, coronary heart disease; RA, rheumatoid arthritis.
Extended Data Fig. 6. Performance of random forest models trained by random subsets of 22 proteins.
We trained 10 random forest models, each utilizing a random subset of 22 proteins selected from the common pool of 408 proteins. The performances of these random forest models were evaluated by calculating the area under the receiver operating characteristic curve (AUC). Differences in performance between the random forest models were tested by DeLong test. The P values comparing the model using the top 22 ageing-related proteins to those using random subsets 1 to 10 were 8.38×10−11, 1.44×10−4, 2.10×10−14, 1.32×10−5, 1.55×10−12, 1.42×10−2, 2.61×10−12, 3.25×10−7, 2.06×10−10, and 2.02×10−5, respectively.
Given that the 22 proteins were all correlated with age and predominantly associated with sex and BMI, we proceeded to compare their predictive performance with intrinsic factors (age, sex, BMI) in distinguishing between healthy and unhealthy participants. The model using age, sex and BMI achieved an AUC of 0.63, while the model including the 22 proteins achieved an AUC of 0.70 and the full model incorporating a combination of these factors achieved an AUC of 0.72 (Fig. 6d). These results suggest that the discriminative power of the 22 ageing-related proteins may not be solely attributable to age, sex and BMI.
Based on our random forest model using 22 ageing-related proteins, we developed a PHAS to serve as an indicator of overall health status. We observed that higher PHAS values were longitudinally associated with improved anthropometric parameters, lipid and glucose metabolic biomarkers and improved hepatic and renal biomarkers (Fig. 6e and Supplementary Table 13), with consistent associations (Pearson’s r = 0.72) in both the GNHS discovery and validation cohorts (Fig. 6f and Supplementary Table 13).
We proceeded to validate the performance of the 22 proteins in distinguishing between healthy and unhealthy participants at baseline within the external validation cohort. We found a comparable accuracy (AUC = 0.71) (Extended Data Fig. 7a) to that observed in the GNHS validation cohort (AUC = 0.70) (Fig. 6a). Additionally, within the external validation cohort, the PHAS exhibited longitudinal associations with lower waist circumference and serum uric acid as well as improved lipid and glucose metabolic measures (FDR < 0.05) (Extended Data Fig. 7b). To further validate these pivotal findings, we used multiple reaction monitoring (MRM)–MS-based targeted proteomics to quantify the levels of the 22 proteins and replicated these analyses in the external validation cohort. We found a considerable alignment of protein levels (median Spearman’s ρ = 0.47) between the DIA-MS-based proteomics assay and the MRM–MS-based targeted proteomics assay (Extended Data Fig. 8a). Moreover, the 22 proteins measured by MRM–MS-based targeted proteomics achieved similar performance in distinguishing between healthy and unhealthy participants (AUC = 0.69) (Extended Data Fig. 8b) compared to our primary analysis using the DIA-MS-based proteomics assay (AUC = 0.71) (Extended Data Fig. 7a). Furthermore, the PHAS values generated by the random forest models based on these two proteomics assays were highly correlated (Pearson’s r = 0.68, mean absolute error = 0.08) (Extended Data Fig. 8c) and exhibited similar longitudinal associations with clinical traits (Pearson’s r = 0.87) (Extended Data Fig. 8d). Specifically, the PHAS based on targeted proteomics was also significantly associated with lower waist circumference and serum uric acid as well as improved lipid profiles (lower triglycerides and low-density lipoprotein cholesterol) and glucose metabolic measures (lower hemoglobin A1c and insulin levels) (FDR < 0.05) (Extended Data Fig. 8d).
The validation by targeted proteomics in the external validation cohort has notably strengthened the reliability of our findings.
Extended Data Fig. 7. Performance of the random forest model and the longitudinal associations of PHAS with 14 clinical traits in the external validation cohort.
a, Performance of the random forest model using the top 22 important proteins in discriminating the healthy status of the 124 participants at baseline from the external validation cohort. The area under the receiver operating characteristic curve (AUC) is indicated. b, Longitudinal associations of proteomic healthy ageing score (PHAS) with 14 available clinical traits in the external validation cohort (124 participants with 200 observations during follow-up) by linear mixed models adjusted for age and sex. The 14 clinical traits include BMI, body mass index; WC, waist circumference; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; TG, triglycerides; TC, total cholesterol; FBG, fasting blood glucose; HbA1c, hemoglobin A1c; Insulin; ALT, alanine transaminase; AST, aspartate aminotransferase; ALP, alkaline phosphatase; UA, uric acid; uCRE, urine creatinine.
Extended Data Fig. 8. Validation of the 22 proteins constituting the proteomic healthy ageing score by MRM-MS-based targeted proteomics in the external validation cohort.
The 22 proteins were measured by the multiple reaction monitoring (MRM)-MS-based targeted proteomics assay in 179 available serum samples from 115 participants in the external validation cohort. a, Spearman’s ρ for the levels of the 22 proteins measured by our primary DIA-MS-based proteomics assay and the MRM-MS-based targeted proteomics assay in the 179 serum samples. b, Performance of the random forest model using the 22 proteins measured by MRM-MS-based targeted proteomics in discriminating healthy and unhealthy status of 104 participants at baseline from the external validation cohort. c, Pearson’s correlation of the proteomic healthy ageing scores (PHASs) derived from the 22 proteins measured by DIA-MS-based proteomics and by MRM-MS-based targeted proteomics. MAE, mean absolute error. d, Comparison of the longitudinal associations between 14 clinical traits and the proteomic healthy ageing scores (PHASs) generated by DIA-MS-based proteomics and by MRM-MS-based targeted proteomics. The longitudinal associations between 14 clinical traits and PHAS were investigated by linear mixed models adjusted for age and sex. The 14 clinical traits include BMI, body mass index; WC, waist circumference; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; TG, triglycerides; TC, total cholesterol; FBG, fasting blood glucose; HbA1c, hemoglobin A1c; Insulin; ALT, alanine transaminase; AST, aspartate aminotransferase; ALP, alkaline phosphatase; UA, uric acid; uCRE, urine creatinine. The triangles represent clinical traits that displayed significant associations (FDR < 0.05) with PHAS both in our primary analysis using DIA-MS-based proteomics and in the replicate analyses using MRM-MS-based targeted proteomics within the external validation cohort.
To delve into the long-term clinical implications of PHAS, we examined its association with the incidence of 14 ageing-related diseases among 3,414 participants within the entire GNHS cohort, using multivariable-adjusted Cox proportional hazards models. Our analysis revealed that for a 1 s.d. increase in baseline PHAS, there was a 72% lower risk of developing chronic diseases during follow-up (Fig. 6g and Supplementary Table 14). Additionally, we observed 53%, 32%, 53% and 40% lower risk of incident T2D, dyslipidemia, fatty liver and hypertension, respectively (Fig. 6g and Supplementary Table 14). These findings suggest that the PHAS might serve as a predictive indicator for incident cardiometabolic diseases during the ageing process.
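The percentage figures above follow from hazard ratios through the relation (HR − 1) × 100. The sketch below illustrates that conversion only; the function names are hypothetical, and the HR of 0.28 is back-calculated from the reported 72% figure rather than taken from Supplementary Table 14.

```python
import math


def hazard_ratio_from_coefficient(beta: float) -> float:
    """A Cox model estimates log-hazard coefficients; the hazard ratio is exp(beta)."""
    return math.exp(beta)


def percent_change_in_hazard(hazard_ratio: float) -> float:
    """Percentage change in hazard per unit of exposure (here, per 1 s.d. of PHAS).

    Negative values mean a lower risk.
    """
    return (hazard_ratio - 1.0) * 100.0


# Illustrative value only (assumed, not a fitted estimate):
# an HR of 0.28 corresponds to a 72% lower risk.
example_change = percent_change_in_hazard(0.28)
```

A hazard ratio below 1 therefore reads directly as a percentage reduction in hazard per 1 s.d. increase in the score.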
Determinants of PHAS and 22 ageing-related proteins
To investigate potential determinants of PHAS and the 22 ageing-related proteins used for constructing the PHAS, we evaluated the proportion of variances explained by intrinsic factors (age, sex, BMI), lifestyle factors (smoking, alcohol drinking, physical activity), diet (15 food groups or items), gut microbiota (219 species based on metagenome sequencing) and host genetics (65 independent genetic variants associated with the identified proteins based on a recent Chinese protein quantitative trait loci study28) among 1,325 participants from the GNHS discovery cohort who had all multi-omics data available. We used permutational multivariate analysis of variance (PERMANOVA)29 with a backward feature selection procedure to assess the explained variance for the entire set of the 22 ageing-related proteins and used linear models with variables selected through the least absolute shrinkage and selection operator (LASSO) method30 to determine the explained variance for individual proteins as well as the PHAS (see ‘Distance-based and linear model-based variance estimation’ in the Methods).
We found that host genetics accounted for the largest proportion of variance (7.9%) for the whole set of 22 ageing-related proteins, followed by intrinsic factors (4.0%) and gut microbiota (3.8%). By contrast, lifestyle factors (0.6%) and diet (0.4%) explained smaller proportions of the variance (Fig. 7a and Supplementary Table 15). Consistently, host genetics, intrinsic factors and gut microbiota were the primary determinants of variance for most individual proteins. Out of the 22 ageing-related proteins, ten individual proteins were predominantly explained by intrinsic factors, six were explained by host genetics and four were explained by gut microbiota (Fig. 7b and Supplementary Table 16). With respect to the PHAS, intrinsic factors explained the largest proportion of the variance (7.0%), followed by gut microbiota (6.3%) and host genetics (4.1%), whereas lifestyle factors (0.1%) and diet (0.5%) had a much smaller contribution to the variance (Fig. 7a and Supplementary Table 15).
Fig. 7. Determinants of PHAS and its comprising proteins.
a, Variance of the whole set of 22 serum proteins and the PHAS explained by indicated factor groups among 1,325 participants who had multi-omics data available from the GNHS discovery cohort. The explained variance of the 22 serum proteins was estimated by using PERMANOVA with backward feature selection. The explained variance of PHAS was estimated by using linear models including variables selected by the LASSO method. Detailed results are presented in Supplementary Table 15. b, Variance of each of the 22 serum proteins explained by the indicated factor groups among the 1,325 participants, estimated by using linear regressions including variables selected by the LASSO method. Detailed results are presented in Supplementary Table 16. c, Associations between PHAS and each of the 18 microbial species that contributed to the variance explanation of PHAS among the 1,325 participants. Linear models were used with adjustments for age, sex and BMI. The y axis indicates two-sided P values. Q values were estimated using the Benjamini–Hochberg approach. Red and blue circles represent significant positive and negative associations at Q < 0.05. Detailed results are presented in Supplementary Table 17. The grey circles represent microbial species that were not associated with PHAS (Q value ≥ 0.05). d,e, Associations between PHAS and gut microbial score in the GNHS discovery cohort (n = 1,325) (d) and the external validation cohort (n = 34) (e). Linear models were used with adjustments for age, sex and BMI. The red lines represent the fitted linear association, and the shaded regions represent 95% CIs. Two-sided P values are indicated.
Given that gut microbiota emerged as the modifiable factor explaining the variation of PHAS, we conducted exploratory analyses to examine the associations of PHAS with each of the 18 gut microbial species that contributed to the variance explanation. We found that 15 out of the 18 microbial species exhibited significant associations with the PHAS (FDR < 0.05) (Fig. 7c and Supplementary Table 17). We then created a gut microbial score based on the 18 gut microbial species and found that the gut microbial score had a strong positive association with PHAS (β = 0.65, P = 4.30 × 10−19) in the GNHS discovery cohort (Fig. 7d), which remained stable in the external validation cohort (β = 0.79, P = 4.64 × 10−2) (Fig. 7e). These results indicate that gut microbiota may be an important modifiable factor associated with the PHAS.
Discussion
In this longitudinal study, we identified serum proteomic biomarkers associated with ageing in a cohort of 3,796 middle-aged and elderly adults over a 9-year follow-up period. By incorporating deep phenotyping data, we comprehensively investigated the biological implications of identified ageing-related proteins and their associations with various clinical traits and chronic diseases. Our findings suggest that ageing-related proteins are closely associated with health status and disease risk during ageing, especially for cardiometabolic health. Based on these ageing-related proteins, we created a PHAS that was associated with long-term risk of several cardiometabolic diseases.
We used an MS-based proteomics approach to measure serum proteome. Although aptamer-based proteomics (SOMAscan) and antibody-based proteomics (Olink) have the advantage of high sensitivity and the ability to detect thousands of proteins31,32, MS-based proteomics offers an unbiased and hypothesis-free way of analysing the serum proteome. In comparison to a previous short-term longitudinal study involving 106 participants using MS-based proteomics15, we identified an overlap of 20 ageing-related proteins, with 12 exhibiting consistent directions and 8 showing opposite directions. Compared to a recent large-scale study that investigated the cross-sectional association of age with plasma proteins measured by aptamer-based proteomics17, we found an overlap of 42 ageing-related proteins, with 35 exhibiting consistent directions and 7 showing opposite directions (Supplementary Table 18). Overall, our study identified 38 ageing-related proteins that were novel compared to the two previous studies. Furthermore, our large-scale longitudinal study, conducted over an extended follow-up period, holds an advantage in addressing the potential heterogeneities in individual patterns of proteomic changes during ageing, which strengthens the validity of these novel proteins. However, additional longitudinal studies including diverse populations with different ethnic and environmental backgrounds are needed to examine the generalizability of our findings.
As ageing is commonly associated with declining health33, probing into the biological and clinical implications of ageing-related biomarkers is of high importance. We have shown that the identified 86 ageing-related proteins are interconnected in functional networks including lipid metabolism, organismal injury and abnormalities, neurological disease and cell-to-cell signalling and interaction, which are closely correlated with the ageing process4,27,34,35. In line with previous studies14, further functional enrichment analysis for proteins from these networks has identified over 1,000 GO terms and functional pathways, suggesting that ageing-related proteins may contribute to the ageing process in a more complex manner. Nevertheless, these functional networks as well as diverse biological pathways have deepened our understanding of the roles of ageing-related proteins in ageing biology.
Our findings on the prospective associations of ageing-related proteins with long-term risk of cardiometabolic diseases appear to be biologically plausible. For instance, the inverse associations of alpha-1 antitrypsin and zinc-alpha2-glycoprotein with incident T2D and fatty liver could be explained by their multifunctional roles in regulating metabolism. Alpha-1 antitrypsin, an alpha globulin glycoprotein, has been suggested to prevent overt hyperglycemia, increase insulin secretion and protect pancreatic β cells from apoptosis in diabetes36. Similarly, the zinc-alpha2-glycoprotein has been linked to improved glycemic control and insulin sensitivity37. Although our study did not establish causal relationships between ageing-related proteins and diseases, the identified prospective protein–disease associations may unveil potential therapeutic targets for cardiometabolic diseases that need further investigation, especially given that nearly two-thirds of the identified disease-associated ageing proteins are drug-targetable. For example, ten disease-associated ageing proteins can be targeted by zinc and zinc compounds, which could be particularly promising for intervention as zinc is essential for immune responses and protein synthesis38 and is frequently deficient in the elderly population39.
Using a random forest machine-learning model, we developed the PHAS to distinguish between healthy and unhealthy participants. As anticipated, PHAS demonstrated associations with improved clinical phenotypes and with lower incidence of overall and specific ageing-related cardiometabolic diseases. Given that the PHAS is constructed using a concise combination of 22 serum ageing-related proteins, it would be a readily accessible tool for monitoring cardiometabolic health in the future. Furthermore, our analyses of lifestyle and multi-omics determinants of PHAS unveil several potential therapeutic targets for future investigation. We found that gut microbiota may be one of the most important modifiable factors influencing PHAS. A microbial score derived from variance-contributing microbial species for PHAS showed a robust positive association with PHAS, aligning with previous evidence suggesting that gut microbiome patterns can reflect healthy ageing and predict survival in older age40. However, our findings on the determinants of PHAS should be interpreted with caution. For instance, the relatively low explained variance of PHAS by lifestyle factors could be because of the low prevalence of smoking (6.87%) and alcohol drinking (7.02%) among participants. Importantly, this observation does not inherently conflict with the well-established detrimental health effects associated with smoking and alcohol drinking.
We acknowledge several limitations in our study. Firstly, our MS-based proteomics approach identified a total of 438 serum proteins after quality control, which is fewer than targeted proteomics methods such as the SomaScan assay14. Nonetheless, this approach still enables us to explore and identify novel ageing-related proteins in a hypothesis-free way. Secondly, although we demonstrate the temporal relationship of the serum ageing-related proteins with various clinical outcomes, causality could not be established at this stage. Thirdly, defining ‘healthy’ and ‘unhealthy’ participants based on the presence of 14 ageing-related diseases may be insufficient and could potentially introduce bias caused by misclassification. Fourthly, well-studied proteins tend to have richer annotations than less-known proteins, which could introduce bias and affect the comprehensiveness of pathway analyses and the interpretation of results. Lastly, it is important to note that our longitudinal analyses were limited to a cohort of middle-aged and elderly Chinese participants, and the external validation of our primary findings was based on a small cohort of the elderly population with a short follow-up period. Therefore, it is imperative to carry out additional large-scale longitudinal studies to validate and generalize our findings.
In conclusion, this longitudinal study expands our knowledge of the serum proteomic landscape in the context of ageing and its implications for human health. Our study has identified serum proteomic biomarkers associated with ageing and provided valuable insights into the underlying mechanisms of human ageing from a proteomics perspective. Importantly, our findings indicate that these discovered proteomic biomarkers have the potential to serve as valuable tools for monitoring and predicting ageing-related cardiometabolic disease. These ageing-related proteomic biomarkers hold great clinical relevance, offering promising intervention and therapeutic targets for addressing ageing-related morbidities.
Methods
Study design and participants
The present study complies with all relevant ethical regulations and was approved by the Ethics Committee of the School of Public Health at Sun Yat-sen University and the Ethics Committee of Westlake University. All participants provided written informed consent. Our study used data from the GNHS, an ongoing community-based prospective cohort study involving 4,048 middle-aged and older Chinese adults living in Guangzhou City in southern China (ClinicalTrials.gov identifier: NCT03179657). Participants were recruited between 2008 and 2013 and were followed up approximately every 3 years. Socio-demographic and lifestyle characteristics, dietary factors, medical history, anthropometric data and clinical traits were collected through face-to-face interviews and health examinations during follow-up41.
We collected a total of 7,890 serum samples from 3,840 participants during the 9-year follow-up period. After excluding participants who did not provide detailed demographic information (n = 44) and performing serum proteome data cleaning and filtration, we included 3,796 participants with 7,565 serum samples for analysis. We divided the 3,796 participants into two subcohorts: subcohort 1, which included 1,939 participants from a multi-omics substudy within the GNHS (with genomic and faecal metagenomic data), was set as the GNHS discovery cohort; and subcohort 2, which comprised the remaining 1,857 participants, was set as the GNHS validation cohort. Figure 1 illustrates the distributions of 7,565 serum samples with proteomic profiles across participants during follow-up in the GNHS discovery and validation cohorts. The median baseline age in the GNHS discovery and validation cohorts was 57.6 years (first quartile, 53.9 years; third quartile, 61.8 years) and 57.1 years (first quartile, 53.6 years; third quartile, 62.1 years), respectively (Supplementary Table 1).
We included 124 participants from an independent external cohort. These participants, with a median age of 70 years (first quartile, 64 years; third quartile, 74 years), were recruited in 2009, and 76 of them were further followed up approximately 4 years later. Among these participants, we collected 200 serum samples from the two visits for proteomics measurement. We set this cohort as an external validation cohort to verify the ageing-related proteins.
Serum proteome profiling
Serum proteins were identified and quantified by MS-based proteomics42–44. In brief, peptide samples were prepared from the serum samples and injected into an Eksigent NanoLC 400 System coupled to a TripleTOF 5600 system (SCIEX) for the SWATH-MS analysis.
We measured serum samples of the GNHS discovery and validation cohorts in 178 and 132 batches, respectively, each containing 29 serum samples, two biological replicates and one pooled serum sample for quality control. The 200 serum samples from the external validation cohort were randomly assigned to seven out of the 178 measurement batches for the GNHS discovery cohort. Serum samples were acquired two to three times using the 20-min DIA-MS method as previously described45. The MS files were analysed using DIA-NN software (v.1.8) within a spectral library containing 5,102 peptides and 819 unique proteins from the Swiss-Prot database of Homo sapiens44,46. After data cleaning and filtration for samples used in this study, we obtained a proteomic matrix containing 438 proteins from 4,637 serum samples in the GNHS discovery cohort, a matrix containing 413 proteins from 2,928 serum samples in the GNHS validation cohort and a matrix containing 432 proteins from 200 serum samples in the external validation cohort. Details of the data cleaning process have been described previously44. Our proteomic data showed high consistency and reproducibility, as the median Pearson correlation coefficients between pooled serum samples, biological replicates and technical replicates were all ≥0.93 (ref. 44).
To validate our key findings based on the DIA-MS-proteomics assay, we further quantified the levels of 22 crucial proteins that constitute the PHAS in 179 available serum samples from 115 participants within the external validation cohort by using MRM, a more sensitive, reproducible and high-throughput MS-based targeted proteomic assay. A total of 49 peptides selected from the aforementioned spectral library44,46 were quantified by MRM, including 15 common internal retention time standard peptides used for retention time calibration and 34 peptides used for quantification of the 22 proteins (Supplementary Table 19). Peptides were separated using a Jasper high-performance liquid chromatography (HPLC) system (SCIEX) at a flow rate of 0.2 ml min−1 over a 10 min liquid chromatography gradient, ranging from 3% to 35% buffer B (buffer A, 0.1% formic acid (Fisher Chemical, cat. no. A117-50) in HPLC water (Fisher Chemical, cat. no. W6-4); buffer B, 0.1% formic acid in acetonitrile (Fisher Chemical, cat. no. A955-4)). The ionized peptides were analysed by the TRIPLE QUAD 4500MD system (SCIEX) in MRM acquisition mode. A total of 161 transitions from 49 peptides were quantified within a 5-min time window using a time-scheduled acquisition model. The cycle time was configured to 10 ms, comprising 5 ms of dwell time and 5 ms of pause time.
We log10-transformed the relative abundance of serum proteins and normalized the transformed values of each protein within each cohort. Note that this normalization of protein levels was essential when clustering serum proteome trajectories during follow-up.
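As an illustration of this preprocessing step, the sketch below log10-transforms one protein's abundances and then standardizes them. The text says only that values were "normalized"; z-scoring with the sample standard deviation is an assumption here, and the function name and input values are hypothetical.

```python
import math
import statistics


def log10_zscore(abundances):
    """Log10-transform positive abundances, then centre and scale them.

    Assumption: normalization is a z-score using the sample s.d. (n - 1).
    """
    logged = [math.log10(value) for value in abundances]
    mean = statistics.mean(logged)
    sd = statistics.stdev(logged)
    return [(value - mean) / sd for value in logged]


# Hypothetical relative abundances of one protein within one cohort.
z_scores = log10_zscore([1.0, 10.0, 100.0])
```

In practice this would be applied per protein and separately within each cohort, as the text describes.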
Clustering serum proteome trajectories during follow-up
To capture the global proteome trajectories during ageing, we clustered changes in serum protein levels across three time points over the 9-year follow-up period. The analysis included 1,018 participants from the GNHS discovery cohort who had serum proteome data available across all three time points. We calculated the mean z-score for each of the 438 proteins at each time point and then clustered them by k-means clustering. The optimal number of clusters was determined by the elbow method using the sum of squared errors. Trajectories of proteins in each cluster were visualized by line plots. We also captured these trajectories for participants stratified by sex and baseline age (>60 years or ≤60 years) to explore potential heterogeneity.
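The clustering and elbow steps can be sketched as follows. This is a minimal illustration, not the study's implementation: it uses Lloyd's k-means with a deterministic start (the first k trajectories as initial centroids, an assumption made for reproducibility), and the four toy trajectories are hypothetical mean z-scores at three time points.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length trajectories."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sse(points, k, max_iter=100):
    """Run Lloyd's k-means and return the within-cluster sum of squared errors.

    Initial centroids are the first k points (a simplifying assumption).
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no trajectory changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignment))


# Hypothetical protein trajectories: two rising, two falling.
trajectories = [(-1.0, 0.0, 1.0), (-0.9, 0.0, 0.9), (1.0, 0.0, -1.0), (0.9, 0.0, -0.9)]

# Elbow method: inspect how the SSE drops as k grows.
sse_by_k = {k: kmeans_sse(trajectories, k) for k in (1, 2, 3)}
```

On this toy input the SSE falls sharply from k = 1 to k = 2 and barely changes afterwards, which is the "elbow" that would suggest two clusters.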
Linear mixed models
To handle the longitudinal data during follow-up, we fitted linear mixed models to investigate the linear associations between variables:
$$Y = X\beta + Z\mu + \varepsilon$$
where Y is a vector of observations, β is a vector of fixed effects, μ is a vector of random effects and ε is a vector of random errors. X and Z are model matrices of independent variables related to β and μ, respectively. In this study, we fitted random intercepts for participant ID.
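With random intercepts for participant ID only, the general model reduces to a scalar form that may be easier to read. The following is a sketch in which i indexes participants and j indexes visits; the normal distributions for the random intercept and the residual are the conventional assumptions and are not stated in the text.

```latex
y_{ij} = \mathbf{x}_{ij}^{\top}\beta + \mu_i + \varepsilon_{ij},
\qquad \mu_i \sim \mathcal{N}(0, \sigma_{\mu}^{2}),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
```

Here each participant contributes one intercept shift, which accounts for the correlation between repeated measurements from the same person.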
To identify protein trajectories that showed significant trends during follow-up, we fitted the following linear mixed model:
$$\text{Protein} \sim \alpha + \beta \times \text{time point} + \text{batch} + \text{instrument} + (1 \mid \text{participant ID})$$
where the ordinal variable of time points during follow-up was set as the independent variable, proteomic measurement batches and instruments were set as covariates and participant ID was modelled with random intercepts. α represents the intercept of the model. We did not include age as a covariate because of collinearity with follow-up.
To investigate associations of age and sex with protein levels during follow-up, we fitted the following linear mixed model:
We applied this model to the GNHS discovery cohort and validation cohort, respectively, to identify proteins associated with age and sex. The interaction between age and sex on protein levels was examined by incorporating an interaction term (multiplying age and sex) into the linear mixed model. We also fitted the above linear mixed model in the external validation cohort for proteins that showed significant associations in both the GNHS discovery and validation cohorts.
To determine the relationship between identified ageing-related proteins and PHAS with 32 clinical traits, we used the following linear mixed model:
Here, the 32 clinical traits of interest were anthropometric parameters including BMI, waist circumference, systolic blood pressure and diastolic blood pressure; serum lipid profiles including high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, triglycerides and total cholesterol; biomarkers of glucose metabolism including fasting blood glucose, insulin and haemoglobin A1c; inflammatory cytokines including IL-1, IL-6 and tumour necrosis factor; hepatic biomarkers including serum alanine transaminase, aspartate aminotransferase and superoxide dismutase; renal biomarkers including serum alkaline phosphatase, uric acid and urine creatinine; and total Mini-Mental State Examination (MMSE)47 score and MMSE scores of immediate orientation, spatial orientation, temporal memory, attention, delayed recall, naming, verbal repetition, verbal comprehension, reading, writing and constructional praxis. We standardized the values of these clinical traits to facilitate the comparability of coefficients for the associations between these traits and proteins. We conducted the analyses only in the GNHS discovery and GNHS validation cohorts, and not in the external validation cohort, owing to data availability. In addition, MMSE measures were only available at the third follow-up for the GNHS discovery and validation cohorts.
For all these models, we calculated the FDR-adjusted P values (Q values) using the Benjamini and Hochberg approach to control for multiple testing48.
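The Benjamini–Hochberg adjustment can be sketched in a few lines. The function name and the four example P values below are hypothetical; the procedure itself is the standard step-up method.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (Q values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that Q values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical P values from four association tests.
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

An association would then be called significant when its Q value falls below the chosen FDR threshold, such as 0.05.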
Generalized linear mixed models by L1-penalized estimation
To examine whether the identified ageing-related proteins could predict chronological age, we used the GLMMLasso20 model based on the longitudinal data. We trained the model within 1,018 participants from the GNHS discovery cohort who had three serum proteome measurements taken during follow-up and tested the performance of the model on the remaining GNHS discovery cohort (1,583 observations), GNHS validation cohort (2,928 observations) and the external validation cohort (200 observations). We included sex and the 86 ageing-related proteins as initial input variables and determined the optimal GLMMLasso model using the Akaike information criterion. A subset of 83 ageing-related proteins was included in the final GLMMLasso model. The performance of the GLMMLasso model was evaluated using the Pearson correlation between actual and predicted age.
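The evaluation metric named above, the Pearson correlation between actual and predicted age, can be computed as in the following sketch. The function name and the example ages are hypothetical, and the GLMMLasso fit itself is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(ss_x * ss_y)


# Hypothetical actual and predicted ages (years) for five observations.
actual_age = [52.0, 57.5, 61.0, 66.0, 70.5]
predicted_age = [54.0, 56.0, 62.5, 64.0, 69.0]
r_age = pearson_r(actual_age, predicted_age)
```

A value of r close to 1 would indicate that predicted ages track chronological ages closely across observations.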
Random forest model for the PHAS
We used a random forest model (randomForest package in R49) to identify the proteomic signatures that can differentiate healthy from unhealthy participants at baseline. We defined participants as unhealthy if they had any of the following 14 ageing-related diseases: dyslipidemia, T2D, hypertension, stroke, coronary heart diseases, fatty liver, cirrhosis, hepatitis, renal disease, cancer, gout, rheumatoid arthritis, cataracts or Parkinson’s disease. To ensure the independence of samples in the random forest model, we used the 1,785 participants at baseline from the GNHS discovery cohort as the training dataset and the 1,629 participants at baseline from the GNHS validation cohort as the validation dataset to test the performance of the model. We initially trained two random forest models: one included as input features all 408 serum proteins common to the training and validation datasets, and the other included the 86 ageing-related proteins. Starting from the model that included the 86 ageing-related proteins, we derived a more concise model comprising the 22 most important ageing-related proteins, ranked by mean decrease in accuracy under tenfold cross-validation. To assess whether the model using these 22 proteins outperformed arbitrary protein sets of the same size, we compared it to ten simulated models, each using a random subset of 22 proteins selected from the total pool of 408 proteins. As the 22 ageing-related proteins were associated with age and also largely associated with sex and BMI, we investigated whether their predictive value was attributable to age, sex and BMI by training two further random forest models: one including only age, sex and BMI and another including age, sex, BMI and the 22 ageing-related serum proteins. The performance of all the random forest models was assessed by calculating the AUC. Differences in model performance were examined using the DeLong test.
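The AUC used to compare these models has a simple rank interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition. It does not implement the DeLong test, and the labels and scores are hypothetical (here 1 denotes the class being predicted).

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half.
    """
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities for six participants.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
model_auc = auc(labels, scores)
```

The pairwise form is quadratic in the number of participants, which is acceptable for a sketch; rank-based formulas give the same value more efficiently.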
We used the random forest model based on the top 22 important ageing-related proteins to generate the PHAS, which reflects the probability of participants being classified as ‘healthy’ according to the random forest model. This probability (PHAS) was estimated by using the tree voting aggregation approach50, which calculates the proportion of trees voting for the ‘healthy’ class. The calculation was performed according to the following formula:
$$\mathrm{PHAS}(X) = \frac{1}{T}\sum_{t=1}^{T} h_t(X, \text{‘healthy’})$$
where X represents the matrix of the 22 ageing-related proteins for a particular participant, T represents the total number of trees in the random forest model (which was set to 1,000 in our study) and the function h_t(X, ‘healthy’) denotes the prediction of whether a participant was classified as ‘healthy’ in tree t (t = 1, 2, 3, …, T), taking the value 1 if so and 0 otherwise.
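The tree-voting proportion described above can be sketched as follows. The per-tree votes would come from a fitted random forest; here they are supplied directly as a hypothetical list, and the function name is an assumption for illustration.

```python
def phas_from_votes(tree_votes):
    """Proportion of trees whose predicted class is 'healthy'."""
    total_trees = len(tree_votes)
    healthy_votes = sum(1 for vote in tree_votes if vote == "healthy")
    return healthy_votes / total_trees


# Hypothetical votes from a 1,000-tree forest for one participant.
votes = ["healthy"] * 640 + ["unhealthy"] * 360
phas = phas_from_votes(votes)
```

With 640 of 1,000 trees voting ‘healthy’, the score for this hypothetical participant is 0.64; scores closer to 1 indicate a proteomic profile that the forest classifies more consistently as healthy.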
Functional enrichment analysis for the serum proteins
To explore the biological significance of the identified groups of proteins, we conducted functional enrichment analysis using the g:Profiler toolkit18. We mapped all the proteins to Entrez gene IDs as input for functional enrichment analysis. For proteins mapped to multiple Entrez gene IDs, we used only the first ID to avoid false-positive enrichment. We tested the over-representation of gene groups of interest against the background of H. sapiens (human) genes. To correct for multiple testing, we calculated the Q values using the Benjamini and Hochberg approach48 independently for KEGG51, Reactome52 and WikiPathways databases53 as well as the three subclasses of GO54: GO molecular function, GO cellular component and GO biological process.
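Over-representation of a gene set within a query list is commonly assessed with a one-sided hypergeometric test. The sketch below shows that calculation under the assumption that this standard test applies; the exact g:Profiler settings are not described in the text, and the function name and counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background, in_term, query, overlap):
    """P(X >= overlap) for X ~ Hypergeometric.

    background: number of genes in the background set
    in_term:    number of background genes annotated to the term
    query:      number of genes in the query list
    overlap:    number of query genes annotated to the term
    """
    upper = min(in_term, query)
    total = comb(background, query)
    favourable = sum(
        comb(in_term, i) * comb(background - in_term, query - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 20 background genes, 5 in the term,
# 4 query genes of which 2 are in the term.
p_enrich = hypergeom_enrichment_p(20, 5, 4, 2)
```

The resulting P values, one per tested term, would then be adjusted within each database using the Benjamini and Hochberg approach.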
IPA for protein networks
To gain insight into the biological implications and protein–protein networks of the identified ageing-related proteins, we used QIAGEN IPA (v.122103623), an application that facilitates analysis and interpretation of omics data based on the Ingenuity Knowledge Base26. We input the gene symbols of 86 ageing-related proteins into IPA. For the protein haemoglobin subunit alpha (HBA) that mapped to multiple genes (HBA1, HBA2) (Supplementary Table 2), we used the first gene symbol (HBA1). We then conducted the IPA network analysis to identify interactions and networks of the ageing-related proteins. Protein networks are algorithmically generated, including both direct and indirect confirmed relationships between genes and gene products. Within each network, molecules having the most interactions with others were identified as hubs, and edges were used to represent functional activation or inhibition as well as the regulation of biological processes between molecules. Each network was limited by a maximum of 35 molecules to keep it concise and discrete from the others. To further determine the biological relevance of proteins in each network, we performed functional enrichment analysis for ageing-related proteins in each network using the g:Profiler toolkit18. For exploratory purposes, we also identified the upstream regulators for ageing-related proteins using IPA core analysis.
Cox proportional hazards models
To explore the potential role of the identified ageing-related proteins in healthy ageing, we examined the prospective associations between baseline levels of ageing-related proteins and the incidence of 14 chronic diseases during follow-up using Cox proportional hazards models in Stata (v.15.0). The diseases of interest included dyslipidemia, T2D, hypertension, stroke, coronary heart diseases, fatty liver, cirrhosis, hepatitis, renal disease, cancer, gout, rheumatoid arthritis, cataracts and Parkinson’s disease. To ensure statistical power, particularly for diseases with limited cases during the follow-up period, we analysed data from the entire GNHS study (3,414 participants available at baseline) instead of analysing the GNHS discovery and GNHS validation cohorts separately. To account for potential heterogeneity, we included the subcohort information as a covariate in the Cox proportional hazards models. For each outcome, we excluded participants who had the disease of interest at baseline, and adjustments were made for age, sex, BMI, subcohort and the presence of other diseases. To correct for multiple testing, we calculated Q values using the Benjamini and Hochberg approach48. We then looked up drug-target information for proteins significantly associated with the incidence of any of the 14 chronic diseases (FDR < 0.05) by consulting the DrugBank database55.
To investigate the long-term health implications of the PHAS, we examined the prospective associations between baseline PHAS and incidence of chronic diseases during follow-up by performing the following Cox proportional hazards models: model 1 was adjusted for age, sex, BMI and subcohort; model 2 was adjusted for covariates in model 1 plus the presence of the other 13 diseases at baseline. It is important to note that for the overall incidence of chronic diseases, model 2 was the same as model 1, given that participants with any of the 14 diseases at baseline would be excluded.
Distance-based and linear model-based variance estimation
To explore the potential determinants of PHAS and the 22 ageing-related proteins included in the random forest model, we estimated the proportion of their variance explained by intrinsic factors (age, sex, BMI), lifestyle (smoking, drinking and physical activity), dietary factors, gut microbiota and host genetics. Physical activity was measured as metabolic equivalent of task. Dietary intake of 15 food groups, including whole grain, refined grain, vegetables, fruits, legumes, nuts, red meat, poultry, fish, dairy, egg, tea, coffee, juices and sweetened beverages, was assessed by a validated food frequency questionnaire41. For the gut microbiota, we used 219 out of 1,008 microbial species that were present in at least 10% of the samples. For genetic variants, we used 65 genetic variants that were significantly associated with any of the 18 proteins at P < 5 × 10⁻⁸ from our previous Chinese proteome genome-wide association study28. We performed the variance estimation analysis based on 1,325 participants from the GNHS discovery cohort for whom multi-omics data were available.
We used a distance-based PERMANOVA29 procedure to estimate the explained variance for the entire set of 22 serum proteins. To account for potential overfitting of included factors, we first identified individual factors that were significantly associated with the β-diversity of the 22 serum proteins by PERMANOVA. Only significant individual factors were included in the overall PERMANOVA model. We then performed backward selection; that is, we eliminated individual factors that were not significant in the overall PERMANOVA model and re-fitted the model until all included factors were significant.
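To illustrate how PERMANOVA yields an explained-variance estimate, the sketch below computes the R² and pseudo-F statistic for a single categorical factor from a distance matrix and derives a permutation P value. It is a simplified illustration, not the authors' analysis: the study also considered continuous and multiple factors, and the function name, toy distances and permutation count are assumptions.

```python
import random


def permanova(dist, groups, n_perm=999, seed=0):
    """One-factor PERMANOVA on a square distance matrix.

    Returns (R2, pseudo-F, permutation P value). Assumes at least two
    groups and non-zero within-group variation.
    """
    n = len(groups)
    n_groups = len(set(groups))
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(
        dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)
    ) / n

    def pseudo_f(labels):
        ss_within = 0.0
        for g in set(labels):
            members = [i for i in range(n) if labels[i] == g]
            for a, i in enumerate(members):
                for j in members[a + 1:]:
                    ss_within += dist[i][j] ** 2 / len(members)
        ss_between = ss_total - ss_within
        f_stat = (ss_between / (n_groups - 1)) / (ss_within / (n - n_groups))
        return ss_between / ss_total, f_stat

    r2, f_obs = pseudo_f(groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute group labels across samples
        if pseudo_f(shuffled)[1] >= f_obs:
            hits += 1
    return r2, f_obs, (hits + 1) / (n_perm + 1)


# Toy example: four samples on a line, two well-separated groups.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
r2, f_obs, p_value = permanova(dist, ["A", "A", "B", "B"])
```

In this toy example the grouping accounts for 100/101 of the total variation, which is the quantity reported as explained variance.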
We applied linear models to estimate the explained variance of PHAS and each of the 22 serum proteins. We used the adjusted R² to represent the explained variance. To address potential overfitting issues, we selected contributing factors by using the LASSO model30 at ‘Lambda.min’ with tenfold cross-validation, which provides a conservative estimation for the explained variance56. For exploratory purposes, we examined the pairwise associations of PHAS with the 18 microbial species that were selected by LASSO to explain the variance of PHAS. Linear models adjusted for age, sex and BMI were used for this analysis. To account for multiple testing, we calculated FDR-adjusted P values (Q values) using the Benjamini and Hochberg approach48. Furthermore, we calculated a microbial score by summing the relative abundances of the 18 microbial species, each weighted by the coefficient for its association with the PHAS. We then investigated the associations between microbial score and PHAS using linear models adjusted for age, sex and BMI and further replicated this analysis with participants from the external validation cohort for whom multi-omics data were available.
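The microbial score is a coefficient-weighted sum of relative abundances. The following sketch shows the arithmetic for one participant; the species names, coefficients and abundances are hypothetical placeholders, and the function name is an assumption.

```python
def microbial_score(abundances, coefficients):
    """Sum of relative abundances weighted by per-species coefficients.

    Species missing from `abundances` are treated as absent (abundance 0).
    """
    return sum(
        coefficients[species] * abundances.get(species, 0.0)
        for species in coefficients
    )


# Hypothetical PHAS association coefficients for three species.
coefficients = {"species_a": 0.5, "species_b": -0.25, "species_c": 0.1}
# Hypothetical relative abundances for one participant.
abundances = {"species_a": 0.02, "species_b": 0.04, "species_c": 0.10}
score = microbial_score(abundances, coefficients)
```

Species with negative coefficients lower the score, so a higher score reflects a microbial profile more strongly aligned with a higher PHAS.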
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Supplementary information
Supplementary Tables 1–19.
Source data
Statistical source data for Fig. 2.
Statistical source data for Fig. 3.
Statistical source data for Fig. 4.
Statistical source data for Fig. 5.
Statistical source data for Fig. 6.
Statistical source data for Fig. 7.
Acknowledgements
We thank the High-Performance Computing Center and High-Throughput Core Facility at Westlake University for assistance with computing and data generation. We thank the Westlake Education Foundation and Westlake Laboratory of Life Sciences and Biomedicine for support. Figure 1 was created with BioRender.com. This study was funded by the National Natural Science Foundation of China (92374112 to J.-S.Z., 82073546 to Y.-m.C., 82103828 to J.T., 82103826 to Y.F. and 82204161 to K.D.), the ‘Pioneer’ and ‘Leading goose’ R&D Program of Zhejiang (2024SSYS0032 to J.-S.Z. and 2024SSYS0035 to T.G.), Zhejiang Provincial Natural Science Foundation of China (LQ21H260002 to Y.F.), the Research Program of Westlake Laboratory of Life Sciences and Biomedicine (No. 202208012 to J.-S.Z.) and the National Key R&D Program of China (2022YFA1303900 to J.-S.Z. and 2021YFA1301600 to T.G.). The funders had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author contributions
J.-S.Z., Y.-m.C. and T.G. contributed to the study conceptualization and design. J.T., L.Y. and F.X. contributed to the data analysis. L.Y., Y.X., F.X., X.C., Y.F., Z.M., W.G., W.H., Z.X., K.D., L.S., Z.J., M.S., X.L., C.X. and Y.X. collected data. J.T. and J.-S.Z. drafted the manuscript; J.-S.Z., Y.-m.C. and T.G. revised the manuscript; all authors read, edited and approved the final version of the manuscript. J.-S.Z., Y.-m.C. and T.G. are the guarantors of this work and, as such, had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Peer review
Peer review information
Nature Metabolism thanks Luigi Ferrucci, Manuel Mayr and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Christoph Schmitt, in collaboration with the Nature Metabolism team.
Data availability
The raw data for serum proteomics have been uploaded to iProX (https://www.iprox.cn/page/home.html) under accession numbers PXD039236, PXD039231, PXD038253 and IPX0008927000. Sex information for these raw proteome data is publicly available at https://github.com/nutrition-westlake/Longitudinal-serum-proteome-mapping-for-healthy-ageing. The raw data of metagenomic sequencing in this study have been deposited in the Genome Sequence Archive (GSA) (https://ngdc.cncb.ac.cn/gsa) under accession number CRA008796. The protein quantitative trait loci dataset used in this study is available in an interactive web resource at https://omics.lab.westlake.edu.cn/data/proteins. The DrugBank database can be consulted at https://go.drugbank.com. The metadata are available under restricted access because of participant consent and privacy regulations within our cohort. Access can be obtained upon reasonable request by contacting the corresponding authors. Source data are provided with this paper.
Code availability
All software and algorithms are mentioned where appropriate. Code used for data analysis and presentation is publicly available at https://github.com/nutrition-westlake/Longitudinal-serum-proteome-mapping-for-healthy-ageing.
Competing interests
T.G. is a shareholder of Westlake Omics. The other authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Jun Tang, Liang Yue, Ying Xu, Fengzhe Xu.
Contributor Information
Tiannan Guo, Email: guotiannan@westlake.edu.cn.
Yu-ming Chen, Email: chenyum@mail.sysu.edu.cn.
Ju-Sheng Zheng, Email: zhengjusheng@westlake.edu.cn.
Extended data
Extended data is available for this paper at 10.1038/s42255-024-01185-7.
Supplementary information
The online version contains supplementary material available at 10.1038/s42255-024-01185-7.
References
- 1. Jylhävä, J., Pedersen, N. L. & Hägg, S. Biological age predictors. EBioMedicine 21, 29–36 (2017).
- 2. Rutledge, J., Oh, H. & Wyss-Coray, T. Measuring biological age using omics data. Nat. Rev. Genet. 23, 715–727 (2022).
- 3. López-Otín, C., Blasco, M. A., Partridge, L., Serrano, M. & Kroemer, G. The hallmarks of aging. Cell 153, 1194–1217 (2013).
- 4. López-Otín, C., Blasco, M. A., Partridge, L., Serrano, M. & Kroemer, G. Hallmarks of aging: an expanding universe. Cell 186, 243–278 (2023).
- 5. Campisi, J. et al. From discoveries in ageing research to therapeutics for healthy ageing. Nature 571, 183–192 (2019).
- 6. Williams, S. A. et al. Plasma protein patterns as comprehensive indicators of health. Nat. Med. 25, 1851–1857 (2019).
- 7. Walker, K. A. et al. Large-scale plasma proteomic analysis identifies proteins and pathways associated with dementia risk. Nat. Aging 1, 473–489 (2021).
- 8. Boucherat, O. et al. Identification of LTBP-2 as a plasma biomarker for right ventricular dysfunction in human pulmonary arterial hypertension. Nat. Cardiovasc. Res. 1, 748–760 (2022).
- 9. Niu, L. et al. Noninvasive proteomic biomarkers for alcohol-related liver disease. Nat. Med. 28, 1277–1287 (2022).
- 10. Carrasco-Zanini, J. et al. Proteomic signatures for identification of impaired glucose tolerance. Nat. Med. 28, 2293–2300 (2022).
- 11. Johnson, A. A., Shokhirev, M. N., Wyss-Coray, T. & Lehallier, B. Systematic review and analysis of human proteomics aging studies unveils a novel proteomic aging clock and identifies key processes that change with age. Ageing Res. Rev. 60, 101070 (2020).
- 12. Moaddel, R. et al. Proteomics in aging research: a roadmap to clinical, translational research. Aging Cell 20, e13325 (2021).
- 13. Tanaka, T. et al. Plasma proteomic signature of age in healthy humans. Aging Cell 17, e12799 (2018).
- 14. Lehallier, B. et al. Undulating changes in human plasma proteome profiles across the lifespan. Nat. Med. 25, 1843–1850 (2019).
- 15. Ahadi, S. et al. Personal aging markers and ageotypes revealed by deep longitudinal profiling. Nat. Med. 26, 83–90 (2020).
- 16. Hornburg, D. et al. Dynamic lipidome alterations associated with human health, disease and ageing. Nat. Metab. 5, 1578–1594 (2023).
- 17. Oh, H. S.-H. et al. Organ aging signatures in the plasma proteome track health and disease. Nature 624, 164–172 (2023).
- 18. Raudvere, U. et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 47, W191–W198 (2019).
- 19. Wilkinson, D. J., Piasecki, M. & Atherton, P. J. The age-related loss of skeletal muscle mass and function: measurement and physiology of muscle fibre atrophy and muscle fibre loss in humans. Ageing Res. Rev. 47, 123–132 (2018).
- 20. Groll, A. & Tutz, G. Variable selection for generalized linear mixed models by L1-penalized estimation. Stat. Comput. 24, 137–154 (2014).
- 21. Austad, S. N. & Fischer, K. E. Sex differences in lifespan. Cell Metab. 23, 1022–1033 (2016).
- 22. Hägg, S. & Jylhävä, J. Sex differences in biological aging with a focus on human studies. eLife 10, e63425 (2021).
- 23. Folkersen, J. et al. Circulating levels of pregnancy zone protein: normal range and the influence of age and gender. Clin. Chim. Acta 110, 139–145 (1981).
- 24. Nijholt, D. A. et al. Pregnancy zone protein is increased in the Alzheimer’s disease brain and associates with senile plaques. J. Alzheimers Dis. 46, 227–238 (2015).
- 25. Ijsselstijn, L. et al. Serum levels of pregnancy zone protein are elevated in presymptomatic Alzheimer’s disease. J. Proteome Res. 10, 4902–4910 (2011).
- 26. Krämer, A., Green, J., Pollard, J. Jr & Tugendreich, S. Causal analysis approaches in Ingenuity Pathway Analysis. Bioinformatics 30, 523–530 (2014).
- 27. Johnson, A. A. & Stolzing, A. The role of lipid metabolism in aging, lifespan regulation, and age-related disease. Aging Cell 18, e13048 (2019).
- 28. Xu, F. et al. Genome-wide genotype-serum proteome mapping provides insights into the cross-ancestry differences in cardiometabolic disease susceptibility. Nat. Commun. 14, 896 (2023).
- 29. Anderson, M. J. Permutational Multivariate Analysis of Variance (PERMANOVA) (Wiley StatsRef: Statistics Reference Online, 2017); 10.1002/9781118445112.stat07841
- 30. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B Stat. Methodol. 58, 267–288 (1996).
- 31. Candia, J. et al. Assessment of variability in the SOMAscan assay. Sci. Rep. 7, 14248 (2017).
- 32. Haslam, D. E. et al. Stability and reproducibility of proteomic profiles in epidemiological studies: comparing the Olink and SOMAscan platforms. Proteomics 22, 2100170 (2022).
- 33. Beard, J. R. et al. The World report on ageing and health: a policy framework for healthy ageing. Lancet 387, 2145–2154 (2016).
- 34. Hou, Y. et al. Ageing as a risk factor for neurodegenerative disease. Nat. Rev. Neurol. 15, 565–581 (2019).
- 35. Fafián-Labora, J. A. & O’Loghlen, A. Classical and nonclassical intercellular communication in senescence and ageing. Trends Cell Biol. 30, 628–639 (2020).
- 36. Kim, M., Cai, Q. & Oh, Y. Therapeutic potential of alpha-1 antitrypsin in human disease. Ann. Pediatr. Endocrinol. Metab. 23, 131–135 (2018).
- 37. Pearsey, H. M. et al. Zinc-alpha2-glycoprotein, dysglycaemia and insulin resistance: a systematic review and meta-analysis. Rev. Endocr. Metab. Disord. 21, 569–575 (2020).
- 38. Chasapis, C. T., Ntoupa, P.-S. A., Spiliopoulou, C. A. & Stefanidou, M. E. Recent aspects of the effects of zinc on human health. Arch. Toxicol. 94, 1443–1460 (2020).
- 39. Vural, Z., Avery, A., Kalogiros, D. I., Coneyworth, L. J. & Welham, S. J. M. Trace mineral intake and deficiencies in older adults living in the community and institutions: a systematic review. Nutrients 12, 1072 (2020).
- 40. Wilmanski, T. et al. Gut microbiome pattern reflects healthy ageing and predicts survival in humans. Nat. Metab. 3, 274–286 (2021).
- 41. Ling, C.-W. et al. Cohort profile: Guangzhou Nutrition and Health Study (GNHS): a population-based multi-omics study. J. Epidemiol. 34, 301–306 (2024).
- 42. Shen, B. et al. Proteomic and metabolomic characterization of COVID-19 patient sera. Cell 182, 59–72.e15 (2020).
- 43. Zhang, Y. et al. Potential use of serum proteomics for monitoring COVID-19 progression to complement RT-PCR detection. J. Proteome Res. 21, 90–100 (2021).
- 44. Cai, X. et al. Population serum proteomics uncovers a prognostic protein classifier for metabolic syndrome. Cell Rep. Med. 4, 101172 (2023).
- 45. Gillet, L. C. et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, O111.016717 (2012).
- 46. Demichev, V., Messner, C. B., Vernardis, S. I., Lilley, K. S. & Ralser, M. DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nat. Methods 17, 41–44 (2020).
- 47. Shigemori, K., Ohgi, S., Okuyama, E., Shimura, T. & Schneider, E. The factorial structure of the Mini-Mental State Examination (MMSE) in Japanese dementia patients. BMC Geriatr. 10, 36 (2010).
- 48. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B Stat. Methodol. 57, 289–300 (1995).
- 49. Liaw, A. & Wiener, M. Classification and regression by random forest. R News 2, 18–22 (2002).
- 50. Sage, A. J., Genschel, U. & Nettleton, D. Tree aggregation for random forest class probability estimation. Stat. Anal. Data Min. 13, 134–150 (2020).
- 51. Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, D353–D361 (2017).
- 52. Fabregat, A. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 46, D649–D655 (2017).
- 53. Slenter, D. N. et al. WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Res. 46, D661–D667 (2017).
- 54. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
- 55. Wishart, D. S. et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 46, D1074–D1082 (2018).
- 56. Chen, L. et al. Influence of the microbiome, diet and genetics on inter-individual variation in the human plasma metabolome. Nat. Med. 28, 2333–2343 (2022).