Abstract
While previous studies identified common genetic variants associated with longevity in centenarians, the role of the rare loss-of-function (LOF) mutation burden remains largely unexplored. Here, we investigated the burden of rare LOF mutations in Ashkenazi Jewish individuals from the Longevity Genes Project and LonGenity study cohorts using whole-exome sequencing data. We found that centenarians had a significantly lower burden (11-22%) of LOF mutations compared to controls. Similar effects were also observed in their offspring. Gene-level burden analysis identified 35 genes with depleted LOF mutations in centenarians, with 14 of these validated in the UK Biobank. Mendelian randomization and multi-omic analyses on these genes identified RGP1, PCNX2, and ANO9 as longevity genes with consistent causal effects on multiple aging-related traits and altered expression during aging. Our findings suggest that a protective genetic background, characterized by a reduced burden of damaging variants, contributes to exceptional longevity, likely acting in concert with specific protective variants to promote healthy aging.
Subject terms: Genotype, Ageing, Genetic variation
Previous studies have identified common genetic variants linked to longevity, but the impact of rare damaging mutations remains unclear. Here, the authors show that centenarians carry fewer harmful loss-of-function mutations and identify genes that may contribute to extreme longevity and healthy aging.
Introduction
Aging is a complex process characterized by an accumulation of molecular damage, progressive decline in physiological function, increased susceptibility to disease, and, ultimately, higher risk of mortality1. While chronological age is a major risk factor, there is remarkable variability in how individuals age, with some experiencing severe disability and premature death while others maintain good health well into old age2,3. This heterogeneity suggests that aging is a multifactorial process shaped by both genetic and environmental factors4.
At the extreme end of the lifespan spectrum are centenarians, individuals with exceptional longevity who have reached the age of 100 years or more. Centenarians represent a rare and valuable model of successful aging, often displaying delayed onset or escape from major age-related diseases such as cardiovascular disease, diabetes, and dementia5,6. Furthermore, many maintain physical and cognitive function, as well as independence, well into old age7. Understanding the factors that contribute to their exceptional longevity could provide valuable insights into the biology of healthy aging and lifespan determination.
Studies in model organisms have firmly established that lifespan has a significant genetic component. Single gene mutations in pathways related to insulin/insulin-like growth factor-1 (IGF-1) signaling, mechanistic target of rapamycin (mTOR) signaling, and AMP-activated protein kinase (AMPK) signaling have been shown to dramatically extend lifespan in yeast, worms, flies, and mice8. Many of these pathways are evolutionarily conserved, suggesting they may play a role in human aging as well. Indeed, functional variants in the IGF-1 receptor have been identified in centenarians, supporting a role for this pathway in exceptional longevity9.
In humans, genome-wide association studies (GWAS) have identified numerous common genetic variants associated with longevity, defined as attaining exceptional old age or having long-lived parents10,11. However, these variants explain only a small portion of the heritability (12%)11, suggesting that rare variants may also play an important role12. Rare variants, particularly those that lead to loss of gene function (LOF), are of great interest in studying human lifespan. LOF variants, including nonsense, splice-site, and frameshift mutations, are generally deleterious and subject to strong purifying selection13. An increased burden of LOF mutations has been observed in individuals with shorter lifespans and shorter healthspans (the period of life spent free of disease), suggesting that this burden may significantly impact human health14. However, LOF variants that confer protective effects, such as those in the APOC3 and PCSK9 genes associated with a lower risk of cardiovascular disease, have also been identified15,16.
Despite the growing evidence for the importance of rare variants in aging, the overall burden of LOF mutations in exceptionally long-lived individuals compared to controls has not been systematically examined. A previous study observed no difference in the burden of pathogenic variants between centenarians, their offspring, and controls17. However, this study did not specifically focus on LOF variants or incorporate key covariates that may introduce batch effects confounding the results. Furthermore, the sample size was smaller than in the present study, limiting the power to detect significant differences. Another study found that the burden of rarest protein-truncating variants (PTVs) in two large cohorts was negatively associated with human healthspan and lifespan, accounting for 0.4 and 1.3 years of their variability, respectively14.
In this study, we leveraged whole-exome sequencing data from a large cohort of Ashkenazi Jewish centenarians and controls to comprehensively compare the burden of rare LOF variants (Fig. 1a). By focusing on a genetically homogeneous population, we minimized the potential confounding effects of population stratification. Importantly, we incorporated the dates of recruitment and birth as covariates in our analysis to control for cohort effects and potential secular trends in environmental and lifestyle factors that may impact lifespan. Our results suggest that centenarians have a lower burden of LOF mutations compared to controls. This depletion was observed across multiple categories of predicted deleterious variants. Furthermore, we performed a genome-wide association study to identify specific genes and pathways that were enriched for protective variants in centenarians. Several genes reached suggestive significance levels, and pathway analysis revealed a depletion of variants in pathways related to hyaluronan metabolism, G-protein receptors, post-translational protein modification, and mitochondrial translation. Notably, 14 out of 35 of these gene associations were validated in an independent cohort from the UK Biobank based on parental lifespan-related traits, supporting the reproducibility of our findings. Together, these results provide new insights into the genetic architecture of human exceptional longevity and highlight potential molecular mechanisms that may contribute to healthy aging. Further studies will be necessary to validate and functionally characterize the roles of these genes and pathways in promoting longevity.
Results
The whole-exome sequencing data was obtained from 637 centenarians, 917 offspring of centenarians, and 595 controls from the Longevity Genes Project (LGP) and LonGenity study cohorts of Ashkenazi Jewish individuals (Table 1, Fig. 1a, Methods)18. Based on the demographic characteristics, participants were recruited continuously over a period of 20 years (2000–2020). However, the recruited centenarians were mostly born between 1900 and 1920, while most of the offspring and controls were born between 1920 and 1960 (Fig. 1b). This suggests that the direct comparison of mutation burden between centenarians and controls may potentially be confounded by the date of recruitment and date of birth.
Table 1.
Cohort | Sample size | Males | Females | Mean age at visit (age range, years) | Standard deviation of age at visit (years) |
---|---|---|---|---|---|
Longevity Genes Project Control | 224 | 125 | 99 | 70.99 (42.26-93.33) | 9.85 |
Longevity Genes Project Control (Filtered) | 147 | 84 | 63 | 66.27 (42.26-84.51) | 7.96 |
Longevity Genes Project Offspring | 473 | 246 | 227 | 67.53 (42.88-92.45) | 7.98 |
Longevity Genes Project Proband (Centenarians) | 637 | 464 | 173 | 97.70 (84.45-110.10) | 3.35 |
Longevity Genes Project Proband (Centenarians, Filtered) | 338 | 249 | 89 | 99.46 (84.45-110.10) | 3.27 |
LonGenity Study Offspring of Parents with Exceptional Longevity | 444 | 265 | 179 | 74.20 (61.90-94.08) | 6.13 |
LonGenity Study Offspring of Parents with Usual Survival | 371 | 196 | 175 | 76.32 (64.67-97.93) | 7.02 |
LonGenity Study Offspring of Parents with Usual Survival (Filtered) | 273 | 141 | 132 | 73.34 (64.67-87.33) | 5.22 |
The table presents the sample size, number of males and females, mean age at visit (with age range), and standard deviation of age at visit for each cohort in the Longevity Genes Project and LonGenity Study.
We identified loss-of-function (LOF) mutations based on the following criteria: alternate allele frequency (AAF) < 1%, Hardy–Weinberg equilibrium (HWE) P-value > 10⁻¹⁵, and variant missingness <10%. We classified the variants into different categories based on their predicted deleteriousness: pLOF only, pLOF and missense, pLOF and predicted deleterious missense (5/5 algorithms predict a deleterious variant), and pLOF and predicted deleterious missense (at least 1/5 algorithms predict a deleterious variant). The deleteriousness of missense variants was assessed using five different computational methods (Methods).
We counted the cumulative mutation burden in centenarians, their offspring, and controls across different categories of predicted deleterious variants. Consistent with the potential confounding effects of dates of recruitment and birth, we initially observed a similar distribution of LOF mutations across the different categories in centenarians and controls (Fig. 1c). We performed quality control and filtering, retaining 338 centenarians with recorded age over 100 years old and 420 controls with age less than 90 years old (Table 1, Methods). We observed a similar distribution of raw mutation count after filtering (Supplementary Fig. 1). We then performed the count-based burden test using linear regression models. We found that even without adjusting for potential confounders, offspring but not centenarians showed a significantly lower mutation burden in all pLOF categories (Supplementary Fig. 2). This is likely due to the smaller batch effect between the offspring group and controls than between the centenarian group and controls. We also found no significant difference between centenarians and their offspring (Supplementary Fig. 3).
To account for these potential confounders, we binned the dates of recruitment and birth and added them as covariates in the burden test model. After adjusting for these covariates, we found a consistent and significant trend of lower burden of LOF mutations in centenarians and their offspring compared to controls across all categories of predicted deleterious variants (Fig. 2). Notably, the depletion of LOF variants was statistically significant for centenarians in all categories, including the pLOF-only category (b = −5.5, p = 0.0453). The effect sizes for centenarians ranged from −5.5 to −39.6, indicating an 11% to 22% reduction in mutation burden compared to controls. Furthermore, the offspring of centenarians also exhibited a significantly lower mutation burden compared to controls in both the LGP (related to LGP centenarians) and LonGenity (unrelated to LGP centenarians) cohorts (Fig. 2). The effect sizes for offspring were smaller than those observed for centenarians, but still significant, with p-values ranging from 1.17e-07 to 4.99e-4 in the LGP cohort and from 4.52e-4 to 0.021 in the LonGenity cohort. These results suggest that the protective effect of a lower LOF mutation burden may be inherited by the offspring of centenarians, contributing to their increased likelihood of exceptional longevity.
We also performed a sensitivity analysis by using different covariates, including age at recruitment, top 10 genetic principal components, numerical date of birth, and date of recruitment, and found consistent results for centenarian offspring in the LGP cohort (Supplementary Fig. 4). Statistical significance for centenarians and the LonGenity cohort was sensitive to the choice of covariates, suggesting that the genetic associations with longevity are complex and possibly influenced by unmeasured factors.
To identify specific genes and pathways that carry a lower mutation burden in centenarians, we performed gene-level and pathway-level burden tests. The gene-level analysis identified 35 genes that reached the significance level at FDR < 0.05 (Fig. 3a). Remarkably, 14 out of these 35 genes were validated in an independent study from the UK Biobank using parental lifespan-related traits (Fig. 3a)19. Note that this is an indirect validation, as the genetics of exceptional longevity and parental lifespan, while similar, may still differ. Pathway-level analysis revealed that processes related to hyaluronan metabolism, Class A/1 (Rhodopsin-like receptors), post-translational protein modification, and mitochondrial translation reached the significance level at FDR < 0.05 (Fig. 3b). We observed a mild inflation in our test statistics, with a genomic inflation factor (λ) of 1.57. After adjusting for the inflation, the top three pathways still reached the suggestive FDR threshold of 0.2. These results suggest that the depletion of mutations in these pathways may contribute to exceptional longevity.
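The inflation adjustment mentioned above can be illustrated with a minimal sketch. It assumes the standard genomic-control approach (λ is the median 1-degree-of-freedom χ² statistic divided by its expected median, and each statistic is divided by λ); the text does not state which correction was actually used, and the function names `genomic_inflation` and `gc_adjust` are hypothetical.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

_STD_NORMAL = NormalDist()


def _p_to_chi2(p):
    """Convert a two-sided p-value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)  # lower-tail quantile, avoids 1 - p/2 rounding
    return z * z


def genomic_inflation(pvalues):
    """Genomic inflation factor: observed median chi-square / expected median."""
    return median(_p_to_chi2(p) for p in pvalues) / CHI2_1DF_MEDIAN


def gc_adjust(pvalues, lam):
    """Rescale each chi-square statistic by lambda and convert back to a p-value."""
    adjusted = []
    for p in pvalues:
        chi2 = _p_to_chi2(p) / lam
        # Upper tail of chi-square(1 df): P(X > x) = erfc(sqrt(x / 2)).
        adjusted.append(math.erfc(math.sqrt(chi2 / 2.0)))
    return adjusted
```

With λ > 1, every adjusted p-value is larger than its unadjusted counterpart, which is why fewer pathways survive the threshold after correction.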
To further investigate the potential causal effects of the identified longevity-associated genes on lifespan-related traits, we performed Mendelian Randomization (MR) analyses using public blood gene expression QTL data from eQTLgen and GWAS summary statistics of multiple lifespan-related traits (Fig. 4a)20. It is important to note that while the MR analysis uses common variants (eQTLs) rather than rare coding variants, it can provide complementary evidence about a gene’s role in longevity through different mechanisms. MR analysis revealed that seven genes had significant causal effects on multiple lifespan-related traits, such as frailty index, healthspan, lifespan, extreme longevity (90th and 99th percentiles), and lifespan-GIP1 (the genetic principal component of healthy longevity, Methods). Among them, three genes (RGP1, PCNX2, and ANO9) showed consistent pro-longevity effects across the multiple traits tested, supporting their potential roles in promoting longevity as suggested by burden analysis, while the other four genes showed anti-longevity effects. On the other hand, two of the genes (DYNC1H1 and GALNT12) showed a significant protective effect on only one trait each (lifespan and extreme longevity at the 99th percentile, respectively), while PKP4 showed a significant positive effect on healthspan but not on other traits. The other four genes (ZNF446, PLA2G4B, EFNA3, and ABCF3) showed inconsistent effects on lifespan-related traits.
We then profiled the multi-omic associations of the identified longevity-associated genes to provide a systematic evaluation of their expression and regulation during aging (Fig. 4b–e). Comparison with exome-wide gene-level associations with parental lifespan obtained from GeneBass (Fig. 4b)19 showed that six out of seven causal genes were significantly associated with parental lifespan, and three genes (MLXIP, PCNX2, and DYNC1H1) remained significant after FDR correction for multiple testing. Analysis of age-related changes in promoter DNA methylation using data from 500 individuals in the Massachusetts General Brigham (MGB) biobank (Fig. 4c) revealed significant changes for most longevity-associated genes, except two (RGP1 and BCLAF1). Similarly, age-related changes in blood gene expression obtained from the transcriptome-wide association study (TWAS) for aging by Peters et al.21 (Fig. 4d) showed significant changes for genes such as OPN3, PCNX2, GALNT12, and RGP1. Furthermore, age-related changes in plasma protein levels using Olink data from 53,015 UK Biobank participants (Fig. 4e) revealed significant changes for proteins encoded by the DYNC1H1 and FLT4 genes. The results suggest that the expression and regulation of these longevity-associated genes are altered during the aging process.
To gain further insights into the potential relevance of the identified longevity-associated genes in aging and interventions, we further compared their significance scores across different signatures of aging and longevity interventions (Fig. 4f)22,23. The signature analysis results in 69 significant associations after adjusting for multiple testing of 266 tests using FDR (Fig. 4f). It revealed that many of these genes (18 out of 21 tested) were also significantly associated with aging in humans and rodents, as well as with interventions known to extend lifespan, such as caloric restriction (ABCF3, CKAP2L, and CEP68), rapamycin treatment (PKP4, CTNND1, and RTRAF), growth hormone deficiency (HOGA1, ANKRD33, and MLXIP), as well as overall lifespan after intervention (HOGA1). Together, this multi-layered evidence supports the potential roles of these genes in regulating healthy aging and longevity.
Discussion
In this study, we have discovered that centenarians, within the large cohort we examined, possess a significantly lower burden of predicted deleterious LOF variants compared to controls. This finding suggests that a protective genetic background, characterized by the depletion of damaging coding mutations, contributes to the exceptional longevity of centenarians. Notably, we also observed a lower mutation burden in centenarian offspring, although the effect was less pronounced. These findings support the notion of a heritable component to longevity outside of protective and common variants and suggest that the combined genetic background, including protective variants and depletion of damaging variants, may be transmitted across generations to support exceptional longevity.
Our results are consistent with previous studies that reported an increased burden of LOF variants in individuals with shorter lifespans and age-related diseases14,24, and provide further evidence for the role of rare coding variants in extreme human longevity. Our study extends these findings by demonstrating that the depletion of LOF variants in centenarians is not limited to the rarest variants but is observed across multiple categories of predicted deleterious variants. However, our findings contrast with those of another study that observed no difference in the burden of pathogenic variants between centenarians, their offspring, and controls17. This discrepancy may be due to differences in study design, such as the focus on LOF variants specifically, the larger sample size of our study, and the adjustment for potential confounding factors such as date of recruitment, age at recruitment, and date of birth. Besides, due to the retrospective nature of the centenarian study, the centenarians usually have different demographic properties (age, date of birth, and potentially other early life exposures) compared to the control group. While this can be addressed by including these features as covariates, this demographic disparity between centenarians and controls emerges as a critical factor limiting the statistical power of centenarian studies (Fig. 1). In contrast, centenarian offspring, demographically more similar to controls, yield stronger statistical evidence, corroborating our findings in centenarians. Future prospective studies with improved demographic matching are essential to elucidate the role of LOF variants in exceptional longevity.
Our pathway analysis revealed that centenarian exomes are depleted of LOF variants in several pathways related to aging and disease, including Class A/1 (Rhodopsin-like receptors), hyaluronan metabolism, post-translational protein modification, and mitochondrial translation. Class A/1 (Rhodopsin-like) receptors are involved in various physiological processes and have been implicated in age-related diseases, suggesting their potential role in longevity25. Hyaluronan is a key component of the extracellular matrix that has been shown to decline with age, and its increase contributes to the extension of lifespan26. Variants that maintain hyaluronan homeostasis may, therefore, promote healthy aging in humans. Post-translational protein modifications play crucial roles in protein function and stability, and their dysregulation has been associated with various age-related diseases1. Mitochondrial translation has also been linked to lifespan extension in model organisms27.
To complement our analysis of rare LOF variants, we also investigated the causal role of identified longevity genes in aging-related traits using MR analyses. This approach allows us to infer potential causal relationships between gene expression and phenotypes of interest by using eQTLs (common variants that are associated with gene expression) as instrumental variables. Our MR analyses provided evidence for the causal effects of several longevity-associated genes, including RGP1, PCNX2, and ANO9, on multiple aging-related traits. PCNX2 was identified to be associated with longevity in an independent GWAS study28, while ANO9 was associated with various cancers29. These findings suggest that these genes may directly influence the aging process and contribute to the extended healthspan and lifespan. The consistent causal effect estimates across different aging-related traits further support the robustness of these associations. Interestingly, our analyses also revealed genes with more nuanced effects on longevity. For instance, DYNC1H1 and GALNT12 showed significant deleterious effects on only one trait each (lifespan and extreme longevity at the 99th percentile, respectively), while PKP4 demonstrated a significant positive effect solely on healthspan. This suggests that these genes may influence particular aspects of the aging process rather than having a broad impact on all longevity-related traits. Moreover, the inconsistent effects observed for genes such as ZNF446, PLA2G4B, EFNA3, and ABCF3 across different lifespan-related traits underscore the complexity of genetic influences on aging and longevity. The multi-omic analyses revealed that the expression and regulation of many longevity-associated genes are altered during aging: specifically, 29 out of 31 for DNA methylation, 4 out of 11 for gene expression, and 2 out of 2 for plasma protein (Fig. 4).
Follow-up studies are needed to elucidate the specific mechanisms by which these genes and their encoded proteins contribute to healthy aging and longevity.
Future studies could also explore the relationship between the burden of deleterious germline mutations and the rate of biological aging in centenarians and the general population. Epigenetic clocks, which measure biological age based on DNA methylation patterns, have emerged as a promising tool for assessing the pace of aging30,31. Previous studies have shown that centenarians exhibit slower epigenetic aging rates compared to the general population32. Integrating rare variant burden data with epigenetic clock measures could provide novel insights into the interplay between genetic and epigenetic factors in shaping the rate of aging and exceptional longevity, especially with current standardized tools like ClockBase and Biolearn33,34, as well as advanced aging clocks, including GrimAge235, DunedinPace36, and causality-enriched clocks37. Such studies may uncover whether the reduced burden of harmful mutations observed in centenarians contributes to their slower biological aging rates.
Our study also has several limitations. First, while we adjusted for several important covariates, there may be other confounding factors that were not accounted for, such as environmental exposures and lifestyle factors. Second, our study focused on a specific population (Ashkenazi Jews), although validation analysis in the UK Biobank suggests that the result may be generalizable to other ethnic groups. Future studies in diverse populations will be necessary to confirm the generalizability of our findings. Third, the validation analysis is based on parental lifespan traits in the UK Biobank. Although previous studies on common variants show a substantial similarity between parental lifespan and exceptional longevity (rg = 0.81)38, it is unclear how similarly rare genetic variants contribute to these two traits. Future validation and meta-analysis with other centenarian cohorts may help strengthen the robustness of our findings. Fourth, our study relied on computational predictions of variant deleteriousness, which may not always reflect the true biological impact of a variant. Functional studies will be necessary to validate the causal roles of the identified variants and genes in longevity.
It is important to acknowledge that some LOF and missense variants can be protective, as demonstrated by previous studies39–41. However, our hypothesis is that the overall probability of LOF variants being protective is lower than the probability of them being deleterious. This is because damaging a component in a complex system is more likely to have a detrimental effect than a protective one42. Additionally, there is a selection bias, as highly damaging mutations are under-represented in the population, while highly protective mutations are preserved43. These factors may explain the small effect sizes observed in our study. It should also be noted that we did not identify any protective LOF variants (i.e., enrichment of LOF variants in centenarians) as demonstrated in a previous study18, because we used a one-tailed test, focusing only on the depletion of LOF variants.
In conclusion, our study provides new insights into the genetic architecture of human exceptional longevity, exemplified by individuals who live to 100 years or beyond, highlighting the importance of rare LOF variants and identifying novel genes and pathways that may promote healthy aging. We demonstrate that centenarians have a lower burden of predicted deleterious LOF variants compared to controls and that this protective genetic background may be transmitted across generations. Our findings also underscore the complex interplay between genetic variation, environmental factors, and age-related diseases in shaping human lifespan. Further studies in diverse populations and integrating multiple omics data will be necessary to fully elucidate the mechanisms underlying exceptional longevity and develop targeted interventions to promote healthy aging. Nonetheless, our results represent an important step towards understanding the genetic basis of human longevity and provide a foundation for future studies in this field.
Methods
Study population and data collection
The study population was derived from two ongoing studies of aging and longevity in the Ashkenazi Jewish population: the cross-sectional Longevity Genes Project (LGP) and the longitudinal LonGenity study18. The LGP cohort consisted of 637 individuals with exceptional longevity, 473 offspring of long-lived individuals, and 224 controls, while the LonGenity cohort included 444 offspring of centenarians and 371 controls. All participants provided written informed consent, and the study was approved by the Institutional Review Board at Albert Einstein College of Medicine.
For the analysis, we applied filtering criteria to ensure the inclusion of appropriate individuals in each group. In the centenarian group, we removed individuals with a death or dropout record before 100 years, retaining 338 exceptionally long-lived centenarians. Similarly, in the control group, we removed individuals without death or dropout record before 90 years, resulting in 147 individuals from the LGP cohort and 273 individuals from the LonGenity cohort being included in the analysis (Table 1).
Whole-exome sequencing
DNA samples from all participants were subjected to whole-exome sequencing using the Illumina HiSeq 2000 platform at the Regeneron Genetics Center17. The sequencing reads were aligned to the human reference genome (hg38) using the Burrows-Wheeler Aligner (BWA-mem v0.7.17)44, and duplicate reads were removed using Picard tools (version 1.96, http://broadinstitute.github.io/picard/). Variant calling was performed using the Genome Analysis Toolkit (GATK v3.7)45.
Quality control and variant annotation
After genomic principal component analysis (PCA), four individuals with non-European ancestry were excluded from the study. Quality control filtering was applied to remove potentially false-positive variants and genotype calls. Variants were filtered based on the following criteria: alternate allele frequency (AAF) < 1% in the Ashkenazi Jewish population, Hardy–Weinberg equilibrium (HWE) P-value > 10⁻¹⁵, and variant missingness <10%, as suggested by a previous study46. After QC filtering, autosomal-only variants with a minimum allele count (MAC) of 1 were divided into sets for centenarians, offspring, and controls for downstream analysis.
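The three variant-level filters can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the HWE P-value is computed with a 1-degree-of-freedom chi-square approximation (an exact test may have been used in practice), genotype counts per variant are assumed to be available, and the function names `hwe_chisq_pvalue` and `passes_qc` are hypothetical.

```python
import math


def hwe_chisq_pvalue(n_ref_hom, n_het, n_alt_hom):
    """Chi-square (1 df) approximation to the Hardy-Weinberg test p-value."""
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: no departure from HWE can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square(1 df): P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_ref_hom, n_het, n_alt_hom, n_missing,
              max_aaf=0.01, min_hwe_p=1e-15, max_missing=0.10):
    """Apply the AAF, HWE, and missingness thresholds quoted in the text."""
    n_called = n_ref_hom + n_het + n_alt_hom
    if n_called == 0:
        return False
    aaf = (n_het + 2 * n_alt_hom) / (2 * n_called)
    missingness = n_missing / (n_called + n_missing)
    return (aaf < max_aaf
            and missingness < max_missing
            and hwe_chisq_pvalue(n_ref_hom, n_het, n_alt_hom) > min_hwe_p)
```

The very permissive HWE threshold removes only variants with gross departures, such as a rare allele that appears exclusively in homozygous form, which usually indicates a genotyping artifact.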
Loss-of-function (LOF) variants were defined as nonsense, splice-site, or frameshift mutations. Missense variants were classified as (1) possible deleterious missense mutation if they were predicted to be damaging by at least 1 out of 5 algorithms (SIFT47, Polyphen2_HDIV48, Polyphen2_HVAR48, LRT49, and MutationTaster50) or (2) deleterious missense mutation if all five algorithms predicted them to be damaging. SIFT (v6.2.1), Polyphen2_HDIV (v2.2.2), Polyphen2_HVAR (v2.2.2), LRT (v2016), and MutationTaster (v2021) were used in this analysis.
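The nested burden categories that follow from these definitions can be made concrete with a short sketch. The consequence labels, the category names, and the `Variant` container are hypothetical; the only inputs taken from the text are the three LOF consequence types and the 1-of-5 and 5-of-5 predictor thresholds.

```python
from dataclasses import dataclass

# Consequence labels are illustrative; real annotation tools use their own vocabularies.
LOF_CONSEQUENCES = {"nonsense", "splice_site", "frameshift"}


@dataclass
class Variant:
    consequence: str      # e.g. "nonsense", "missense", "synonymous"
    n_damaging: int = 0   # number of the five predictors calling the variant damaging


def burden_categories(variant):
    """Return every burden category that a variant contributes to."""
    categories = set()
    if variant.consequence in LOF_CONSEQUENCES:
        # pLOF variants count toward all four categories.
        categories.update({
            "pLOF_only",
            "pLOF_plus_deleterious_missense_5of5",
            "pLOF_plus_possible_deleterious_missense_1of5",
            "pLOF_plus_all_missense",
        })
    elif variant.consequence == "missense":
        categories.add("pLOF_plus_all_missense")
        if variant.n_damaging >= 1:
            categories.add("pLOF_plus_possible_deleterious_missense_1of5")
        if variant.n_damaging == 5:
            categories.add("pLOF_plus_deleterious_missense_5of5")
    return categories
```

Because the categories are nested, an individual's burden can only stay the same or grow when moving from pLOF only to pLOF plus all missense.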
Burden test analysis
Prior to the burden test, we removed individuals in the extreme longevity group with a lifespan or last reported age of less than 100 years, retaining 338 centenarians. Similarly, individuals in the control group with a last reported age greater than 90 years were removed, leaving 147 individuals from LGP and 273 individuals from LonGenity (Table 1).
Descriptive statistics were used to summarize the demographic characteristics of the study population. The cumulative mutation burden for each individual was calculated as the total number of population-level LOF (pLOF) and predicted deleterious missense variants. Mutation burden was calculated separately for each category of predicted deleterious variants (pLOF only; pLOF and deleterious missense [5/5 algorithms]; pLOF and possible deleterious missense [≥1/5 algorithms]; and pLOF and all missense).
Count-based burden tests were performed using linear models with binned covariates to account for potential confounding factors, such as date of recruitment, date of birth, gender, age at visit, and top four genomic principal components51. The cumulative mutation burden was used as the dependent variable, and the independent variables included centenarian status (or offspring status), binned date of recruitment, and binned date of birth. Sensitivity analyses were conducted by including additional covariates, such as age at recruitment, top 10 genetic principal components, numerical date of birth (i.e., number of days since 1900-01-01), and date of recruitment.
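A minimal sketch of such a count-based burden model is given below, assuming ordinary least squares with dummy-coded bins. The bin labels and the helper names `one_hot` and `burden_effect` are hypothetical, the additional covariates listed above are omitted for brevity, and standard errors and p-values are not computed. The example data are synthetic and noise-free, constructed only to show that the group coefficient is recovered.

```python
import numpy as np


def one_hot(labels):
    """Dummy-code bin labels, dropping the first sorted level as the reference."""
    levels = sorted(set(labels))
    return np.array([[float(lab == lev) for lev in levels[1:]] for lab in labels])


def burden_effect(burden, is_case, birth_bin, recruit_bin):
    """Least-squares coefficient of group status on mutation burden,
    adjusted for binned date of birth and binned date of recruitment."""
    design = np.column_stack([
        np.ones(len(burden)),             # intercept
        np.asarray(is_case, dtype=float),  # 1 = centenarian (or offspring), 0 = control
        one_hot(birth_bin),
        one_hot(recruit_bin),
    ])
    coef, *_ = np.linalg.lstsq(design, np.asarray(burden, dtype=float), rcond=None)
    return coef[1]


# Synthetic example: a planted group effect of -5 variants.
is_case = [0, 1, 0, 1, 0, 1, 0, 1]
birth_bin = ["1900-1919", "1900-1919", "1920-1939", "1920-1939"] * 2
recruit_bin = ["2000-2009"] * 4 + ["2010-2020"] * 4
burden = [
    50 - 5 * c + 3 * (b == "1920-1939") + 2 * (r == "2010-2020")
    for c, b, r in zip(is_case, birth_bin, recruit_bin)
]
effect = burden_effect(burden, is_case, birth_bin, recruit_bin)
```

In this construction `effect` equals the planted value of -5 up to floating-point error, because the bin terms absorb the cohort differences that would otherwise bias a raw comparison of group means.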
Gene-level and pathway-level burden analysis
Gene-level and pathway-level burden tests were performed using linear models, with the cumulative mutation burden in each gene or pathway as the dependent variable and centenarian status as the independent variable. Only genes containing at least five pLOF variants across the cohort were included. In total, 4925 unique genes were tested, and the significance threshold for gene-level tests was set at FDR < 0.05. Significant gene-level associations were replicated using summary statistics from a gene-based association study of paternal or maternal lifespan in GeneBass, derived from the UK Biobank19. The significance threshold for replication was set at P < 0.05.
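The FDR threshold can be illustrated with a short sketch of the Benjamini-Hochberg procedure. This is an assumption: the text specifies FDR < 0.05 but does not name the procedure, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(gene_pvalues, fdr=0.05):
    """Names of genes whose adjusted p-value falls below the FDR threshold."""
    names = list(gene_pvalues)
    adjusted = benjamini_hochberg([gene_pvalues[name] for name in names])
    return [name for name, q in zip(names, adjusted) if q < fdr]
```

Applied to the per-gene p-values from the burden models, `significant_genes` would return the set of genes carried forward to replication, where the text applies a nominal P < 0.05 instead.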
Mendelian randomization
To investigate the causal relationships between gene expression and aging-related traits, we performed Mendelian Randomization (MR) analyses using blood cis-eQTL data from eQTLgen, which includes 31,684 blood samples from 37 studies20. The outcome traits included aging-GIP1, frailty index, healthspan, lifespan, and extreme longevity (90th and 99th percentiles).
The parental lifespan GWAS was used as a proxy for individual lifespan and included 512,047 mothers and 500,193 fathers of European ancestry11. The extreme longevity GWAS included 11,262 European subjects with a lifespan above the 90th percentile and 25,483 controls below the 60th percentile age10. Healthspan, defined as the age of the first incidence of major age-related diseases or death, was analyzed using a GWAS of 300,447 UK Biobank participants aged 37–7352. The frailty index GWAS included 164,610 UK Biobank participants aged 60–70 and 10,616 Swedish TwinGene participants aged 41–8753. Aging-GIP1, the first genetic principal component of six human aging traits, captures both length of life and well-being indices54.
We performed cis-Mendelian randomization following the approach described by Ying et al.37. Genetic variants strongly associated with whole-blood gene expression levels (FDR < 0.05) were selected as instrumental variables for the MR analysis. To minimize pleiotropic effects, only cis-eQTLs (located within 2 Mb of target genes) were used, and LD clumping was applied to remove eQTLs in strong LD (r² > 0.3). We employed three MR methods depending on the number of available eQTLs: the Wald ratio for a single eQTL, generalized inverse-variance weighted (gIVW) for at least two eQTLs, and generalized MR-Egger regression (gEgger) for at least three eQTLs55. Because the gEgger method is robust to directional pleiotropy, we reported the P value from gEgger when pleiotropy was detected by the gEgger intercept.
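To make the estimators concrete, the following Python sketch shows the Wald ratio and a fixed-effect inverse-variance weighted estimate, together with the rule for which methods apply at a given instrument count. It is a simplification: the generalized IVW and Egger methods used in the study account for LD correlation between instruments, whereas this sketch assumes independent instruments and a first-order standard error. Function names and numbers are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def applicable_methods(n_eqtls: int) -> List[str]:
    """Methods available for a gene, following the instrument-count rule in the text."""
    if n_eqtls < 1:
        return []
    if n_eqtls == 1:
        return ["Wald ratio"]
    methods = ["gIVW"]
    if n_eqtls >= 3:
        methods.append("gEgger")
    return methods


def wald_ratio(beta_exposure: float, beta_outcome: float,
               se_outcome: float) -> Tuple[float, float]:
    """Causal estimate from one instrument, with a first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw(beta_exposure: Sequence[float], beta_outcome: Sequence[float],
        se_outcome: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect IVW estimate, ASSUMING uncorrelated instruments (not gIVW)."""
    weights = [1.0 / s ** 2 for s in se_outcome]
    numerator = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Made-up summary statistics: eQTL effects on expression and on the outcome trait.
single_estimate, single_se = wald_ratio(0.5, 0.1, 0.02)
ivw_estimate, ivw_se = ivw([0.5, 1.0], [0.1, 0.2], [0.02, 0.02])
```

Both toy calculations yield the same causal estimate because the outcome effects were constructed to be exactly proportional to the expression effects; combining two instruments reduces the standard error.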
Multi-omic analysis of the identified longevity-associated genes
To systematically evaluate the expression and regulation of the identified longevity-associated genes during aging, we profiled their multi-omic associations using various datasets. We obtained the exome-wide gene association with parental lifespan using summary statistics from GeneBass19. Blood gene expression changes with age were obtained from the transcriptome-wide association study (TWAS) for aging by Peters et al.21.
Age-related changes in promoter DNA methylation were assessed using data from 500 individuals in the Mass General Brigham (MGB) Biobank, as described in Moqri et al.56. DNA methylation profiles were generated using the Illumina Infinium MethylationEPIC v2.0 array, which covers over 935,000 CpG sites enriched for regulatory regions56. The cohort comprised subjects of diverse ages, roughly balanced between male and female, and generally representative of the racial/ethnic distribution of the local area. For each CpG site associated with our identified longevity-associated genes, we performed a linear regression of the methylation beta value on age and recorded the regression coefficient and p-value. For each gene, the CpG with the strongest association with age was used to represent the result.
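The per-CpG regression and the choice of a representative CpG can be sketched in Python as follows. The CpG identifiers, ages, and beta values are invented. Because the standard library has no t-distribution function, the sketch ranks CpGs by the absolute t-statistic rather than by p-value; when every CpG is measured in the same samples, the two rankings coincide.

```python
import math
from typing import Dict, Sequence, Tuple


def regress_on_age(ages: Sequence[float], betas: Sequence[float]) -> Tuple[float, float]:
    """Simple linear regression of methylation beta on age: (slope, t-statistic)."""
    n = len(ages)
    mean_x = sum(ages) / n
    mean_y = sum(betas) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, betas))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(ages, betas))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:
        # Perfect fit: the statistic is unbounded unless the slope is exactly zero.
        t_stat = 0.0 if slope == 0.0 else math.copysign(math.inf, slope)
    else:
        t_stat = slope / se
    return slope, t_stat


def strongest_cpg(ages: Sequence[float],
                  cpg_betas: Dict[str, Sequence[float]]) -> Tuple[str, float, float]:
    """Return (CpG id, slope, t) for the CpG with the largest |t| against age."""
    results = {cpg: regress_on_age(ages, values) for cpg, values in cpg_betas.items()}
    best = max(results, key=lambda cpg: abs(results[cpg][1]))
    return (best, results[best][0], results[best][1])


# Made-up data: two hypothetical CpGs near one gene, measured in five people.
ages = [30, 40, 50, 60, 70]
cpg_betas = {
    "cg_a": [0.50, 0.52, 0.55, 0.57, 0.61],  # rises steadily with age
    "cg_b": [0.30, 0.35, 0.28, 0.33, 0.31],  # no clear trend
}
best_cpg, best_slope, best_t = strongest_cpg(ages, cpg_betas)
```

Here the slope is the change in methylation beta value per year of age, so a slope of 0.0027 corresponds to a gain of about 2.7 percentage points of methylation per decade.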
Age-related changes in plasma protein levels were investigated using Olink proteomics data from 53,015 UK Biobank participants (UK Biobank Record Table 1072). Only two of our identified longevity-associated genes were present in the Olink panel. For each, we performed a linear regression of the protein level on age and recorded the regression coefficient and p-value. FDR was applied to adjust for multiple testing of all 471 sites tested.
FDR correction was applied to adjust for multiple testing within each omic layer.
Longevity signature analysis
To further explore the potential relevance of the identified longevity-associated genes in aging and interventions, we compared their significance scores across different signatures of aging and longevity interventions using the GENtervention database57. For transcriptomic signatures of lifespan-extending interventions, we selected those reflecting the most established longevity interventions, which were identified based on gene expression data from at least three independent sources, as described in Tyshkovskiy et al. (2019)23. The signatures included human aging, rodent aging, and interventions (caloric restriction, rapamycin treatment, and growth hormone deficiency). We also included signatures of lifespan across interventions based on a larger set of lifespan-extending and lifespan-shortening interventions22. The significance scores were calculated as the -log10(P-value) multiplied by the sign of the effect size (beta) for each gene in each signature. Nominal significance was set at P < 0.05. Hierarchical clustering with Euclidean distance was performed for the genes based on the significance scores.
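The signed significance score and the distance used for clustering can be sketched in Python as follows. The gene and signature names and all numbers are invented. The sketch stops at the pairwise Euclidean distance; the hierarchical clustering step itself would normally be delegated to a statistics library.

```python
import math
from typing import Dict, List, Tuple


def significance_score(p_value: float, beta: float) -> float:
    """-log10(P) carrying the sign of the effect size; zero when beta is zero."""
    sign = (beta > 0) - (beta < 0)
    return -math.log10(p_value) * sign


def score_vector(stats: List[Tuple[float, float]]) -> List[float]:
    """Scores for one gene across signatures, given (P value, beta) pairs."""
    return [significance_score(p, b) for p, b in stats]


# Made-up (P value, beta) pairs for two hypothetical genes across three signatures,
# e.g. aging, caloric restriction, rapamycin.
gene_stats: Dict[str, List[Tuple[float, float]]] = {
    "GENE_X": [(0.01, -0.3), (0.001, 0.5), (0.1, 0.2)],
    "GENE_Y": [(0.01, 0.4), (0.001, 0.1), (0.1, 0.6)],
}
scores = {gene: score_vector(stats) for gene, stats in gene_stats.items()}

# Euclidean distance between the two score vectors (math.dist needs Python 3.8+).
distance_xy = math.dist(scores["GENE_X"], scores["GENE_Y"])

# A gene is nominally significant in a signature when |score| > -log10(0.05).
nominal_cutoff = -math.log10(0.05)
```

Two genes with identical P values but opposite directions in one signature end up far apart, which is the reason for signing the score before clustering.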
Statistics & reproducibility
The study included a total of 2149 participants: 338 centenarians (aged 100 or older), 917 offspring of long-lived individuals, and 894 controls. Detailed age and sex/gender breakdowns for each group are provided in Table 1. Sex and gender were considered in the study design and determined based on self-report at the time of recruitment. All participants provided written informed consent as stated in the “Study population and data collection” section. Participants were not compensated for their involvement in the study. No statistical method was used to predetermine the sample size. Data exclusion criteria are detailed in the “Study population and data collection” section. No other data were excluded from the analyses. Statistical analyses primarily employed linear models for burden tests and Mendelian randomization, with adjustments for potential confounding factors as described in the “Burden test analysis” and “Mendelian randomization” sections. Multiple testing corrections were applied using FDR. The experiments were not randomized, and the investigators were not blinded to allocation during experiments and outcome assessment, as this was an observational genetic study. Reproducibility was addressed through replication in independent datasets (UK Biobank).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
We thank members of the Gladyshev laboratory for the discussions. This study was supported by NIH R01 AG064223 to V.N.G., and R01AG061155 and P01AG017242 to N.B. K.Y. was supported by NIH F99AG088431. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author contributions
V.N.G. and K.Y. conceived the project. K.Y. conducted the main data analysis. J.P.C., A.V.S., A.T., M.M., and L.J.E.G. assisted with data analysis. S.M. and Z.D.Z. provided clinical samples and data. N.B. supervised the clinical aspects of the study. V.N.G. supervised the project. K.Y. and V.N.G. drafted the manuscript with input from all authors. All authors reviewed and approved the final version of the manuscript.
Peer review
Peer review information
Nature Communications thanks Harold Bae and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Data availability
All summary statistics for the gene- and pathway-based burden tests in the Ashkenazi Jewish longevity cohort are available in Supplementary Data 1, Supplementary Data 2, and Source Data files. The individual-level genetic data from the Einstein longevity study are available under restricted access due to privacy concerns of research participants. Qualified academic investigators (typically faculty members or postdoctoral researchers with relevant expertise) can request access by contacting Dr. Nir Barzilai (nir.barzilai@einsteinmed.edu) and the study’s principal investigator, Dr. Vadim Gladyshev (vgladyshev@rics.bwh.harvard.edu). We aim to respond to all requests within 10 business days. Access is subject to approval by the Institutional Review Board and requires a material transfer agreement. Upon approval, data use will be restricted by a comprehensive data use agreement that includes conditions such as using the data solely for the approved research purpose, maintaining participant anonymity, and acknowledging the Einstein longevity study in any resulting publications. Exact procedures for data transfer will be provided upon approval. The UK Biobank data used for validation are available through application to the UK Biobank (https://www.ukbiobank.ac.uk/). Summary statistics from the eQTLGen consortium are publicly available at https://www.eqtlgen.org/. The GeneBass exome-wide association results are publicly accessible at https://genebass.org/. Other publicly available datasets used in this study include: parental lifespan GWAS summary statistics (https://datashare.ed.ac.uk/handle/10283/3209), healthspan GWAS summary statistics (https://www.gwasarchive.org/), frailty index GWAS summary statistics (https://figshare.com/articles/dataset/Genome-Wide_Association_Study_of_the_Frailty_Index_-_Atkins_et_al_2019/9204998), and longevity GWAS summary statistics (https://www.longevitygenomics.org/downloads/). Source data are provided with this paper.
Code availability
All analyses were performed in R 4.1. The custom code used for the burden test analysis and the gene-level and pathway-level burden analysis is available at 10.5281/zenodo.13756349 with a detailed readme file58. Other software used in our analysis was open source and is described in the Methods section of the manuscript.
Competing interests
After the initiation of this project, A.V.S. had a change in employment status (Retro Biosciences). Analysis work was completed before this employment change. The other authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-024-52967-2.
References
- 1. López-Otín, C., Blasco, M. A., Partridge, L., Serrano, M. & Kroemer, G. The hallmarks of aging. Cell 153, 1194–1217 (2013).
- 2. Beard, J. R. et al. The World report on ageing and health: a policy framework for healthy ageing. Lancet 387, 2145–2154 (2016).
- 3. Lowsky, D. J., Olshansky, S. J., Bhattacharya, J. & Goldman, D. P. Heterogeneity in healthy aging. J. Gerontol. A. Biol. Sci. Med. Sci. 69, 640–649 (2014).
- 4. Melzer, D., Pilling, L. C. & Ferrucci, L. The genetics of human ageing. Nat. Rev. Genet. 21, 88–101 (2020).
- 5. Andersen, S. L., Sebastiani, P., Dworkis, D. A., Feldman, L. & Perls, T. T. Health span approximates life span among many supercentenarians: compression of morbidity at the approximate limit of life span. J. Gerontol. A. Biol. Sci. Med. Sci. 67A, 395–405 (2012).
- 6. Milman, S. & Barzilai, N. Discovering biological mechanisms of exceptional human health span and life span. Cold Spring Harb. Perspect. Med. 13, a041204 (2023).
- 7. Leung, Y. et al. Cognition, function, and prevalent dementia in centenarians and near-centenarians: An individual participant data (IPD) meta-analysis of 18 studies. Alzheimers Dement. 19, 2265–2275 (2023).
- 8. Kenyon, C. J. The genetics of ageing. Nature 464, 504–512 (2010).
- 9. Suh, Y. et al. Functionally significant insulin-like growth factor I receptor mutations in centenarians. Proc. Natl Acad. Sci. 105, 3438–3442 (2008).
- 10. Deelen, J. et al. A meta-analysis of genome-wide association studies identifies multiple longevity genes. Nat. Commun. 10, 3669 (2019).
- 11. Timmers, P. R. et al. Genomics of 1 million parent lifespans implicates novel pathways and common diseases and distinguishes survival chances. eLife 8, e39856 (2019).
- 12. Kaplanis, J. et al. Quantitative analysis of population-scale family trees with millions of relatives. Science 360, 171–175 (2018).
- 13. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
- 14. Shindyapina, A. V. et al. Germline burden of rare damaging variants negatively affects human healthspan and lifespan. eLife 9, e53449 (2020).
- 15. Ference, B. A. et al. Variation in PCSK9 and HMGCR and risk of cardiovascular disease and diabetes. N. Engl. J. Med. 375, 2144–2153 (2016).
- 16. Jørgensen, A. B., Frikke-Schmidt, R., Nordestgaard, B. G. & Tybjærg-Hansen, A. Loss-of-function mutations in APOC3 and risk of ischemic vascular disease. N. Engl. J. Med. 371, 32–41 (2014).
- 17. Gutman, D. et al. Similar burden of pathogenic coding variants in exceptionally long‐lived individuals and individuals without exceptional longevity. Aging Cell 19, e13216 (2020).
- 18. Lin, J.-R. et al. Rare genetic coding variants associated with human longevity and protection against age-related diseases. Nat. Aging 1, 783–794 (2021).
- 19. Karczewski, K. J. et al. Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes. Cell Genomics 2, 100168 (2022).
- 20. Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).
- 21. Peters, M. J. et al. The transcriptional landscape of age in human peripheral blood. Nat. Commun. 6, 8570 (2015).
- 22. Tyshkovskiy, A. et al. Transcriptomic Hallmarks of Mortality Reveal Universal and Specific Mechanisms of Aging, Chronic Disease, and Rejuvenation. 2024.07.04.601982 Preprint at 10.1101/2024.07.04.601982 (2024).
- 23. Tyshkovskiy, A. et al. Identification and application of gene expression signatures associated with lifespan extension. Cell Metab. 30, 573–593.e8 (2019).
- 24. Liu, J. Z. et al. The burden of rare protein-truncating genetic variants on human lifespan. Nat. Aging 2, 289–294 (2022).
- 25. Lagunas-Rangel, F. A. G protein-coupled receptors that influence lifespan of human and animal models. Biogerontology 23, 1–19 (2022).
- 26. Zhang, Z. et al. Increased hyaluronan by naked mole-rat Has2 improves healthspan in mice. Nature 621, 196–205 (2023).
- 27. Houtkooper, R. H. et al. Mitonuclear protein imbalance as a conserved longevity mechanism. Nature 497, 451–457 (2013).
- 28. Sebastiani, P. et al. Genetic signatures of exceptional longevity in humans. PLoS One 7, e29848 (2012).
- 29. Jun, I. et al. ANO9/TMEM16J promotes tumourigenesis via EGFR and is a novel therapeutic target for pancreatic cancer. Br. J. Cancer 117, 1798–1809 (2017).
- 30. Moqri, M. et al. Biomarkers of aging for the identification and evaluation of longevity interventions. Cell 186, 3758–3775 (2023).
- 31. Moqri, M. et al. Validation of biomarkers of aging. Nat. Med. 1–13 (2024) 10.1038/s41591-023-02784-9.
- 32. Daunay, A. et al. Centenarians consistently present a younger epigenetic age than their chronological age with four epigenetic clocks based on a small number of CpG sites. Aging 14, 7718–7733 (2022).
- 33. Ying, K. et al. A Unified Framework for Systematic Curation and Evaluation of Aging Biomarkers. 2023.12.02.569722 Preprint at 10.1101/2023.12.02.569722 (2024).
- 34. Ying, K. et al. ClockBase: a comprehensive platform for biological age profiling in human and mouse. Preprint at 10.1101/2023.02.28.530532 (2023).
- 35. Lu, A. T. et al. DNA methylation GrimAge version 2. Aging 14, 9484–9549 (2022).
- 36. Belsky, D. W. et al. DunedinPACE, a DNA methylation biomarker of the pace of aging. eLife 11, e73420 (2022).
- 37. Ying, K. et al. Causality-enriched epigenetic age uncouples damage and adaptation. Nat. Aging 1–16 (2024) 10.1038/s43587-023-00557-0.
- 38. Timmers, P. R. H. J., Wilson, J. F., Joshi, P. K. & Deelen, J. Multivariate genomic scan implicates novel loci and haem metabolism in human ageing. Nat. Commun. 11, 3570 (2020).
- 39. Freudenberg-Hua, Y. et al. Disease variants in genomes of 44 centenarians. Mol. Genet. Genom. Med. 2, 438–450 (2014).
- 40. Ryu, S. et al. Genetic signature of human longevity in PKC and NF-κB signaling. Aging Cell 20, e13362 (2021).
- 41. Simon, M. et al. A rare human centenarian variant of SIRT6 enhances genome stability and interaction with Lamin A. EMBO J. 42, e113326 (2023).
- 42. Trajanovski, S., Martín-Hernández, J., Winterbach, W. & Van Mieghem, P. Robustness envelopes of networks. J. Complex Netw. 1, 44–62 (2013).
- 43. Bamshad, M. J. et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat. Rev. Genet. 12, 745–755 (2011).
- 44. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
- 45. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
- 46. Anderson, C. A. et al. Data quality control in genetic case-control association studies. Nat. Protoc. 5, 1564–1573 (2010).
- 47. Ng, P. C. & Henikoff, S. Predicting deleterious amino acid substitutions. Genome Res. 11, 863–874 (2001).
- 48. Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249 (2010).
- 49. Chun, S. & Fay, J. C. Identification of deleterious mutations within three human genomes. Genome Res. 19, 1553–1561 (2009).
- 50. Schwarz, J. M., Cooper, D. N., Schuelke, M. & Seelow, D. MutationTaster2: mutation prediction for the deep-sequencing age. Nat. Methods 11, 361–362 (2014).
- 51. Lee, S., Abecasis, G. R., Boehnke, M. & Lin, X. Rare-variant association analysis: study designs and statistical tests. Am. J. Hum. Genet. 95, 5–23 (2014).
- 52. Zenin, A. et al. Identification of 12 genetic loci associated with human healthspan. Commun. Biol. 2, 41 (2019).
- 53. Atkins, J. L. et al. A genome‐wide association study of the frailty index highlights brain pathways in ageing. Aging Cell 20, e13459 (2021).
- 54. Timmers, P. R. H. J. et al. Mendelian randomization of genetically independent aging phenotypes identifies LPA and VCAM1 as biological targets for human aging. Nat. Aging 2, 19–30 (2022).
- 55. Burgess, S., Zuber, V., Valdes‐Marquez, E., Sun, B. B. & Hopewell, J. C. Mendelian randomization with fine‐mapped genetic data: Choosing from large numbers of correlated instrumental variables. Genet. Epidemiol. 41, 714–725 (2017).
- 56. Moqri, M. et al. Integrative epigenetics and transcriptomics identify aging genes in human blood. 2024.05.30.596713 Preprint at 10.1101/2024.05.30.596713 (2024).
- 57. Tyshkovskiy, A. et al. Distinct longevity mechanisms across and within species and their association with aging. Cell 186, 2929–2949.e20 (2023).
- 58. Ying, K. Centenarian genetic burden code. Zenodo 10.5281/zenodo.13756349 (2024).