Abstract
Extreme longevity in humans has a strong genetic component, but whether this involves genetic variation in the same longevity pathways as found in model organisms is unclear. Using whole exome sequences of a large cohort of Ashkenazi Jewish centenarians to examine enrichment for rare coding variants, we found most longevity-associated rare coding variants converge upon conserved insulin/insulin-like growth factor 1 signaling (IIS) and AMP-activated protein kinase (AMPK) signaling pathways. Centenarians have a similar number of pathogenic rare coding variants as control individuals, suggesting the rare variants detected in the conserved longevity pathways are protective against age-related pathology. Indeed, we detected a pro-longevity effect of rare coding variants in the WNT signaling pathway on individuals harboring the known common risk allele APOE4. The genetic component of extreme human longevity constitutes, at least in part, rare coding variants in pathways that protect against aging, including those that control longevity in model organisms.
Keywords: rare variants, aging, lifespan, longevity, centenarians, human genetics
Species-specific lifespan is limited by aging, a multifactorial process accompanied by a general decline in tissue function and increased risk for many diseases1. Instead of a passive, entropic process of deterioration, aging is subject to active modulation by signaling pathways and transcription factors conserved across species2,3. In model organisms, single gene mutations have been demonstrated to affect lifespan4. For example, at the extreme end, the lifespan of nematode worms can be increased up to nearly ten-fold by mutations in genes involved in insulin/insulin-like growth factor 1 signaling (IIS)5,6. But even in more complicated organisms, such as flies and mice, lifespan can be extended up to 50% by mutations affecting the same pathway7-9 or other pathways involved in growth, metabolism and nutrient sensing, such as the mechanistic target of rapamycin (mTOR) and AMP-activated protein kinase (AMPK)10. On the basis of homology, it is widely hypothesized that these conserved signaling pathways are similarly involved in human aging and longevity.
In humans, lifespan is a complex trait affected by multiple factors that vary considerably within human populations. While non-genetic factors, including diet, physical activity, health habits, and psychosocial factors are important, lifespan clearly has a genetic component as suggested by human population-based studies11,12. At increasingly older ages, especially beyond 100 years, this genetic component becomes exceedingly strong13,14. As a highly complex trait, the genetic underpinnings of human lifespan likely encompass different types of genetic variants and epistasis across the allele frequency spectrum. Common variants associated with human survival have been extensively searched for in many recent genome-wide association studies (GWAS) using a variety of trait definitions and study designs15. Together, these studies identified more than 50 longevity-associated genetic loci of genome-wide significance, among which only a few, especially APOE, were replicated by multiple studies16. On the other hand, several previous studies detected association of human longevity with variants in several aging genes – such as insulin signaling genes17 and FOXO318,19 – by using candidate gene approaches. Most of these longevity-associated SNPs have small effect sizes, and currently common variants collectively explain only a very small proportion of heritability for human longevity. As several recent studies suggest, rare variants likely account for at least some of the 'missing' heritability20-22.
Here we examined rare coding variants in a cohort of 515 Ashkenazi Jewish centenarians by whole-exome sequencing (WES) and tested for enrichment using a case-control design. The exceptional longevity of this cohort and their homogeneous genetic background provided us with increased power to detect causal rare variants23. As controls we used 496 Ashkenazi Jewish individuals, mostly from the same households as the centenarians, aged between ~70 and 95 years and without a parental history of extreme longevity (neither parent survived beyond 95 years of age) (Table 1 and Supplementary Table S1).
Table 1. Information of study cohorts.
| Characteristic | Centenarians | Controls (or non-centenarians) |
|---|---|---|
| Rare variant association study cohort | | |
| Number of subjects | 515 | 496 |
| Female, % | 72.4% | 53% |
| Age at enrollment, years (mean±SD) | 97.6±3.5 | 73.3±8.4 |
| Disease PRS study cohort | | |
| Number of subjects | 479 | 431 |
| Female, % | 73.1% | 51.3% |
| Age at enrollment, years (mean±SD) | 97.6±3.5 | 73.2±8.7 |
| Lifespan study cohort | | |
| Number of subjects | 356 | 197 |
| Female, % | 74.2% | 44.2% |
| Age at enrollment, years (mean±SD) | 97.7±3.7 | 77.9±7.8 |
| Age at death, years (mean±SD) | 100.5±3.4 | 84.2±7.3 |
RESULTS
Longevity genes and pathways implicated by rare variants
Using a joint genotyping procedure and stringent quality control metrics, we identified 130,297 rare coding variants, including 126,405 SNPs and 3,892 indels, with minor allele frequencies < 0.01 and missing rates < 0.1 in 17,561 genes in centenarians and controls. Of all SNPs, a total of 45,493 SNPs were found to be synonymous. The remaining 84,804 non-synonymous SNPs and all indels include 75,567 missense variants, more than 3,500 loss-of-function variants (1,755 frameshift, 1,736 stop-gain, and 79 stop-loss variants), and other variants with multiple functional annotations. We did not exclude synonymous rare variants from our analysis as not all of them are functionally silent24. At the whole exome level, we found no significant difference in the number of rare coding variants between centenarians and controls (P = 0.243, logistic regression including gender and the top 10 multidimensional scaling (MDS) components as covariates).
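The frequency and missingness thresholds stated above (minor allele frequency < 0.01 and missing rate < 0.1) can be illustrated with a minimal sketch. The `Variant` record, its field names, and the example counts are hypothetical and are not taken from the study pipeline; the sketch assumes diploid genotypes and covers only these two thresholds, not the joint genotyping or the other quality control metrics.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical record; field names are illustrative only.
    variant_id: str
    alt_allele_count: int    # alternative alleles among called genotypes
    called_genotypes: int    # individuals with a non-missing genotype
    total_individuals: int   # individuals in the case-control cohort


def minor_allele_frequency(v: Variant) -> float:
    # Each called diploid genotype contributes two alleles.
    alt_freq = v.alt_allele_count / (2 * v.called_genotypes)
    return min(alt_freq, 1.0 - alt_freq)


def missing_rate(v: Variant) -> float:
    return 1.0 - v.called_genotypes / v.total_individuals


def is_rare_and_well_called(v: Variant, maf_max: float = 0.01,
                            missing_max: float = 0.1) -> bool:
    # Thresholds follow the text: MAF < 0.01 and missing rate < 0.1.
    if v.called_genotypes == 0:
        return False
    return minor_allele_frequency(v) < maf_max and missing_rate(v) < missing_max


if __name__ == "__main__":
    cohort_size = 515 + 496  # centenarians + controls (Table 1)
    example = Variant("hypothetical_variant", 5, 1000, cohort_size)
    print(is_rare_and_well_called(example))
```

Taking the minimum of the alternative allele frequency and its complement keeps the filter symmetric when the reference allele happens to be the rarer one in the cohort.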
We next examined rare variant association with longevity at the variant and gene levels. At the variant level, we applied the “Firth logistic regression for rare variant association tests” to examine association between the minor allele count of each rare coding variant and the longevity status. The variant with the strongest association signal was rs2229426 in FASN (fatty acid synthase) (P = 6.23E-05) (Supplementary Table S2). At the gene level, we applied two complementary region-based association tests25 – the “burden test of rare variants” and Sequence Kernel Association Test (SKAT) – to examine the association between the aggregate effect of rare coding variants in each gene and longevity. The burden test searches for a significant excess of rare alleles in longevity cases or controls, while SKAT implements a variance component test to detect effects of variants on longevity even if they have opposite directions. CLCN6 (chloride voltage-gated channel 6) presented the strongest gene-level association with longevity (burden test; P = 3.45E-06 and 1.03E-05 as the lowest and the combined P-values, respectively) (Supplementary Table S3). Although these top associations at the variant or the gene level did not reach genome-wide significance after multiple-test correction, quantile-quantile (QQ) plots of association signals showed upward deviation in the tails – the lowest P-values were smaller than expected from the uniform distribution (0, 1) – for several groups of rare variants such as functional rare variants (CADD26 score ≥ 20) and functional but recessive benign rare variants (CADD score ≥ 20 and PrimateAI27 score < 0.5) (see the variant masking in Methods for variant groups and their interpretation) (Figures 1A and 1B, Supplementary Figures S1 and S2, and Supplementary Table S4). These rare variants include several genes known to be related to aging such as FASN28 and the DNA repair gene BLM RecQ like helicase (BLM)29.
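The contrast between the two gene-level tests can be made concrete with a toy sketch. It is a simplification under stated assumptions: flat variant weights, no covariates, and no P-value computation (the actual tests adjust for gender and MDS components and derive P-values from the null distribution of the statistic). The function names and the toy genotypes are hypothetical.

```python
def variant_scores(genotypes, phenotypes):
    """Per-variant score S_j = sum_i g_ij * (y_i - mean(y)).

    genotypes[i][j]: minor allele count of individual i at variant j.
    phenotypes[i]:   1 for a centenarian (case), 0 for a control.
    """
    mean_y = sum(phenotypes) / len(phenotypes)
    n_variants = len(genotypes[0])
    return [
        sum(g[j] * (y - mean_y) for g, y in zip(genotypes, phenotypes))
        for j in range(n_variants)
    ]


def burden_statistic(scores):
    # Collapses variants first, so opposite-direction effects cancel.
    return sum(scores) ** 2


def skat_statistic(scores):
    # Sums squared per-variant scores, so direction does not matter.
    return sum(s * s for s in scores)


if __name__ == "__main__":
    # Toy gene: variant 0 is seen only in cases, variant 1 only in controls.
    genotypes = [[1, 0], [1, 0], [0, 1], [0, 1]]
    phenotypes = [1, 1, 0, 0]
    scores = variant_scores(genotypes, phenotypes)
    print(burden_statistic(scores), skat_statistic(scores))
```

In this toy gene the burden statistic is zero because the two variants pull in opposite directions, whereas the variance-component statistic remains positive, which is why the two tests are complementary.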
Figure 1. Longevity association of rare variants.

(A) The QQ plots for single rare variant association tests. P-values from tests of 2,787 functional rare variants (CADD score ≥ 20) and 3,127 synonymous rare variants were used to construct two separate QQ plots. Only rare variants with a minor allele count ≥ 15 in the case-control cohort were included in the plots. (B) The QQ plot for gene-based rare variant association tests. SKAT P-values from tests of functional but recessive benign rare variants (CADD score ≥ 20 and PrimateAI scores < 0.5) in 3,717 genes were used to construct the QQ plot. Only genes with two or more masked rare variants in the case-control study cohort were included in the QQ plot. (QQ plots under different rare-variant masks are in Supplementary Figure S2.) (C) Pathway enrichment analysis of genes implicated by rare variants aggregated in a gene functional network. Top 100 IGSP-scored genes showing the trend of network aggregation were analyzed. Top 10 non-redundant (covering unique putative longevity genes) enriched pathways are shown. For a gene in a pathway in the heatmap, the color of its cell indicates a weighted burden of rare variants in centenarians (deeppink) or controls (blue) (See Supplementary Table S5). Genes in the heatmap were ordered based on their hierarchical clustering. (D) Gene-set rare variant association for aging-related pathways. P* denotes P-value corrected for 6 categories of rare variants using the minimal-P value test from Flannick et al52 (Methods). The text for the significant association denotes the lowest nominal P-value among different categories of rare variants and FDR.
The extreme rarity of centenarians in human populations essentially constrains the possibility of performing the large studies necessary to discover rare variants through statistically significant genetic associations with longevity. Instead of a candidate gene approach, we used Integrated Gene Signal Processing (IGSP) to prioritize genes based on the longevity association of rare variants in an unbiased manner through data integration30. About 94% of human protein-coding genes were in the functional linkage network used by IGSP, and only half of them also had knockout phenotype data for their mouse homologs. To include most genes in our analysis, we opted for a network integration – instead of a full one, which needs both gene network and mouse phenotype data. Individual genes were scored by jointly analyzing the longevity association of genes implicated by rare variants in a gene functional linkage network (predicted based on independent genomic high-throughput data)31, which implicitly incorporates information of gene-gene functional similarity. Data simulation showed that such integrated scoring greatly increases the prioritization power and effectively uncovers risk genes with marginal association signals30. The negative-control evaluation showed that ~100 top ranked genes had higher IGSP scores when scored by real data than by randomized data (P = 0.037) (Supplementary Figure S3 and Table S5), which suggests a clustering of longevity-associated genes implicated by rare variants in the gene network captured by a network integration in gene scoring30. Subsequent pathway enrichment analysis showed that these predicted longevity genes are significantly enriched in insulin signaling (FDR = 0.00879) and mTOR signaling (FDR = 0.0129) (Figure 1C and Supplementary Figure S4). Some predicted longevity genes have an indirect connection to insulin signaling as they are in the pathway of signaling by the insulin receptor (e.g., PSMB9). 
Interestingly, many of the putative longevity genes carry a burden of rare variants in centenarians, among which potential protective rare variants were also found in previous studies such as ABCA132 and PLCG233 (Supplementary Table S5).
To further increase power, longevity association of rare coding variants can be studied at the pathway level. Since aging is characterized by evolutionarily conserved, parallel and interacting mechanistic hallmarks, we next analyzed rare variants collectively in 20 pathways of all nine aging hallmarks1 (Supplementary Table S6). Functional but recessively benign rare variants in insulin signaling (SKAT, P = 5.57E-05, FDR = 0.012) and AMPK signaling (SKAT, P = 1.59E-04, FDR = 0.017) pathways were found to be significantly associated with extreme longevity (Figure 1D, Supplementary Tables S7 and S8) after multiple testing correction that took into account the total numbers of pathways, tests, and variant masks.
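As an illustration of how a false discovery rate can be attached to a set of pathway-level P-values, the sketch below applies the Benjamini-Hochberg step-up adjustment. This is an assumption made for illustration: the text above does not name the FDR procedure, and the correction used in the study also accounts for the numbers of tests and variant masks. The input P-values in the usage example are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    n = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    hypothetical_p = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(hypothetical_p))
```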
When studying genetic variants in association studies, it is important to validate the results by replicating any observed association signals in unrelated cohorts. Our approach followed the sequence-based replication strategy, which is more powerful than the variant-based strategy that only analyzes rare variants uncovered in the discovery cohort34. Specifically, we examined three replication cohorts for longevity association of rare coding variants in insulin and AMPK signaling pathways (Supplementary Table S9): a German longevity cohort of 1,265 centenarians (mean age: 99 years) and 4,195 blood donors (mean age: 35 years) as controls, a UK Biobank longevity cohort of 104 participants with at least one long-lived parent (lifespan ≥ 100 years) and 23,405 participants with parents of usual survival (lifespan < 95 years), and an Alzheimer's Disease Sequencing Project (ADSP) longevity cohort of 1,121 non-AD individuals aged ≥ 90 years and 38 non-AD individuals aged < 75 years35.
In the German longevity cohort, we detected a significant longevity association of functional but recessive benign ultra-rare variants (AAF < 0.05% among non-Finnish European in gnomAD36) in insulin signaling (SKAT, P = 4.41E-04, FDR = 0.018) after appropriate multiple testing correction (Extended Data Figure 1A). In the UK Biobank longevity cohort, we identified significant longevity associations of functional rare variants in insulin signaling (SKAT, P = 9.64E-06, FDR = 3.87E-04) and functional but recessive benign ultra-rare variants in AMPK signaling (SKAT, P = 2.08E-03, FDR = 0.041) pathways (Extended Data Figure 2A). In the ADSP longevity cohort, we identified a significant longevity association of recessive pathogenic rare variants in insulin signaling (burden test, P = 8.98E-05, FDR = 3.6E-03; direction toward controls) (Extended Data Figure 3).
Next, we focused on identifying rare variants associated with human age-related disease. A genetic relationship between extreme human longevity and disease is supported by multiple observations in independent studies of a genetic association between extreme human longevity and APOE, a locus causally related to both cardiovascular and neurodegenerative disease37. Here we hypothesized that rare genetic variants associated with human longevity can exert their beneficial effects, at least in part, by protecting against chronic disease. Hence, we examined the rare coding variants in the 20 aging hallmark pathways in more refined subgroups of our cohort based on their APOE haplotype status and analyzed separately the longevity sub-cohorts of APOE4 carriers and non-carriers (hereinafter APOE4+ and APOE4−, respectively) to identify longevity-associated pathways in these two distinct genetic backgrounds. Among APOE4−, functional but recessively benign rare variants in both insulin and AMPK signaling pathways were again found significantly associated with longevity (SKAT, P = 6.21E-06 and 7.9E-05, FDR = 2.63E-03 and 0.013, respectively) (Extended Data Figure 4, Supplementary Tables S10 and S11). Interestingly, among APOE4+, we detected a significant association between longevity and 152 functional rare variants in WNT signaling genes after multiple testing correction using both the burden test and SKAT (the burden test, P = 9.16E-05, FDR = 0.013; SKAT, P = 3.40E-04, FDR = 0.036) (Extended Data Figure 4, Supplementary Tables S12 and S13). The direction of association suggests that these rare variants are enriched for protective variants among centenarian APOE4+ in our cohort (Supplementary Table S14). Indeed, only six of them were predicted as highly pathogenic rare variants (PrimateAI score ≥ 0.9), and they are not enriched, individually or collectively, among APOE4+ centenarians.
The WNT association was replicated in the UK Biobank longevity cohort with a significant longevity association of functional rare variants in the WNT signaling pathway among APOE4+ (SKAT, P = 1.79E-10, FDR = 2.14E-08) (Extended Data Figure 2B and Supplementary Table S9). We did not detect significant longevity association signals from rare variants in the WNT signaling pathway in either of the APOE4-stratified German longevity sub-cohorts (Extended Data Figure 1B).
We further examined the protective effect of functional rare variants in WNT signaling genes on individual human lifespan in our lifespan cohort of 553 individuals with verifiable ages at death. Starting with the full linear model of lifespan that included gender, APOE4 status, the alternative allele count of protective rare variants in WNT signaling, and all two-way and three-way interaction terms among them, we identified a statistically significant interaction, the only one, between APOE4 status and the allele count in WNT signaling (P = 1.13E-04) (Supplementary Table S15). As the APOE4 status is determined mainly by rs429358, a common variant (MAF = 0.14) associated with aging and age-related diseases, the lifespan analysis result indicates the existence of epistasis between rare variants and aging-associated common variants in the genetic architecture of human aging. We then analyzed the relationship between individual lifespan and the alternative allele count of protective rare variants in WNT signaling in sub-cohorts stratified by the status of both longevity and APOE4 (Figure 2A). Among centenarians, the allele count in WNT signaling had no effect on lifespan regardless of the APOE4 status. Among non-centenarians, there was a significant positive correlation between the burden of WNT rare variants and lifespan among APOE4+ (r = 0.406, P = 8.39E-03, FDR = 0.026) (Figure 2A, the middle blue panel), compared with non-carriers. The relationship between APOE4 status and the allele count in WNT signaling can also be more readily appreciated by comparing the average lifespan of sub-cohorts stratified based on both APOE and WNT signaling. Among APOE4+, the median difference in lifespan was over nine years (P = 2.38E-03) between individuals with low or high allele counts in WNT signaling (Figure 2B). Moreover, the negative effect of APOE4 on lifespan became weaker among individuals with a high burden of potentially protective WNT rare variants (Figure 2C).
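The stratified comparison of lifespan by WNT allele count can be sketched as a median difference between a low-burden and a high-burden group. The threshold (allele count ≤ 1 versus > 1) follows the definition of 'WNT low' and 'WNT high' given in the legend of Figure 2; the records in the usage example are hypothetical, and the sketch does not reproduce the reported P-values, which come from linear regression on log-transformed age at death with gender as a covariate.

```python
from statistics import median


def median_difference_by_burden(records, threshold=1):
    """Median lifespan of the high-burden group minus the low-burden group.

    records: iterable of (allele_count, lifespan_years) pairs for one
             stratum, for example non-centenarian APOE4 carriers.
    threshold: counts <= threshold are 'low', counts > threshold are 'high'.
    """
    records = list(records)
    low = [life for count, life in records if count <= threshold]
    high = [life for count, life in records if count > threshold]
    if not low or not high:
        raise ValueError("both burden groups need at least one individual")
    return median(high) - median(low)


if __name__ == "__main__":
    # Hypothetical (allele count, age at death) pairs, for illustration only.
    toy_stratum = [(0, 78.0), (1, 80.0), (1, 82.0),
                   (2, 88.0), (3, 90.0), (2, 92.0)]
    print(median_difference_by_burden(toy_stratum))
```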
Interestingly, the aforementioned 152 rare variants in WNT signaling genes are associated with the disease status of individuals in the ADSP (SKAT, P = 4.82E-03). Finally, using the same framework, we analyzed lifespans of centenarians and non-centenarians separately and demonstrated a similar protective effect of the 152 rare variants in WNT signaling genes (Supplementary Table S14) on lifespan among non-centenarian APOE4+ (Supplementary Table S9, Extended Data Figures 5 and 6).
Figure 2. Protective rare variants in WNT signaling genes for APOE4+.

P denotes uncorrected P-value derived from linear regression with the log-transformed age at death as the outcome and the gender as a covariate (See Methods). 'WNT low' and 'WNT high' represent the alternative allele count of rare variants in WNT signaling genes ≤ 1 and > 1 (the median), respectively. In parentheses are the numbers of individuals. MD stands for 'median difference'. (A) Correlation between lifespan and the alternative allele count of protective rare variants in WNT signaling genes. (B) The lifespan difference of individuals carrying a high and low burden of protective rare variants in WNT signaling genes. The horizontal lines and vertical thick lines in violin plots represent median and interquartile range, respectively. (C) Negative effects of APOE4 on lifespan compensated by protective rare variants in WNT signaling genes.
Longevity and common polygenic risk of age-related diseases
The phenotypic outcome of individuals that carry rare variants of large effects can also be influenced by the background of common polygenic variation. To assess how rare variants may interact with the genetic background of common variants to affect human aging, we specifically examined in our longevity cohort common variants associated with seven age-related diseases: Alzheimer's disease, coronary artery disease, type 2 diabetes, stroke, breast cancer, prostate cancer, and pancreatic cancer. This analyzed cohort consists of 479 centenarians and 431 controls with both WES and SNP array data available (Table 1 and Supplementary Table S1). We calculated polygenic risk scores (PRS) of individuals for these diseases using summary statistics from their corresponding GWAS (See Methods). Empirical P-values provided by PRSice-2 that account for over-fitting indicated significant genetic overlap between longevity and each of Alzheimer's disease, coronary artery disease, and type 2 diabetes (Figure 3, Extended Data Figure 7, and Supplementary Table S16), which was further supported by the significant results of cross-validation (Supplementary Table S17). PRS for Alzheimer's disease, coronary artery disease, and type 2 diabetes explained 1.93% (P = 0.0019), 1.32% (P = 0.013) and 1.29% (P = 0.015) of the variance of the longevity status, respectively. Measured by PRS, centenarians tend to have reduced genetic susceptibility not only to Alzheimer's disease and coronary artery disease (Bonferroni-Holm P* = 0.0067 and 0.039, respectively), which were previously found associated with healthy aging38, but also to type 2 diabetes (P* = 0.039). The predictive power of PRS for Alzheimer's disease was mainly driven by the APOE haplotype defined by SNPs rs7412 and rs429358 (Figure 3B and 3C). This is not the case for coronary artery disease (Extended Data Figure 7B)39.
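In its basic additive form, a polygenic risk score is the sum over SNPs of the GWAS effect size multiplied by the individual's effect-allele dosage. The sketch below shows only this core computation; it is not PRSice-2's procedure, which additionally handles clumping, P-value thresholding, and score normalization. The SNP identifiers and effect sizes are hypothetical.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Additive PRS: sum of effect size times effect-allele dosage.

    dosages:      dict of SNP id -> effect-allele dosage (0 to 2).
    effect_sizes: dict of SNP id -> GWAS effect size (e.g. log odds ratio).
    SNPs missing from either input are skipped.
    """
    shared_snps = dosages.keys() & effect_sizes.keys()
    return sum(effect_sizes[snp] * dosages[snp] for snp in shared_snps)


if __name__ == "__main__":
    # Hypothetical summary statistics and genotypes, for illustration only.
    effect_sizes = {"snp_a": 0.5, "snp_b": -0.25, "snp_c": 1.0}
    dosages = {"snp_a": 2, "snp_b": 1, "snp_d": 2}
    print(polygenic_risk_score(dosages, effect_sizes))
```

Restricting the sum to SNPs present in both inputs mirrors the practical situation in which the array data and the GWAS summary statistics only partially overlap.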
To further examine the genetic overlap between longevity and diseases, we applied an 'extreme-longevity phenotyping' strategy and found that the variance explained by PRS for Alzheimer’s disease and coronary artery disease increased almost fourfold, to 4–7%, when comparing people aged ≥ 100 years with those aged < 80 years (Figure 3, Supplementary Figure S5, and Supplementary Table S16). PRS for type 2 diabetes showed a stronger association with the longevity status among males than females in our cohort. PRS for Alzheimer's disease and coronary artery disease, however, showed no such gender difference (Supplementary Figure S6).
Figure 3. Common polygenic risk of age-related diseases.

(A) Common polygenic risk of subjects for seven different age-related diseases was calculated using PRS of the corresponding diseases. Nagelkerke's R2 is based on correlation between the disease PRS and the centenarian status. The bar color denotes the statistical significance of R2 after adjusting for MDS1-10 and gender (except breast cancer and prostate cancer, which are tested with females and males, respectively) as covariates. The statistical significance is based on the permutation P-values from PRSice-2. For Alzheimer's disease and coronary artery disease, the middle bars show the results of PRS analyses excluding SNPs within 1 Mb of APOE haplotype SNPs – rs429358 and rs7412. The bottom bars (for Alzheimer's disease, coronary artery disease, breast cancer and prostate cancer) show the results of PRS analyses using extreme-longevity phenotypes (cases and controls with ages ≥ 100 years and < 80 years, respectively. See Supplementary Table S16). (B-D) PRS analyses of Alzheimer's disease as it is, excluding SNPs within 1 Mb of rs7412 or rs429358, or using extreme-longevity phenotypes. In the boxplots, points represent individuals, and horizontal lines represent upper fence (maximum in Q3+1.5×IQR), upper quartile (Q3), median, lower quartile (Q1), lower fence (minimum in Q1–1.5×IQR), sequentially from top to bottom; IQR: interquartile range (25th to the 75th percentile). n = 910 biologically independent samples. Above the boxplot on the right are raw and adjusted (in parentheses) P-values for the best prediction in the Nagelkerke's R2 plot on the left, which were calculated based on logistic regression and the permutation test in PRSice-2, respectively.
Pathogenic rare variants and longevity
Since the genetic component of extreme longevity could be explained, at least in part, by a reduced burden of pathogenic variants as compared with that of the general population, we compared the counts of predicted pathogenic rare coding variants (PrimateAI score ≥ 0.9). No significant difference between centenarians and controls (P = 0.243, logistic regression including gender and the top 10 MDS components as covariates; Figure 4A) was observed. Using our lifespan cohort, we next investigated whether pathogenic rare coding variants affect lifespan, whether the effect depends on the common polygenic disease background, and whether the effect is different between centenarians and controls. Consistent with general observations, females had significantly better survivorship than males in our lifespan cohort (P = 1.71E-07, Extended Data Figure 8), as did APOE4− individuals compared with APOE4+ individuals (P = 9.32E-04, Extended Data Figure 9). In our lifespan cohort, 853 pathogenic rare variants were identified. No correlation between the exome-wide burden of pathogenic rare variants and the lifespan was observed among centenarians and non-centenarians together (the full lifespan cohort) or either of them separately (Figure 4B).
Figure 4. Analysis of pathogenic rare variants.

(A) Exome-wide burden of pathogenic rare variants in centenarians and controls. (B) Correlation between lifespan and the exome-wide burden of pathogenic rare variants. The left panel shows the result based on all 553 individuals. The middle and the right panels show the results based on individuals with lifespan ≥ 95 years and < 95 years, respectively. (C) Correlation between lifespan and the exome-wide burden of pathogenic rare variants among individuals with high genetic risk of age-related diseases. The left panel shows the correlation among 94 APOE4+. The right panel shows the correlation among 20 APOE4+ with PRS among top 45% for CAD and T2D (see Supplementary Table S18 for the results of using other cutoffs).
Human extreme longevity could be causally driven by a lack of genetic risk factors for chronic disease, by protective variants or both. Measured by the polygenic risk score (PRS), centenarians in our cohort tend to have reduced genetic susceptibility to Alzheimer's disease (AD), coronary artery disease (CAD), and type 2 diabetes (T2D) among seven age-related diseases that we examined (Figure 3 and Supplementary Table S16). Using the APOE4 status and PRS of CAD and T2D, we stratified our lifespan cohort according to their common genetic risk of AD, CAD, and T2D – the three age-related diseases with significant genetic overlap with longevity in our cohort – and examined how the common polygenic disease risk background and pathogenic rare variants may together affect human lifespan. We first re-examined the effect of pathogenic rare variants on lifespan on an AD risk background based on the APOE4 status and found a weak negative correlation (r = −0.184, P = 0.064) among APOE4+. However, this relationship became significantly stronger (r = −0.605, P = 2.85E-03, FDR = 7.13E-03) if substantial genetic risk of both CAD and T2D also was included (i.e., APOE4+ with PRS for both diseases higher than the respective median of the longevity cohort) (Figure 4C and Supplementary Table S18). These results suggest that pathogenic rare variants and disease-associated common variants interact. Such genetic interactions may affect the deleterious effect of pathogenic rare variants on human lifespan, a possibility that we formally investigated using a full linear model of lifespan including gender, APOE4 status, separate PRS of CAD and T2D, the pathogenic rare variant counts, and all two-way and higher-order interaction terms among them. 
The subsequent stepwise model selection identified multiple interactions, among which the most significant is a three-way interaction among the pathogenic rare variant count and the common polygenic disease risk of AD and T2D in our lifespan cohort (P = 3.12E-04) (Supplementary Table S15). Our analyses of stratified sub-cohorts showed that the negative effect of common polygenic disease risk on human lifespan can intensify under a high burden of pathogenic rare variants. For example, the presence of APOE4 reduced lifespan by ~1.5 years on average in our cohort. However, among individuals with ≥ 7 pathogenic rare variants (the median = 3), APOE4+ lived ~17 years less than non-carriers in general (P = 2.77E-04; FDR = 8.31E-04) (Supplementary Figure S7). To replicate this discovery of the relationship between pathogenic rare variants and lifespan, we first constructed a UK Biobank parental lifespan cohort (Methods), which consists of 20,823 unrelated (to the first-degree kinship) participants with known parental ages at death, and then examined the relationship between the exome-wide burden of pathogenic rare coding variants and the parental lifespan in this cohort (Supplementary Figure S8). We observed a negative correlation among APOE4+ (r = −0.024, P = 0.044) (Supplementary Figure S9 and Supplementary Table S9). The stepwise model selection procedure identified a significant interaction related to parental lifespan between the APOE4 status and the exome-wide burden of pathogenic rare coding variants (P = 5.48E-05).
DISCUSSION
In summary, in this first large-scale genetic study of rare coding variants and human longevity, our network-integrated analysis identified an enrichment of longevity-associated rare coding variants in conserved aging pathways and gene-set association tests confirmed longevity association of rare variants in insulin and AMPK signaling pathways. These results suggest that rare variants in conserved aging pathways important for aging of model organisms also affect human lifespan and constitute a part of the genetic architecture of human longevity. As expected, based on the many species-specific characteristics of aging, the pattern is not completely identical between human and animal longevity. For example, we did not find any association of extreme longevity with variants in the mTOR pathway, which has been associated with longevity in model organisms, including the mouse. On the other hand, we did find other pathways critical to human aging not yet identified in model organisms. For example, we demonstrated protective effects of rare variants in WNT signaling on human lifespan. Interestingly, in the klotho-knockout mouse model of accelerated aging, continuous WNT exposure triggered accelerated cellular senescence, implicating WNT signaling in mammalian aging40. Finally, our results confirm previous reports that centenarians do not have a lower burden of pathogenic variants. Instead, from our present study, it appears that rare protective variants suppress the adverse effects of pathogenic variants on longevity.
To investigate whether the same conserved pathways are important to aging of both model organisms and humans, we can examine the effects of rare variants on lifespan-related traits. This is particularly challenging, however, due to a strong intrinsic stochasticity in aging processes: among isogenic C. elegans in a constant environment, lifespan of long-lived (age-1) mutants overlaps with that of the wildtype controls41,42. Thus, the same genetic variants may have highly variable effects on lifespan among different individuals. While this stochasticity complicates the identification of longevity-associated variants in conserved aging pathways, using appropriate statistical tests and study cohorts can help overcome the challenge.
In this study, we identified rare coding variants in aging pathways that affect human longevity. Future studies of their molecular functions could generate actionable biological insights on aging. In particular, uncovering the downstream pathways that mediate protective effects of rare variants found in WNT signaling genes is imperative to translate this finding into therapeutic interventions against age-related diseases. Experiments with mouse cells suggest that APOE4 may inhibit WNT signaling43. Dysregulation of WNT signaling contributes to different types of age-related diseases such as cancer44, AD45, and cardiovascular disease46. Thus, protective rare variants in WNT signaling genes could counteract the adverse effects of APOE4-induced WNT inhibition on the progression of downstream age-related diseases and thus affect lifespan. While coding variants are more likely to reduce than to enhance the function of the protein product, conclusive confirmation and understanding of the functional effects at the molecular, cellular, and organismal levels require experimental validation using functional assays and genome-editing16.
Our study suggests that rare variants can have distinct effects on lifespan on different genetic backgrounds of age-related diseases (such as the APOE4 status), underscoring the difficulty of detecting and replicating effects of rare variants on lifespan without considering other genetic factors. On the other hand, while common variants associated with age-related diseases are known to influence lifespan47, our finding of potential genetic interactions between common and rare variants in the context of human lifespan provides novel insights into the mechanism of disease resilience as a part of the genetics of healthy aging among centenarians. How perturbation of conserved aging pathways contributes to human longevity and healthspan cannot be answered by genetics alone. However, while the molecular mechanisms of many conserved aging pathways have been widely studied in model organisms, our findings about rare variants – especially those from centenarians with high common polygenic risk of age-related diseases – can help translate those established longevity-regulating mechanisms in model organisms to therapeutic targets for healthy aging of humans.
A limitation of whole-exome sequencing, used in our present study, is the absence of rare, non-coding variants that have been implicated in aging of model organisms48-50 and thus of potential interest for human longevity. These include, for example, rare variants in non-coding RNAs or other regulatory elements relevant for tissue specificities and variants in long tandem repeats connected to brain health and various neurological disorders. To identify the latter, long sequencing reads are required51.
METHODS
Our WES study on the Einstein longevity cohorts complies with all relevant ethical regulations and was approved by the Institutional Review Board at Albert Einstein College of Medicine. Informed consent was obtained from participants or from a proxy if the participant lacked decisional capacity. The WES studies of all the three replication cohorts have informed consent from participants and were approved by the respective ethics committee or institutions: the Ethics Committee at Medical Faculty of Kiel University for the German longevity cohort; the Ethics Advisory Committee and the external ethics committees for the UK Biobank; and the ethics committees of the Broad Institute, Baylor College of Medicine’s Human Genome Sequencing Center, and Washington University’s McDonnell Genome Institute for the ADSP cohort.
Recruitment of Einstein longevity cohorts
The study subjects were Ashkenazi Jewish participants from two longevity cohorts, the Longevity Genes Project (LGP) and the LonGenity study, who have been recruited and characterized at the Albert Einstein College of Medicine since 1998. Cases (centenarians) were defined as individuals with age ≥ 95 years, and individuals with age < 95 without a parental history of longevity (neither parent survived beyond 95 years of age) were classified as controls. The centenarians' dates of birth were confirmed by birth certificates or government-issued identification. Vital status and date of death, where applicable, were determined as of April 3, 2019, based on documentation of last contact with the study participant, reports from the next of kin, and search of publicly available databases. In the LGP and LonGenity cohorts, 555 and 508 individuals were classified as longevity cases (mean age: 101) and controls (mean age: 83), respectively. Mortality status was confirmed for 650 individuals, and these individuals were the subjects of the lifespan analysis.
SNP-array genotyping
SNP-array genotyping was performed using the Illumina Global Screening Array-24 v1.0 BeadChip with 642,824 markers, 7,201 of which could not be 'lifted over' to human genome assembly GRCh38 and were thus removed. In total, 2,026 samples were genotyped by SNP arrays. After removing duplicates and samples not in our longevity studies, 635,623 variants in 1,830 samples were processed and analyzed (1,740 samples also have WES data). Quality control of the array-based genotype data was carried out using PLINK software (version 1.9)53. First, we checked the missing rates of SNPs and samples. SNPs and samples missing over 20% of genotype calls were removed, and this missingness filtering was repeated with a more stringent threshold of 2%. Individuals whose self-reported gender differed from the one predicted based on sex chromosome heterozygosity were removed. SNPs whose genotype frequencies deviated from Hardy-Weinberg equilibrium with a χ2-test P < 1E-6 among controls, followed by P < 1E-10 among cases, were removed. Finally, samples whose heterozygosity deviated more than three standard deviations from the mean were removed.
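The two-pass missingness filter described above can be illustrated with a minimal Python sketch. This is not the PLINK implementation used in the study. The data layout (a dictionary of per-sample genotype lists with None for missing calls), the function names, and the choice to filter SNPs before samples within each pass are all assumptions made for illustration.

```python
def missing_rate(calls):
    """Fraction of missing (None) genotype calls; 1.0 for an empty list."""
    return sum(c is None for c in calls) / len(calls) if calls else 1.0


def filter_once(geno, threshold):
    """One pass: drop SNPs, then samples, whose missing rate exceeds threshold.

    geno maps sample ID -> list of genotype calls (0/1/2 or None),
    with all samples sharing the same SNP order (hypothetical layout).
    """
    samples = list(geno)
    n_snps = len(geno[samples[0]]) if samples else 0
    keep = [j for j in range(n_snps)
            if missing_rate([geno[s][j] for s in samples]) <= threshold]
    reduced = {s: [geno[s][j] for j in keep] for s in samples}
    return {s: c for s, c in reduced.items() if missing_rate(c) <= threshold}


def two_pass_missingness(geno, thresholds=(0.20, 0.02)):
    """Apply the lenient 20% filter first, then the stringent 2% filter."""
    for t in thresholds:
        geno = filter_once(geno, t)
    return geno
```

Running the lenient pass first prevents a few very poor SNPs or samples from inflating the missing rates of otherwise acceptable ones before the stringent threshold is applied.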
Exome sequencing and genotyping
Exome sequencing of 2,112 individuals in LGP and LonGenity cohorts was performed at the Regeneron Genetics Center (RGC). Sample preparation and whole-exome sequencing were performed using previously described methods54 (Supplementary Note). Variants in our centenarian cohort were called on human genome assembly GRCh38. For our rare variant analyses using both binary (cases vs. controls) longevity and continuous lifespan data, only rare variants with missing rates < 0.1 in the corresponding study cohorts were analyzed; all samples in our study cohorts have a missing rate < 0.01 on rare variants that passed the quality control (Supplementary Note).
Aggregation of SNP-array and WES data
For PRS-related analyses, we used genotypic data aggregated from WES and SNP-array (Extended Data Figure 10A) for two reasons: (1) genotypes of common variants from the whole genome (not just the exome) need to be imputed (see the next sub-section) for PRS calculation; and (2) genome-wide imputation based on genotypic data from both WES (for better accuracy) and SNP-array (for better coverage) is better than imputation based on WES data alone. After the aggregation process (Supplementary Note), ~1,203k variants were kept in the merged VCF file.
Genotype imputation
We used the Michigan Imputation Server (Minimac3)55 for genotype imputation (n = 1,740). The Haplotype Reference Consortium (HRC, r1.1 2016)56 was used as the reference panel, Eagle v2.3 for phasing, and the European population (EUR) for quality control. After the post-imputation process (Supplementary Note and Supplementary Figure S10), we obtained ~14,079k polymorphic variants in our cohort. We evaluated the suitability of the HRC reference panel for cross-ethnicity genotype imputation in our study, using 196 Ashkenazi Jewish individuals in our cohort for whom the whole-genome sequencing data are available. Genotype imputation that we performed was highly accurate: in 183 individuals (out of 196), genotypes of >99% of 2,020 randomly selected non-coding variants that were not genotyped by either WES or SNP array data can be correctly imputed (Supplementary Figure S11).
Polygenic risk score analysis
We calculated polygenic risk scores (PRSs) using PRSice-257,58 to analyze disease risk from common variants in our longevity cohort. We first collected summary statistics from the most recent GWAS of seven complex diseases of European or predominantly European ancestry: AD59, CAD60, T2D61, stroke62, prostate cancer63, breast cancer64, and pancreatic cancer65. From the combined genotype data after imputation for 1,740 samples, we selected common SNPs (MAF > 5%) in the cohort and carried out LD clumping on SNPs within 250 kb of each other with R2 > 0.1. After clumping, we used 19 P-values (1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.01, 1E-3, 1E-4, 1E-5, 1E-6, 1E-7, 1E-8, 1E-9, and 1E-10) as cutoffs to select SNPs for scoring and, for AD, additional ones to restrict selection to the most AD-associated SNPs. After removing outliers based on Multidimensional Scaling (MDS) analysis (Supplementary Figure S12), non-Ashkenazi Jewish individuals, and related individuals, 910 centenarians and controls among 1,740 samples were used to evaluate the association between disease PRS and longevity in the cohort (Extended Data Figure 10A). To account for population substructure and sex differences, we included the top 10 MDS components derived from common SNPs in the combined genotype dataset and gender as covariates in the regression analysis to evaluate PRS association. When analyzing the PRS of prostate cancer and breast cancer, only male and female individuals were considered, respectively.
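To make the thresholding step concrete, the following Python sketch scores one individual at each of the 19 P-value cutoffs listed above. It is a plain weighted sum of effect-allele counts and is not meant to reproduce PRSice-2's exact scoring or clumping. The function name, the input layout, and the SNP identifiers in the usage note are hypothetical.

```python
# The 19 P-value cutoffs listed in the text.
P_CUTOFFS = [1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1,
             0.01, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8, 1e-9, 1e-10]


def polygenic_score(dosages, gwas_snps, p_cutoff):
    """Weighted sum of effect-allele counts over SNPs passing the cutoff.

    dosages:   dict SNP ID -> effect-allele count (0, 1 or 2) for one person.
    gwas_snps: list of (snp_id, beta, p_value) tuples, assumed already clumped.
    """
    return sum(beta * dosages[snp_id]
               for snp_id, beta, p in gwas_snps
               if p <= p_cutoff and snp_id in dosages)


def scores_across_cutoffs(dosages, gwas_snps, cutoffs=P_CUTOFFS):
    """One score per cutoff, so the best-fitting threshold can be chosen later."""
    return {c: polygenic_score(dosages, gwas_snps, c) for c in cutoffs}
```

For example, with three hypothetical SNPs `[("snpA", 0.5, 1e-9), ("snpB", -0.2, 0.05), ("snpC", 0.1, 0.5)]` and dosages `{"snpA": 2, "snpB": 1, "snpC": 0}`, only snpA contributes at the 1E-8 cutoff, while snpA and snpB both contribute at 0.1.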
Rare variant association analysis
Among 2,021 Ashkenazi Jewish individuals with WES data, 536 were centenarians, and 506 were controls. Pairs of individuals with the proportion of alleles shared identity-by-descent (IBD) > 0.4 were identified as related – i.e., monozygotic twins, parents and children, and full siblings – and one sample per pair was excluded, choosing which sample to retain so as to achieve more cases, higher ages of cases, and lower ages of controls. In our study cohort, we identified 31 participants as related to other participants due to high IBD. After excluding them, we had 515 cases (mean age: 101 years) and 496 controls (mean age: 83 years) for rare variant association analysis (Table 1 and Extended Data Figure 10B). In this study, we analyzed rare variants with alternative allele frequencies < 1% in Ashkenazi Jewish populations, which were calculated based on the average of the allele frequencies in 731 unrelated (to the first-degree kinship) Ashkenazi Jewish individuals in our centenarian cohort (2,021) (excluding centenarians and other individuals included in our study (Table 1)) and the ones in Ashkenazi Jews reported in gnomAD. The longevity association was assessed at the variant, gene, and gene-set levels. We evaluated the longevity association of each rare coding variant using Firth logistic regression66. For association tests at the gene and gene-set levels, we performed the burden test and SKAT (implemented in R67; version 1.3) to test the longevity association of six different subsets of rare variants within each gene or gene set. The variant-masking scheme52 was designed to group similar rare variants of specific properties based on CADD (version 1.4) and PrimateAI (version 0.2) annotation. CADD is widely used as a variant annotation tool to predict the functionality (i.e., being functional or neutral) of variants. In contrast, PrimateAI predicts their clinical impact (i.e., being pathogenic or benign).
We defined different classes of variants based on the recommended thresholds of CADD and PrimateAI scores (Supplementary Table S4): all rare variants (without masking), functional (or non-neutral)26 rare variants (CADD score ≥ 20), dominant pathogenic rare variants (PrimateAI score > 0.8), recessive pathogenic rare variants (PrimateAI score > 0.7), functional but dominant benign rare variants (CADD score ≥ 20 & PrimateAI score < 0.6), and functional but recessive benign rare variants (CADD score ≥ 20 & PrimateAI score < 0.5). The minimum P-value test52 was used to combine P-values of the aforementioned six sets of rare variants at the gene or gene-set level. For gene-based association tests, only genes with multiple rare variants after masking were tested for the corresponding rare variant category. 15,935 genes were tested for at least one variant category. For gene-set association tests, we compiled 20 gene sets of aging pathways for nine aging hallmarks1 (Supplementary Table S6) and used the burden test and SKAT to test longevity association of those six sets of rare variants within each of those 20 gene sets. FDR was used to correct for 130,297 P-values at the variant level, 31,870 (2 × 15,935) combined P-values at the gene level, and 40 (2 × 20) combined P-values at the gene set level, respectively. For rare variant association at the gene-set level, we conducted an independent analysis using the same framework but in two sub-cohorts: APOE4+ and APOE4−. FDR was used to correct for 80 (2 × 2 × 20) combined P-values in this analysis. Gender and top 10 MDS were included as covariates in all rare variant association analyses in the discovery cohort.
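The six variant classes defined above can be written as a small classification function. The Python sketch below uses the CADD and PrimateAI thresholds stated in the text. The mask labels, the function name, and the handling of missing scores (a variant without a score joins only the masks that do not need it) are assumptions for illustration.

```python
def variant_masks(cadd, primate_ai):
    """Return the set of masks a rare variant belongs to.

    cadd:       CADD score, or None if unavailable.
    primate_ai: PrimateAI score, or None if unavailable.
    """
    masks = {"all"}  # every rare variant is in the unmasked set
    functional = cadd is not None and cadd >= 20
    if functional:
        masks.add("functional")
    if primate_ai is not None:
        if primate_ai > 0.8:
            masks.add("dominant_pathogenic")
        if primate_ai > 0.7:
            masks.add("recessive_pathogenic")
        if functional and primate_ai < 0.6:
            masks.add("functional_dominant_benign")
        if functional and primate_ai < 0.5:
            masks.add("functional_recessive_benign")
    return masks
```

The masks are not mutually exclusive: a variant with a PrimateAI score above 0.8 is counted in both pathogenic masks. This overlap is one reason the six P-values are combined with the minimum P-value test rather than treated as independent.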
Network/pathway enrichment of rare variants
In addition to conventional approaches of rare-variant association study, we investigated whether longevity-associated rare variants aggregate in a gene network and pathways. We first used IGSP30 to score longevity-associated genes by integrating rare-variant association tests at the gene level with a gene functional network31. To consider information from all rare coding variants in IGSP scoring, we collected gene association signals by applying the weighted burden test (using the R package SKAT) on rare coding variants of each gene weighted by the corresponding CADD scores. We then tested whether the top 100 genes tend to be scored higher than the top 100 genes derived from randomized rare variant association signals using the Wilcoxon rank-sum test. To investigate enriched pathways of those top 100 genes implicated by longevity association of rare variants and the functional gene network in an unbiased manner, we first performed the pathway enrichment analysis using ToppGene Suite68, in which 1,245 pathways from different pathway databases were analyzed concurrently, to summarize top enriched pathways across pathway databases. In addition, we compared the top enriched KEGG and Reactome pathways identified by ToppGene Suite and three other widely used tools for pathway-enrichment analysis – Enrichr69, g:Profiler70, and GSEA71 – to derive enriched pathways supported by multiple analysis tools.
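The comparison of observed and randomized top-100 gene scores relies on the Wilcoxon rank-sum test. The Python sketch below implements a one-sided version of that test using the normal approximation with a tie correction. The sidedness, the absence of a continuity correction, and the function name are assumptions, since the text does not specify these details.

```python
import math


def rank_sum_greater(x, y):
    """One-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Alternative hypothesis: values in x tend to be larger than values in y.
    Returns (U statistic for x, approximate P-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1  # pooled[i:j] is one group of tied values
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence either way
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, 0.5 * math.erfc(z / math.sqrt(2))
```

Here `x` would hold the scores of the top 100 genes from the observed association signals and `y` the scores of the top 100 genes from one randomization. A small P-value indicates that the observed scores are shifted upward.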
Lifespan analysis of rare variants
In our longevity cohort, after removing related individuals, we have date of death – and thus definitive lifespan information – on 553 Ashkenazi Jewish individuals (202 males and 351 females) (Table 1, Extended Data Figures 8 and 10), of whom 550 (~99.5%) had lifespans ≥ 65 years. Since no censored data were included in our lifespan cohort (i.e., all subjects reached the endpoint, death), we tested the association between lifespan and the burden of rare variants in the lifespan cohort for all lifespan analyses using a unified accelerated life linear model72 with the log-transformed age at death as the outcome and gender as a covariate. Unlike the rare-variant association analyses, which aimed to discover longevity-associated rare variants using a longevity case-control design, our lifespan analyses of rare variants investigate how pathogenic rare variants and protective rare variants discovered in our case-control study impact human lifespan through quantitative analyses.
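With no censoring, the model described above reduces to a linear regression of log age at death on rare-variant burden and gender. The Python sketch below fits that regression by ordinary least squares and returns point estimates only. The function names, the record layout, and the 0/1 coding of gender are assumptions, and the original analysis also produced P-values, which this sketch omits.

```python
import math


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def fit_lifespan_model(records):
    """records: list of (age_at_death, burden, is_female) tuples.

    Returns [intercept, beta_burden, beta_sex] for the model
    log(age_at_death) = intercept + beta_burden * burden + beta_sex * is_female.
    """
    X = [[1.0, float(burden), float(is_female)] for _, burden, is_female in records]
    y = [math.log(age) for age, _, _ in records]
    return ols(X, y)
```

Because the outcome is on the log scale, exp(beta_burden) can be read as the multiplicative change in lifespan per additional alternative allele. A negative coefficient for pathogenic burden therefore corresponds to a proportionally shorter lifespan.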
Pathogenic rare variants and lifespan
We investigated whether pathogenic rare variants can adversely affect lifespan. We used PrimateAI27, which was specially designed and optimized for predicting disease-causing variants73, to select highly pathogenic rare coding variants using a stringent score threshold ≥ 0.9 and assessed how the total count of their alternative allele (the exome-wide burden) may affect lifespan. PrimateAI is a machine learning-based method that expands the data set for training by including common variants from non-human primates to improve the power for predicting human pathogenic variants. No direct comparison of variant effects on longevity was made between human and non-human primates by using PrimateAI.
Protective rare variants and lifespan
Our rare-variant association tests uncovered a burden of rare variants in WNT signaling genes that may have pro-longevity effects among APOE4+ (Supplementary Table S14). We investigated their impact on lifespan by examining those protective rare variants in our lifespan cohort through several analyses: We evaluated whether the alternative allele count of those protective rare variants in WNT signaling genes is correlated with lifespan among APOE4+ and APOE4−, respectively; from a complementary angle, we investigated whether APOE4 differentially affects lifespans of individuals with a high or low burden of those protective rare variants; and finally, we also examined centenarians and non-centenarians separately in the lifespan analysis to differentiate it from the association study in which the longevity status was used.
Replication studies
To maximize the extent of replicating human longevity association of rare variants, we prepared longevity case-control replication studies using cohort-specific criteria of determining longevity cases and controls. First, longevity cases are individuals older than the human life expectancy. Second, longevity controls are individuals substantially (> 15 years) younger than cases. We used the WES data from three cohorts – a German longevity cohort, a UK Biobank longevity cohort, and a longevity cohort from ADSP – to replicate the longevity association of rare variants discovered in our Ashkenazi Jewish longevity cohort. The German sample comprised 1,265 long-lived individuals (age range: 94 - 110 years; mean age: 99 years) as described previously74 and 4,195 younger controls (mean age: 35 years) recruited as part of the FoCus cohort75 and as blood donors at the University Hospital Schleswig-Holstein in Kiel and Lübeck, Germany. For exome sequencing and data analysis (including alignment and variant calling), the same wet lab processes and bioinformatic pipelines at the Regeneron Genetics Center were employed as for the Einstein cohort. The UK Biobank longevity cohort was collected from 49,960 individuals whole-exome sequenced in the UK Biobank76, consisting of 104 cases and 23,405 controls of British and white ethnicity with at least one long-lived parent (lifespan ≥ 100 years) (mean longest-lifespan of parents: 101 years) and with parents of usual survival (lifespan < 95 years) (mean longest-lifespan of parents: 80 years), respectively. The ADSP longevity cohort consists of 1,121 non-AD individuals aged ≥ 90 years (the ADSP recorded age is right truncated at 90) as cases and 38 non-AD individuals aged < 75 years as controls (mean age: 71 years). Both the German and the UK Biobank longevity cohorts were used to replicate our findings in the full and APOE4-stratified Ashkenazi Jewish longevity cohorts. 
The ADSP longevity cohort was used only to replicate findings made in the full discovery cohort due to the limited number of its control samples.
Relatives up to first-degree kinship were removed from all three cohorts. We applied the same framework of rare variant association analysis from our discovery study in the replication analysis. We tested the 6 masking groups not only of rare variants (AAF < 1%) but also, separately, of ultra-rare variants (AAF < 0.05%), which were not examined specifically in our discovery cohort due to the limited sample size of the allele reference panel in Ashkenazi Jews. The minimum P-value test was used to correct for the 12 test P-values for a tested gene set accordingly. Rare variants in the three longevity cohorts were determined based on their AAF in the corresponding WES data (5,460, 49,960, and 10,267 individuals in the German, UK Biobank and ADSP WES data, respectively). Ultra-rare variants were further determined from rare variants based on their AAF reported in the large Non-Finnish European reference panel in gnomAD (v2; 56,885 individuals). Rare variants with genotype missing rates ≥ 0.1 were excluded from our analyses. Gender and the top 10 principal components from the PCA accounting for subpopulation structure were used as covariates in the burden test and SKAT. FDR was used to correct for 4 (2 gene sets: Insulin and AMPK; 2 tests: SKAT and the burden test) and 12 (3 gene sets: Insulin, AMPK and WNT; 2 tests: SKAT and the burden test; 2 sub-cohorts: APOE4+ and APOE4−) combined P-values for replication tests at the gene-set level in the full cohort and APOE4-stratified cohorts, respectively.
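The text does not name the FDR procedure. As an illustration only, the Python sketch below assumes the widely used Benjamini-Hochberg step-up method and shows how a small family of combined P-values, such as the 4 or 12 replication tests, would be adjusted. The function name is hypothetical.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For a family of four tests with raw P-values 0.01, 0.04, 0.03 and 0.20, the adjusted values are 0.04, about 0.053, about 0.053 and 0.20. With so few tests the adjustment is mild compared with the variant-level correction over 130,297 P-values.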
We used the UK Biobank WES data of a parental lifespan cohort to replicate the relationship between pathogenic rare variants and lifespan discovered in our Ashkenazi Jewish lifespan cohort, due to the lack of long-lived individuals with WES data (the longest lifespan is ~80 years). After removing relatedness to the first-degree kinship, this cohort consists of 20,823 individuals of British and white ethnicity with the average parental lifespan ≥ 65 years. We used the same regression framework for replication as we used for discovery, including top 10 principal components as covariates.
Statistics and reproducibility
No statistical methods were used to predetermine sample size as all the available samples from the WES data were considered. We used various statistical methods to analyze the data; please see the Methods subsections above for details. We used three independent longevity cohorts in which we successfully replicated our findings on the longevity association of rare variants in aging pathways. No data were excluded from the analyses. The experiments were not randomized as this approach was not relevant to the study design. The investigators were not blinded to allocation during experiments and outcome assessment as this was not relevant to the study design.
Extended Data
Extended Data Figure 1. The replication study of gene-set longevity association using the WES data of the German longevity cohort.

The longevity case-control study consists of 1,265 longevity cases and 4,195 longevity controls. P* denotes P-value corrected for 12 categories of rare variants using the minimal-P value test from Flannick et al52 (Methods). The text for the significant association denotes the lowest raw P-value among different groups of tested rare variants and FDR. (A) Full longevity cohort. (B) APOE4 stratified cohorts.
Extended Data Figure 2. The replication study of gene-set longevity association using the UK Biobank WES data.

The longevity case-control study consists of 104 cases with at least one parent age at death ≥ 100 years and 23,405 controls with both parent age at death < 95 years. P* denotes P-value corrected for 12 categories of rare variants using the minimal-P value test from Flannick et al52 (Methods). The text for the significant association denotes the lowest raw P-value among different groups of tested rare variants and FDR. (A) Full longevity cohort. (B) APOE4 stratified cohorts.
Extended Data Figure 3. The replication study of gene-set longevity association using the ADSP WES data.

The longevity case-control study consists of 1,121 non-AD cases with age ≥ 90 years and 38 non-AD controls with age < 75 years. P* denotes P-value corrected for 12 categories of rare variants using the minimal-P value test from Flannick et al52 (Methods). The text for the significant association denotes the lowest raw P-value among different groups of tested rare variants and FDR.
Extended Data Figure 4. Gene-set rare variant association in the APOE4-stratified cohorts of the discovery (Ashkenazi Jewish) longevity cohort.

P* denotes P-value corrected for 6 categories of tested variants using the minimal-P value test from Flannick et al52 (Methods). The text for the significant association denotes the lowest raw P-value among different groups of tested rare variants and FDR.
Extended Data Figure 5. Lifespan analysis of protective variants in WNT signaling genes for non-centenarians.

P denotes uncorrected P-value derived from linear regression with the log-transformed age at death as the outcome and the gender as a covariate (See Methods). 'WNT low' and 'WNT high' represent the alternative allele count of rare variants in WNT signaling genes ≤ 1 and > 1 (the median), respectively. In parentheses are the numbers of individuals. MD stands for 'median difference'. (A) The lifespan difference of individuals carrying a high and low burden of protective rare variants in WNT signaling genes. (B) Negative effects of APOE4 on lifespan with high and low burden of protective rare variants in WNT signaling for non-centenarians.
Extended Data Figure 6. Lifespan analysis of protective variants in WNT signaling genes for centenarians.

P denotes uncorrected P-value derived from linear regression with the log-transformed age at death as the outcome and the gender as a covariate (See Methods). 'WNT low' and 'WNT high' represent the alternative allele count of rare variants in WNT signaling genes ≤ 1 and > 1 (the median), respectively. In parentheses are the numbers of individuals. MD stands for 'median difference'. (A) The lifespan difference of individuals carrying a high and low burden of protective rare variants in WNT signaling genes. (B) Negative effects of APOE4 on lifespan with high and low burden of protective rare variants in WNT signaling for centenarians.
Extended Data Figure 7. Disease-PRS analyses for centenarians and controls.

This shows the results of PRS analyses for age-related diseases in the centenarian cohort. In the boxplots, points represent individuals, and horizontal lines represent upper fence (maximum in Q3+1.5×IQR), upper quartile (Q3), median, lower quartile (Q1), lower fence (minimum in Q1–1.5×IQR), sequentially from top to bottom; IQR: interquartile range (25th to the 75th percentile). n = 910 biologically independent samples in the boxplots on the right panels for coronary artery disease, type 2 diabetes, stroke, and pancreatic cancer. n = 339 and 571 biologically independent samples in the boxplots on the right panels for prostate cancer and breast cancer, respectively. Above the boxplot on the right are raw and adjusted (in parentheses) P-values for the best prediction in the Nagelkerke R2 plot on the left, which were calculated based on logistic regression and the permutation test in PRSice2, respectively. For stroke, breast cancer, prostate cancer, and pancreatic cancer, no robust association was observed between their PRS and the longevity status as originally defined in our cohort. (A) Coronary artery disease. (B) Coronary artery disease without considering SNPs within 1Mbps of rs7412 or rs429358 (SNPs for the APOE haplotype). (C) Type 2 diabetes. (D) Stroke. (E) Prostate cancer. Only males are considered. (F) Breast cancer. Only females are considered. (G) Pancreatic cancer.
Extended Data Figure 8. Basic statistics of the lifespan cohort.

(A) Lifespan distribution of 553 individuals. (B) Survival curves of 202 males and 351 females composing the analyzed cohort. Females have a significantly higher survival rate than males based on a Cox regression model (P = 1.71E-07; coxph in R).
Extended Data Figure 9. Correlation between lifespan and common-variant genetic risk of age-related diseases.

P-values were based on the result of linear regression (regress log lifespan on genetic disease risk) corrected for gender. (A) Alzheimer's disease. The plots on the left and right show the boxplot and survival curves of APOE4+ and APOE4−, respectively. MD stands for 'Median Difference'. In the boxplots, points represent individuals, and horizontal lines represent upper fence (maximum in Q3+1.5×IQR), upper quartile (Q3), median, lower quartile (Q1), lower fence (minimum in Q1–1.5×IQR), sequentially from top to bottom; IQR: interquartile range (25th to the 75th percentile). n = 553 biologically independent samples. (B) Coronary artery disease. r represents 'correlation coefficient'. (C) Type 2 diabetes.
Extended Data Figure 10. Flowcharts of sample collection for different analyses.

(A) Flowchart of sample collection for PRS analyses and lifespan analyses of rare variants and disease PRS. Refer to the 'Rare variant association analysis' subsection for the strategy of removing related individuals for PRS analysis that involves longevity status. The strategy for removing related individuals in lifespan analyses is to randomly exclude one individual from each pair with the proportion of alleles shared identity-by-descent (IBD) > 0.4. (B) Flowchart of sample collection for rare variant association tests, network-integrated analyses, and lifespan analyses of rare variants (and APOE4).
Supplementary Material
ACKNOWLEDGEMENTS
This work was supported by NIH grants R01 HG008153 (Z.D.Z.), R01 AG061155 (S.M.), R01 AG057909 (N.B. and Z.D.Z.), P01 AG017242 (J.V.), U19 AG056278 (J.V., P.R., L.N., Y.S., and W.L.), and a Career Scientist Award from Irma T. Hirschl Trust to Z.D.Z. We thank the Popgen Biobank and the Popgen 2.0 Network (P2N) at Kiel University for help with the recruitment of some of the long-lived individuals. G.G.T. was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through the project number 390870439 (EXC 2150 - ROOTS). We thank Dr. Tao Wang (Albert Einstein College of Medicine) for comments and suggestions. We thank Management and Leadership Team in RGC for contributing to securing funding, study design and oversight and reviewing the manuscript: Goncalo Abecasis, Aris Baras, Michael Cantor, Giovanni Coppola, Andrew Deubler, Aris Economides, Luca A. Lotta, John D. Overton, Jeffrey G. Reid, Alan Shuldiner. We thank Sequencing and Lab Operations in RGC for performing and being responsible for sample sequencing (Justin Marcovici, Emilia Weihenig, Alexander Lopez, and John D. Overton), performing and being responsible for exome sequencing (Alex DeVito, Joseph LaRosa, Louis Widom, Christina Beechert, Caitlin Forsythe, Erin D. Fuller, Michael Lattari, Maria Sotiropoulos Padilla, Sarah E. Wolf, Alexander Lopez, and John D. Overton), conceiving and being responsible for laboratory automation (Thomas D. Schleicher, Zhenhua Gu, Alexander Lopez, and John D. Overton), and being responsible for sample tracking and the library information management system (Manasi Pradhan, Kia Manoochehri, Ricardo H. Ulloa, and John D. Overton). We thank Genome Informatics in RGC for performing and being responsible for analysis needed to produce exome and genotype data (Xiaodong Bai, Alicia Hawes, William Salerno, and Jeffrey G. Reid), providing compute infrastructure development and operational support (Gisu Eom and Jeffrey G. Reid), providing variant and gene annotations and their functional interpretation of variants (Suganthi Balasubramanian and Jeffrey G. Reid), conceiving and being responsible for creating, developing, and deploying analysis platforms and computational methods for analyzing genomic data (Evan K. Maxwell, Jeffrey C. Staples, Lukas Habegger, and Jeffrey G. Reid). We thank Research Program Management in RGC for contributing to the management and coordination of all research activities, planning, execution, and reviewing the manuscript (Marcus B. Jones and Lyndon J. Mitnaul).
Footnotes
COMPETING INTERESTS
J.V. is a founder of Singulomics Corp. P.D.R. and L.J.N. are co-founders of NRTK Biosciences. All other authors declare no competing interests.
Data availability
All summary statistics for the longevity association of rare coding variants in our Ashkenazi Jewish longevity cohort are available at http://zdzlab.einsteinmed.org/1/longevity.html. Due to privacy concerns for our research participants, individual-level genetic data from the Einstein longevity study are not publicly available; however, anonymized data will be shared by request from a qualified academic investigator as long as the data transfer is approved by the Institutional Review Board and regulated by a material transfer agreement. The German longevity cohort data are part of the PopGen Biobank (Schleswig-Holstein, Germany) and can be accessed through a Material Data Access Form (http://www.uksh.de/p2n/Information+for+Researchers.html). Sequence and phenotype data of the UK Biobank and the ADSP cohorts are available at https://bbams.ndph.ox.ac.uk/ams/ and https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000572, respectively. All software used in our analyses was open source and was described in the Methods section.
