Scientific Reports. 2024 Jan 15;14:1306. doi: 10.1038/s41598-023-50495-5

Biologically informed deep learning for explainable epigenetic clocks

Aurel Prosz 1, Orsolya Pipek 2, Judit Börcsök 1,3, Gergely Palla 4,5, Zoltan Szallasi 1, Sandor Spisak 6, István Csabai 2
PMCID: PMC10789766  PMID: 38225268

Abstract

Ageing is often characterised by progressive accumulation of damage, and it is one of the most important risk factors for chronic disease development. Epigenetic mechanisms including DNA methylation could functionally contribute to organismal aging; however, the key functions and biological processes that may govern ageing are still not understood. Although age predictors called epigenetic clocks can accurately estimate the biological age of an individual based on cellular DNA methylation, their models have limited ability to explain the underlying prediction algorithm and the key biological processes controlling ageing. Here we present XAI-AGE, a biologically informed, explainable deep neural network model for accurate biological age prediction across multiple tissue types. We show that XAI-AGE outperforms the first-generation age predictors and achieves similar results to deep learning-based models, while opening up the possibility to infer biologically meaningful insights into the activity of pathways and other abstract biological processes directly from the model.

Subject terms: Ageing, Machine learning

Introduction

Aging, defined as some form of functional decline over time, has always attracted considerable interest, and has been the focus of intense research from a wide range of perspectives1. According to the related studies, certain biomarkers can rather precisely predict the functional capability of tissues, organs and even patients2,3. Furthermore, age-related biomarkers enable the introduction of the concept of biological age4,5, which can add information to risk assessments for age-related conditions on top of chronological age.

Among the most promising age-predictive biomarkers are those based on DNA methylation6–8, which can be applied to essentially any source of DNA, from sorted cells through tissues to organs.

Age-related changes in DNA methylomes are generally occurring processes, during which up to 2–14% of all cytosine-guanine dinucleotide (CpG) sites display consistent changes in their methylation levels throughout ageing9–18.

Combinations of multiple CpGs, or even individual CpG sites, are often used to estimate the chronological age of cells, tissues, or individuals based on their DNA methylation levels, and are generally referred to as epigenetic age estimators or epigenetic clocks. The estimated age is often referred to as DNAm age, or epigenetic age8, which is highly correlated with chronological age, but is also affected by other biological factors13,19–22 such as health status.

Typically developed using supervised machine learning methods, DNA methylation-based age estimators often employ penalized regression models. These models are designed to autonomously identify the CpGs that are most informative for estimating age8,23. However, the construction of a multi-tissue DNA methylation-based age estimator is non-trivial, due to the significant differences between tissues19,20 and the distinct biological processes that drive the observed age-related hypermethylation and hypomethylation. The first multi-tissue DNA methylation-based age estimator became widely known as Horvath’s clock6 (proposed by Steve Horvath). It relied on elastic net regression, which selected 353 CpGs from the overall 27k CpG dinucleotides in the training data, which comprised about 8,000 microarray samples collected from patients of all ages, from children to the elderly. Aside from some limitations24,25, Horvath’s clock proved to be a remarkably accurate age estimator in a variety of studies, yielding precise results for diverse DNA sources spanning the whole human lifespan8. For example, together with other similar DNA methylation-based clocks26–28, Horvath’s clock was used to quantify the effectiveness of a program designed to regenerate the thymus, where the mean epigenetic age was 1.5 years younger than baseline after one year of treatment29. Possible relations between epigenetic aging and the previously identified aging hallmarks are the focus of ongoing research, and very recent results have shown that although epigenetic aging is distinct from genomic instability, cellular senescence and telomere attrition, it is associated with nutrient sensing, mitochondrial activity and stem cell composition30.
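To illustrate how a penalized regression can select informative CpGs on its own, the following is a minimal sketch of elastic net regression fitted by coordinate descent. It is not the implementation used for Horvath’s clock or in this study; the objective uses a common parameterisation, and the toy matrix, the function names and the values of `alpha` and `l1_ratio` are assumptions chosen for illustration only.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero; anything inside the band becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=100):
    """Coordinate descent for the objective
        (1 / 2n) * ||y - Xw||^2
        + alpha * l1_ratio * ||w||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||w||^2
    No intercept is fitted, so X and y are assumed to be centred."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
    return w


# Hypothetical toy data: 4 samples x 2 CpGs, already centred.
X = [[-1.5, 1.0], [-0.5, -1.0], [0.5, -1.0], [1.5, 1.0]]
y = [-3.0, -1.0, 1.0, 3.0]  # centred ages; they depend only on the first CpG
weights = elastic_net(X, y)
print(weights)
```

In this toy case the L1 part of the penalty sets the weight of the uninformative second CpG to exactly zero, while the weight of the first CpG is shrunk slightly below its unpenalized value of 2. This is the mechanism by which a small subset of CpGs is retained from a much larger array.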

With the overwhelming success of neural network-based techniques and deep learning methods in pattern recognition problems in general, using these approaches to estimate biological age became a natural alternative31–34. However, in spite of their high accuracy, the way neural networks make predictions about the age of input samples is difficult to interpret, and they operate somewhat like a “black box”, offering no explanation of why some methylation profiles are estimated to be older or younger than others. The need for interpretable neural network-based methods has also arisen in the broader field of computational biology, and a very promising advance in this direction was made by Elmarakeby et al.35, who introduced a biologically informed deep learning tool for predicting the state of prostate cancer and evaluating molecular drivers of treatment resistance for therapeutic targeting. The suggested model used a large collection of curated biological pathways to construct a pathway-aware multi-layered hierarchical deep learning network, thereby encoding established hierarchical biological knowledge in the architecture of a neural network.

Inspired by this, here we propose a similar, biologically informed, explainable deep learning model for predicting the chronological age across multiple tissue types based on their methylation profiles. The structure of the neural network follows the hierarchy dictated by the biological pathways, in complete analogy with the tool presented by Elmarakeby et al.35. We compare the performance of the obtained method to that of elastic net regression in different use cases, including, for example, the data set by Gill et al.36 related to the rejuvenation of fibroblast cells. According to these studies, besides a slight gain in prediction precision, the most important benefit of our approach is the range of possibilities it offers for comparing the importance of different CpGs, genes, biological pathways or entire pathway branches and layers in predicting the age across the human lifespan.

Results

Explainable deep-learning age prediction model

We created a deep learning prediction model named XAI-AGE (XAI stands for Explainable AI) that integrates previously identified biologically hierarchical information in a neural network model for predicting the biological age based on DNA methylation data. The training of the model relied on the available chronological age of the patients in the training set. The construction of this pathway-aware multilayered hierarchical network was based on 3007 manually curated biological pathways parsed from the Reactome Pathway Knowledgebase37. The individual’s molecular profile as DNA methylation beta values was entered into the XAI-AGE model as input and spread across a layer of nodes representing a set of genes through weighted links. This input layer can be extended in a modular way to incorporate multiple data modalities, such as gene expression, gene mutation status or other measurable features representable on the gene level.

Subsequent layers of the network encode a collection of routes with increasing degrees of abstraction, representing complicated biological activities. The layers closer to the input layer correspond to finer biological pathways and deeper layers represent the higher levels of the hierarchy in the Reactome Pathway Knowledgebase, as illustrated in Fig. 1. The connections between various layers are bound to follow known descendant-ascendant relations among encoded properties, genes, and pathways, making the network interpretable by design. The architecture of the model is shown in more detail in Supplementary Table S2 in the Supplementary Material.

Figure 1.

Schematic representation of the XAI-AGE model. The gene-CpG layer receives input data in the form of CpG methylation beta values. From here on, the information propagates in a restricted manner where nodes in the later layers are connected only if they are annotated jointly with the given node in the current layer according to the ReactomeDB. From left to right, the model’s abstraction becomes progressively more complex, as later levels in the neural network correspond to higher levels in the hierarchy defined by the ReactomeDB. The prediction of the chronological age for any given sample is given by the arithmetic mean of the outputs obtained for each individual layer. As indicated, the input layer can be easily extended by any new modalities which can be represented on the level of genes35.
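The restricted propagation and the averaging of per-layer outputs described in the Fig. 1 caption can be sketched as follows. This is a simplified illustration under stated assumptions, not the XAI-AGE code: the two pathways, three genes, all weights, the tanh activation and the function names are hypothetical, and only one pathway layer is shown.

```python
import math

# Hypothetical toy annotation (NOT taken from Reactome): pathway -> member genes.
PATHWAY_GENES = {
    "pathway_A": ["gene1", "gene2"],
    "pathway_B": ["gene2", "gene3"],
}
GENES = ["gene1", "gene2", "gene3"]


def build_mask(pathway_genes, genes):
    """mask[p][g] is 1.0 if gene g is annotated to pathway p, else 0.0."""
    return [[1.0 if g in members else 0.0 for g in genes]
            for members in pathway_genes.values()]


def masked_layer(inputs, weights, mask):
    """Sparse layer: a node only receives inputs whose mask entry is 1."""
    return [math.tanh(sum(w * m * x for w, m, x in zip(w_row, m_row, inputs)))
            for w_row, m_row in zip(weights, mask)]


def linear_head(activations, head_weights):
    """Per-layer output: one age estimate from that layer's activations."""
    return sum(w * a for w, a in zip(head_weights, activations))


def predict_age(gene_acts, weights, mask, gene_head, pathway_head):
    """Arithmetic mean of the per-layer estimates."""
    pathway_acts = masked_layer(gene_acts, weights, mask)
    outputs = [linear_head(gene_acts, gene_head),
               linear_head(pathway_acts, pathway_head)]
    return sum(outputs) / len(outputs)


MASK = build_mask(PATHWAY_GENES, GENES)
WEIGHTS = [[0.5, -0.2, 0.9], [0.3, 0.4, -0.1]]  # arbitrary illustrative weights
print(MASK)
print(predict_age([0.1, 0.2, 0.3], WEIGHTS, MASK, [1.0, 1.0, 1.0], [1.0, 1.0]))
```

Because the weight of every unannotated gene-pathway pair is multiplied by zero, the activation of `pathway_A` cannot depend on `gene3`, whatever its value. Each hidden node therefore keeps a fixed biological meaning, which is what makes the per-node importance scores interpretable.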

To determine the relative importance of particular genes, pathways and biological processes contributing to the model prediction, we examined each layer and used the DeepLIFT38 attribution approach to get the overall importance score of the neurons. Since the architecture is constrained by the underlying genes and biological processes, we can assume that the obtained importance scores can be used to test biological hypotheses across different subsets of the data. We note that the importance score is a signed quantity, which makes it possible to infer trends in the dataset; however, the exact meaning of the direction is still not well understood. Hence, we included both positive and negative trends found during the analysis.
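The signed nature of these importance scores can be seen in the simplest possible case. For a single linear unit, the DeepLIFT contribution of an input reduces to its weight times the difference between the input and a reference input, and the contributions sum to the difference in output. The sketch below shows only this linear case; the weights, the input and the reference values are hypothetical, and the additional rules that DeepLIFT applies to nonlinear activations are not shown.

```python
def linear_output(weights, bias, x):
    """Output of a single linear unit."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def deeplift_linear(weights, x, reference):
    """Linear-unit contributions: w_i * (x_i - reference_i), sign included."""
    return [w * (xi - ri) for w, xi, ri in zip(weights, x, reference)]


# Hypothetical values for illustration only.
weights = [2.0, -1.0, 0.5]
bias = 10.0
x = [0.8, 0.4, 0.2]          # e.g. three methylation beta values
reference = [0.5, 0.5, 0.5]  # assumed reference (baseline) input

contribs = deeplift_linear(weights, x, reference)
delta = linear_output(weights, bias, x) - linear_output(weights, bias, reference)
print(contribs, sum(contribs), delta)
```

Here the first input pushes the prediction above the reference output and the third pushes it below, while the sum of all contributions equals the total change in the output.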

Analysis of a pan-tissue data set

The XAI-AGE model was trained and first tested on a pan-tissue data set (details are given in the Methods), and for comparison, an elastic net regression model similar to Horvath’s original regressor was also trained and evaluated on the same dataset. The performance of the two models was measured using the Pearson correlation coefficient and the median absolute error (MAE)39. As indicated in Fig. 2, we obtained an MAE of 3 years for the elastic net (Fig. 2A) and 2.83 years for XAI-AGE (Fig. 2B) on the test set of the pan-tissue dataset, whereas the Pearson correlation coefficient was 0.97 for both models. Furthermore, the two models showed high correlation with each other when considering either the predicted age (Fig. 2C) or the age acceleration (Fig. 2D), defined as the difference between the predicted age and the chronological age. To further validate the XAI-AGE model’s performance, the results were replicated in a 5-fold cross-validation setting, in which an artificial neural network with all neurons connected between consecutive layers (a fully connected dense network) was also trained and compared to the XAI-AGE and elastic net models (Supplementary Fig. S1). According to these tests, the MAE was significantly lower for XAI-AGE than for the elastic net model (Mann-Whitney U test, p-value = 0.028), while the dense fully connected neural network outperformed both models. However, it is important to note that the dense network contained more than 200 times more parameters. The neural network architecture for the fully connected dense model is shown in Supplementary Table S3.
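The three quantities used in this comparison can be computed as in the following sketch. Note that MAE here denotes the median, not the mean, absolute error. The ages are hypothetical toy values and the function names are chosen for illustration.

```python
from statistics import mean, median


def median_absolute_error(true_age, predicted_age):
    """Median of |predicted - chronological| over all samples."""
    return median(abs(p - t) for t, p in zip(true_age, predicted_age))


def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def age_acceleration(true_age, predicted_age):
    """Predicted age minus chronological age, per sample."""
    return [p - t for t, p in zip(true_age, predicted_age)]


# Hypothetical toy values (years).
chronological = [10.0, 40.0, 70.0]
predicted = [12.0, 39.0, 75.0]

print(median_absolute_error(chronological, predicted))
print(pearson_r(chronological, predicted))
print(age_acceleration(chronological, predicted))
```

A positive age acceleration means the sample is predicted to be older than its chronological age, and a negative value means it is predicted to be younger.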

Figure 2.

Predicted age as a function of the chronological age. We show the scatter plot of the age estimated based on the methylation data according to elastic net regression (A) and according to XAI-AGE (B). The predicted age and the age acceleration (defined by subtracting the chronological age from the predicted age) for the elastic net as a function of the same quantity according to the XAI-AGE model are also shown in (C,D). The Pearson correlation coefficients and the median absolute errors are indicated beside the plots; the number of samples was n = 1619.

The performance of the model was also examined separately for the various tissue types, as displayed in Supplementary Fig. S2 in the Supplementary Material. Taking into consideration the varying number of observations for the different tissue types, the results indicate that XAI-AGE provided the most accurate results for whole blood and blood PBMC tissue types, but performed poorly for blood cord, bone marrow, and esophagus.

Next, we investigated the explainable representations that XAI-AGE learnt from the pan-tissue cohort. Using the DeepLIFT attribution approach38, the feature importance scores were retrieved from each layer and neuron in the model. The top six characteristics that exhibited the greatest change between the beginning and the end of the timeline (from chronological age zero to the maximum of the cohort) were further classified based on whether they caused a positive or negative trend.
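One possible way to rank features by their change along the timeline is sketched below. The exact standardisation and ranking procedure is not specified here, so this is an assumption: importance scores are standardised across the features of a layer, and only the youngest and oldest ends of the timeline are compared. The feature names and scores are hypothetical.

```python
from statistics import mean, pstdev


def zscores(scores):
    """Standardise importance scores across the features of one layer."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    return {name: (value - mu) / sigma for name, value in scores.items()}


def rank_by_change(youngest, oldest, top_n=3):
    """Return (top decreasing, top increasing) features by z-score change."""
    z_young, z_old = zscores(youngest), zscores(oldest)
    change = {name: z_old[name] - z_young[name] for name in z_young}
    ordered = sorted(change, key=change.get)  # most negative change first
    decreasing = [n for n in ordered if change[n] < 0][:top_n]
    increasing = [n for n in reversed(ordered) if change[n] > 0][:top_n]
    return decreasing, increasing


# Hypothetical importance scores at the two ends of the timeline.
youngest = {"feature_A": 4.0, "feature_B": 1.0, "feature_C": 2.0, "feature_D": 3.0}
oldest = {"feature_A": 1.0, "feature_B": 4.0, "feature_C": 2.0, "feature_D": 3.0}

decreasing, increasing = rank_by_change(youngest, oldest)
print(decreasing, increasing)
```

Features whose standardised score does not change, such as `feature_C` and `feature_D` in this toy case, appear in neither list.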

In Fig. 3, we display the results for the last layer (corresponding to the top level in the hierarchy of the ReactomeDB), whereas similar plots for the other layers are presented in the Supplementary Material (Supplementary Figs. S3–S7).

Figure 3.

Standardized importance score as a function of chronological age. We show the z-score based on the distribution of the importance scores for the neurons (features) in the last layer of the network (corresponding to the top level in the hierarchy of the ReactomeDB) with the largest change in the z-score across the chronological age. The top 3 features where the trend in the z-score is negative are displayed in (A), whereas the top 3 features where the z-score is increasing with age are shown in (B).

Among the features with a decreasing z-score over time, the top three were the DNA Repair (R-HSA-73894), Chromatin organization (R-HSA-4839726) and Reproduction (R-HSA-1474165) pathways. The top features where an increasing trend was observed in the z-score consisted of the Transport of small molecules (R-HSA-382551), Extracellular matrix organization (R-HSA-1474244) and a general pathway category called Disease (R-HSA-1643685). Interestingly, the latter exhibits particular dynamics during the aging process: it remains constant until approximately the age of 70, then switches to a rapidly increasing tendency.

To demonstrate the advantages of XAI-AGE even further, a Plotly Dash graphical interface was built40 that renders Sankey plots similar to the one presented in Ref. 35. This enables interactive navigation between the different layers of the network (each corresponding to a given level in the hierarchy of biological pathways according to the ReactomeDB), highlighting the features that contribute the most to the predictions (accessible at: https://k8plex-krft.vo.elte.hu/notebook/report/xgrp0j-sankeymethyl/).

Since the links in this network indicate that the given pair of nodes are annotated to be related according to the ReactomeDB, one can track the flow of information between the layers, and infer the relevant sources that contributed to the prediction. As an illustration, in Fig. 4, we show the layer-wise standardized and ordered importance score for the samples in the pan-tissue dataset.

Figure 4.

Sankey diagram of the top features calculated across all the samples in the pan-tissue dataset. The visualization of XAI-AGE layer structure shows the normalized relative importance score difference between the old (>65 years) and young samples for various nodes inside each layer. Darker colours indicate a larger difference between the importance scores for the two age groups at the given node. Only the top 5 nodes in each layer are displayed here, where the other nodes are indicated by the single semi-transparent nodes (named residual) at the bottom for each layer.
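The node selection described in this caption can be sketched as follows: the mean importance of each node is compared between old and young samples, the differences are normalised within the layer, and all nodes outside the top few are merged into one residual node. This is an illustrative sketch, not the code behind the figure; treating every sample aged 65 or below as young, the use of the absolute difference, and the node names and scores are all assumptions.

```python
from statistics import mean


def top_nodes_with_residual(samples, age_threshold=65, top_k=5):
    """samples: list of (age, {node: importance}) pairs for one layer."""
    old = [scores for age, scores in samples if age > age_threshold]
    young = [scores for age, scores in samples if age <= age_threshold]
    nodes = list(samples[0][1])
    # Absolute difference of the group means, per node.
    diff = {n: abs(mean([s[n] for s in old]) - mean([s[n] for s in young]))
            for n in nodes}
    total = sum(diff.values())
    share = {n: d / total for n, d in diff.items()}  # normalised to sum to 1
    kept = sorted(share, key=share.get, reverse=True)[:top_k]
    result = {n: share[n] for n in kept}
    result["residual"] = sum(v for n, v in share.items() if n not in kept)
    return result


# Hypothetical toy layer with three nodes and two samples.
samples = [
    (30, {"node_1": 1.0, "node_2": 2.0, "node_3": 3.0}),
    (80, {"node_1": 4.0, "node_2": 2.5, "node_3": 3.0}),
]
layer_view = top_nodes_with_residual(samples, top_k=1)
print(layer_view)
```

In the toy case only `node_1` is kept and the remaining two nodes are absorbed into `residual`, so the displayed shares still sum to one.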

Measuring the biological age during fibroblast reprogramming

We also applied XAI-AGE to estimate the biological age of dermal fibroblast cells derived from middle-aged donors used in the reprogramming study by Gill et al.36. In this study, the cells were harvested for DNA methylation and RNA-sequencing at different time points during the reprogramming process. In the present study we used the methylation data to calculate the biological age predicted by XAI-AGE for both the treated and the (non-treated) control cells. According to Fig. 5, our age estimation framework gave results similar to those obtained by the Horvath clock-like elastic net. Both epigenetic clocks precisely predicted the biological age of the cells in the negative control and failed-to-reprogram groups, and both showed a significant drop for the transiently reprogrammed cells. However, according to the original study by Gill et al.36, the methylation levels decrease significantly across all gene groups for these cells, which could provide a simple explanation for this effect. As anticipated, the predicted biological age of the iPSC cells was close to zero. Interestingly, the predicted biological age of the negative control cells shows a positive trend in time, consistent with the recent findings by Levine et al.41.

Figure 5.

Comparing the biological age during the time-dependent differentiation process of human fibroblast cells from three donors. At time zero, one can observe undifferentiated fibroblast cells, and as time progresses, the biological age is calculated with the (A) elastic net and (B) XAI-AGE models. The cells can be further subdivided into categories that include fibroblasts utilized as negative controls, cells that failed to be transiently reprogrammed, and cells that were successfully transiently reprogrammed. The differentiation endpoint of induced pluripotent stem cells is also displayed on the final day of the measurement. The error bars represent the standard deviation across the three donors. A small positive trend can also be observed in the negative control group for the predicted age, with a slightly stronger correlation for XAI-AGE (R = 0.38, p = 0.0038) than for elastic net (R = 0.2, p = 0.23).

Similarly to the analysis of the pan-tissue cohort, we calculated the importance scores for both the individual neurons and all the layers. This allowed us to study the importance of the different features and biological pathways in the age prediction during the reprogramming process. The results for the last layer in the neural network (highest level in the biological pathway hierarchy) are displayed in Fig. 6, showing the top six features according to the magnitude of the change over time for the negative controls and unsuccessfully reprogrammed cells (Fig. 6A), as well as for the transiently reprogrammed cells (Fig. 6B). The time-dependent dynamics of the importance scores show interesting differences between the negative controls and the reprogrammed cells. For example, the Metabolism of proteins (R-HSA-392499) and Muscle contraction (R-HSA-397014) pathways changed significantly in the negative direction in the transiently reprogrammed group, while the Chromatin organization (R-HSA-4839726) and Circadian clock (R-HSA-400253) pathways increased. The importance scores extracted from the other layers of the model can be seen in Supplementary Figs. S8–S12.

Figure 6.

Time dependence of the standardised importance score values in the human fibroblast experiment. We show the six scores from the last layer of the network where the largest deviation can be observed between the negative control or the failed to reprogram group (A) and the transiently reprogrammed group (B).

Biological age in umbilical cord plasma transfusion

As a further application of XAI-AGE, we also analysed a recently published dataset by Clement et al.42, related to umbilical cord plasma transfusion. Heterochronic parabiosis studies have shown benefits across a variety of tissues in aged animals receiving young blood43. The study presented in Ref. 42 examined whether infusion of plasma or plasma-derived factors from young donors could be used to mitigate human age-related conditions by administering human umbilical cord plasma concentrate to elderly patients (n = 18, mean age = 74) and monitoring epigenetic age-related measures for a period of 10 weeks. The authors showed that the treatment lowered the DNA methylation-based GrimAge measure by an average of 0.82 years, indicating a decrease in the risk of morbidity and mortality. However, other epigenetic clocks that estimate chronological age did not detect a significant age-reversal effect.

In the present work, using this data, we first estimated the chronological age of the individuals using XAI-AGE. The comparison between the predicted age and the chronological age stratified by the pre-treatment and post-treatment samples is shown in Supplementary Fig. S13 in the Supplementary Material, indicating a high correlation between the two variables. Next, we compared the age acceleration (corresponding to the difference between the estimated value and the actual chronological age) predicted by XAI-AGE between the two groups of samples derived from the same individuals, similarly to the analysis described in Ref. 42. A paired t-test was performed and showed no significant change.
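Because the pre-treatment and post-treatment samples come from the same individuals, the test operates on per-individual differences in age acceleration. The sketch below computes only the paired t statistic and its degrees of freedom with the standard library; obtaining the p-value additionally requires the Student t distribution (available, for example, through `scipy.stats.ttest_rel`). The values are hypothetical and are not the study data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """t statistic and degrees of freedom for paired samples."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # stdev uses the n - 1 denominator
    return mean(diffs) / standard_error, n - 1


# Hypothetical age-acceleration values (years) for four individuals.
pre_treatment = [1.0, 2.0, 3.0, 4.0]
post_treatment = [2.0, 3.0, 5.0, 5.0]

t_value, dof = paired_t_statistic(pre_treatment, post_treatment)
print(t_value, dof)
```

Pairing removes the between-individual variation in age acceleration, so the test is sensitive only to a consistent within-individual shift.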

Furthermore, the importance score for each feature in each layer was extracted and compared between the pre-treatment and post-treatment groups. In Supplementary Figs. S14–S19, the six top features from the last layer according to the magnitude of the difference between the two groups are shown, of which three correspond to the top features where this difference is positive, and the other three are the top features where the difference is negative. Our results indicate that the Cell-cycle (R-HSA-1640170), Cell-Cell communication (R-HSA-1500931) and the Reproduction (R-HSA-1474165) pathways were more important in the post-treatment samples, while the Circadian clock (R-HSA-400253), Mitophagy (R-HSA-5205647) and the Vesicle-mediated transport (R-HSA-5653656) pathways were more important in the pre-treatment group. Overall, the XAI-AGE results are less informative for this data set, which may indicate either that the input data is not robust enough or that XAI-AGE has weak points.

More extended data analyses of the results and the comparison of the importance scores from the other layers of the network are described in the Supplementary Materials.

Discussion

In this paper, we present an accurate and explainable neural network architecture allowing not only the estimation of age based on DNA methylation data with high precision but also the easy interpretation of results that are comparable across tissues, age groups, and differentiation processes in the case of cell lines. The resulting model can be used to generate hypotheses and visualize the underlying mechanisms connected to aging. We have demonstrated this feature of the model by examining the importance scores of the individual neurons in predicting the age when the neural network was trained on different datasets. In this aspect, probably the most noteworthy result was obtained for the pan-tissue dataset, where the standardised importance score for the Disease pathway (corresponding to a neuron in the last layer of the neural network) displayed a particular behaviour when plotted as a function of age, showing a roughly constant flat curve that is replaced by a rapidly increasing function at the age of 70.

The second important observation is related to the DNA Repair pathway, which demonstrated a decreasing tendency in the pan-tissue cohort when the importance z-score was visualized as a function of age (Fig. 3A). The DNA repair pathway is part of the DNA damage response system that is responsible for the maintenance of genome integrity. Living organisms are constantly exposed to exogenous and endogenous DNA damage. Unrepaired or incorrectly repaired DNA damage leads to the accumulation of somatic mutations as an organism ages, making genome instability a hallmark of aging1. The importance of DNA repair mechanisms in counteracting the time- and exposure-dependent accumulation of DNA damage is highlighted by the fact that inherited mutations in genes that are involved in these pathways underlie several segmental premature ageing-like syndromes in humans44. Our result is in agreement with accumulating evidence suggesting that the integrity and maintenance of the genome are strongly associated with aging45,46. The Chromatin organization pathway was also selected as one of the top decreasing features in the last layer of the network based on the change in the importance z-score across the chronological age of the individuals (Fig. 3A). The Chromatin organization pathway includes chromatin modifying enzymes involved in processes that result in the specification, formation or maintenance of the physical structure of eukaryotic chromatin. The identification of this pathway as one of the top features in the XAI-AGE network is consistent with the well-established fact that epigenetic changes affecting DNA methylation patterns, histone modifications and chromatin remodeling are hallmarks of ageing1.

Biological pathways that demonstrated the largest difference between the importance scores of the old (> 65 years) and young samples are shown in Fig. 4. In the third layer, one of the top 5 nodes is the Mitotic metaphase and anaphase pathway, which regulates the proper segregation of chromosomes into daughter cells. Recently, several epigenetic mitotic clocks were developed, such as epiTOC47, epiTOC248 and solo-WCGW49. epiTOC and epiTOC2 rely on CpG sites in CpG-rich regions that are marked by the polycomb repressive complex 2 (PRC2), which are generally unmethylated across numerous different fetal tissue types, to calculate the rate of stem cell division47,48. On the other hand, solo-WCGW focuses on DNA methylation loss at partially methylated domains (PMDs) that showed increased hypomethylation with age and appeared to track the accumulation of cell divisions49. The identification of the Mitotic metaphase and anaphase pathway as significantly different between old and young individuals by the XAI-AGE model appears to capture a different association between mitotic processes and ageing than the previously described epigenetic mitotic clocks, since we did not identify overlapping genes between the Mitotic metaphase and anaphase pathway and the described epigenetic mitotic models. However, a detailed analysis of these findings is left to further studies.

We calculated the standardised importance scores from the last layer of the XAI-AGE model using the data from a fibroblast rejuvenation experiment36. The largest differences between the negative control or failed-to-reprogram group and the transiently reprogrammed group were observed in six biological pathways (Fig. 6). Among these pathways are the Extracellular matrix organization and the Muscle contraction pathways, which are likely to reflect the observations made by Gill and colleagues that the reprogrammed fibroblasts produced youthful levels of collagen proteins and showed partial functional rejuvenation of their migration speed36. Interestingly, the Circadian clock pathway and several known associated pathways, such as the Cellular response to external stimuli, Chromatin organization and Metabolism of proteins, were also identified as important by the XAI-AGE model in the fibroblast reprogramming process, during which the DNA methylation age measured by the multi-tissue epigenetic clock was significantly decreased36. The circadian clock is an endogenous, biological timing mechanism that responds to several external stimuli to maintain the synchronization of internal biological processes among themselves and with exogenous environmental cycles50. The core clock genes, including CLOCK1, BMAL1, PER and CRY genes, are rhythmically expressed and form a negative feedback loop that drives circadian oscillations.

The underlying transcription-translation feedback system of the circadian clock regulates the expression of clock-controlled genes that are involved in various processes, e.g., metabolism and chromatin remodelling51. A growing body of evidence suggests a link between the disruption of the circadian rhythms and ageing. Studies have shown that disturbances in the circadian clock and sleep homeostasis are linked to increased incidence of a variety of age-related health problems, such as neurodegenerative diseases, metabolic disorders, cardiovascular disease, obesity and cancer52–55. Furthermore, the transcription factor BMAL1, which is the co-activator of the circadian clock, exhibited decreased regulatory activity with age independently from cell-type and tissue-type56.

According to the chrono-epigenetic theory, circadian oscillations of cytosine modification at specific CpG sites are robust in young individuals but diminish with age, potentially as a result of changed activity of ten-eleven translocation (TET) and DNA methyltransferase (DNMT) maintenance enzymes. Age-related changes in amplitudes of the oscillations precede linear DNA methylation changes and might predict age-dependent linear outcomes57. Our results suggest that the synchronization of oscillatory rhythms of internal biological processes is associated not only with ageing but also with rejuvenation of human cells by maturation phase transient reprogramming.

Additional advantages of the model include the modular construction of the underlying neural network: the input layers can be modified to incorporate additional modalities, allowing the integration of multiomics data, as demonstrated in an analogous example by Gill et al.36. In the case of age prediction, the logical next step would be to include RNA-seq data alongside DNA methylation values in the model. This can be easily accomplished by vertically increasing the input layer in the model and making the new data modalities representable at the level of genes. This modularity applies to the deeper layers in the model (corresponding to higher levels in the pathway hierarchy according to the ReactomeDB) as well. The core structure in the Reactome Pathway can be freely altered, enlarged, or replaced by another database. Along this line, the incorporation of the so-called Hallmarks of Aging1 into the interaction network to make it more aging-specific is an intriguing study topic for the future.

There are also some limitations to our analysis. For example, compared to other deep learning-based models optimised solely for prediction accuracy, such as DeepMAge, XAI-AGE performs worse by around half a year in MAE34,58. Regarding our training data, there are substantial class imbalances of tissues and age groups, and batch effects from the various data sources may be present and can potentially bias the results. Curation bias in the Reactome Pathway Database is another issue that can alter the results. For instance, the HIV pathway was over-represented in our data, although it may play little role in predicting the biological age; this can happen because the same critical age-related genes are present in several pathways and the neural network amplifies the value of these neurons for better prediction. Pre-training the neural network on the CpG data level (for instance, by changing the architecture to an autoencoder) and then fine-tuning it to predict the biological age is a potential solution to this issue.

Clarifying the causal relationship between the many CpGs, genes, and biological processes associated with aging would be a future goal for biologically informed deep learning approaches. The primary advantage of XAI-AGE over other epigenetic clocks is that it allows direct comparison and inference of relationships between more abstract data layers, rather than relying on raw input data alone. Further supplementation of the model with additional biological data modalities, such as incorporating RNA-seq at the gene level or evaluating the data as a time series, as we demonstrated with the fibroblast reprogramming dataset, could facilitate the future discovery of causal relationships. XAI-AGE could assist in this, either through computational analysis of its interpretable network or through domain experts using the interactive Sankey diagram visualization.

Methods and data

Applied data sources in this study

All data used in the study are publicly available. The complete list of data sources is shown in Supplementary Table S1. All methods were carried out in accordance with relevant guidelines and regulations.

Training the model on the pan-tissue data set

We trained and tested XAI-AGE on a set of 6547 patient samples across 54 cohorts and multiple tissues (Supplementary Table S1), divided into 75% training and 25% testing sets, to predict chronological age from the DNA methylome of the individuals. The age predicted by the model was subsequently also used as the estimate of biological age.
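As a minimal illustration of the 75%/25% partition and of the mean absolute error (MAE) used to compare clocks, the sketch below splits 6547 sample indices at random and evaluates MAE on toy ages. The text does not state how the split was drawn, so the random shuffling, the seed and the function names are assumptions rather than the procedure used in the study.

```python
import random


def split_indices(n_samples, test_fraction=0.25, seed=0):
    """Shuffle sample indices and split them into train and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def mean_absolute_error(true_ages, predicted_ages):
    """Mean absolute difference between chronological and predicted age, in years."""
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)


train_idx, test_idx = split_indices(6547)  # sample count from the text

# Toy ages for illustration only.
toy_mae = mean_absolute_error([30.0, 50.0, 70.0], [32.0, 49.0, 66.0])
```

Given the class imbalances across tissues and age groups, a stratified split could be preferable in practice; this sketch does not attempt one.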

Fibroblast cell reprogramming data

In the study by Gill et al.36, the cells were harvested for DNA methylation and RNA sequencing at different time points during the reprogramming process. Altogether, 96 cells from three different individuals were analysed during the study. These can be further subdivided into cells that were measured prior to the reprogramming phase (fibroblasts), negative controls that received mock treatment, cells that failed to reprogram, and cells that were successfully transiently reprogrammed. Cells that had been fully reprogrammed (iPSC) and were sampled on the final day were also measured.

Umbilical cord plasma transfusion data

The dataset contains 36 whole-blood samples collected at the beginning and at the end of the 10-week experimental period. In the present study, we used the already trained XAI-AGE model to estimate the biological age and biological age acceleration for each sample, and the results were compared between the pre-treatment and post-treatment groups.
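The comparison can be sketched as follows, under the assumption that age acceleration is the difference between predicted (biological) age and chronological age and that pre- and post-treatment samples are paired by subject. Both are assumptions made for illustration, and the values below are toy numbers rather than data from the study.

```python
from statistics import mean

# Toy values for illustration only; not data from the study.
samples = [
    # (subject, timepoint, chronological_age, predicted_age)
    ("S1", "pre", 60.0, 63.0),
    ("S1", "post", 60.2, 61.2),
    ("S2", "pre", 72.0, 70.0),
    ("S2", "post", 72.2, 69.2),
]


def age_acceleration(chronological_age, predicted_age):
    """Assumed definition: predicted (biological) age minus chronological age."""
    return predicted_age - chronological_age


# Collect the acceleration of each subject at each time point.
by_subject = {}
for subject, timepoint, chron, pred in samples:
    by_subject.setdefault(subject, {})[timepoint] = age_acceleration(chron, pred)

# Paired change per subject: negative values mean less acceleration after treatment.
paired_change = {s: acc["post"] - acc["pre"] for s, acc in by_subject.items()}
mean_change = mean(paired_change.values())
```

With real data, the paired changes would typically be assessed with a paired statistical test rather than by their mean alone.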

Acknowledgements

Supported by the European Union project RRF-2.3.1-21-2022-00004 within the framework of the MILAB Artificial Intelligence National Laboratory. G.P. received funding partly from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 101021607 and from the National Research, Development and Innovation Office under grant no. K128780. S.S. received funding from the National Research, Development and Innovation Office, Hungary, under grant no. FK142835.

Author contributions

I.C., G.P., S.S. and A.P. devised the project, the main conceptual ideas and proof outline. O.P. performed the data collection. A.P. implemented the model, processed the experimental data, performed the analysis and designed the figures. A.P. and G.P. drafted the manuscript. J.B., Z.S. and S.S. aided in interpreting the results and worked on the manuscript. All authors provided critical feedback and helped shape the research, the analysis, and the manuscript. All authors reviewed the manuscript.

Funding

Open access funding provided by HUN-REN Research Centre for Natural Sciences.

Data and materials availability

All data used in the study were downloaded from the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) and from The Cancer Genome Atlas data portal (https://portal.gdc.cancer.gov/) databases. The corresponding dataset IDs are listed in the supplementary information file. Any data that are not public can be requested.

Code availability

The code for running inferences with the XAI-AGE model can be accessed at: https://github.com/Paureel/XAI-AGE.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-023-50495-5.

References

  • 1. López-Otín C, Blasco MA, Partridge L, Serrano M, Kroemer G. The hallmarks of aging. Cell. 2013;153(6):1194–1217. doi: 10.1016/j.cell.2013.05.039.
  • 2. Baker G, Sprott R. Biomarkers of aging. Exp. Gerontol. 1988;23:223–239. doi: 10.1016/0531-5565(88)90025-3.
  • 3. Warner HR. The future of aging interventions. J. Gerontol. A. 2004;59:B692–B696. doi: 10.1093/gerona/59.7.B692.
  • 4. Jylhävä J, Pedersen NL, Hägg S. Biological age predictors. EBioMedicine. 2017;21:29–36. doi: 10.1016/j.ebiom.2017.03.046.
  • 5. Field AE, Wang T, Havas A, Ideker T, Adams PD. DNA methylation clocks in aging: Categories, causes, and consequences. Mol. Cell. 2018;71:882–895. doi: 10.1016/j.molcel.2018.08.008.
  • 6. Horvath S. DNA methylation age of human tissues and cell types. Genome Biol. 2013;14:R115. doi: 10.1186/gb-2013-14-10-r115.
  • 7. Lee HY, Lee SD, Shin K-J. Forensic DNA methylation profiling from evidence material for investigative leads. BMB Rep. 2016;49:359–369. doi: 10.5483/BMBRep.2016.49.7.070.
  • 8. Horvath S, Raj K. DNA methylation-based biomarkers and the epigenetic clock theory of ageing. Nat. Rev. Genet. 2018;19:371–384. doi: 10.1038/s41576-018-0004-3.
  • 9. Berdyshev G, Korotaev G, Boiarskikh G, Vaniushin B. Nucleotide composition of DNA and RNA from somatic tissues of humpback and its changes during spawning. Biokhimiia. 1967;31:988–993.
  • 10. Ahuja N, Li Q, Mohan AL, Baylin SB, Issa JP. Aging and DNA methylation in colorectal mucosa and cancer. Cancer Res. 1998;58:5489–5494.
  • 11. Fraga MF, Esteller M. Epigenetics and aging: The targets and the marks. Trends Genet. 2007;23(8):413–418. doi: 10.1016/j.tig.2007.05.008.
  • 12. Bollati V, Schwartz J, Wright R, Litonjua A, Tarantini L, Suh H, Sparrow D, Vokonas P, Baccarelli A. Decline in genomic DNA methylation through aging in a cohort of elderly subjects. Mech. Ageing Dev. 2009;130(4):234–239. doi: 10.1016/j.mad.2008.12.003.
  • 13. Christensen BC, Houseman EA, Marsit CJ, Zheng S, Wrensch MR, Wiemels JL, Nelson HH, Karagas MR, Padbury JF, Bueno R, Sugarbaker DJ, Yeh R-F, Wiencke JK, Kelsey KT. Aging and environmental exposures alter tissue-specific DNA methylation dependent upon CpG island context. PLoS Genet. 2009;5:e1000602. doi: 10.1371/journal.pgen.1000602.
  • 14. Rodríguez-Rodero S, Fernández-Morera J, Fernandez A, Menéndez-Torre E, Fraga M. Epigenetic regulation of aging. Discov. Med. 2010;10:225–233.
  • 15. Mugatroyd C, Yonghe W, Bockmühl Y, Spengler D. The Janus face of DNA methylation in aging. Aging. 2010;2(2):107–110. doi: 10.18632/aging.100124.
  • 16. Teschendorff AE, Menon U, Gentry-Maharaj A, Ramus SJ, Weisenberger DJ, Shen H, Campan M, Noushmehr H, Bell CG, Peter Maxwell A, Savage DA, Mueller-Holzner E, Marth C, Kocjan G, Gayther SA, Jones A, Beck S, Wagner W, Laird PW, Jacobs IJ, Widschwendter M. Age-dependent DNA methylation of genes that are suppressed in stem cells is a hallmark of cancer. Genome Res. 2010;20(4):440–446. doi: 10.1101/gr.103606.109.
  • 17. Bell JT, Tsai P-C, Yang T-P, Pidsley R, Nisbet J, Glass D, Mangino M, Zhai G, Zhang F, Valdes A, Shin S-Y, Dempster EL, Murray RM, Grundberg E, Hedman AK, Nica A, Small KS. Epigenome-wide scans identify differentially methylated regions for age and age-related phenotypes in a healthy ageing population. PLoS Genet. 2012;8:e1002629. doi: 10.1371/journal.pgen.1002629.
  • 18. Zheng SC, Widschwendter M, Teschendorff AE. Epigenetic drift, epigenetic clocks and cancer risk. Epigenomics. 2016;8(5):705–719. doi: 10.2217/epi-2015-0017.
  • 19. Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A, Kellis M, Marra MA, Beaudet AL, Ecker JR, Farnham PJ, Hirst M, Lander ES, Mikkelsen TS, Thomson JA. The NIH roadmap epigenomics mapping consortium. Nat. Biotechnol. 2010;28:1045–1048. doi: 10.1038/nbt1010-1045.
  • 20. Li Y, Zhu J, Tian G, Li N, Li Q, Ye M, Zheng H, Jian Yu, Honglong W, Sun J, Zhang H, Chen Q, Luo R, Chen M, He Y, Jin X, Zhang Q, Chang Yu, Zhou G, Sun J, Huang Y, Zheng H, Cao H, Zhou X, Guo S, Xueda H, Li X, Kristiansen K, Bolund L, Jiujin X, Wang W, Yang H, Wang J, Li R, Beck S, Wang J, Zhang X. The DNA methylome of human peripheral blood mononuclear cells. PLoS Biol. 2010;8(11):1–9. doi: 10.1371/journal.pbio.1000533.
  • 21. Thompson RF, Atzmon G, Gheorghe C, Liang HQ, Lowes C, Greally JM, Barzilai N. Tissue-specific dysregulation of DNA methylation in aging. Aging Cell. 2010;9(4):506–518. doi: 10.1111/j.1474-9726.2010.00577.x.
  • 22. Baubec T, Schübeler D. Genomic patterns and context specific interpretation of DNA methylation. Curr. Opin. Genet. Dev. 2014;25:85–92. doi: 10.1016/j.gde.2013.11.015.
  • 23. Palla G, et al. Hierarchy and control of ageing-related methylation networks. PLoS Comput. Biol. 2021;17(9):e1009327. doi: 10.1371/journal.pcbi.1009327.
  • 24. Horvath S, Oshima J, Martin GM, Lu AT, Quach A, Cohen H, Felton S, Matsuyama M, Lowe D, Kabacik S, Wilson JG, Reiner AP, Maierhofer A, Flunkert J, Aviv A, Hou L, Baccarelli AA, Li Y, Stewart JD, Whitsel EA, Ferrucci L, Matsuyama S, Raj K. Epigenetic clock for skin and blood cells applied to Hutchinson Gilford progeria syndrome and ex vivo studies. Aging. 2018;10:1758–1775. doi: 10.18632/aging.101508.
  • 25. Zhang Q, Vallerga CL, Walker RM, Lin T, Henders AK, Montgomery GW, He J, Fan D, Fowdar J, Kennedy M, Pitcher T, Pearson J, Halliday G, Kwok JB, Hickie I, Lewis S, Anderson T, Silburn PA, Mellick GD, Harris SE, Redmond P, Murray AD, Porteous DJ, Haley CS, Evans KL, McIntosh AM, Yang J, Gratten J, Marioni RE, Wray NR, Deary IJ, McRae AF, Visscher PM. Improved precision of epigenetic clock estimates across tissues and its implication for biological ageing. Genome Med. 2019;11:54. doi: 10.1186/s13073-019-0667-1.
  • 26. Hannum G, Guinney J, Zhao L, Zhang L, Hughes G, Sadda SV, Klotzle B, Bibikova M, Fan J-B, Gao Y, Deconde R, Chen M, Rajapakse I, Friend S, Ideker T, Zhang K. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol. Cell. 2013;49(2):359–367. doi: 10.1016/j.molcel.2012.10.016.
  • 27. Levine ME, Lu AT, Quach A, Chen BH, Assimes TL, Bandinelli S, Hou L, Baccarelli AA, Stewart JD, Li Y, Whitsel EA, Wilson JG, Reiner AP, Aviv A, Lohman K, Liu Y, Ferrucci L, Horvath S. An epigenetic biomarker of aging for lifespan and healthspan. Aging. 2018;10:573–591. doi: 10.18632/aging.101414.
  • 28. Lu AT, Quach A, Wilson JG, Reiner AP, Aviv A, Raj K, Hou L, Baccarelli AA, Li Y, Stewart JD, Whitsel EA, Assimes TL, Ferrucci L, Horvath S. DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging. 2019;11:303–327. doi: 10.18632/aging.101684.
  • 29. Fahy GM, et al. Reversal of epigenetic aging and immunosenescent trends in humans. Aging Cell. 2019;18(6):e13028. doi: 10.1111/acel.13028.
  • 30. Kabacik S, Lowe D, Fransen L, Leonard M, Ang S-L, Whiteman C, Corsi S, Cohen H, Felton S, Bali R, Horvath S, Raj K. The relationship between epigenetic age and the hallmarks of aging in human cells. Nat. Aging. 2022;2:484–493. doi: 10.1038/s43587-022-00220-0.
  • 31. Aliferi A, Ballard D, Gallidabino MD, Thurtle H, Barron L, Court DS. DNA methylation-based age prediction using massively parallel sequencing data and multiple machine learning models. Forensic Sci. Int. Genet. 2018;37:215–226. doi: 10.1016/j.fsigen.2018.09.003.
  • 32. Galkin F, Mamoshina P, Aliper A, Putin E, Moskalev V, Gladyshev VN, Zhavoronkov A. Human gut microbiome aging clock based on taxonomic profiling and deep learning. iScience. 2020;23(6):101199. doi: 10.1016/j.isci.2020.101199.
  • 33. Levy JJ, et al. MethylNet: An automated and modular deep learning approach for DNA methylation analysis. BMC Bioinform. 2020;21:108. doi: 10.1186/s12859-020-3443-8.
  • 34. Galkin F, Mamoshina P, Kochetov K, Sidorenko D, Zhavoronkov A. DeepMAge: A methylation aging clock developed with deep learning. Aging Dis. 2021;12:1252–1262. doi: 10.14336/AD.2020.1202.
  • 35. Elmarakeby H, Hwang J, Arafeh R, Crowdis J, Gang S, Liu D, AlDubayan S, Salari K, Kregel S, Richter C, Arnoff T, Hahn W, Van Allen E. Biologically informed deep neural network for prostate cancer discovery. Nature. 2021;598:1–5. doi: 10.1038/s41586-021-03922-4.
  • 36. Gill D, et al. Multi-omic rejuvenation of human cells by maturation phase transient reprogramming. Elife. 2021;11:e71624. doi: 10.7554/eLife.71624.
  • 37. Fabregat A, Jupe S, Matthews L, Sidiropoulos K, Gillespie M, Garapati P, Haw R, Jassal B, Korninger F, May B, Milacic M, Roca C, Rothfels K, Sevilla C, Shamovsky V, Shorser S, Varusai T, Viteri G, Weiser J, D’Eustachio P. The Reactome pathway knowledgebase. Nucleic Acids Res. 2017;46:11. doi: 10.1093/nar/gkv1351.
  • 38. Shrikumar, A., Greenside, P. & Kundaje, A. Learning Important Features Through Propagating Activation Differences (2017).
  • 39. McEwen L, O’Donnell K, McGill M, Edgar R, Jones M, MacIsaac J, Lin D, Ramadori K, Morin A, Gladish N, Garg E, Unternaehrer E, Pokhvisneva I, Karnani N, Kee M, Klengel T, Adler N, Barr R, Letourneau N, Kobor M. The PedBE clock accurately estimates DNA methylation age in pediatric buccal cells. Proc. Natl. Acad. Sci. USA. 2019;117:201820843. doi: 10.1073/pnas.1820843116.
  • 40. Hossain, S. Visualization of Bioinformatics Data with Dash Bio 126–133 (2019).
  • 41. Minteer C, et al. Revisiting the bad luck hypothesis: Cancer risk and aging are linked to replication-driven changes to the epigenome. bioRxiv. 2022 doi: 10.1101/2022.09.14.507975.
  • 42. Clement J, et al. Umbilical cord plasma concentrate has beneficial effects on DNA methylation GrimAge and human clinical biomarkers. Aging Cell. 2022;09:e13696. doi: 10.1111/acel.13696.
  • 43. Conboy I, Conboy M, Wagers A, Girma E, Weissman I, Rando T. Rejuvenation of aged progenitor cells by exposure to a young systemic environment. Nature. 2005;433:760–764. doi: 10.1038/nature03260.
  • 44. Hoeijmakers JHJ. DNA damage, aging, and cancer. N. Engl. J. Med. 2009;361(15):1475–1485. doi: 10.1056/NEJMra0804615.
  • 45. Schumacher B, Pothof J, Vijg J, Hoeijmakers JHJ. The central role of DNA damage in the ageing process. Nature. 2021;592(7856):695–703. doi: 10.1038/s41586-021-03307-7.
  • 46. Melzer D, Pilling LC, Ferrucci L. The genetics of human ageing. Nat. Rev. Genet. 2020;21(2):88–101. doi: 10.1038/s41576-019-0183-6.
  • 47. Yang Z, et al. Correlation of an epigenetic mitotic clock with cancer risk. Genome Biol. 2016;17:205. doi: 10.1186/s13059-016-1064-3.
  • 48. Teschendorff AE. A comparison of epigenetic mitotic-like clocks for cancer risk prediction. Genome Med. 2020;12:56. doi: 10.1186/s13073-020-00752-3.
  • 49. Zhou W, et al. DNA methylation loss in late-replicating domains is linked to mitotic cell division. Nat. Genet. 2018;50:591–602. doi: 10.1038/s41588-018-0073-4.
  • 50. Benitah SA, Welz PS. Circadian regulation of adult stem cell homeostasis and aging. Cell Stem Cell. 2020;26:817–831. doi: 10.1016/j.stem.2020.05.002.
  • 51. Takahashi JS. Transcriptional architecture of the mammalian circadian clock. Nat. Rev. Genet. 2017;18:164–179. doi: 10.1038/nrg.2016.150.
  • 52. Masri S, Sassone-Corsi P. The emerging link between cancer, metabolism, and circadian rhythms. Nat. Med. 2018;24:1795–1803. doi: 10.1038/s41591-018-0271-8.
  • 53. Reinke H, Asher G. Crosstalk between metabolism and circadian clocks. Nat. Rev. Mol. Cell Biol. 2019;20:227–241. doi: 10.1038/s41580-018-0096-9.
  • 54. Patke A, Young MW, Axelrod S. Molecular mechanisms and physiological importance of circadian rhythms. Nat. Rev. Mol. Cell Biol. 2020;21:67–84. doi: 10.1038/s41580-019-0179-2.
  • 55. Nassan M, Videnovic A. Circadian rhythms in neurodegenerative disorders. Nat. Rev. Neurol. 2022;18:7–24. doi: 10.1038/s41582-021-00577-7.
  • 56. Maity AK, Hu X, Zhu T, Teschendorff AE. Inference of age-associated transcription factor regulatory activity changes in single cells. Nat. Aging. 2022;2:548–561. doi: 10.1038/s43587-022-00233-9.
  • 57. Oh ES, Petronis A. Origins of human disease: The chrono-epigenetic perspective. Nat. Rev. Genet. 2021;22:533–546. doi: 10.1038/s41576-021-00348-6.
  • 58. de Lima Camillo LP, Lapierre LR, Singh R. A pan-tissue DNA-methylation epigenetic clock based on deep learning. npj Aging. 2022;8(1):4. doi: 10.1038/s41514-022-00085-y.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
