ABSTRACT
Epigenetic changes during ageing have been characterized by multiple epigenetic clocks that allow the prediction of chronological age based on methylation status. Despite their accuracy and utility, epigenetic age biomarkers leave many questions about epigenetic ageing unanswered. Specifically, they do not permit the unbiased characterization of non-linear epigenetic ageing trends across entire life spans, a critical question underlying this field of research. Here we provide an integrated framework to address this question. Our model, inspired by evolutionary models, is able to account for acceleration/deceleration in epigenetic changes by fitting a model age for each individual, the epigenetic age, which is related to chronological age in a non-linear fashion. Application of this model to DNA methylation data measured across broad age ranges, from before birth to old age, and from two tissue types, suggests that a universal logarithmic trend characterizes epigenetic ageing across entire lifespans.
KEYWORDS: Aging, DNA methylation
Introduction
Cell type specific differences in gene expression are partially controlled by chromatin accessibility and specific covalent modifications. These include modification to histones and DNA [1,2]. Among these, DNA methylation has been one of the most extensively studied components of cell type specification [3,4]. The covalent attachment of a methyl group to cytosine is catalysed by either de novo or maintenance methyltransferases, and in mammals is primarily targeted to CpG dinucleotides. Most CpGs in mammalian genomes are methylated, but pockets of hypomethylation exist, largely at promoters and enhancers. It has been shown that the absence of DNA methylation is closely associated with the presence of H3K4 methylation, which is also a hallmark of enhancers and promoters [5]. As stem cells differentiate along myriad lineages, each cell type tends to have distinctive DNA methylation profiles, mostly due to differential activation of enhancers [6].
Once an organism reaches its adult stage, these cell types and their respective epigenomes have been largely determined. While these developmental epigenetic changes are believed to be rapid and extensive, in the past few years it has become ever more apparent that epigenetic changes continue to occur as an organism ages [7]. This observation has led to the development of multiple epigenetic clocks, that is, biomarkers that accurately predict the chronological age of an animal based on its DNA methylation profile [8,9]. These epigenetic clocks have been extensively used in ageing research and have proven to be more accurate than previous ageing biomarkers, such as the length of telomeres [10]. Using these epigenetic clocks, much has been learned about the effects of the environment on ageing. For example, it is well known that the restriction of calories in mice slows down ageing and increases lifespan, and it also slows the rate of the epigenetic clock [11]. Similar conclusions have been reached in humans, where individuals with more rapid epigenetic ageing tend to suffer from higher all-cause mortality rates [12].
While these epigenetic clocks have proven to be useful for ageing research, they are constructed using machine learning methods that provide limited insights into the underlying processes that are driving these changes. For example, the Horvath epigenetic clock biomarker is constructed by using penalized (elastic net) regression to select 354 CpG sites that optimally predict the chronological age of an individual. This biomarker also sets a hardcoded boundary of 20 years, where childhood ages are transformed using a logarithmic function up to this boundary, while adult ageing (above the 20-year boundary) is kept linear [8]. This biomarker generates very accurate predictions of chronological age, typically within a couple of years, but leaves many questions unanswered. Is there truly a change in the rate of epigenetic ageing, from a logarithmic to a linear trend, at 20 years? Does the linear fit of epigenetic age persist indefinitely, even for older individuals? Do non-linear trends in epigenetic ageing vary across populations? As the biomarkers are species specific, they also do not allow one to directly address whether epigenetic ageing trends vary across species.
To address some of these questions, we have previously proposed a common framework by borrowing from the field of evolution. The universal pacemaker (UPM) of genome evolution was devised in the setting of molecular evolution in order to relax the time-linear evolution (i.e. rate constancy) imposed by the molecular clock hypothesis [13–15], to account for correlation between rate changes in the genes of an evolving organism. The UPM is a statistical framework under which the relative evolutionary rates of all genes remain nearly constant whereas the absolute rates can change arbitrarily. In [16] we first proposed the adaptation of the UPM to the epigenetic setting, named the epigenetic pacemaker (EPM), and showed its application to simulation and small-scale biological data. To the best of our knowledge, the EPM is the first model-based framework for epigenetic ageing, where the rates of change with time of individual CpG sites are parametrized, along with the epigenetic age of the individual. In [17] we devised a fast, conditional expectation maximization (CEM) algorithm that is capable of processing inputs of several thousands of sites and individuals.
In this work, we set out to address some of the questions mentioned above regarding the non-linear trends of epigenetic ageing across populations. We show that the EPM, which lacks any predefined regimes of age intervals, can be used to model and identify epigenetic ageing trends over the entire lifespan of a population. We first apply our approach to a synthetic model simulating a non-linear ageing process and show that our framework is capable of capturing the trend built into this model. Next, we apply our model to publicly available sets of DNA methylation data collected across broad age ranges and diverse tissues. Our results consistently suggest that a logarithmic trend across the entire lifespan is a better description of epigenetic ageing than linear or polynomial trends.
Methods
The evolutionary models
Our basic objects are a set of $m$ individuals and $n$ methylation sites in a genome (or simply sites). Each individual $j$ has an age $t_j$, forming the set $\{t_j\}$ of time periods corresponding to each individual $j$'s age. Henceforth, we will refer interchangeably to individuals and their ages. Each individual has a set of sites $s_i$ undergoing methylation changes at some characteristic rate $r_i$. Each site starts at some methylation start level $s_i^0$. All individuals have all the sites $s_i$. As $r_i$ and $s_i^0$ are characteristic of the site $s_i$, by the model they are the same in all individuals. This fact links the same sites across different individuals, and also links sites within an individual, because sites generally maintain the same characteristic rates across the whole population. Henceforth, we will index sites with $i$ and individuals with $j$.
Now, let $s_{i,j}$ measure the methylation level at site $s_i$ in individual $j$ after time (i.e. age) $t_j$. Under the molecular clock model (MC), where the rate of change is constant over time, we expect $s_{i,j} = s_i^0 + r_i t_j$. In reality, however, a noise term $\varepsilon_{i,j}$ is added, and therefore the observed value is $\hat{s}_{i,j} = s_i^0 + r_i t_j + \varepsilon_{i,j}$.
Our goal is to find, given the input matrix $\hat{S} = [\hat{s}_{i,j}]$, the maximum likelihood (ML) values for the variables $r_i$ and $s_i^0$ for $1 \le i \le n$. For this purpose, we assume a statistical model for $\varepsilon_{i,j}$ by assuming that it is normally distributed, $\varepsilon_{i,j} \sim N(0, \sigma^2)$. In [16] we showed that minimizing the following function, denoted $RSS$, is equivalent to maximizing the model's likelihood:
$$RSS = \sum_{i \le n} \sum_{j \le m} \left( \hat{s}_{i,j} - \left( s_i^0 + r_i t_j \right) \right)^2 \qquad (1)$$
We also showed that there is an efficient and precise linear algebra solution to this problem, that we describe in more detail in the supplementary text.
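As an illustration of why the MC problem has a direct solution, note that with the ages fixed, a sum of squared residuals between observed methylation and "start level plus rate times age" separates by site, so each site can be fitted by ordinary least squares on its own. The following is a minimal sketch under that reading; the function name `fit_molecular_clock` and the list-of-lists data layout are assumptions made for illustration, not the implementation described in the supplementary text.

```python
def fit_molecular_clock(meth, ages):
    """Per-site ordinary least squares under the molecular clock model.

    meth[i][j]: observed methylation of site i in individual j (assumed layout).
    ages[j]:    chronological age of individual j.
    Returns (start_levels, rates, rss).
    """
    m = len(ages)
    mean_t = sum(ages) / m
    var_t = sum((t - mean_t) ** 2 for t in ages)  # must be non-zero
    start_levels, rates, rss = [], [], 0.0
    for row in meth:
        mean_s = sum(row) / m
        cov = sum((t - mean_t) * (s - mean_s) for t, s in zip(ages, row))
        r = cov / var_t              # least-squares rate of this site
        s0 = mean_s - r * mean_t     # least-squares start level of this site
        rates.append(r)
        start_levels.append(s0)
        # Accumulate this site's contribution to the residual sum of squares.
        rss += sum((s - (s0 + r * t)) ** 2 for t, s in zip(ages, row))
    return start_levels, rates, rss
```

On noise-free data generated from a straight line per site, the sketch recovers the generating start levels and rates and returns a residual sum of squares of essentially zero.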
In contrast to the MC, under the EPM model, sites may change their rate at any point in life, arbitrarily and independently of their counterparts in other individuals. However, when this happens, all sites of that individual change their rate proportionally, such that the ratio $r_i / r_{i'}$ is constant between any two sites $i, i'$, in any individual $j$ and at all times. In [16] we showed that this is equivalent to extending individual $j$'s age $t_j$ by the same proportion as the rate change. The new age is denoted the epigenetic age. Therefore, here we do not simply use the given chronological age but estimate the age of each individual. Hence, under the EPM we must find the optimal values of the start levels $s_i^0$, the rates $r_i$, and the epigenetic ages $t_j$ (where $t_j$ represents a weighted average of the rate changes an individual has undergone through life). The solution to this optimization problem is described in detail in our previous publications [16,17]. We note that the deviation between the chronological age and the estimated epigenetic age is an age difference which is denoted age acceleration when positive, and age deceleration otherwise.
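The alternating structure of such an optimization can be sketched as follows: with the ages held fixed, each site is fitted by least squares, and with the site parameters held fixed, each individual's age has a closed-form least-squares value. This is a simplified sketch of that idea only, assuming the same residual-sum-of-squares objective as the MC model; the names `site_step`, `time_step` and `fit_epm` are hypothetical, and the published CEM algorithm [17] may differ in its details.

```python
def site_step(meth, times):
    """Given ages, fit (start level, rate) of every site by least squares."""
    m = len(times)
    mean_t = sum(times) / m
    var_t = sum((t - mean_t) ** 2 for t in times)
    params = []
    for row in meth:
        mean_s = sum(row) / m
        r = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, row)) / var_t
        params.append((mean_s - r * mean_t, r))
    return params


def time_step(meth, params):
    """Given site parameters, fit the epigenetic age of every individual."""
    denom = sum(r * r for _, r in params)
    n_individuals = len(meth[0])
    return [
        sum(r * (row[j] - s0) for (s0, r), row in zip(params, meth)) / denom
        for j in range(n_individuals)
    ]


def rss(meth, params, times):
    """Residual sum of squares of the model s0_i + r_i * t_j."""
    return sum(
        (row[j] - (s0 + r * times[j])) ** 2
        for (s0, r), row in zip(params, meth)
        for j in range(len(times))
    )


def fit_epm(meth, chron_ages, n_iter=50):
    """Alternate the two conditional steps, starting from chronological ages."""
    times = list(chron_ages)
    for _ in range(n_iter):
        params = site_step(meth, times)   # sites given ages
        times = time_step(meth, params)   # ages given sites
    return params, times, rss(meth, params, times)
```

Neither step can increase the residual sum of squares, so the fit is never worse than the MC fit obtained from the chronological ages. In this sketch the fitted ages are determined only up to a shift and rescaling, since any such change can be absorbed by the start levels and rates.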
To compare the two models, MC and EPM, we note the following. The MC model is restricted to linearity with time by estimating a constant rate of methylation at each site and using the given chronological age of each individual. The competing, relaxed, model (EPM) has no such restriction, and we estimate an "epigenetic" age for each individual. By definition, the ML solution under the relaxed model cannot be worse than that under the constrained model. For this specific case, when one hypothesis generalizes another, there is a special test, the likelihood ratio test (LRT), in which the more specific hypothesis serves as the null hypothesis and the goal is to reject it in favour of the alternative one. In the supplementary text, we provide a more detailed explanation of this test and its application in our case.
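For readers who want to see the shape of such a test, here is a rough sketch for two nested least-squares fits. It assumes normally distributed errors with a common unknown variance, in which case the likelihood ratio statistic reduces to the number of observations times the log of the ratio of residual sums of squares. The degrees of freedom are left to the caller, and the p-value uses the Wilson-Hilferty normal approximation to the chi-square tail rather than an exact computation. The function names are hypothetical and this is not necessarily the exact procedure of the supplementary text.

```python
import math


def lrt_statistic(rss_null, rss_alt, n_obs):
    """Likelihood ratio statistic for nested Gaussian least-squares fits.

    rss_null: residual sum of squares of the constrained model (e.g. MC).
    rss_alt:  residual sum of squares of the relaxed model (e.g. EPM).
    n_obs:    total number of observations (sites times individuals).
    """
    if not 0 < rss_alt <= rss_null:
        raise ValueError("expected 0 < rss_alt <= rss_null for nested models")
    return n_obs * math.log(rss_null / rss_alt)


def chi2_sf_approx(stat, df):
    """Approximate chi-square upper tail (Wilson-Hilferty transform)."""
    if stat <= 0:
        return 1.0
    k = 2.0 / (9.0 * df)
    z = ((stat / df) ** (1.0 / 3.0) - (1.0 - k)) / math.sqrt(k)
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

A typical call would pass the two residual sums of squares and then `chi2_sf_approx(stat, df)`, with `df` set to the number of additional free parameters of the relaxed model.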
Selecting informative methylation loci
DNA methylation platforms usually measure several hundreds of thousands of sites. It has been observed that many of these sites are invariant and do not change with age, so it is desirable to restrict the analysis to the most informative sites. Even among the sites that do change, it is necessary to set a criterion for site selection, as it is inefficient to analyse all of them. There are several alternatives that we now describe.
The first and most basic and intuitive criterion for site selection is site variance: simply choose the sites that exhibit the largest variability. Figure 1(L) depicts the resulting analysis based on this criterion. This criterion is crude in the sense that it entirely ignores the relationship between time (age) and methylation. Therefore, the next criterion to be examined is the covariance between age and methylation status at the site. The covariance metric selects sites that have a large change in methylation with age. We note that this criterion will not necessarily yield a significant linear fit between age and methylation status; the sites may still have a significant scatter, as is shown in Figure 1(M).
Figure 1.

Site selection criterion. Scatter plots of inferred epigenetic age (e-age, y-axis) as a function of the chronological age (c-age, x-axis) as a result of applying the EPM algorithm to blood samples from data set GSE60132 (see more details in the Results section). Each point represents an individual. The one thousand best sites were selected by each of the following three criteria. A (left): Sites are selected based on their variance, regardless of correlation to age. B (middle): Sites are selected based on their covariance with age. C (right): Sites are selected by the (absolute) Pearson correlation coefficient.
Therefore, the third criterion is the (absolute) Pearson correlation coefficient (PCC), defined as $\mathrm{PCC}(X, Y) = \mathrm{cov}(X, Y) / (\sigma_X \sigma_Y)$, where $X$ is age and $Y$ is the methylation level at the site. In contrast to the covariance, PCC selects sites that have a tight fit to a linear relationship between methylation and age, although some of them show small changes in methylation across the range. Figure 1(R) shows the results based on sites selected by the PCC criterion. It is noticeable that using this criterion a much tighter relationship between epigenetic and chronological age is obtained. Hence, we use PCC to select the sites to be modelled for all our data sets, as it provided the clearest trends between epigenetic and chronological age.
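The PCC criterion can be sketched in a few lines: compute the absolute Pearson correlation between age and methylation for every site and keep the top-ranked ones. The function names `pearson` and `select_sites`, and the choice to score invariant sites as zero, are assumptions made for this illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: an invariant site is treated as uninformative
    return sxy / math.sqrt(sxx * syy)


def select_sites(meth, ages, k):
    """Indices of the k sites with the largest absolute PCC with age.

    meth[i][j]: methylation of site i in individual j (assumed layout).
    """
    scores = [abs(pearson(ages, row)) for row in meth]
    ranked = sorted(range(len(meth)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

Because the absolute value is used, sites that lose methylation with age rank as highly as sites that gain it.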
Determining the trend line of epigenetic age
To determine the trend line between epigenetic and chronological age we employed both coarse and fine-grained procedures based on the individual e-ages inferred initially under the pacemaker criterion. Recall, though, that a first-stage test is whether the pacemaker criterion holds, that is, whether rates and starting states are statistically correlated across individuals and sites within any individual are also correlated. This is done by comparing the molecular clock model to the pacemaker model, as was described in the model description. In the supplementary text we show the results of this test, along with the specific values obtained. These values show that the pacemaker alternative is always significantly superior.
We start by describing the two-stage procedure for determining the type of trend in the population. The mutual independence of the stages, along with the lack of any prior assumption of a trend in the population, means that the inferred trend is not biased towards any particular functional form. In the first stage, the EPM procedure is applied to the data in order to find optimal values for rates, starting states, and epigenetic ages for all sites and individuals. Note that except for the pacemaker principle, which enforces uniformity of site rate and starting state across all individuals, there is no mechanism imposing correlation between any two individuals. Therefore, the EPM assigns every individual its optimal epigenetic age.
Once the EPM procedure is done, each individual is assigned its own epigenetic age. At the second stage, we seek a function that best fits the relationship between epigenetic and chronological age across the entire age range. Our prime criterion for goodness of fit for a trend of this relationship is the $R^2$ coefficient of the trend line. In all our data sets we parametrized three functional forms for the trend line: linear, quadratic and logarithmic, and we used Excel to fit the best coefficients for each type. In the Results section, we provide a more detailed description of this process.
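The second stage can be sketched without a spreadsheet. The example below fits a linear and a logarithmic trend by ordinary least squares and reports the coefficient of determination of each; it is an illustrative stand-in for the spreadsheet fit, it omits the quadratic form for brevity, and it assumes strictly positive chronological ages so that the logarithm is defined. The names `ols_r2` and `compare_trends` are hypothetical.

```python
import math


def ols_r2(x, y):
    """Fit y = a + b * x by least squares; return (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def compare_trends(c_ages, e_ages):
    """Goodness of fit of e-age against c-age and against log(c-age)."""
    _, _, r2_linear = ols_r2(c_ages, e_ages)
    # Logarithmic trend: e-age = a + b * ln(c-age); requires c-age > 0.
    _, _, r2_log = ols_r2([math.log(c) for c in c_ages], e_ages)
    return {"linear": r2_linear, "logarithmic": r2_log}
```

Both forms have two free coefficients, so their coefficients of determination can be compared directly.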
We now describe a second approach that we devised and utilized. In age ranges where the trend is not conspicuous, that is, near linear, such that it cannot be distinguished convincingly using the above functional forms, we note the following. For a person $j$ with age $t_j$ and inferred e-age $\hat{t}_j$ we define the ratio $\rho_j = t_j / \hat{t}_j$. Now, assume the increase in e-age is decreasing with time; then we observe that if we order the $\rho_j$s by increasing $t_j$, we obtain a monotonically increasing series (of $\rho_j$). We denote that series $P$. However, due to biological and statistical noise, we never expect to find strict monotonicity in $P$ and we can only test for a trend of monotonicity. We also note that, by the definition of $\rho_j$, its variance changes in time. For any two indices $i$ and $j$ such that $i < j$, if $P$ is monotonic, then $\rho_i$ and $\rho_j$ are ordered in the same way for every such pair. Moreover, suppose we randomize the order of $P$. Then, for any two indices $i < j$, the probability that $\rho_i < \rho_j$ is $1/2$, and hence the expected number of pairs with $\rho_i < \rho_j$ is half the number of pairs. Since the variables $\rho_j$ might be dependent, we cannot use standard bounds on deviations to calculate the probability of seeing a given number of such pairs by chance. This forces the use of a non-parametric test of the hypothesis. For this purpose, we can use the Mann–Kendall test for monotonic trend [18–20]. According to this method, given a random vector $X = (x_1, \ldots, x_n)$, all pairs of indices $i < j$ are checked for whether $x_i < x_j$. Let $S^+$ be the number of pairs $i < j$ such that $x_i < x_j$, and let $S^-$ be the number of such pairs with $x_i > x_j$. Now let $S = S^+ - S^-$, representing first the direction of the trend, with $S > 0$ when the series is increasing and $S < 0$ when it is decreasing. However, $S$ also indicates the intensity of the trend, and we note that under $H_0$ (no monotonicity) we have $E[S] = 0$. We also need the variance of $S$, which in the absence of ties is $\mathrm{Var}(S) = n(n-1)(2n+5)/18$. The statistic $Z$, obtained by normalizing $S$ by $\sqrt{\mathrm{Var}(S)}$, follows approximately the standard normal distribution, hence allowing us to conveniently obtain a $p$-value for the trend indicated by $S$.
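A compact sketch of the Mann–Kendall computation is given below. It uses the textbook no-ties variance and the usual continuity correction of one unit towards zero; whether the analysis in this paper applied that correction or handled ties is not stated here, so both are assumptions of the sketch, as is the function name `mann_kendall`.

```python
import math


def mann_kendall(series):
    """Mann-Kendall test for a monotonic trend in an ordered series.

    Returns (s, var_s, z, p_two_sided).
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # +1, -1, or 0 for a tie
    # Variance under the null hypothesis, assuming no tied values.
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity-corrected normal approximation (assumed form).
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return s, var_s, z, p_two_sided
```

To use it for the ratio series described above, sort individuals by chronological age, compute the c-age to e-age ratio for each, and pass the resulting list to `mann_kendall`.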
Results
Identifying trends in a cohort
Trends in the relationship between epigenetic and chronological time in a cohort provide useful information on how a group, as opposed to an individual, ages epigenetically with time. The Horvath model [8] makes a rigid assumption of linearity of e-age in time for adults by using a linear combination of an individual's methylation states at several hundred sites. For children (age less than 20), the model corrects for non-linearity using a logarithmic, yet fixed, function. The EPM model has no such assumption and therefore has the freedom to assign each individual its own e-age, as long as it complies with the EPM universality law, that is, that this age affects all the individual's sites. We now demonstrate, on synthetic data, the ability of our procedure to infer correct times (e-ages) and, in particular, trends throughout a whole population. For this purpose we have devised the following age-related function, which appears to encompass the characteristics of e-ageing as they emerge from existing knowledge, in particular from the Horvath model:
| (2) |
Here the function returns the e-age of a person given three quantities: the person's chronological age (c-age), an upper limit on a person's age, and a trend parameter. The trend function has a few desired characteristics. First, its rate decreases monotonically through time (c-age), and that decrease is proportional to the trend parameter. Also, at a c-age that equals the upper limit, the epigenetic and chronological ages coincide. Finally, when the trend parameter takes its maximum value, the trend function is linear in the c-age. Figure 2(L) illustrates the behaviour of the trend function for several values of the trend parameter and a fixed upper limit age. Indeed, we see that all trend lines depart from the origin and converge towards the point at which both ages equal the upper limit. We also see that the larger the trend parameter is, the straighter the trend line is, and in particular, at its maximum value a straight line is exhibited.
Figure 2.

The trend function. Left: The trend function, with trend lines for four values of the trend parameter in blue, red, green, and olive green, respectively. Middle: Simulated actual noisy PM, with actual noisy e-age values (blue dots) around the trend line (red) for a specific trend parameter and variance. The green line represents the c-age of each individual. Right: The values inferred by the EPM-CEM algorithm; green dots represent the e-ages inferred by the algorithm, which should be compared to the real e-ages (blue). While there is a gap, linear with time, between actual and inferred e-ages, the trend is captured.
In order to simulate realistic e-ages, we allow each individual some stochastic deviation of her/his e-age from the population's trend line, and that deviation depends on a variance parameter. To illustrate this, we set a specific trend parameter and variance. Figure 2(M) shows simulated e-ages around the trend function for these values. For illustration, the straight line representing the c-age appears in green in the figure.
Finally, for every such e-age produced, our simulation procedure generates the methylation status for every site and individual according to the model, that is, as the site's start level plus its rate multiplied by the individual's e-age, with added noise. Figure 2(R) shows the result of applying the EPM-CEM procedure to such synthetic data. Green dots represent the e-ages inferred by the algorithm and should be compared to the real (model) e-ages (blue, same as in the middle box). We can see that EPM-CEM is capable of capturing the trend imposed by the simulation; however, it lags below the trend line by a gap that is linearly (inversely) correlated with age (red-green dots). This gap is due to the degeneracy of the likelihood surface, which allows multiple points on the surface to attain the same likelihood and, in particular, the maximum likelihood.
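The simulation pipeline described in this subsection can be sketched as below. The exact trend function of equation (2) is not reproduced here; `standin_trend` is a hypothetical stand-in chosen only because it shares the listed properties (it starts at the origin, meets the c-age at the upper limit, and becomes the identity line when the trend parameter is 1). All parameter names, default values and sampling ranges are likewise assumptions for illustration.

```python
import random


def standin_trend(c_age, max_age, trend):
    """Hypothetical concave trend; NOT the paper's equation (2).

    For 0 < trend <= 1 it passes through (0, 0) and (max_age, max_age),
    and it is the identity line when trend == 1.
    """
    return max_age * (c_age / max_age) ** trend


def simulate(n_sites, c_ages, max_age=100.0, trend=0.5,
             age_sd=2.0, noise_sd=0.02, seed=0):
    """Simulate e-ages around the trend line, then methylation values."""
    rng = random.Random(seed)
    # Each individual's e-age deviates stochastically from the trend line.
    e_ages = [standin_trend(c, max_age, trend) + rng.gauss(0.0, age_sd)
              for c in c_ages]
    # Assumed ranges for start levels and rates (not taken from the paper).
    sites = [(rng.uniform(0.3, 0.7), rng.uniform(-0.002, 0.002))
             for _ in range(n_sites)]
    # Methylation = start level + rate * e-age + noise (no clipping applied).
    meth = [[s0 + r * e + rng.gauss(0.0, noise_sd) for e in e_ages]
            for s0, r in sites]
    return e_ages, sites, meth
```

Passing the simulated matrix and the chronological ages to an EPM fit and plotting inferred against simulated e-ages would reproduce the kind of comparison shown in the right panel of Figure 2.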
Analysis of human data
We have shown above the ability of our technique to identify the correct ageing trend in simulated data, and we now move on to analyse human methylation data from various types of data sets. The general procedure we have taken to identify and assess a trend is as follows. We applied the EPM approach to the real methylation data, all taken from the Gene Expression Omnibus (GEO) repository, using the procedure described in the simulation section above. The EPM allows us to determine whether the MC hypothesis is rejected in favour of the pacemaker, and also infers for each individual its epigenetic age. We remark that for all the real data sets we analysed here, the pacemaker hypothesis was found to be significantly superior to the linear approach. As this information is not essential to the main subject of this study, it appears in detail in the supplementary text.
Similar to the simulation study, in a stage subsequent to the EPM, we plot for each individual its two ages. Here, however, as these points were not synthetically generated by a function, we attempt to fit a trend function that best describes them. We focus on two families of functions, linear and logarithmic, as they are very general and have a single explanatory variable. These families are indeed the most common for trend approximation. However, to obtain additional insights into the functional form of the trend, we also accompany the logarithmic and the linear approximations with a best-fit quadratic line. In the results below, we demonstrate how we exploit the added information provided by the quadratic approximation. We note that any straight line is a special case of the quadratic family, with the quadratic coefficient equal to zero, and hence, by definition, the linear fit can never be better than the quadratic one.
Our analysis is divided into an age range-based analysis and a tissue-based analysis. All data sets required a preprocessing step of selecting the 1000 most informative sites, and based on our conclusions above, we used the Pearson correlation coefficient (PCC) criterion.
Epigenetic ageing in adults
Our first data set is GSE40279, consisting of 656 blood samples from adults [9]. Their ages range from 19 to 101 years. The results are depicted in Figure 3. In the top two graphs we show scatter plots with the three trend lines, where the one with the best fit appears in the left plot and the two suboptimal ones in the right plot. For each trend line, we also show its exact formula, the $R^2$, and, for polynomial trend lines, the adjusted $R^2$. It is quite evident that the linear trend is superior here to the other two, with a negligible increase in $R^2$ for the quadratic trend. Therefore, in order to check if there is still a trend of non-linearity, we applied the Mann–Kendall test as described in the Methods section. The lower part of Figure 3 shows the values of the c-age to e-age ratio ordered by chronological age. We tested whether there is an increasing trend in this series. The Mann–Kendall statistic $S$ obtained is positive, but for a data set of this size the variance of $S$ is very large, and the resulting $Z$-score is not significant.
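Since the comparison between linear and quadratic trend lines relies on an adjusted goodness-of-fit measure, a short sketch of the standard adjustment may help. It uses the textbook formula for the adjusted coefficient of determination with $n$ points and $p$ predictors (excluding the intercept); the function name is hypothetical, and the spreadsheet used in the paper may round or report the value differently.

```python
def adjusted_r2(r2, n_points, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n_predictors counts slope terms only: 1 for a linear or logarithmic
    trend line, 2 for a quadratic one.
    """
    if n_points - n_predictors - 1 <= 0:
        raise ValueError("not enough points for this many predictors")
    return 1.0 - (1.0 - r2) * (n_points - 1) / (n_points - n_predictors - 1)
```

For the same raw value, the quadratic fit receives a slightly lower adjusted value than the linear one, which is why a negligible raw improvement does not favour the quadratic trend.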
Figure 3.

GSE40279 – Human blood data results I. (Top) E-age vs C-age in adults. Age is plotted in years. The left graph shows the best approximation to the data. The linear fit is only slightly and insignificantly inferior to the quadratic approximation and is therefore taken as the best fit. (Bottom) Mann–Kendall test for a monotonic trend: c-age to e-age ratio ordered from left to right according to c-age. If the rate of ageing is decreasing, we expect to see a monotonic increase in the function. Indeed, the function is increasing but not in a significant manner.
Our second data set is GSE87571, also from human blood, taken from 366 individuals aged from 14 to 94 years. The results of applying the EPM-CEM to this data are depicted in Figure 4. The top two graphs show the scatter plot of e-age versus c-age with the three trend types: linear, quadratic, and logarithmic. The difference in $R^2$ between the linear and the quadratic trends is negligible. Nevertheless, we see a bend in the points corresponding to younger ages. In general, the entire collection of points here alludes to a concave shape, i.e., a decreasing rate, as can be seen from the negative coefficient of the quadratic term in the quadratic trend line. This should be contrasted with the convex quadratic trend in the previous case (data set GSE40279 – Human Blood Data, Figure 3), where the first coefficient is positive.
Figure 4.

GSE87571 – Human blood data results II. (Top) E-age vs C-age in adults. Age is plotted in years. The left graph shows the best approximation to the data. The linear fit is slightly and insignificantly inferior to the quadratic approximation. (Bottom) Mann–Kendall test for a monotonic trend: c-age to e-age ratio ordered from left to right according to c-age. If the rate of epigenetic ageing is decreasing with time, we expect to see a monotonic increase in the ratio function. Indeed, the function is significantly increasing.
To verify this decreasing trend, as in the previous data set, we apply the test of monotonicity, the Mann–Kendall test, to this data. The Mann–Kendall statistic $S$ obtained is positive, implying an increasing trend in the c-age to e-age ratio and, therefore, that the rate of epigenetic ageing decreases with time for this data set as well. Given the variance of $S$, the resulting $Z$-score is significant.
Epigenetic ageing in children
After analysing the data collected from adults, we turned to analyse data from children. We analyse the GSE36064 data set of blood samples taken from 78 children of ages ranging from one year to 16. The ages here, as well as in the figure describing it, are represented in months. The results are shown in Figure 5. Here, the linear trend is clearly inferior to the quadratic and the logarithmic trends. We find that the logarithmic trend is the best approximation and show it on the left side of Figure 5.
Figure 5.

GSE36064 – Children blood data results. E-age vs C-age in young humans. Age is plotted in months. The left graph shows the best approximation to the data. The logarithmic approximation provides the best explanation.
Combined age analysis
In the previous data sets, we restricted the analysis to specific age ranges, such as children or adults. In the next two data sets, we analyse blood samples from individuals with ages ranging from childhood to old age. The first data set, GSE60132, was taken from peripheral blood samples of 192 individuals of Northern European ancestry [21]. Ages range from 6 to 85 years. The results are shown in Figure 6. As can be seen, the logarithmic trend line provides a better $R^2$ than the linear trend line. The concavity of the spread of the points is fairly noticeable, and this is confirmed by the negative first coefficient of the quadratic trend function.
Figure 6.

GSE60132 – Human, all ages, blood data results I. E-age vs C-age in a wide age range. Age is plotted in years. The left graph shows the best trend line approximation to the data, which is the logarithmic trend function. On the right are the inferior trends, the quadratic and the linear. The quadratic line is slightly and insignificantly inferior to the logarithmic approximation, but also portrays a concave line due to its negative first coefficient.
The next data set, GSE64495, is also from blood samples, taken from 113 individuals [22]. Here, while there is a scarcity of samples from the age range 12–35, the age range of the study begins at an even younger age than in the previous data set: 2.3 years versus 6. Our results for this data set are depicted in Figure 7. Here the advantage in $R^2$ of the logarithmic trend line over the linear one is the most significant among the data sets containing adults analysed so far, and it is significant even over the quadratic trend line. The decrease in the rate is evident, as is the fit to the logarithmic trend line.
Figure 7.

GSE64495 – Human, all ages, blood data results II. E-age vs C-age in children and adults. Age is plotted in years. The left graph shows the best approximation to the data, obtained by the logarithmic trend line. On the right are the inferior trend lines, the linear and the quadratic.
Brain development and ageing
Our last data set is GSE74193, consisting of 675 samples from brain tissues from before birth to old age [23]. The advantage of this data set is two-fold. First, its broad range of ages, from half a year before birth to 85 years, is broader than that found in the previous data sets and allows us to track epigenetic ageing across the entire span of life, starting from before birth. Second, all the samples from the previously analysed data sets came from blood. This data set, from brain tissues, allows us to contrast our results from blood with another tissue type. Our results are depicted in Figure 8. The logarithmic approximation, which appears in the left graph, provides a significantly better fit to the data than the quadratic and the linear trend lines. Moreover, its high $R^2$ indicates an almost perfect fit to the data.
Figure 8.

GSE74193 – Brain development data results. E-age vs C-age in brain samples from before birth to old age. Age is plotted in years. The left graph shows the best approximation to the data. The logarithmic approximation provides the best explanation.
EPM comparison to the Horvath and Hannum epigenetic clocks
Next, we wanted to evaluate the EPM in the context of the well-known Horvath [8] and Hannum [9] epigenetic clocks. To generate sufficiently large experimental data we combined human Illumina Methylation 450K BeadChip data generated using whole blood across several experiments [22,24–31] from GEO. To facilitate cross-experiment comparisons, we performed stratified quantile normalization for the different probe technologies used in the Illumina 450K BeadChip array, as previously reported [8]. Comparisons of the EPM model to the Horvath and Hannum models of epigenetic ageing were performed as follows. The combined methylation data were subset to include the CpG sites reported in the Horvath or Hannum models. Samples with missing data for any of the CpG sites were dropped. The resulting Horvath comparison data set consisted of 354 CpG sites and 614 samples, and the Hannum comparison data set consisted of 71 CpG sites and 1117 samples. We then calculated the Horvath and Hannum epigenetic ages for each sample. The EPM produced estimates of epigenetic age similar to those of both the Horvath and Hannum models when the min-max scaled [32] estimates of epigenetic age were compared to the scaled EPM ages. Both the Horvath and Hannum models showed a non-linear epigenetic ageing trend, as shown in Figure 9. However, the computed Horvath epigenetic age represents a log-transformed age under 20 years and a linear age thereafter. The raw output of the Horvath clock displayed a better fit to a logarithmic epigenetic ageing trend than the transformed Horvath epigenetic age estimate.
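Min-max scaling maps each set of age estimates onto the unit interval so that estimates on different scales can be compared. A minimal sketch follows; the function name and the decision to reject constant input are assumptions of this illustration rather than details taken from [32].

```python
def min_max_scale(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant sequence")
    return [(v - lo) / (hi - lo) for v in values]
```

Applying the same function separately to the clock ages and to the EPM ages puts both on a common scale before they are compared.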
Figure 9.

EPM, Hannum and Horvath trend comparison. Top left: EPM and Horvath ageing trend. Top middle: EPM and Horvath (transformed ages) ageing trend. Top right: EPM and Hannum ageing trend.
To assess the relative importance of individual CpG sites in the Horvath and Hannum models compared with the EPM, we compared the EPM rate for each CpG site to the respective regression coefficient in the Horvath and Hannum models, as shown in Figure 10. EPM rates were correlated with the regression coefficients in both the Horvath and Hannum models. Interestingly, the relative importance of many CpG sites appears to differ between the EPM and the Horvath and Hannum models, particularly in the case of the Horvath model. Many of the CpG sites assigned high coefficients in the Horvath model have a rate close to zero in the EPM model, and sites with large absolute EPM rates have coefficients near zero in the Horvath model. This trend appears to be associated with the methylation state of the individual CpG sites. Hypomethylated sites are assigned coefficients with greater magnitude in the Horvath model, while sites with intermediate methylation are assigned higher magnitude rates in the EPM model. A similar relationship is not observed when comparing the Hannum model coefficients and EPM rates.
Figure 10.

EPM, Hannum and Horvath rate and coefficient comparison. Horvath and Hannum site coefficients compared to the EPM initial methylation values and EPM rates for the CpG sites used in the Horvath and Hannum models, respectively. Left: EPM and Horvath. Right: EPM and Hannum.
Discussion
During the past few years, several studies have shown that DNA methylation patterns continue to change as individuals age. These observations have been leveraged to construct epigenetic clocks that predict the age of an individual based on their methylation profile. While these tools have proven to be very useful for ageing studies, they are based on a priori assumptions about the relationships between epigenetic and chronological age. For example, the Hannum age clock assumes that epigenetic age and chronological age are linearly related and, using multivariate penalized regression, identifies 71 CpG sites whose methylation values can be combined in a weighted sum to predict the actual age of the individual. The Horvath clock uses a more complex set of assumptions to derive its predictions of chronological age: it applies a logarithmic transformation to ages below 20 and a linear relationship for ages greater than 20. It then identifies 354 CpG sites using elastic net regression to very accurately predict the transformed age. These biomarkers have been widely used to study ageing, as is evident from the hundreds of studies that have utilized them. However, because of their underlying assumptions, they are not ideal tools to infer trends in epigenetic ageing rates across life spans in an unbiased fashion. Nevertheless, correctly modelling potential non-linearities in epigenetic ageing is critical to advance the field and generate an even more robust understanding of epigenetic ageing and its impact on human health and mortality.
To address this question, we have developed an unbiased approach to measure the trends in epigenetic ageing within a cohort of individuals of varied ages. Our approach is distinct from the Hannum and Horvath epigenetic clocks in that it is not designed to optimally predict the age of an individual, but rather to model the non-linear trends in epigenetic ageing over time without making any a priori assumptions about what these may be. The method is inspired by evolutionary models that attempt to model mutation rate changes over time. This led to the development of our epigenetic universal pacemaker model, which we have presented in previous studies [16,17].
To identify rates of epigenetic ageing over the entire human life span, we applied the epigenetic pacemaker to multiple datasets that measure DNA methylation in large cohorts of individuals of varying ages. In the first few cohorts the methylation is profiled from blood, while in the last cohort the methylation is measured in brain tissue. In all cases, methylation is profiled using an Illumina microarray that measures methylation across approximately 450,000 sites. While all cohorts sample individuals from early adulthood to old age, the second set also includes samples from individuals from early childhood to adulthood, and the brain study also includes samples from fetuses obtained before birth.
By applying our epigenetic pacemaker model to these data, we observe consistent and robust trends across the datasets. The first is that from early adulthood (around age 20) to old age (well into the 90s), DNA methylation changes in a roughly linear fashion, yet with a slight but significant tendency for the rate to decrease. However, in contrast to the adults, we find that DNA methylation changes are strongly non-linear from late fetal stages to adolescence. In both the blood and brain datasets that measure these stages, we observe that the epigenetic age inferred by our model is related to the chronological age by a logarithmic transformation across the entire span of life, from before birth to old age. Thus, DNA methylation changes are very rapid initially and then gradually slow with age. This implies that the rate of change of epigenetic age (i.e., the slope of our trend line) is roughly proportional to the inverse of the chronological age. The fact that we consistently observed these trends across multiple datasets and two different tissues suggests that the logarithmic relationship between epigenetic and chronological age may be a universal property of human ageing.
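The link between a logarithmic trend and an inverse-age rate can be written out explicitly. The form below is a generic sketch rather than the fitted model: the symbols a, b and t_0 are illustrative assumptions, with t_0 standing for an offset that keeps the argument of the logarithm positive for samples taken before birth.

```latex
E(t) = a \,\ln(t + t_0) + b
\quad\Longrightarrow\quad
\frac{dE}{dt} = \frac{a}{t + t_0}
```

For ages much larger than t_0 the rate is approximately a/t, so under this form the rate of epigenetic change at age 40 would be about half the rate at age 20.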
This universal logarithmic trend may help explain some interesting observations that have been reported in the literature regarding epigenetic ageing. For example, one recent study found that the Horvath epigenetic clock “systematically underestimates ages in tissues from older people,” and that “a decrease in slope of the predicted ages were observed at approximately 60 years, indicating that some loci in the model may change differently with age, and that age acceleration measures will themselves be age-dependent” [33]. A second study also found that “epigenetic age increases at a slower rate than chronological age across the life course, especially in the oldest population” [34]. These results suggest that the underlying assumptions about the relationships between epigenetic age and chronological age impact the performance of the Hannum and Horvath epigenetic clocks, and that deviations are most notable in old age. We speculate that if the logarithmic epigenetic ageing trend that we observe is, in fact, universal, this could lead to improved biomarkers that show more robust performance at the extremes of the age distributions, leading to more accurate associations between epigenetic ageing and human health and longevity.
Moreover, we believe that the observation that epigenetic ageing is logarithmic over the entire life span opens up new avenues for epigenetic research in the future. What mechanisms lead to the gradual reduction in epigenetic rates from late fetal stages to centenarians? Is the logarithmic trend related to prior observations that epigenetic ageing is a measure of epigenetic entropy? The answers to these questions will undoubtedly influence our understanding of human ageing and longevity and will most likely apply to a broad range of organisms. By quantitatively demonstrating these trends in an unbiased fashion, we believe we have laid a solid foundation for the development of improved ageing biomarkers and the investigation of the underlying mechanisms of epigenetic ageing, and ultimately to the answers to these important questions that are fundamental to the biology of development and ageing.
In response to our first question about mechanisms of epigenetic ageing, we speculate that epigenetic ageing is partially driven by entropic forces. Under this assumption, epigenomes are set early in life, and during ageing the methylation levels of promoters and enhancers drift away from their original state towards a level of intermediate methylation, which represents more disordered states. Thus, we hypothesize that if methylation profiles were generated at the single-cell level, we would find that ensembles of cells of the same type are more epigenetically similar early in life than later in life. Moreover, we speculate that the logarithmic trend between epigenetic age and time is reminiscent of the relationship between entropy and the number of states accessible to a system, which are also related logarithmically in Boltzmann’s formula. Thus, if we hypothesize that epigenetic age is a measure of entropy, then we would conclude that the number of epigenetic states accessible to an individual increases linearly in time.
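The analogy can be written out as follows. This is a sketch under the stated hypothesis, assuming that epigenetic age E is proportional to an entropy S over W accessible states; the constants a, b, c and t_0 are illustrative and are not estimated in this study.

```latex
S = k_B \ln W, \qquad
E(t) = a \ln(t + t_0) + b, \qquad
E = c\,S
\quad\Longrightarrow\quad
W(t) = e^{\,b/(c k_B)} \,(t + t_0)^{\,a/(c k_B)}
```

In this form the number of accessible states grows linearly in time when the two logarithmic coefficients match, that is, when a = c k_B; other ratios would correspond to a power law in time.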
Disclosure statement
No potential conflict of interest was reported by the authors.
Supplementary material
Supplemental data for this article can be accessed here.
References
- [1]. Qianhua X, Xie W. Epigenome in early mammalian development: inheritance, reprogramming and establishment. Trends Cell Biol. 2018;28(3):237–253.
- [2]. Bernstein BE, Meissner A, Lander ES. The mammalian epigenome. Cell. 2007;128(4):669–681.
- [3]. Jones PA, Takai D. The role of DNA methylation in mammalian epigenetics. Science. 2001;293(5532):1068–1070.
- [4]. Feng S, Jacobsen SE, Reik W. Epigenetic reprogramming in plant and animal development. Science. 2010;330(6004):622–627.
- [5]. Morselli M, Pastor WA, Montanini B, et al. In vivo targeting of de novo DNA methylation by histone modifications in yeast and mouse. eLife. 2015. April;4:e06205.
- [6]. Hon GC, Rajagopal N, Shen Y, et al. Epigenetic memory at embryonic enhancers identified in DNA methylation maps from adult mouse tissues. Nat Genet. 2013. September;45:1198 EP.
- [7]. Thompson RF, Atzmon G, Gheorghe C, et al. Tissue specific dysregulation of DNA methylation in aging. Aging Cell. 2010;9(4):506–518.
- [8]. Horvath S. DNA methylation age of human tissues and cell types. Genome Biol. 2013;14(10):1–20.
- [9]. Hannum G, Guinney J, Zhao L, et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol Cell. 2013;49(2):359–367.
- [10]. Marioni RE, Harris SE, Shah S, et al. The epigenetic clock and telomere length are independently associated with chronological age and mortality. Int J Epidemiol. 2018. February;47(1):356.
- [11]. Wang T, Tsui B, Kreisberg JF, et al. Epigenetic aging signatures in mice livers are slowed by dwarfism, calorie restriction and rapamycin treatment. Genome Biol. 2017. March;18(1):57.
- [12]. Chen BH, Marioni RE, Colicino E. DNA methylation-based measures of biological age: meta-analysis predicting time to death. Aging (Albany NY). 2016. September;8(9):1844–1859.
- [13]. Snir S, Wolf YI, Koonin EV. Universal pacemaker of genome evolution. PLoS Comput Biol. 2012. November;8:e1002785.
- [14]. Wolf YI, Snir S, Koonin EV. Stability along with extreme variability in core genome evolution. Genome Biol Evol. 2013;5(7):1393–1402.
- [15]. Snir S, Wolf YI, Koonin EV. Universal pacemaker of genome evolution in animals and fungi and variation of evolutionary rates in diverse organisms. Genome Biol Evol. 2014;6:1268–1278.
- [16]. Snir S, vonHoldt BM, Pellegrini M. A statistical framework to identify deviation from time linearity in epigenetic aging. PLoS Comput Biol. 2016. November;12(11):1–15.
- [17]. Snir S, Pellegrini M. An epigenetic pacemaker is detected via a fast conditional expectation maximization algorithm. Epigenomics. 2018;10(6):695–706. PMID: 29979108.
- [18]. Mann HB. Non-parametric tests against trend. Econometrica. 1945;13:163–171.
- [19]. Kendall MG. Rank correlation methods: 10 tab. Oxford, UK: Oxford University Press; 1975.
- [20]. Gilbert RO. Statistical methods for environmental pollution monitoring. Hoboken, NJ: Wiley; 1987.
- [21]. Ali O, Cerjak D, Kent JW, et al. An epigenetic map of age-associated autosomal loci in northern European families at high risk for the metabolic syndrome. Clin Epigenetics. 2015. February;7(1):12.
- [22]. Walker RF, Liu JS, Peters BA, et al. Epigenetic age analysis of children who seem to evade aging. Aging (Albany NY). 2015. May;7(5):334–339.
- [23]. Jaffe AE, Gao Y, Deep-Soboslay A, et al. Mapping DNA methylation across development, genotype and schizophrenia in the human frontal cortex. Nat Neurosci. 2016;19(1):40.
- [24]. Bell JT, Loomis AK, Butcher LM, et al. Differential methylation of the TRPA1 promoter in pain sensitivity. Nat Commun. 2014;5:2978.
- [25]. Milenkovic D, Berghe WV, Boby C, et al. Dietary flavanols modulate the transcription of genes associated with cardiovascular pathology without changes in their DNA methylation state. PLoS One. 2014;9(4):e95527.
- [26]. Horvath S, Levine AJ. HIV-1 infection accelerates age according to the epigenetic clock. J Infect Dis. 2015.
- [27]. Horvath S, Mah V, Lu AT, et al. The cerebellum ages slowly according to the epigenetic clock. Aging (Albany NY). 2015. May;7(5):294–306.
- [28]. Norheim KB, Imgenberg-Kreuz J, Jonsdottir K, et al. Epigenome-wide DNA methylation patterns associated with fatigue in primary Sjogren’s syndrome. Rheumatology (Oxford). 2016. June;55(6):1074–1082.
- [29]. Fernandez-Rebollo E, Eipel M, Seefried L, et al. Primary osteoporosis is not reflected by disease-specific DNA methylation or accelerated epigenetic age in blood. J Bone Miner Res. 2018. February;33(2):356–361.
- [30]. Kular L, Liu Y, Ruhrmann S, et al. DNA methylation as a mediator of HLA-DRB1*15:01 and a protective variant in multiple sclerosis. Nat Commun. 2018. June;9(1):2397.
- [31]. Cordova-Palomera A, Palma-Gudiel H, Fores-Martos J, et al. Epigenetic outlier profiles in depression: a genome-wide DNA methylation analysis of monozygotic twins. PLoS One. 2018;13(11):e0207754.
- [32]. Varoquaux G, Pedregosa F, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–2830.
- [33]. Louis Y, Khoury E, Gorrie-Stone T, et al. Properties of the epigenetic clock and age acceleration. bioRxiv. 2018.
- [34]. Marioni RE, Suderman M, Chen BH, et al. Tracking the epigenetic clock across the human life course: a meta-analysis of longitudinal cohort data. J Gerontol A. 2018;74:gly060.