Figure 1.

Methodology for leave‐one‐dataset‐out cross‐validation (LOOCV) and measures of age prediction accuracy. In the LOOCV, one dataset is left out (test set) and all other datasets (training sets) are used to develop the age predictor. The DNA methylation profiles of the training sets are input into an elastic net regression model (glmnet package in R), and this model is then used to estimate age in the test set. Predicted and actual ages were correlated using Pearson's correlation coefficient (unless all individuals had the same age or the dataset was too small). We also calculated AAdiff as the difference between predicted and actual age. We then calculated the median of the absolute values of AAdiff to estimate how well calibrated the clock was to this particular test set, and the mean of AAdiff to see whether the test set as a whole was younger (or older) than expected. Finally, we calculated the residuals from a linear regression of predicted age against actual age (AAresid) to obtain accuracy measures that are insensitive to the mean age of the dataset and to pre‐processing techniques.
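The accuracy measures described above can be sketched for a single held‐out test set as follows. This is a minimal illustration in Python with NumPy, not the authors' R code. The function name `age_accuracy_measures`, the dictionary keys, the sign convention (predicted minus actual), and the `min_n` threshold for "too small" are all assumptions made for the example. The sketch also skips AAresid whenever the correlation is skipped, which the caption does not state.

```python
import numpy as np


def age_accuracy_measures(predicted, actual, min_n=3):
    """Summarise age prediction accuracy for one left-out test set.

    `min_n` is a hypothetical cut-off for "dataset too small";
    the source does not state the threshold that was used.
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # AAdiff: assumed sign convention is predicted minus actual age.
    aa_diff = predicted - actual

    measures = {
        # Median absolute AAdiff: calibration of the clock to this test set.
        "median_abs_aadiff": float(np.median(np.abs(aa_diff))),
        # Mean AAdiff: whether the test set as a whole is shifted.
        "mean_aadiff": float(np.mean(aa_diff)),
        "pearson_r": None,
        "aa_resid": None,
    }

    # Correlation and regression need enough samples and variation in age.
    ages_vary = actual.max() - actual.min() > 0
    if actual.size >= min_n and ages_vary:
        measures["pearson_r"] = float(np.corrcoef(predicted, actual)[0, 1])
        # AAresid: residuals from regressing predicted age on actual age.
        slope, intercept = np.polyfit(actual, predicted, 1)
        measures["aa_resid"] = predicted - (slope * actual + intercept)

    return measures
```

Because AAresid is taken from a regression fitted within the test set, a constant offset in predicted age (for example, from a dataset‐wide pre‐processing shift) changes the mean of AAdiff but leaves AAresid unchanged.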