Aging (Albany NY). 2021 Mar 11;13(5):6442–6458. doi: 10.18632/aging.202775

Table 2. Metrics of machine learning approaches applied to Y-chromosome-based epigenetic age prediction.

Columns report internal validation (n = 172)¹ and external testing (n = 127)¹ performance.

| Regression model | Parameter(s) | No. of features (Y-CpGs) | Internal MAD (years) | Internal ρ | Internal RMSE | Internal R² | External MAD (years) | External ρ | External RMSE | External R² |
|---|---|---|---|---|---|---|---|---|---|---|
| Multiple Linear | N/A | 75 | 10.45 | 0.65 | 12.93 | 0.42 | 9.31 | 0.58 | 11.65 | 0.34 |
| Lasso | α: 1 | 32* | 10.71 | 0.65 | 12.99 | 0.42 | 9.19 | 0.58 | 11.08 | 0.34 |
| Ridge | α: 0 | 75 | 10.67 | 0.66 | 12.88 | 0.43 | 8.76 | 0.60 | 10.70 | 0.36 |
| Elastic Net | α: 0.5 | 33* | 10.72 | 0.65 | 12.99 | 0.42 | 9.15 | 0.59 | 11.01 | 0.34 |
| Random Forest | ntree: 500, mtry: 25, nodesize: 5 | 75 | 9.23 | 0.80 | 11.33 | 0.64 | 8.48 | 0.66 | 10.18 | 0.43 |
| Random Forest | ntree: 500, mtry: 25, nodesize: 5 | 19§ | 9.63 | 0.74 | 11.89 | 0.54 | 8.42 | 0.61 | 10.50 | 0.37 |
| Support Vector Machine (ε-type), linear kernel | C: 1 | 75 | 10.69 | 0.60 | 13.88 | 0.36 | 10.63 | 0.53 | 13.03 | 0.28 |
| Support Vector Machine (ε-type), polynomial kernel | C: 1, degree: 3, γ: 0.013 | 75 | 9.41 | 0.68 | 12.40 | 0.46 | 9.71 | 0.53 | 12.48 | 0.28 |
| Support Vector Machine (ε-type), sigmoid kernel | C: 1, γ: 0.013 | 75 | 13.83 | 0.40 | 17.83 | 0.16 | 11.44 | 0.33 | 16.90 | 0.11 |
| Support Vector Machine (ε-type), radial kernel | C: 2, γ: 0.013 | 75 | 7.53** | 0.81 | 10.15 | 0.65 | 7.61** | 0.70 | 9.36 | 0.49 |
| Support Vector Machine (ε-type), radial kernel | C: 2, γ: 0.052 | 19† | 8.46 | 0.73 | 11.77 | 0.53 | 8.88 | 0.57 | 11.38 | 0.33 |

MAD: Mean Absolute Deviation, ρ: Pearson Correlation Coefficient, RMSE: Root Mean Square Error, N/A: Not Applicable.
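
For reference, the four column metrics can be computed as in the short Python sketch below (array names are illustrative, not from the paper). The tabulated R² values are numerically consistent with the squared Pearson correlation (e.g., 0.40² = 0.16 for the sigmoid kernel), so the sketch reports both ρ² and the coefficient of determination.

```python
import numpy as np
from scipy.stats import pearsonr

def age_prediction_metrics(age_true, age_pred):
    """MAD, Pearson's rho, RMSE and R-squared for predicted vs. chronological age."""
    age_true = np.asarray(age_true, dtype=float)
    age_pred = np.asarray(age_pred, dtype=float)
    mad = np.mean(np.abs(age_pred - age_true))           # mean absolute deviation (years)
    rho, _ = pearsonr(age_true, age_pred)                 # Pearson correlation coefficient
    rmse = np.sqrt(np.mean((age_pred - age_true) ** 2))   # root mean square error (years)
    # Two common R-squared conventions; the table's values match rho ** 2.
    r2_squared_rho = rho ** 2
    ss_res = np.sum((age_true - age_pred) ** 2)
    ss_tot = np.sum((age_true - age_true.mean()) ** 2)
    r2_coef_det = 1.0 - ss_res / ss_tot
    return {"MAD": mad, "rho": rho, "RMSE": rmse,
            "R2 (rho^2)": r2_squared_rho, "R2 (1 - RSS/TSS)": r2_coef_det}
```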

α: Regularization parameter, ntree: Number of trees to grow, mtry: Number of variables randomly sampled as candidates at each split, nodesize: Minimum size of terminal nodes, C: Cost weight for penalizing the soft margin, degree: Number of degrees for the polynomial equation, γ: Controls the trade-off between error due to bias and variance in the model.
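
These parameter names match the arguments of widely used R packages (α in glmnet; ntree, mtry and nodesize in randomForest; C, degree and γ in e1071's SVM interface). As an illustration only, and not the authors' code, the best-performing configuration (radial-kernel ε-SVR with C = 2 and γ = 0.013 on 75 Y-CpGs) and two of the other models translate roughly into scikit-learn as follows; the parameter mapping is an assumption on my part.

```python
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet

# Radial-kernel epsilon-SVR, C = 2, gamma = 0.013 (best-performing model in the table).
svr_radial = SVR(kernel="rbf", C=2.0, gamma=0.013)

# Random forest analogue: ntree -> n_estimators, mtry -> max_features,
# nodesize -> min_samples_leaf (approximate mapping of the R randomForest arguments).
rf = RandomForestRegressor(n_estimators=500, max_features=25, min_samples_leaf=5)

# Elastic net: in the table (as in glmnet), alpha is the lasso/ridge mixing weight (0.5);
# in scikit-learn that role is played by l1_ratio, while sklearn's `alpha` is the
# penalty strength (the value 1.0 here is a placeholder, not taken from the paper).
enet = ElasticNet(alpha=1.0, l1_ratio=0.5)

# Hypothetical usage: X_train is an (n_samples x 75) matrix of Y-CpG methylation values,
# y_train the chronological ages.
# svr_radial.fit(X_train, y_train)
# age_pred = svr_radial.predict(X_test)
```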

¹All models were built using our training set (n = 758).

*Based on α penalization, which shrinks coefficients towards zero.

§Based on Random Forest Cross-Validation for feature selection.

**Best-performing model (shown in bold in the original table).

†Based on stepwise forward feature selection and the Bayesian Information Criterion (BIC).
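
Footnote † refers to forward stepwise selection under BIC. A minimal sketch of that general idea for a linear age model is given below (a greedy search written as my own illustration, not the authors' implementation; the BIC is the usual Gaussian-likelihood form).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def bic_linear(y, y_hat, n_params):
    """Gaussian BIC up to an additive constant: n*log(RSS/n) + k*log(n)."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + n_params * np.log(n)

def forward_stepwise_bic(X, y):
    """Greedily add the CpG that most improves BIC; stop when no addition helps."""
    remaining = list(range(X.shape[1]))
    selected = []
    best_bic = np.inf
    while remaining:
        scores = []
        for j in remaining:
            cols = selected + [j]
            model = LinearRegression().fit(X[:, cols], y)
            # n_params counts the fitted coefficients plus the intercept.
            scores.append((bic_linear(y, model.predict(X[:, cols]), len(cols) + 1), j))
        bic, j_best = min(scores)
        if bic >= best_bic:   # no candidate improves BIC -> stop
            break
        best_bic = bic
        selected.append(j_best)
        remaining.remove(j_best)
    return selected
```

Under these assumptions, running such a search on a training matrix of 75 candidate Y-CpGs would return the indices of the retained markers, analogous to the reduced 19-CpG feature set marked † in the table.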