Table 2. Metrics of machine learning approaches applied to Y-based epigenetic age prediction.
| Regression model | Parameter(s) | No. of features (Y-CpGs) | Internal validation (n = 172)¹: MAD (years) | ρ | RMSE | R² | External testing (n = 127)¹: MAD (years) | ρ | RMSE | R² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Multiple Linear | N/A | 75 | 10.45 | 0.65 | 12.93 | 0.42 | 9.31 | 0.58 | 11.65 | 0.34 |
| Lasso | α: 1 | 32* | 10.71 | 0.65 | 12.99 | 0.42 | 9.19 | 0.58 | 11.08 | 0.34 |
| Ridge | α: 0 | 75 | 10.67 | 0.66 | 12.88 | 0.43 | 8.76 | 0.60 | 10.70 | 0.36 |
| Elastic Net | α: 0.5 | 33* | 10.72 | 0.65 | 12.99 | 0.42 | 9.15 | 0.59 | 11.01 | 0.34 |
| Random Forest | ntree: 500, mtry: 25, nodesize: 5 | 75 | 9.23 | 0.80 | 11.33 | 0.64 | 8.48 | 0.66 | 10.18 | 0.43 |
| Random Forest | ntree: 500, mtry: 25, nodesize: 5 | 19§ | 9.63 | 0.74 | 11.89 | 0.54 | 8.42 | 0.61 | 10.50 | 0.37 |
| Support Vector Machine (ε-type), linear kernel | C: 1 | 75 | 10.69 | 0.60 | 13.88 | 0.36 | 10.63 | 0.53 | 13.03 | 0.28 |
| Support Vector Machine (ε-type), polynomial kernel | C: 1, degree: 3, γ: 0.013 | 75 | 9.41 | 0.68 | 12.40 | 0.46 | 9.71 | 0.53 | 12.48 | 0.28 |
| Support Vector Machine (ε-type), sigmoid kernel | C: 1, γ: 0.013 | 75 | 13.83 | 0.40 | 17.83 | 0.16 | 11.44 | 0.33 | 16.90 | 0.11 |
| Support Vector Machine (ε-type), radial kernel | C: 2, γ: 0.013 | 75 | **7.53** | **0.81** | **10.15** | **0.653** | **7.61** | **0.70** | **9.36** | **0.49** |
| Support Vector Machine (ε-type), radial kernel | C: 2, γ: 0.052 | 19† | 8.46 | 0.73 | 11.77 | 0.53 | 8.88 | 0.57 | 11.38 | 0.33 |
MAD: Mean Absolute Deviation, ρ: Pearson Correlation Coefficient, RMSE: Root Mean Square Error, R²: Coefficient of Determination, N/A: Not Applicable.
α: Regularization parameter, ntree: Number of trees to grow, mtry: Number of variables randomly sampled as candidates at each split, nodesize: Minimum size of terminal nodes, C: Cost weight for penalizing the soft margin, degree: Degree of the polynomial kernel function, γ: Controls the trade-off between error due to bias and variance in the model.
¹All models were built using our training set (n = 758).
*Number of features retained after α penalization, which shrinks coefficients towards zero.
§Based on Random Forest Cross-Validation for feature selection.
Values in bold indicate the best-performing model.
†Based on stepwise forward feature selection and the Bayesian Information Criterion (BIC).
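
For readers who want to reproduce this kind of comparison, the sketch below shows how models with the hyperparameters listed in Table 2 could be fitted and scored in R. It is not the authors' code: the package choices (glmnet, randomForest, e1071) are inferred from the parameter names in the table, the data frames `train` and `test` (a numeric `age` column plus one column of methylation beta values per Y-CpG) are hypothetical, and R² is assumed here to be the squared Pearson correlation.

```r
# Minimal sketch, not the authors' pipeline. Assumes `train` and `test` are
# data.frames with a numeric `age` column and one column of beta values per Y-CpG;
# package choices are inferred from the parameter names in Table 2.
library(glmnet)        # Lasso / Ridge / Elastic Net (alpha = 1 / 0 / 0.5)
library(randomForest)  # Random Forest (ntree, mtry, nodesize)
library(e1071)         # epsilon-regression SVMs (cost, degree, gamma)

x_train <- as.matrix(train[, setdiff(names(train), "age")])
y_train <- train$age
x_test  <- as.matrix(test[, colnames(x_train)])

# Elastic Net (alpha = 0.5); cv.glmnet selects the penalty strength lambda.
fit_enet  <- cv.glmnet(x_train, y_train, alpha = 0.5)
pred_enet <- as.numeric(predict(fit_enet, newx = x_test, s = "lambda.min"))

# Random Forest with the settings reported in the table.
fit_rf  <- randomForest(x = x_train, y = y_train, ntree = 500, mtry = 25, nodesize = 5)
pred_rf <- predict(fit_rf, newdata = x_test)

# Radial-kernel epsilon-regression SVM (the best-performing configuration).
fit_svm  <- svm(x = x_train, y = y_train, type = "eps-regression",
                kernel = "radial", cost = 2, gamma = 0.013)
pred_svm <- predict(fit_svm, newdata = x_test)

# Metrics as defined in the table footnotes (R^2 assumed to be squared Pearson r).
mad_years <- function(obs, pred) mean(abs(obs - pred))       # Mean Absolute Deviation
rmse      <- function(obs, pred) sqrt(mean((obs - pred)^2))  # Root Mean Square Error
rho       <- function(obs, pred) cor(obs, pred)              # Pearson correlation
r2        <- function(obs, pred) cor(obs, pred)^2            # Coefficient of determination

c(MAD  = mad_years(test$age, pred_svm),
  rho  = rho(test$age, pred_svm),
  RMSE = rmse(test$age, pred_svm),
  R2   = r2(test$age, pred_svm))
```

The same skeleton extends to the remaining rows of Table 2 by swapping the model call and its hyperparameters (e.g., `kernel = "polynomial", degree = 3` for the polynomial SVM).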