Table 2.
Risk factor/biomarker | Model type (EPIC Italy training set, n = 1,352) | Number of CpGs | Pearson R (EPIC Italy test set, n = 451) | P (EPIC Italy test set) | Validation data sets (N) | Pearson R (validation sets) | P (validation sets) | Validated DNAm surrogate |
---|---|---|---|---|---|---|---|---|
BMI | Mixed-effect LASSO | 405 | 0.59 | < 0.0001 | US, TILDA, EXPOsOMICS, GSE174848 (2,045) | 0.27 | < 0.0001 | Yes |
CRP | LASSO | 265 | 0.57 | < 0.0001 | US, TILDA, EXPOsOMICS, GSE174849 (1,893) | 0.23 | < 0.0001 | Yes |
D-dimer | LASSO | 483 | 0.72 | < 0.0001 | EXPOsOMICS, GSE174848 (248) | 0.17 | 0.56 | No |
Diastolic blood pressure | Mixed-effect LASSO | 401 | 0.57 | < 0.0001 | EXPOsOMICS, TILDA (772) | 0.10 | 0.36 | No |
Glucose | Mixed-effect LASSO | 354 | 0.67 | < 0.0001 | EXPOsOMICS, TILDA, US (1,810) | 0.28 | 0.007 | Yes |
HDL cholesterol | Mixed-effect LASSO | 151 | 0.58 | < 0.0001 | EXPOsOMICS, TILDA, US (1,829) | 0.08 | 0.001 | Yes |
Insulin | Mixed-effect LASSO | 574 | 0.66 | < 0.0001 | EXPOsOMICS (170) | 0.44 | < 0.0001 | Yes |
LDL cholesterol | Mixed-effect LASSO | 368 | 0.62 | < 0.0001 | EXPOsOMICS, TILDA (661) | 0.15 | 0.36 | No |
PAI-1 | LASSO | 90 | 0.43 | < 0.0001 | EXPOsOMICS (171) | 0.28 | 0.0001 | Yes |
Systolic blood pressure | Mixed-effect LASSO | 275 | 0.64 | < 0.0001 | EXPOsOMICS, TILDA (772) | 0.28 | 0.001 | Yes |
Tissue factor (CD142) | Mixed-effect LASSO | 197 | 0.62 | < 0.0001 | EXPOsOMICS (171) | 0.16 | 0.03 | Yes |
Total cholesterol | Mixed-effect LASSO | 257 | 0.53 | < 0.0001 | EXPOsOMICS, TILDA, US (1,830) | 0.13 | 0.14 | No |
Triglycerides | LASSO | 471 | 0.73 | < 0.0001 | EXPOsOMICS, TILDA (661) | 0.22 | 0.0003 | Yes |
For each candidate marker, we report: the model used to extract significant CpGs (LASSO or mixed-effect LASSO, depending on the marker's association with the centre of recruitment); the number of CpGs whose linear combination constitutes the best prediction of the marker; the Pearson correlation coefficient and P value in the primary test set (a random 25% of EPIC Italy samples); and the Pearson correlation coefficient and P value in the independent validation sets (random-effects meta-analysis across studies). Nine of the 13 DNAm surrogates for CVD risk factors/markers were validated in the independent validation sets (P value for the Pearson correlation test lower than 0.05). The lists of CpGs and their weights for computing DNAm surrogates in independent data sets are provided in Additional file 1.
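
As an illustration of how a DNAm surrogate of this kind might be computed in an independent data set, the following minimal Python sketch forms the weighted linear combination of CpG methylation values and correlates it with the measured marker. The CpG identifiers, weights, methylation values, and marker values are invented for illustration and are not taken from Additional file 1. The optional intercept is an assumption, and the function names `dnam_surrogate` and `pearson_r` are hypothetical. The sketch does not compute the P value or the random-effects meta-analysis across studies.

```python
import math


def dnam_surrogate(betas, weights, intercept=0.0):
    """Return the surrogate score for one sample.

    betas:     dict mapping CpG ID -> methylation beta value of the sample
    weights:   dict mapping CpG ID -> model coefficient (illustrative values)
    intercept: optional constant term (an assumption, defaults to 0)
    """
    return intercept + sum(w * betas[cpg] for cpg, w in weights.items())


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical weights for a two-CpG surrogate (not real coefficients).
weights = {"cg_a": 2.0, "cg_b": -1.0}

# Toy methylation data for four samples and their measured marker values.
samples = [
    {"cg_a": 0.50, "cg_b": 0.20},
    {"cg_a": 0.60, "cg_b": 0.10},
    {"cg_a": 0.40, "cg_b": 0.30},
    {"cg_a": 0.70, "cg_b": 0.25},
]
measured = [24.0, 27.5, 22.0, 26.0]

surrogates = [dnam_surrogate(s, weights) for s in samples]
r = pearson_r(surrogates, measured)
print("Pearson R between surrogate and measured marker:", round(r, 3))
```

In practice the weights would be read from the published list, and a standard statistical package would supply the P value for the correlation test.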