Sci Rep. 2021 Feb 25;11:4565. doi: 10.1038/s41598-021-83922-6

Table 3.

Summary of predictive performance of the best ML models for classification tasks.

| Phenotype | Dataset | Model | F1 score per class (test) | F1 score (test) | F1 score (train) | Precision per class (test) | Recall per class (test) | Avg F1-score (CV) | Std F1-score (CV) |
|---|---|---|---|---|---|---|---|---|---|
| Menopausal status | 62 first samples | LightGBM | [0.86, 0.95] | 0.92 | 0.98 | [1.0, 0.9] | [0.75, 1.0] | 0.93 | 0.06 |
| Menopausal status | 1200 samples blocked by individual | XGBoost | [0.89, 0.75] | 0.85 | 1.0 | [0.89, 0.75] | [0.89, 0.75] | 0.82 | 0.07 |
| Smoking status | 62 first samples | XGBoost | [0.89, 0.75] | 0.85 | 0.98 | [0.89, 0.75] | [0.89, 0.75] | 0.72 | 0.12 |
| Smoking status | 1200 samples blocked by individual | LightGBM | [0.88, 0.10] | 0.74 | 1.0 | [0.82, 0.21] | [0.94, 0.07] | 0.93 | 0.08 |

Three ML models (RF, LightGBM, XGBoost) were evaluated on different subsets of the Canada cohort: the 62 samples taken at each subject's first time point, and the 1200 time-series samples blocked by individual. When applied to the time-series samples, the ML models were tuned and trained blocking by individual, i.e., samples from the same subject never appear in both the training and test datasets. The table reports the per-class F1-score, precision, and recall computed on the test dataset, together with the weighted-average F1-score on the test dataset, on the training dataset, and in cross-validation. Only the performance scores of the best fine-tuned model per dataset and phenotype are shown; Supplementary Table 2 lists the full set of models.
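As an illustration of this subject-blocked evaluation scheme, the sketch below (not the authors' code) uses scikit-learn's GroupKFold so that all samples from one individual stay in a single fold, then reports per-class and weighted-average scores of the kind listed in Table 3. The feature matrix, labels, subject IDs, and hyperparameters are placeholders; LGBMClassifier stands in for whichever tuned model (RF, LightGBM, or XGBoost) was selected per dataset and phenotype.

```python
# Minimal sketch of subject-blocked cross-validation with per-class metrics.
# Assumptions: X, y, and `groups` are placeholders; LGBMClassifier is one of
# several candidate models and is used here without the paper's tuned settings.
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.metrics import f1_score, precision_score, recall_score
from lightgbm import LGBMClassifier

rng = np.random.default_rng(0)
X = rng.random((1200, 50))                 # placeholder features (e.g., taxon abundances)
y = rng.integers(0, 2, 1200)               # placeholder binary phenotype labels
groups = rng.integers(0, 60, 1200)         # placeholder subject IDs (blocking variable)

cv = GroupKFold(n_splits=5)
fold_f1 = []
for train_idx, test_idx in cv.split(X, y, groups=groups):
    # Blocking by individual: no subject contributes to both train and test.
    model = LGBMClassifier()
    model.fit(X[train_idx], y[train_idx])
    y_pred = model.predict(X[test_idx])

    # Per-class scores on the held-out fold, as reported in Table 3.
    print("F1 per class:       ", f1_score(y[test_idx], y_pred, average=None))
    print("Precision per class:", precision_score(y[test_idx], y_pred, average=None))
    print("Recall per class:   ", recall_score(y[test_idx], y_pred, average=None))

    # Weighted-average F1 per fold, aggregated into the CV mean and std.
    fold_f1.append(f1_score(y[test_idx], y_pred, average="weighted"))

print("CV weighted F1: %.2f +/- %.2f" % (np.mean(fold_f1), np.std(fold_f1)))
```

The key design point is passing the subject IDs as `groups` to the splitter: an ordinary (unblocked) split would let repeated samples from the same individual leak between training and test sets and inflate the reported scores.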