2022 Aug 5;12:36. doi: 10.1038/s41387-022-00216-0

Fig. 1. Data preprocessing and selection of machine learning models.

Metabolomic data preprocessing workflow (A), accuracy heat map of the machine learning models (B), hyper-parameter learning curve of the decision tree (C), and the decision tree itself (D). Notes: C The maximum depth parameter (max_depth) of the decision tree model was selected using hold-out and 10-fold cross-validation, based on the hyper-parameter learning curve; D A decision tree model built on the training set to distinguish the healthy control group, DM group, and DR group. Abbreviations: QC quality control, CV coefficient of variation, KNN K-Nearest Neighbors, GNB Gaussian Naive Bayes, LR Logistic Regression, DT Decision Tree, RF Random Forest, XGB XGBoost, DNN Deep Neural Network, SVM Support Vector Machine, MEDP545 2-pyrrolidinone, MEDN430 thiamine triphosphate, Control healthy control group, DR diabetic retinopathy group, DM diabetes mellitus without DR group.
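
The max_depth selection described in note C can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses a synthetic three-class dataset in place of the metabolomic data, and the depth range (1 to 10), split ratio, and random seeds are arbitrary choices made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the metabolite feature matrix (assumption: NOT the study data).
# Three classes mimic the Control / DM / DR grouping.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=6,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Hold-out: set a test portion aside before any tuning takes place.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Learning curve over max_depth: mean 10-fold cross-validated accuracy per depth.
depths = list(range(1, 11))  # assumed search range
cv_means = []
for depth in depths:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(tree, X_train, y_train, cv=10)
    cv_means.append(scores.mean())

# Pick the depth with the highest mean CV accuracy, refit on the full training set,
# and report accuracy on the untouched hold-out set.
best_depth = depths[int(np.argmax(cv_means))]
final_tree = DecisionTreeClassifier(max_depth=best_depth, random_state=0)
final_tree.fit(X_train, y_train)
holdout_accuracy = final_tree.score(X_test, y_test)

print(f"best max_depth: {best_depth}")
print(f"hold-out accuracy: {holdout_accuracy:.3f}")
```

Plotting cv_means against depths gives a curve of the kind shown in panel C; how the original study combined the hold-out and cross-validation scores when choosing the depth is not stated in the caption, so the argmax rule here is only one plausible choice.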