Figure 4. Outperforming multi-‘omics model identifies microbial, metagenomic, and metabolic biomarkers that distinguish ME/CFS patients from controls.
A) Biomarkers from three supervised gradient boosting decision tree (GBDT) models. Models, from top to bottom: species relative abundance, relative abundance of the KEGG gene profile, and normalized abundance of plasma metabolomics. The ten most important features in each model are shown together with their general functional class, raw abundance, and variance. Columns, from left to right: 1. Functional annotation: for the species relative abundance model, the metabolic function (capacity for the butyrate, tryptophan, and propionate pathways); for the KEGG gene profile model, the enzyme class; for the metabolomics model, the metabolite superfamily. 2. Feature importance: features are ranked on the y-axis by their contribution to the model, and the x-axis shows each feature's importance value in that model. 3. Average feature abundance in the control and patient groups (Figure S4). 4. Variation in mean relative abundance in the control and patient groups, expressed as the coefficient of variation. B) Classifier performance, measured as the area under the ROC curve (AUC) over 10 randomized repeats of 10-fold cross-validation, for models built on species relative abundance (pink), KEGG gene profile (blue), or metabolites (orange) alone, or on all three together (‘omics, green), which combined the top 30 features from the three models. See also Figure S4 and Tables S3–S5.
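The evaluation in panel B (repeated, randomized 10-fold cross-validation scored by AUC) and the feature-combination step for the ‘omics model can be sketched as below. This is a minimal, dependency-free illustration under stated assumptions, not the authors' pipeline: the GBDT is replaced by a hypothetical placeholder scorer (`nearest_centroid`) so the protocol itself stays visible, AUC is computed once per repeat on pooled out-of-fold scores, the coefficient of variation uses the sample standard deviation, and "top 30 features" is read as 30 per model. All function names and the synthetic data are illustrative. In practice one would plug in a real GBDT (for example scikit-learn's `GradientBoostingClassifier`) and its feature importances.

```python
import random
import statistics
from typing import Callable, List, Sequence

# A trainer takes (X_train, y_train) and returns a scorer: sample -> patient-likeness.
Trainer = Callable[[List[List[float]], List[int]], Callable[[List[float]], float]]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney U statistic; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle indices within each class, then deal them round-robin into k folds."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def repeated_cv_auc(X, y, train: Trainer, k: int = 10, repeats: int = 10, seed: int = 0) -> List[float]:
    """One AUC per randomized repeat of k-fold CV, pooled over out-of-fold scores."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        oof = [0.0] * len(y)  # out-of-fold score for every sample
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            score = train([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in test_idx:
                oof[i] = score(X[i])
        aucs.append(roc_auc(y, oof))
    return aucs


def nearest_centroid(X, y):
    """Hypothetical stand-in for a GBDT: positive when closer to the patient centroid."""
    def centroid(cls):
        rows = [x for x, lab in zip(X, y) if lab == cls]
        return [statistics.fmean(col) for col in zip(*rows)]

    c_pat, c_ctl = centroid(1), centroid(0)

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return lambda x: sqdist(x, c_ctl) - sqdist(x, c_pat)


def combine_top_features(tables, importances, top_n: int = 30):
    """Concatenate the top_n most important columns of each 'omics table (same sample order)."""
    selected = [
        sorted(range(len(imp)), key=lambda j: imp[j], reverse=True)[:top_n]
        for imp in importances
    ]
    return [
        [tables[t][r][j] for t, cols in enumerate(selected) for j in cols]
        for r in range(len(tables[0]))
    ]


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


# Synthetic demo: 20 controls (0) and 20 patients (1); only feature 0 differs.
data_rng = random.Random(42)
y = [0] * 20 + [1] * 20
X = [[data_rng.gauss(3.0 * lab, 1.0), data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for lab in y]
aucs = repeated_cv_auc(X, y, nearest_centroid, k=10, repeats=10)
print(f"mean AUC over {len(aucs)} repeats: {statistics.fmean(aucs):.3f}")
```

Pooling out-of-fold scores per repeat avoids undefined AUCs when a small fold happens to contain only one class; averaging per-fold AUCs is a common alternative.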
