Figure - PMC

Skip to main content

An official website of the United States government

Here's how you know

Here's how you know

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

View full-text article in PMC

. 2022 Jun 23;11:e73870. doi: 10.7554/eLife.73870

Search in PMC
Search in PubMed
View in NLM Catalog
Add to search

© 2022, Baranwal, Clark et al

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

PMC Copyright notice

Figure 4. — (a) Sensitivity of metabolite prediction performance ( $R^{2}$ ) to the amount of training data. Training datasets were randomly subsampled 30 times using 50–100% of the total dataset in increments of 10%. Each subsampled training set was subject to 20-fold cross-validation to assess prediction performance. Lineplot of the mean prediction performance over the 30 trials for each percentage of the data. Error bars denote 1 s.d. from the mean. (b) Schematic scatter plot representing how communities containing species A and B define a poorly predicted subsample of the full sample set (c) Heatmap of prediction performance ( $R^{2}$ ) of acetate for each subset of communities containing a given species (diagonal elements) or pair of species (off-diagonal elements). (d) Heatmap of prediction performance for acetate, butyrate, lactate, and succinate. A sample subset containing a given species or pair of species included all communities in which the species were initially present. Predictions for each community were determined using 20-fold cross validation so that for each model the predicted samples were excluded from the training samples. N and p-values are reported in Supplementary file 1.

Figure 4—figure supplement 1. — (a) Sensitivity of metabolite prediction performance ( $R^{2}$ ) to the amount of training data. Training datasets were randomly subsampled 30 times using 50–100% of the total dataset in increments of 10%. Each subsampled training set was subject to 20-fold cross-validation to assess prediction performance. Lineplot of the mean prediction performance over the 30 trials for each percentage of the data. Error bars denote 1 s.d. from the mean. (b) Schematic scatter plot representing how communities containing species A and B define a poorly predicted subsample of the full sample set (c) Heatmap of prediction performance ( $R^{2}$ ) of acetate for each subset of communities containing a given species (diagonal elements) or pair of species (off-diagonal elements). (d) Heatmap of prediction performance for acetate, butyrate, lactate, and succinate. A sample subset containing a given species or pair of species included all communities in which the species were initially present. Predictions for each community were determined using 20-fold cross validation so that for each model the predicted samples were excluded from the training samples. N and p-values are reported in Supplementary file 1.