2024 Feb 10;12:24. doi: 10.1186/s40168-023-01737-1

Fig. 4.

Microbiome-metabolite relations are dataset-specific. (A) Heatmap of significant Spearman correlation coefficients (SCCs) between microbes and SCFAs across different WGS datasets (ERAWIJANTARI, FRANZOSA, MARS, WANG, YACHIDA). Each row represents a microbe-metabolite pair and each column represents a dataset. Red/blue colors represent negative/positive correlations. Many relations appear consistent across several datasets, but practically none is consistent across all of them. (B) Heatmap of significant SCCs between all common microbes and metabolites across the WGS datasets of gastric problems (ERAWIJANTARI, FRANZOSA, MARS, YACHIDA). As in A, each row represents a microbe-metabolite pair and each column represents a dataset. A and B share the same color bar. The rows and columns are clustered. There are four clusters of microbe-metabolite pairs: the first (lightest gray) consists of inconsistent pairs that tend to be positively correlated; the second (darker gray) consists of equally inconsistent pairs; the third (darker still) consists of consistently negatively correlated pairs; and the last (darkest) consists of inconsistent pairs that tend to be negatively correlated. The names of the pairs in each cluster can be found in Supplementary material Table S5. (C) Heatmap of SCCs between microbes and metabolites across different datasets (He, Kim, and Jacob) vs. the relations reported in the literature. The relations vary between datasets and do not preserve the known relations from the literature. (D) The core microbiome. About 20 orders are common to most of the datasets. These orders are also the most frequent taxa in the cohort populations. The x-axis represents, for each cohort, the fraction of its population in which the order is present. If an order appears in the entire population of every cohort, the fractions sum to 10. The y-axis represents the different orders. Each color represents a cohort.
(E) Swarm plot of the SCCs of LOCATE’s predicted metabolites in the cross-time test on the Direct Plus cohort. The dark blue points represent the SCCs of predictions within a time point, referred to as “Internal,” where a single time point was used for both training and testing with the 10 CV approach. The light blue points represent the SCCs of predictions between time points, where LOCATE is trained on one time point (T0) and tested on another (T6). Accuracy decreases for the between-time-point predictions. The stars follow the previous figure. For similar results on other time steps, see Supplementary material Fig. S4A–C. (F–H) Swarm plots of all cross-dataset predictions between pairs of datasets on their shared metabolites and microbes: He-Direct Plus (F), He-Kim (G), He-Jacob (H); for similar results on the other pairs, see Supplementary material Fig. S4D–F. Each model is applied twice. First, it is trained on the intersection of the microbes and metabolites of the pair and tested on an internal test set of the same dataset, “in-learning” (the dark points, referred to as “model-in”). Then, it is trained on one dataset and tested on the other, “ex-learning” (the light points, referred to as “model-ex”). Training on one dataset and testing on another drastically decreases the performance of all models, including LOCATE. However, LOCATE is still significantly the best model in most of the comparisons.
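
The in-learning versus ex-learning comparison described in the caption can be illustrated with a minimal sketch. This is not LOCATE or any of the compared models: an ordinary least-squares linear map stands in for the predictor, the data are synthetic, a single hold-out split replaces cross-validation, and all function names (`spearman`, `fit_linear`, `in_vs_ex`) are hypothetical. The two synthetic datasets are assumed to already share the same microbe and metabolite columns, with microbe-metabolite relations drawn independently for each dataset.

```python
import numpy as np

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])

def fit_linear(X, Y):
    """Least-squares linear map from microbes to metabolites (stand-in model)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append an intercept column
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def predict(W, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W

def per_metabolite_scc(Y_true, Y_pred):
    """One SCC per metabolite (column) between measured and predicted values."""
    return [spearman(Y_true[:, j], Y_pred[:, j]) for j in range(Y_true.shape[1])]

def in_vs_ex(X_a, Y_a, X_b, Y_b, n_train):
    """Train on the first n_train samples of dataset A, then evaluate twice:
    on the held-out samples of A ("model-in") and on all of dataset B ("model-ex")."""
    W = fit_linear(X_a[:n_train], Y_a[:n_train])
    scc_in = per_metabolite_scc(Y_a[n_train:], predict(W, X_a[n_train:]))
    scc_ex = per_metabolite_scc(Y_b, predict(W, X_b))
    return scc_in, scc_ex

# Synthetic data (assumption): each dataset has its own microbe-metabolite weights,
# mimicking dataset-specific relations.
rng = np.random.default_rng(0)
n, n_microbes, n_metabolites = 200, 5, 3
W_a = rng.normal(size=(n_microbes, n_metabolites))
W_b = rng.normal(size=(n_microbes, n_metabolites))
X_a = rng.normal(size=(n, n_microbes))
X_b = rng.normal(size=(n, n_microbes))
Y_a = X_a @ W_a + 0.1 * rng.normal(size=(n, n_metabolites))
Y_b = X_b @ W_b + 0.1 * rng.normal(size=(n, n_metabolites))

scc_in, scc_ex = in_vs_ex(X_a, Y_a, X_b, Y_b, n_train=150)
print("model-in mean SCC:", np.mean(scc_in))
print("model-ex mean SCC:", np.mean(scc_ex))
```

Because the stand-in model learns dataset A's relations only, its per-metabolite SCCs on dataset B depend on how closely B's relations happen to match A's, which is the effect the light ("model-ex") points in panels F–H summarize.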