2022 Jun 23;11:e73870. doi: 10.7554/eLife.73870

Figure 5. Community metabolite trajectories cluster into qualitatively distinct groups which can be classified based on presence and absence of key microbial species.

(a) Schematic of experiment and network representing a minimal spanning tree across the 95 communities where weights (indicated by edge length) are equal to the Euclidean distance between the metabolite trajectories for each community. Node colors indicate clusters determined as described in the Materials and methods. Red node with black outline annotated with ‘25’ represents the 25-member community. Annotations indicate the most specific microbial species presence/absence rules that describe most data points in the cluster of the corresponding color as determined by a decision tree classifier (Materials and methods). Communities that deviate from the rules for their cluster are indicated with a border matching the color of the closest cluster whose rules they do follow. Network visualization generated using the draw_kamada_kawai function in networkx (v2.1) for Python 3. (b–g) Temporal changes in metabolite concentrations for communities within each cluster (indicated by sub-plot border color), with individual communities denoted by transparent lines. Solid lines and shaded regions represent the mean ±1 s.d. of all communities in the cluster. (h) Schematic of LSTM model training and computation of gradients to evaluate impact of species abundance on metabolite concentrations in a specific community context. (i) Heatmap of model M3 prediction accuracy for four metabolites in the 34 validation communities at each time point (Pearson correlation R², N=34 for all tests). (j) Heatmap of the gradient analysis of model M3 as described in (h) for the full 25-species community. N and p-values are reported in Supplementary file 1.
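
The tree construction in (a) can be illustrated with a minimal sketch. It assumes each community's metabolite trajectory has been flattened into one numeric vector (time points × metabolites) and uses Prim's algorithm on the complete graph of pairwise Euclidean distances. The function names and the toy data are hypothetical, and the caption does not state which spanning-tree algorithm the authors used.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two flattened trajectories of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(trajectories):
    """Prim's algorithm on the complete graph of pairwise distances.

    Returns a list of (i, j, distance) edges, where node i was already
    in the tree when node j was attached to it.
    """
    n = len(trajectories)
    if n == 0:
        return []
    # best[j] = (distance from j to its closest tree node, that tree node)
    best = {j: (euclidean(trajectories[0], trajectories[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        # Attach the node that is currently cheapest to connect.
        j = min(best, key=lambda k: best[k][0])
        dist, i = best.pop(j)
        edges.append((i, j, dist))
        # The new tree node may offer a shorter connection to remaining nodes.
        for k in best:
            d = euclidean(trajectories[j], trajectories[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges


# Hypothetical toy data: 4 communities, each a flattened trajectory vector.
toy_trajectories = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [5.0, 5.0, 5.0, 5.0],
]
mst_edges = minimum_spanning_tree(toy_trajectories)
```

The resulting weighted edge list could then be loaded into a networkx graph and drawn with draw_kamada_kawai, the layout function named in the caption.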

Figure 5—figure supplement 1. Characteristics of the dynamic community behaviors.

(a) Minimal spanning tree of a graph representation of the 180 communities characterized in Figure 3, where each node is a community and each edge weight is the Euclidean distance between a pair of communities in the 4-dimensional metabolite space. The tree shows that the subset of communities characterized in the dynamic experiment was representative of all 180 communities characterized in Figure 3. Blue and red nodes indicate the subset of communities chosen for dynamic characterization and used as training and validation examples for LSTM model M3 in Figure 5. These subsets were chosen by first performing k-means clustering with k=94 on the 180 communities and identifying the 94 communities closest to the cluster centroids, and then repeating this process to subsample 34 of the 94 communities (as the training/validation split). (b) and (c) Scatter plots showing where the clusters from Figure 5a fall in the 48 hr metabolite measurement space for comparison with Figure 3b. Each data point represents a community, with the color corresponding to the clusters in Figure 5a. Legend indicates the percentage of communities from each cluster that come from the ‘corner’ or ‘distributed’ sets. (d) Decision tree classifier explaining which species’ presence determines the clusters of dynamic community behavior from Figure 5. Annotations indicate the percentage of communities from each cluster that can be explained by the indicated paths, which are also annotated on Figure 5a.
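
The two-stage subsampling described in (a) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it uses plain Lloyd's k-means with seeded random initialization and no feature scaling, none of which the caption specifies, and all function names and the toy data (12 points reduced to 6, then to 2) are hypothetical stand-ins for the 180 → 94 → 34 selection.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns the list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids


def representatives(points, k, seed=0):
    """Indices of k distinct points, one per centroid, each the closest unchosen point."""
    chosen = []
    for centroid in kmeans(points, k, seed=seed):
        candidates = [i for i in range(len(points)) if i not in chosen]
        chosen.append(min(candidates, key=lambda i: sq_dist(points[i], centroid)))
    return sorted(chosen)


# Hypothetical toy data: 12 communities in a 4-dimensional metabolite space.
rng = random.Random(1)
points = [[rng.random() for _ in range(4)] for _ in range(12)]

stage1 = representatives(points, 6)            # analogous to 180 -> 94
subset = [points[i] for i in stage1]
stage2 = [stage1[i] for i in representatives(subset, 2)]  # analogous to 94 -> 34
```

Choosing the point nearest each centroid, rather than sampling at random, spreads the selected communities across the occupied region of metabolite space, which is consistent with the representativeness the panel is meant to show.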
Figure 5—figure supplement 2. Prediction accuracy of model M3 for species abundance.

Heatmap of R² for model M3 predictions of the abundance of each species at each time point in the 34 validation communities.
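
A heatmap of this kind can be computed as sketched below. The sketch assumes R² is the squared Pearson correlation between measured and predicted abundance across communities, by analogy with Figure 5i; this caption does not say so explicitly. The data layout (`true[community][time][species]`), the function names, and the toy values are all hypothetical.

```python
import math


def pearson_r2(true_vals, pred_vals):
    """Squared Pearson correlation; NaN if either input has zero variance."""
    n = len(true_vals)
    mean_t = sum(true_vals) / n
    mean_p = sum(pred_vals) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(true_vals, pred_vals))
    var_t = sum((t - mean_t) ** 2 for t in true_vals)
    var_p = sum((p - mean_p) ** 2 for p in pred_vals)
    if var_t == 0 or var_p == 0:
        return float("nan")
    return cov * cov / (var_t * var_p)


def r2_heatmap(true, pred):
    """Return heat[species][time]: R² across communities for each cell."""
    n_times = len(true[0])
    n_species = len(true[0][0])
    return [
        [
            pearson_r2(
                [community[t][s] for community in true],
                [community[t][s] for community in pred],
            )
            for t in range(n_times)
        ]
        for s in range(n_species)
    ]


# Hypothetical toy data: 3 communities, 2 time points, 2 species.
true = [
    [[1.0, 2.0], [1.0, 4.0]],
    [[2.0, 1.0], [2.0, 3.0]],
    [[3.0, 4.0], [4.0, 1.0]],
]
pred = [
    [[3.0, 2.5], [3.0, 3.0]],
    [[5.0, 0.5], [5.0, 3.5]],
    [[7.0, 3.0], [9.0, 2.0]],
]
heat = r2_heatmap(true, pred)
```

In the toy data, species 0 predictions are an exact affine function of the measurements, so its row is 1 at both time points, while species 1 is predicted imperfectly.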
Figure 5—figure supplement 3. Comparison of the discrete generalized Lotka-Volterra model to the LSTM using the same training algorithm.

(a) Schematic detailing the implementation of the discretized gLV model and the addition of a feed-forward neural network (FFN) to predict metabolites from species abundance, where A is a matrix of species interaction coefficients, r is a vector of growth rates, and ⊙ is the Hadamard product. (b) Schematic of the LSTM model, which uses an LSTM cell to compute a hidden state vector, which is the input to a feed-forward neural network that predicts a vector of species abundances and metabolite concentrations at each time step. See Computational Methods for a detailed description. (c) Scatter plot of experimentally measured (true) and predicted species absolute abundance using the approximate gLV model (N=2,625), evaluated on held-out test data after training on the same training data used to fit LSTM model M3. (d) Scatter plot of experimentally measured (true) and predicted species absolute abundance using the LSTM + FFN model (N=2,625). (e) Scatter plot of experimentally measured (true) and predicted metabolite concentrations using the gLV + FFN model (N=105 for every metabolite). (f) Scatter plot of experimentally measured (true) and predicted metabolite concentrations using the LSTM + FFN model (N=105 for every metabolite). Lines denote x=y.
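
As an illustration of the discretized gLV update in (a), the sketch below assumes a forward-Euler step, x(t+Δt) = x(t) + Δt · x(t) ⊙ (r + A x(t)). The caption names A, r, and ⊙ but does not give the exact discretization, so this form, the step size, the function names, and the toy parameters are assumptions; the FFN that maps abundances to metabolites is not covered.

```python
def glv_step(x, r, A, dt=1.0):
    """One assumed forward-Euler gLV step: x + dt * x ⊙ (r + A x)."""
    n = len(x)
    # Per-species net growth rate: intrinsic rate plus interaction terms.
    growth = [r[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Hadamard (element-wise) product of abundance and net growth rate.
    return [x[i] + dt * x[i] * growth[i] for i in range(n)]


def simulate(x0, r, A, n_steps, dt=1.0):
    """Iterate glv_step and return the trajectory, including the initial state."""
    trajectory = [list(x0)]
    for _ in range(n_steps):
        trajectory.append(glv_step(trajectory[-1], r, A, dt))
    return trajectory


# Hypothetical two-species parameters, for illustration only.
r = [0.5, 0.3]                     # growth rates
A = [[-1.0, -0.2], [0.1, -0.5]]    # interaction coefficients
trajectory = simulate([0.1, 0.1], r, A, n_steps=10, dt=0.1)
```

Because each update is multiplied element-wise by the current abundance, a species that starts at zero stays at zero, so the same parameters can be applied to communities defined by which species are present.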