iMeta. 2022 Mar 1;1(1):e3. doi: 10.1002/imt2.3

Predicting microbiome compositions from species assemblages through deep learning

Sebastian Michel‐Mata 1,2, Xu‐Wen Wang 3, Yang‐Yu Liu 3, Marco Tulio Angulo 4
PMCID: PMC9221840  NIHMSID: NIHMS1812831  PMID: 35757098

Abstract

Microbes can form complex communities that perform critical functions in maintaining the integrity of their environment or their hosts' wellbeing. Rationally managing these microbial communities requires improving our ability to predict how different species assemblages affect the final species composition of the community. However, making such a prediction remains challenging because of our limited knowledge of the diverse physical, biochemical, and ecological processes governing microbial dynamics. To overcome this challenge, we present a deep learning framework that automatically learns the map between species assemblages and community compositions from training data only, without knowing any of the above processes. First, we systematically validate our framework using synthetic data generated by classical population dynamics models. Then, we apply our framework to data from in vitro and in vivo microbial communities, including ocean and soil microbiota, Drosophila melanogaster gut microbiota, and human gut and oral microbiota. We find that our framework learns to perform accurate out‐of‐sample predictions of complex community compositions from a small number of training samples. Our results demonstrate how deep learning can enable us to better understand and potentially manage complex microbial communities.

Keywords: deep learning, microbiome composition, species assemblage


A deep learning method to predict microbial compositions from species assemblages.


Highlights

  • A deep learning framework was developed to predict community compositions from species assemblages.

  • The framework does not require knowing any microbial dynamics.

  • Validation in various data shows accurate predictions.

INTRODUCTION

Microbes can form complex multispecies communities that perform critical functions in maintaining the integrity of their environment [1,2] or the well‐being of their hosts [3, 4, 5, 6]. For example, microbial communities play key roles in nutrient cycling in soils [7] and crop growth [8]. In humans, the gut microbiota plays important roles in our nutrition [9], immune system response [10], pathogen resistance [11], and even our central nervous system response [5]. Still, species invasions (e.g., pathogens) and extinctions (e.g., due to antibiotic administration) produce changes in the species assemblages that may shift these communities to undesired compositions [12]. For instance, antibiotic administration can shift the human gut microbiota to compositions that make the host more susceptible to recurrent infections by pathogens [13]. Similarly, intentional changes in the species assemblages, such as fecal microbiota transplantations, can shift these communities back to desired “healthier” compositions [14,15]. Therefore, rationally managing these microbial communities requires that we can predict changes in the community composition based on changes in species assemblages [16]. Such predictions would also reduce management costs by helping us identify which changes in the species assemblages are more likely to yield a desired community composition. Unfortunately, making such a prediction remains challenging because of our limited knowledge of the diverse physical [17], biochemical [18], and ecological [19,20] processes governing microbial dynamics.

To overcome the above challenge, we present a deep learning framework that automatically learns the map between species assemblages and community compositions from training data only, without knowing the underlying microbial dynamics. We systematically validated our framework using synthetic data generated by classical ecological dynamics models, demonstrating its robustness to changes in the system dynamics and measurement errors. Then, we applied our framework to real data of both in vitro and in vivo communities, including ocean and soil microbial communities [21,22], Drosophila melanogaster gut microbiota [23], and human gut [24] and oral microbiota [25]. Across these diverse microbial communities, we find that our framework learns to predict accurate out‐of‐sample compositions given a few training samples. Our results show how deep learning can be an enabling ingredient for understanding and managing complex microbial communities.

PREDICTING MICROBIOME COMPOSITIONS FROM SPECIES ASSEMBLAGES

Consider the pool $\Omega = \{1, \dots, N\}$ of all microbial species (or taxa) that can inhabit an ecological habitat of interest, such as the human gut. A microbiome sample obtained from this habitat can be considered a local community assembled from $\Omega$ with a particular species assemblage. The species assemblage of a sample is characterized by a binary vector $z \in \{0,1\}^N$. Its $i$th entry satisfies $z_i = 1$ (or $z_i = 0$) if the $i$th species is present (or absent) in this sample. Each sample is also associated with a composition vector $p \in \Delta^N$. Its $i$th entry $p_i$ is the relative abundance of the $i$th species, and $\Delta^N = \{p \in \mathbb{R}^N_{\geq 0} : \sum_i p_i = 1\}$ is the probability simplex. Therefore, our problem can be formalized as learning the map

$$\varphi: z \in \{0,1\}^N \mapsto p \in \Delta^N, \qquad (1)$$

which assigns the composition vector $p = \varphi(z)$ to the species assemblage $z$. Note that the above map depends on many physical, biochemical, and ecological processes influencing the dynamics of microbial communities. These processes include the spatial structure of the ecological habitat [17], the chemical gradients of available resources [18], and inter/intraspecies interactions [20], among many others. Therefore, our limited knowledge of all these processes for most microbial communities renders the map of Equation (1) highly uncertain.

Next, we show that it is possible to predict the microbial composition from the species assemblage without knowing the mechanistic details of all the above processes. Our approach consists of learning the map $\varphi$ directly from a data set $\mathcal{D}$ with $S$ microbiome samples. We arrange each of those samples as a pair $(z, p)$ satisfying the map of Equation (1) (see Figure 1A). Note that microbiome samples are readily available using standard metagenomic sequencing techniques.
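The sketch below shows the data format concretely. It is a minimal illustration, and the function name `sample_to_pair` is hypothetical rather than taken from the paper. It converts one sample's measured abundances, for example read counts, into a pair $(z, p)$. The sketch assumes that a species counts as present whenever its measured abundance is positive.

```python
def sample_to_pair(abundances):
    """Turn one sample's non-negative abundances (e.g., read counts) into (z, p).

    z_i = 1 if species i was detected (abundance > 0), else 0.
    p_i = abundance_i / total, so p lies on the probability simplex.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("a sample must contain at least one detected species")
    z = [1 if a > 0 else 0 for a in abundances]
    p = [a / total for a in abundances]
    return z, p


# Hypothetical read counts for a pool of N = 5 species.
z, p = sample_to_pair([120, 0, 30, 50, 0])
print(z)  # [1, 0, 1, 1, 0]
print(p)  # [0.6, 0.0, 0.15, 0.25, 0.0]
```

A data set $\mathcal{D}$ is then simply a list of such pairs, one per sample.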

Figure 1.


A deep learning framework to predict microbiome compositions from species assemblages. We illustrate this framework using experimental data from a pool of $N = 5$ bacterial species in Drosophila melanogaster gut microbiota [23]: Lactobacillus plantarum (blue), Lactobacillus brevis (pink), Acetobacter pasteurianus (yellow), Acetobacter tropicalis (green), and Acetobacter orientalis (purple). (A) We randomly split this data set into training ($\mathcal{D}_1$) and test ($\mathcal{D}_2$) data sets, which contain 80% and 20% of the samples, respectively. Each data set contains pairs $(z, p)$ with the species assemblage $z \in \{0,1\}^N$ (top) and its corresponding composition $p \in \Delta^N$ (bottom) from each sample. (B) To predict compositions from species assemblages, our compositional neural ordinary differential equation (cNODE) framework consists of a solver for the ordinary differential equation shown in Equation (2), together with a chosen parametrized function $f_\theta$. During training, the parameters $\theta$ are adjusted so that cNODE learns to predict the composition $\hat{p} \in \Delta^N$ of each species assemblage $z \in \{0,1\}^N$ in $\mathcal{D}_1$. (C) After training, the performance is evaluated by predicting the composition of never‐seen‐before species assemblages in the test data set $\mathcal{D}_2$. In this experimental microbiota, cNODE learned to perform accurate predictions of the composition in the test data set. For example, in the assemblage of species 3 and 4 (sample 26), cNODE correctly predicts that the composition is strongly dominated by a single species.

Conditions for predicting compositions from species assemblages

To ensure that the problem of learning $\varphi$ from $\mathcal{D}$ is mathematically well‐posed, we make the following assumptions. First, we assume that the species pool in the habitat has universal dynamics [26]. That is, different local communities of this habitat can be described by the same population dynamics model with the same parameters. This assumption is necessary because, otherwise, the map $\varphi$ does not exist. Predicting community compositions from species assemblages would then have to be done in a sample‐specific manner, which is a daunting task. The universal dynamics assumption is expected to hold when the samples in the data set were collected from similar environments, because the environmental factors can then be treated as roughly fixed and need not be used for composition prediction. For in vitro communities, the universal dynamics assumption is satisfied if samples were collected from the same experiment, or from multiple experiments with very similar environmental conditions. For in vivo communities, empirical evidence indicates that the human gut and oral microbiota of healthy adults, as well as certain environmental microbiota, display strong universal dynamics [26].

Second, we assume that the compositions of the collected samples represent steady states of the microbial communities. This assumption is natural because the map $\varphi$ is not well defined for highly fluctuating microbial compositions. We note that observational studies of host‐associated microbial communities, such as the human gut microbiota, indicate that they remain close to stable steady states in the absence of drastic dietary changes or antibiotic administration [24,27,28].

Finally, we assume that for each species assemblage $z \in \{0,1\}^N$ there is a unique steady‐state composition $p \in \Delta^N$. In particular, this assumption requires that true multistability does not exist for the species pool (or any subset of it) in this habitat. This assumption is required because, otherwise, a single assemblage $z$ could be associated with several compositions. The map $\varphi$ would then not be single‐valued (i.e., it would not be a function), and predicting community compositions would become mathematically ill‐defined.

In practice, we expect that the above three assumptions are not strictly satisfied. Therefore, any algorithm that predicts microbial compositions from species assemblages needs to be systematically tested to ensure its robustness against errors caused by violations of these assumptions. We can check a priori whether a microbiome data set satisfies the universal dynamics assumption using the Dissimilarity‐Overlap analysis [26]. Yet, it is mathematically challenging to check a priori whether the other two assumptions are satisfied for real data. Nevertheless, the ability to accurately predict microbiome compositions from species assemblages is a posteriori evidence of the validity of the above three assumptions.

Learning to predict species compositions

Consider building a map $\hat{\varphi}_\theta: \{0,1\}^N \to \Delta^N$, parametrized by $\theta \in \mathbb{R}^p$, that gives the predicted composition $\hat{p} = \hat{\varphi}_\theta(z)$ associated with the species assemblage $z$. Under the above assumptions, we can in principle learn the map of Equation (1) from the data set $\mathcal{D}$ by training $\hat{\varphi}_\theta$ (i.e., adjusting its parameters $\theta$ so that $\hat{\varphi}_\theta$ approximates $\varphi$). Existing deep learning network architectures and training methods [29,30], such as ResNet [31] trained with a gradient descent algorithm, are natural candidates to solve this problem (Methods Section). We found that it is possible to train a ResNet architecture for predicting microbiome compositions in simple cases like small in vitro communities (Supporting Information Note S.2.1). But for large in vivo communities like the human gut microbiota, ResNet does not perform well (Figure S1). This poor performance is likely due to a vanishing gradient problem during training [32]. The problem arises because the ResNet architecture must satisfy two restrictions that are very particular to the map of Equation (1). First, the predicted compositions $\hat{p}$ must be compositional (i.e., $\hat{p} \in \Delta^N$). Second, the predicted relative abundance of any species absent from the assemblage must be identically zero (i.e., $z_i = 0$ should imply that $\hat{p}_i = 0$). Satisfying both restrictions forces a normalization in the output layer of ResNet, and this normalization hampers gradient‐based training (Methods Section).
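The two restrictions can be checked with a small helper. The sketch below is hypothetical: it tests whether a candidate prediction $\hat{p}$ is admissible for an assemblage $z$. The tolerance `tol` is an assumption that absorbs floating‐point round‐off in the sum. Absent species are still required to be exactly zero, matching the "identically zero" wording above.

```python
def satisfies_restrictions(z, p_hat, tol=1e-9):
    """Check the two restrictions any predictor phi_hat must satisfy.

    1. p_hat is compositional: non-negative entries that sum to one.
    2. Species absent from the assemblage (z_i = 0) are predicted exactly zero.
    """
    compositional = all(v >= -tol for v in p_hat) and abs(sum(p_hat) - 1.0) <= tol
    zeros_respected = all(v == 0.0 for z_i, v in zip(z, p_hat) if z_i == 0)
    return compositional and zeros_respected


z = [1, 0, 1]
print(satisfies_restrictions(z, [0.7, 0.0, 0.3]))  # True
print(satisfies_restrictions(z, [0.6, 0.1, 0.3]))  # False: absent species 2 gets mass
```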

To overcome the limitations of traditional deep learning frameworks based on neural networks (such as ResNet) in predicting microbial compositions from species assemblages, we developed cNODE (compositional Neural Ordinary Differential Equation; see Methods Section and Figure 1B). We designed the cNODE framework using the notion of Neural Ordinary Differential Equations, which can be interpreted as a continuous limit of the ResNet architecture [33]. Crucially, the architecture and initialization of cNODE ensure that the above two restrictions are satisfied by construction. Furthermore, cNODE's architecture naturally circumvents the typical difficulties of handling zero values in compositional data analysis. Zero abundance values often occur in human microbiome datasets because compositions are highly personalized across hosts (i.e., different individuals tend to have different species assemblages). To evaluate the prediction error of cNODE, one can choose any dissimilarity measure between the predicted and actual compositions associated with a given species assemblage. Once this dissimilarity measure is selected, we train cNODE using a meta‐learning algorithm for a given number of epochs to minimize the average prediction error on a training data set $\mathcal{D}_1$ (Methods Section). This meta‐learning algorithm improves the ability of cNODE to predict the composition of never‐seen‐before species assemblages. Once cNODE is trained, we evaluate its performance by calculating its average prediction error on a test data set $\mathcal{D}_2$ containing samples not used during training.

Figure 1 illustrates the application of cNODE to a small experimental community of $N = 5$ bacterial species of Drosophila melanogaster gut microbiota studied by Gould et al. [23]. The data set $\mathcal{D}$ obtained from this study has $S = 26$ samples (Methods Section). To illustrate the potential of cNODE, we consider a training data set of 21 randomly chosen samples (Figure 1A). As explained before, we arrange each training sample as a pair of “species assemblage” $z$ (top) and “composition” $p$ (bottom). Once trained, cNODE is mainly used to predict the composition of “never‐seen‐before” species assemblages, namely, “test assemblages” that are not in the training data set. To evaluate the performance of cNODE on such test assemblages, we use the remaining five experimental samples, which were not included during training, as the test data set. Figure 1C shows that the trained cNODE predicts accurate compositions for the test species assemblages. For example, cNODE predicts that in the assemblage of species 3 with species 4 (which was not used for training), species 3 will become nearly extinct. This prediction agrees well with the actual experimental result (sample 26 in Figure 1C).

RESULTS

In silico validation of cNODE with large species pools

We first evaluated cNODE's performance using in silico microbiome samples. These samples were generated as steady‐state compositions of pools with $N = 100$ species and Generalized Lotka‐Volterra (GLV) population dynamics (Methods Section). We characterize the population dynamics of a species pool using two parameters. The first is the connectivity $C > 0$, which characterizes how likely it is that two species in the pool interact directly. The second is the typical interaction strength $\sigma \geq 0$, which characterizes the typical effect of one species on the per‐capita growth rate of another species if they interact. Different habitats where the species pool is assembled are thus represented by different parameters $(C, \sigma)$. Despite its simplicity, the GLV model successfully describes the population dynamics of microbial communities in diverse environments, from soil [39] and lakes [40] to the human gut [11,41,42].
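The GLV parametrization actually used is given in the Methods Section. The sketch below assumes only a simple choice for illustration: growth rates $r_i \sim U(0,1)$, self‐interactions $a_{ii} = -1$, and, with probability $C$, off‐diagonal interactions $a_{ij} \sim \mathcal{N}(0, \sigma^2)$. The dynamics $\dot{x}_i = x_i (r_i + \sum_j a_{ij} x_j)$ are integrated with forward Euler, and the final state is assumed to be close to steady state. All names and defaults are hypothetical.

```python
import random


def random_glv_pool(n, connectivity, sigma, seed=0):
    """Draw a hypothetical GLV species pool.

    r_i ~ Uniform(0, 1); a_ii = -1; for i != j, a_ij ~ Normal(0, sigma^2)
    with probability `connectivity`, otherwise a_ij = 0.
    """
    rng = random.Random(seed)
    r = [rng.uniform(0.0, 1.0) for _ in range(n)]
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                a[i][j] = -1.0
            elif rng.random() < connectivity:
                a[i][j] = rng.gauss(0.0, sigma)
    return r, a


def glv_steady_composition(z, r, a, dt=0.01, steps=20000):
    """Forward-Euler integration of dx_i/dt = x_i (r_i + sum_j a_ij x_j).

    Starts from x_i = 0.1 for present species (z_i = 1) and 0 otherwise and
    returns the final relative abundances, assumed close to the steady state.
    """
    n = len(z)
    x = [0.1 if z_i else 0.0 for z_i in z]
    for _ in range(steps):
        growth = [r[i] + sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Absent species stay exactly zero because the update is proportional to x_i.
        x = [max(x[i] + dt * x[i] * growth[i], 0.0) for i in range(n)]
    total = sum(x)
    return [x_i / total for x_i in x]


r, a = random_glv_pool(n=8, connectivity=0.5, sigma=0.1, seed=1)
z = [1, 1, 0, 1, 0, 1, 1, 0]
p = glv_steady_composition(z, r, a, steps=5000)
```

Calling `glv_steady_composition` for many random assemblages `z` with one fixed pool `(r, a)` yields in silico pairs $(z, p)$ that share universal dynamics by construction.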

Figure 2A shows the performance of cNODE during training. In this panel, the training and test datasets have $S = N$ samples. The training prediction error decreases with the number of training epochs, especially for low values of $\sigma$. Interestingly, the test prediction error reaches a plateau after sufficient training epochs, regardless of the value of $\sigma$. This plateau indicates that cNODE was adequately trained with low overfitting. The plateau's value increases with $\sigma$ (i.e., the test prediction error increases). This result remains valid for different training data set sizes and different values of the parameters $(C, \sigma)$. In all these cases, the test prediction error reaches a plateau whose value increases with either $C$ (Figure 2B) or $\sigma$ (Figure 2C). Crucially, such an increase can be compensated by increasing the number of samples in the training data set. This result implies that, in general, cNODE requires more training samples in species pools with higher connectivity or higher typical interaction strength between species. Overall, these results suggest that $S \geq 2N$ training samples are enough to adequately train cNODE, regardless of the habitat type. In this case, we also observe a high correlation between the true and predicted compositions in the test data set, as expected from a low test prediction error (Figure S2).

Figure 2.


In silico validation of compositional neural ordinary differential equation (cNODE). Results are for pools of $N = 100$ species with Generalized Lotka‐Volterra population dynamics (A–E) or with a population dynamics model with nonlinear functional responses that admits true multistability (F). The population dynamics is characterized by two parameters: the connectivity $C > 0$ and the typical interaction strength $\sigma \geq 0$. In panels B–F, thin lines represent the prediction errors of 10 validations, each training cNODE on a different data set. Mean errors are shown with thick lines. (A) Training cNODE using $S = N$ samples with connectivity $C = 0.5$ and different typical interaction strengths $\sigma$. (B) Performance of cNODE for in silico data sets with $\sigma = 0.1$ and different connectivities $C$. (C) Performance of cNODE for in silico data sets with $C = 0.5$ and different interaction strengths $\sigma$. (D) Performance of cNODE for in silico data sets with nonuniversal dynamics. The lack of universal dynamics is quantified by $\eta$. For all data sets, $\sigma = 0.1$ and $C = 0.5$. (E) Performance of cNODE for in silico data sets with measurement errors quantified by $\varepsilon$. For all data sets, $\sigma = 0.1$ and $C = 0.5$. (F) Performance of cNODE for in silico data sets with multiple interior equilibria, quantified by the probability $\mu \in [0,1]$ of finding multiple equilibria. For all data sets, $C = 0.5$ and $\sigma = 0.1$.

To systematically evaluate the robustness of cNODE against violations of its three key assumptions, we performed three types of validations. In the first validation, we generated datasets that violate the assumption of universal dynamics (Methods Section). In this case, if two species interact, the effect of one species on the per‐capita growth rate of the other changes on average by $\eta \geq 0$ among samples in the data set. Therefore, $\eta = 0$ corresponds to universal dynamics, and larger values of $\eta$ correspond to greater losses of universal dynamics. We find that cNODE is robust against loss of universality: its asymptotic prediction error changes continuously with $\eta$, and the test prediction error remains reasonably low up to $\eta = 0.4$ (Figure 2D). cNODE is also robust to losses of universal dynamics that occur when species interact with different species in a sample‐specific manner (Figure S3).

In the second validation, we evaluated the robustness of cNODE against measurement noise in the relative abundances of species (Methods Section). We characterize the noise intensity by a constant $\varepsilon \geq 0$. The measurement noise may cause some absent species to be measured as present and vice versa. We find that cNODE performs adequately up to $\varepsilon = 0.025$ (Figure 2E).
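The Methods Section defines the noise model actually used, and it is not reproduced here. As a hypothetical stand‐in, the sketch below adds zero‐mean Gaussian noise with standard deviation $\varepsilon$ to each relative abundance. It then clips negative values to zero and renormalizes. This reproduces the qualitative effect described above: an absent species can be observed as present, and a rare species can vanish.

```python
import random


def add_measurement_noise(p, epsilon, seed=None):
    """Hypothetical noise model: p_i + Normal(0, epsilon^2), clipped at zero,
    then renormalized. Returns the *observed* assemblage and composition."""
    rng = random.Random(seed)
    noisy = [max(p_i + rng.gauss(0.0, epsilon), 0.0) for p_i in p]
    total = sum(noisy)
    if total == 0.0:
        raise ValueError("noise removed every species; try a smaller epsilon")
    p_obs = [v / total for v in noisy]
    # The observed assemblage is re-derived from the noisy composition.
    z_obs = [1 if v > 0.0 else 0 for v in p_obs]
    return z_obs, p_obs


z_obs, p_obs = add_measurement_noise([0.6, 0.0, 0.3, 0.1], epsilon=0.025, seed=0)
```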

In the final validation, we generated datasets with true multistability by simulating a population dynamics model with nonlinear functional responses (Methods Section). For each species assemblage, these functional responses generate two interior equilibria in different “regimes”: one regime with low biomass and the other with high biomass. Therefore, each species assemblage can have two associated compositions. We built training datasets by choosing a fraction $(1-\mu)$ of samples from the first regime and the rest from the second regime. We find that cNODE is robust enough to provide reasonable predictions up to $\mu = 0.2$ (Figure 2F).
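The following sketch shows how such a multistable training set can be assembled. It assumes that both equilibrium compositions of every assemblage have already been computed and are passed in as two dictionaries keyed by assemblage. The function name and data layout are hypothetical. The sketch draws exactly $\mathrm{round}(\mu S)$ assemblages from the high‐biomass regime, which is one way to realize "a fraction $(1-\mu)$ from the first regime".

```python
import random


def mix_regimes(low_regime, high_regime, mu, seed=0):
    """Build a training set with true multistability.

    low_regime / high_regime map each assemblage (a tuple of 0/1 entries) to
    its composition in the low- or high-biomass regime. A fraction mu of the
    assemblages (rounded to the nearest integer) takes the high-biomass
    composition; the remaining fraction 1 - mu takes the low-biomass one.
    """
    assemblages = sorted(low_regime)
    rng = random.Random(seed)
    n_high = round(mu * len(assemblages))
    from_high = set(rng.sample(assemblages, n_high))
    return [(z, high_regime[z] if z in from_high else low_regime[z])
            for z in assemblages]
```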

cNODE predicts microbiome compositions in real microbial communities

We evaluated cNODE using six microbiome datasets from different habitats (Supporting Information Note S4). The first data set consists of $S = 275$ samples [43] of the ocean microbiome at the phylum level, resulting in $N = 73$ different taxa. The second data set consists of $S = 26$ in vivo samples of Drosophila melanogaster gut microbiota of $N = 5$ species [23], as described in Figure 1. The third data set has $S = 93$ samples of in vitro communities of $N = 8$ soil bacterial species [21]. The fourth data set contains $S = 113$ samples of the Central Park soil microbiome [22] at the phylum level ($N = 36$ phyla). The fifth data set contains $S = 150$ samples of the human oral microbiome [25] at the genus level ($N = 73$ genera). The final data set has $S = 106$ samples of the human gut microbiome from the Human Microbiome Project [24] at the genus level ($N = 58$ genera). To ensure cNODE has enough training samples, for each data set we chose to work at a taxonomic level for which the number of samples satisfies $S \gtrsim 2N$, where $N$ is the total number of taxa at that level. Based on the Dissimilarity‐Overlap analysis, all six microbiome datasets display the signature of universal microbial dynamics to some extent (Supporting Information Note S4.5 and Figure S4).

To evaluate cNODE, we performed leave‐one‐out cross‐validation on each data set (Methods Section). The median test prediction errors were 0.06, 0.066, 0.079, 0.107, 0.211, and 0.242 for the six datasets, respectively (Figure 3A). These errors are consistent with the strength of universality observed in each data set. To understand the meaning of these errors, we inspected, for each data set, five pairs $(p, \hat{p})$ corresponding to the observed and out‐of‐sample predicted compositions of five samples. We chose the five samples based on their test prediction error. Specifically, we selected the samples with the minimal error, with errors closest to the first quartile, the median, and the third quartile, and with the maximal error (columns in Figure 3B–G, from left to right). Samples with errors below the third quartile provide acceptable predictions (left three columns in Figure 3B–G). Samples with errors close to the third quartile or with the maximal error show salient differences between the observed and predicted compositions (right two columns in Figure 3B–G). In the sample with the largest error in the human gut data set (Figure 3G, rightmost column), the observed composition is dominated by Prevotella (pink), while the predicted composition is dominated by Bacteroides (blue). This drastic difference is likely due to different dietary patterns [44]. These results also confirm that about $2N$ or more training samples are enough to adequately train cNODE, regardless of the habitat type. Using other taxonomic levels in these experimental datasets may change the performance of cNODE, because doing so changes the number of taxa $N$ and hence the effective sample size relative to $N$.
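Leave‐one‐out cross‐validation and the choice of the five representative samples can be sketched as follows. `fit`, `predict`, and `dissimilarity` are hypothetical placeholders for any trainable predictor (e.g., cNODE) and any dissimilarity measure. The quartiles come from Python's `statistics.quantiles`, whose default interpolation method may differ from the one used to build Figure 3.

```python
import statistics


def leave_one_out_errors(data, fit, predict, dissimilarity):
    """Leave-one-out cross-validation on a list of (z, p) pairs.

    For each sample: train on all the others, predict the held-out
    composition, and record its dissimilarity to the observed one.
    """
    errors = []
    for k, (z_test, p_test) in enumerate(data):
        model = fit(data[:k] + data[k + 1:])
        errors.append(dissimilarity(p_test, predict(model, z_test)))
    return errors


def representative_samples(errors):
    """Indices of the samples with the minimal error, errors closest to the
    first quartile, median, and third quartile, and the maximal error."""
    q1, median, q3 = statistics.quantiles(errors, n=4)

    def closest(target):
        return min(range(len(errors)), key=lambda i: abs(errors[i] - target))

    return [errors.index(min(errors)), closest(q1), closest(median),
            closest(q3), errors.index(max(errors))]
```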

Figure 3.


Predicting the composition of real microbiomes from species assemblages. Results of the compositional neural ordinary differential equation (cNODE) applied to six experimental microbial communities using leave‐one‐out cross‐validation. (A) Prediction error obtained from a leave‐one‐out cross‐validation of each data set. (B–G) For each data set, we show the true and predicted compositions corresponding to the minimal prediction error, the errors closest to the first quartile, the median, and the third quartile, and the maximal prediction error (including outliers). All compositions shown in (B–G) are out‐of‐sample predictions.

DISCUSSION

cNODE is a deep learning framework to predict microbial compositions from species assemblages only. We validated its performance using in silico, in vitro, and in vivo microbial communities and found that cNODE learns to perform accurate out‐of‐sample predictions from a few training samples. Classic methods for predicting species abundances in microbial communities use inference based on population dynamics models [21,41,45,46]. However, these methods typically require high‐quality time‐series data of species absolute abundances, which can be difficult and expensive to obtain for in vivo microbial communities. cNODE requires neither absolute abundances nor time‐series data. The cost to pay, compared with the classic methods, is that cNODE cannot be mechanistically interpreted because of the lack of identifiability inherent to compositional data [47,48]. We also found that cNODE can outperform existing deep‐learning architectures like ResNet, especially when predicting the composition of large in vivo microbiomes. Recently, Maynard et al. [49] proposed a statistical method to predict steady‐state abundances in ecological communities. This method requires absolute abundance data of species, which are not available in most microbiome datasets. cNODE can outperform this statistical method despite using only relative abundances (Supporting Information Note S6). See also Supporting Information Note S5 and Figure S5 for a discussion of how our framework compares to other related works.

Deep learning techniques are actively applied in microbiome research [50, 51, 52, 53, 54, 55, 56, 57, 58]. Examples include classifying samples that shifted to a diseased state [59], predicting infection complications in immunocompromised patients [60], and predicting the temporal or spatial evolution of certain species collections [61,62]. However, to the best of our knowledge, the potential of deep learning for predicting the effect of changing species assemblages had not been explored or validated before. Our framework, based on the notion of neural ODEs [33], is a baseline that could be improved by incorporating additional information. For example, incorporating available environmental information such as pH, temperature, age, BMI, body site, and host diet could enhance the prediction accuracy. This additional information would help us predict the species present in different environments. Adding “hidden variables” such as the unmeasured total biomass or unmeasured resources to our ODE could enhance the expressivity of cNODE [63,64], although it may make training more challenging. Finally, if knowledge of the genetic similarity between species is available, it can be incorporated into the loss function through the phylogenetic Wasserstein distance [65], which provides a well‐defined gradient [66].

We anticipate that a useful application of our framework is to predict whether adding a collection of species to a local community can bring the abundance of a target species below the practical extinction threshold. Given a local community containing the target (and potentially pathogenic) species, we could use a greedy optimization algorithm to identify a minimal collection of species to add such that our architecture predicts they will decolonize the target species.
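A greedy search of this kind might look like the sketch below. It assumes that `predict` is any trained map from an assemblage to a composition, such as a trained cNODE. `toy_predict` is a purely hypothetical stand‐in used only for illustration. Being greedy, the search returns a small collection of species, but it is not guaranteed to find the smallest one.

```python
def greedy_decolonization(z, target, candidates, predict, threshold=1e-3):
    """Greedy heuristic: repeatedly add the candidate species whose addition most
    lowers the predicted relative abundance of `target`, until that abundance
    drops below `threshold`. Returns the added species, or None if the
    candidates run out first. The result need not be the smallest possible set."""
    z = list(z)
    added = []
    remaining = [c for c in candidates if z[c] == 0]
    while predict(z)[target] >= threshold:
        if not remaining:
            return None
        best = min(remaining,
                   key=lambda c: predict(z[:c] + [1] + z[c + 1:])[target])
        z[best] = 1
        added.append(best)
        remaining.remove(best)
    return added


def toy_predict(z):
    """Hypothetical stand-in for a trained model: species 3 excludes species 0,
    and every other present species receives equal weight."""
    w = [0.0 if (i == 0 and z[3] == 1) else float(z_i) for i, z_i in enumerate(z)]
    total = sum(w)
    return [w_i / total for w_i in w]


print(greedy_decolonization([1, 1, 0, 0], target=0, candidates=[2, 3], predict=toy_predict))
# [3]
```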

Our framework has a few limitations. For example, cNODE cannot accurately predict the abundance of taxa that have never been observed in the training data set. An additional limitation of our current architecture is that it assumes that true multistability does not exist—namely, a community with a given species assemblage permits only one stable steady‐state, where each species in the collection has a positive abundance. For complex microbial communities such as the human gut microbiota, the highly personalized species collections make it very difficult to decide if true multistability exists or not. We could extend our framework to handle multistability by predicting a probability density function for the abundance of each species. True multistability would correspond to predicting a multimodal density function in such a case. Datasets with insufficient sequencing depth or coverage can produce samples with “fake” multistability, leading to prediction errors that our framework cannot resolve. Indeed, the in‐silico validation of cNODE in Figure 2 indicates that measurement errors can significantly degrade the performance of cNODE.

In conclusion, the large number of species and the complex, uncertain dynamics of microbial communities have been fundamental obstacles to learning how these communities respond to alterations, such as removing or adding species. Moving this field forward may require giving up some ability to interpret the mechanisms behind their response. In this sense, deep learning methods could enable us to predict and rationally manage the dynamics of complex microbial communities.

METHODS

A ResNet architecture for predicting microbiome compositions from species assemblages

ResNet, a widely used tool in image processing, is a cascade of $L \geq 1$ hidden layers. The state $h_\ell \in \mathbb{R}^N$ of the $\ell$th hidden layer satisfies $h_\ell = h_{\ell-1} + f_{\theta_\ell}(h_{\ell-1})$, $\ell = 1, \dots, L$, for some parametrized function $f_{\theta_\ell}$ with parameters $\theta_\ell$. These hidden layers are connected to the input layer $h_0 = g_{\text{in}}(z)$ and the output layer $\hat{p} = g_{\text{out}}(h_L)$, where $g_{\text{in}}$ and $g_{\text{out}}$ are some functions. Crucially, for our problem, any architecture must satisfy two restrictions: (1) the vector $\hat{p}$ must be compositional (i.e., $\hat{p} \in \Delta^N$); and (2) the predicted relative abundance of any absent species must be identically zero (i.e., $z_i = 0$ should imply that $\hat{p}_i = 0$). Satisfying both restrictions simultaneously requires two things. First, the output layer must be a normalization of the form $\hat{p}_i = z_i h_{L,i} / \sum_j z_j h_{L,j}$. Second, $f_{\theta_\ell}$ must be a non‐negative function, because $h_L \geq 0$ is required for the normalization to be correct. The poor performance of ResNet on large communities is likely due to this normalization of the output layer, which makes neural networks harder to train because of vanishing gradients [30]. The vanishing gradient problem is often solved by using other normalization layers, such as softmax or sparsemax layers [34]. However, we cannot use these layers because they do not satisfy the second restriction. We also note that ResNet becomes a universal approximator only in the limit $L \to \infty$, which again complicates training [32].
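The sketch below is a toy ResNet that satisfies both restrictions in the way described above. It is illustrative only. It assumes $g_{\text{in}}(z) = z$ and residual blocks $f_\ell(h) = \mathrm{ReLU}(W_\ell h)$, which are non‐negative so that $h_L \geq 0$. It uses the masked normalization as $g_{\text{out}}$. These choices are hypothetical and are not the configuration of Supporting Information Note S.2.1.

```python
def resnet_predict(z, weights):
    """Toy ResNet satisfying both restrictions (pure-Python sketch).

    Input layer g_in(z) = z. Each residual block adds f_l(h) = ReLU(W_l h),
    which is non-negative, so h_L >= 0. Output layer (the normalization):
    p_hat_i = z_i h_{L,i} / sum_j z_j h_{L,j}.
    """
    h = [float(z_i) for z_i in z]
    for w in weights:
        pre_activation = [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in w]
        h = [h_i + max(a, 0.0) for h_i, a in zip(h, pre_activation)]
    masked = [z_i * h_i for z_i, h_i in zip(z, h)]
    total = sum(masked)  # > 0 whenever at least one species is present
    return [m / total for m in masked]


weights = [
    [[0.1, 0.2, 0.0], [0.0, 0.5, -0.3], [0.4, 0.0, 0.1]],
    [[0.0, -0.2, 0.3], [0.1, 0.1, 0.1], [-0.5, 0.2, 0.0]],
]
p_hat = resnet_predict([1, 0, 1], weights)
```

The final division couples every output to every hidden unit of the last layer. The text above identifies this normalization as the likely source of the training difficulties.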

The cNODE architecture

In cNODE, an input species assemblage $z \in \{0,1\}^N$ is first transformed into the initial condition $h(0) = z / (\mathbf{1}^\top z) \in \Delta^N$, where $\mathbf{1} = (1, \dots, 1)^\top \in \mathbb{R}^N$ (left in Figure 1B). This initial condition is used to solve the set of nonlinear ODEs

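$$\frac{dh(\tau)}{d\tau} = h(\tau) \odot \left[ f_\theta\big(h(\tau)\big) - \mathbf{1}\, h(\tau)^\top f_\theta\big(h(\tau)\big) \right]. \qquad (2)$$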

Here, the independent variable $\tau \geq 0$ represents a virtual “time”. The expression $h \odot v$ is the entry‐wise multiplication of the vectors $h, v \in \mathbb{R}^N$. The function $f_\theta: \Delta^N \to \mathbb{R}^N$ can be any continuous function parametrized by $\theta$. For example, it can be the linear function $f_\theta(h) = \Theta h$ with parameter matrix $\Theta \in \mathbb{R}^{N \times N}$ (bottom in Figure 1B), or a more complicated function represented by a feedforward deep neural network. Equation (2) is a general form of the replicator equation, a canonical model in evolutionary game theory [35], in which $f_\theta$ plays the role of the fitness function. By choosing a final integration “time” $\tau_c > 0$, Equation (2) is numerically integrated to obtain the prediction $\hat{p} = h(\tau_c)$, which is the output of cNODE (right in Figure 1B). We choose $\tau_c = 1$ without loss of generality, as $\tau$ in Equation (2) can be rescaled by multiplying $f_\theta$ by a constant. The cNODE thus implements the map

$$\hat{\varphi}_\theta: z \in \{0,1\}^N \mapsto \hat{p} \in \Delta^N, \qquad (3)$$

taking an input species assemblage $z$ to the predicted composition $\hat{p}$ (see Supporting Information Note S.1 for implementation details). Equation (2) is key to cNODE because it guarantees that the two restrictions imposed before are naturally satisfied. Namely, $\hat{p} \in \Delta^N$ because the conditions $h(0) \in \Delta^N$ and $\mathbf{1}^\top \frac{dh}{d\tau} = 0$ imply that $h(\tau) \in \Delta^N$ for all $\tau \geq 0$. Additionally, $z_i = 0$ implies $\hat{p}_i = 0$ because $h(0)$ and $z$ have the same zero pattern, and the right‐hand side of Equation (2) is entry‐wise multiplied by $h$.
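A minimal pure‐Python sketch of the cNODE forward pass follows. It assumes the linear choice $f_\theta(h) = \Theta h$ and a fixed‐step fourth‐order Runge‐Kutta solver. The implementation in Supporting Information Note S.1 may use a different ODE solver and step control, and the function names here are illustrative. Every solver stage multiplies by $h$, so absent species stay exactly zero. On the simplex, each stage also satisfies $\mathbf{1}^\top \frac{dh}{d\tau} = 0$, so the entries keep summing to one up to round‐off.

```python
def linear_fitness(theta, h):
    """f_theta(h) = Theta h, with Theta given as a list of rows."""
    return [sum(t_ij * h_j for t_ij, h_j in zip(row, h)) for row in theta]


def replicator_rhs(theta, h):
    """Right-hand side of Eq. (2): h_i * (f_i(h) - h^T f(h)) for each i."""
    f = linear_fitness(theta, h)
    mean_fitness = sum(h_i * f_i for h_i, f_i in zip(h, f))
    return [h_i * (f_i - mean_fitness) for h_i, f_i in zip(h, f)]


def cnode_predict(z, theta, tau_c=1.0, steps=100):
    """Map an assemblage z to p_hat = h(tau_c) using fixed-step RK4."""
    h = [z_i / sum(z) for z_i in z]  # h(0) = z / (1^T z)
    dt = tau_c / steps
    for _ in range(steps):
        k1 = replicator_rhs(theta, h)
        k2 = replicator_rhs(theta, [x + 0.5 * dt * k for x, k in zip(h, k1)])
        k3 = replicator_rhs(theta, [x + 0.5 * dt * k for x, k in zip(h, k2)])
        k4 = replicator_rhs(theta, [x + dt * k for x, k in zip(h, k3)])
        h = [x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for x, a, b, c, d in zip(h, k1, k2, k3, k4)]
    return h


theta = [[0.0, 1.0, -1.0],
         [-0.5, 0.0, 0.3],
         [0.2, 0.4, 0.0]]
p_hat = cnode_predict([1, 1, 0], theta)
```

Although $f_\theta$ is linear here, the resulting map from `z` to `p_hat` is nonlinear because it passes through the ODE solution.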

Training cNODE

We train cNODE by adjusting the parameters $\theta$ so that $\hat{\varphi}_\theta$ approximates $\varphi$. To do this, we first choose a distance or dissimilarity measure $d(p, q)$ to quantify how dissimilar two compositions $p, q \in \Delta^N$ are. We use the Bray‐Curtis dissimilarity [36] to present our results; however, the performance of cNODE is quite robust to the specific distance or dissimilarity measure used (Figure S6). Specifically, for a data set $\mathcal{D}_i \subseteq \mathcal{D}$, we use as loss function the prediction error

$$E(\mathcal{D}_i) = \frac{1}{|\mathcal{D}_i|} \sum_{(z, p) \in \mathcal{D}_i} d\big(p, \hat{\varphi}_\theta(z)\big). \qquad (4)$$
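For compositions, the Bray‐Curtis dissimilarity is $\sum_i |p_i - q_i| / \sum_i (p_i + q_i)$, which reduces to $\tfrac{1}{2}\sum_i |p_i - q_i|$ when both vectors sum to one. A minimal sketch of this measure and of the prediction error in Equation (4), where `uniform_predictor` is a hypothetical baseline standing in for the trained map $\hat{\varphi}_\theta$:

```python
def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(pi - qi) for pi, qi in zip(p, q))
    den = sum(pi + qi for pi, qi in zip(p, q))
    return num / den


def prediction_error(dataset, predictor, d=bray_curtis):
    """Equation (4): mean dissimilarity between observed and predicted compositions."""
    return sum(d(p, predictor(z)) for z, p in dataset) / len(dataset)


# Hypothetical baseline predictor: equal abundance for every present species.
def uniform_predictor(z):
    return [zi / sum(z) for zi in z]


data = [([1, 1], [0.5, 0.5]), ([1, 1], [0.75, 0.25])]
err = prediction_error(data, uniform_predictor)  # (0.0 + 0.25) / 2 = 0.125
```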

Second, we randomly split the data set $\mathcal{D}$ into a training set $\mathcal{D}_1$ and a test set $\mathcal{D}_2$. Next, we choose an adequate functional form for $f_\theta$. In our experiments, we found that the linear function $f_\theta(h) = \Theta h$, $\Theta \in \mathbb{R}^{N \times N}$, provides accurate predictions for the composition of in silico, in vitro, and in vivo communities. Importantly, even though $f_\theta$ is linear, the map $\hat{\varphi}_\theta$ is nonlinear because it is the solution of the nonlinear ODE of Equation (2). Finally, we adjust the parameters $\theta$ by minimizing Equation (4) on $\mathcal{D}_1$ using a gradient‐based meta‐learning algorithm [37]. This learning algorithm enhances the generalizability of cNODE (Supporting Information Note S1.2 and Figure S1). Training cNODE with a data set of 100 species, 100 training samples, and 100 epochs takes about 30 min on a Linux machine with six Intel Xeon CPUs (E7‐4870 v2) @ 2.30 GHz.
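Reference [37] introduces Reptile, a first‐order meta‐learning algorithm; assuming that is the variant meant here, the random split and the Reptile outer loop can be sketched as below. Each mini‐batch is treated as a task: a copy of the parameters takes a few gradient steps on that batch, and the shared parameters are then moved part of the way toward the adapted copy. The gradient function, step sizes, and the toy quadratic loss are hypothetical stand‐ins; in cNODE, the gradient of Equation (4) on a mini‐batch with respect to $\Theta$ would come from differentiating through the ODE solver.

```python
import random


def train_test_split(dataset, test_fraction, seed=0):
    """Randomly split a list of (z, p) pairs into training (D1) and test (D2) sets."""
    rng = random.Random(seed)
    shuffled = list(dataset)
    rng.shuffle(shuffled)
    n_test = round(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


def reptile(theta, batches, grad_fn, inner_lr=0.05, outer_lr=0.5,
            inner_steps=5, epochs=50, seed=0):
    """Reptile-style first-order meta-learning over a flat parameter vector."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(epochs):
        order = list(range(len(batches)))
        rng.shuffle(order)
        for b in order:
            phi = list(theta)  # task-specific copy of the parameters
            for _ in range(inner_steps):  # plain gradient descent on one mini-batch
                g = grad_fn(phi, batches[b])
                phi = [p - inner_lr * gi for p, gi in zip(phi, g)]
            # Outer update: move theta a fraction outer_lr toward the adapted copy
            theta = [t + outer_lr * (p - t) for t, p in zip(theta, phi)]
    return theta


# Toy usage: each "batch" is a target vector c with loss 0.5 * ||theta - c||^2,
# whose gradient is theta - c.
def toy_grad(theta, c):
    return [t - ci for t, ci in zip(theta, c)]


theta_fit = reptile([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]], toy_grad)
train, test = train_test_split(list(range(10)), test_fraction=0.2)
```

With two conflicting mini‐batches, the toy parameters settle between the two targets instead of fitting either one exactly.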

Once cNODE is trained, we calculate its test prediction error $E(\mathcal{D}_2)$, which quantifies cNODE's performance in predicting the compositions of never‐seen‐before species assemblages. Test prediction errors could be due to a poor fit of the parameters (i.e., inaccurate predictions on the training set), poor generalization (i.e., inaccurate predictions on the test set), or violations of our three assumptions (universal dynamics, steady‐state samples, and no true multistability).

Generating in silico data for validating cNODE

We generated in silico data for validating cNODE as steady‐state compositions of pools with N species and generalized Lotka‐Volterra (GLV) population dynamics. The GLV model reads [38]:

$$\frac{dx_i(t)}{dt} = x_i(t)\left[ r_i + \sum_{j=1}^{N} a_{ij} x_j(t) \right], \quad i = 1, \ldots, N. \qquad (5)$$

Above, $x_i(t)$ denotes the abundance of the $i$th species at time $t \ge 0$. The GLV model has as parameters the interaction matrix $A = (a_{ij}) \in \mathbb{R}^{N \times N}$ and the intrinsic growth‐rate vector $r = (r_i) \in \mathbb{R}^N$. The parameter $a_{ij}$ denotes the inter‐ (if $j \ne i$) or intra‐ (if $j = i$) species interaction strength of species $j$ on the per‐capita growth rate of species $i$. The parameter $r_i$ is the intrinsic growth rate of species $i$. The interaction matrix $A$ determines the ecological network $\mathcal{G}(A)$ underlying the species pool. Namely, this network has one node per species and an edge $(j \to i) \in \mathcal{G}(A)$ if $a_{ij} \ne 0$. The connectivity $C \in [0, 1]$ of this network is the proportion of edges it has compared to the $N^2$ edges in a complete network.
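A direct transcription of Equation (5) and of the network definition, as a sketch. Following the text literally, the connectivity here counts every non‐zero entry of $A$, diagonal included, against the $N^2$ possible edges.

```python
def glv_rhs(x, r, A):
    """Equation (5): dx_i/dt = x_i * (r_i + sum_j a_ij x_j)."""
    n = len(x)
    return [x[i] * (r[i] + sum(A[i][j] * x[j] for j in range(n))) for i in range(n)]


def ecological_network(A):
    """Edges (j, i), read 'j -> i', for every non-zero a_ij."""
    n = len(A)
    return {(j, i) for i in range(n) for j in range(n) if A[i][j] != 0.0}


def connectivity(A):
    """Fraction of the N^2 possible edges that are present."""
    return len(ecological_network(A)) / len(A) ** 2


A = [[-1.0, 0.5],
     [0.0, -1.0]]  # species 1 affects species 0; species 0 does not affect species 1
```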

To validate cNODE, we generated synthetic microbiome samples as steady‐state compositions of GLV models with random parameters, choosing $a_{ij} \sim \mathrm{Bernoulli}(C)\,\mathrm{Normal}(0, \sigma)$ if $i \ne j$, $a_{ii} = -1$, and $r_i \sim \mathrm{Uniform}[0, 1]$, for different values of the connectivity $C$ and the characteristic inter‐species interaction strength $\sigma > 0$ (Supporting Information Note S3).
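The random parameter choice above, together with one simple way of producing a steady‐state composition for an assemblage $z$, can be sketched as follows. The integration scheme (forward Euler from equal initial abundances over a fixed long horizon) and all numerical settings are assumptions for illustration; the procedure actually used is described in Supporting Information Note S3 and may differ. The sketch is meant for moderate $\sigma$, since strong positive interactions can make GLV trajectories diverge.

```python
import random


def random_glv(N, C, sigma, seed=0):
    """a_ij ~ Bernoulli(C) * Normal(0, sigma) for i != j, a_ii = -1, r_i ~ Uniform[0, 1]."""
    rng = random.Random(seed)
    A = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            if i == j:
                A[i][j] = -1.0
            elif rng.random() < C:  # Bernoulli(C) decides whether edge j -> i exists
                A[i][j] = rng.gauss(0.0, sigma)
    r = [rng.uniform(0.0, 1.0) for _ in range(N)]
    return A, r


def steady_state_composition(z, A, r, dt=0.01, steps=20000):
    """Integrate GLV from the assemblage z and return relative abundances."""
    N = len(z)
    x = [0.1 * zi for zi in z]  # absent species start at zero and stay there
    for _ in range(steps):  # forward Euler (illustrative choice)
        growth = [r[i] + sum(A[i][j] * x[j] for j in range(N)) for i in range(N)]
        x = [max(0.0, x[i] + dt * x[i] * growth[i]) for i in range(N)]
    total = sum(x)
    return [xi / total for xi in x]


A, r = random_glv(5, C=0.5, sigma=0.1, seed=1)
p = steady_state_composition([1, 0, 1, 1, 0], A, r)
```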

Generating in silico data to test the robustness of cNODE

In the first validation, we evaluated the robustness of cNODE against a loss of universality. For this, given a “base” GLV model with parameters $(A, r)$, we consider two forms of universality loss (Supporting Information Note S3). First, samples are generated using a GLV model with the same ecological network but with its non‐zero interaction strengths $a_{ij}$ replaced by $a_{ij} + \mathrm{Normal}(0, \eta)$, where $\eta > 0$ characterizes the changes in the typical interaction strength. Second, samples are generated using GLV models with slightly different ecological networks, obtained by randomly rewiring a proportion $\rho \in [0, 1]$ of the edges.
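The two forms of universality loss can be sketched as below. Two details are assumptions rather than statements from the text: only off‐diagonal entries are perturbed, and rewiring moves each selected edge, with its weight, to a randomly chosen empty off‐diagonal position, so the number of edges is preserved.

```python
import random


def perturb_strengths(A, eta, rng):
    """Same network, with each non-zero off-diagonal a_ij replaced by a_ij + Normal(0, eta)."""
    n = len(A)
    return [[A[i][j] + rng.gauss(0.0, eta) if (i != j and A[i][j] != 0.0) else A[i][j]
             for j in range(n)] for i in range(n)]


def rewire(A, rho, rng):
    """Move a proportion rho of the off-diagonal edges to empty positions (one possible scheme)."""
    n = len(A)
    B = [row[:] for row in A]
    edges = [(i, j) for i in range(n) for j in range(n) if i != j and A[i][j] != 0.0]
    empty = [(i, j) for i in range(n) for j in range(n) if i != j and A[i][j] == 0.0]
    n_rewire = min(round(rho * len(edges)), len(empty))
    for i, j in rng.sample(edges, n_rewire):
        new_i, new_j = empty.pop(rng.randrange(len(empty)))
        B[new_i][new_j], B[i][j] = B[i][j], 0.0  # the edge keeps its weight
    return B


A = [[-1.0, 0.5, 0.0, 0.0],
     [0.0, -1.0, 0.2, 0.0],
     [0.3, 0.0, -1.0, 0.0],
     [0.0, 0.0, 0.0, -1.0]]
rng = random.Random(0)
A_noisy = perturb_strengths(A, eta=0.1, rng=rng)
A_rewired = rewire(A, rho=0.5, rng=rng)
```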

In the second validation, we evaluated the robustness of cNODE against measurement noise in the relative abundances of species. For this, for each sample $p$, we first change the relative abundance of the $i$th species from $p_i$ to $\max\{0, p_i + \mathrm{Normal}(0, \varepsilon)\}$, where $\varepsilon \ge 0$ characterizes the measurement noise intensity. Then, we normalize the vector $p$ to ensure it is still compositional, that is, $p \in \Delta^N$. Due to the measurement noise, some species that were absent may be measured as present and vice versa.
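A sketch of this noise model: Gaussian noise is added to each relative abundance, negative values are clipped to zero, and the vector is renormalized. Raising an error when every entry is clipped to zero is our own choice for a corner case that the text does not discuss.

```python
import random


def add_measurement_noise(p, eps, rng):
    """p_i -> max(0, p_i + Normal(0, eps)), then renormalize to the simplex."""
    noisy = [max(0.0, pi + rng.gauss(0.0, eps)) for pi in p]
    total = sum(noisy)
    if total == 0.0:  # assumption: treat an all-zero draw as invalid
        raise ValueError("all abundances were clipped to zero; resample")
    return [x / total for x in noisy]


p = [0.5, 0.25, 0.25, 0.0]
q = add_measurement_noise(p, eps=0.05, rng=random.Random(3))
```

Note that the last species, absent in `p`, can become present in `q`, and a present species can be clipped to zero.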

In the third validation, we generated datasets with true multistability by simulating a population dynamics model with nonlinear functional responses (Supporting Information Note S3). For each species collection, these functional responses generate two interior equilibria in different “regimes”: one with low biomass and the other with high biomass. We then train cNODE with datasets obtained by choosing a fraction $(1 - \mu)$ of samples from the first regime and the rest from the second regime.
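The functional‐response model itself is specified in Supporting Information Note S3 and is not reproduced here. Assuming the two equilibria of each species collection have already been computed, the mixing step controlled by $\mu$ can be sketched as follows; selecting an exact count of high‐biomass samples is our choice, and the original procedure may instead assign each sample's regime independently.

```python
import random


def mix_regimes(samples, mu, seed=0):
    """Build a training set from (z, p_low, p_high) triples.

    A fraction (1 - mu) of the samples keeps its low-biomass composition and
    the remaining fraction mu uses the high-biomass one.
    """
    rng = random.Random(seed)
    n_high = round(mu * len(samples))
    high = set(rng.sample(range(len(samples)), n_high))
    return [(z, p_high if k in high else p_low)
            for k, (z, p_low, p_high) in enumerate(samples)]


# Hypothetical equilibria: the same assemblage with two different compositions.
samples = [([1, 1], [0.8, 0.2], [0.3, 0.7]) for _ in range(10)]
mixed = mix_regimes(samples, mu=0.3)
```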

Validating cNODE using real microbiome data sets

To validate cNODE, we performed a leave‐one‐out cross‐validation over real microbiome data sets (see descriptions in Supporting Information Note S4). For each data set, we measured the prediction error of cNODE using each sample in turn as the test set and the remaining samples as the training set. We repeated this procedure for different learning rates and mini‐batch sizes and selected the hyperparameters that minimized the average prediction error over the samples (see Table S1).
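The leave‐one‐out procedure and the hyperparameter selection can be sketched generically. The toy model below (a mean training composition shrunk toward uniform by a hypothetical factor `alpha`) is only a stand‐in so that the example runs; in the text, the model is cNODE and the grid ranges over learning rates and mini‐batch sizes.

```python
def bray_curtis(p, q):
    num = sum(abs(pi - qi) for pi, qi in zip(p, q))
    den = sum(pi + qi for pi, qi in zip(p, q))
    return num / den


def loo_error(dataset, train, predict):
    """Average test error when each sample is held out exactly once."""
    errors = []
    for k in range(len(dataset)):
        training_set = dataset[:k] + dataset[k + 1:]
        model = train(training_set)
        z, p = dataset[k]
        errors.append(bray_curtis(p, predict(model, z)))
    return sum(errors) / len(errors)


def select_hyperparameters(dataset, grid, make_trainer, predict):
    """Return the grid value with the lowest leave-one-out error, plus all scores."""
    scores = {h: loo_error(dataset, make_trainer(h), predict) for h in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical stand-in model: mean training composition shrunk toward uniform.
def make_trainer(alpha):
    def train(training_set):
        n = len(training_set[0][1])
        mean = [sum(p[i] for _, p in training_set) / len(training_set) for i in range(n)]
        return [(1 - alpha) * m + alpha / n for m in mean]
    return train


def predict(model, z):
    masked = [zi * mi for zi, mi in zip(z, model)]  # zero out absent species
    total = sum(masked)
    return [m / total for m in masked]


data = [([1, 1, 0], [0.6, 0.4, 0.0]), ([1, 0, 1], [0.7, 0.0, 0.3]),
        ([0, 1, 1], [0.0, 0.6, 0.4]), ([1, 1, 1], [0.5, 0.3, 0.2])]
best_alpha, scores = select_hyperparameters(data, [0.0, 0.5, 1.0], make_trainer, predict)
```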

CONFLICT OF INTERESTS

The authors declare that there are no conflicts of interest.

AUTHOR CONTRIBUTIONS

Marco Tulio Angulo and Yang‐Yu Liu conceived and designed the project. Sebastian Michel‐Mata did the numerical analysis. Sebastian Michel‐Mata and Xu‐Wen Wang performed the real data analysis. All authors analyzed the results. Marco T. Angulo and Yang‐Yu Liu wrote the manuscript. Sebastian Michel‐Mata and Xu‐Wen Wang edited the manuscript.

Supporting information

Supplementary Information

IMT2-1-e3-s001.pdf (1.6MB, pdf)

ACKNOWLEDGMENTS

Marco Tulio Angulo gratefully acknowledges the financial support from CONACyT A1‐S‐13909 and PAPIIT 104915, México. Yang‐Yu Liu acknowledges the funding support from the National Institutes of Health (R01AI141529, R01HD093761, RF1AG067744, UH3OD023268, U19AI095219, and U01HL089856).

Michel‐Mata, Sebastian, Wang Xu‐Wen, Liu Yang‐Yu, and Angulo Marco Tulio. 2022. “Predicting Microbiome Compositions from Species Assemblages Through Deep Learning.” iMeta 1: e3. 10.1002/imt2.3

Contributor Information

Yang‐Yu Liu, Email: yyl@channing.harvard.edu.

Marco Tulio Angulo, Email: mangulo@im.unam.mx.

DATA AVAILABILITY STATEMENT

The data and code used in this study are available at https://github.com/michel-mata/cNODE.jl.

REFERENCES

  • 1. East, Roger. 2013. “Soil Science Comes to Life.” Nature 501: S18. 10.1038/501S18a
  • 2. Busby, Posy E., Soman Chinmay, Wagner Maggie R., Friesen Maren L., Kremer James, Bennett Alison, Morsy Mustafa, Eisen Jonathan A., Leach Jan E., and Dangl Jeffery L. 2017. “Research Priorities for Harnessing Plant Microbiomes in Sustainable Agriculture.” PLOS Biology 15: e2001793. 10.1371/journal.pbio.2001793
  • 3. Thursby, Elizabeth, and Juge Nathalie. 2017. “Introduction to the Human Gut Microbiota.” Biochemical Journal 474: 1823–36. 10.1042/BCJ20160510
  • 4. Chittim, Carina L., Irwin Stephania M., and Balskus Emily P. 2018. “Deciphering Human Gut Microbiota‐Nutrient Interactions: A Role for Biochemistry.” Biochemistry 57: 2567–77. 10.1021/acs.biochem.7b01277
  • 5. Cryan, John F., and Dinan Timothy G. 2012. “Mind‐altering Microorganisms: The Impact of the Gut Microbiota on Brain and Behaviour.” Nature Reviews Neuroscience 13: 701. 10.1038/nrn3346
  • 6. Arrieta, Marie‐Claire, Stiemsma Leah T., Amenyogbe Nelly, Brown Eric M., and Finlay Brett. 2014. “The Intestinal Microbiome in Early Life: Health and Disease.” Frontiers in Immunology 5: 427. 10.3389/fimmu.2014.00427
  • 7. Fierer, Noah. 2017. “Embracing the Unknown: Disentangling the Complexities of the Soil Microbiome.” Nature Reviews Microbiology 15: 579–90. 10.1038/nrmicro.2017.87
  • 8. Mueller, Ulrich G., and Sachs Joel L. 2015. “Engineering Microbiomes to Improve Plant and Animal Health.” Trends in Microbiology 23: 606–17. 10.1016/j.tim.2015.07.009
  • 9. Gill, Steven R., Pop Mihai, DeBoy Robert T., Eckburg Paul B., Turnbaugh Peter J., Samuel Buck S., Gordon Jeffrey I., Relman David A., Fraser‐Liggett Claire M., and Nelson Karen E. 2006. “Metagenomic Analysis of the Human Distal Gut Microbiome.” Science 312: 1355–9. 10.1126/science.1124234
  • 10. Hooper, Lora V., Littman Dan R., and Macpherson Andrew J. 2012. “Interactions Between the Microbiota and the Immune System.” Science 336: 1268–73. 10.1126/science.1223490
  • 11. Buffie, Charlie G., Bucci Vanni, Stein Richard R., McKenney Peter T., Ling Lilan, Gobourne Asia, No Daniel, Liu Hui, Kinnebrew Melissa, Viale Agnes, et al. 2015. “Precision Microbiome Reconstitution Restores Bile Acid Mediated Resistance to Clostridium Difficile.” Nature 517: 205–8. 10.1038/nature13828
  • 12. Costello, Elizabeth K., Stagaman Keaton, Dethlefsen Les, Bohannan Brendan J. M., and Relman David A. 2012. “The Application of Ecological Theory Toward an Understanding of the Human Microbiome.” Science 336: 1255–62. 10.1126/science.1224203
  • 13. Lemon, Katherine P., Armitage Gary C., Relman David A., and Fischbach Michael A. 2012. “Microbiota‐Targeted Therapies: An Ecological Perspective.” Science Translational Medicine 4: 137rv5. 10.1126/scitranslmed.3004183
  • 14. Borody, Thomas J., Paramsothy Sudarshan, and Agrawal Gaurav. 2013. “Fecal Microbiota Transplantation: Indications, Methods, Evidence, and Future Directions.” Current Gastroenterology Reports 15: 1–7. 10.1007/s11894-013-0337-1
  • 15. Xiao, Yandong, Angulo Marco Tulio, Lao Songyang, Weiss Scott T., and Liu Yang‐Yu. 2020. “An Ecological Framework to Understand the Efficacy of Fecal Microbiota Transplantation.” Nature Communications 11: 1–17. 10.1038/s41467-020-17180-x
  • 16. Widder, Stefanie, Allen Rosalind J., Pfeiffer Thomas, Curtis Thomas P., Wiuf Carsten, Sloan William T., Cordero Otto X., Brown Sam P., Momeni Babak, Shou Wenying, et al. 2016. “Challenges in Microbial Ecology: Building Predictive Understanding of Community Function and Dynamics.” The ISME Journal 10: 2557. 10.1038/ismej.2016.45
  • 17. Tropini, Carolina, Earle Kristen A., Huang Kerwyn Casey, and Sonnenburg Justin L. 2017. “The Gut Microbiome: Connecting Spatial Organization to Function.” Cell Host & Microbe 21: 433–42. 10.1016/j.chom.2017.03.010
  • 18. Quinn, Robert A., Comstock William, Zhang Tianyu, Morton James T., da Silva Ricardo, Tran Alda, Aksenov Alexander, Nothias Louis‐Felix, Wangpraseurt Daniel, Melnik Alexey V., et al. 2018. “Niche Partitioning of a Pathogenic Microbiome Driven by Chemical Gradients.” Science Advances 4: eaau1908. 10.1126/sciadv.aau1908
  • 19. Kuramitsu, Howard K., He Xuesong, Lux Renate, Anderson Maxwell H., and Shi Wenyuan. 2007. “Interspecies Interactions within Oral Microbial Communities.” Microbiology and Molecular Biology Reviews 71: 653–70. 10.1128/MMBR.00024-07
  • 20. Coyte, Katharine Z., Schluter Jonas, and Foster Kevin R. 2015. “The Ecology of the Microbiome: Networks, Competition, and Stability.” Science 350: 663–6. 10.1126/science.aad2602
  • 21. Friedman, Jonathan, Higgins Logan M., and Gore Jeff. 2017. “Community Structure Follows Simple Assembly Rules in Microbial Microcosms.” Nature Ecology & Evolution 1: 109. 10.1038/s41559-017-0109
  • 22. Ramirez, Kelly S., Leff Jonathan W., Barberán Albert, Bates Scott Thomas, Betley Jason, Crowther Thomas W., Kelly Eugene F., Oldfield Emily E., Shaw E. Ashley, Steenbock Christopher, et al. 2014. “Biogeographic Patterns in Below‐Ground Diversity in New York City's Central Park are Similar to Those Observed Globally.” Proceedings of the Royal Society B: Biological Sciences 281: 20141988. 10.1098/rspb.2014.1988
  • 23. Gould, Alison L., Zhang Vivian, Lamberti Lisa, Jones Eric W., Obadia Benjamin, Korasidis Nikolaos, Gavryushkin Alex, Carlson Jean M., Beerenwinkel Niko, and Ludington William B. 2018. “Microbiome Interactions Shape Host Fitness.” Proceedings of the National Academy of Sciences 115: E11951–60. 10.1073/pnas.1809349115
  • 24. Huttenhower, Curtis, Gevers Dirk, Knight Rob, Abubucker Sahar, Badger Jonathan H., Chinwalla Asif T., Creasy Heather H., Earl Ashlee M., FitzGerald Michael G., Fulton Robert S., et al. 2012. “Structure, Function and Diversity of the Healthy Human Microbiome.” Nature 486: 207. 10.1038/nature11234
  • 25. Vogtmann, Emily, Chen Jun, Kibriya Muhammad G., Amir Amnon, Shi Jianxin, Chen Yu, Islam Tariqul, Eunes Mahbubul, Ahmed Alauddin, Naher Jabun, et al. 2019. “Comparison of Oral Collection Methods for Studies of Microbiota.” Cancer Epidemiology and Prevention Biomarkers 28: 137–43. 10.1158/1055-9965.EPI-18-0312
  • 26. Bashan, Amir, Gibson Travis E., Friedman Jonathan, Carey Vincent J., Weiss Scott T., Hohmann Elizabeth L., and Liu Yang‐Yu. 2016. “Universality of Human Microbial Dynamics.” Nature 534: 259. 10.1038/nature18301
  • 27. Lozupone, Catherine A., Stombaugh Jesse I., Gordon Jeffrey I., Jansson Janet K., and Knight Rob. 2012. “Diversity, Stability and Resilience of the Human Gut Microbiota.” Nature 489: 220. 10.1038/nature11550
  • 28. Gibbons, Sean M., Kearney Sean M., Smillie Chris S., and Alm Eric J. 2017. “Two Dynamic Regimes in the Human Gut Microbiome.” PLOS Computational Biology 13: 1–20. 10.1371/journal.pcbi.1005364
  • 29. LeCun, Yann, Bengio Yoshua, and Hinton Geoffrey. 2015. “Deep Learning.” Nature 521: 436. 10.1038/nature14539
  • 30. Goodfellow, Ian, Bengio Yoshua, and Courville Aaron. 2016. Deep Learning. Cambridge, MA, USA: MIT Press. 10.1007/s10710-017-9314-z
  • 31. He, Kaiming, Zhang Xiangyu, Ren Shaoqing, and Sun Jian. 2016. “Deep Residual Learning for Image Recognition.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–8. Las Vegas, NV, USA. 10.1109/cvpr.2016.90
  • 32. Lin, Hongzhou, and Jegelka Stefanie. 2018. “ResNet with One‐Neuron Hidden Layers is a Universal Approximator.” In Advances in Neural Information Processing Systems, 6169–78. Montreal, Canada: Curran Associates Inc. https://hdl.handle.net/1721.1/129326
  • 33. Chen, Tian Qi, Rubanova Yulia, Bettencourt Jesse, and Duvenaud David K. 2018. “Neural Ordinary Differential Equations.” In Advances in Neural Information Processing Systems, 6571–83. Montreal, Canada: Curran Associates Inc. https://dl.acm.org/doi/abs/10.5555/3327757.3327764
  • 34. Martins, Andre, and Astudillo Ramon. 2016. “From Softmax to Sparsemax: A Sparse Model of Attention and Multi‐Label Classification.” In International Conference on Machine Learning, 1614–23. New York, New York, USA: PMLR. https://dl.acm.org/doi/10.5555/3045390.3045561
  • 35. Hofbauer, Josef, and Sigmund Karl. 1988. The Theory of Evolution and Dynamical Systems: Mathematical Aspects of Selection (London Mathematical Society Student Texts 7). https://searchworks.stanford.edu/view/1317720
  • 36. Legendre, Pierre, and Legendre Louis, eds. 2012. “Numerical Ecology.” In Developments in Environmental Modelling (3rd English edition, 24). Amsterdam: Elsevier. https://www.elsevier.com/books/numerical-ecology/legendre/978-0-444-53868-0
  • 37. Nichol, Alex, Achiam Joshua, and Schulman John. 2018. “On First‐Order Meta‐Learning Algorithms.” arXiv preprint arXiv:1803.02999. https://arxiv.org/abs/1803.02999
  • 38. Case, T. J. 2000. An Illustrated Guide to Theoretical Ecology. Oxford: Oxford University Press.
  • 39. Moore, John C., de Ruiter Peter C., Hunt H. William, Coleman David C., and Freckman Diana W. 1996. “Microcosms and Soil Ecology: Critical Linkages Between Fields Studies and Modelling Food Webs.” Ecology 77: 694–705. 10.2307/2265494
  • 40. Dam, Phuongan, Fonseca Luis L., Konstantinidis Konstantinos T., and Voit Eberhard O. 2016. “Dynamic Models of the Complex Microbial Metapopulation of Lake Mendota.” NPJ Systems Biology and Applications 2: 1–7. 10.1038/npjsba.2016.7
  • 41. Stein, Richard R., Bucci Vanni, Toussaint Nora C., Buffie Charlie G., Rätsch Gunnar, Pamer Eric G., Sander Chris, and Xavier Joao B. 2013. “Ecological Modeling from Time‐series Inference: Insight into Dynamics and Stability of Intestinal Microbiota.” PLOS Computational Biology 9: e1003388. 10.1371/journal.pcbi.1003388
  • 42. Rao, Chitong, Coyte Katharine Z., Bainter Wayne, Geha Raif S., Martin Camilia R., and Rakoff‐Nahoum Seth. 2021. “Multikingdom Ecological Drivers of Microbiota Assembly in Preterm Infants.” Nature 591: 633–8. 10.1038/s41586-021-03241-8
  • 43. Laverock, Bonnie, Smith Cindy J., Tait Karen, Osborn A. Mark, Widdicombe Steve, and Gilbert Jack A. 2010. “Bioturbating Shrimp Alter the Structure and Diversity of Bacterial Communities in Coastal Marine Sediments.” The ISME Journal 4: 1531–44. 10.1038/ismej.2010.86
  • 44. Wu, Gary D., Chen Jun, Hoffmann Christian, Bittinger Kyle, Chen Ying‐Yu, Keilbaugh Sue A., Bewtra Meenakshi, Knights Dan, Walters William A., Knight Rob, et al. 2011. “Linking Long‐Term Dietary Patterns with Gut Microbial Enterotypes.” Science 334: 105–8. 10.1126/science.1208344
  • 45. Mounier, Jérôme, Monnet Christophe, Vallaeys Tatiana, Arditi Roger, Sarthou Anne‐Sophie, Hélias Arnaud, and Irlinger Françoise. 2008. “Microbial Interactions Within a Cheese Microbial Community.” Applied and Environmental Microbiology 74: 172–81. 10.1128/AEM.01338-07
  • 46. Venturelli, Ophelia S., Carr Alex V., Fisher Garth, Hsu Ryan H., Lau Rebecca, Bowen Benjamin P., Hromada Susan, Northen Trent, and Arkin Adam P. 2018. “Deciphering Microbial Interactions in Synthetic Human Gut Microbiome Communities.” Molecular Systems Biology 14: e8157. 10.15252/msb.20178157
  • 47. Aitchison, John. 1994. “Principles of Compositional Data Analysis.” Lecture Notes‐Monograph Series: 73–81. 10.1214/lnms/1215463786
  • 48. Cao, Hong‐Tai, Gibson Travis E., Bashan Amir, and Liu Yang‐Yu. 2017. “Inferring Human Microbial Dynamics from Temporal Metagenomics Data: Pitfalls and Lessons.” BioEssays 39: 1600188. 10.1002/bies.201600188
  • 49. Maynard, Daniel S., Miller Zachary R., and Allesina Stefano. 2020. “Predicting Coexistence in Experimental Ecological Communities.” Nature Ecology & Evolution 4: 91–100. 10.1038/s41559-019-1059-z
  • 50. Reiman, Derek, Metwally Ahmed, and Dai Yang. 2017. “Using Convolutional Neural Networks to Explore the Microbiome.” In 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 4269–72. Jeju Island, South Korea: Curran Associates Inc. 10.1109/EMBC.2017.8037799
  • 51. García‐Jiménez, Beatriz, Muñoz Jorge, Cabello Sara, Medina Joaquín, and Wilkinson Mark D. 2021. “Predicting Microbiomes Through a Deep Latent Space.” Bioinformatics 37(10): 1444–51. 10.1093/bioinformatics/btaa971
  • 52. Metwally, Ahmed A., Yu Philip S., Reiman Derek, Dai Yang, Finn Patricia W., and Perkins David L. 2019. “Utilizing Longitudinal Microbiome Taxonomic Profiles to Predict Food Allergy via Long Short‐Term Memory Networks.” PLOS Computational Biology 15: e1006693. 10.1371/journal.pcbi.1006693
  • 53. Sharma, Divya, Paterson Andrew D., and Xu Wei. 2020. “TaxoNN: Ensemble of Neural Networks on Stratified Microbiome Data for Disease Prediction.” Bioinformatics 36: 4544–50. 10.1093/bioinformatics/btaa542
  • 54. Galkin, Fedor, Mamoshina Polina, Aliper Alex, Putin Evgeny, Moskalev Vladimir, Gladyshev Vadim N., and Zhavoronkov Alex. 2020. “Human Gut Microbiome Aging Clock Based on Taxonomic Profiling and Deep Learning.” iScience 23: 101199. 10.1016/j.isci.2020.101199
  • 55. Asgari, Ehsaneddin, Garakani Kiavash, McHardy Alice C., and Mofrad Mohammad R. K. 2018. “MicroPheno: Predicting Environments and Host Phenotypes from 16S rRNA Gene Sequencing Using a K‐mer Based Representation of Shallow Sub‐Samples.” Bioinformatics 34: i32–42. 10.1093/bioinformatics/bty652
  • 56. Zhu, Qiang, Jiang Xingpeng, Zhu Qing, Pan Min, and He Tingting. 2019. “Graph Embedding Deep Learning Guides Microbial Biomarkers’ Identification.” Frontiers in Genetics 10: 1182. 10.3389/fgene.2019.01182
  • 57. Le, Vuong, Quinn Thomas P., Tran Truyen, and Venkatesh Svetha. 2020. “Deep in the Bowel: Highly Interpretable Neural Encoder‐Decoder Networks Predict Gut Metabolites from Gut Microbiome.” BMC Genomics 21: 1–15. 10.1186/s12864-020-6652-7
  • 58. LaPierre, Nathan, Ju Chelsea J.‐T., Zhou Guangyu, and Wang Wei. 2019. “MetaPheno: A Critical Evaluation of Deep Learning and Machine Learning in Metagenome‐Based Disease Prediction.” Methods 166: 74–82. 10.1016/j.ymeth.2019.03.003
  • 59. Yazdani, Mehrdad, Taylor Bryn C., Debelius Justine W., Li Weizhong, Knight Rob, and Smarr Larry. 2016. “Using Machine Learning to Identify Major Shifts in Human Gut Microbiome Protein Family Abundance in Disease.” In IEEE International Conference on Big Data, 1272–80. 10.1109/BigData.2016.7840731
  • 60. Espinoza, J. Luis. 2018. “Machine Learning for Tackling Microbiota Data and Infection Complications in Immunocompromised Patients with Cancer.” Journal of Internal Medicine 284: 189–92. 10.1111/joim.12746
  • 61. Larsen, Peter E., Field Dawn, and Gilbert Jack A. 2012. “Predicting Bacterial Community Assemblages Using an Artificial Neural Network Approach.” Nature Methods 9: 621. 10.1038/nmeth.1975
  • 62. Zhou, Guangyu, Jiang Jyun‐Yu, Ju Chelsea J.‐T., and Wang Wei. 2019. “Prediction of Microbial Communities for Urban Metagenomics Using Neural Network Approach.” Human Genomics 13: 47. 10.1186/s40246-019-0224-4
  • 63. Zhang, Han, Gao Xi, Unterman Jacob, and Arodz Tom. 2019. “Approximation Capabilities of Neural Ordinary Differential Equations.” arXiv preprint arXiv:1907.12998. https://arxiv.org/abs/1907.12998
  • 64. Dupont, Emilien, Doucet Arnaud, and Teh Yee Whye. 2019. “Augmented Neural ODEs.” arXiv preprint arXiv:1904.01681. https://arxiv.org/abs/1904.01681
  • 65. Evans, Steven N., and Matsen Frederick A. 2012. “The Phylogenetic Kantorovich‐Rubinstein Metric for Environmental Sequence Samples.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74: 569–92. 10.1111/j.1467-9868.2011.01018.x
  • 66. Peyré, Gabriel, and Cuturi Marco. 2019. “Computational Optimal Transport.” Foundations and Trends in Machine Learning 11: 355–607. 10.1561/2200000073

Articles from iMeta are provided here courtesy of Wiley
