2021 Feb 19;12:634511. doi: 10.3389/fmicb.2021.634511

TABLE 3.

Available resources for applying machine learning (ML) to human microbiome studies.

| Tool Name | Description | References |
| --- | --- | --- |
| Feature selection with the R package MXM | Includes several feature selection algorithms, in particular the Statistically Equivalent Signatures (SES) algorithm, which is well suited to microbiome data because it scales to high dimensions and requires few samples. SES also reports “multiple biosignatures”, i.e., multiple minimal-size subsets of features that lead to equally predictive models. A more recent algorithm that scales well to high-dimensional data, Forward-Backward Selection with Early Dropping (FBED), is also implemented in MXM; it is preferable to SES for larger sample sizes. | Lagani et al., 2017; Borboudakis and Tsamardinos, 2019 |
| Automated machine learning (AutoML) with JADBio | End-to-end AutoML tool designed to deliver predictive and diagnostic models to non-experts while drastically increasing the productivity of expert analysts. Several features make JADBio (www.jadbio.com) well suited to microbiome data analysis. First, it accepts numerical measurements (e.g., abundance tables) and discrete predictors (e.g., experimental factors and curated metadata), as well as incomplete datasets with missing values. Second, it implements a novel out-of-sample bootstrapping protocol that provides accurate, non-optimistic estimates of predictive performance even with low sample sizes (e.g., 40) and hundreds of thousands of features. Finally, it uses SES and FBED to return the corresponding biosignatures, i.e., predictive models that are equally good up to statistical equivalence. This gives researchers alternatives when weighing costs and benefits in the design of new diagnostic assays. | Tsamardinos et al., 2018, 2020 |
| Microbiome network inference with SCENERY | SCENERY (scenery.csd.uoc.gr) is a free online application for performing several network learning tasks. It is the first of its kind to provide advanced algorithms for inferring association networks, probabilistic causal networks, and Bayesian networks. Its capabilities have been demonstrated successfully in the single-cell cytometry domain. SCENERY does not currently handle missing values or compositionality; nevertheless, it is readily applicable to microbiome data for inferring causal or non-causal networks of microbial species and molecules. | Papoutsoglou et al., 2017 |
| The Microbiome Modeling Toolbox | Comprehensive toolbox for modeling (i) microbe-microbe and host-microbe metabolic interactions and (ii) microbial communities, using microbial genome-scale metabolic reconstructions and metagenomic data. | Baldini et al., 2019 |
| Constraint-Based Reconstruction and Analysis (COBRA) Toolbox v3.0 | Software suite for the quantitative prediction of cellular and multicellular biochemical networks with constraint-based modeling. | Heirendt et al., 2019 |
| Reconstruction, Analysis and Visualization of Metabolic Networks (RAVEN) | A commonly used MATLAB toolbox for genome-scale metabolic model reconstruction and curation, and for constraint-based modeling and simulation. | Wang et al., 2018 |
| Fizzy: feature subset selection for metagenomics | Python command-line tool, compatible with the BIOM format, for microbial ecologists; it implements information-theoretic subset selection methods for biological data formats. | Ditzler et al., 2015; http://github.com/EESI/Fizzy |
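FBED, listed in the MXM row, pairs a forward phase that permanently discards ("early drops") candidates found conditionally independent of the outcome with a backward pruning phase. The following is a minimal Python sketch of that idea, not the MXM implementation or its API. It assumes continuous features and uses a Gaussian (Fisher z partial-correlation) conditional-independence test chosen for illustration; `ci_pvalue`, `fbed`, and the toy data are hypothetical.

```python
import math

import numpy as np


def ci_pvalue(X, y, j, cond):
    """Two-sided p-value for H0: X[:, j] is independent of y given X[:, cond].

    Assumption: a Fisher z-test on the partial correlation (linear-Gaussian CI test).
    """
    n = X.shape[0]
    Z = np.column_stack([np.ones(n)] + [X[:, k] for k in cond])
    # Residualize the candidate feature and the target on the conditioning set.
    rx = X[:, j] - Z @ np.linalg.lstsq(Z, X[:, j], rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    denom = math.sqrt(float(rx @ rx) * float(ry @ ry))
    if denom == 0.0:
        return 1.0
    r = min(max(float(rx @ ry) / denom, -0.999999), 0.999999)
    z = math.atanh(r) * math.sqrt(n - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def fbed(X, y, alpha=0.05, K=0):
    """Forward-backward selection with early dropping (hypothetical helper, not MXM's API)."""
    selected = []
    for _ in range(K + 1):  # K extra forward runs reconsider previously dropped features
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        added_any = False
        while remaining:
            pvals = {j: ci_pvalue(X, y, j, selected) for j in remaining}
            # Early dropping: discard features independent of y given the current selection.
            remaining = [j for j in remaining if pvals[j] <= alpha]
            if not remaining:
                break
            best = min(remaining, key=pvals.get)  # strongest remaining association
            selected.append(best)
            remaining.remove(best)
            added_any = True
        if not added_any:
            break
    # Backward phase: repeatedly remove the least significant selected feature.
    while selected:
        pvals = {j: ci_pvalue(X, y, j, [k for k in selected if k != j]) for j in selected}
        worst = max(selected, key=pvals.get)
        if pvals[worst] <= alpha:
            break
        selected.remove(worst)
    return sorted(selected)


# Toy data: only features 0 and 3 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 2 * X[:, 0] - 3 * X[:, 3] + rng.normal(size=200)
selected_features = fbed(X, y, alpha=0.01)
print(selected_features)
```

Early dropping is what keeps the forward phase cheap in high dimensions: most irrelevant features are tested once and never revisited within a run.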
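The out-of-sample bootstrapping protocol mentioned in the JADBio row targets the optimism of reporting the best cross-validated score among many tried configurations. Below is a minimal sketch of the bootstrap bias-correction idea described by Tsamardinos et al. (2018), under simplifying assumptions: binary labels, accuracy as the metric, and a precomputed matrix of pooled out-of-sample predictions. `bbc_estimate` and the toy data are hypothetical, and this is not JADBio's code.

```python
import numpy as np


def bbc_estimate(oos_pred, y, n_boot=1000, seed=0):
    """Bootstrap bias correction over pooled out-of-sample predictions (sketch).

    oos_pred[i, c] is the out-of-sample prediction of configuration c for sample i,
    e.g. pooled from the test folds of cross-validation. Accuracy is the assumed metric.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    correct = oos_pred == y[:, None]  # n x C matrix of per-sample hits
    scores = []
    for _ in range(n_boot):
        in_bag = rng.integers(0, n, size=n)  # resample samples with replacement
        out_bag = np.setdiff1d(np.arange(n), in_bag)
        if out_bag.size == 0:
            continue
        best = int(np.argmax(correct[in_bag].mean(axis=0)))  # select configuration in-bag
        scores.append(correct[out_bag, best].mean())  # score it out-of-bag
    return float(np.mean(scores))


# Toy check: 200 configurations that guess at random on 40 samples.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=40)
oos_pred = rng.integers(0, 2, size=(40, 200))
naive = float((oos_pred == y[:, None]).mean(axis=0).max())  # best CV score: optimistic
corrected = bbc_estimate(oos_pred, y)
print(f"naive {naive:.2f}, bootstrap-corrected {corrected:.2f}")
```

On pure-noise predictions, the naive maximum over 200 configurations lands well above chance, whereas the bootstrap estimate should stay near 0.5, because the winner is always scored on samples that played no part in choosing it.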
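SCENERY's own network learners are more sophisticated than simple correlation. As a baseline illustration of an association network on microbiome counts, the sketch below thresholds Pearson correlations between taxa after a centered log-ratio (CLR) transform. CLR is one common way to mitigate the compositionality that, per the SCENERY row, the tool does not handle. The pseudocount, the 0.6 threshold, the helper names, and the toy data are all assumptions; this is not SCENERY's method.

```python
import numpy as np


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of a samples x taxa count table (pseudocount assumed)."""
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)


def association_network(counts, threshold=0.6):
    """Edges (i, j, r) between taxa whose CLR abundances correlate with |r| >= threshold."""
    R = np.corrcoef(clr(counts), rowvar=False)  # taxa x taxa Pearson correlations
    n_taxa = R.shape[0]
    return [
        (i, j, float(R[i, j]))
        for i in range(n_taxa)
        for j in range(i + 1, n_taxa)
        if abs(R[i, j]) >= threshold
    ]


# Toy abundance table: taxa 0 and 1 share a latent driver; taxa 2-4 vary independently.
rng = np.random.default_rng(2)
driver = rng.lognormal(size=100)
counts = np.column_stack(
    [rng.poisson(50 * driver), rng.poisson(100 * driver)]
    + [rng.poisson(50 * rng.lognormal(size=100)) for _ in range(3)]
)
edges = association_network(counts)
print(edges)
```

Such correlation edges are non-causal associations only; probabilistic causal or Bayesian network learners, as offered by SCENERY, are needed to go beyond them.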
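Constraint-based modeling, the approach behind the COBRA Toolbox and RAVEN rows, is commonly introduced through flux balance analysis (FBA): maximize an objective flux v subject to the steady-state constraint S v = 0 and per-reaction flux bounds. The sketch below solves FBA for a hypothetical three-metabolite network with SciPy's general linear-programming solver `scipy.optimize.linprog`. It does not use either toolbox, and real genome-scale reconstructions contain thousands of reactions.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (not taken from any published reconstruction).
# Rows = metabolites, columns = reactions; entries are stoichiometric coefficients.
reactions = ["EX_A", "A_to_B", "BIOMASS", "A_to_C", "EX_C"]
S = np.array([
    [1, -1, 0, -1, 0],  # A: taken up by EX_A, consumed by A_to_B and A_to_C
    [0, 1, -1, 0, 0],   # B: produced from A, drained by BIOMASS
    [0, 0, 0, 1, -1],   # C: by-product, exported by EX_C
])
bounds = [(0, 10)] + [(0, 1000)] * 4  # uptake of A capped at 10 flux units

# FBA: maximize the biomass flux subject to steady state S v = 0 and the flux bounds.
c = np.zeros(len(reactions))
c[reactions.index("BIOMASS")] = -1.0  # linprog minimizes, so negate the objective
res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
fluxes = dict(zip(reactions, res.x))
print(res.status, fluxes["BIOMASS"])
```

Because all uptake of A can be routed to biomass, the optimum sends the full uptake bound through `A_to_B` and leaves the by-product branch unused.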
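The Fizzy row mentions information-theoretic subset selection. One widely used criterion of that family is minimum-redundancy maximum-relevance (mRMR). It greedily adds the feature whose mutual information with the label, minus its average mutual information with the features already chosen, is largest. The sketch below implements mRMR on discrete (e.g., presence/absence) data with plug-in mutual information estimates. It is an illustration, not Fizzy's code or command-line interface, and the helper names and toy data are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def mutual_info(a, b):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    a, b = list(a), list(b)
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def mrmr(X, y, k):
    """Greedy minimum-redundancy maximum-relevance selection of k discrete features."""
    relevance = [mutual_info(X[:, j], y) for j in range(X.shape[1])]
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        candidates = [j for j in range(X.shape[1]) if j not in selected]

        def score(j):
            # Relevance to the label minus mean redundancy with already-selected features.
            redundancy = np.mean([mutual_info(X[:, j], X[:, s]) for s in selected])
            return relevance[j] - redundancy

        selected.append(max(candidates, key=score))
    return selected


# Toy presence/absence table: feature 1 duplicates feature 0; the label depends on 0 and 2.
rng = np.random.default_rng(3)
X = rng.integers(0, 2, size=(500, 6))
X[:, 1] = X[:, 0]
y = 2 * X[:, 0] + X[:, 2]
chosen = mrmr(X, y, k=2)
print(chosen)
```

Feature 1 is as relevant as feature 0 but fully redundant with it, so the redundancy penalty steers the second pick to feature 2.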