Crohn’s Disease (CD)
BISCUIT cohort (Hansen et al., 2012; Pascal et al., 2017): CD n = 20, controls n = 20. Validation cohort: RISK cohort (Gevers et al., 2014).
Shotgun metagenomics data and 16S rRNA gene data. |
Classify pediatric CD patients by disease state and treatment response. |
Random forest. |
Douglas et al., 2018 |
Colorectal Cancer (CRC) |
Patients with CRC stage 0 n = 27, patients with CRC stage III/IV n = 54, healthy controls n = 127.
Shotgun metagenomics data (species, KO genes) and metabolite profiles.
Classification of CRC patients according to cancer stage. |
Feature selection by LASSO. Random forest. |
Yachida et al., 2019 |
Colorectal Cancer (CRC) |
Fecal CRC metagenomes: n = 38 previously published, n = 22 new. Controls n = 60.
Features: IGC gene abundances.
Predict taxonomic and functional microbiome CRC signatures. |
Feature selection by LASSO. Random forest.
Wirbel et al., 2019 |
Colorectal cancer (CRC) |
Stool: Controls n = 62, CRC n = 69, Polyps n = 23. Swabs: Controls n = 25, CRC n = 45, Polyps n = 21. |
Log-ratio transformed values of OTUs present in at least 5% of individuals. |
Develop oral and fecal microbiota classifiers that distinguish individuals with CRC and adenomas from controls.
Feature selection by LASSO. Random forest. |
Flemer et al., 2017 |
Colorectal cancer (CRC) |
Cohort 1: CRC n = 29, adenomas n = 27, controls n = 24. Cohort 2: CRC n = 32, controls n = 28. Validation datasets: CRC n = 313, adenomas n = 143, controls n = 308.
Species-level taxonomic abundances, gene-family abundances and pathway abundances.
Identify reproducible microbiome markers and build disease-predictive models for CRC.
Supervised Learning Methods: Random forest. |
Thomas et al., 2019 |
Colorectal cancer (CRC) |
Previously published data from France (Zeller et al., 2014), Hong Kong and Austria.
Shotgun metagenomics data (FASTA).
Discover biomarkers from whole-genome shotgun (WGS) data that could be used to build a machine learning classifier for CRC prediction.
Supervised Learning Methods: Random forest. Neural network. Support vector machine. |
Koohi-Moghadam et al., 2019 |
Abnormal cases vs. Controls |
Controls n = 383. Abnormal Cases: Type 2 diabetes n = 170, Rheumatoid arthritis n = 130, Liver cirrhosis n = 123. |
Shotgun metagenomics. |
Develop a pipeline for the challenging task of characterizing multilabel samples from type 2 diabetes, rheumatoid arthritis and liver cirrhosis.
Logistic Regression. |
Wu et al., 2018
Bacterial Vaginosis (BV) |
Dataset 1: asymptomatic BV− n = 299, asymptomatic BV+ n = 97. Dataset 2: asymptomatic BV− n = 6, asymptomatic BV+ n = 214.
OTU tables from 16S rRNA gene data. |
Establishing microbial signatures in bacterial vaginosis (BV). |
Logistic Regression, Genetic Programming, and Random Forest. |
Beck and Foster, 2015 |
Colorectal cancer (CRC) |
Controls n = 30, CRC patients n = 30. Previously published datasets from Austria (healthy controls n = 57, CRC patients n = 46) and China (healthy controls n = 53, CRC patients n = 75) (Feng et al., 2015; Purcell et al., 2017).
Shotgun metagenomics data (mOTU, MGS and MetaPhlAn species profiles) and gene counts.
Identify cohort-specific non-invasive biomarkers for the diagnosis of CRC.
Weka CfsSubsetEval and the Boruta algorithm for feature selection. RF with 33 genes and 20 taxonomic markers.
Gupta et al., 2019 |
Obesity |
Data from 10 previously published studies (n = 2,786 subjects) (Turnbaugh et al., 2006; Wu et al., 2011; Human Microbiome Project Consortium, 2012; Zupancic et al., 2012; Escobar et al., 2014; Goodrich et al., 2014; Schubert et al., 2014; Ross et al., 2015; Zeevi et al., 2015; Baxter et al., 2016).
OTU tables from 16S rRNA gene data. |
Predict obesity status from microbiome composition.
Random Forest. |
Sze and Schloss, 2016 |
Pediatric irritable bowel syndrome (IBS) |
IBS patients n = 23, healthy controls n = 22.
Shotgun metagenomics (gene counts and pathways) and metabolomics.
Evaluate the relationship of pediatric IBS and abdominal pain with intestinal microbes and fecal metabolites.
RF, LASSO feature selection, SVM, naïve Bayes.
Hollister et al., 2019 |
Gastrointestinal symptoms in healthy humans |
n = 21 volunteers after 60 days of probiotic consumption.
16S rRNA gene data. |
Establish which gut microbes respond to probiotic interventions.
Correlation-based network analysis. Dimensionality reduction. |
Seo et al., 2017 |
Crohn’s disease
Crohn’s disease dataset: pediatric patients with CD n = 731, non-CD n = 628. Healthy controls n = 300 from the HMP (Turnbaugh et al., 2007).
16S rRNA gene data. |
Use deep learning and classic machine learning approaches to distinguish human body sites, diagnose Crohn’s disease and predict environments from representative 16S gene sequences.
RF, SVM, Deep Learning. |
Asgari et al., 2019 |
Inflammatory Bowel Disease (IBD) and esophagus diseases |
n = 3501 samples from different datasets (Costello et al., 2009; Knights et al., 2011). |
16S rRNA gene data. |
Classify metagenomic data using neural network approaches.
Neural networks, compared with supervised ML methods (linear regression, gradient boosting, SVM, RF).
Lo and Marculescu, 2019 |
Islet autoimmunity (IA) and Type 1 Diabetes (T1D). |
n = 10,913 stool metagenomes from children with persistent confirmed IA or T1D and controls (TEDDY cohort; Hagopian et al., 2011).
Shotgun metagenomics. Gene count. |
Describe the functional profile of the developing gut microbiome in relation to islet autoimmunity, T1D and other early childhood events. |
RF to separate cases from controls.
Vatanen et al., 2018 |
Irritable Bowel Syndrome |
71 samples from 22 children with IBS (pediatric Rome III criteria) and 22 healthy children. |
16S rRNA gene data. |
Finding microbial signatures for Irritable Bowel Syndrome. |
Random Forest. |
Saulnier et al., 2011 |
Primary sclerosing cholangitis (PSC)
46 controls and 80 patients with PSC sampled during endoscopic retrograde cholangiography (ERC) (37 with early disease, 32 with advanced disease and 11 with biliary dysplasia).
16S rRNA gene data. |
Explore the microbial involvement in the etiopathogenesis and risk for development of biliary neoplasia in primary sclerosing cholangitis. |
Generalized linear models. |
Pereira et al., 2017 |
Allergy |
Skin microbiota samples from 118 individuals. |
16S rRNA gene data. |
Analyzing atopic sensitization (i.e., allergic disposition) in a random sample of adolescents. |
Linear and logistic regression, and PCA. |
Hanski et al., 2012 |
Liver disease |
FINRISK population cohort (Borodulin et al., 2018). |
Shallow shotgun metagenome sequencing. |
Study the link between the Fatty Liver Index (FLI) and gut microbiome composition in a population sample in Finland. |
Gradient boosting. |
Ruuskanen et al., 2020 |
Liver disease |
FINRISK population cohort (Borodulin et al., 2018): a large population-based sample (N ≥ 7,115) with ∼15 years of electronic health register follow-up.
Shallow shotgun metagenome sequencing. |
Investigate the ability of gut microbial markers, in conjunction with conventional risk factors, to predict incident liver disease and alcoholic liver disease.
Gradient boosting. |
Liu et al., 2020 |
Serum lipids |
Healthy Finnish adults (n = 25, 18 females, 7 males). |
16S rRNA gene data. |
Evaluate the association between the gut microbiome and lipid profile. |
Linear models, unsupervised hierarchical clustering. |
Lahti et al., 2013 |
IBD (Crohn’s disease, Ulcerative Colitis, collagenous colitis) vs healthy |
Three publicly available human metagenomic data sets used as case studies (Turnbaugh et al., 2009; Koren et al., 2013; Halfvarson et al., 2017).
OTU tables. |
Predict the functional role of the gut microbiome.
Comparison of supervised learning methods.
Wassan et al., 2018a |
Obesity |
267 children aged 7–18 years from the American Gut Project (McDonald et al.). |
16S rRNA gene data. |
Characterize gut microbiota composition and its associations with BMI, weight change and lifestyle.
Linear decomposition model. |
Bai et al., 2019 |
Postmortem Changes |
144 swab samples from 21 cadavers.
16S rRNA gene data. |
Use necrobiome data to predict the postmortem interval.
Regression. |
Johnson et al., 2016 |
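Several of the studies summarized above preprocess compositional count data before training a classifier; Flemer et al. (2017), for example, used log-ratio transformed values of OTUs present in at least 5% of individuals. The sketch below shows one way such a step could look, assuming a centered log-ratio (CLR) transform and a pseudocount of 1 to handle zero counts. The source does not state which log-ratio variant or zero-handling was used, and the function names (`prevalence_filter`, `clr`, `filter_and_clr`) are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows = samples, columns = OTUs


def prevalence_filter(counts: Matrix, min_prevalence: float = 0.05) -> List[int]:
    """Indices of OTU columns with a nonzero count in at least
    `min_prevalence` (a fraction) of samples."""
    n_samples = len(counts)
    n_otus = len(counts[0])
    keep = []
    for j in range(n_otus):
        present = sum(1 for row in counts if row[j] > 0)
        if present / n_samples >= min_prevalence:
            keep.append(j)
    return keep


def clr(row: List[float], pseudocount: float = 1.0) -> List[float]:
    """Centered log-ratio: log(count + pseudocount) minus the mean of
    those logs across the row (the log of the geometric mean).
    The pseudocount is an assumption used only to avoid log(0)."""
    logs = [math.log(x + pseudocount) for x in row]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def filter_and_clr(counts: Matrix, min_prevalence: float = 0.05,
                   pseudocount: float = 1.0) -> Tuple[Matrix, List[int]]:
    """Drop rare OTUs, then CLR-transform each sample. Returns the
    transformed matrix and the indices of the retained OTU columns."""
    keep = prevalence_filter(counts, min_prevalence)
    if not keep:
        raise ValueError("no OTU passes the prevalence threshold")
    transformed = [clr([row[j] for j in keep], pseudocount) for row in counts]
    return transformed, keep


if __name__ == "__main__":
    # Toy OTU table: 3 samples x 4 OTUs (made-up numbers for illustration).
    otu_table = [
        [10, 0, 5, 0],
        [20, 0, 0, 0],
        [5, 0, 15, 1],
    ]
    features, kept = filter_and_clr(otu_table, min_prevalence=0.5)
    print(kept)  # [0, 2]
```

Each transformed row sums to zero, which removes the per-sample total (sequencing depth) from the features. The resulting matrix could then be passed to a classifier such as a random forest, the method most frequently listed in the table.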