2021 Feb 19;12:634511. doi: 10.3389/fmicb.2021.634511

TABLE 2.

Clinical Applications of Machine Learning for human microbiome studies.

| Disease | Datasets | Features | Aim | Method | Citation |
| --- | --- | --- | --- | --- | --- |
| Crohn’s Disease (CD) | BISCUIT cohort (Hansen et al., 2012; Pascal et al., 2017), CD n = 20, Controls n = 20, Validation Cohort RISK cohort (Gevers et al., 2014). | Shotgun metagenomics data and 16S rRNA gene data. | Classify pediatric CD patients by disease state and treatment response. | Random forest. | Douglas et al., 2018 |
| Colorectal Cancer (CRC) | Patients with CRC S0 n = 27, Patients with CRC SIII/IV n = 54. Healthy controls n = 127. | Shotgun metagenomics data (Species, KO genes, Metabolite profiles). | Classification of CRC patients according to cancer stage. | Feature selection by LASSO. Random forest. | Yachida et al., 2019 |
| Colorectal Cancer (CRC) | Fecal CRC metagenomes: n = 38 previously published, n = 22 new. Control n = 60. | Feature selection by LASSO. Features: IGC gene abundances. | Predict taxonomic and functional microbiome CRC signatures. | Feature selection by LASSO. Random forest. | Wirbel et al., 2019 |
| Colorectal cancer (CRC) | Stool: Controls n = 62, CRC n = 69, Polyps n = 23. Swabs: Controls n = 25, CRC n = 45, Polyps n = 21. | Log-ratio transformed values of OTUs present in at least 5% of individuals. | Development of an oral and fecal microbiota classifier that distinguishes individuals with CRC and adenomas from controls. | Feature selection by LASSO. Random forest. | Flemer et al., 2017 |
| Colorectal cancer (CRC) | Cohort 1: CRC n = 29, Adenomas n = 27, Controls n = 24. Cohort 2: CRC n = 32, Controls n = 28. Validation Datasets: CRC n = 313, Adenomas n = 143, Controls n = 308. | Taxonomic species-level abundances, gene-family and pathway-related abundances. | Finding reproducible microbiome markers and disease-predictive models for CRC. | Supervised Learning Methods: Random forest. | Thomas et al., 2019 |
| Colorectal cancer (CRC) | Previously published data from France, Hong Kong and Austria. France (Zeller et al., 2014). | Shotgun metagenomics, FASTA. | Discovery of biomarkers from WGS that could be used to build a machine learning classifier for CRC prediction. | Supervised Learning Methods: Random forest. Neural network. Support vector machine. | Koohi-Moghadam et al., 2019 |
| Abnormal cases vs. Controls | Controls n = 383. Abnormal Cases: Type 2 diabetes n = 170, Rheumatoid arthritis n = 130, Liver cirrhosis n = 123. | Shotgun metagenomics. | Develop a pipeline to address the challenging characterization of multilabel samples from type 2 diabetes, rheumatoid arthritis, and liver cirrhosis. | Logistic Regression. | Wu et al., 2018 |
| Bacterial Vaginosis (BV) | Dataset 1: Asymptomatic BV-: 299. Asymptomatic BV+: 97. Dataset 2: Asymptomatic BV-: 6. Asymptomatic BV+: 214. | OTU tables from 16S rRNA gene data. | Establishing microbial signatures in bacterial vaginosis (BV). | Logistic Regression, Genetic Programming, and Random Forest. | Beck and Foster, 2015 |
| Colorectal cancer (CRC) | n = 30 controls, n = 30 CRC patients from previously published datasets from Austria (n = 57 healthy controls, n = 46 CRC patients) and China (n = 53 healthy controls and n = 75 CRC patients) (Feng et al., 2015; Purcell et al., 2017). | Shotgun metagenomics data (mOTU, MGS, MetaPhlAn species). Gene counts. | Identify cohort-specific non-invasive biomarkers to be used in diagnosis of CRC. | Weka “CfsSubsetEval” + Boruta algorithm for feature selection. RF with 33 genes and 20 taxonomic markers. | Gupta et al., 2019 |
| Obesity | Data from 10 previously published studies (n = 2,786 subjects) (Turnbaugh et al., 2006; Wu et al., 2011; Human Microbiome Project Consortium, 2012; Zupancic et al., 2012; Escobar et al., 2014; Goodrich et al., 2014; Schubert et al., 2014; Ross et al., 2015; Zeevi et al., 2015; Baxter et al., 2016). | OTU tables from 16S rRNA gene data. | Predict obesity status on the basis of the microbial composition of the microbiome. | Random Forest. | Sze and Schloss, 2016 |
| Pediatric irritable bowel syndrome (IBS) | n = 23 IBS patients, n = 22 healthy controls. | Shotgun metagenomics, Gene Counts and pathways, Metabolomics. | Evaluate the relationship between pediatric IBS and abdominal pain with intestinal microbes and fecal metabolites. | RF, LASSO feature selection, SVM, naïve Bayes. | Hollister et al., 2019 |
| Gastrointestinal symptoms in healthy humans | n = 21 volunteers after probiotics consumption for 60 days. | 16S rRNA gene data. | Establish which of the gut microbes respond to probiotic interventions. | Correlation-based network analysis. Dimensionality reduction. | Seo et al., 2017 |
| Crohn’s Disease | Crohn’s Disease dataset: n = 731 pediatric patients with CD, n = 628 non-CD. n = 300 healthy controls from HMP (Turnbaugh et al., 2007). | 16S rRNA gene data. | Use of deep learning methods and classic machine learning approaches for distinguishing among human body sites, diagnosis of Crohn’s disease, and predicting the environments from representative 16S gene sequences. | RF, SVM, Deep Learning. | Asgari et al., 2019 |
| Inflammatory Bowel Disease (IBD) and esophagus diseases | n = 3501 samples from different datasets (Costello et al., 2009; Knights et al., 2011). | 16S rRNA gene data. | Classification of metagenomic data using Neural Networks approaches. | Neural Networks. Comparison with supervised ML methods (Linear regression, Gradient boosting, SVM, RF). | Lo and Marculescu, 2019 |
| Islet autoimmunity (IA) and Type 1 Diabetes (T1D) | n = 10,913 metagenomes in stool samples from persistent confirmed IA or T1D vs controls (TEDDY cohort) (Hagopian et al., 2011). | Shotgun metagenomics. Gene count. | Describe the functional profile of the developing gut microbiome in relation to islet autoimmunity, T1D and other early childhood events. | RF to separate cases from controls. | Vatanen et al., 2018 |
| Irritable Bowel Syndrome | 71 samples from 22 children with IBS (pediatric Rome III criteria) and 22 healthy children. | 16S rRNA gene data. | Finding microbial signatures for Irritable Bowel Syndrome. | Random Forest. | Saulnier et al., 2011 |
| Sclerosing cholangitis | 46 controls and 80 patients with PSC during ERC (37 with early disease, 32 with advanced disease, and 11 with biliary dysplasia). | 16S rRNA gene data. | Explore the microbial involvement in the etiopathogenesis and risk for development of biliary neoplasia in primary sclerosing cholangitis. | Generalized linear models. | Pereira et al., 2017 |
| Allergy | Skin microbiota samples from 118 individuals. | 16S rRNA gene data. | Analyzing atopic sensitization (i.e., allergic disposition) in a random sample of adolescents. | Linear and logistic regression, and PCA. | Hanski et al., 2012 |
| Liver disease | FINRISK population cohort (Borodulin et al., 2018). | Shallow shotgun metagenome sequencing. | Study the link between the Fatty Liver Index (FLI) and gut microbiome composition in a population sample in Finland. | Gradient boosting. | Ruuskanen et al., 2020 |
| Liver disease | A large population-based cohort (N ≥ 7,115) and ∼15 years of electronic health register follow-up of the FINRISK population cohort (Borodulin et al., 2018). | Shallow shotgun metagenome sequencing. | Investigate the predictive ability of gut microbial markers, in conjunction with conventional risk factors, for incident liver disease and alcoholic liver disease. | Gradient boosting. | Liu et al., 2020 |
| Serum lipids | Healthy Finnish adults (n = 25, 18 females, 7 males). | 16S rRNA gene data. | Evaluate the association between the gut microbiome and lipid profile. | Linear models, unsupervised hierarchical clustering. | Lahti et al., 2013 |
| IBD (Crohn’s disease, Ulcerative Colitis, collagenous colitis) vs healthy | Three publicly available human metagenomics data sets as use cases (Turnbaugh et al., 2009; Koren et al., 2013; Halfvarson et al., 2017). | OTU tables. | Predicting gut microbiome functional role. | Supervised Learning method comparison. | Wassan et al., 2018a |
| Obesity | 267 children aged 7–18 years from the American Gut Project (McDonald et al.). | 16S rRNA gene data. | Composition of gut microbiota and its associations with BMI level, weight change and lifestyle. | Linear decomposition model. | Bai et al., 2019 |
| Postmortem Changes | 144 sample swabs from 21 cadavers. | 16S rRNA gene data. | Use of necrobiome data in the prediction of the postmortem interval. | Regression. | Johnson et al., 2016 |
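
Several entries in the table start from OTU tables that are filtered by prevalence and log-ratio transformed before a classifier is trained; the Flemer et al., 2017 row, for example, lists "Log-ratio transformed values of OTUs present in at least 5% of individuals". The sketch below illustrates that kind of preprocessing. It assumes a centered log-ratio (CLR) transform with a pseudocount of 1, because the table does not state which log-ratio variant or zero-handling the study used. The function names and the toy counts are hypothetical and serve only as illustration.

```python
import math


def prevalence_filter(counts, min_prevalence=0.05):
    """Keep OTUs (columns) that are nonzero in at least min_prevalence of samples.

    counts: list of samples, each a list of OTU counts of equal length.
    Returns (filtered_counts, kept_column_indices).
    """
    n_samples = len(counts)
    n_otus = len(counts[0])
    kept = [
        j for j in range(n_otus)
        if sum(1 for row in counts if row[j] > 0) / n_samples >= min_prevalence
    ]
    return [[row[j] for j in kept] for row in counts], kept


def clr(row, pseudocount=1.0):
    """Centered log-ratio: log of each value minus the mean log of the sample.

    The pseudocount is an assumption made here to avoid log(0).
    """
    logs = [math.log(x + pseudocount) for x in row]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Toy OTU table (hypothetical counts): 4 samples (rows) x 3 OTUs (columns).
otu_counts = [
    [10, 0, 5],
    [20, 0, 0],
    [0, 0, 8],
    [15, 0, 3],
]

# The second OTU is absent from every sample, so the 5% filter drops it.
filtered, kept = prevalence_filter(otu_counts, min_prevalence=0.05)

# Transform each sample independently; CLR values sum to zero within a sample.
transformed = [clr(row) for row in filtered]
```

The transformed rows could then be passed to a feature-selection step and a classifier such as those named in the Method column; that downstream step is not shown here.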