Author manuscript; available in PMC: 2022 Sep 1.
Published in final edited form as: Curr Epidemiol Rep. 2021 Feb 23;8:143–150. doi: 10.1007/s40471-021-00263-8

Table Two:

Feature-based, data reduction and machine-learning methods used to compare microbiome characteristics in epidemiologic studies

Technique | Description | Examples
Feature-based approaches | Use individual features (e.g., taxa) to examine associations with the outcome of interest
 Microbiome-wide association testing | Test for an association between each feature and the outcome of interest, correcting for the multiple-testing burden or otherwise exploiting information from the study to limit false positives | Many dedicated packages, such as ALDEx2, DESeq2, LEfSe, and MaAsLin
 Candidate microbe approach | Select key features using either prior knowledge or a data-driven approach | Build networks and identify features of interest based on centrality metrics (network hubs); select features based on prior knowledge
Data/dimension reduction approaches | Summarize all features into a smaller number of continuous or categorical variables; examine associations of these summary variables with the outcome of interest
 Ecological or diversity measures | Alpha diversity: measures of the number of species present (richness) and the relative abundance of the species present (evenness). Beta diversity: measures of similarity or dissimilarity in microbiome composition between samples | Alpha diversity: Chao1, Faith’s phylogenetic diversity index, Simpson index, Shannon index, Rao’s quadratic entropy. Beta diversity: Bray-Curtis, Euclidean distance, Jaccard, UniFrac distance
 Ordination methods | Arrange samples characterized by their features so that similar samples lie close together and dissimilar samples lie farther apart | Nonmetric multidimensional scaling (NMDS); principal components analysis (PCA); principal coordinates analysis (PCoA); correspondence analysis (CA)
 Clustering methods | Group samples based on their features using any of a variety of algorithms | Community state typing; hierarchical clustering
Machine learning approaches for feature-based classification | Split data into training and testing sets, and refine predictive models that classify samples based on their features | Support vector machines; random forests; k-nearest neighbors; neural networks
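To make the microbiome-wide association testing row of Table Two concrete, here is a minimal sketch of the generic pattern it describes: one test per feature, followed by a false-discovery-rate correction across all features. This is not how ALDEx2, DESeq2, LEfSe, or MaAsLin work internally. The per-feature permutation test on a difference in means, the Benjamini-Hochberg correction, the toy data, and the function names `permutation_pvalue` and `benjamini_hochberg` are all illustrative assumptions.

```python
import random


def permutation_pvalue(values, groups, n_perm=999, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    values: one feature's abundance in each sample.
    groups: outcome coded 0/1 per sample (both groups must be non-empty).
    """
    def abs_mean_diff(g):
        a = [v for v, gi in zip(values, g) if gi == 1]
        b = [v for v, gi in zip(values, g) if gi == 0]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between labels and abundances
        if abs_mean_diff(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical data: rows are features (taxa), columns are samples.
features = [
    [10, 11, 12, 13, 1, 2, 3, 4],  # differs between outcome groups
    [5, 5, 5, 5, 5, 5, 5, 5],      # identical in every sample
]
outcome = [1, 1, 1, 1, 0, 0, 0, 0]
raw_p = [permutation_pvalue(f, outcome) for f in features]
q_values = benjamini_hochberg(raw_p)
```

Features would then be reported at a chosen false discovery rate (for example, q below 0.05); the threshold is an analyst's choice, not something the table prescribes.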
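For the data-driven version of the candidate microbe approach in Table Two, one simple way to find network hubs is to link taxa whose abundances are strongly correlated and rank taxa by degree centrality. The sketch below assumes Pearson correlation on raw abundances, an edge whenever |r| exceeds 0.6 (an arbitrary illustrative threshold), and degree as the centrality metric. These are simplifying assumptions: published microbiome network methods usually rely on compositionally aware association estimates, and the names `pearson` and `degree_centrality` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


def degree_centrality(taxa, threshold=0.6):
    """For each taxon, count the other taxa it is strongly correlated with."""
    k = len(taxa)
    degree = [0] * k
    for i in range(k):
        for j in range(i + 1, k):
            # An undirected edge links i and j when |r| exceeds the threshold.
            if abs(pearson(taxa[i], taxa[j])) > threshold:
                degree[i] += 1
                degree[j] += 1
    return degree


# Hypothetical abundances: rows are taxa, columns are samples.
taxa = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [5, 4, 3, 2, 1],
    [1, 3, 1, 3, 1],
]
degrees = degree_centrality(taxa)
hubs = [i for i, d in enumerate(degrees) if d == max(degrees)]
```

Degree is the simplest centrality metric; betweenness or closeness centrality would rank hubs differently and may be preferable in larger, sparser networks.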
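Several of the alpha diversity measures listed in Table Two can be computed directly from one sample's taxon counts. The sketch below implements observed richness, the Shannon index with natural logarithms, the Gini-Simpson form of the Simpson index, and the classic Chao1 estimator. Software packages differ in log base, in which Simpson variant they report, and in how they handle Chao1 when there are no doubletons, so these definitions are one common convention rather than the article's. Faith's phylogenetic diversity and Rao's quadratic entropy need a phylogeny or a distance between taxa and are not covered. The function names are hypothetical.

```python
import math


def richness(counts):
    """Observed richness: number of taxa with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa present (assumes total > 0)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2).

    Probability that two reads drawn with replacement belong to different taxa.
    """
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Chao1 richness estimate from singletons (F1) and doubletons (F2)."""
    s_obs = richness(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used here only when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / 2


# Hypothetical read counts for six taxa in one sample.
sample_counts = [10, 5, 1, 1, 2, 0]
summary = {
    "richness": richness(sample_counts),
    "shannon": shannon(sample_counts),
    "gini_simpson": gini_simpson(sample_counts),
    "chao1": chao1(sample_counts),
}
```

Chao1 exceeds observed richness here because the two singletons suggest that rare taxa remain unobserved.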
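Three of the beta diversity measures in Table Two (Bray-Curtis, Jaccard, and Euclidean distance) compare two samples' abundance vectors and need no phylogeny, so they are easy to sketch. The code below assumes the vectors are aligned over the same taxa and computes Jaccard on presence/absence. UniFrac needs a phylogenetic tree and is omitted. The helper names are hypothetical.

```python
import math


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0


def jaccard_distance(x, y):
    """Jaccard distance on presence/absence: 1 - |shared taxa| / |taxa in either|."""
    a = {i for i, v in enumerate(x) if v > 0}
    b = {i for i, v in enumerate(y) if v > 0}
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def euclidean(x, y):
    """Euclidean distance between two abundance vectors."""
    return math.dist(x, y)


def distance_matrix(samples, metric):
    """Pairwise distance matrix for a list of samples under the given metric."""
    return [[metric(s, t) for t in samples] for s in samples]


# Hypothetical relative-abundance profiles for three samples over three taxa.
samples = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.0, 0.1, 0.9]]
bc = distance_matrix(samples, bray_curtis)
```

Bray-Curtis is usually applied to relative abundances (or rarefied counts) so that differences in sequencing depth do not dominate the distance.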
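Principal coordinates analysis (PCoA), one of the ordination methods in Table Two, takes a sample-by-sample distance matrix and finds coordinates whose Euclidean distances approximate it. The sketch below performs classical PCoA but recovers only the first axis: it Gower-centers the squared distances and uses power iteration to obtain the leading eigenvector. Real analyses use a full eigendecomposition and inspect several axes. Power iteration here assumes that the largest-magnitude eigenvalue is positive, which can fail for non-Euclidean distances such as Bray-Curtis. The function name `pcoa_first_axis` is hypothetical.

```python
import math


def pcoa_first_axis(dist, n_iter=500, tol=1e-12):
    """Return sample coordinates on the first PCoA axis and its eigenvalue."""
    n = len(dist)
    # Gower centering: B = -1/2 * J D^2 J, where J = I - (1/n) 11^T.
    sq = [[d * d for d in row] for row in dist]
    row_means = [sum(r) / n for r in sq]
    col_means = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    b = [[-0.5 * (sq[i][j] - row_means[i] - col_means[j] + grand)
          for j in range(n)] for i in range(n)]
    # Power iteration for the leading eigenvector of B. The start vector is
    # non-constant because B maps the all-ones vector to zero.
    v = [float(i + 1) for i in range(n)]
    lam = 0.0
    for _ in range(n_iter):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        w = [x / norm for x in w]
        # Rayleigh quotient of the unit vector w estimates the eigenvalue.
        new_lam = sum(w[i] * sum(b[i][j] * w[j] for j in range(n))
                      for i in range(n))
        converged = abs(new_lam - lam) < tol
        v, lam = w, new_lam
        if converged:
            break
    # Scale the unit eigenvector by sqrt(eigenvalue) to get coordinates.
    scale = math.sqrt(max(lam, 0.0))
    return [x * scale for x in v], lam


# Hypothetical example: three samples lying on a line at positions 0, 1, and 3.
dist = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
coords, eigenvalue = pcoa_first_axis(dist)
```

For these collinear points the first axis recovers the positions centered at their mean, up to a sign flip, which is arbitrary in any PCoA.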
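Hierarchical clustering, listed under clustering methods in Table Two, can likewise start from a beta diversity distance matrix. The sketch below is a naive agglomerative procedure with average linkage (UPGMA-style merging) that stops once k clusters remain. Average linkage, the stopping rule, and the name `average_linkage` are illustrative assumptions; community state typing studies have used other linkage rules and distance measures. The pairwise search is slow, so the code suits only small examples.

```python
def average_linkage(dist, k):
    """Agglomerative clustering with average linkage, stopping at k clusters."""
    clusters = [[i] for i in range(len(dist))]

    def avg_dist(a, b):
        # Mean of all pairwise distances between members of clusters a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        # Merge the closest pair; y > x, so deleting y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical distances: samples 0 and 1 are similar, as are samples 2 and 3.
dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]
clusters = average_linkage(dist, k=2)
```

Choosing k, or cutting a full dendrogram at a height, is an analyst's decision; silhouette or gap statistics are common aids.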
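The machine-learning row of Table Two describes a train/test workflow. As one illustration, the sketch below splits hypothetical two-taxon relative-abundance profiles into training and test sets, classifies each test sample with k-nearest neighbors under Euclidean distance, and reports test accuracy. The data, the 30% test fraction, k = 3, and the names `train_test_split` and `knn_predict` are arbitrary choices for the example; they are not a specific library's API.

```python
import math
import random
from collections import Counter


def train_test_split(xs, ys, test_frac=0.3, seed=0):
    """Randomly partition samples into training and test sets."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(test_frac * len(xs)))
    test, train = idx[:n_test], idx[n_test:]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in test], [ys[i] for i in test])


def knn_predict(train_x, train_y, x, k=3):
    """Majority label among the k training samples closest to x."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# Hypothetical relative abundances of two taxa per sample, with a 0/1 outcome.
profiles = [[0.9, 0.1], [0.85, 0.15], [0.8, 0.2], [0.95, 0.05], [0.88, 0.12],
            [0.1, 0.9], [0.15, 0.85], [0.2, 0.8], [0.05, 0.95], [0.12, 0.88]]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

train_x, train_y, test_x, test_y = train_test_split(profiles, labels)
predictions = [knn_predict(train_x, train_y, x) for x in test_x]
accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
```

In practice, hyperparameters such as k would be tuned by cross-validation within the training set only, so that the held-out test accuracy remains an honest estimate of performance.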