Table Two:
Feature-based, data-reduction, and machine-learning methods used to compare microbiome characteristics in epidemiologic studies
Technique | Description | Examples |
---|---|---|
Feature-based approaches | Use individual features (e.g., taxa) to examine associations with the outcome of interest | |
Microbiome-wide association testing | Test for an association between each feature and the outcome of interest, correcting for the multiple-testing burden or otherwise exploiting information from the study to limit false positives | Many specific packages, such as ALDEx2, DESeq2, LEfSe, MaAsLin (sketch below) |
Candidate microbe approach | Select key features using either prior knowledge or a data-driven approach | Build networks and identify features of interest based on centrality metrics (network hubs); select features based on prior knowledge (sketch below) |
Data/dimension reduction approaches | Summarize all features into a smaller number of either continuous or categorical variables; examine associations of these summary variables with the outcome of interest | |
Ecological or diversity measures | Alpha diversity: measures of the number of species present (richness) and the relative abundance of species present. Beta diversity: measures of similarity in microbiome composition among samples | Alpha diversity: Chao1, Faith’s phylogenetic diversity index, Simpson index, Shannon index, Rao’s quadratic entropy. Beta diversity: Bray-Curtis, Euclidean distance, Jaccard, UniFrac distance (sketch below) |
Ordination methods | Order samples based on their features such that samples with similar features are placed close together and samples with dissimilar features are placed further apart | Nonmetric multidimensional scaling (NMDS); principal components analysis (PCA); principal coordinates analysis (PCoA); correspondence analysis (CA) (sketch below) |
Clustering methods | Cluster samples based on their features using a variety of methods | Community state typing; hierarchical clustering (sketch below) |
Machine learning approaches for feature-based classification | Split data into training and testing sets, and train and evaluate predictive models that classify samples based on their features | Support vector machines; random forest; k-nearest neighbors; neural networks (sketch below) |
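
As a concrete illustration of the microbiome-wide association testing row, the sketch below runs one nonparametric test per taxon and applies a Benjamini-Hochberg correction to control the false-discovery rate. The simulated `abundance` table and `case` indicator are hypothetical placeholders; dedicated packages such as ALDEx2 or DESeq2 use more sophisticated count models than this simple per-feature test.

```python
# Minimal sketch of microbiome-wide association testing: one hypothesis
# test per taxon followed by false-discovery-rate correction.
# The abundance table and case/control labels are simulated placeholders.
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
abundance = pd.DataFrame(rng.poisson(5, size=(60, 20)),
                         columns=[f"taxon_{i}" for i in range(20)])
case = rng.integers(0, 2, size=60).astype(bool)        # outcome of interest

pvals = []
for taxon in abundance.columns:
    # Compare each taxon's abundance between cases and controls
    stat, p = mannwhitneyu(abundance.loc[case, taxon],
                           abundance.loc[~case, taxon])
    pvals.append(p)

# Benjamini-Hochberg correction for the multiple-testing burden
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
results = pd.DataFrame({"taxon": abundance.columns,
                        "p": pvals, "q": qvals, "significant": reject})
print(results.sort_values("q").head())
```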
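
For the candidate microbe row, a minimal data-driven sketch: build a taxon co-occurrence network from pairwise Spearman correlations and rank taxa by a centrality metric to nominate "hub" candidates. The correlation threshold and the choice of degree centrality are illustrative assumptions, not a prescribed workflow.

```python
# Minimal sketch of data-driven candidate selection via network hubs:
# build a taxon co-occurrence network and rank taxa by centrality.
import numpy as np
import pandas as pd
import networkx as nx

rng = np.random.default_rng(1)
abundance = pd.DataFrame(rng.poisson(5, size=(60, 15)),
                         columns=[f"taxon_{i}" for i in range(15)])

corr = abundance.corr(method="spearman")                # taxon-taxon correlations
G = nx.Graph()
G.add_nodes_from(corr.columns)
for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        if abs(corr.loc[a, b]) > 0.3:                   # illustrative edge threshold
            G.add_edge(a, b)

# Rank taxa by degree centrality and keep the top candidates ("hubs")
centrality = nx.degree_centrality(G)
hubs = sorted(centrality, key=centrality.get, reverse=True)[:5]
print(hubs)
```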
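
For the diversity measures row, a minimal sketch computing one alpha diversity index (Shannon) per sample and one beta diversity measure (Bray-Curtis) between samples. The simulated count table is a placeholder; in practice these summaries are usually obtained from dedicated packages such as scikit-bio, QIIME 2, or vegan.

```python
# Minimal sketch of two common diversity summaries: Shannon alpha diversity
# per sample and pairwise Bray-Curtis beta diversity between samples.
import numpy as np
from scipy.stats import entropy
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(2)
counts = rng.poisson(5, size=(10, 30))                  # samples x taxa

# Alpha diversity: Shannon index H = -sum(p_i * ln p_i) for each sample
proportions = counts / counts.sum(axis=1, keepdims=True)
shannon = entropy(proportions, axis=1)

# Beta diversity: pairwise Bray-Curtis dissimilarity between samples
bray_curtis = squareform(pdist(counts, metric="braycurtis"))

print(shannon.round(2))
print(bray_curtis.round(2))
```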
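
For the ordination row, a sketch of classical principal coordinates analysis (PCoA) applied to Bray-Curtis dissimilarities, written out with NumPy to show the double-centering and eigendecomposition steps; the simulated data and the two retained axes are illustrative, and packages such as scikit-bio provide this directly.

```python
# Minimal sketch of principal coordinates analysis (PCoA): embed samples
# in a low-dimensional space that preserves their pairwise dissimilarities.
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(3)
counts = rng.poisson(5, size=(12, 25))                  # samples x taxa
D = squareform(pdist(counts, metric="braycurtis"))      # pairwise dissimilarities

# Classical PCoA: double-center the squared distance matrix, then
# eigendecompose and scale eigenvectors by sqrt of positive eigenvalues.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1]                       # largest eigenvalues first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
keep = eigvals > 0
coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])

print(coords[:, :2].round(3))                           # first two ordination axes
```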
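
For the clustering row, a sketch of hierarchical clustering of samples on their Bray-Curtis dissimilarities; the average-linkage method and the four-cluster cut are illustrative choices, not recommendations.

```python
# Minimal sketch of hierarchical clustering of samples into groups
# (e.g., community types) based on Bray-Curtis dissimilarities.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(4)
counts = rng.poisson(5, size=(20, 25))                  # samples x taxa

condensed = pdist(counts, metric="braycurtis")          # condensed distance vector
tree = linkage(condensed, method="average")             # agglomerative clustering
clusters = fcluster(tree, t=4, criterion="maxclust")    # cut the tree into 4 groups

print(clusters)                                         # cluster label per sample
```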
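
For the machine learning row, a sketch of the train/test workflow described in the table using one of the listed classifiers (random forest); the simulated data and hyperparameters are placeholders.

```python
# Minimal sketch of feature-based classification: split samples into
# training and testing sets, fit a random forest on taxon abundances,
# and evaluate held-out accuracy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(5)
X = rng.poisson(5, size=(100, 30)).astype(float)        # samples x taxa
y = rng.integers(0, 2, size=100)                        # binary outcome

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```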