Table Two:
Feature-based, data-reduction, and machine-learning methods used to compare microbiome characteristics in epidemiologic studies
Technique | Description | Examples |
---|---|---|
Feature-based approaches | Use individual features (e.g., taxa) to examine associations with the outcome of interest | |
Microbiome-wide association testing | Test for an association between each feature and the outcome of interest, correcting for the multiple-testing burden or otherwise exploiting information from the study to limit false positives | Many specific packages, such as ALDEx2, DESeq2, LEfSe, MaAsLin (sketch below) |
Candidate microbe approach | Select key features using either prior knowledge or a data-driven approach | Build networks and identify features of interest based on centrality metrics (network hubs); select features based on prior knowledge (sketch below) |
Data/dimension reduction approaches | Summarize all features into a smaller number of either continuous or categorical variables; examine associations of these summary variables with the outcome of interest | |
Ecological or diversity measures | Alpha diversity: measures of the number of species present (richness) and the relative abundance of species present. Beta diversity: measures of similarity in microbiome composition among samples | Alpha diversity: Chao1, Faith’s phylogenetic diversity index, Simpson index, Shannon index, Rao’s quadratic entropy. Beta diversity: Bray-Curtis, Euclidean distance, Jaccard, UniFrac distance (sketch below) |
Ordination methods | Order samples based on their features such that samples with similar features are placed close together and samples with dissimilar features are placed further apart | Nonmetric multidimensional scaling (NMDS); principal components analysis (PCA); principal coordinates analysis (PCoA); correspondence analysis (CA) (sketch below) |
Clustering methods | Cluster samples based on their features using a variety of methods | Community state typing; hierarchical clustering (sketch below) |
Machine learning approaches for feature-based classification | Split data into training and testing sets, and train and evaluate predictive models that classify samples based on their features | Support vector machines; random forest; k-nearest neighbors; neural networks (sketch below) |
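
As a concrete illustration of the microbiome-wide association testing row, the sketch below runs one nonparametric test per taxon and applies a Benjamini-Hochberg correction to control the false-discovery rate. The simulated `abundance` table and `case` indicator are hypothetical placeholders; dedicated packages such as ALDEx2 or DESeq2 use more sophisticated count models than this simple per-feature test.

```python
# Minimal sketch of microbiome-wide association testing: one hypothesis
# test per taxon followed by false-discovery-rate correction.
# The abundance table and case/control labels are simulated placeholders.
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
abundance = pd.DataFrame(rng.poisson(5, size=(60, 20)),
                         columns=[f"taxon_{i}" for i in range(20)])
case = rng.integers(0, 2, size=60).astype(bool)        # outcome of interest

pvals = []
for taxon in abundance.columns:
    # Compare each taxon's abundance between cases and controls
    stat, p = mannwhitneyu(abundance.loc[case, taxon],
                           abundance.loc[~case, taxon])
    pvals.append(p)

# Benjamini-Hochberg correction for the multiple-testing burden
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
results = pd.DataFrame({"taxon": abundance.columns,
                        "p": pvals, "q": qvals, "significant": reject})
print(results.sort_values("q").head())
```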
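
For the candidate microbe row, a minimal data-driven sketch: build a taxon co-occurrence network from pairwise Spearman correlations and rank taxa by a centrality metric to nominate "hub" candidates. The correlation threshold and the choice of degree centrality are illustrative assumptions, not a prescribed workflow.

```python
# Minimal sketch of data-driven candidate selection via network hubs:
# build a taxon co-occurrence network and rank taxa by centrality.
import numpy as np
import pandas as pd
import networkx as nx

rng = np.random.default_rng(1)
abundance = pd.DataFrame(rng.poisson(5, size=(60, 15)),
                         columns=[f"taxon_{i}" for i in range(15)])

corr = abundance.corr(method="spearman")                # taxon-taxon correlations
G = nx.Graph()
G.add_nodes_from(corr.columns)
for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        if abs(corr.loc[a, b]) > 0.3:                   # illustrative edge threshold
            G.add_edge(a, b)

# Rank taxa by degree centrality and keep the top candidates ("hubs")
centrality = nx.degree_centrality(G)
hubs = sorted(centrality, key=centrality.get, reverse=True)[:5]
print(hubs)
```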
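
For the diversity measures row, a minimal sketch computing one alpha diversity index (Shannon) per sample and one beta diversity measure (Bray-Curtis) between samples. The simulated count table is a placeholder; in practice these summaries are usually obtained from dedicated packages such as scikit-bio, QIIME 2, or vegan.

```python
# Minimal sketch of two common diversity summaries: Shannon alpha diversity
# per sample and pairwise Bray-Curtis beta diversity between samples.
import numpy as np
from scipy.stats import entropy
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(2)
counts = rng.poisson(5, size=(10, 30))                  # samples x taxa

# Alpha diversity: Shannon index H = -sum(p_i * ln p_i) for each sample
proportions = counts / counts.sum(axis=1, keepdims=True)
shannon = entropy(proportions, axis=1)

# Beta diversity: pairwise Bray-Curtis dissimilarity between samples
bray_curtis = squareform(pdist(counts, metric="braycurtis"))

print(shannon.round(2))
print(bray_curtis.round(2))
```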
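
For the ordination row, a sketch of classical principal coordinates analysis (PCoA) applied to Bray-Curtis dissimilarities, written out with NumPy to show the double-centering and eigendecomposition steps; the simulated data and the two retained axes are illustrative, and packages such as scikit-bio provide this directly.

```python
# Minimal sketch of principal coordinates analysis (PCoA): embed samples
# in a low-dimensional space that preserves their pairwise dissimilarities.
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(3)
counts = rng.poisson(5, size=(12, 25))                  # samples x taxa
D = squareform(pdist(counts, metric="braycurtis"))      # pairwise dissimilarities

# Classical PCoA: double-center the squared distance matrix, then
# eigendecompose and scale eigenvectors by sqrt of positive eigenvalues.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1]                       # largest eigenvalues first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
keep = eigvals > 0
coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])

print(coords[:, :2].round(3))                           # first two ordination axes
```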
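
For the clustering row, a sketch of hierarchical clustering of samples on their Bray-Curtis dissimilarities; the average-linkage method and the four-cluster cut are illustrative choices, not recommendations.

```python
# Minimal sketch of hierarchical clustering of samples into groups
# (e.g., community types) based on Bray-Curtis dissimilarities.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(4)
counts = rng.poisson(5, size=(20, 25))                  # samples x taxa

condensed = pdist(counts, metric="braycurtis")          # condensed distance vector
tree = linkage(condensed, method="average")             # agglomerative clustering
clusters = fcluster(tree, t=4, criterion="maxclust")    # cut the tree into 4 groups

print(clusters)                                         # cluster label per sample
```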
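
For the machine learning row, a sketch of the train/test workflow described in the table using one of the listed classifiers (random forest); the simulated data and hyperparameters are placeholders.

```python
# Minimal sketch of feature-based classification: split samples into
# training and testing sets, fit a random forest on taxon abundances,
# and evaluate held-out accuracy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(5)
X = rng.poisson(5, size=(100, 30)).astype(float)        # samples x taxa
y = rng.integers(0, 2, size=100)                        # binary outcome

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```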