| MetAML 2016 [64] | sp. rel. ab. or strain markers | Parameter sweep for 4 classifiers (SVM, RF, Lasso, ENet) with 3 feature selection methods (RF n most important, Lasso, ENet) | 0.96 SVM | 0.91 SVM | 0.76 SVM | 0.66 SVM | Foundational cross-validation test data and framework; first parameter sweep of metagenome disease prediction from off-the-shelf ML models |
| PopPhy-CNN 2020 [65] | OTU rel. ab. | PhyloT tree construction; populated with input OTU rel. ab.; transformed to 2D matrix; CNN with ELU | 0.95 | N/A | 0.69 | 0.67 | CNN with spatial quantitative relationship in input taxonomy data; novel algorithm for selecting most important features from first convolutional layer |
| Met2Img 2018 [66] | sp. or genus rel. ab. | Rel. ab. binned, colored, and visualized with Fill-up or t-SNE; 24x24 px (or smaller) images input into CNN with ReLU | 0.91 Fill-up SPB | 0.87 Fill-up SPB | 0.68 t-SNE QTF | 0.69 t-SNE SPB | Colored pixel visualization for microbiome profile; explores 3 binning methods (PR, QTF, SPB) with color and gray colormaps |
| MicroPheno 2018 [67] | 16S raw seqs | Find subsample size for stable k-mer profile; find best k; input k-mers to DNN (MLP w/ ReLU), RF, or multi-class linear SVM | N/A | N/A | N/A | N/A | 16S sequences; k-mer distribution from shallow sub-samples outperformed OTU features; first 16S deep learning metagenome-phenotype exploration |
| MetaPheno 2019 [68] | sp. rel. ab. or raw seqs | Jellyfish k-mer counts; identify sig. k-mers with cohort p-values; apply hyper-parameter grid search models | N/A | N/A | 0.76 gcF, k-mer | 0.65 gcF, rel. ab. | Review of current methods; compares features (k-mers and rel. ab.) with classifiers (SVM, RF, XGBoost, gcForest, AE-pretrained DNN (AutoNN)) |
| DeepMicro 2020 [69] | sp. rel. ab. or strain markers | Low-dimensional profile representation from autoencoder; input into MLP with ReLU or hyper-parameter grid SVM or RF | 0.94 SVM CAE | 0.96 SVM SAE | 0.76 MLP CAE | 0.67 RF DAE | 4 autoencoders (shallow, deep, variational, convolutional) to reduce microbiome dimension; combines with MLP, SVM, and RF param. sweep |
| MVIB 2021 [70] | sp. rel. ab. and strain markers | MLP for each modality (rel. ab., strain marker, metabolomics); Information Bottleneck theory to learn joint stochastic encoding | 0.93 D | 0.94 J;T | 0.76 J;T | 0.67 D | Combine multiple heterogeneous data modalities; explore default and joint pre-processing (D, J); optional triplet margin loss extension (T) |
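
Several of the methods above (MicroPheno, MetaPheno) use k-mer profiles of raw sequences as classifier input instead of OTU or species abundances. The following is a minimal sketch of how such a profile could be computed, not the implementation used by either tool. The function name `kmer_profile`, the restriction to the A/C/G/T alphabet, and the choice to skip k-mers containing ambiguous bases are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_profile(seqs, k):
    """Return a normalized k-mer frequency vector for a set of reads.

    Hypothetical helper for illustration: k-mers are counted over all
    reads, k-mers containing characters outside A/C/G/T are skipped,
    and the output follows lexicographic k-mer order (AA, AC, AG, ...).
    """
    alphabet = "ACGT"
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        # Slide a window of width k across the read.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if all(base in alphabet for base in kmer):
                counts[kmer] += 1

    total = sum(counts.values())
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Normalize so that samples with different read depth are comparable.
    return [counts[m] / total if total else 0.0 for m in all_kmers]


# Toy sample of two short reads; real inputs would be subsampled 16S reads.
profile = kmer_profile(["ACGT", "ACGN"], k=2)
```

The resulting vector has length 4^k and sums to 1 whenever at least one valid k-mer is found, so it can be passed directly to a classifier such as an RF, SVM, or MLP in place of a relative abundance vector.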