J Mol Biol. Author manuscript; available in PMC: 2023 Jun 21.
Published in final edited form as: J Mol Biol. 2022 Jun 28;434(15):167693. doi: 10.1016/j.jmb.2022.167693

Table 1. Average prediction accuracies on the test data over 12 random training/validation/testing splits, using different methods to predict categorical covariates (T2D study).
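The evaluation protocol in the caption (test accuracy averaged over 12 random training/validation/testing splits) can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions: the 70/15/15 split proportions, the function names, and the `fit_predict` interface are hypothetical, and the majority-class placeholder only stands in for the classifiers actually compared in the table.

```python
import random
from typing import Callable, List, Sequence, Tuple


def random_split(n: int, seed: int, train_frac: float = 0.7,
                 val_frac: float = 0.15) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle indices 0..n-1 and cut them into train/validation/test lists.

    The 70/15/15 proportions are a hypothetical default, not taken from the study.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # one independent shuffle per seed
    n_train = round(train_frac * n)
    n_val = round(val_frac * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# Hypothetical interface: fit on (train_X, train_y), tune hyperparameters on
# (val_X, val_y), and return predicted labels for test_X.
FitPredict = Callable[[list, list, list, list, list], list]


def average_test_accuracy(X: Sequence, y: Sequence, fit_predict: FitPredict,
                          n_splits: int = 12) -> float:
    """Mean test accuracy over n_splits random splits (seeds 0..n_splits-1)."""
    accuracies = []
    for seed in range(n_splits):
        tr, va, te = random_split(len(y), seed)
        if not te:
            raise ValueError("test split is empty; use more samples")
        preds = fit_predict([X[i] for i in tr], [y[i] for i in tr],
                            [X[i] for i in va], [y[i] for i in va],
                            [X[i] for i in te])
        correct = sum(p == y[i] for p, i in zip(preds, te))
        accuracies.append(correct / len(te))
    return sum(accuracies) / len(accuracies)


def majority_class(train_X, train_y, val_X, val_y, test_X):
    """Hypothetical placeholder model: always predicts the most frequent training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_X)
```

Each split reshuffles all samples, so the reported figure is a mean over 12 partly overlapping test sets rather than over disjoint folds as in k-fold cross-validation.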

Acronyms: Logistic: logistic regression with an elastic net penalty, using the original data; SVM: support vector machine classifier, using the original data; RF: random forest classifier, using the original data; MLP: multi-layer perceptron, using the original data; MB-simCLR: logistic regression with an elastic net penalty, using microbiome embeddings learned by unsupervised contrastive learning; MB-SupCon + Logistic: logistic regression with an elastic net penalty, using microbiome embeddings learned by supervised contrastive learning; MB-SupCon + SVM: support vector machine classifier, using microbiome embeddings learned by supervised contrastive learning; MB-SupCon + RF: random forest classifier, using microbiome embeddings learned by supervised contrastive learning; MB-SupCon + MLP: multi-layer perceptron, using microbiome embeddings learned by supervised contrastive learning; Avg. Acc. based on MB-SupCon: average accuracy across MB-SupCon + Logistic, MB-SupCon + SVM, MB-SupCon + RF, and MB-SupCon + MLP.
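The acronyms separate embeddings learned by unsupervised contrastive learning (MB-simCLR) from those learned by supervised contrastive learning (MB-SupCon). In a supervised contrastive objective, every other sample that shares the anchor's label counts as a positive, not only another view of the same sample. The sketch below is a generic, stdlib-only version of the supervised contrastive (SupCon) loss of Khosla et al. (2020). It assumes L2-normalized embeddings, a hypothetical temperature of 0.1, and averaging over anchors that have at least one positive. It is not the MB-SupCon authors' code, and their exact objective (for example, how positive pairs or views are constructed) may differ.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (guarding against a zero vector)."""
    norm = max(math.sqrt(sum(x * x for x in v)), 1e-12)
    return [x / norm for x in v]


def supcon_loss(embeddings: Sequence[Sequence[float]], labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Generic supervised contrastive loss (SupCon-style sketch).

    For anchor i with positives P(i) (other samples with the same label) and
    candidates A(i) (all samples except i):
        loss_i = -1/|P(i)| * sum_{p in P(i)} log( exp(s_ip) / sum_{a in A(i)} exp(s_ia) )
    where s_ij = z_i . z_j / temperature. Anchors without positives are skipped.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, n_anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Scaled cosine similarities between anchor i and every other sample.
        logits = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                  for a in range(n) if a != i}
        # Log-sum-exp over A(i), shifted by the max for numerical stability.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0
```

Embeddings that cluster by label yield a lower loss than the same embeddings paired with shuffled labels. An unsupervised (SimCLR-style) variant would keep the same form but define P(i) as only the augmented view of sample i.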

Prediction Task      Logistic  SVM     RF      MLP     MB-simCLR
Insulin resistance   76.69%    79.46%  83.93%  83.73%  65.67%
Sex                  65.61%    69.02%  80.38%  78.94%  59.85%
Race                 72.99%    72.17%  77.90%  75.60%  68.38%
 
Prediction Task      MB-SupCon   MB-SupCon   MB-SupCon   MB-SupCon   Avg. Acc. based
                     + Logistic  + SVM       + RF        + MLP       on MB-SupCon
Insulin resistance   84.42%      85.12%      84.62%      84.33%      84.62%
Sex                  78.94%      79.02%      79.24%      78.71%      78.98%
Race                 80.73%      80.36%      79.91%      79.17%      80.04%
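The "Avg. Acc. based on MB-SupCon" column is the unweighted mean of the four MB-SupCon-based classifiers. The short check below recomputes it from the values in Table 1. The dictionary layout and function name are illustrative choices, not part of the study's code.

```python
from statistics import mean

# Test accuracies (%) of the four MB-SupCon-based classifiers, copied from Table 1.
MB_SUPCON_ACC = {
    "Insulin resistance": {"Logistic": 84.42, "SVM": 85.12, "RF": 84.62, "MLP": 84.33},
    "Sex": {"Logistic": 78.94, "SVM": 79.02, "RF": 79.24, "MLP": 78.71},
    "Race": {"Logistic": 80.73, "SVM": 80.36, "RF": 79.91, "MLP": 79.17},
}

# "Avg. Acc. based on MB-SupCon" as reported in Table 1.
REPORTED_AVG = {"Insulin resistance": 84.62, "Sex": 78.98, "Race": 80.04}


def average_over_classifiers(table):
    """Unweighted mean accuracy across classifiers, per prediction task."""
    return {task: mean(list(accs.values())) for task, accs in table.items()}


if __name__ == "__main__":
    for task, avg in average_over_classifiers(MB_SUPCON_ACC).items():
        print(f"{task}: computed {avg:.4f}%, reported {REPORTED_AVG[task]:.2f}%")
```

Each recomputed mean agrees with the reported column to within rounding to two decimal places.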