Author manuscript; available in PMC: 2024 Jun 6.
Published in final edited form as: Nat Microbiol. 2024 May 2;9(6):1555–1565. doi: 10.1038/s41564-024-01680-3

Extended Data Figure 8: Cross-validation accuracy for classifying total bacterial load in fecal samples from MSKCC allo-HCT recipients.

A schematic illustration of our classification model is enclosed in the dashed box. Our model uses two threshold parameters, θo and θt, to convert the oral bacterial fraction and the total bacterial load, respectively, into binary categories. Given training data, we optimized the two thresholds by minimizing the P value of Fisher’s exact test of independence, subject to the constraints θo ∈ [10⁻⁴, 1] and θt ∈ [10³, 10¹⁰]. The optimized θo was then applied to the test set to predict high or low bacterial load by comparing each oral bacterial fraction to θo. In parallel, we binarized the observed bacterial loads by comparing them to θt. Accuracy was assessed by comparing the predicted bacterial load categories to the observed bacterial load categories. In the box plot, each dot corresponds to a single 5-fold cross-validation split; the random train-test split was repeated 50 times. Boxes show the median and the 25th and 75th percentiles, and whiskers show the 5th and 95th percentiles.
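
The procedure described in this legend can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the log-spaced grid search, the grid resolution (`steps`), and the synthetic numbers are all hypothetical. The legend does not state which side of θo maps to "high" load; the sketch assumes that an oral bacterial fraction below θo predicts a high bacterial load. Fisher's exact test is implemented with the standard library only.

```python
import math
import random
from itertools import product


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = math.comb(r1 + r2, c1)

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return math.comb(r1, x) * math.comb(r2, c1 - x) / denom

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-9)))


def log_grid(lo_exp, hi_exp, steps):
    """Log-spaced candidates from 10**lo_exp to 10**hi_exp (assumed search grid)."""
    return [10 ** (lo_exp + (hi_exp - lo_exp) * i / (steps - 1)) for i in range(steps)]


def fit_thresholds(oral_frac, load, steps=25):
    """Return (P value, theta_o, theta_t) minimizing Fisher's exact P on training data."""
    best = (2.0, None, None)
    # Constraints from the legend: theta_o in [1e-4, 1], theta_t in [1e3, 1e10].
    for th_o, th_t in product(log_grid(-4, 0, steps), log_grid(3, 10, steps)):
        table = [[0, 0], [0, 0]]
        for f, l in zip(oral_frac, load):
            table[int(f >= th_o)][int(l >= th_t)] += 1
        p = fisher_exact_p(table[0][0], table[0][1], table[1][0], table[1][1])
        if p < best[0]:
            best = (p, th_o, th_t)
    return best


def accuracy(oral_frac, load, th_o, th_t):
    """Fraction of samples whose predicted load category matches the observed one."""
    predicted_high = [f < th_o for f in oral_frac]  # assumed direction (see lead-in)
    observed_high = [l >= th_t for l in load]
    return sum(p == o for p, o in zip(predicted_high, observed_high)) / len(load)


def repeated_cv(oral_frac, load, k=5, repeats=50, seed=0, steps=25):
    """One accuracy per held-out fold, over repeated random k-fold splits.

    Requires at least k samples so that no fold is empty.
    """
    rng = random.Random(seed)
    idx = list(range(len(load)))
    accs = []
    for _ in range(repeats):
        rng.shuffle(idx)
        for test in (idx[i::k] for i in range(k)):
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            _, th_o, th_t = fit_thresholds(
                [oral_frac[i] for i in train], [load[i] for i in train], steps
            )
            accs.append(
                accuracy([oral_frac[i] for i in test], [load[i] for i in test], th_o, th_t)
            )
    return accs


if __name__ == "__main__":
    # Synthetic values for illustration only; not data from the study.
    oral = [0.001, 0.002, 0.005, 0.3, 0.5, 0.8]
    loads = [1e9, 5e8, 2e9, 1e5, 3e4, 1e6]
    p_value, theta_o, theta_t = fit_thresholds(oral, loads)
    print(p_value, theta_o, theta_t)
```

A grid search is used here because the P value is piecewise constant in both thresholds: it changes only when a threshold crosses a data point, so gradient-based optimizers have nothing to follow. The legend does not say how the authors carried out the minimization.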