Author manuscript; available in PMC: 2021 May 4.
Published in final edited form as: Nature. 2020 Nov 4;587(7834):448–454. doi: 10.1038/s41586-020-2881-9

Extended Data Figure 1: Data processing and machine learning analysis framework.

Raw V4 16S rRNA reads were processed using dada2, and samples were filtered and selected as described in the text and Methods to form the ‘core sample population’. Balanced cohorts were constructed for each binary questionnaire variable, and Random Forests analyses were repeated 25 times over 75/25 splits. Concurrently, sample classes were randomly permuted to simulate noise, and the same procedure was applied to the permuted classes to enable empirical P value estimation.
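
The resampling and permutation logic in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: a nearest-centroid classifier stands in for Random Forests so that the example needs only the standard library, the 75/25 split is read as a stratified training/test split, cohorts are balanced by random downsampling, and the number of permutations and the add-one correction in the P value are illustrative choices. All function names are hypothetical.

```python
import random
from statistics import mean


def balance_cohort(labels, rng):
    """Return indices with equal numbers of class 0 and class 1.

    Assumption: balancing is done by randomly downsampling the larger class.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    return rng.sample(pos, n) + rng.sample(neg, n)


def centroid_accuracy(X_train, y_train, X_test, y_test):
    """Test accuracy of a nearest-centroid classifier.

    Stand-in only: the caption describes Random Forests, not this model.
    """
    centroids = {}
    for c in (0, 1):
        rows = [x for x, y in zip(X_train, y_train) if y == c]
        centroids[c] = [mean(col) for col in zip(*rows)]

    def predict(x):
        dist = {c: sum((a - b) ** 2 for a, b in zip(x, m))
                for c, m in centroids.items()}
        return min(dist, key=dist.get)

    return mean(1.0 if predict(x) == y else 0.0
                for x, y in zip(X_test, y_test))


def repeated_split_scores(X, y, rng, n_repeats=25, train_frac=0.75):
    """Score the classifier over repeated stratified 75/25 splits."""
    scores = []
    for _ in range(n_repeats):
        train, test = [], []
        for c in (0, 1):
            idx = [i for i, label in enumerate(y) if label == c]
            rng.shuffle(idx)
            cut = int(round(train_frac * len(idx)))
            train += idx[:cut]
            test += idx[cut:]
        scores.append(centroid_accuracy(
            [X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test]))
    return scores


def empirical_p_value(X, y, rng, n_permutations=50):
    """Compare the observed mean score with a label-permutation null."""
    observed = mean(repeated_split_scores(X, y, rng))
    null = []
    for _ in range(n_permutations):
        shuffled = list(y)
        rng.shuffle(shuffled)  # permuted classes simulate noise
        null.append(mean(repeated_split_scores(X, shuffled, rng)))
    exceed = sum(1 for s in null if s >= observed)
    # Add-one correction (an assumption) keeps the estimate above zero.
    return observed, (exceed + 1) / (n_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic, unbalanced toy data: 30 samples of class 0, 20 of class 1.
    labels = [0] * 30 + [1] * 20
    features = [[rng.gauss(5.0 * lab, 1.0), rng.gauss(5.0 * lab, 1.0)]
                for lab in labels]
    keep = balance_cohort(labels, rng)
    X = [features[i] for i in keep]
    y = [labels[i] for i in keep]
    accuracy, p = empirical_p_value(X, y, rng)
    print(f"mean accuracy={accuracy:.3f}, empirical P={p:.4f}")
```

Running the same scoring routine on permuted classes means the null distribution inherits every property of the real analysis (cohort size, class balance, split scheme), so the empirical P value reflects only the association between features and the questionnaire variable.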