Figure 5. Microbiota composition distinguishes health from disease.
(A) Principal component analysis (PCA) on abundance data yields poor separation of CD and control samples. Each ellipse encloses 95% of the probability mass for the control or CD samples and is centered at the corresponding centroid. (B) Maximal mutual information component analysis (MMICA) on log-abundance data yields much better separation of CD and control samples; ellipses are drawn as in (A). The difference between the distances to the centroids is statistically significant; see Figure S3. (C) The first MMIC, trained on the RISK cohort, can classify both RISK and PIBD-CC samples. For RISK data, the curve is the average over 5-fold cross-validation. See also Figure S3.