Author manuscript; available in PMC: 2016 Mar 9.
Published in final edited form as: Cell Rep. 2016 Jan 21;14(4):945–955. doi: 10.1016/j.celrep.2015.12.088

Detecting microbial dysbiosis associated with Pediatric Crohn’s disease despite the high variability of the gut microbiota

Feng Wang 1,14, Jess L Kaplan 2,14, Benjamin D Gold 3, Manoj K Bhasin 4, Naomi L Ward 5, Richard Kellermayer 6, Barbara S Kirschner 7, Melvin B Heyman 8, Scot E Dowd 9, Stephen B Cox 9, Haluk Dogan 10, Blaire Steven 5, George D Ferry 6, Stanley A Cohen 3, Robert N Baldassano 11, Christopher J Moran 2, Elizabeth A Garnett 8, Lauren Drake 2, Hasan H Otu 10, Leonid A Mirny 12, Towia A Libermann 4, Harland S Winter 2,15,*, Kirill Korolev 1,13,15,*
PMCID: PMC4740235  NIHMSID: NIHMS747819  PMID: 26804920

SUMMARY

The relationship between the host and its microbiota is challenging to understand because both microbial communities and their environment are highly variable. We developed a set of techniques to address this challenge based on population dynamics and information theory. These methods identified additional bacterial taxa associated with pediatric Crohn's disease and could detect significant changes in microbial communities with fewer samples than previous statistical approaches. We also substantially improved the accuracy of the diagnosis based on the microbiota from stool samples and found that the ecological niche of a microbe predicts its role in Crohn’s disease. Bacteria typically residing in the lumen of healthy patients decrease in disease while bacteria typically residing on the mucosa of healthy patients increase in disease. Our results also show that the associations with Crohn’s disease are evolutionarily conserved and provide a mutual-information-based method to visualize dysbiosis.

INTRODUCTION

Hosts rely on microbiota for the digestion of food (Breznak and Brune, 1994), vitamin biosynthesis (Turnbaugh et al., 2007), behavioral responses (Cryan and Dinan, 2012), protection from pathogens (Buffie et al., 2012), and other functions (Stefka et al., 2014). The host-microbe relationship, however, can go awry due to a simple infection, changes in nutrition, or a more nuanced dysbiosis. Microbial dysbiosis has been implicated in many human diseases including diabetes, autism, and obesity. A particularly strong relationship between disease and microbiota exists for Crohn’s disease (CD) and ulcerative colitis (UC), the two major subtypes of inflammatory bowel disease (IBD) (Mazmanian et al., 2008; Greenblum et al., 2012; Manichanh et al., 2012). IBD is characterized by chronic inflammation of the gastrointestinal tract, which causes significant morbidity and can lead to colorectal cancer or death (Card et al., 2003). With more than 1.4 million people affected in the United States (CCFA, 2015), IBD makes understanding the link between microbiota and human health an urgent challenge.

The development of IBD depends on a diverse set of factors including lifestyle (Bernstein and Shanahan, 2008), environment (Danese et al., 2004), and genetic predisposition (Jostins et al., 2012). Gut microbes also contribute to IBD: deviations from the microbial composition of the healthy human gut have been detected in both long-standing and newly diagnosed IBD patients (Gevers et al., 2014; Papa et al., 2012). Mouse studies have demonstrated that microbes are required for and precede IBD onset (Kim et al., 2007; Overstreet et al., 2010) and that microbiome-derived compounds can ameliorate chronic intestinal inflammation (Furusawa et al., 2013). Given the substantial role of microbes in the disease, we need to carefully characterize the changes in the microbiota that accompany IBD, particularly in new- or early-onset disease. This information can improve IBD diagnostics, identify disease subtypes, elucidate the mechanisms of IBD onset and progression, and uncover novel therapeutic strategies.

Although 16S rRNA and metagenomic sequencing provide a detailed view of the gut microbiota, translating these data into clinical insights has been difficult (De Cruz et al., 2012). The analysis is often complicated by the extreme variability of the microbial abundances across both patients and species. As a result, commonly used statistical approaches may overlook important changes associated with IBD and fail to translate these changes into useful predictions. Here we present a set of methods to identify changes in gut microbial composition associated with a disease and use them to diagnose CD based on an individual’s microbiota. The performance of these methods was evaluated on two data sets: the previously interrogated RISK cohort, the most comprehensive data set of treatment-naïve pediatric CD (Gevers et al., 2014), and an independently obtained Pediatric Inflammatory Bowel Disease Consortium Cohort (PIBD-CC), which similarly includes only treatment-naïve pediatric IBD patients and controls (see Experimental Procedures and Supplementary Tables 1 and 2). Our methods had a substantially higher statistical power and could find disease-associated microbes with fewer samples compared to more commonly used statistical approaches.

In addition to the development and validation of improved approaches to the statistical analysis and visualization of microbial communities, we report several important biomedical findings. Both CD and healthy microbiota showed a power-law distribution of taxa abundance, indicating that the vast majority of taxa are rare, including those associated with the disease. The subject-to-subject variation of microbial abundance was also extreme and posed a significant challenge to standard statistical methods. Despite this high variation, we identified additional taxa associated with CD and found that the phylogenetic trees of CD-associated and health-associated bacteria do not overlap, suggesting that factors promoting health or disease have distinct evolutionary histories. We also found that microbes preferentially associated with the ileal mucosa in healthy people proliferate in the stool of CD patients, while bacteria more prevalent in the stool of healthy people tend to decrease in abundance in CD patients. This observation allowed us to develop a diagnostic tool based on non-invasively collected stool samples. Contrary to the previous analysis of the RISK cohort (Gevers et al., 2014), we found that stool and ileal mucosal samples have comparable predictive power.

RESULTS AND DISCUSSIONS

Here we focus on two independent cohorts of CD patients and non-IBD controls: PIBD-CC and RISK. Both cohorts were mostly pediatric (ages 2–20), balanced with respect to gender, race, and other factors (see Supplement), and contained only newly diagnosed and treatment-naïve CD subjects. The PIBD-CC contained only ileal mucosa samples, while RISK had samples from both stool and ileal mucosa. For both cohorts, the compositions of bacterial communities in mucosal and stool samples were obtained via DNA extraction followed by 16S rRNA gene sequencing and processing with the Quantitative Insights Into Microbial Ecology (QIIME) software (Caporaso et al., 2010); see Experimental Procedures and (Gevers et al., 2014) for further details. The sizes and sequencing depths of the two cohorts were very different. RISK is larger, with over 700 patients and a mean of ~30,000 reads per sample. In contrast, PIBD-CC had only 87 patients and a mean of only ~3,000 reads per sample. These order-of-magnitude variations in sample sizes and sequencing depths span the spectrum of microbiome research and illustrate the performance of our statistical approaches in different settings: from a pilot study to a large nation-wide effort.

High variability in microbial abundances

Host-associated microbial communities are highly variable (Figure 1A). The first aspect of this variability is the power-law distribution of relative abundances of different taxa (Figure 1B). This power-law variability is observed in both health and CD, and both in the microbiota of a single subject and in the average across the cohort. Not only do different taxa have abundances that vary by orders of magnitude, but also the number of taxa grows as their abundance declines, so most taxa are rare. While the more abundant taxa are probably more important for gut health, even a rare microbe can trigger chronic inflammation or dysbiosis (Powell et al., 2012). Thus, the analysis should be able to handle many rare taxa and integrate changes in abundances across taxa with different prevalence. The second aspect of this high variability is the dramatic subject-to-subject variation in the abundance of a single taxon observed in both healthy subjects and patients (Figure 1C). The abundance of a given genus typically varies by more than two orders of magnitude among individuals, even for highly abundant microbes like Bacteroides and Roseburia. Deep sequencing of a large number of samples is an expensive and time-consuming way to overcome the high variability in species abundance. Moreover, large sample sizes may not be available for rare or emergent diseases. Hence, methods that can manage with both small sample sizes and high variability are needed to analyze changes in microbial communities.

Figure 1. High variability of bacterial abundances in the human gut microbiota.


(A) Abundance variation across all genera detected in PIBD-CC data sets. Genera are ranked by their mean relative abundance in controls. (B) Mean genera abundances are distributed according to a power law. (C) Rank-abundance distributions are shown for 3 typical genera in controls. The high subject-to-subject variability (2–3 orders of magnitude) is typical for other genera, other phylogenetic levels, and in CD.

Log-abundance is a less variable metric than abundance

Our main observation in Figure 1 is that the variation of species abundances is better captured on a logarithmic rather than a linear scale. Indeed, variations in diet, immune pressure, and other aspects of host-microbe interactions affect microbial composition, for example, by changing the growth rates of the different bacterial species (Caballero, 2015). The randomness associated with growth rates is known to break the assumptions of the central limit theorem in statistics and even prevent the convergence of the sample mean to the true population mean (Redner, 1989). These difficulties can be resolved with a simple log-transformation of the data: instead of describing each taxonomic unit by its average abundance, as is commonly done in the literature (Claesson et al., 2011; Wang et al., 2014), we first computed the logarithm of the relative abundances in each sample and then averaged them over the samples (see Experimental Procedures).
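
The difference between the two summaries can be illustrated with a minimal Python sketch. The abundance values below are hypothetical, and the pseudocount used to keep the logarithm finite for undetected taxa is an assumption of this sketch; the handling of zeros actually used in the study is described in the Experimental Procedures.

```python
import math

def mean_abundance(rel_abundances):
    """Arithmetic mean of relative abundances across samples."""
    return sum(rel_abundances) / len(rel_abundances)

def mean_log_abundance(rel_abundances, pseudocount=1e-6):
    """Mean of log10 relative abundances across samples.

    The pseudocount is an assumption of this sketch; it keeps log10
    finite for samples in which the taxon was not detected.
    """
    logs = [math.log10(x + pseudocount) for x in rel_abundances]
    return sum(logs) / len(logs)

# Hypothetical relative abundances of one genus (not data from the study).
control = [1e-3, 2e-3, 5e-4, 1e-3, 3e-1]   # last subject has a large bloom
disease = [1e-4, 2e-4, 5e-5, 1e-4, 2e-4]

# The linear mean of the controls is dominated by the single bloom,
# whereas the mean log-abundance reflects the typical subject.
diff_linear = mean_abundance(control) - mean_abundance(disease)
diff_log = mean_log_abundance(control) - mean_log_abundance(disease)
print(f"difference in mean abundance:     {diff_linear:.4f}")
print(f"difference in mean log-abundance: {diff_log:.2f} (log10 units)")
```

In this toy example a single outlying subject accounts for almost all of the mean abundance of the control group, while the mean log-abundance changes only modestly if that subject is removed.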

Log-transformation resolved many of the complications due to the high variability of the gut microbiota (Figure 2) and the artifacts of the compositional bias due to the conversion of sequence counts into relative abundances (Friedman and Alm, 2012). We found that when untransformed abundances were used to detect disease-associated taxa, the large variation made it hard to detect any significant association between a taxon and the disease. When log-abundances were used instead, several significant associations were detected, as determined by low p-values for permutation tests of association (Figure 2).

Figure 2. Log-transformation reduces the variability and helps detect significant changes in abundance between control and CD.


(A) The scatter plot shows the mean abundances of all genera in control versus CD in PIBD-CC. The purple symbols correspond to the largest changes in the mean abundance between control and CD. (B) The same as (A) but for mean log-abundance. Note the dramatic reduction in the deviation from the diagonal compared to (A). The green symbols label the largest changes in the mean log-abundance. (C, D) The statistical significance of the outliers in (A) and (B), respectively. The large difference in mean abundance for Clostridium XIX is not statistically significant, while the highly significant association of Roseburia is only detected by the mean log-abundance.

While log-transformations have become standard in other areas of bioinformatics (Quackenbush, 2002), they are not universally used in microbiome research (Gevers et al., 2014; White et al., 2009; Huse et al., 2014). Although a few microbiome studies have incorporated log-transformations in their analysis (Hong et al., 2006), untransformed data or non-parametric statistical tests are predominantly used to detect associations (Le Chatelier et al., 2013; David et al., 2014; Qin et al., 2014; Wang et al., 2014). Analysis of untransformed data suffers from the extreme variation in the microbiome abundances while rank-based, non-parametric methods discard some of the available information and lose statistical power. Indeed, small changes in abundance typically result in large changes in rank when relative abundances have a fat-tailed distribution (Huse et al., 2014).

Comparison of methods to detect associations

We compared the performance of log-transforms and other commonly used techniques, which can be divided into four classes. The first class includes mean abundance, mean log-abundance, and median. These statistics represent the distribution of relative abundances found in a particular group of subjects by a single number. The second class contains methods that estimate the actual distributions of relative abundances in each of the subject groups and then quantify the differences between these distributions. Such methods include the Kolmogorov-Smirnov statistic, Kullback-Leibler divergence, L2-norm distance, and mutual information between the distribution and diagnosis (Reza, 2012; Experimental Procedures). The third class is based on the regression of the diagnosis on the abundance of a given taxon, and we examined the linear regression on arcsine-square-root transformed abundances (Gevers et al., 2014). Finally, the fourth class consists of the non-parametric methods based on the differences in the ranks of taxa across subject groups. Here, we examined the Wilcoxon rank-sum statistic (Le Chatelier et al., 2013) commonly used in the ecological literature (Hoekstra et al., 2001). The UniFrac statistic (Lozupone and Knight, 2005) was not considered since it is primarily based on the presence and absence of evolutionarily distant taxa, while no major taxa losses or gains have been observed in the data (Figure 1A).

For many of the aforementioned statistics, there are approximate methods to estimate their significance. These methods, however, rely on assumptions that may not hold for the highly variable data on microbiota composition. To avoid the associated biases in comparing the different methods, we subjected all statistics to the permutation test, which is an exact statistical test of association (Experimental Procedures). Moreover, we analyzed different phylogenetic levels separately so as not to bias false discovery rate (FDR) estimates by the correlations among higher and lower phylogenetic ranks. The results of our comparison of different methods to detect associations in the PIBD-CC data set are shown in Figure 3A. Surprisingly, the mean log-difference, one of the simplest tests, detected more orders, families, and genera associated with CD than any other method. Since all of the evaluated methods relied on the same assumptions about the data and had the same false discovery rate, the higher number of detected associations faithfully represents the higher statistical power of a test to distinguish signal from noise.
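
The testing procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation used in the study: the permutation distribution is sampled by Monte Carlo shuffling instead of being enumerated exactly, the pseudocount is hypothetical, and the Benjamini-Hochberg procedure is assumed for the FDR correction because the text does not name the method.

```python
import math
import random

def mean_log(values, pseudocount=1e-6):
    """Mean log10 abundance; the pseudocount is an assumption of this sketch."""
    return sum(math.log10(v + pseudocount) for v in values) / len(values)

def permutation_p_value(control, disease, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the difference
    in mean log-abundance between two groups of samples."""
    rng = random.Random(seed)
    observed = abs(mean_log(disease) - mean_log(control))
    pooled = list(control) + list(disease)
    n_control = len(control)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign diagnosis labels at random
        stat = abs(mean_log(pooled[n_control:]) - mean_log(pooled[:n_control]))
        if stat >= observed:
            hits += 1
    # Counting the observed labelling keeps the estimate away from zero.
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg q-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Hypothetical abundances of one taxon in six controls and six CD subjects.
control = [1e-2, 2e-2, 1.5e-2, 3e-2, 1e-2, 2.5e-2]
disease = [1e-4, 3e-4, 2e-4, 1e-4, 5e-4, 2e-4]
p = permutation_p_value(control, disease)
```

Applying `permutation_p_value` to every taxon at one phylogenetic level and passing the resulting p-values to `benjamini_hochberg` gives q-values that can be thresholded at 0.05, mirroring the separate treatment of each level described in the text.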

Figure 3. Statistical methods differ in their ability to detect association.


(A) The number of significantly associated orders (FDR=0.05; permutation test) is shown for different statistical tests and different numbers of orders tested. Orders were tested in the order of their abundance. Initially, the number of detected orders increases with the number of tests because true positives are more likely to be included in the set of orders tested, but it eventually declines because the threshold for statistical significance becomes more stringent as the number of tests grows. MA: mean abundance. MEA: median abundance. MLA: mean log-abundance. AS: arcsine-square-root.

(B) The maximal number of associations detected by a procedure illustrated in (A) is shown for three phylogenetic levels (O: order, F: family, G: genus); mean log-abundance outperforms other methods.

(C) The procedure illustrated in (A) was applied to subsamples from the RISK data. The mean log abundance outperforms other methods for all sample sizes; see Figure S1 for statistical significance. Error bars are standard deviations from 10 sub-samplings. The power of the detected associations at discriminating control from CD samples and further details are shown in Figure S1.

To test whether the advantage of the mean log-abundance method is robust, we repeatedly subsampled the ileal mucosa samples from the larger RISK cohort at sample sizes ranging from 50 to 350, with equal numbers of control and CD samples, and obtained the average number of associations detected with an FDR-corrected q-value lower than 0.05. This analysis (Figure 3C) shows that the mean log-difference identified more taxa across all sample sizes. A comparable number of associations was also detected using the arcsine-square-root transform, which is similar to the log-transform because both discount the contribution of samples with exceedingly large relative abundance. Nevertheless, the number of associations detected by our method was significantly higher for most sample sizes tested (Figure S1). Importantly, none of the methods approached saturation in the number of identified taxa as the amount of data increased. This indicates that larger studies may uncover additional taxa associated with CD and provide a deeper understanding of this disease. We also note that the association of taxa with the disease status is highly heterogeneous across subjects (Figure S1), a finding that parallels the recent discovery of two microbiome clusters in CD patients: one similar to and one different from the typical control microbiome (Lewis et al., 2015).

The associations detected by the mean log-abundance method in the RISK cohort are shown in Figure 4. In total, we identified 15 orders, 26 families, 31 genera, and 20 species associated with CD, many of them not found by previously used methods (Gevers et al., 2014). In agreement with the previous studies, some of the strongest associations were with the Lachnospiraceae family, a core and ancient member of the commensal microbiota, and the Pasteurellaceae family, which contains many human pathogens (De Cruz et al., 2012). The additional associations found in RISK data are also unlikely to be spurious because many of these taxa were previously identified in other IBD cohorts or are otherwise known to contribute to disease. For example, the association of Staphylococcus with CD is known from a different IBD cohort (Nguyen et al., 2009); Turicibacter is less abundant in the fecal samples from dogs with IBD (Suchodolski et al., 2012); Eikenella can cause periodontitis and other infections in the oral cavity (Aas et al., 2005); and some strains of Enterobacteriaceae thrive in the presence of inflammation and outcompete the healthy microbiota in mice (Garrett et al., 2010). Of special interest are the two commensal species Bacteroides fragilis and Faecalibacterium prausnitzii, which suppress inflammation and have received a lot of attention in the IBD literature (De Cruz et al., 2012). Although we found a strong association between health and F. prausnitzii, no significant effect of B. fragilis could be inferred from the data.

Figure 4. Significant associations in the RISK cohort.


The phylogenetic tree of associated taxa (detected by the mean log-abundance). Note that the phylogenetic trees formed by the health-associated and disease-associated bacteria have little overlap, suggesting deep evolutionary roots of traits related to health and disease. New findings compared to (Gevers et al., 2014) are shaded. The findings from stool samples are shown in Figure S2.

Many of the taxa enriched in controls are known to provide important functions for gut health. For example, Roseburia, Blautia, and F. prausnitzii produce butyrate which acts as an energy source for epithelial cells in the gut (Duncan, 2002). Supplementing patients with such bacteria or the metabolites they produce could then be a potential intervention strategy (Furusawa et al., 2013). In contrast, a majority of the taxa enriched in CD are known to be opportunistic pathogens (Mukhopadhya et al., 2012). Collectively, these previous studies suggest that some of the CD-associated bacteria are deleterious to the host while some of the health-associated bacteria are beneficial to the host. However, further experimental studies that can establish causality are required to determine the specific roles of the associations reported in Figure 4.

The increase in the number of pathogenic bacteria in CD may reflect the overall decline of gut health. We tested this hypothesis by looking for a link between active inflammation and bacterial composition using the mean log-abundance method, but failed to detect any significant associations in agreement with other studies (De Cruz et al., 2012). The types of pathogens established in the gut could affect disease progression beyond inflammation, a hypothesis worthy of further investigation.

Association congruence across phylogenetic ranks in ileal microbiome

Phylogenetic structure of associations is rarely discussed in microbiome research. Most studies focus on order or family-level associations because the large number of genera and species increases the number of hypotheses tested and reduces the statistical power. However, to link compositional changes in microbiota to human health, we need to understand the relationship between ecological functions and phylogenetic distance better.

Typically, related organisms have similar genomes and occupy similar ecological niches. Yet, studies of parasites, symbionts, and commensals have shown that closely related species could have dramatically different lifestyles and effects on the host (Siddall et al., 1993; Moran et al., 2008). We found that, when a taxon at a higher phylogenetic level is more abundant in CD, the associated taxa at lower levels within it are more abundant in CD as well (Figure 4). A similar pattern of phylogenetic congruence is also observed for the taxa decreased in CD. In fact, only 4 out of 92 associations do not follow this rule. Although these exceptions can be attributed to the 5% false discovery rate, they might also indicate particularly interesting microbial dynamics associated with the disease.

Phylogenetic congruence could be driven by conserved ecological traits or by a single lower-level taxon contributing to the association at the higher level. Consistent with the first mechanism, the strong association between CD and Enterobacteriaceae results from the weak associations of several genera in this family (Figure 4). The second mechanism could explain the association between CD and Bifidobacteriaceae family, which seems to be primarily driven by a single species Bifidobacterium adolescentis.

Given these patterns of associations, there is likely an optimal phylogenetic level for microbiome analysis. Strain and species level may be too idiosyncratic since the behavior of genetically very similar organisms could be very different while order and family level may be too coarse and miss important ecological functions present only in a specific genus. Here we found that the genus level yields not only the largest number of associations, but also better patient classification compared to order and species levels (Figure S1).

Classification on ileal samples

Diagnostics is an important application of microbiota association studies in CD, so we attempted to classify patients in RISK and PIBD-CC as CD or controls based only on the composition of their gut microbial communities. None of the commonly used unsupervised clustering methods could provide an acceptable classification of the data (see Experimental Procedures). Similarly, principal component analysis (PCA) based on microbial abundances of either all or only significantly associated genera could not differentiate CD and control patients (Figure 5A), consistent with the earlier analysis by Gevers et al. (2014). To improve on PCA performance, we implemented a supervised projection method by maximizing the mutual information between the linear combination of log-abundances of significantly associated bacterial taxa and diagnosis (see Experimental Procedures). This technique, termed maximal mutual information component analysis (MMICA), effectively filters out large patient-to-patient variation in microbiome composition due to numerous factors unrelated to the disease, and focuses on the few variables indicative of CD. MMICA showed dramatic improvement over PCA for both RISK (Figure 5B) and PIBD-CC (Figure S3). In particular, the first two components contained more than half of the information on the diagnosis (44% and 9% of the maximally possible 0.98 bits). MMICA was also a significant improvement over a single-genus analysis. Roseburia, the most informative genus, explained only 16.5% of the information compared to the 44% explained by the first maximal mutual information component (MMIC). Thus, MMICA significantly increases the diagnostic information contained in a single metric, capturing the major difference between controls and CD.
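
The information-theoretic quantities involved can be made concrete with a short Python sketch. It computes the entropy of the diagnosis, which bounds the information any projection can carry, and the mutual information between a one-dimensional score and the diagnosis. The equal-width binning is an assumption of this sketch, and the optimization over linear combinations that defines MMICA is not shown; the estimator actually used is given in the Experimental Procedures. The counts of 254 CD and 187 control ileal samples are those reported for RISK later in the text; they give roughly 0.98 bits, consistent with the maximum quoted above, although the text does not spell out this derivation.

```python
import math
from collections import Counter

def entropy_bits(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information_bits(scores, labels, n_bins=4):
    """Mutual information between a binned 1-D score and discrete labels.

    Equal-width binning is an assumption of this sketch.
    """
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0  # guard against identical scores
    bins = [min(int((s - lo) / width), n_bins - 1) for s in scores]
    joint = list(zip(bins, labels))
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy_bits(bins) + entropy_bits(labels) - entropy_bits(joint)

# Upper bound on diagnostic information for 254 CD and 187 control samples.
diagnosis = ["CD"] * 254 + ["control"] * 187
h_diagnosis = entropy_bits(diagnosis)
print(f"entropy of diagnosis: {h_diagnosis:.2f} bits")

# Hypothetical score that separates two balanced groups perfectly.
toy_scores = [0.0] * 10 + [1.0] * 10
toy_labels = ["control"] * 10 + ["CD"] * 10
mi_perfect = mutual_information_bits(toy_scores, toy_labels)
```

Dividing the mutual information of a projection by the entropy of the diagnosis gives the fraction of diagnostic information explained, which is how percentages such as 44% can be read.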

Figure 5. Microbiota composition distinguishes health from disease.


(A) Principal component analysis (PCA) on abundance data yields poor separation of CD and control samples; the ellipses contain 95% of the probabilities for control and CD samples, centering at the corresponding centroids. (B) Maximal mutual information component analysis (MMICA) on log-abundance data yields a much better separation of CD and control samples; the ellipses contain 95% of the probabilities for control and CD samples, centering at the corresponding centroids. The difference between the distances to the centroids is statistically significant; see Figure S3. (C) The first MMIC trained on the RISK cohort can classify both RISK and PIBD-CC samples. For RISK data the curve is the average over 5-fold cross validation. See also Figure S3.

We inspected the contribution of each genus to the MMICs and found that the first MMIC was primarily composed of Roseburia, Turicibacter, Blautia, and Holdemania. All four genera were decreased in CD, so the first MMIC was primarily negatively correlated with CD (Figure 5B, Figure S3). The second MMIC contained bacteria that both decrease and increase in CD and was mainly in the direction of Dorea, Erwinia, and Actinobacillus. The probability distribution along this second MMIC was bimodal for CD and unimodal for control patients. Most variance-based methods would not be able to take advantage of such differences in the distribution, suggesting that information-based approaches such as MMICA could have an important advantage for microbiota studies.

In addition to being a powerful visualization tool, MMICA was also able to classify samples as CD or control based on their microbial composition. We found that a simple classifier that uses the projection on the first MMIC as a microbial dysbiosis index yields an AUC of 0.84 (Figure 5C). To test the power of this method, the MMIC-based classifier was trained on the RISK cohort and applied to the PIBD-CC data set. Despite the fact that PIBD-CC samples were independently collected, processed using different protocols, and sequenced at much lower depth, the classifier achieved a high AUC of 0.71, suggesting that the first MMIC is a robust indicator of dysbiosis that could reach sufficient power for clinical applications.
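
How a single projection serves as a classifier can be sketched as follows. The weights and log-abundances are hypothetical and are not the fitted MMIC; the weights are taken to be negative only because the four genera named above are reported as decreased in CD, so that a larger index indicates a more CD-like sample. The AUC is computed as the probability that a randomly chosen CD sample has a higher index than a randomly chosen control.

```python
def dysbiosis_index(log_abundances, weights):
    """Project one sample's log-abundances onto a fixed direction."""
    return sum(weights[g] * log_abundances[g] for g in weights)

def auc(scores_cd, scores_control):
    """Probability that a CD sample outscores a control; ties count 1/2."""
    wins = 0.0
    for s_cd in scores_cd:
        for s_ctrl in scores_control:
            if s_cd > s_ctrl:
                wins += 1.0
            elif s_cd == s_ctrl:
                wins += 0.5
    return wins / (len(scores_cd) * len(scores_control))

# Hypothetical weights (assumption): negative for genera decreased in CD.
weights = {"Roseburia": -0.6, "Turicibacter": -0.5,
           "Blautia": -0.4, "Holdemania": -0.3}

# Hypothetical log10 relative abundances for two CD and two control samples.
cd_samples = [
    {"Roseburia": -4.0, "Turicibacter": -5.0, "Blautia": -3.5, "Holdemania": -5.0},
    {"Roseburia": -3.0, "Turicibacter": -4.5, "Blautia": -3.0, "Holdemania": -4.0},
]
control_samples = [
    {"Roseburia": -2.0, "Turicibacter": -3.0, "Blautia": -2.0, "Holdemania": -3.5},
    {"Roseburia": -3.2, "Turicibacter": -4.6, "Blautia": -3.1, "Holdemania": -4.2},
]

cd_scores = [dysbiosis_index(s, weights) for s in cd_samples]
control_scores = [dysbiosis_index(s, weights) for s in control_samples]
auc_value = auc(cd_scores, control_scores)
```

Because the index is a fixed linear combination, weights learned on one cohort can be applied unchanged to samples from another, which is the setting of the RISK-to-PIBD-CC transfer described above.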

Classification on stool samples

Stool samples hold the key to non-invasive diagnostics of IBD, but two recent studies reached opposite conclusions on the feasibility of predicting IBD from stool microbiota. On the one hand, Papa et al. (2012) found stool samples to be very predictive with an AUC of 0.83, but their cohort contained patients undergoing treatment for established disease and thus could have a much larger difference in the microbial composition between CD and control patients due to past and current medication use and prolonged inflammation. On the other hand, Gevers et al. (2014) found that, for treatment-naïve children, stool samples were a poor predictor of the diagnosis, yielding an AUC of only 0.66.

The stool samples from the RISK cohort were reanalyzed by performing a log-transformation and identifying genera significantly associated with CD (Figure S2). We then trained an SVM classifier on the log-abundances of the significant genera and found a substantially higher mean AUC of 0.72 (95% CI: 0.663–0.770), which is approaching clinically useful values (Figure 6A). This higher value of AUC (0.72 vs 0.66) underscores the advantage of our methods for small to medium data sets with high variability. Although our method improved the classification power of stool samples in RISK, we still found that ileal samples were more informative (AUC of 0.84 vs 0.72), in agreement with the RISK analysis (Gevers et al., 2014).

Figure 6. The classification power of stool and ileal samples.


(A) Mean AUCs of classifiers developed in this study, Gevers et al. (2014), and Papa et al. (2012), based on all available ileal or stool samples. (B) SVM classifiers based on a subgroup of subjects with both ileal and stool samples in the RISK cohort. Ileal and stool samples had similar discriminating power, while their combination further increased performance.

The above comparison, however, is not entirely valid because there were fewer stool than ileal samples in the RISK cohort and, unlike the ileal samples (CD to controls ratio of 254 to 187), the stool samples are very imbalanced (CD to controls ratio of 187 to 31). One way to make the comparison fairer is to focus only on patients that have both stool and ileal samples because that would make the training and test data the same for both stool- and ileal-based classifiers. In addition, this approach allows one to test whether stool and ileal samples contain equivalent or complementary information.

The RISK cohort had 74 patients (14 controls and 60 CDs) with both stool and ileal samples. Since the sample size was too small to detect a sufficient number of significant associations and find MMICs, we used all 31 significant genera found in ileal samples (shown in Figure 4) as features in an SVM classifier (Experimental Procedures). In contrast to the RISK-wide analysis above (Gevers et al., 2014), we found that stool samples contain comparable information to ileal samples (AUC of 0.84 (95% CI: 0.747–0.911) vs 0.81 (95% CI: 0.654–0.925)); see Figure 6B. This observation agrees with previous findings in patients with established and treated CD (Papa et al., 2012) and raises the possibility that stool samples may actually aid in the initial diagnosis of CD. We also found that an SVM classifier trained on both ileal and stool samples had an AUC of 0.94 (95% CI: 0.908–0.979) (Figure 6B); however, with the limited number of samples it is premature to say whether stool and ileal microbiota contain complementary information or the increase in AUC simply came from doubling the number of features in the training data.

Differential enrichments of health and CD associated microbes in ileum and stool

The subset of patients with both ileal and stool samples described above allowed us to examine the relationship between ileal and stool microbiota further. Specifically, we asked whether bacterial genera have similar abundances in ileal mucosa and the stool of CD patients (Figure 7A and 7B). On average, the relative abundances of bacteria in stool and ileal samples were similar, but there was a striking difference between bacteria increased and decreased in CD (Figure 4). Bacteria associated with health were more abundant in the stool than in the ileum in CD patients, while bacteria associated with CD were preferentially found in the ileum.

Figure 7. Stool- and ileum-dwelling bacteria make distinct contributions to IBD.

(A) Mean log-abundance of the 50 most abundant genera in the ileal and stool microbial communities of CD patients. Note that health-associated bacteria are more abundant in stool than in the ileum (above the diagonal), while CD-associated bacteria are more abundant in the ileum (below the diagonal). (B) The same as (A), but the beginning and end of each arrow show the mean log-abundance in control and CD patients, respectively; thus, the arrows show the shift in the community associated with CD. Note that stool and ileal abundances change about equally. (C) The preference of a genus for the ileal vs. stool habitat is strongly correlated with the change in its abundance between the stool of CD and control patients.

One possible interpretation of these data is that some of the bacteria decreased in CD perform essential digestive functions in the gut and therefore are primarily found in the stool, while opportunistic pathogens primarily colonize the mucosa and trigger IBD when not controlled by the immune system. Microbial dysbiosis may then be driven further by mucosal oxygenation, which disrupts the normal intestinal oxygen gradient (Albenberg et al., 2014). This hypothesis suggests that the role of a bacterial genus in CD can be predicted from its abundance in the ileal mucosa and stool of healthy patients: bacteria more abundant in stool should decrease in CD, and bacteria more abundant in the mucosa should increase. Indeed, we found a very strong correlation (R² = 0.62, p = 1.3e-11) between the difference of stool and mucosal abundances in controls and the changes in the stool of CD patients relative to controls (Figure 7C). Consistent with our findings, previous studies have found that stool microbiota consist of two distinct components: one shed from the mucosa and a separate non-adherent luminal population (Eckburg et al., 2005). Our results further suggest that these different components may play distinct roles in IBD.

CONCLUSIONS

The broad abundance distributions of different taxa and high patient-to-patient variability challenge existing statistical tools for detecting microbial associations with disease. We found that performing statistical analysis on the logarithm of relative abundances makes patterns of microbiota changes clearer and more robust. For both the RISK and PIBD-CC data sets, our technique required fewer samples for similar statistical power and identified additional taxa associated with CD. The discovered associations could distinguish CD from non-IBD patients using a new classifier and visualization tool that identifies directions in the multidimensional space of microbial abundances with maximal information on the diagnosis. This maximal mutual information component analysis was superior to the commonly used principal component analysis and remained informative when validated on the independently obtained PIBD-CC dataset.

Our analysis indicates that health and disease associated bacteria have distinct ecology and evolutionary history. We found that bacteria increased in CD and bacteria decreased in CD formed two largely non-overlapping phylogenetic trees suggesting that factors promoting health or disease have deep evolutionary roots and are not frequently exchanged between gut bacteria. Moreover, bacteria that proliferate in CD are preferentially associated with ileal mucosa while bacteria decreased in CD reside mostly in the stool. The connection between lumen and mucosa compartments enabled patient classification using either ileal biopsies or stool samples with about equal accuracy.

Collectively, our results provide a set of statistical tools for the analysis of microbiome data, refine the link between shifts in microbial abundances and disease, and show the relevance of microbiota to the diagnosis and management of pediatric Crohn’s disease.

EXPERIMENTAL PROCEDURES

Study populations

Clinical characteristics of patients from the RISK cohort have been previously described (Gevers et al., 2014). The Pediatric Inflammatory Bowel Disease Consortium Cohort (PIBD-CC) is a previously unreported cohort of children (ages 1–17) who underwent endoscopic evaluation for IBD at 7 centers in the US (MassGeneral Hospital for Children, University of California San Francisco, Children’s Hospital of Philadelphia, University of Chicago, Texas Children’s Hospital, Children’s Center for Digestive Healthcare-Atlanta, GA, and Children’s Healthcare of Atlanta (Scottish Rite and Egleston Children’s Hospital campuses)) from September 2005 until January 2008 (Emory University HIC IRB Number 060-2002 and additional approval by local internal review boards of all participating institutions). PIBD-CC patients with terminal ileal biopsies and available clinical data (24 children with newly diagnosed, treatment-naive CD and 63 non-IBD control subjects) were included in this study. Clinical characteristics can be found in Supplementary Tables 1 and 2. The study was supported by the following grant: Role of Infectious Agents in Pediatric Crohn’s Disease; NIH – R03 DK064544 (PI, BD Gold).

Sample collection

PIBD-CC: During diagnostic ileocolonoscopy, terminal ileal mucosal biopsies were obtained using standard biopsy forceps and immediately placed in liquid nitrogen or dry ice and stored at −80°C until use.

RISK: The collection of the RISK cohort is described in Gevers et al. (2014). After filtering out samples with antibiotic exposure, 441 ileal samples (184 control and 245 CD subjects) and 218 stool samples (31 control and 187 CD subjects) were extracted from the RISK cohort data set for the detection of taxa associated with CD. A subgroup of these patients (14 control and 60 CD) with both ileal and stool samples was used for the analysis in Figures 6 and 7.

DNA Extraction and 16S rRNA gene sequencing

PIBD-CC: Bulk DNA was extracted from samples using the Qiagen Stool DNA kit. Tag-encoded FLX amplicon pyrosequencing was performed as described (Bailey et al., 2011; Callaway et al., 2010; Finegold et al., 2010; Handl et al., 2011) using the primers Gray28F (5′-TTTGATCNTGGCTCAG) and Gray519r (5′-GTNTTACNGCGGCKGCTG), with primers numbered in relation to the primary sequence of E. coli 16S rRNA (Brosius et al., 1978). Initial generation of the sequencing library utilized one-step PCR with 30 cycles, generating amplicons extending from the 28F primer with an average read length of 400 bp. Tag-encoded FLX amplicon pyrosequencing analyses utilized a Roche 454 FLX instrument with Titanium reagents and Titanium procedures, performed at the Research and Testing Laboratory (Lubbock, TX).

Following sequencing, all failed sequence reads, low-quality sequence ends, tags, and primers were removed, and sequence collections were depleted of non-bacterial ribosome sequences, sequences with degenerate base calls, homopolymers >5 bp in length, reads <200 bp, and chimeras (Gontcharova et al., 2010), as described (Bailey et al., 2011; Callaway et al., 2010; Finegold et al., 2010; Handl et al., 2011).

OTU picking

PIBD-CC: We used a naive Bayes classifier with a confidence cutoff of 0.5 and the RDP database (Cole et al., 2014) for OTU assignment.

RISK: OTU picking is described in Gevers et al. (2014). Briefly, OTUs were picked using closed-reference OTU picking in the QIIME software (Caporaso et al., 2010) at 97% similarity against the Greengenes database (DeSantis et al., 2006).

Logarithmic transformation of relative abundance

Tables of OTU counts were transformed into relative abundances by adding a pseudocount of 1 and normalizing. These were transformed into mean log abundances by first taking the natural logarithm and then averaging over the samples.
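The transformation above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used in the study: the function name mean_log_abundance is hypothetical, and we assume that the pseudocount is added to every count before each sample is normalized to sum to one.

```python
import numpy as np

def mean_log_abundance(counts, pseudocount=1.0):
    """Return per-sample log relative abundances and their mean over samples.

    counts: array of shape (n_samples, n_taxa) holding raw OTU counts.
    Assumption: the pseudocount is added before per-sample normalization.
    """
    counts = np.asarray(counts, dtype=float)
    shifted = counts + pseudocount                      # avoids log(0)
    rel = shifted / shifted.sum(axis=1, keepdims=True)  # each row sums to 1
    log_rel = np.log(rel)                               # natural logarithm
    return log_rel, log_rel.mean(axis=0)                # average over samples

# Toy table: 2 samples, 2 taxa.
log_rel, mean_log = mean_log_abundance([[0, 9], [4, 4]])
```

Averaging after taking the logarithm yields the log of the geometric mean of the relative abundances, which is less sensitive to a few samples with very high abundance than the arithmetic mean.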

Probability distribution estimation

The Kolmogorov-Smirnov statistic, Kullback-Leibler divergence, L2-norm distance, and mutual information are defined on probability distribution functions, which were obtained by kernel density estimation methods (Khan et al., 2007). We used Gaussian kernels and chose their bandwidths according to Silverman’s rule of thumb (Silverman, 1986).
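As an illustration, the sketch below estimates two densities with Gaussian kernels and compares them on a common grid. It uses scipy.stats.gaussian_kde with bw_method="silverman", which is SciPy's version of Silverman's rule and may differ in detail from the bandwidth used in the study. The helper names kde_density, kl_divergence, and l2_distance, the grid, and the small constant eps are our assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_density(values, grid):
    """Gaussian KDE on a uniform grid, renormalized to integrate to 1."""
    kde = gaussian_kde(values, bw_method="silverman")
    density = kde(grid)
    dx = grid[1] - grid[0]
    return density / (density.sum() * dx)

def kl_divergence(p, q, dx, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) by a Riemann sum on the grid."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))) * dx)

def l2_distance(p, q, dx):
    """L2-norm distance between two densities on the same grid."""
    return float(np.sqrt(np.sum((p - q) ** 2) * dx))

# Synthetic log-abundances of one taxon in two groups (illustrative only).
rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, size=200)
disease = rng.normal(3.0, 1.0, size=200)
grid = np.linspace(-6.0, 9.0, 512)
dx = grid[1] - grid[0]
p, q = kde_density(control, grid), kde_density(disease, grid)
kl, l2 = kl_divergence(p, q, dx), l2_distance(p, q, dx)
```

The constant eps only guards against division by zero in the tails, where both densities are negligible.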

Statistical significance

Statistical significance was evaluated by a permutation test with 10⁶ permutations. The false discovery rate (FDR) correction at the 5% level was performed using the Benjamini–Hochberg method. To avoid imposing an arbitrary abundance cutoff on taxa, we analyzed all possible cutoffs and reported the maximal number of associations, as illustrated in Figure 3A. Concretely, for each phylogenetic level, the taxa were first ranked by their mean log-abundances in control and CD samples separately, and the two rankings were then merged into a single list according to each taxon's minimal rank in the two lists. We then performed association tests for taxa with ranks between 1 and k for all possible values of k and reported the maximal number of associations. This procedure was applied uniformly to all methods presented in Figure 3.
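The two statistical steps, a permutation test per taxon followed by the Benjamini–Hochberg correction, can be sketched as follows. This is a simplified illustration under stated assumptions: the test statistic (absolute difference of group means), the function names permutation_pvalue and benjamini_hochberg, and the reduced default number of permutations are ours, not taken from the study.

```python
import numpy as np

def permutation_pvalue(x_control, x_cd, n_perm=10_000, seed=None):
    """Two-sided permutation p-value for a difference in group means."""
    rng = np.random.default_rng(seed)
    x_control = np.asarray(x_control, dtype=float)
    x_cd = np.asarray(x_cd, dtype=float)
    pooled = np.concatenate([x_control, x_cd])
    n_control = len(x_control)
    observed = abs(x_cd.mean() - x_control.mean())
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # shuffle the diagnosis labels
        stat = abs(perm[n_control:].mean() - perm[:n_control].mean())
        hits += int(stat >= observed)
    # The +1 terms keep the estimated p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(pvalues, fdr=0.05):
    """Boolean mask of hypotheses rejected at the given FDR level."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = fdr * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest rank meeting its threshold
        reject[order[:k + 1]] = True     # reject everything up to that rank
    return reject

significant = benjamini_hochberg([0.001, 0.02, 0.03, 0.8], fdr=0.05)
```

In the cutoff scan described above, one would apply benjamini_hochberg to the p-values of the k top-ranked taxa for each k and keep the largest number of rejections.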

Maximal mutual information component analysis (MMICA)

To find MMIC1, we obtained a linear combination of taxa log-abundances, which maximizes the mutual information about the diagnosis. The second component was also found as a linear combination maximizing the mutual information on the diagnosis, but subject to the constraint that the correlation coefficient between MMIC1 and MMIC2 equals zero. Our approach is related to a recent method developed in neuroscience (Faivishevsky and Goldberger, 2012).
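A rough sketch of how the first component might be computed is given below. It is not the authors' implementation: the KDE-based mutual information estimate, the Nelder-Mead optimizer, the starting direction, and the names mutual_information and first_mmic are all assumptions made for illustration. The second component would additionally require the zero-correlation constraint, which is omitted here.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gaussian_kde

def mutual_information(z, labels, n_grid=256, eps=1e-12):
    """KDE-based estimate of I(z; label) in nats for a 1-D projection z."""
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    pad = 3.0 * z.std()
    grid = np.linspace(z.min() - pad, z.max() + pad, n_grid)
    dx = grid[1] - grid[0]
    classes, counts = np.unique(labels, return_counts=True)
    priors = counts / counts.sum()
    # Class-conditional densities f(z | c); the marginal is their mixture.
    cond = [gaussian_kde(z[labels == c], bw_method="silverman")(grid)
            for c in classes]
    marginal = sum(p * f for p, f in zip(priors, cond))
    mi = 0.0
    for p, f in zip(priors, cond):
        mi += p * np.sum(f * np.log((f + eps) / (marginal + eps))) * dx
    return float(mi)

def first_mmic(X, labels, w0=None):
    """Unit vector w maximizing the estimated I(Xw; label), and that value."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center the log-abundances
    if w0 is None:
        w0 = np.ones(X.shape[1])

    def neg_mi(w):
        w = w / np.linalg.norm(w)  # only the direction matters
        return -mutual_information(Xc @ w, labels)

    result = minimize(neg_mi, w0, method="Nelder-Mead")
    return result.x / np.linalg.norm(result.x), -result.fun
```

A derivative-free optimizer is used here only for brevity; with many taxa, a gradient-based method or several random restarts would likely be needed because the objective is not convex.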

Software packages and classification

Kernel density estimates of the probability distribution functions, the Kolmogorov-Smirnov statistic, Pearson’s r, and the Wilcoxon rank-sum statistic were computed with the Python package SciPy 0.14.0. Mean abundance difference/ratio and median abundance difference were computed directly from their definitions. PCA, all unsupervised clustering methods, and all supervised classifiers were run with the Python machine learning package scikit-learn 0.15.2 (Pedregosa et al., 2011). The supervised classifiers included logistic regression with an L1 penalty, support vector machines, and random forests. The best parameters for the classifiers were found by 5-fold cross-validation. Classifier performance was then measured by the area under the ROC curve, which was obtained by averaging the results of 5-fold cross-validations.
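The classification workflow can be illustrated with a short scikit-learn sketch. It uses the current scikit-learn API (the sklearn.model_selection module), which differs from version 0.15.2 named above. The nested arrangement of the two cross-validation loops, the linear-kernel SVM, the grid of C values, and the function name cross_validated_auc are our assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def cross_validated_auc(X, y, seed=0):
    """Mean ROC AUC over 5 outer folds, with C tuned by an inner 5-fold CV."""
    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed + 1)
    model = GridSearchCV(
        SVC(kernel="linear"),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # illustrative grid
        scoring="roc_auc",
        cv=inner,
    )
    # Each outer fold refits the grid search on its own training split only.
    scores = cross_val_score(model, X, y, scoring="roc_auc", cv=outer)
    return float(scores.mean())

# Synthetic example: 5 features, only the first one separates the classes.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(80, 5))
y = np.array([0] * 40 + [1] * 40)
X[y == 1, 0] += 3.0
auc = cross_validated_auc(X, y)
```

Stratified folds keep the ratio of CD to control samples similar in every split, which matters for imbalanced data such as the stool samples described earlier.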

Acknowledgments

K.K. and F.W. were supported by the startup fund from Boston University to K.K. Some of the analysis was carried out on the Shared Computing Cluster at BU. H.S.W. was supported by philanthropic support from Martin Schlaff and the B. Hasso Family Foundation. R.K. was supported by the Houston Men of Distinction and the Gutsy Kids Fund supported by the Karen and Brock Wagner family. The authors would also like to acknowledge Antone Opekun for his work on this project. The PIBD-CC study was supported by the grant: Role of Infectious Agents in Pediatric Crohn’s Disease; NIH – R03 DK064544 (PI, BD Gold).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCESSION NUMBERS

The raw PIBD-CC 16S rRNA sequencing data can be accessed through NCBI BioProject ID PRJNA297124.

SUPPLEMENTAL INFORMATION

Supplemental Information includes 3 figures and 2 tables.

AUTHOR CONTRIBUTIONS

B.D.G., H.S.W., M.B.H., B.S.K., G.D.F., S.A.C., R.B., M.K.B., T.A.L., L.D., E.A.G., and J.L.K. designed the study, M.K.B., H.H.O., K.K., F.W., N.L.W., B.S., S.E.D., S.B.C. and H.D. analyzed the data, T.A.L., K.K., L.A.M., F.W., J.L.K., and H.S.W. interpreted the data, F.W., J.L.K., H.S.W., and K.K. wrote the first draft, and all authors reviewed and contributed to the final draft of the manuscript.

All authors declared no conflict of interest.

REFERENCES

  1. Aas JA, Paster BJ, Stokes LN, Olsen I, Dewhirst FE. Defining the normal bacterial flora of the oral cavity. J. Clin. Microbiol. 2005;43:5721–5732. doi: 10.1128/JCM.43.11.5721-5732.2005.
  2. Albenberg L, Esipova TV, Judge CP, Bittinger K, Chen J, Laughlin A, Grunberg S, Baldassano RN, Lewis JD, Li H, et al. Correlation between intraluminal oxygen gradient and radial partitioning of intestinal microbiota. Gastroenterology. 2014;147:1055–1063.e8. doi: 10.1053/j.gastro.2014.07.020.
  3. Bailey MT, Dowd SE, Galley JD, Hufnagle AR, Allen RG, Lyte M. Exposure to a social stressor alters the structure of the intestinal microbiota: implications for stressor-induced immunomodulation. Brain Behav. Immun. 2011;25:397–407. doi: 10.1016/j.bbi.2010.10.023.
  4. Bernstein CN, Shanahan F. Disorders of a modern lifestyle: reconciling the epidemiology of inflammatory bowel diseases. Gut. 2008;57:1185–1191. doi: 10.1136/gut.2007.122143.
  5. Breznak JA, Brune A. Role of microorganisms in the digestion of lignocellulose by termites. Annu. Rev. Entomol. 1994;39:453–487.
  6. Brosius J, Palmer ML, Kennedy PJ, Noller HF. Complete nucleotide sequence of a 16S ribosomal RNA gene from Escherichia coli. Proc. Natl. Acad. Sci. USA. 1978;75:4801–4805. doi: 10.1073/pnas.75.10.4801.
  7. Buffie CG, Jarchum I, Equinda M, Lipuma L, Gobourne A, Viale A, Ubeda C, Xavier J, Pamer EG. Profound alterations of intestinal microbiota following a single dose of clindamycin results in sustained susceptibility to Clostridium difficile-induced colitis. Infect. Immun. 2012;80:62–73. doi: 10.1128/IAI.05496-11.
  8. Callaway TR, Dowd SE, Edrington TS, Anderson RC, Krueger N, Bauer N, Kononoff PJ, Nisbet DJ. Evaluation of bacterial diversity in the rumen and feces of cattle fed different levels of dried distillers grains plus solubles using bacterial tag-encoded FLX amplicon pyrosequencing. J. Anim. Sci. 2010;88:3977–3983. doi: 10.2527/jas.2010-2900.
  9. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Peña AG, Goodrich JK, Gordon JI, et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods. 2010;7:335–336. doi: 10.1038/nmeth.f.303.
  10. Card T, Hubbard R, Logan RF. Mortality in inflammatory bowel disease: a population-based cohort study. Gastroenterology. 2003;125:1583–1590. doi: 10.1053/j.gastro.2003.09.029.
  11. CCFA. Crohn’s & Colitis Foundation of America. Facts about Inflammatory Bowel Diseases. June 2015. http://www.ccfa.org.
  12. Claesson MJ, Cusack S, O’Sullivan O, Greene-Diniz R, de Weerd H, Flannery E, Marchesi JR, Falush D, Dinan T, Fitzgerald G, et al. Composition, variability, and temporal stability of the intestinal microbiota of the elderly. Proc. Natl. Acad. Sci. USA. 2011;108:4586–4591. doi: 10.1073/pnas.1000097107.
  13. Cole JR, Wang Q, Fish JA, Chai B, McGarrell DM, Sun Y, Brown CT, Porras-Alfaro A, Kuske CR, Tiedje JM. Ribosomal Database Project: data and tools for high throughput rRNA analysis. Nucleic Acids Res. 2014;42:D633–D642. doi: 10.1093/nar/gkt1244.
  14. Cryan JF, Dinan TG. Mind-altering microorganisms: the impact of the gut microbiota on brain and behaviour. Nat. Rev. Neurosci. 2012;13:701–712. doi: 10.1038/nrn3346.
  15. David LA, Maurice CF, Carmody RN, Gootenberg DB, Button JE, Wolfe BE, Ling AV, Devlin AS, Varma Y, Fischbach MA, et al. Diet rapidly and reproducibly alters the human gut microbiome. Nature. 2014;505:559–563. doi: 10.1038/nature12820.
  16. De Cruz P, Prideaux L, Wagner J, Ng SC, McSweeney C, Kirkwood C, Morrison M, Kamm MA. Characterization of the gastrointestinal microbiota in health and inflammatory bowel disease. Inflamm. Bowel Dis. 2012;18:372–390. doi: 10.1002/ibd.21751.
  17. Danese S, Sans M, Fiocchi C. Inflammatory bowel disease: the role of environmental factors. Autoimmun. Rev. 2004;3:394–400. doi: 10.1016/j.autrev.2004.03.002.
  18. DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 2006;72:5069–5072. doi: 10.1128/AEM.03006-05.
  19. Duncan SH. Roseburia intestinalis sp. nov., a novel saccharolytic, butyrate-producing bacterium from human faeces. Int. J. Syst. Evol. Microbiol. 2002;52:1615–1620. doi: 10.1099/00207713-52-5-1615.
  20. Eckburg PB, Bik EM, Bernstein CN, Purdom E, Dethlefsen L, Sargent M, Gill SR, Nelson KE, Relman DA. Diversity of the human intestinal microbial flora. Science. 2005;308:1635–1638. doi: 10.1126/science.1110591.
  21. Faivishevsky L, Goldberger J. Dimensionality reduction based on non-parametric mutual information. Neurocomputing. 2012;80:31–37.
  22. Finegold SM, Dowd SE, Gontcharova V, Liu C, Henley KE, Wolcott RD, Youn E, Summanen PH, Granpeesheh D, Dixon D, et al. Pyrosequencing study of fecal microflora of autistic and control children. Anaerobe. 2010;16:444–453. doi: 10.1016/j.anaerobe.2010.06.008.
  23. Friedman J, Alm EJ. Inferring Correlation Networks from Genomic Survey Data. PLoS Comput. Biol. 2012;8:e1002687. doi: 10.1371/journal.pcbi.1002687.
  24. Furusawa F, Obata Y, Fukuda S, Endo TA, Nakato G, Takahashi D, Nakanishi Y, Uetake C, Kato K, Kato T, et al. Commensal microbe-derived butyrate induces the differentiation of colonic regulatory T cells. Nature. 2013;504:446–450. doi: 10.1038/nature12721.
  25. Garrett WS, Gallini CA, Yatsunenko T, Michaud M, DuBois A, Delaney ML, Punit S, Karlsson M, Bry L, Glickman JN, et al. Enterobacteriaceae act in concert with the gut microbiota to induce spontaneous and maternally transmitted colitis. Cell Host & Microbe. 2010;8:292–300. doi: 10.1016/j.chom.2010.08.004.
  26. Gevers D, Kugathasan S, Denson LA, Vázquez-Baeza Y, Van Treuren W, Ren B, Schwager E, Knights D, Song SJ, Yassour M, et al. The treatment-naive microbiome in new-onset Crohn’s disease. Cell Host & Microbe. 2014;15:382–392. doi: 10.1016/j.chom.2014.02.005.
  27. Gontcharova V, Youn E, Wolcott RD, Hollister EB, Gentry TJ, Dowd SE. Black Box Chimera Check (B2C2): a Windows-Based Software for Batch Depletion of Chimeras from Bacterial 16S rRNA Gene Datasets. Open Microbiol. J. 2010;4:47–52. doi: 10.2174/1874285801004010047.
  28. Gradel KO, Nielsen HL, Schønheyder HC, Ejlertsen T, Kristensen B, Nielsen H. Increased short- and long-term risk of inflammatory bowel disease after Salmonella or Campylobacter gastroenteritis. Gastroenterology. 2009;137:495–501. doi: 10.1053/j.gastro.2009.04.001.
  29. Greenblum S, Turnbaugh PJ, Borenstein E. Metagenomic systems biology of the human gut microbiome reveals topological shifts associated with obesity and inflammatory bowel disease. Proc. Natl. Acad. Sci. USA. 2012;109:594–599. doi: 10.1073/pnas.1116053109.
  30. Handl S, Dowd SE, Garcia-Mazcorro JF, Steiner JM, Suchodolski JS. Massive parallel 16S rRNA gene pyrosequencing reveals highly diverse fecal bacterial and fungal communities in healthy dogs and cats. FEMS Microbiol. Ecol. 2011;76:301–310. doi: 10.1111/j.1574-6941.2011.01058.x.
  31. Hoekstra HE, Hoekstra JM, Berrigan D, Vignieri SN, Hoang SN, Hill CE, Beerli P, Kingsolver JG. Strength and tempo of directional selection in the wild. Proc. Natl. Acad. Sci. USA. 2001;98:9157–9160. doi: 10.1073/pnas.161281098.
  32. Hong SH, Bunge J, Jeon SO, Epstein SS. Predicting microbial species richness. Proc. Natl. Acad. Sci. USA. 2006;103:117–122. doi: 10.1073/pnas.0507245102.
  33. Huse SM, Young VB, Morrison HG, Antonopoulos DA, Kwon J, Dalal S, Arrieta R, Hubert NA, Shen L, Vineis JH, et al. Comparison of brush and biopsy sampling methods of the ileal pouch for assessment of mucosa-associated microbiota of human subjects. Microbiome. 2014;2:5. doi: 10.1186/2049-2618-2-5.
  34. Jostins L, Ripke S, Weersma RK, Duerr RH, McGovern DP, Hui KY, Lee JC, Schumm LP, Sharma Y, Anderson CA, et al. International IBD Genetics Consortium (IIBDGC). Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature. 2012;491:119–124. doi: 10.1038/nature11582.
  35. Khan S, Bandyopadhyay S, Ganguly AR, Saigal S, Erickson DJ III, Protopopescu V, Ostrouchov G. Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data. Phys. Rev. E. 2007;76:026209. doi: 10.1103/PhysRevE.76.026209.
  36. Kim SC, Tonkonogy SL, Karrasch T, Jobin C, Sartor RB. Dual-association of gnotobiotic IL-10−/− mice with 2 nonpathogenic commensal bacteria induces aggressive pancolitis. Inflamm. Bowel Dis. 2007;13:1457–1466. doi: 10.1002/ibd.20246.
  37. Le Chatelier E, Nielsen T, Qin J, Prifti E, Hildebrand F, Falony G, Almeida M, Arumugam M, Batto JM, Kennedy S, et al. MetaHIT Consortium. Richness of human gut microbiome correlates with metabolic markers. Nature. 2013;500:541–546. doi: 10.1038/nature12506.
  38. Lewis JD, Chen EZ, Baldassano RN, Otley AR, Griffiths AM, Lee D, Bittinger K, Bailey A, Friedman ES, Hoffmann C, et al. Inflammation, antibiotics, and diet as environmental stressors of the gut microbiome in pediatric Crohn’s disease. Cell Host & Microbe. 2015;18:489–500. doi: 10.1016/j.chom.2015.09.008.
  39. Lozupone C, Knight R. UniFrac: A new phylogenetic method for comparing microbial communities. Appl. Environ. Microbiol. 2005;71:8228–8235. doi: 10.1128/AEM.71.12.8228-8235.2005.
  40. Manichanh C, Borruel N, Casellas F, Guarner F. The gut microbiota in IBD. Nat. Rev. Gastroenterol. Hepatol. 2012;9:599–608. doi: 10.1038/nrgastro.2012.152.
  41. Mazmanian SK, Round JL, Kasper DL. A microbial symbiosis factor prevents intestinal inflammatory disease. Nature. 2008;453:620–625. doi: 10.1038/nature07008.
  42. Moran NA, McCutcheon JP, Nakabachi A. Genomics and evolution of heritable bacterial symbionts. Annu. Rev. Genet. 2008;42:165–190. doi: 10.1146/annurev.genet.41.110306.130119.
  43. Mukhopadhya I, Hansen R, El-Omar EM, Hold GL. IBD—what role do Proteobacteria play? Nat. Rev. Gastroenterol. Hepatol. 2012;9:219–230. doi: 10.1038/nrgastro.2012.14.
  44. Nguyen GC, Patel H, Chong RY. Increased prevalence of and associated mortality with methicillin-resistant Staphylococcus aureus among hospitalized IBD patients. Am. J. Gastroenterol. 2009;105:371–377. doi: 10.1038/ajg.2009.581.
  45. Overstreet AMC, Ramer-Tait AE, Atherly TA, Phillips GJ, Hostetter J, Ziemer CJ, Wannemuehler MJ, Jergens A. W1829 changes in composition of the intestinal microbiota precede onset of colitis in genetically-susceptible (IL-10−/−) mice. Gastroenterology. 2010;138:748–749.
  46. Papa E, Docktor M, Smillie C, Weber S, Preheim SP, Gevers D, Giannoukos G, Ciulla D, Tabbaa D, Ingram J, et al. Non-invasive mapping of the gastrointestinal microbiota identifies children with inflammatory bowel disease. PLoS ONE. 2012;7:e39242. doi: 10.1371/journal.pone.0039242.
  47. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 2011;12:2825–2830.
  48. Powell N, Walker AW, Stolarczyk E, Canavan JB, Gökmen MR, Marks E, Jackson I, Hashim A, Curtis MA, Jenner RG, et al. The Transcription Factor T-bet Regulates Intestinal Inflammation Mediated by Interleukin-7 Receptor+ Innate Lymphoid Cells. Immunity. 2012;37:674–684. doi: 10.1016/j.immuni.2012.09.008.
  49. Qin N, Yang F, Li A, Prifti E, Chen Y, Shao L, Guo J, Le Chatelier E, Yao J, Wu L, et al. Alterations of the human gut microbiome in liver cirrhosis. Nature. 2014;513:59–64. doi: 10.1038/nature13568.
  50. Quackenbush J. Microarray data normalization and transformation. Nat. Genet. 2002;32:496–501. doi: 10.1038/ng1032.
  51. Redner S. Random multiplicative processes: An elementary tutorial. Am. J. Phys. 1989;58:267–273.
  52. Reza FM. An introduction to information theory. Dover Publications; 2012.
  53. Siddall ME, Brooks DR, Desser SS. Phylogeny and the reversibility of parasitism. Evolution. 1993;47:308–313. doi: 10.1111/j.1558-5646.1993.tb01219.x.
  54. Silverman BW. Density estimation for statistics and data analysis. London: Chapman and Hall/CRC; 1986.
  55. Stefka AT, Freehley T, Tripathi P, Qiu J, McCoy K, Mazmanian SK, Tjota MY, Seo G-Y, Cao S, Theriault BR, et al. Commensal bacteria protect against food allergen sensitization. Proc. Natl. Acad. Sci. USA. 2014;111:13145–13150. doi: 10.1073/pnas.1412008111.
  56. Suchodolski JS, Markel ME, Garcia-Mazcorro JF, Unterer S, Heilmann RM, Dowd SE, Kachroo P, Ivanov I, Minamoto Y, Dillman EM, et al. The fecal microbiome in dogs with acute diarrhea and idiopathic inflammatory bowel disease. PLoS ONE. 2012;7:e51907. doi: 10.1371/journal.pone.0051907.
  57. Turnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM, Knight R, Gordon JI. The human microbiome project. Nature. 2007;449:804–810. doi: 10.1038/nature06244.
  58. Wang J, Linnenbrink M, Künzel S, Fernandes R, Nadeau M, Rosenstiel P, Baines JF. Dietary history contributes to enterotype-like clustering and functional metagenomic content in the intestinal microbiome of wild mice. Proc. Natl. Acad. Sci. USA. 2014;111:E2703–E2710. doi: 10.1073/pnas.1402342111.
  59. White JR, Nagarajan N, Pop M. Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput. Biol. 2009;5:e1000352. doi: 10.1371/journal.pcbi.1000352.
