Summary
Discerning the effect of pharmacological exposures on intestinal bacterial communities in cancer patients is challenging. Here, we deconvoluted the relationship between drug exposures and changes in microbial composition by developing and applying a new computational method, PARADIGM (PARameters Associated with DynamIcs of Gut Microbiota), to a large set of longitudinal fecal microbiome profiles with detailed medication-administration records from patients undergoing allogeneic hematopoietic cell transplantation. We observed that several non-antibiotic drugs, including laxatives, antiemetics, and opioids, are associated with increased Enterococcus relative abundance and decreased alpha-diversity. Shotgun metagenomic sequencing further demonstrated subspecies competition, leading to increased dominant strain genetic convergence during allo-HCT that is significantly associated with antibiotic exposures. We integrated drug-microbiome associations to predict clinical outcomes in two validation cohorts on the basis of drug exposures alone, suggesting that this approach can generate biologically and clinically relevant insights on how pharmacological exposures can perturb or preserve microbiota composition.
Introduction
Gut microbiota perturbations have been associated with various diseases and frequently linked to environmental exposures including antibiotic use and nutritional deficiencies1,2. Non-antibiotic drugs can also contribute to intestinal microbiota changes3, but their effects in humans are less well understood and challenging to study because drug exposure data are often unreliable (e.g., recall-based surveys of habitual use of chronic medications)1,4 and densely collected longitudinal fecal samples are rarely available5,6. Moreover, several pioneering studies of medication exposures and microbiome composition focused on volunteers at relatively healthy steady states4,5. Patients undergoing allogeneic hematopoietic cell transplantation (allo-HCT) exhibit major perturbations of fecal microbiome composition throughout the treatment course that have been associated with increased risk of mortality7–9. These patients are exposed to a variety of drugs during prolonged hospitalizations, during which a wealth of data is routinely gathered as part of their electronic health records. As such, this patient population, from which we have assembled a large bank of fecal specimens, presents a unique opportunity to investigate intestinal microbial responses to drug exposures in vivo.
Previous studies in these patients have largely focused on the effect of antibiotics on the intestinal microbiota8,10,11, yet many non-antibiotic drugs routinely administered during this treatment have demonstrated anti-bacterial activities in vitro3. Furthermore, the microbiome perturbations in allo-HCT patients are observed prior to administration of broad-spectrum antibiotics, suggesting a potential influence of transplant-associated cancer treatment and supportive-care medications12. Here, we inferred relationships between medications, microbiome composition, and clinical outcomes by developing, applying, and validating a new computational method we named PARADIGM (PARameters Associated with DynamIcs of Gut Microbiota) to a large dataset of 16S rRNA and shotgun metagenomic sequencing profiles of serially collected fecal samples from allo-HCT patients.
Results
Clustering captures the temporal dynamics of the intestinal microbiome during allo-HCT
The dataset consists of 9,167 fecal samples from 1,201 allo-HCT patients at Memorial Sloan Kettering Cancer Center (MSKCC; Figure 1 and Table 1). We divided the MSKCC cohort into discovery (7,454 samples; 778 patients) and validation (1,713 samples; 423 patients) cohorts (Figure 1 and Table 1). We computed the compositional differences among discovery-cohort samples using Bray-Curtis β-diversity dissimilarity indices at the genus level for 16S rRNA-sequenced samples, or at the species level for shotgun metagenomic-sequenced samples, and visualized the high-dimensional stool composition data via t-distributed stochastic neighbor embedding (tSNE; Figures 2a–b).
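As an illustration of the dissimilarity measure named above, the following is a minimal sketch of a pairwise Bray-Curtis matrix. The function names and the example abundances are hypothetical and are not taken from the study; the sketch assumes each sample is a mapping from taxon to relative abundance.

```python
from typing import Dict, List


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two taxon -> abundance maps."""
    taxa = set(a) | set(b)
    # Sum of the smaller abundance for every taxon seen in either sample.
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1.0 - 2.0 * shared / total


def dissimilarity_matrix(samples: List[Dict[str, float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = bray_curtis(samples[i], samples[j])
    return d


# Hypothetical genus-level relative abundances (each sample sums to 1).
samples = [
    {"Blautia": 0.5, "Enterococcus": 0.1, "Streptococcus": 0.4},
    {"Blautia": 0.1, "Enterococcus": 0.8, "Streptococcus": 0.1},
    {"Enterococcus": 1.0},
]
matrix = dissimilarity_matrix(samples)
```

A matrix of this kind is the usual input to an embedding such as tSNE or to distance-based clustering; the embedding step itself is omitted here.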
Figure 1. Patient selection criteria for the discovery and validation cohorts.
The MSKCC discovery cohort was included in the clustering of sequencing data and PARADIGM algorithm training set. The validation cohorts were included in the analysis of clinical outcomes.
Table 1.
Patient and sample counts in the MSKCC and Duke cohorts.
| | MSKCC Discovery | MSKCC Validation | Duke Validation |
|---|---|---|---|
| Number of patients | 778 | 423 | 142 |
| Number of patients with 16S-sequenced samples | 778 | 405 | 138 |
| Number of 16S-sequenced samples | 7,454 | 1,713 | 473 |
| Number of 16S-sequenced samples per patient, median (first-third quartile) | 7 (3 – 14) | 3 (2 – 5) | 2 (1 – 5) |
| Number of patients with at least one pair of daily 16S-sequenced samples between day −14 and 14 relative to HCT | 454 | - | - |
| Number of pairs of daily 16S-sequenced samples between day −14 and 14 relative to HCT | 2,039 | - | - |
| Number of pairs of daily 16S-sequenced samples per patient, median (first-third quartile) | 3 (1 – 7) | - | - |
| Number of patients with shotgun metagenomic samples | 340 | 142 | - |
| Number of shotgun metagenomic samples | 980 | 200 | - |
| Number of patients with drug exposures data | 775 | 423 | 142 |
Figure 2. The intestinal microbiota of allo-HCT patients is highly dynamic.

a, b, Compositional space of the intestinal microbiota in the MSKCC discovery cohort visualized by tSNE projection. Each point represents a sample, colored according to the taxon of highest relative abundance based on (a) 16S rRNA (7,454 samples; 778 patients) or (b) shotgun metagenomic sequencing profiles (980 samples; 340 patients) (p: phylum; f: family; o: order; g: genus). Samples were collected between day −30 and 2,205 relative to HCT. c, Ten clusters of intestinal microbiome compositions are assigned by k-means unsupervised clustering. d, e, Relative abundance of the top 20 most observed (d) genera in the 16S rRNA profiles and (e) species in the shotgun metagenomic profiles in the MSKCC discovery cohort. Each column is one sample, each row is one genus or species. Rows are clustered by hierarchical clustering. f, Cluster alpha-diversity (reciprocal Simpson index). The horizontal dashed line represents the median alpha-diversity of the MSKCC discovery cohort. g, Cluster relative frequency over time relative to HCT. h, Network map depicting the transitions among the ten intestinal microbiota clusters over time (5,482 pairs of subsequent samples; 677 patients; collection between day −16 and 1,084 relative to HCT). The thickness of the line is proportional to transition frequency, while the node size is proportional to the number of samples per cluster.
We observed patterns of microbiome injuries, including loss of alpha-diversity and enrichment of potentially pathogenic bacteria such as Enterococcus and Enterobacteriaceae (Figures 2a–b). As has been well-described in allo-HCT patients, these domination events can be profound, to the point of a single taxon comprising >90% of the relative abundance of a fecal sample8,10, and are predictive of specific deleterious clinical outcomes such as bloodstream infections, graft-versus-host disease (GVHD), and mortality9,10,13. A subset of 980 samples with shotgun metagenomic profiling also showed similar patterns of microbiome injuries during allo-HCT (Figure 2b). Specifically, we observed a cluster of samples whose most abundant organisms were various strict anaerobes such as Ruminococcus gnavus or Erysipelatoclostridium ramosum, as well as distinct clusters enriched for potentially pathogenic facultative species including Enterococcus faecium, Klebsiella pneumoniae, and Escherichia coli.
The reproducibly observed microbiome perturbations in allo-HCT patients offer a unique opportunity to understand dynamics and evolution of relatively distinct perturbed microbiome compositions or states under environmental exposures, in contrast to the more fluid and non-discrete microbiota in healthy populations14–18. Given the mathematical challenge of reducing dimensionality complexity while preserving bacterial community structure, we performed unsupervised k-means clustering on the Bray-Curtis β-diversity matrix of samples in the discovery set and identified ten distinct microbiota clusters (Figure 2c). We also explored other clustering approaches including hierarchical clustering (Figures S1a–c) and Dirichlet Multinomial Mixtures (Figures S1d–f)19. Since k-means clustering partitioned samples more evenly (Figure S1g), we utilized k-means clusters for our subsequent analyses.
Lachnospiraceae and Clostridiales, which constitute major commensal taxa present in the healthy human gut, were commonly observed in clusters 1–3, which were also characterized by high alpha-diversity (when compared to the median diversity of the discovery cohort) (Figures 2d–f)20. Clusters 7–10 represented low-diversity “dysbiotic” states (Figures 2d–f). Intestinal domination by a single bacterial organism (≥ 30% relative abundance) is a hallmark of severe intestinal dysbiosis10. Lactobacillus, Proteobacteria and Streptococcus were highly enriched in samples in clusters 5, 8 and 9. Clusters 7 and 10 consisted of Enterococcus-dominated samples, with cluster 10 specifically enriched for E. faecium (Figure 2d–e). These compositional clusters also captured the temporal dynamics of the intestinal microbiota during allo-HCT: high-diversity clusters 1–3 were common pre-allo-HCT, while low-diversity states, particularly clusters 7–10, were more prevalent after allo-HCT (Figure 2g).
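The two sample-level summaries used in this paragraph, the reciprocal Simpson index and the ≥ 30% domination threshold, can be sketched as follows. This is an illustrative sketch only: the function names and example communities are hypothetical, and the input is assumed to be a taxon-to-abundance mapping.

```python
from typing import Dict, Optional


def inverse_simpson(abundances: Dict[str, float]) -> float:
    """Reciprocal Simpson index: 1 / sum(p_i^2) over taxon proportions p_i."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("sample has no counts")
    return 1.0 / sum((v / total) ** 2 for v in abundances.values() if v > 0)


def dominant_taxon(abundances: Dict[str, float],
                   threshold: float = 0.30) -> Optional[str]:
    """Return the most abundant taxon if it reaches the threshold, else None."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("sample has no counts")
    taxon, value = max(abundances.items(), key=lambda kv: kv[1])
    return taxon if value / total >= threshold else None


# Hypothetical communities: one even, one dominated by Enterococcus.
even = {"Blautia": 0.25, "Ruminococcus": 0.25,
        "Bacteroides": 0.25, "Streptococcus": 0.25}
dominated = {"Enterococcus": 0.9, "Blautia": 0.1}
```

For the even community the index equals the number of taxa (4), whereas the dominated community scores close to 1, which is the contrast between the high-diversity and low-diversity clusters described above.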
This classification of samples into discrete microbiome states enabled us to model the complex changes in microbial communities as cluster transition probabilities. We observed the transition frequencies of consecutive samples collected at most 7 days apart between day −16 and day 1,084 relative to HCT. Patients remained in the same cluster over two consecutive samples in 2,987 (54.5%) pairs of samples (Figure 2h and Table S1). Patients were less likely to remain in high-diversity clusters 1–3 among consecutive samples (mean frequency 46.4%; SD 5.0%), compared with dominated clusters such as 8–10 which are highly stable (mean frequency 65.1%; SD 14.1%). We observed a significant and negative association between cluster alpha-diversity and self-transition probabilities, suggesting that high-diversity clusters are less stable compared to low-diversity clusters (Figure S1j). Transitions from one cluster to another were observed in 2,495 pairs of samples (45.5%).
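The transition counting described above can be sketched as follows, assuming each sample is recorded as a (patient, day relative to HCT, cluster) tuple. The record layout, function names, and example data are hypothetical; only the 7-day maximum gap comes from the text.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int, int]  # (patient id, day relative to HCT, cluster)


def consecutive_pairs(records: List[Record],
                      max_gap_days: int = 7) -> List[Tuple[int, int]]:
    """Pair each sample with the patient's next sample if close enough in time."""
    by_patient = defaultdict(list)
    for patient, day, cluster in records:
        by_patient[patient].append((day, cluster))
    pairs = []
    for visits in by_patient.values():
        visits.sort()
        for (d0, c0), (d1, c1) in zip(visits, visits[1:]):
            if 0 < d1 - d0 <= max_gap_days:
                pairs.append((c0, c1))
    return pairs


def self_transition_frequency(pairs: List[Tuple[int, int]]) -> Dict[int, float]:
    """Fraction of pairs starting in each cluster that also end in it."""
    starts = Counter(src for src, _ in pairs)
    stays = Counter(src for src, dst in pairs if src == dst)
    return {cluster: stays[cluster] / n for cluster, n in starts.items()}


# Hypothetical records for two patients.
records = [
    ("p1", -5, 1), ("p1", -3, 1), ("p1", 2, 7), ("p1", 3, 10),
    ("p1", 4, 10), ("p1", 20, 10),  # the 16-day gap is excluded
    ("p2", 0, 2), ("p2", 1, 10), ("p2", 2, 10),
]
pairs = consecutive_pairs(records)
freq = self_transition_frequency(pairs)
```

Applied to the counts reported above, 2,987 self-transitions out of 2,987 + 2,495 = 5,482 pairs gives the stated overall frequency of 54.5%.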
We observed a particularly strong stability of the Enterococcus-high cluster 10, which is of interest due to the association of this genus with poor clinical outcomes following HCT7,9,10. To investigate potential drivers of Enterococcus domination stability, we developed a logistic regression model with lasso penalty analyzing cluster 10 stability as a function of parameters including antibiotic exposure, time of sample collection, alpha-diversity, and relative abundance of the top 20 most abundant genera in cluster 10 (STAR Methods). We applied this model to a dataset of daily sample pairs collected between day −14 and 100 relative to HCT and found that higher relative abundances of Staphylococcus and Erysipelatoclostridium were associated with decreased cluster 10 stability (Figure S1k). On the other hand, higher relative abundance of Enterococcus was associated with increased cluster 10 stability, indicating that Enterococcus domination leads to a positive feedback loop that supports its own stability. As expected, antibiotic exposure was associated with increased cluster 10 stability. Here, using real-world data, we showed that both environmental factors such as medication exposures, and ecological relationships between bacteria, contribute to microbiota community stability, specifically regarding Enterococcus domination.
Non-antibiotic exposures during allo-HCT are associated with changes in the intestinal microbiome compositions
To investigate the associations between drug exposures and microbiome cluster transitions, we developed PARADIGM, a computational tool based on a logistic regression model integrated with first-order Markov-chain transitions (STAR Methods). Markov-chain models have been utilized to investigate microbiome dynamics previously, but the associations between Markov transitions and environmental factors have not been extensively studied14,21–24. The model takes advantage of the high-resolution daily 16S rRNA-sequenced fecal samples (2,039 sample pairs; 454 patients; Table S2) to infer associations between drug exposures and cluster transitions (Figure 3a). For each cluster, we defined two transition types: self transitions (patients stay in the same cluster) and attractor transitions (patients move to a given cluster). We chose the terms “self transition” and “attractor transition” for the intuition they convey about cluster dynamics. In a pair of daily collected samples, a self transition describes the probability of a given cluster preserving its current state, and an attractor transition describes the probability of a given cluster receiving transitions from any cluster other than itself.
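The two transition types can be read as two binary outcomes per cluster. The following sketch shows one way to derive them from daily sample pairs; the encoding and the function name are assumptions made for illustration, and the definition in STAR Methods may differ in detail.

```python
from typing import List, Tuple


def transition_outcomes(pairs: List[Tuple[int, int]],
                        cluster: int) -> Tuple[List[int], List[int]]:
    """Binary outcomes for one cluster from (source, destination) daily pairs.

    self outcomes:      pairs that start in `cluster`; 1 if they stay there.
    attractor outcomes: pairs that start elsewhere; 1 if they move to `cluster`.
    """
    self_y = [int(dst == cluster) for src, dst in pairs if src == cluster]
    attractor_y = [int(dst == cluster) for src, dst in pairs if src != cluster]
    return self_y, attractor_y


# Hypothetical daily pairs of cluster assignments.
daily_pairs = [(10, 10), (10, 3), (7, 10), (2, 2), (3, 10)]
self_y, attractor_y = transition_outcomes(daily_pairs, cluster=10)
```

Each outcome vector would then serve as the response of a separate logistic regression with drug exposures as predictors, so that a coefficient describes a change in the log-odds of staying in, or moving to, the cluster.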
Figure 3. PARADIGM predicts changes in microbiome features such as genus relative abundance and alpha-diversity following drug exposures.
a, Schematic representation of PARADIGM, which takes advantage of daily 16S rRNA-sequenced samples and cluster transitions to infer how drug exposures are associated with microbial dynamics. Bacteria response scores translate drug-cluster associations into drug-genus associations. b, Associations between drug exposures and microbiome cluster dynamics (full results in Figure S3). Self coefficients indicate whether drug exposure increases or decreases the log-odds of cluster stability. Attractor coefficients indicate whether drug exposure increases or decreases the log-odds of transition to a given cluster. TMP-SMX, trimethoprim-sulfamethoxazole; ATG, anti-thymocyte globulin. c, Bacteria response scores predict the association between a given drug exposure and changes in genus relative abundance or alpha-diversity (full results in Figure S4). d, Pearson’s correlation between Enterococcus response scores and alpha-diversity response scores. Each point represents an individual drug. e, Pearson’s correlation between bacteria response scores and measurements of in vitro inhibition3. Each point represents the association between a unique drug-species pair. f, Predicted bacteria response scores by in vitro inhibition. Two-sided Wilcoxon’s rank-sum test.
We focused on 62 drugs to which patients in the discovery set were commonly exposed (STAR Methods). To determine the contribution of each drug to the likelihood of self and attractor transitions, we utilized elastic-net logistic regression, where the resulting coefficients for each drug indicate both the direction and magnitude of the association between the drug and daily cluster transitions. Microbiome injury patterns in allo-HCT are strongly linked with time relative to allo-HCT10,25,26. Therefore, we included time as a co-variable to address the temporal patterns of drug exposures and attenuate time-dependent variability in microbiome cluster dynamics (Figure S2). To account for unequal patient contribution of data points to the training set (Table S2), we pre-specified the 10-fold cross-validation partitions in our training cohort such that samples of the same patient are always in the same partition.
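Keeping all samples of a patient in the same cross-validation partition can be sketched as follows. The greedy size-balancing rule and the function name are assumptions made for illustration; the text specifies only that partitions were pre-specified by patient.

```python
from collections import Counter
from typing import List


def grouped_folds(patient_ids: List[str], n_folds: int = 10) -> List[int]:
    """Assign a fold index to every sample so that no patient spans two folds.

    Patients are placed largest-first into whichever fold currently holds
    the fewest samples, which keeps fold sizes roughly balanced.
    """
    counts = Counter(patient_ids)
    fold_sizes = [0] * n_folds
    fold_of = {}
    for patient, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(range(n_folds), key=lambda f: fold_sizes[f])
        fold_of[patient] = target
        fold_sizes[target] += n
    return [fold_of[p] for p in patient_ids]


# Hypothetical sample-to-patient labels with unequal contributions.
ids = ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"]
folds = grouped_folds(ids, n_folds=2)
```

Grouping by patient prevents samples from one individual from appearing in both the training and held-out partitions, which would otherwise make the cross-validated error optimistic.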
We identified several associations between drug exposures and cluster self and attractor transitions (Figure 3b; full results in Figure S3). As expected, several antibiotics used to empirically treat neutropenic fever were associated with profound changes in the intestinal microbiota. Exposures to meropenem (2.5-fold increase) and metronidazole (3.4-fold increase) were associated with increased transitions to the Enterococcus-high cluster 10, consistent with previous reports10,27. Several non-antibiotic medications were also associated with specific cluster dynamics. Exposure to aprepitant, a tachykinin receptor antagonist used for chemotherapy-induced nausea, was associated with a 2.8-fold increase in transition frequency to the Enterococcus-high cluster 10 (Figure S3). Similarly, exposure to the opioid analgesic fentanyl was associated with a 1.9-fold increase in transition frequency to the Enterococcus-high cluster 10. Other medications such as labetalol and insulin, which are not known to target intestinal bacteria, were associated with decreased stability of the Enterococcus-high cluster 10 (Figure S3). Altogether, PARADIGM identified associations between several non-antibiotic drugs and changes in the intestinal microbiome.
Previous studies have identified specific bacteria that have either beneficial or deleterious associations with clinical outcomes following allo-HCT28–30. As such, we translated drug-cluster associations into drug-taxon associations by calculating bacteria response scores to identify associations between drug exposure and changes in specific taxonomic groups of clinical interest (Figure 3c; full results in Figure S4). In our model, bacteria response scores estimate the association between a drug exposure and a microbiome feature, namely relative abundance of taxa or alpha-diversity, where positive scores indicate association with higher relative abundances or diversity values (Figure 3a). We focused on four microbiome features previously associated with allo-HCT patient outcomes, namely relative abundance of Enterococcus, Blautia, and Erysipelatoclostridium, and alpha-diversity7–9,29,30. Most antibiotics used in this cohort as empiric or pathogen-directed treatments (metronidazole, meropenem, aztreonam, and cefepime) were associated with increased relative abundance of Enterococcus as well as decreased alpha-diversity, consistent with previous studies (Figures 3c and S4)10,31,32. Our observation that cefepime exposure was associated with Enterococcus expansion is consistent with our previous report31, and may be partly explained by the poor activity of cefepime and other cephalosporins against enterococci. Piperacillin-tazobactam exposures were associated with decreased Enterococcus relative abundance, as well as decreased relative abundance of intestinal commensals such as Blautia and Erysipelatoclostridium to a greater extent compared to other empiric antibiotics (Figure S4). We also observed that the drugs most strongly associated with Enterococcus expansion were non-antibiotic drugs, including opioids such as fentanyl and hydromorphone, hormones such as levothyroxine, and anticonvulsants such as gabapentin (Figures 3c and S4).
Opioid exposures have been previously associated with decreased relative abundance of Blautia in ICU patients who did not receive antibiotics33, an association we also observed here for fentanyl and hydromorphone (Figure S3). In contrast, laxatives such as docusate and polyethylene glycol were strongly associated with decreased Enterococcus relative abundance (Figure 3c). Previous experimental studies have demonstrated that polyethylene glycol induces global changes in bacterial compositions in mice, either through modulation of intestinal osmolality or through direct anti-bacterial inhibition34,35. Overall, drug exposures associated with alpha-diversity preservation were correlated with decreased Enterococcus expansion, and vice versa (Figure 3d).
Validation of in silico findings from real-world patient dataset against an independent in vitro dataset
We tested the predictive power of PARADIGM by comparing the in silico results described here using ‘real-world data’ from cancer patients with an independent published in vitro dataset3 (Figures 3e–f). The in vitro screen and the present study have 19 bacterial species and 34 drugs in common. We observed an enrichment of drug-species pairs that showed in vitro inhibition (for both antibiotics and non-antibiotics) in the lower left quadrant of the plot, which corresponds to negative response scores in silico (Fisher’s exact test: odds ratio = 0.60, p-value = 0.01). Furthermore, drug-species pairs that showed in vitro inhibition had significantly lower response scores in the patient dataset when compared to those that did not show inhibition in vitro (Figure 3f). Altogether, these results suggest that PARADIGM can accurately predict in vitro anti-bacterial activity of both antibiotics and non-antibiotics and distinguish direct interactions of drugs with bacterial species from the potential confounding influence of the clinical symptoms prompting these drug exposures.
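For readers unfamiliar with the enrichment test used here, the following is a minimal sketch of a two-sided Fisher's exact test on a 2×2 table. The cell counts are hypothetical and are not the study's data, and the assignment of rows and columns (for example, inhibited versus not inhibited in vitro by negative versus non-negative response score) is an assumption made for illustration.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio of the 2x2 table [[a, b], [c, d]]."""
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided p-value: total probability of all tables with the same
    margins that are no more likely than the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical 2x2 counts, for illustration only.
table = (3, 1, 1, 3)
estimate = odds_ratio(*table)
p_value = fisher_exact_two_sided(*table)
```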
Antibiotic exposure is a strong predictor of subspecies dynamics
Several experimental studies have demonstrated that the bactericidal spectra of several drugs are species- and strain-specific even within the same genus3,36. We therefore explored the associations between drug exposures and changes in relative abundances of species within genera of clinical importance in allo-HCT, using a subset of 980 specimens from 340 patients in the MSKCC discovery cohort with available shotgun metagenomic sequencing profiles (Figure S5). By applying a linear mixed-effects regression model (STAR Methods), we estimated associations between drug exposures and changes in bacterial species relative abundance. Again, we detected that several drug exposures spanning different drug classes (antibiotics, laxatives, antidiarrheals, and opioids) were associated with changes in relative abundance of Blautia coccoides, Blautia producta, Enterococcus faecalis, Enterococcus faecium and Erysipelatoclostridium ramosum, although these associations were not statistically significant.
Within human intestinal microbiome communities, most species are represented by a single dominant strain37, although we have previously observed complex strain dynamics during episodes of domination by E. faecium in allo-HCT patients38. For five species within three genera of interest, we identified the sequence signatures of the dominant strains on the basis of marker gene polymorphisms using the StrainPhlAn algorithm37 and calculated tree-based phylogenetic distances between dominant strains in consecutive patient samples (Figure 4a, upper schematic). Small phylogenetic distances between strains in consecutive samples indicate dominant-strain convergence, while large distances suggest dominant-strain divergence over time. For most species, and particularly E. faecium, the phylogenetic distance between consecutive samples declined over time (Figure 4a, middle row). This temporal pattern suggests a reduction of within-species genetic variability across multiple species over time and the rise of a dominant subtype. Furthermore, subtype variability was inversely associated with species relative abundance (Figure 4a, lower row). This correlation might be a consequence of so-called “selective sweeps”39,40 by comparatively better-fit strains or of a loss of variability due to a population bottleneck that may occur during allo-HCT41. Subspecies diversification following a parabolic fitness landscape has been observed in vancomycin-resistant E. faecium isolated from longitudinal stool sampling of allo-HCT patients38, which this method of strain classification focusing on dominant strains might fail to capture. We observed that antibiotic exposure was a significant predictor of dominant strain genetic convergence within E. faecium, while non-antibiotic exposure was not (Figures 4b–c).
Figure 4. Antibiotics are strong predictors of strain genetic convergence during allo-HCT.
a, Strain convergence over time relative to HCT (middle row), or by species relative abundance (bottom row). Each point represents the tree-based phylogenetic distance between the dominant strains of a given species in a pair of subsequently collected samples. Higher phylogenetic distance suggests genetic dissimilarity, while lower phylogenetic distance suggests strain genetic similarity. b, c, Antibiotic exposure (b), but not non-antibiotic exposure (c) is associated with increased E. faecium dominant strain convergence. Each point represents the phylogenetic distance between E. faecium dominant strains in a pair of subsequently collected samples, stratified by drug exposures during the time gap of sample pair collection. Two-sided Wilcoxon’s rank-sum test.
Drug-microbiome associations are predictive of future microbiome trajectories and clinical outcomes following allo-HCT
Having demonstrated that drug exposures are associated with microbiota changes, and in light of previous reports associating fecal microbial composition with allo-HCT clinical outcomes, we next asked whether patterns of drug exposure alone could predict mortality independent of microbiome data. Using drug-exposure data from a separate MSKCC validation cohort, we defined patient-specific response scores as metrics that quantify the net response of microbiome features to drug-exposure profiles (Figure 5a). For example, the patient-specific Enterococcus response scores translated patient drug exposure profiles into relative risk of Enterococcus expansion (Figure 5b). We tested these patient-specific bacteria response scores against the outcomes of all-cause and specific-cause mortality in two independent validation cohorts: a subset of 423 MSKCC patients who were not included in the PARADIGM training cohort, as well as 142 patients from an independent cohort from Duke University Medical Center. All 62 drug exposures were considered for the MSKCC validation cohort, while only antibiotic exposures were evaluated in the Duke validation cohort. Patient characteristics in these two cohorts are outlined in Table 2.
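One plausible reading of a "net response" is an aggregate of drug-level response scores over the drugs a patient received in the exposure window. The sketch below sums the scores of distinct drugs given between day −14 and 14; this aggregation rule, the function name, and the numeric scores are all assumptions made for illustration, and the definition in the study's methods may weight exposures differently.

```python
from typing import Dict, List, Tuple


def patient_response_score(exposures: List[Tuple[str, int]],
                           drug_scores: Dict[str, float],
                           window: Tuple[int, int] = (-14, 14)) -> float:
    """Sum drug-level response scores over distinct drugs given in the window.

    exposures:   (drug name, day relative to HCT) administration records.
    drug_scores: drug-level response score for one microbiome feature.
    Drugs without a score contribute zero (an assumption of this sketch).
    """
    start, end = window
    exposed = {drug for drug, day in exposures if start <= day <= end}
    return sum(drug_scores.get(drug, 0.0) for drug in exposed)


# Hypothetical Enterococcus response scores; only the signs follow the text.
enterococcus_scores = {
    "fentanyl": 0.5,
    "meropenem": 0.25,
    "polyethylene glycol": -0.75,
}
# Hypothetical administration records for one patient.
exposures = [
    ("fentanyl", -2), ("fentanyl", 3),
    ("polyethylene glycol", 5),
    ("meropenem", 30),  # outside the window, ignored
]
score = patient_response_score(exposures, enterococcus_scores)
```

Under this reading, a negative total indicates an exposure profile leaning toward Enterococcus inhibition and a positive total indicates a profile leaning toward Enterococcus expansion.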
Figure 5. Drug exposure profiles are predictive of future microbiome trajectories and allo-HCT patient outcomes in two distinct validation cohorts.
a, Schematic of the patient-specific bacteria response score calculation. b, Patient-specific Enterococcus response scores in the validation cohort were derived based solely on drug exposure profiles (between day −14 and 14 relative to HCT) and bacteria response scores presented in Figure 3c. A negative score indicates that the drug exposure profile is associated with an Enterococcus-inhibiting effect, while a positive score indicates that the drug exposure profile is associated with an Enterococcus-promoting effect. c, Pearson’s correlation between patient-specific bacteria response scores and observed genus relative abundance or alpha-diversity in samples collected between day 14 and 45 relative to HCT in the MSKCC validation cohort (423 patients) and Duke cohort (142 patients). Adjusted p-values by Benjamini-Hochberg’s correction. d, Patient-specific bacteria response scores are predictive of overall and cause-specific mortality in the MSKCC and Duke validation cohorts, in each respective multivariate Cox proportional hazards or Fine-Gray model, controlled for age, sex, conditioning intensity, graft source and underlying disease. Adjusted p-values by Benjamini-Hochberg’s correction.
Table 2. Patient characteristics from MSKCC and Duke cohorts.
IQR, interquartile range; sd, standard deviation; AML, acute myeloid leukemia; BM, bone marrow; PBSC, peripheral blood stem cell; PBSC T-cell depletion was performed by ex vivo CD34-selection of the graft.
| | MSKCC Discovery | MSKCC Validation | Duke Validation |
|---|---|---|---|
| Number of patients | 778 | 423 | 142 |
| Mean age at HCT, year (sd) | 55 (13) | 53 (13) | 51 (13) |
| Year of HCT (%) | | | |
| 2009 – 2015 | 374 (48) | 331 (78) | 33 (23) |
| 2016 – 2019 | 404 (52) | 92 (22) | 109 (77) |
| Sex (%) | | | |
| Female | 312 (40) | 166 (39) | 43 (30) |
| Male | 466 (60) | 257 (61) | 99 (70) |
| Disease (%) | | | |
| AML | 278 (36) | 144 (34) | 42 (30) |
| Others | 500 (64) | 279 (66) | 100 (70) |
| Conditioning intensity (%) | | | |
| Nonmyeloablative | 46 (6) | 89 (21) | 13 (9) |
| Reduced intensity | 171 (22) | 245 (58) | 6 (4) |
| Ablative | 561 (72) | 89 (21) | 123 (87) |
| Median follow-up, months | 46 | 49 | 11 |
| Graft type (%) | | | |
| Cord blood | 62 (8) | 131 (31) | 19 (13) |
| BM unmodified | 78 (10) | 30 (7) | 13 (9) |
| PBSC unmodified | 176 (23) | 262 (62) | 110 (78) |
| PBSC T-cell depleted | 462 (59) | - | - |
We observed that patient-specific bacteria response scores based on drug exposures between day −14 and 14 relative to HCT were significantly and positively correlated with observed taxa relative abundance and alpha-diversity, respectively, from samples collected between day 14 and 45 relative to HCT in both the MSKCC and Duke validation cohorts (Figure 5c). Furthermore, patients whose drug exposure profiles predicted higher Enterococcus expansion were at an increased risk of all-cause and transplant-related mortality following allo-HCT in the MSKCC and Duke validation cohorts. Specifically, in the MSKCC validation cohort, patient-specific Enterococcus response scores were also significantly associated with an increased risk of GVHD-related mortality (Figure 5d). Conversely, patients whose drug exposure profiles predicted Erysipelatoclostridium, Blautia or alpha-diversity preservation had a decreased risk of all-cause mortality in both the MSKCC discovery and validation cohorts, as well as a decreased risk of GVHD-related mortality in the MSKCC validation cohort (Figures 5d and S6b). Overall, we demonstrated that drug-microbiome associations are predictive of subsequent changes in the intestinal microbiome compositions post-exposure, and of clinical outcomes in allo-HCT patients.
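The adjusted p-values reported in Figure 5 use Benjamini-Hochberg's correction. A minimal sketch of that procedure follows; the function name and the example p-values are hypothetical and do not come from the study.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four tests.
raw = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)
```

Each raw p-value is scaled by the number of tests divided by its rank, and a running minimum from the largest rank downward ensures that a smaller raw p-value never receives a larger adjusted value than a bigger one.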
The framework for this analysis is a hypothesis that drug exposures affect the intestinal microbiota, which in turn shapes clinical outcomes. In some scenarios, however, patients at high risk for adverse outcomes (for a variety of reasons unrelated to the microbiome) might have received drugs that affect the intestinal bacteria. To explore these possibilities, we focused on Enterococcus and compared the hazard ratios of Enterococcus relative abundance with the corresponding patient-specific response scores in a Cox proportional hazards model. We observed a significant association between Enterococcus abundance and mortality risk, which was stronger than the association between patient-specific Enterococcus response scores and overall mortality, specifically in the MSKCC validation cohort (Figure S6c). However, we also observed that the patient-specific response score remained a statistically significant predictor of mortality risk when controlled for intestinal microbiome compositions (Figure S6c). Altogether, these results suggest that the association between drug exposures and clinical outcomes is partially dependent on drug interactions with the intestinal microbiota.
Discussion
We developed PARADIGM, a computational method that identifies the associations between drug exposures and intestinal microbial dynamics in humans. At its core, PARADIGM analyzes how discrete states of intestinal microbial compositions respond to both antibiotic and non-antibiotic drugs. While other computational methods have largely focused on antibiotics42,43, our approach further reveals the associations between many non-antibiotic drugs and microbial dynamics in high resolution by analyzing a large dataset of daily stool samples from allo-HCT patients.
Our method was able to infer meaningful associations between drug exposures and microbiome despite the various confounding parameters such as the clinical symptoms prompting these drug exposures, which is particularly important when analyzing medications used to treat gut toxicity. We validated our findings by comparing bacterial response scores derived from this real-world patient study with published in vitro observations3 and found that our estimates were significantly correlated with the reported data. Furthermore, the bacterial response scores, calculated based solely on patient drug exposures, allowed us to predict future microbial changes and patient outcomes in two independent validation cohorts, particularly for Enterococcus responses. These results demonstrate that PARADIGM can generate hypotheses with both biological and clinical relevance.
In conclusion, we provide insights into the associations between pharmacologic exposures and changes in intestinal microbial composition at the early stage of transplant. The algorithm we have developed, PARADIGM, identifies biologically meaningful and clinically relevant associations between drug exposures and intestinal microbial dynamics in humans. PARADIGM facilitates the integration of drug exposures from many classes of xenobiotic agents, microbiome dynamics, and clinical outcomes to understand the determinants of microbiome health. This computational framework is well-suited for longitudinal data and may be built upon in the future to investigate other environmental parameters of interest, such as dietary intake or other components of the “exposome”, or applied to other disease settings in which drug-microbiome interactions are of clinical importance44.
Limitations of the Study
Our study enabled the identification of medication-microbiome associations using a high-throughput approach and showed the power of those associations in predicting clinical outcomes. However, important limitations exist, and the interpretation of the results should be performed with care. (i) This paper analyzes a retrospective cohort study. Controlled experiments are required to confirm whether our predictions are causal or not. (ii) Drug-microbiome interactions are potentially dependent on drug dosing, which we did not account for. (iii) While synergistic and antagonistic drug interactions have been reported45, our model assumes drugs to act independently from each other. (iv) Transplant-specific effects, such as conditioning intensity and graft type, are not fully captured by our model. We assume those effects are partially attenuated by a time parameter and the medication protocol depending on transplant type and conditioning regimen. (v) Medication exposure is only one component among the various perturbations a patient is subjected to. Other environmental factors have also been associated with changes in the intestinal microbiome. For example, dietary intake has been shown to play a major role in shaping the intestinal microbiota in both healthy individuals and cancer patients46–48. Future efforts to collect dietary intake data could help elucidate further the association between diet, changes in the intestinal microbiota and clinical outcomes of allo-HCT patients. Finally, we envision PARADIGM to be a valuable tool to investigate microbiome dynamics in vivo and await its application to other comparable datasets, encompassing drug exposures and other environmental factors, to replicate and extend the results in this study.
STAR★Methods
RESOURCE AVAILABILITY
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Marcel van den Brink (vandenbm@mskcc.org).
Materials availability
This study did not generate new unique reagents.
Data and code availability
16S rRNA and shotgun metagenomic sequencing data have been deposited at the NCBI database and are publicly available as of the date of publication. De-identified patient data necessary to reproduce the results in this manuscript and NCBI SRA accession numbers have been deposited on Figshare and are publicly available as of the date of publication. The DOI is listed in the key resources table. All original code has been deposited at GitHub and is publicly available as of the date of publication. The DOI is listed in the key resources table. Scripts and data are provided to facilitate reproducibility of all the analyses presented in this manuscript, except for the analysis of clinical outcomes. Patient-level clinical outcomes for MSKCC and Duke cohorts are available via data sharing agreement per institutional policies. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
Key resources table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Biological samples | ||
| Human Fecal Samples | This study | N/A |
| Chemicals, peptides, and recombinant proteins | ||
| Phenol:Chloroform:Isoamyl Alcohol (25:24:1) | Fisher Scientific | Cat#: AC327111000 |
| TRIS-EDTA 1X, pH 8.0, biotech grade | Fisher Scientific | Cat#: BP2473500 |
| 20% SDS Solution, molecular grade | Fisher Scientific | Cat#: BP1311–1 |
| 3M Sodium Acetate, pH 5.2, sterile, Teknova | Fisher Scientific | Cat#: NC0496686 |
| Biospec 0.1mm zirconia-silica beads | Fisher Scientific | Cat#: NC0362415 |
| TE Buffer with 0.1mM EDTA, pH 7.6, sterile | Fisher Scientific | Cat#: NC9491715 |
| Ethanol, 200 proof, 0.125gal | Fisher Scientific | Cat#: 04–355-222 |
| RNase A, 100mg; from bovine pancreas, Roche | Fisher Scientific | Cat#: 50–100-3354 |
| Water, Ultrapure, distilled | Fisher Scientific | Cat#: 10–977-023 |
| Sodium Chloride 1M, Teknova | Fisher Scientific | Cat#: NC1326185 |
| Critical commercial assays | ||
| QIAamp DNA mini kit | Qiagen | Cat#: 51306 |
| Qiaquick PCR Purification Kit | Qiagen | Cat#: 28106 |
| MinElute PCR Purification Kit | Qiagen | Cat#: 28006 |
| Collection tubes, 2 mL | Qiagen | Cat#: 19201 |
| Platinum Taq DNA polymerase, w/buffers | Fisher Scientific | Cat#: 10–966-083 |
| dNTP mix 10mM | Fisher Scientific | Cat#: 18–427-088 |
| Qubit dsDNA broad-range assay kit | Fisher Scientific | Cat#: Q32853 |
| Tapestation D1000 ScreenTape, 112rx | Agilent | Cat#: 5067–5582 |
| Tapestation D1000 Reagents, 112rx | Agilent | Cat#: 5067–5583 |
| Tapestation 96-well plate foil seal | Agilent | Cat#: 5067–5154 |
| Tapestation 96-well sample plate | Agilent | Cat#: 5067–5150 |
| Tapestation Loading Tips | Agilent | Cat#: 5067–5153 |
| Tapestation Optical tube strip caps | Agilent | Cat#: 401425 |
| Tapestation Optical tube strips | Agilent | Cat#: 401428 |
| Tapestation High-Sensitivity D1000 Screen Tape, 112rx | Agilent | Cat#: 5067–5584 |
| Tapestation High-Sensitivity D1000 Reagents, 112rx | Agilent | Cat#: 5067–5585 |
| Agilent Technologies 4200 Tapestation | Agilent | Cat#: G2991BA |
| KAPA Hyper Prep | Roche Diagnostics | Cat#: 07962363001 |
| AMpure Beads XP | Beckman Coulter | Cat#: A63881 |
| MiSeq Reagent Kit v2 (500-cycles) | Illumina | Cat#: MS-102–2003 |
| Deposited data | ||
| Stool metagenomics (16S rRNA and shotgun) sequencing | This study | DOI: 10.6084/m9.figshare.21657806.v1 |
| Oligonucleotides | ||
| 16S rRNA gene sequencing forward primer: (563F): 5′-nnnnnnnn-NNNNNNNNNNNN-AYTGGGYDTAAAGNG-3′ | Integrated DNA Technologies | https://earthmicrobiome.org |
| 16S rRNA gene sequencing reverse primer: (926Rb): 5′-nnnnnnnn-NNNNNNNNNNNN-CCGTCAATTYHTTTRAGT-3′ | Integrated DNA Technologies | https://earthmicrobiome.org |
| Software and algorithms | ||
| PARADIGM | This study | DOI: 10.5281/zenodo.7818943 |
| R (version 4.1.0) | R Foundation | https://www.r-project.org |
| DADA2 (version 1.16.0) | Callahan et al., 2016 | https://benjjneb.github.io/dada2/index.html |
| BBMap (version 38.97) | N/A | https://www.sourceforge.net/projects/bbmap/ |
| KneadData (version 0.7.5) | The Huttenhower Lab | https://huttenhower.sph.harvard.edu/kneaddata |
| HUMAnN 3.0 (version 3.0.1) | Beghini et al., 2021 | https://huttenhower.sph.harvard.edu/humann/ |
| MetaPhlan 3.0 (version 3.1.0) | Beghini et al., 2021 | https://huttenhower.sph.harvard.edu/metaphlan/ |
| StrainPhlAn 3.0 | Truong et al., 2017 | http://segatalab.cibio.unitn.it/tools/strainphlan/ |
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Study population of human subjects
The patient and fecal sample cohort in this study has been described in previous studies8,9,49. Stool samples were collected at two different transplant centers, Memorial Sloan Kettering Cancer Center (MSKCC) from April 2009 to September 2019, and Duke University Medical Center from July 2012 to April 2018. Participants in the observational cohorts at both MSKCC and Duke provided written informed consent for the use of their stool samples and clinical data. The use and analysis of these specimens for this study were approved by the Institutional Review Boards at both institutions (MSKCC: #16–834; Duke: PRO0006268 and Pro00050975). Stool samples collected after day −30 relative to a first allo-HCT, and before day −10 relative to a second allogeneic transplant (if applicable) were included in the analysis. A subset of patients from MSKCC participated in a randomized clinical trial of fecal microbiota transplantation (FMT, NCT02269150)26. Stool samples from patients in the control arm, as well as samples from patients in the FMT arm collected pre-FMT, were included for analysis. Post-FMT stool samples were excluded. Detailed patient and fecal sample characteristics for both MSKCC and Duke cohorts are outlined in Tables 1–2.
METHOD DETAILS
Stool collection and storage
As DNA extraction procedures, sample-handling environment, sequencing and bioinformatics pipelines are important sources of variability in microbiome data, we minimized bias and institutional batch effects by collecting and freezing samples at each center following the same protocol. All stool samples were collected, aliquoted and frozen at their respective clinical centers; extraction, sequencing, and analyses were performed centrally at MSKCC.
Fecal samples were collected in both inpatient and outpatient settings. At MSKCC, inpatient samples were collected by nursing staff from toilet inverted “hats” into ~100 ml-sized containers at the bedside, promptly delivered to the laboratory via pneumatic tube, and refrigerated at 4°C until aliquoted for long term storage at −80°C. Outpatient stools were collected in the patients’ homes using a commode specimen collection system, after which the entire collection bin was capped, placed inside a biohazard zip lock bag to prevent leakage, and deposited inside an 8 × 6 × 4.25” foam container along with pre-chilled freezer packs. Samples were either brought by the patient to a clinic appointment or shipped directly from the patient’s home to the laboratory via courier.
Upon receipt by the Molecular Microbiology Facility Laboratory at MSKCC, each sample was given a unique ID and recorded into the fecal biobank database. Approximately 0.5 ml of whole stool was aliquoted without any preservative solution into four 2 ml cryovials using a clean disposable spatula and transferred into a −80°C freezer until thawed for DNA processing. Duke samples were aliquoted similarly at Duke, batch-shipped frozen to MSKCC, and extracted and sequenced in the same way as the MSK samples at the Molecular Microbiology Facility Laboratory.
DNA extraction
Bacterial DNA was extracted using an optimized phenol-chloroform protocol to recover nucleic acids from tough microbes commonly present in stool samples, as previously described8,50. Briefly, 200–300 mg of solid or 200–300 μl of liquid stool were aliquoted and the respective wet weight was recorded. Fecal samples were resuspended in 500 μl of extraction buffer (i.e., 0.2 M NaCl, 0.2 M Tris-HCl, pH 8.0, and 20 mM ethylenediaminetetraacetic acid, prepared fresh). The mixture was combined with 0.1 mm zirconia/silica beads (approximately 500 μl), 200 μl of sodium dodecyl sulfate, and 500 μl of phenol-chloroform-isoamyl alcohol (25:24:1 solution). The bacterial cells were lysed by mechanical disruption using a bead-beater for 2 minutes at more than 3,000 rpm. Detritus was spun down at 16,000g at 4°C for 5 minutes. The upper aqueous layer was transferred to a clean 1.7 ml tube and mixed with 100 μl of extraction buffer as described above, and 500 μl of phenol-chloroform-isoamyl alcohol (25:24:1). The solutions were homogenized by inversion and spun down again at 16,000g for 5 minutes at 4°C. The upper aqueous layer was recovered, and this process was repeated for a total of three times. After the final round of phenol-chloroform-isoamyl alcohol was completed, 400 μl of the top aqueous layer was combined with 40 μl of sodium acetate and 880 μl of cold 100% ethanol. The mixture was vortexed and then frozen for at least 20 minutes, preferably overnight. After the ethanol freeze incubation period, samples were spun down at 16,000g for 20 minutes at 4°C to pellet the DNA. The upper aqueous layer was aspirated, and the visible DNA pellet was resuspended in 200 μl of TE buffer containing 100 mg/mL of RNase solution, followed by an incubation at 50°C for 20 minutes. The genomic DNA was further purified using the QIAamp DNA mini kit (Qiagen) following the manufacturer's instructions. The purified DNA was eluted in 100 μl of ultrapure water and stored at −80°C prior to quantification of DNA yield and subsequent PCR amplification.
Sequencing
16S rRNA V4-V5 barcoded amplification and multiplexing:
The amplification, multiplexing and sequencing of 16S rRNA from extracted DNA has been previously reported26. Briefly, purified genomic DNA was diluted if necessary and 50 ng was used as template during PCR amplification. The V4-V5 region of the 16S rRNA gene was amplified with the primers 563F (5′-nnnnnnnn-NNNNNNNNNNNN-AYTGGGYDTAAAGNG-3′) and 926Rb (5′-nnnnnnnn-NNNNNNNNNNNN-CCGTCAATTYHTTTRAGT-3′), where ‘N’s represent unique 12-base pair Golay barcodes and ‘n’s represent additional nucleotides to offset the sequencing of the primers51. Duplicate PCR reactions were performed for each sample with 2.5 U of Platinum Taq DNA polymerase and 0.5 mM of forward and reverse primers at 94°C for 3 minutes, followed by 27 cycles of 94°C for 50 seconds, 51°C for 30 seconds and 72°C for 1 minute and a final elongation step at 72°C for 5 minutes. Amplicons were purified using the Qiaquick PCR Purification Kit (Qiagen) after pooling the sample replicates. The purified PCR products were quantified using the Agilent Technologies 4200 TapeStation and multiplexed at equimolar amounts. The resulting pools of barcoded 16S amplicons underwent library preparation and paired-end 250 base pair (bp) sequencing on an Illumina MiSeq platform at the MSKCC Integrated Genomics Operation sequencing core. Extraction blanks were included in each extraction batch as negative controls. These blanks were PCR-amplified but did not show PCR products and were subsequently removed from demultiplexing and sequencing.
Shotgun metagenomic sequencing:
Shotgun metagenomic sequencing was conducted as previously described38. Extracted DNA was sheared to a target size of 650 bp using a Covaris ultrasonicator. DNA was then prepared for sequencing using the Illumina TruSeq DNA library preparation kit and sequenced using the Illumina platform targeting approximately 10–20 million reads per sample with 100-bp paired-end reads.
QUANTIFICATION AND STATISTICAL ANALYSIS
Sequencing bioinformatics pipeline
16S rRNA gene sequencing:
Reads were quality-filtered, deduplicated and denoised, and amplicon sequence variants (ASVs) were inferred following the DADA2 pipeline (Divisive Amplicon Denoising Algorithm)52. Forward and reverse reads were truncated at a length of 180 bp to ensure sufficient quality and length for overlapping the paired-end reads. Default values were used for filtering and trimming reads prior to inferring sequence variants through function filterAndTrim() (maxN = 0, maxEE = 2, truncQ = 2). Reads per sample were capped at 100,000 prior to sequence variant inference; samples with more than 100,000 reads were sub-sampled. Each sequencing run was analyzed separately before merging the ASV counts across multiple runs.
Samples with poor sequencing quality were filtered out using the following criteria:
Less than 30% reads remaining after filtering and trimming
Less than 1,000 reads remaining after filtering and trimming
More than 5% adapter contamination
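The three exclusion criteria above can be written as a single predicate. The following is a minimal sketch for illustration only; the function name and argument names are hypothetical and are not taken from the study's code.

```python
def passes_16s_qc(reads_in, reads_out, adapter_fraction):
    """Return True if a 16S sample survives the three exclusion criteria.

    reads_in: read count before filtering and trimming
    reads_out: read count after filtering and trimming
    adapter_fraction: fraction of reads with adapter contamination (0 to 1)
    """
    if reads_in <= 0:
        return False
    if reads_out / reads_in < 0.30:   # less than 30% of reads remaining
        return False
    if reads_out < 1000:              # less than 1,000 reads remaining
        return False
    if adapter_fraction > 0.05:       # more than 5% adapter contamination
        return False
    return True
```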
Quality control details are summarized in Table S3. Taxonomic classification was annotated according to the NCBI 16S rRNA sequence database. Alpha-diversity was calculated using the reciprocal Simpson index at the ASV level, which is the metric used for all alpha-diversity measures in this study. Beta-diversity was computed according to the Bray-Curtis distances at the genus level using the beta_diversity.py script in the QIIME bioinformatic pipeline53. We performed t-distributed stochastic neighbor embedding (tSNE) dimensionality reduction for visualization of the intestinal microbiota compositions using a Bray-Curtis β-diversity matrix at the genus level with R package Rtsne (max_iter = 10,000; perplexity = 75; theta = 0.2).
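For readers unfamiliar with the two diversity metrics named above, the sketch below shows their standard definitions in plain Python. It is not the QIIME or R implementation used in the study, and the function names are illustrative.

```python
def inverse_simpson(counts):
    """Reciprocal Simpson index: 1 / sum(p_i^2) over taxon proportions p_i."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return 1.0 / sum((c / total) ** 2 for c in counts)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0
```

A perfectly even community of four taxa has a reciprocal Simpson index of 4, whereas a sample containing a single taxon has an index of 1; two samples sharing no taxa have a Bray-Curtis dissimilarity of 1.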
Shotgun metagenomic sequencing:
The right and left sides of each read in a pair were trimmed to Q10 using the Phred algorithm, using the bbduk.sh script in the BBMap package (https://www.sourceforge.net/projects/bbmap/). A pair of reads was dropped if either read had a length shorter than 51 nucleotides after trimming. The 3’-end adapters were trimmed using a kmer of length 31, and a shorter kmer of 9 at the other end of the read. One mismatch was allowed in this process, and adapter trimming was based on pair overlap detection (which does not require known adapter sequences) using the ‘tbo’ parameter. The ‘tpe’ parameter was used to trim the pair of reads to the same length. Removal of human contamination was done using KneadData with paired-end reads, employing BMTagger. The BMTagger database was built with human genome assembly GRCh38.
Samples with poor sequencing quality were filtered out using the following criteria:
Less than 50% reads remaining after trimming with BBMap
Less than 1 million reads remaining after trimming
More than 5% adapter contamination
After decontamination, the paired-end reads were concatenated to a single FASTQ file as the input for functional profiling with the HUMAnN 3.0 pipeline54. After aligning to the updated ChocoPhlAn and UniRef90 database with default settings, the samples were renormalized by library depth to copies per million. MetaPhlAn 3.0 was used to identify taxonomic compositions with the relative abundance of the species using parameter -t rel_ab54. Similar to the 16S rRNA-sequenced data, we performed tSNE analysis for visualization of shotgun metagenomic compositions using a Bray-Curtis β-diversity matrix at the species level with R package Rtsne (max_iter = 3,000; perplexity = 20; theta = 0.1).
Clustering of the intestinal microbiota compositions
Three unsupervised clustering methods were used to identify distinct clusters of the intestinal microbiota compositions within the MSKCC validation cohort. Unsupervised k-means clustering and hierarchical clustering methods were applied to the Bray-Curtis β-diversity matrix. Dirichlet Multinomial Mixture (DMM) was applied to the raw count matrix at the genus level. For all three methods, cluster numbers from 2 to 15 were evaluated, and ten clusters were ultimately chosen for downstream analysis. Since k-means clustering partitioned samples more evenly (Figure S1g), we utilized k-means clusters for our subsequent analyses. We determined ten clusters to be a good balance between reducing the complexity of the 16S rRNA sequencing data and still representing the variability of the intestinal microbiome compositions during allo-HCT.
Domination threshold
We defined sample domination by a single taxonomic unit using the threshold of ≥ 0.3 of the 16S rRNA sequencing relative abundance8. The taxonomic color scheme was adapted from the R package yingtools2 (https://github.com/ying14/yingtools2) and a previous publication8.
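As a small illustration of the domination rule, the sketch below flags a sample as dominated when its most abundant taxon reaches a relative abundance of 0.3. The function name and the dictionary input format are assumptions made for this example; returning only the most abundant taxon when several exceed the threshold is also a simplification.

```python
def dominant_taxon(rel_abundance, threshold=0.3):
    """Return the most abundant taxon if it meets the domination threshold.

    rel_abundance: dict mapping taxon name -> relative abundance (0 to 1).
    Returns None when no taxon reaches the threshold or the sample is empty.
    """
    if not rel_abundance:
        return None
    taxon, value = max(rel_abundance.items(), key=lambda kv: kv[1])
    return taxon if value >= threshold else None
```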
Derivation of a mathematical model of microbial dynamics termed PARADIGM
A biologically motivated, simplified mathematical model termed PARADIGM was developed to model microbial dynamics while simultaneously considering the effects of drug exposures on the intestinal microbiota. A naïve model with possible clusters in a time interval for distinct drugs would require parameters, which would quickly saturate the amount of data available in most practical applications. The following characteristics of intestinal microbial dynamics during allo-HCT were considered to simplify our model: (i) The microbial composition is more likely to stay in the same cluster within a day window; (ii) Drug exposures perturb clusters differently; (iii) Transitions occur more readily among close clusters than among distant clusters; and (iv) Transition events vary over time. As such, we developed a model that includes parameters for time, drug exposures and distance between clusters.
We defined microbiota dynamics in terms of two possibilities: the rate at which the microbiota composition stays in the same cluster (self transition) and the rate at which it attracts transitions from other clusters (attractor transition). Formally, attractor transition is a force, measured in terms of probability, that describes the magnitude in which a given cluster may receive transitions from any cluster other than itself, in a pair of daily collected samples. For example, the attractor transition to Cluster 1 is defined as the patient moves to cluster 1 at time , from any clusters between 2 to 10 at time . We note that “attractor” has a particular definition in statistical physics, which does not directly translate into the definition outlined in this study.
Let represent the transition probability from cluster to at a single day resolution and and represent the parameters associated with self and attractor transitions. The transition probabilities can be summarized as:
| (1) |
Where represents Kronecker delta.
The transition dynamics are simplified to a 2×2 transition matrix per cluster that represents and states and the rates of transition between those states:
| (2) |
And the self and attractor parameters are formally represented as:
| (3) |
Inference of drug influence on cluster dynamics via elastic net regularized regression
To investigate high-resolution cluster dynamics in response to drug exposures, we included samples that were collected 1 day apart between day −14 and 14 relative to HCT from patients in the MSKCC discovery cohort. The criteria for drug inclusion in this analysis are as follows: (1) Drugs were administered via oral/IV routes; (2) Drug exposures occurred between day −14 and 14 relative to HCT; (3) Drugs were administered to at least 5% and no more than 90% of patients in the discovery cohort; and (4) Drug exposures occurred in at least 15 patients with single-day resolution samples.
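Criteria (3) and (4) amount to a prevalence filter over drugs. The sketch below is one possible reading, assuming that criterion (4) counts exposed patients who also have single-day resolution samples; the function name, input structures and parameter names are hypothetical.

```python
def select_drugs(exposed, all_patients, daily_patients,
                 min_frac=0.05, max_frac=0.90, min_daily=15):
    """Keep drugs that satisfy the prevalence criteria.

    exposed: dict mapping drug name -> set of exposed patient IDs
    all_patients: set of all patient IDs in the cohort
    daily_patients: set of patient IDs with single-day resolution samples
    """
    n = len(all_patients)
    keep = []
    for drug, patients in sorted(exposed.items()):
        frac = len(patients & all_patients) / n       # criterion (3)
        n_daily = len(patients & daily_patients)      # criterion (4), as assumed
        if min_frac <= frac <= max_frac and n_daily >= min_daily:
            keep.append(drug)
    return keep
```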
Equations 1–3 provide a formal definition of our microbial dynamics in terms of self and attractor transitions. In practice, we took advantage of a logistic regression fit to solve Equation 3 by assuming a binary value for each transition to a given cluster i. We defined the value of 1 for transitions towards cluster i and a value of 0 otherwise.
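The binary coding described above can be sketched as follows. This is an interpretation rather than the authors' code: it assumes that self-transition models are fit on sample pairs starting in cluster i and attractor models on pairs starting in any other cluster, with the response equal to 1 whenever the second sample falls in cluster i. The function name is hypothetical.

```python
def transition_labels(pairs, cluster):
    """Split daily sample pairs into self and attractor training rows.

    pairs: list of (cluster_on_day_t, cluster_on_day_t_plus_1)
    cluster: the target cluster i
    Returns two lists of (pair_index, label), where label is 1 for a
    transition towards the target cluster and 0 otherwise.
    """
    self_rows, attractor_rows = [], []
    for idx, (prev, nxt) in enumerate(pairs):
        label = 1 if nxt == cluster else 0
        if prev == cluster:
            self_rows.append((idx, label))        # stayed in i, or left i
        else:
            attractor_rows.append((idx, label))   # moved into i, or did not
    return self_rows, attractor_rows
```

Each list can then be paired with the drug exposure indicators of the corresponding days and passed to a regularized logistic regression.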
We used elastic net regularized regression for feature selection to estimate the influence of drug exposures on self and attractor transitions of each cluster, using the R packages caret, e1071 and glmnet for model fitting and parameter tuning. Formally, the coefficients that impact transition probabilities were computed as follows:
| (4) |
The parameters have a value of 1 in days patients were exposed to drug and 0 otherwise. The 10-fold cross-validation partitions were pre-specified such that samples from the same patient are always in the same partition using function trainControl() and train() in R package caret.
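Keeping all samples from one patient in the same cross-validation partition avoids leakage between training and held-out folds. The sketch below shows the idea with a deterministic round-robin assignment; it is not the caret configuration used in the study, and the function name is illustrative. In practice the patient order would typically be shuffled first.

```python
def patient_folds(patient_ids, k=10):
    """Assign each sample to a fold so that a patient never spans two folds.

    patient_ids: list with one patient ID per sample
    Returns a list of fold indices (0 to k-1), one per sample.
    """
    unique = sorted(set(patient_ids))
    fold_of = {pid: i % k for i, pid in enumerate(unique)}
    return [fold_of[pid] for pid in patient_ids]
```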
Calculation of bacterial response score of drugs
The dynamic model we proposed can estimate the association between a given drug exposure with quantitative microbiome features. The self and attractor parameters predicted in Equation 4 can be used to construct transition probabilities according to Equation 1. Let the probability of switching from cluster to in the absence of any drug exposure be , the bacterial response score of a drug on taxon is defined as:
| (5) |
where represents the average relative abundance of taxon from all samples in cluster . A negative score indicates an antibacterial effect of drug on taxon . The transition matrix used to estimate antibacterial score was computed at time .
The bacterial response scores estimated from Equation 5 were compared to published in vitro measurements of anti-bacterial activities3. The in vitro measurements evaluated the effect of specific drugs using the area under the curve (AUC) of bacterial growth. An AUC value significantly below 1 indicates that the drug inhibited bacterial growth in vitro. We restricted our analysis to 19 bacterial species that appeared in more than 10% of the MSKCC samples with relative abundance above 10⁻⁴. We identified the relative abundance of a given species by summing the relative abundance of all ASVs mapped to that species. Our prediction was considered consistent with in vitro measurements when bacterial response scores were negative for bacteria-drug pairs that showed inhibition in that study. Statistical significance was evaluated using two-sided Wilcoxon’s rank-sum test.
Defining patient-specific bacteria response scores based on drug exposure profiles
A risk score that is associated with patient outcomes can be computed from how drug exposures are predicted to influence the microbial dynamics. We defined specific target values, , which are features of the intestinal microbiota associated with each cluster, and adapted Equation 5 to compute a risk score, , for each patient by averaging the contribution of all drug exposures that occurred within day −14 to 14 relative to HCT.
| (6) |
For this work, we considered four target values: relative abundance of Enterococcus, Erysipelatoclostridium, Blautia, and alpha-diversity. Patient-specific bacteria response scores for each of the considered microbiome features were calculated based on drug exposure profiles between day −14 and 14 and compared with observed taxa relative abundance or alpha-diversity in samples collected between day 14 and 45 relative to HCT using Pearson’s correlation, with p-values adjusted by Benjamini-Hochberg’s correction. The median values were taken for patients with multiple samples available.
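The two statistical steps named above, Pearson's correlation and Benjamini-Hochberg adjustment, follow standard definitions. The plain-Python sketch below is provided for orientation only and is not the study's R code; the function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```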
Imputation of missing samples
The dynamic model defined in Equation 1 can be used to simulate microbial dynamics and to impute cluster state in time points without sample collection. Equation 1 defines transition probabilities at single day resolution and forms the fundamental unit to predict microbial dynamics. Let be the compositional state at time t, the value is equivalent to . Assuming a first-order Markov chain, the transition probability matrix between samples at longer time distance can be computed from its fundamental units as follows:
| (7) |
Where represents transition probabilities at single day resolution between time to .
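Under a first-order Markov assumption, a multi-day transition matrix is conventionally obtained by multiplying the single-day matrices in chronological order. The sketch below illustrates that standard construction; it assumes rows index the starting cluster and columns the destination cluster, and the function names are hypothetical rather than taken from the PARADIGM code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def multi_day_transition(daily_matrices):
    """Chain single-day transition matrices into one multi-day matrix.

    daily_matrices: non-empty list of row-stochastic matrices, ordered in time.
    """
    n = len(daily_matrices[0])
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for p in daily_matrices:
        result = matmul(result, p)
    return result
```

Because each single-day matrix can differ with the drug exposures of that day, the product is taken over day-specific matrices rather than a single matrix raised to a power.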
The imputation of cluster states at a given time point can be performed by considering the state at nearby time points. Interpolations can be made based on past samples (forward interpolations), future samples (backward interpolations) or both (forward-backward interpolations). Consider the estimation of a cluster state at time point t from samples available at time point and . Forward interpolation imputes the data point based on the probability , backward prediction from and backward-forward interpolation from . Forward interpolations can be estimated from Equation 7. The backward interpolations were obtained by adapting model described in Equation 1 to predict cluster state at time based on cluster state at time and exposures at time . The backward-forward interpolation takes advantage of both forward and backward equations and is computed as:
| (8) |
A modified forward-backward interpolation was also tested in which the final prediction is obtained from either the forward or the backward probabilities:
| (9) |
Where indicates whether forward or backward interpolation should be used to predict state at time t, and c indicates the mean of the forward-backward prediction probabilities.
The imputation accuracy for each model described above was estimated using 10-fold cross-validation, in which 90% of patients were selected as the training set and 10% as the test set. In the training set, the logistic regression model defined in Equation 4 was used to estimate the parameters for self and attractor weights as well as the decision parameter for modified forward-backward prediction. In the test set, the regression coefficients were used to reconstruct transition matrices and compute forward, backward, forward-backward as well as modified forward-backward probabilities. The microbial state was classified by the cluster with the maximum probability.
Linear mixed-effects model of drug-species associations using shotgun metagenomic profiles
We defined a linear mixed-effects model to investigate the association between drug exposures and changes at a species-level resolution. Species relative abundance was predicted by MetaPhlAn 3.0 using shotgun metagenomic data. The model considers pairs of samples that were collected between day −14 and 14 relative to HCT and less than five days apart, and with a minimum species relative abundance of 10⁻⁴. We assumed that changes in species relative abundance depend on time of sample collection and drug exposure. Formally, let a given sample pair be collected at time and , respectively. The log difference in species relative abundance, ln depends on fixed effects of drug exposures and a random effect relative to time of sample collection, , as follows:
| (10) |
The parameter represents time of first sample collection in the sample pair, binned into weekly intervals relative to HCT, as a random effect variable. represents a given drug exposure and has the value of 1 if patients were exposed to the drug in the time interval and . P-values from the linear mixed-effects model were adjusted for multiple hypothesis testing using Benjamini-Hochberg’s correction. Equation 10 was solved using R package lme4.
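The response variable of this model, the log difference in relative abundance across a qualifying sample pair, can be assembled as in the sketch below. Two points are assumptions made for illustration: pairs are formed from consecutive samples of the same patient, and the minimum abundance is required in both samples of a pair. The function name and input format are hypothetical.

```python
import math


def log_abundance_changes(samples, max_gap=5, min_abundance=1e-4,
                          window=(-14, 14)):
    """Build (day1, day2, log ratio) tuples for one patient and one species.

    samples: list of (day_relative_to_HCT, relative_abundance)
    """
    ordered = sorted(samples)
    changes = []
    for (d1, a1), (d2, a2) in zip(ordered, ordered[1:]):
        in_window = window[0] <= d1 and d2 <= window[1]
        close_enough = 0 < d2 - d1 < max_gap       # less than five days apart
        abundant = min(a1, a2) >= min_abundance    # assumed to apply to both
        if in_window and close_enough and abundant:
            changes.append((d1, d2, math.log(a2 / a1)))
    return changes
```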
Dominant strain dynamics and its association with drug exposures
StrainPhlAn 3.0 was used to profile the bacterial community at strain-level resolution37,54. This algorithm identifies the most dominant strain per species per sample by reconstructing dominant consensus sequence variants across species-specific marker genes. We applied StrainPhlAn 3.0 to a dataset of 980 shotgun metagenomic samples in the MSKCC discovery cohort, which returned the multiple sequence alignment of the dominant strain for a given species and the RAxML (Randomized Axelerated Maximum Likelihood) phylogenetic tree across samples. The phylogenetic tree was then used to calculate the dominant strain phylogenetic distance across a pair of samples. Phylogenetic distance is defined as the branch length between two nodes in the StrainPhlAn phylogenetic tree, normalized over the total branch length of the tree. Branch length was calculated using R function cophenetic(). We focused on strains from five species of interest, including Blautia coccoides, Blautia producta, Enterococcus faecalis, Enterococcus faecium and Erysipelatoclostridium ramosum.
For analysis of dominant strain convergence in association with antibiotic and non-antibiotic exposures, we compared the phylogenetic distances of dominant strains within the species E. faecium across pairs of samples that were collected less than five days apart between day −14 and 14 relative to HCT. Exposures were considered true if patients received the drug during sample pair collection time. Antibiotic exposure includes the seven investigated antibiotics in this study. Non-antibiotic exposure comprises seven drugs with the highest absolute Enterococcus-response score values, indicating strongest associations with changes in Enterococcus relative abundance. These non-antibiotic drugs include diphenoxylate/atropine, polyethylene glycol, levothyroxine, fentanyl, methotrexate, anti-thymocyte globulin and cyclosporine. Statistical significance was evaluated using two-sided Wilcoxon’s rank-sum test.
Model validation using simulated data
We validated our method using simulated data of known parameters. We considered 62 possible drug exposures and estimated their elastic net regularized regression coefficients. We simulated a dataset of single-day resolution samples for 1000 patients ranging from day −14 to day 14. Drug exposures were simulated with a 50% probability for each drug on each day. The cluster state at day −14 was randomly sampled according to the pre-transplant cluster frequency observed in the MSKCC cohort. Following days were randomly sampled according to transition probability conditioned to known state at preceding day, . We fed this single-day resolution simulated dataset into the elastic net regularized regression to determine how well the model can recover the pre-defined self and attractor coefficients utilized to simulate cluster states. Performance of our method was evaluated using Pearson’s correlation to estimate the correlation between coefficients estimated from the real dataset and coefficients recovered from the simulated dataset (Figure S7a).
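The simulation procedure can be sketched as below: exposures are drawn independently with probability 0.5 per drug per day, the first state is drawn from an initial distribution, and each later state is drawn from the transition row of the preceding state. The function names are hypothetical, and passing the current day's exposures to the transition function is an assumption; the actual transition probabilities would come from the fitted model.

```python
import random


def simulate_exposures(n_drugs, days, rng, p=0.5):
    """Draw a 0/1 exposure vector for every day, each drug with probability p."""
    return {day: [1 if rng.random() < p else 0 for _ in range(n_drugs)]
            for day in days}


def simulate_states(initial_probs, transition_fn, exposures, days, rng):
    """Simulate one patient's daily cluster states as a first-order Markov chain.

    transition_fn(day, exposure_vector) must return a row-stochastic matrix
    (rows: cluster on the preceding day, columns: cluster on this day).
    """
    n = len(initial_probs)
    states = [rng.choices(range(n), weights=initial_probs)[0]]
    for day in days[1:]:
        row = transition_fn(day, exposures[day])[states[-1]]
        states.append(rng.choices(range(n), weights=row)[0])
    return states


rng = random.Random(0)
days = list(range(-14, 15))
exposures = simulate_exposures(62, days, rng)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
states = simulate_states([0.0, 1.0, 0.0], lambda day, x: identity,
                         exposures, days, rng)
```

The identity matrix here is only a placeholder that keeps the example deterministic; repeating the call for 1000 simulated patients with model-derived matrices would give a dataset of the size described above.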
Inference of drug influence on genus relative abundance via elastic net regularized regression
Regression analysis of changes in the relative abundance of specific taxa as a function of environmental or clinical factors is a commonly employed computational approach in microbiome studies. However, this approach focuses on pre-specified taxa, does not account for the community structure of the intestinal microbiota, and has potential mathematical constraints, including saturated and overfitted models. By employing a clustering approach to reduce complexity while preserving the microbiome community structure, PARADIGM can potentially overcome these mathematical challenges. To demonstrate the improved predictive power of PARADIGM, we analyzed changes in the relative abundance of each genus between daily collected sample pairs as a function of drug exposures and time of sample collection. The dataset and input parameters were identical to the training dataset for PARADIGM (Table S2). Based on this model, we predicted future changes in genus relative abundance between day 14 and 45 and compared them with observed values using Pearson’s correlation. Although the associations between drug exposures and genus relative abundance were correlated between the single-taxon approach and the PARADIGM approach (Figure S7b), PARADIGM could reliably predict future microbiome trajectories (Figure 5c), whereas the single-taxon approach could not (Figure S7c).
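The response variable of the single-taxon model and the comparison metric can be illustrated as follows. This Python sketch is an assumption-laden illustration, not the study code. It takes the change to be a simple difference between consecutive-day relative abundances, and the function names and example values are hypothetical.

```python
from math import sqrt

def daily_changes(abundance_by_day):
    """Change in relative abundance between samples collected on consecutive
    days. Input maps day -> relative abundance for one patient and one genus."""
    return {day: abundance_by_day[day + 1] - abundance_by_day[day]
            for day in sorted(abundance_by_day)
            if day + 1 in abundance_by_day}

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical values: day 2 is missing, so only the day 0 -> 1 pair is used.
changes = daily_changes({0: 0.1, 1: 0.4, 3: 0.2})
r_perfect = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_inverse = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```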
Co-inclusionary and exclusionary relationships between bacteria and cluster stability
To identify potential microbiome influences on the stability of Enterococcus domination in allo-HCT patients, we developed a logistic regression model with a lasso penalty. This model relates the relative abundance of different bacterial genera to the stability of Enterococcus-high cluster 10. We defined cluster 10 stability as the rate at which microbiota compositions remain in cluster 10, taking a value of 1 when patients stay in cluster 10 and a value of 0 when patients move from cluster 10 to any other cluster between a pair of daily collected samples. We included as parameters antibiotic exposure, day of sample collection relative to HCT, alpha-diversity and the relative abundance of the 20 most abundant genera in cluster 10. Antibiotic exposure includes exposure to any of the seven antibiotics investigated in this study. The input dataset consists of pairs of daily samples collected between day −14 and 100 relative to HCT. Formally, the coefficients of these potential drivers of cluster 10 stability were computed as follows:
| (11) |
Here, the taxon terms represent the relative abundance of each of the 20 most abundant genera in cluster 10, and the antibiotic term indicates whether patients received any of the seven antibiotics investigated in this study on the corresponding day. Coefficient values indicate the direction and magnitude of the association between each parameter and the stability of Enterococcus-high cluster 10, with a negative value indicating a negative association with the stability of Enterococcus domination.
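The binary response and the antibiotic indicator described above can be assembled as in the following Python sketch. It only illustrates how the inputs to the penalized regression might be built. The function name, the data layout and the antibiotic label are hypothetical, and the lasso fit itself is not shown.

```python
def stability_rows(cluster_by_day, antibiotics_by_day, target_cluster=10):
    """Build one row per pair of consecutive-day samples whose first sample
    is in target_cluster.

    cluster_by_day: day -> cluster assignment for one patient.
    antibiotics_by_day: day -> collection of antibiotics given that day.
    'stays' is 1 if the next-day sample is still in target_cluster, else 0.
    """
    rows = []
    for day in sorted(cluster_by_day):
        if cluster_by_day[day] != target_cluster:
            continue
        if day + 1 not in cluster_by_day:
            continue  # no next-day sample, so no daily pair
        rows.append({
            "day": day,
            "any_antibiotic": int(bool(antibiotics_by_day.get(day))),
            "stays": int(cluster_by_day[day + 1] == target_cluster),
        })
    return rows

# Hypothetical patient: in cluster 10 on days 3, 4, 7 and 8.
rows = stability_rows(
    {3: 10, 4: 10, 5: 2, 7: 10, 8: 10},
    {4: {"antibiotic_A"}},  # placeholder name, not a drug from the study
)
```

In the full model, each row would also carry the day relative to HCT, alpha-diversity and the genus relative abundances as predictors.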
QUANTIFICATION AND STATISTICAL ANALYSIS
Statistical analyses of survival outcomes
We performed landmark analyses of survival beyond day 14 relative to HCT using the R package survival. Patients were censored at the time of last contact or at the time of second allo-HCT (when applicable). All survivors were censored at two years of follow-up. Patients randomized to the FMT arm of the trial were excluded from the analysis of clinical outcomes. Patient-specific bacteria response scores corresponding to a given microbiome feature were treated as a continuous variable in a multivariate Cox proportional hazard model fit with the R function coxph(), controlling for age, sex, conditioning intensity, graft source and underlying disease. P-values were adjusted for multiple hypothesis testing using the Benjamini-Hochberg correction.
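The censoring rules and the multiple-testing adjustment can be made concrete with a short Python sketch. The survival models in this study were fit in R, so the code below is only an illustration. The function names, the use of 730 days from HCT as the two-year horizon, and the exclusion of patients whose follow-up ends on or before the landmark are assumptions.

```python
def landmark_followup(death_day, last_contact_day, second_hct_day=None,
                      landmark=14, horizon=730):
    """Return (time_since_landmark, event) for one patient, or None if the
    patient is no longer at risk at the landmark. Days are relative to HCT;
    death_day is None for patients who did not die."""
    censor_day = last_contact_day
    if second_hct_day is not None:
        censor_day = min(censor_day, second_hct_day)
    censor_day = min(censor_day, horizon)  # administrative censoring
    if death_day is not None and death_day <= censor_day:
        end_day, event = death_day, 1
    else:
        end_day, event = censor_day, 0
    if end_day <= landmark:
        return None
    return end_day - landmark, event

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values, for illustration only.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```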
Competing risk analyses were performed to identify the association between patient-specific bacteria response scores and cause-specific mortality. We investigated three competing events, namely relapse (defined here as relapse or progression of disease), GVHD-related mortality (defined here as death due to GVHD or after GVHD onset, without relapse), and transplant-related mortality (encompassing deaths from GVHD, infections and organ toxicities). For each competing event, multivariate Fine-Gray subdistribution hazard models were fit with the function crr() from the R package tidycmprsk. Hazard ratios are presented in the main text with the 95% confidence interval indicated in parentheses.
To tease out the potential causal relationships between medication, microbiome and mortality, we compared the effect sizes and statistical strengths of patient-specific bacteria response scores and observed genus relative abundance or diversity values in predicting all-cause mortality in the MSKCC and Duke validation cohorts. To ensure that magnitudes were comparable, microbiome measurements were rescaled by Z-score normalization and fit into either an independent model (which considers either microbiome metrics or response scores) or a competing model (which considers the microbiome feature and the response score together). Both were multivariate Cox proportional hazard models controlled for age, sex, conditioning intensity, graft source and underlying disease. The independent and competing models consider only patients with stool samples collected between day 0 and 45 relative to HCT (MSKCC validation cohort: 340 patients; Duke validation cohort: 108 patients). Genus abundance was calculated from stool samples collected between day 0 and 45 relative to HCT, and response scores were calculated using drug exposure profiles between day −14 and 14 relative to HCT. Median values were taken for patients with multiple samples available. P-values were adjusted for multiple hypothesis testing using the Benjamini-Hochberg correction.
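The per-patient summarization and rescaling steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study code. The function names and example values are hypothetical, and the sample standard deviation (n − 1 denominator) is assumed for the Z-score.

```python
from statistics import mean, median, stdev

def patient_medians(samples, first_day=0, last_day=45):
    """Median value per patient over samples collected inside the window.
    samples: iterable of (patient_id, day_relative_to_hct, value)."""
    by_patient = {}
    for patient, day, value in samples:
        if first_day <= day <= last_day:
            by_patient.setdefault(patient, []).append(value)
    return {patient: median(values) for patient, values in by_patient.items()}

def zscore(value_by_patient):
    """Rescale per-patient values to mean 0 and standard deviation 1."""
    values = list(value_by_patient.values())
    mu, sd = mean(values), stdev(values)
    return {patient: (v - mu) / sd for patient, v in value_by_patient.items()}

# Hypothetical measurements; the day-60 sample falls outside the window.
samples = [("a", 1, 1.0), ("a", 20, 3.0), ("a", 60, 100.0),
           ("b", 5, 4.0),
           ("c", 0, 6.0), ("c", 10, 8.0), ("c", 45, 10.0)]
medians = patient_medians(samples)
scaled = zscore(medians)
```

Putting the microbiome metric and the response score on the same scale in this way lets their hazard ratios be compared per standard deviation.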
Supplementary Material
Figure S1. Clusters of intestinal microbiota during allo-HCT were identified by other unsupervised clustering methods, related to Figure 1. a, d, Compositional space of the intestinal microbiota visualized by tSNE projection in the MSKCC discovery cohort. Each dot represents a sample, colored according to clusters assigned by (a) hierarchical clustering or (d) Dirichlet Multinomial Mixture (DMM) model. b, e, Alpha-diversity per cluster identified by (b) hierarchical clustering or (e) DMM. The horizontal dashed line represents the median alpha-diversity of samples in the MSKCC discovery cohort. c, f, Compositional characteristics per cluster identified by (c) hierarchical clustering or (f) DMM. Samples in which the relative abundance of the most common taxon is ≥ 30% are color-coded by the most common taxon. Non-dominated samples are colored in white. (p: phylum; f: family; o: order; g: genus). g, Distribution of samples across 10 clusters identified by three unsupervised clustering methods. h, Optimal number of k-means clusters was estimated from the curve of within-cluster sum of square distances from each point to its cluster centroid. i, Optimal number of clusters identified by DMM was estimated by the smallest Laplace approximation metric. j, Correlation between cluster stability and alpha-diversity. Cluster stability was measured by self-transition probability. Alpha-diversity is defined as the cluster median reciprocal Simpson diversity. k, Co-exclusionary and inclusionary relationships associated with the stability of Enterococcus-high cluster 10.
Figure S2. Time courses of drug exposures between day −14 and 14 relative to allo-HCT, related to Figure 3. Red dashed lines indicate day 0, the day of stem cell infusion.
Figure S3. Associations between drug exposures and cluster self and attractor transitions, related to Figure 3. Self coefficients indicate whether drug exposure increases (positive coefficients, red shades) or decreases (negative coefficients, blue shades) the log-odds of cluster stability. Attractor coefficients indicate whether drug exposure increases (positive coefficients, red shades) or decreases (negative coefficients, blue shades) the log-odds of transition to a given cluster. #pts indicates the number of patients exposed to each drug, #dps indicates the number of sample pairs collected on the day of each drug exposure.
Figure S4. Bacteria response scores for four microbiome features of interest, related to Figure 3. Bacteria response scores predict the association between a given drug exposure and changes in genus relative abundance or alpha-diversity. Positive response scores (red shades) indicate that drug exposures are associated with increased genus relative abundance or alpha-diversity. Negative response scores (blue shades) indicate that drug exposures are associated with decreased genus relative abundance or alpha-diversity.
Figure S5. Associations between drug exposures and changes in species relative abundance from samples with shotgun metagenomic sequencing, related to Figure 4. Log changes in species relative abundance between subsequently collected samples were analyzed as a function of individual drug exposure in a linear mixed-effects regression model, with time of sample collection binned into weekly intervals as a random effect variable. Positive coefficient values (red shades) indicate that drug exposures are associated with increased species relative abundance. Negative coefficient values (blue shades) indicate that drug exposures are associated with decreased species relative abundance. A white box indicates that there were insufficient data points to fit the regression model for a given drug-species pair.
Figure S6. Investigation of causal relationships between drug exposures, microbiome and mortality in all patient cohorts, related to Figure 5. a, Pearson’s correlation between patient-specific bacteria response scores and observed genus relative abundance or alpha-diversity in samples collected between day 14 and 45 relative to HCT in the MSKCC discovery cohort (454 patients, including recipients of PBSC T-cell depleted graft). b, Prediction of overall and cause-specific mortality in the MSKCC discovery cohort (295 patients, excluding recipients of PBSC T-cell depleted graft) based on patient-specific bacteria response scores for four microbiome features of interest, in each respective multivariate Cox proportional hazard or Fine-Gray model, controlled for age, sex, conditioning intensity, graft source and underlying disease. P-values were adjusted using the Benjamini-Hochberg correction. c, Microbiome feature metrics (taxa relative abundance or alpha-diversity) and corresponding response scores were compared in terms of their associations with overall mortality risk. Measurements were fit into either an independent (which considers either microbiome metrics or response scores) or a competing (which considers microbiome metrics and response scores together) multivariate Cox proportional hazard model.
Figure S7. PARADIGM was validated against simulated data and different regression models, related to STAR Methods. a, PARADIGM was validated against simulated data using Pearson’s correlation. This single-day resolution simulated dataset was input into the elastic net regularized regression to determine how well the model can recover the pre-defined self and attractor coefficients utilized to simulate cluster states. b, PARADIGM was validated against a separate elastic net regularized regression using drug exposures and time of sample collection as predictors of changes in a single genus relative abundance, using the same training dataset. c, Single taxon-based regularized linear regression approach poorly predicted observed changes in Enterococcus relative abundance in samples collected between day 14 and 45.
Table S1. Cluster transition frequencies in consecutive samples collected within a 7-day time window in the MSKCC cohort, related to Figure 2h. The rows indicate the cluster assignment of the first sample in a pair of consecutive samples. The columns indicate the cluster assignment of the second sample in a pair of consecutive samples.
Table S2. Patient characteristics of the MSKCC sub-cohort included in PARADIGM training set of daily collected fecal samples, related to Figure 3. IQR, interquartile range; sd, standard deviation; AML, acute myeloid leukemia; BM, bone marrow; PBSC, peripheral blood stem cell; PBSC T-cell depletion was performed by ex vivo CD34-selection of the graft.
Table S3. Quality control of 16S rRNA and shotgun metagenomic sequencing data, related to STAR Methods.
The application of a computational method called PARADIGM to a large dataset of cancer patients’ longitudinal fecal specimens and detailed daily medication records reveals associations between drug exposures and the intestinal microbiota that recapitulate in vitro findings and are also predictive of clinical outcomes.
Highlights
A computational method identifies medication-microbiome associations
Applied to dataset of cancer patients’ fecal microbiome profiles and medication records
In silico results recapitulate in vitro measurements of anti-bacterial activities
Medication-bacteria associations are predictive of patient clinical outcomes
Acknowledgements:
We thank Dr. Tobias Hohl and Dr. Thierry Rolling for discussions, and clinical staff members involved in the fecal sample collection and bio-banking at MSKCC for critical support. Work was supported by the Parker Institute for Cancer Immunotherapy (PICI) Fellowship to C.L.N. M.R.M.v.d.B. is supported by NCI awards, MSKCC Core Grants P30 CA008748, R01-CA228358, R01-CA228308, P01-CA023766; NHLBI award R01-HL125571, R01-HL123340; NIA award Project 2 of P01-AG052359; NIAID award U01 AI124275; Tri-Institutional Stem Cell Initiative award 2016-013; The Lymphoma Foundation; The Susan and Peter Solomon Divisional Genomics Program; and the PICI. K.A.M. is supported by the DKMS and the PICI. J.U.P. reports funding from NHLBI NIH Award K08HL143189, the MSKCC Core Grant NCI P30 CA008748 and the PICI. A.D.S. reports funding from NIA Award R21AG066388, NHLBI Award R01HL151365 and ASH Scholar Award. N.J.C. reports funding from NCI Award R01CA203950. E.G.P. reports funding from NIAID award AI124275. J.B.X. reports grants from NIAID: U01 AI124275, grants from R56 AI137269-01, grants from NIH and NSF. O.M. is supported by the ASCO young investigator award, Hyundai Hope on Wheels young investigator award, Cycle for survival Equinox Innovation award, Collaborative Pediatric Cancer Research Program Award, and the Michael Goldberg fellowship.
Declaration of Interests: Memorial Sloan Kettering Cancer Center has financial interests relative to Seres Therapeutics. M.R.M.v.d.B. has received research support from Seres Therapeutics; has consulted, received honorarium from or participated in advisory boards for Seres Therapeutics, WindMIL therapeutics, Rheos, Frazier Healthcare Partners, Nektar Therapeutics, Notch Therapeutics, Forty Seven inc., Priothera, Ceramedix, Lygenesis, Pluto Immunotherapeutics, Magenta Therapeutics, Merck & Co, Inc., and DKMS Medical Council (Board); and has IP Licensing with Seres Therapeutics, Juno Therapeutics, and stock options from Seres and Notch Therapeutics. J.U.P. reports research funding, intellectual property fees, and travel reimbursement from Seres Therapeutics, and consulting fees from Da Volterra, CSL Behring, and MaaT Pharma; serves on an Advisory board of and holds equity in Postbiotics Plus Research; and has filed intellectual property applications related to microbiome. K.A.M. is on the advisory board for and holds stock in PostBiotics Plus; and has served in an advisory role and received honoraria from Incyte. R.S. has served on an advisory board for Medexus. B.G. received research funding from Actinium Pharmaceuticals, Inc. E.G.P. serves on the advisory board of Diversigen and has received speaker honoraria from Bristol Myers Squibb, Celgene, Seres Therapeutics, MedImmune, Novartis and Ferring Pharmaceuticals; is an inventor on patents related to microbiome; and holds patents that receive royalties from Seres Therapeutics. A.D.S. has received research funding from Merck, Novartis, and Seres; has received honoraria from Abbott Nutrition; has consulted for AVROBIO and Targazyme; and has received research supplies from Clasado and DSM/iHealth. M.A.P. reports honoraria from Adicet, Allovir, Caribou Biosciences, Celgene, Bristol-Myers Squibb, Equilium, Exevir, Incyte, Karyopharm, Kite/Gilead, Merck, Miltenyi Biotec, MorphoSys, Nektar Therapeutics, Novartis, Omeros, OrcaBio, Syncopation, VectivBio AG, and Vor Biopharma; serves on DSMBs for Cidara Therapeutics, Medigene, and Sellas Life Sciences, and the scientific advisory board of NexImmune; has ownership interests in NexImmune, Omeros and OrcaBio; and has received institutional research support for clinical trials from Incyte, Kite/Gilead, Miltenyi Biotec, Nektar Therapeutics, and Novartis. N.J.C. is on DSMBs for Fate Therapeutics, Takeda, and Celularity. A.L.C.G. is currently employed by and has stock options at Xbiome Inc.
Footnotes
Inclusion and Diversity
One or more of the authors of this paper self-identifies as an underrepresented ethnic minority in their field of research or within their geographical location. One or more of the authors of this paper self-identifies as a member of the LGBTQIA+ community.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
REFERENCES
- 1.Korpela K, Salonen A, Virta LJ, Kekkonen RA, Forslund K, Bork P, and de Vos WM (2016). Intestinal microbiome is related to lifetime antibiotic use in Finnish pre-school children. Nat Commun 7, 10410. 10.1038/ncomms10410. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Johnson AJ, Vangay P, Al-Ghalith GA, Hillmann BM, Ward TL, Shields-Cutler RR, Kim AD, Shmagel AK, Syed AN, Personalized Microbiome Class S, et al. (2019). Daily Sampling Reveals Personalized Diet-Microbiome Associations in Humans. Cell Host Microbe 25, 789–802 e785. 10.1016/j.chom.2019.05.005. [DOI] [PubMed] [Google Scholar]
- 3.Maier L, Pruteanu M, Kuhn M, Zeller G, Telzerow A, Anderson EE, Brochado AR, Fernandez KC, Dose H, Mori H, et al. (2018). Extensive impact of non-antibiotic drugs on human gut bacteria. Nature 555, 623–628. 10.1038/nature25979. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Vieira-Silva S, Falony G, Belda E, Nielsen T, Aron-Wisnewsky J, Chakaroun R, Forslund SK, Assmann K, Valles-Colomer M, Nguyen TTD, et al. (2020). Statin therapy is associated with lower prevalence of gut microbiota dysbiosis. Nature 581, 310–315. 10.1038/s41586-020-2269-x. [DOI] [PubMed] [Google Scholar]
- 5.Falony G, Joossens M, Vieira-Silva S, Wang J, Darzi Y, Faust K, Kurilshikov A, Bonder MJ, Valles-Colomer M, Vandeputte D, et al. (2016). Population-level analysis of gut microbiome variation. Science 352, 560–564. 10.1126/science.aad3503. [DOI] [PubMed] [Google Scholar]
- 6.Vich Vila A, Collij V, Sanna S, Sinha T, Imhann F, Bourgonje AR, Mujagic Z, Jonkers D, Masclee AAM, Fu J, et al. (2020). Impact of commonly used drugs on the composition and metabolic function of the gut microbiota. Nat Commun 11, 362. 10.1038/s41467-019-14177-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Holler E, Butzhammer P, Schmid K, Hundsrucker C, Koestler J, Peter K, Zhu W, Sporrer D, Hehlgans T, Kreutz M, et al. (2014). Metagenomic analysis of the stool microbiome in patients receiving allogeneic stem cell transplantation: loss of diversity is associated with use of systemic antibiotics and more pronounced in gastrointestinal graft-versus-host disease. Biol Blood Marrow Transplant 20, 640–645. 10.1016/j.bbmt.2014.01.030. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Peled JU, Gomes ALC, Devlin SM, Littmann ER, Taur Y, Sung AD, Weber D, Hashimoto D, Slingerland AE, Slingerland JB, et al. (2020). Microbiota as Predictor of Mortality in Allogeneic Hematopoietic-Cell Transplantation. N Engl J Med 382, 822–834. 10.1056/NEJMoa1900623. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Stein-Thoeringer CK, Nichols KB, Lazrak A, Docampo MD, Slingerland AE, Slingerland JB, Clurman AG, Armijo G, Gomes ALC, Shono Y, et al. (2019). Lactose drives Enterococcus expansion to promote graft-versus-host disease. Science 366, 1143–1149. 10.1126/science.aax3760. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Taur Y, Xavier JB, Lipuma L, Ubeda C, Goldberg J, Gobourne A, Lee YJ, Dubin KA, Socci ND, Viale A, et al. (2012). Intestinal domination and the risk of bacteremia in patients undergoing allogeneic hematopoietic stem cell transplantation. Clin Infect Dis 55, 905–914. 10.1093/cid/cis580. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Morjaria S, Schluter J, Taylor BP, Littmann ER, Carter RA, Fontana E, Peled JU, van den Brink MRM, Xavier JB, and Taur Y.(2019). Antibiotic-Induced Shifts in Fecal Microbiota Density and Composition during Hematopoietic Stem Cell Transplantation. Infect Immun 87. 10.1128/IAI.00206-19. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Shouval R, Waters NR, Gomes ALC, Zuanelli Brambilla C, Fei T, Devlin SM, Nguyen CL, Markey KA, Dai A, Slingerland JB, et al. (2022). Conditioning regimens are associated with distinct patterns of microbiota injury in allogeneic hematopoietic cell transplantation. Clin Cancer Res. 10.1158/1078-0432.CCR-22-1254. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Stoma I, Littmann ER, Peled JU, Giralt S, van den Brink MRM, Pamer EG, and Taur Y.(2021). Compositional Flux Within the Intestinal Microbiota and Risk for Bloodstream Infection With Gram-negative Bacteria. Clin Infect Dis 73, e4627–e4635. 10.1093/cid/ciaa068. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.DiGiulio DB, Callahan BJ, McMurdie PJ, Costello EK, Lyell DJ, Robaczewska A, Sun CL, Goltsman DS, Wong RJ, Shaw G, et al. (2015). Temporal and spatial variation of the human microbiota during pregnancy. Proc Natl Acad Sci U S A 112, 11060–11065. 10.1073/pnas.1502875112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Stewart CJ, Ajami NJ, O’Brien JL, Hutchinson DS, Smith DP, Wong MC, Ross MC, Lloyd RE, Doddapaneni H, Metcalf GA, et al. (2018). Temporal development of the gut microbiome in early childhood from the TEDDY study. Nature 562, 583–588. 10.1038/s41586-018-0617-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Human Microbiome Project C.(2012). Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214. 10.1038/nature11234. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Costea PI, Hildebrand F, Arumugam M, Backhed F, Blaser MJ, Bushman FD, de Vos WM, Ehrlich SD, Fraser CM, Hattori M, et al. (2018). Enterotypes in the landscape of gut microbial community composition. Nat Microbiol 3, 8–16. 10.1038/s41564-017-0072-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Munoz A, Hayward MR, Bloom SM, Rocafort M, Ngcapu S, Mafunda NA, Xu J, Xulu N, Dong M, Dong KL, et al. (2021). Modeling the temporal dynamics of cervicovaginal microbiota identifies targets that may promote reproductive health. Microbiome 9, 163. 10.1186/s40168-021-01096-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Holmes I, Harris K, and Quince C.(2012). Dirichlet multinomial mixtures: generative models for microbial metagenomics. PLoS One 7, e30126. 10.1371/journal.pone.0030126. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Ya mada T, et al. (2010). A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464, 59–65. 10.1038/nature08821. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Cerdo T, Ruiz A, Acuna I, Nieto-Ruiz A, Dieguez E, Sepulveda-Valbuena N, Escudero-Marin M, Garcia-Santos JA, Garcia-Ricobaraza M, Herrmann F, et al. (2022). A synbiotics, long chain polyunsaturated fatty acids, and milk fat globule membranes supplemented formula modulates microbiota maturation and neurodevelopment. Clinical Nutrition 41, 1697–1711. 10.1016/j.clnu.2022.05.013. [DOI] [PubMed] [Google Scholar]
- 22.Brooks JP, Buck GA, Chen G, Diao L, Edwards DJ, Fettweis JM, Huzurbazar S, Rakitin A, Satten GA, Smirnova E, et al. (2017). Changes in vaginal community state types reflect major shifts in the microbiome. Microb Ecol Health Dis 28, 1303265. 10.1080/16512235.2017.1303265. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Lee KH, Gordon A, Shedden K, Kuan G, Ng S, Balmaseda A, and Foxman B.(2019). The respiratory microbiome and susceptibility to influenza virus infection. PLoS One 14, e0207898. 10.1371/journal.pone.0207898. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Jin B, Liu R, Hao S, Li Z, Zhu C, Zhou X, Chen P, Fu T, Hu Z, Wu Q, et al. (2017). Defining and characterizing the critical transition state prior to the type 2 diabetes disease. PLoS One 12, e0180937. 10.1371/journal.pone.0180937. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Peled JU, Gomes ALC, Devlin SM, Littmann ER, Taur Y, Sung AD, Weber D, Hashimoto D, Slingerland AE, Slingerland JB, et al. (2020). Microbiota as Predictor of Mortality in Allogeneic Hematopoietic-Cell Transplantation. New Engl J Med 382, 822–834. 10.1056/NEJMoa1900623. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Taur Y, Coyte K, Schluter J, Robilotti E, Figueroa C, Gjonbalaj M, Littmann ER, Ling L, Miller L, Gyaltshen Y, et al. (2018). Reconstitution of the gut microbiota of antibiotic-treated patients by autologous fecal microbiota transplant. Sci Transl Med 10. 10.1126/scitranslmed.aap9489. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Soares FS, Amaral FC, Silva NLC, Valente MR, Santos LKR, Ya mashiro LH, Scheffer MC, Castanheira F, Ferreira RG, Gehrke L, et al. (2017). Antibiotic-Induced Pathobiont Dissemination Accelerates Mortality in Severe Experimental Pancreatitis. Front Immunol 8, 1890. 10.3389/fimmu.2017.01890. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Simms-Waldrip TR, Sunkersett G, Coughlin LA, Savani MR, Arana C, Kim J, Kim M, Zhan X, Greenberg DE, Xie Y, et al. (2017). Antibiotic-Induced Depletion of Anti-inflammatory Clostridia Is Associated with the Development of Graft-versus-Host Disease in Pediatric Stem Cell Transplantation Patients. Biol Blood Marrow Transplant 23, 820–829. 10.1016/j.bbmt.2017.02.004. [DOI] [PubMed] [Google Scholar]
- 29.Jenq RR, Taur Y, Devlin SM, Ponce DM, Goldberg JD, Ahr KF, Littmann ER, Ling L, Gobourne AC, Miller LC, et al. (2015). Intestinal Blautia Is Associated with Reduced Death from Graft-versus-Host Disease. Biol Blood Marrow Transplant 21, 1373–1383. 10.1016/j.bbmt.2015.04.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Golob JL, Pergam SA, Srinivasan S, Fiedler TL, Liu C, Garcia K, Mielcarek M, Ko D, Aker S, Marquis S, et al. (2017). Stool Microbiota at Neutrophil Recovery Is Predictive for Severe Acute Graft vs Host Disease After Hematopoietic Cell Transplantation. Clin Infect Dis 65, 1984–1991. 10.1093/cid/cix699. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Shono Y, Docampo MD, Peled JU, Perobelli SM, Velardi E, Tsai JJ, Slingerland AE, Smith OM, Young LF, Gupta J, et al. (2016). Increased GVHD-related mortality with broad-spectrum antibiotic use after allogeneic hematopoietic stem cell transplantation in human patients and mice. Sci Transl Med 8, 339ra371. 10.1126/scitranslmed.aaf2311. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Lee SE, Lim JY, Ryu DB, Kim TW, Park SS, Jeon YW, Yoon JH, Cho BS, Eom KS, Kim YJ, et al. (2019). Alteration of the Intestinal Microbiota by Broad-Spectrum Antibiotic Use Correlates with the Occurrence of Intestinal Graft-versus-Host Disease. Biol Blood Marrow Transplant 25, 1933–1943. 10.1016/j.bbmt.2019.06.001. [DOI] [PubMed] [Google Scholar]
- 33.Pettigrew MM, Gent JF, Kong Y, Halpin AL, Pineles L, Harris AD, and Johnson JK (2019). Gastrointestinal Microbiota Disruption and Risk of Colonization With Carbapenem-resistant Pseudomonas aeruginosa in Intensive Care Unit Patients. Clinical Infectious Diseases 69, 604–613. 10.1093/cid/ciy936. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Tropini C, Moss EL, Merrill BD, Ng KM, Higginbottom SK, Casavant EP, Gonzalez CG, Fremin B, Bouley DM, Elias JE, et al. (2018). Transient Osmotic Perturbation Causes Long-Term Alteration to the Gut Microbiota. Cell 173, 1742–1754 e1717. 10.1016/j.cell.2018.05.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Nalawade TM, Bhat K, and Sogi SH (2015). Bactericidal activity of propylene glycol, glycerine, polyethylene glycol 400, and polyethylene glycol 1000 against selected microorganisms. J Int Soc Prev Community Dent 5, 114–119. 10.4103/2231-0762.155736. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Maier L, Goemans CV, Wirbel J, Kuhn M, Eberl C, Pruteanu M, Muller P, Garcia-Santamarina S, Cacace E, Zhang B, et al. (2021). Unravelling the collateral damage of antibiotics on gut bacteria. Nature 599, 120–124. 10.1038/s41586-021-03986-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Truong DT, Tett A, Pasolli E, Huttenhower C, and Segata N.(2017). Microbial strain-level population structure and genetic diversity from metagenomes. Genome Res 27, 626–638. 10.1101/gr.216242.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Dubin KA, Mathur D, McKenney PT, Taylor BP, Littmann ER, Peled JU, van den Brink MRM, Taur Y, Pamer EG, and Xavier JB (2019). Diversification and Evolution of Vancomycin-Resistant Enterococcus faecium during Intestinal Domination. Infect Immun 87. 10.1128/IAI.00102-19. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Bendall ML, Stevens SL, Chan LK, Malfatti S, Schwientek P, Tremblay J, Schackwitz W, Martin J, Pati A, Bushnell B, et al. (2016). Genome-wide selective sweeps and gene-specific sweeps in natural bacterial populations. ISME J 10, 1589–1601. 10.1038/ismej.2015.241. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Diaz Caballero J, Clark ST, Coburn B, Zhang Y, Wang PW, Donaldson SL, Tullis DE, Yau YC, Waters VJ, Hwang DM, and Guttman DS (2015). Selective Sweeps and Parallel Pathoadaptation Drive Pseudomonas aeruginosa Evolution in the Cystic Fibrosis Lung. mBio 6, e00981–00915. 10.1128/mBio.00981-15. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Ghalayini M, Launay A, Bridier-Nahmias A, Clermont O, Denamur E, Lescat M, and Tenaillon O (2018). Evolution of a Dominant Natural Isolate of Escherichia coli in the Human Gut over the Course of a Year Suggests a Neutral Evolution with Reduced Effective Population Size. Appl Environ Microbiol 84. 10.1128/AEM.02377-17.
- 42.Vatanen T, Franzosa EA, Schwager R, Tripathi S, Arthur TD, Vehik K, Lernmark A, Hagopian WA, Rewers MJ, She JX, et al. (2018). The human gut microbiome in early-onset type 1 diabetes from the TEDDY study. Nature 562, 589–594. 10.1038/s41586-018-0620-2.
- 43.Kanjilal S, Oberst M, Boominathan S, Zhou H, Hooper DC, and Sontag D (2020). A decision algorithm to promote outpatient antimicrobial stewardship for uncomplicated urinary tract infection. Sci Transl Med 12. 10.1126/scitranslmed.aay5067.
- 44.Vermeulen R, Schymanski EL, Barabasi AL, and Miller GW (2020). The exposome and health: Where chemistry meets biology. Science 367, 392–396. 10.1126/science.aay3164.
- 45.Yeh PJ, Hegreness MJ, Aiden AP, and Kishony R (2009). Drug interactions and the evolution of antibiotic resistance. Nat Rev Microbiol 7, 460–466. 10.1038/nrmicro2133.
- 46.Spencer CN, McQuade JL, Gopalakrishnan V, McCulloch JA, Vetizou M, Cogdill AP, Khan MAW, Zhang XT, White MG, Peterson CB, et al. (2021). Dietary fiber and probiotics influence the gut microbiome and melanoma immunotherapy response. Science 374, 1632–1640. 10.1126/science.aaz7015.
- 47.Gacesa R, Kurilshikov A, Vich Vila A, Sinha T, Klaassen MAY, Bolte LA, Andreu-Sanchez S, Chen L, Collij V, Hu S, et al. (2022). Environmental factors shaping the gut microbiome in a Dutch population. Nature 604, 732–739. 10.1038/s41586-022-04567-7.
- 48.Wu GD, Chen J, Hoffmann C, Bittinger K, Chen YY, Keilbaugh SA, Bewtra M, Knights D, Walters WA, Knight R, et al. (2011). Linking long-term dietary patterns with gut microbial enterotypes. Science 334, 105–108. 10.1126/science.1208344.
- 49.Liao C, Taylor BP, Ceccarani C, Fontana E, Amoretti LA, Wright RJ, Gomes ALC, Peled JU, Taur Y, Perales MA, et al. (2021). Compilation of longitudinal microbiota data and hospitalome from hematopoietic cell transplantation patients. Sci Data 8, 71. 10.1038/s41597-021-00860-8.
- 50.Rolling T, Zhai B, Gjonbalaj M, Tosini N, Yasuma-Mitobe K, Fontana E, Amoretti LA, Wright RJ, Ponce DM, Perales MA, et al. (2021). Haematopoietic cell transplantation outcomes are linked to intestinal mycobiota dynamics and an expansion of Candida parapsilosis complex species. Nat Microbiol 6, 1505–1515. 10.1038/s41564-021-00989-7.
- 51.Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Huntley J, Fierer N, Owens SM, Betley J, Fraser L, Bauer M, et al. (2012). Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME J 6, 1621–1624. 10.1038/ismej.2012.8.
- 52.Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJ, and Holmes SP (2016). DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods 13, 581–583. 10.1038/nmeth.3869.
- 53.Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Pena AG, Goodrich JK, Gordon JI, et al. (2010). QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7, 335–336. 10.1038/nmeth.f.303.
- 54.Beghini F, McIver LJ, Blanco-Miguez A, Dubois L, Asnicar F, Maharjan S, Mailyan A, Manghi P, Scholz M, Thomas AM, et al. (2021). Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. Elife 10. 10.7554/eLife.65088.
Associated Data
Supplementary Materials
Figure S1. Clusters of intestinal microbiota during allo-HCT were identified by other unsupervised clustering methods, related to Figure 1. a, d, Compositional space of the intestinal microbiota visualized by tSNE projection in the MSKCC discovery cohort. Each dot represents a sample, colored according to clusters assigned by (a) hierarchical clustering or (d) Dirichlet Multinomial Mixture (DMM) model. b, e, Alpha-diversity per cluster identified by (b) hierarchical clustering or (e) DMM. The horizontal dashed line represents the median alpha-diversity of samples in the MSKCC discovery cohort. c, f, Compositional characteristics per cluster identified by (c) hierarchical clustering or (f) DMM. Samples in which the relative abundance of the most common taxon is ≥ 30% are color-coded by the most common taxon. Non-dominated samples are colored in white. (p: phylum; f: family; o: order; g: genus). g, Distribution of samples across 10 clusters identified by three unsupervised clustering methods. h, The optimal number of k-means clusters was estimated from the curve of the within-cluster sum of squared distances from each point to its cluster centroid. i, The optimal number of clusters identified by DMM was estimated by the smallest Laplace approximation metric. j, Correlation between cluster stability and alpha-diversity. Cluster stability was measured by self-transition probability. Alpha-diversity is defined as the cluster median reciprocal Simpson diversity. k, Co-exclusionary and inclusionary relationships associated with the stability of the Enterococcus-high cluster 10.
Figure S2. Time courses of drug exposures from day −14 to day 14 relative to allo-HCT, related to Figure 3. Red dashed lines indicate day 0, the day of stem cell infusion.
Figure S3. Associations between drug exposures and cluster self and attractor transitions, related to Figure 3. Self coefficients indicate whether drug exposure increases (positive coefficients, red shades) or decreases (negative coefficients, blue shades) the log-odds of cluster stability. Attractor coefficients indicate whether drug exposure increases (positive coefficients, red shades) or decreases (negative coefficients, blue shades) the log-odds of transition to a given cluster. #pts indicates the number of patients exposed to each drug; #dps indicates the number of sample pairs collected on the day of each drug exposure.
Figure S4. Bacteria response scores for four microbiome features of interest, related to Figure 3. Bacteria response scores predict the association between a given drug exposure and changes in genus relative abundance or alpha-diversity. Positive response scores (red shades) indicate that drug exposures are associated with increased genus relative abundance or alpha-diversity. Negative response scores (blue shades) indicate that drug exposures are associated with decreased genus relative abundance or alpha-diversity.
Figure S5. Associations between drug exposures and changes in species relative abundance from samples with shotgun metagenomic sequencing, related to Figure 4. Log changes in species relative abundance between consecutively collected samples were analyzed as a function of individual drug exposure in a linear mixed-effects regression model, with time of sample collection binned into weekly intervals as a random effect variable. Positive coefficient values (red shades) indicate that drug exposures are associated with increased species relative abundance. Negative coefficient values (blue shades) indicate that drug exposures are associated with decreased species relative abundance. A white box indicates that there were insufficient data points to fit the regression model for a given drug-species pair.
Figure S6. Investigation of the causal relationship between drug exposures, microbiome and mortality in all patient cohorts, related to Figure 5. a, Pearson’s correlation between patient-specific bacteria response scores and observed genus relative abundance or alpha-diversity in samples collected between day 14 and 45 relative to HCT in the MSKCC discovery cohort (454 patients, including recipients of PBSC T-cell depleted graft). b, Prediction of overall and cause-specific mortality in the MSKCC discovery cohort (295 patients, excluding recipients of PBSC T-cell depleted graft) based on patient-specific bacteria response scores for four microbiome features of interest, each in a separate multivariate Cox proportional hazards or Fine-Gray model, controlled for age, sex, conditioning intensity, graft source and underlying disease. P-values were adjusted by the Benjamini-Hochberg correction. c, Microbiome feature metrics (taxa relative abundance or alpha-diversity) and corresponding response scores were compared in terms of their associations with overall mortality risk. Measurements were fit into either an independent multivariate Cox proportional hazards model (which considers either microbiome metrics or response scores) or a competing one (which considers microbiome metrics and response scores together).
Figure S7. PARADIGM was validated against simulated data and different regression models, related to STAR Methods. a, PARADIGM was validated against simulated data using Pearson’s correlation. This single-day resolution simulated dataset was input into the elastic net regularized regression to determine how well the model can recover the pre-defined self and attractor coefficients used to simulate cluster states. b, PARADIGM was validated against a separate elastic net regularized regression using drug exposures and time of sample collection as predictors of changes in a single genus relative abundance, using the same training dataset. c, A single-taxon-based regularized linear regression approach poorly predicted observed changes in Enterococcus relative abundance in samples collected between day 14 and 45.
Table S1. Cluster transition frequencies in consecutive samples collected within a 7-day time window in the MSKCC cohort, related to Figure 2h. The rows indicate the cluster assignment of the first sample in a pair of consecutive samples. The columns indicate the cluster assignment of the second sample in a pair of consecutive samples.
Table S2. Patient characteristics of the MSKCC sub-cohort included in the PARADIGM training set of daily collected fecal samples, related to Figure 3. IQR, interquartile range; sd, standard deviation; AML, acute myeloid leukemia; BM, bone marrow; PBSC, peripheral blood stem cell; PBSC T-cell depletion was performed by ex vivo CD34-selection of the graft.
Table S3. Quality control of 16S rRNA and shotgun metagenomic sequencing data, related to STAR Methods.
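The two summary quantities used in these legends, reciprocal Simpson diversity (Figure S1j) and the cluster transition table with its self-transition probability (Figure S1j, Table S1), can be illustrated with a minimal sketch. This is not the PARADIGM code. The function names, the tuple layout `(patient_id, day, cluster)`, the toy samples, and the reading of "within a 7-day time window" as a gap of at most 7 days between consecutive samples are all assumptions made for illustration.

```python
from collections import Counter


def reciprocal_simpson(counts):
    """Reciprocal Simpson index, 1 / sum(p_i^2), from per-taxon read counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


def transition_counts(samples, max_gap_days=7):
    """Count cluster transitions between consecutive samples of each patient.

    samples: iterable of (patient_id, day, cluster) tuples (hypothetical layout).
    Pairs further apart than max_gap_days are skipped (assumed window rule).
    Keys of the result are (first_cluster, second_cluster), i.e. (row, column).
    """
    by_patient = {}
    for patient, day, cluster in samples:
        by_patient.setdefault(patient, []).append((day, cluster))
    counts = Counter()
    for series in by_patient.values():
        series.sort()  # order each patient's samples by collection day
        for (d0, c0), (d1, c1) in zip(series, series[1:]):
            if d1 - d0 <= max_gap_days:
                counts[(c0, c1)] += 1
    return counts


def self_transition_probability(counts, cluster):
    """Fraction of transitions leaving `cluster` that stay in `cluster`."""
    row_total = sum(n for (first, _), n in counts.items() if first == cluster)
    if row_total == 0:
        return None  # cluster never appears as the first sample of a pair
    return counts[(cluster, cluster)] / row_total


# Toy data, invented for illustration only.
toy_samples = [
    ("p1", -3, 1), ("p1", -1, 1), ("p1", 2, 10), ("p1", 4, 10), ("p1", 20, 10),
    ("p2", 0, 10), ("p2", 1, 10), ("p2", 5, 3),
]
toy_counts = transition_counts(toy_samples)
stability_cluster_10 = self_transition_probability(toy_counts, 10)
even_sample_diversity = reciprocal_simpson([25, 25, 25, 25])
```

In this toy example the pair of samples collected on days 4 and 20 is ignored because the gap exceeds the window, so the row for cluster 10 contains three transitions, two of which are self-transitions.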
Data Availability Statement
16S rRNA and shotgun metagenomic sequencing data have been deposited at the NCBI database and are publicly available as of the date of publication. De-identified patient data necessary to reproduce the results in this manuscript and NCBI SRA accession numbers have been deposited on Figshare and are publicly available as of the date of publication. The DOI is listed in the key resources table. All original code has been deposited on GitHub and is publicly available as of the date of publication. The DOI is listed in the key resources table. Scripts and data are provided to facilitate reproducibility of all the analyses presented in this manuscript, except for the analysis of clinical outcomes. Patient-level clinical outcomes for the MSKCC and Duke cohorts are available via data sharing agreement per institutional policies. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
Key resources table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Biological samples | ||
| Human Fecal Samples | This study | N/A |
| Chemicals, peptides, and recombinant proteins | ||
| Phenol:Chloroform:Isoamyl Alcohol (25:24:1) | Fisher Scientific | Cat#: AC327111000 |
| TRIS-EDTA 1X, pH 8.0, biotech grade | Fisher Scientific | Cat#: BP2473500 |
| 20% SDS Solution, molecular grade | Fisher Scientific | Cat#: BP1311-1 |
| 3M Sodium Acetate, pH 5.2, sterile, Teknova | Fisher Scientific | Cat#: NC0496686 |
| Biospec 0.1mm zirconia-silica beads | Fisher Scientific | Cat#: NC0362415 |
| TE Buffer with 0.1mM EDTA, pH 7.6, sterile | Fisher Scientific | Cat#: NC9491715 |
| Ethanol, 200 proof, 0.125gal | Fisher Scientific | Cat#: 04-355-222 |
| RNase A, 100mg; from bovine pancreas, Roche | Fisher Scientific | Cat#: 50-100-3354 |
| Water, Ultrapure, distilled | Fisher Scientific | Cat#: 10-977-023 |
| Sodium Chloride 1M, Teknova | Fisher Scientific | Cat#: NC1326185 |
| Critical commercial assays | ||
| QIAamp DNA mini kit | Qiagen | Cat#: 51306 |
| Qiaquick PCR Purification Kit | Qiagen | Cat#: 28106 |
| MinElute PCR Purification Kit | Qiagen | Cat#: 28006 |
| Collection tubes, 2 mL | Qiagen | Cat#: 19201 |
| Platinum Taq DNA polymerase, w/buffers | Fisher Scientific | Cat#: 10-966-083 |
| dNTP mix 10mM | Fisher Scientific | Cat#: 18-427-088 |
| Qubit dsDNA broad-range assay kit | Fisher Scientific | Cat#: Q32853 |
| Tapestation D1000 ScreenTape, 112rx | Agilent | Cat#: 5067-5582 |
| Tapestation D1000 Reagents, 112rx | Agilent | Cat#: 5067-5583 |
| Tapestation 96-well plate foil seal | Agilent | Cat#: 5067-5154 |
| Tapestation 96-well sample plate | Agilent | Cat#: 5067-5150 |
| Tapestation Loading Tips | Agilent | Cat#: 5067-5153 |
| Tapestation Optical tube strip caps | Agilent | Cat#: 401425 |
| Tapestation Optical tube strips | Agilent | Cat#: 401428 |
| Tapestation High-Sensitivity D1000 ScreenTape, 112rx | Agilent | Cat#: 5067-5584 |
| Tapestation High-Sensitivity D1000 Reagents, 112rx | Agilent | Cat#: 5067-5585 |
| Agilent Technologies 4200 Tapestation | Agilent | Cat#: G2991BA |
| KAPA Hyper Prep | Roche Diagnostics | Cat#: 07962363001 |
| AMPure Beads XP | Beckman Coulter | Cat#: A63881 |
| MiSeq Reagent Kit v2 (500-cycles) | Illumina | Cat#: MS-102-2003 |
| Deposited data | ||
| Stool metagenomics (16S rRNA and shotgun) sequencing | This study | DOI: 10.6084/m9.figshare.21657806.v1 |
| Oligonucleotides | ||
| 16S rRNA gene sequencing forward primer (563F): 5′-nnnnnnnn-NNNNNNNNNNNN-AYTGGGYDTAAAGNG-3′ | Integrated DNA Technologies | https://earthmicrobiome.org |
| 16S rRNA gene sequencing reverse primer (926Rb): 5′-nnnnnnnn-NNNNNNNNNNNN-CCGTCAATTYHTTTRAGT-3′ | Integrated DNA Technologies | https://earthmicrobiome.org |
| Software and algorithms | ||
| PARADIGM | This study | DOI: 10.5281/zenodo.7818943 |
| R (version 4.1.0) | R Foundation | https://www.r-project.org |
| DADA2 (version 1.16.0) | Callahan et al., 2016 | https://benjjneb.github.io/dada2/index.html |
| BBMap (version 38.97) | N/A | https://www.sourceforge.net/projects/bbmap/ |
| KneadData (version 0.7.5) | The Huttenhower Lab | https://huttenhower.sph.harvard.edu/kneaddata |
| HUMAnN 3.0 (version 3.0.1) | Beghini et al., 2021 | https://huttenhower.sph.harvard.edu/humann/ |
| MetaPhlAn 3.0 (version 3.1.0) | Beghini et al., 2021 | https://huttenhower.sph.harvard.edu/metaphlan/ |
| StrainPhlAn 3.0 | Truong et al., 2017 | http://segatalab.cibio.unitn.it/tools/strainphlan/ |




