KEYWORDS: 16S rRNA, DNA sequencing, bioinformatics
ABSTRACT
The use of sterile swabs is a convenient and common way to collect microbiome samples, and many studies have shown that the effects of room-temperature storage are smaller than physiologically relevant differences between subjects. However, several bacterial taxa, notably members of the class Gammaproteobacteria, grow at room temperature, sometimes confusing microbiome results, particularly when stability is assumed. Although comparative benchmarking has shown that several preservation methods, including the use of 95% ethanol, fecal occult blood test (FOBT) and FTA cards, and Omnigene-GUT kits, reduce changes in taxon abundance during room-temperature storage, these techniques all have drawbacks and cannot be applied retrospectively to samples that have already been collected. Here we performed a meta-analysis using several different microbiome sample storage condition studies, showing consistent trends in which specific bacteria grew (i.e., “bloomed”) at room temperature, and introduce a procedure for removing the sequences that most distort analyses. In contrast to similarity-based clustering using operational taxonomic units (OTUs), we use a new technique called “Deblur” to identify the exact sequences corresponding to blooming taxa, greatly reducing false positives and also dramatically decreasing runtime. We show that applying this technique to samples collected for the American Gut Project (AGP), for which participants simply mail samples back without the use of ice packs or other preservatives, yields results consistent with published microbiome studies performed with frozen or otherwise preserved samples.
IMPORTANCE In many microbiome studies, the need to store samples at room temperature (e.g., remote fieldwork) and the ability to ship samples without hazardous materials that require special handling training, such as ethanol (e.g., citizen science efforts), are paramount. However, although room-temperature storage for a few days has been shown not to obscure physiologically relevant microbiome differences between comparison groups, there are still changes in specific bacterial taxa, notably in members of the class Gammaproteobacteria, that can make microbiome profiles difficult to interpret. Here we identify the most problematic taxa and show that removing sequences from just a few fast-growing taxa is sufficient to correct microbiome profiles.
OBSERVATION
The use of sterile swabs is a convenient way to collect samples for microbiome studies, but in some cases, it is not feasible to freeze samples immediately or to use a preservative. For example, the American Gut Project (AGP; Qiita study identifier [ID] 10317) allows members of the general public to send samples for 16S rRNA gene amplicon sequencing through domestic post without a preservative. This is because proven preservation methods can be cumbersome, dangerous, expensive, or sample type specific, complicating participation in microbiome citizen science. Although some studies have demonstrated that the effects of room-temperature storage are secondary to physiologically relevant differences between comparison groups (1–3), certain bacterial taxa, particularly those in the class Gammaproteobacteria, grow well at room temperature. This is problematic, as some Gammaproteobacteria species have been associated with disease, such as inflammatory bowel disease (IBD) (4). Therefore, to identify meaningful patterns in microbiome studies that do not utilize sample preservation, it is crucial to remove with high specificity the taxa that thrive at room temperature (i.e., “blooming” bacteria).
Here we performed a meta-analysis that combined fecal samples from storage experiments (low sample numbers but easily interpretable results) with bulk sample statistics from projects comparing room-temperature shipping to immediate freezing, identifying exact sequences corresponding to blooms by applying Deblur (5) to the data sets. We assessed whether any sequences are enriched more than expected in room temperature samples, producing a list of candidate sub-operational taxonomic units (sOTUs) or exact sequences that appear to increase in frequency at room temperature. We then filtered these exact sequences from the AGP data set, restoring a biological association that was obscured by the blooms. We further validate the procedure by confirming that filtered data sets more closely resemble those from immediately frozen samples and by showing that the overall microbiome profiles better match the results of other published human microbiome studies.
To identify the candidate blooming bacteria, we first examined the effect of room-temperature storage on fecal microbiome samples. Using two recent storage studies (1, 2), we showed that taxonomic abundance changes over time in nonfrozen fecal samples compared to frozen samples are mainly due to a small number of taxa (Fig. 1A to D). The taxa that contributed disproportionately are primarily members of the class Gammaproteobacteria, which is unsurprising given that many members of this class are easily cultivable, fast-growing, and commonly isolated from human stool. Unfortunately, these storage studies examined samples from a small number of individuals, and therefore it is possible that additional bacterial taxa bloom in samples shipped via domestic post that by chance were not present in these controlled studies. To address this limitation, we compared all AGP fecal samples (~7,000 samples) to data from three studies comprising fecal samples frozen immediately after collection (fresh-frozen) (6, 7; Personal Genome Project [PGP; unpublished data, Qiita study ID 1189]). Importantly, because each study represented a different population, it is likely that sOTUs were present at different frequencies across these studies. Nevertheless, blooming bacteria are expected to be at a higher frequency in AGP samples than in all of the fresh-frozen samples.
Using reasonable thresholds for relative abundance changes in the storage studies and in the AGP compared to fresh-frozen studies, we identified 20 bacterial sOTUs as candidates for blooming during shipping (Fig. 1E; see Table S1 in the supplemental material) using the following criteria: a fold increase of 2 or more in the room-temperature storage studies (1, 2) and AGP relative to fresh-frozen fecal samples from studies (6, 7; PGP) and a fold increase of 50 or more within the storage studies only or not observed in the storage studies but with at least a 2-fold change in AGP compared to the fresh-frozen studies. The results appear insensitive to these specific thresholds, as we found that removal of a subset of 10 of the identified candidate blooms from the AGP cohort was sufficient to restore a well-characterized age correlation with alpha diversity (Fig. 2E and F) and was sufficient for a significant decrease in the distances to fresh-frozen samples (see Fig. S1 in the supplemental material).
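The screening criteria above can be sketched as a simple fold-change test. This is an illustration under stated assumptions, not the authors' notebook: the function names, the pseudocount, the use of mean relative frequencies as input, and in particular the way the three conditions are combined are one possible reading of the criteria. Only the 2-fold and 50-fold thresholds come from the text.

```python
from typing import Optional, Sequence

PSEUDOCOUNT = 1e-6  # assumed value; avoids division by zero for rare sequences


def fold_change(after: float, before: float) -> float:
    """Ratio of two mean relative frequencies, stabilized by a pseudocount."""
    return (after + PSEUDOCOUNT) / (before + PSEUDOCOUNT)


def is_candidate_bloom(
    agp_freq: float,
    fresh_frozen_freqs: Sequence[float],
    storage_rt_freq: Optional[float] = None,
    storage_frozen_freq: Optional[float] = None,
    min_fold: float = 2.0,      # threshold stated in the text
    strong_fold: float = 50.0,  # threshold stated in the text
) -> bool:
    """Hypothetical reading of the criteria for a single sOTU.

    agp_freq: mean frequency in shipped (AGP) samples.
    fresh_frozen_freqs: mean frequency in each fresh-frozen study.
    storage_*_freq: mean frequency in the storage studies at room
        temperature and frozen, or None if the sOTU was not observed there.
    """
    # Require enrichment relative to every fresh-frozen study, so compare
    # against the study with the highest frequency.
    agp_fold = fold_change(agp_freq, max(fresh_frozen_freqs))

    observed = storage_rt_freq is not None and storage_frozen_freq is not None
    if not observed:
        # Not seen in the storage studies: rely on the AGP comparison alone.
        return agp_fold >= min_fold

    storage_fold = fold_change(storage_rt_freq, storage_frozen_freq)
    if storage_fold >= strong_fold:
        # Very strong growth within the storage studies is enough by itself.
        return True
    # Otherwise require moderate enrichment in both comparisons.
    return storage_fold >= min_fold and agp_fold >= min_fold
```

Taking the maximum over the fresh-frozen studies reflects the expectation that a bloom is more frequent in shipped samples than in all of the fresh-frozen cohorts, even though those cohorts differ in population.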
To mitigate the effect of these blooming bacteria on subsequent microbiome analyses, we removed exact sequence matches to identified blooms from 35,146 unique sOTUs identified by applying Deblur (5) to 10,189 samples spanning 338,496,967 sequences from the AGP data set. Each of the 20 blooms had an exact match to one of the unique Deblur sOTUs, and a total of 32,696,826 reads were removed (per-sample dropped sequences spanning 0.4%, 13.1%, and 45.3% for the 25th, 50th, and 75th percentiles, respectively). Importantly, some of the removed sequences were likely “real”; for example, Escherichia coli and Citrobacter sequences present in the candidate blooming list were present at nonnegligible frequencies in fresh-frozen samples. However, these sequences were included for removal as their tendency to grow during shipment can greatly impact the relative abundances of other organisms due to the compositional nature of the data.
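The removal step amounts to dropping exact sequence matches from each sample and recording how many reads were lost. The sketch below shows that idea on a toy table; the nested-dictionary layout and the function name are assumptions for illustration, and it presumes the bloom sequences and the sOTUs were trimmed to the same length and region so that string equality is meaningful.

```python
from typing import Dict, Iterable, Tuple

Table = Dict[str, Dict[str, int]]  # sample ID -> {sequence: read count}


def remove_blooms(
    table: Table, bloom_seqs: Iterable[str]
) -> Tuple[Table, Dict[str, float]]:
    """Drop exact matches to bloom sequences from every sample.

    Returns the filtered table and, per sample, the fraction of reads
    that was removed.
    """
    blooms = set(bloom_seqs)
    filtered: Table = {}
    fraction_removed: Dict[str, float] = {}
    for sample_id, counts in table.items():
        kept = {seq: n for seq, n in counts.items() if seq not in blooms}
        total = sum(counts.values())
        removed = total - sum(kept.values())
        filtered[sample_id] = kept
        # Guard against empty samples to avoid dividing by zero.
        fraction_removed[sample_id] = removed / total if total else 0.0
    return filtered, fraction_removed


# Toy example with made-up sequences and counts.
toy = {"s1": {"ACGT": 90, "TTGA": 10}, "s2": {"TTGA": 5}}
filtered, dropped = remove_blooms(toy, ["ACGT"])
```

The per-sample fractions returned here correspond to the kind of percentile summary reported above for dropped sequences.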
Without filtering candidate blooms, there were notable differences (as observed using Bray-Curtis principal-coordinate analysis [PCoA]) between AGP fecal samples and the fresh-frozen fecal samples; filtering the bloom sequences from all samples removed these differences (Fig. 2A versus B). In the PCoA space corresponding to the data determined without filtering, the primary separation is explained by the presence of a large percentage of bloom sequences (Fig. 2A); the sizes of the spheres are scaled by the percentage of bloom sequences in the respective sample. Following the removal of the blooms, this dominant effect was abolished and samples with high levels of blooms clustered with samples from the other studies (Fig. 2B). Similar results were observed in assessing class-level taxonomy abundances (Fig. 2C versus D): prior to filtering, a high relative abundance of Gammaproteobacteria (27%) was present in the AGP samples compared to the fresh-frozen samples (1.5% to 3.5%), while the AGP profile seen after filtering more closely resembled that of the fresh-frozen samples. Importantly, applying the filter minimally changed the taxonomic profiles of fresh-frozen samples (Fig. 2D). The filtering procedure is available in a Jupyter Notebook (8) at https://github.com/knightlab-analyses/bloom-analyses.
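To see why a single bloom can dominate a Bray-Curtis ordination, consider a toy pair of samples. The sketch below computes the Bray-Curtis dissimilarity before and after removing a bloom sequence; the sample contents are invented, and converting to relative abundances is a simplifying assumption rather than the normalization used in the study.

```python
from typing import Dict


def to_relative(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert read counts to relative abundances."""
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()} if total else {}


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    seqs = set(a) | set(b)
    num = sum(abs(a.get(s, 0.0) - b.get(s, 0.0)) for s in seqs)
    den = sum(a.get(s, 0.0) + b.get(s, 0.0) for s in seqs)
    return num / den if den else 0.0


# Hypothetical samples: identical communities, except that half of the
# shipped sample's reads come from a bloom.
frozen = {"A": 50, "B": 50}
shipped = {"A": 25, "B": 25, "BLOOM": 50}

before = bray_curtis(to_relative(frozen), to_relative(shipped))

shipped_filtered = {s: n for s, n in shipped.items() if s != "BLOOM"}
after = bray_curtis(to_relative(frozen), to_relative(shipped_filtered))
```

In this constructed case the bloom alone accounts for the entire dissimilarity, because it compresses the relative abundances of every other sequence in the shipped sample.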
There is a balance between type 1 and type 2 errors that must be considered in applying this filter. The cost of removing a sequence is that it becomes “invisible” in the analysis, and it is possible that real sequences are lost. Conversely, retaining a bloom sequence increases noise caused by shipment conditions, which can artificially alter biological conclusions. Therefore, a balance between loss of data and inaccurate, noisy data must be obtained. To select an appropriate number of blooming bacterial sequences to subtract from the AGP data set to maximize the amount of data retained while reducing inaccuracies caused by blooms, we tested the effect of nested filtering levels on the ability to detect the well-known effect of age on alpha diversity (9, 10). As can be seen in Fig. 2E and F, this effect was undetected by a Kruskal-Wallis test when none of the candidate blooms were removed. However, filtering the top four candidate blooms restored the ability to detect a significant difference in diversity by age. Critically, the identification of the bloom sOTUs was done independently of this positive control. For analysis of the AGP cohort, we recommend removal of the sequences of the top 10 candidate blooming bacterial taxa, as this maximizes the expected age effect (Fig. 2E). Different studies may want to remove a different subset of bloom sequences, as retaining some of these sequences might be critical, depending on the study characteristics. With meta-analysis, if this filter is applied, it must be applied identically to all samples represented to avoid introduction of a systematic bias.
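The nested filtering test can be sketched as follows: for each level k, remove the top k ranked bloom sequences and recompute alpha diversity. The Shannon index is used here only as an illustrative metric (the text does not state which alpha-diversity measure was used), and the function names and toy data are assumptions.

```python
import math
from typing import Dict, Iterable, List


def shannon(counts: Dict[str, int]) -> float:
    """Shannon diversity (base 2) of one sample's read counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log2(n / total) for n in counts.values() if n > 0
    )


def nested_filter_diversity(
    sample: Dict[str, int], ranked_blooms: List[str], levels: Iterable[int]
) -> Dict[int, float]:
    """Alpha diversity after removing the top-k blooms, for each k in levels."""
    result: Dict[int, float] = {}
    for k in levels:
        drop = set(ranked_blooms[:k])  # k = 0 keeps every sequence
        kept = {seq: n for seq, n in sample.items() if seq not in drop}
        result[k] = shannon(kept)
    return result


# Toy sample dominated by one bloom; "X" is a bloom absent from this sample.
sample = {"A": 10, "B": 10, "BLOOM": 80}
by_level = nested_filter_diversity(sample, ["BLOOM", "X"], levels=[0, 1, 2])
```

In a full analysis, the per-sample values at each level would be grouped by age category and compared with a Kruskal-Wallis test (for example, `scipy.stats.kruskal`), and the level would be chosen by weighing the recovered signal against the reads discarded.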
Given that the relative abundances of most bacteria change little during room-temperature storage, filtering for blooms removes an important confounding variable and facilitates meta-analysis of projects that have used different storage procedures. We recommend this procedure for analyzing data from fecal studies that lack the means to freeze or preserve samples immediately, such as citizen science efforts or remote fieldwork. Additionally, these data suggest that further control studies should be performed to allow the evaluation of candidate blooms and their impacts in nonfecal environments.
ACKNOWLEDGMENTS
We thank Evguenia Kopylova for help with the analysis pipeline.
This work was supported by the Keck Foundation (grant DT061413), the Templeton Foundation (grant 44000), and the National Science Foundation (grant DGE-1144086).
REFERENCES
- 1. Song SJ, Amir A, Metcalf JL, Amato KR, Xu ZZ, Humphrey G, Knight R. 2016. Preservation methods differ in fecal microbiome stability, affecting suitability for field studies. mSystems 1:e00021-16. doi: 10.1128/mSystems.00021-16.
- 2. Sinha R, Chen J, Amir A, Vogtmann E, Shi J, Inman KS, Flores R, Sampson J, Knight R, Chia N. 2016. Collecting fecal samples for microbiome analyses in epidemiology studies. Cancer Epidemiol Biomarkers Prev 25:407–416. doi: 10.1158/1055-9965.EPI-15-0951.
- 3. Lauber CL, Zhou N, Gordon JI, Knight R, Fierer N. 2010. Effect of storage conditions on the assessment of bacterial community structure in soil and human-associated samples. FEMS Microbiol Lett 307:80–86. doi: 10.1111/j.1574-6968.2010.01965.x.
- 4. Gevers D, Kugathasan S, Denson LA, Vázquez-Baeza Y, Van Treuren W, Ren B, Schwager E, Knights D, Song SJ, Yassour M, Morgan XC, Kostic AD, Luo C, González A, McDonald D, Haberman Y, Walters T, Baker S, Rosh J, Stephens M, Heyman M, Markowitz J, Baldassano R, Griffiths A, Sylvester F, Mack D, Kim S, Crandall W, Hyams J, Huttenhower C, Knight R, Xavier RJ. 2014. The treatment-naive microbiome in new-onset Crohn’s disease. Cell Host Microbe 15:382–392. doi: 10.1016/j.chom.2014.02.005.
- 5. Amir A, McDonald D, Navas-Molina JA, Kopylova E, Morton JT, Zech Xu Z, Kightley EP, Thompson LR, Hyde ER, Gonzalez A, Knight R. 2017. Deblur rapidly resolves single-nucleotide community sequence patterns. mSystems 2:e00191-16. doi: 10.1128/mSystems.00191-16.
- 6. Vitaglione P, Mennella I, Ferracane R, Rivellese AA, Giacco R, Ercolini D, Gibbons SM, La Storia A, Gilbert JA, Jonnalagadda S, Thielecke F, Gallo MA, Scalfi L, Fogliano V. 2015. Whole-grain wheat consumption reduces inflammation in a randomized controlled trial on overweight and obese subjects with unhealthy dietary and lifestyle behaviors: role of polyphenols bound to cereal dietary fiber. Am J Clin Nutr 101:251–261. doi: 10.3945/ajcn.114.088120.
- 7. Goodrich JK, Waters JL, Poole AC, Sutter JL, Koren O, Blekhman R, Beaumont M, Van Treuren W, Knight R, Bell JT, Spector TD, Clark AG, Ley RE. 2014. Human genetics shape the gut microbiome. Cell 159:789–799. doi: 10.1016/j.cell.2014.09.053.
- 8. Pérez F, Granger BE. 2007. IPython: A system for interactive scientific computing. Comput Sci Eng 9:21–29. doi: 10.1109/MCSE.2007.53.
- 9. Yatsunenko T, Rey FE, Manary MJ, Trehan I, Dominguez-Bello MG, Contreras M, Magris M, Hidalgo G, Baldassano RN, Anokhin AP, Heath AC, Warner B, Reeder J, Kuczynski J, Caporaso JG, Lozupone CA, Lauber C, Clemente JC, Knights D, Knight R, Gordon JI. 2012. Human gut microbiome viewed across age and geography. Nature 486:222–227. doi: 10.1038/nature11053.
- 10. Koenig JE, Spor A, Scalfone N, Fricker AD, Stombaugh J, Knight R, Angenent LT, Ley RE. 2011. Succession of microbial consortia in the developing infant gut microbiome. Proc Natl Acad Sci U S A 108(Suppl 1):4578–4585. doi: 10.1073/pnas.1000081107.