Author manuscript; available in PMC: 2021 Oct 26.
Published in final edited form as: Gastroenterology. 2020 Sep 7;159(6):2019–2024. doi: 10.1053/j.gastro.2020.09.002

How to improve a human gut microbiome study: Pathway to success from lessons learned

Jun Miyoshi 1,2, Mrinalini C Rao 1, Eugene B Chang 1,*
PMCID: PMC8546501  NIHMSID: NIHMS1745643  PMID: 33181127

Introduction: Investigations of the human gut microbiome

The rapid evolution of cultivation-independent, multi-’omic technologies has increased our appreciation of the complexity, diversity, and vital roles of the human gut microbiome in health and disease, taking it from the realm of dark mystery to one of enlightened fascination. We now know that in health, the gut microbiome is important for the proper functioning of both digestive and extraintestinal organ systems. It has also become apparent that perturbations of the gut that disrupt critical interactions between host and gut microbes, i.e. gut dysbiosis, can cause or promote disease. These revelations became possible as new technologies to study complex microbial communities were developed. One of the first was based on the recognition that the 16S RNA of the small subunit of the prokaryotic ribosome, found in all bacteria, contains both conserved and variable regions. By sequencing the variable regions of 16S ribosomal RNA (rRNA), it became possible to phylogenetically classify microbial populations and define community membership. While this approach is highly informative and remains the standard for most human-based microbiome studies, it provides no direct information on microbial community functions, mediators, and their impact on the host. These types of insight are needed to get beyond correlative description and to achieve a better mechanistic understanding of host-microbe interactions. Even new technologies can be limiting if study design, sampling processes, and methodologies are not carefully thought through. The challenges for elucidating human-gut microbiome interactions are many and include inherent nuances of clinical research, suboptimal approaches for tissue/microbiota sampling, and limitations with both cultivation-dependent and -independent technologies.
Thus, the human gut microbiome is an open frontier with many unknowns and few precedents, awaiting new explorations that will provide a better understanding of human health and disease and lead to innovative solutions for advancing precision and outcomes in clinical practice.

Recognizing and overcoming challenges to studying the human gut microbiome; Designing the study

We start with the caveat that humans and the gut microbiome are inherently difficult to study. While cross-sectional observational studies are useful in generating testable hypotheses, they have greater value if associated metadata are collected and integrated. The first critical step is to pose a scientific and clinical question that addresses gaps in knowledge, unmet needs, therapeutics and/or best practices. The second step is to develop a study design to address these research questions that anticipates potential limitations and builds in alternative strategies that can be implemented in a timely manner. Such diligence prior to the onset of a study is especially vital to the success of any investigation involving humans. Human studies cannot be readily repeated and improperly planned studies run the risk of being inconclusive. Several crucial factors need to be considered when designing a human gut microbiome study (Table 1).

  1. The enormous variations in gut microbiota among “healthy” individuals1, 2 make it difficult to predict effect sizes. Cross-sectional studies generally require subject and sample sizes large enough to power the analysis2. Depending on the questions being asked, such power calculations can be challenging, even for experienced biostatisticians. Studies of this type are generally associative; however, they can provide mechanistic insights when coupled with approaches in which experimental parameters are carefully controlled or study groups are more homogeneous. On the other hand, small studies with a prospective or longitudinal design, in which subjects serve as their own controls, may be as informative as, and often more informative than, large cross-sectional population studies (Figure 1). In this regard, time sequence sampling matched to well-curated clinical metadata can provide great insights and point to potential cause-effect relationships.

  2. Various confounding factors affect the gut microbiome, including age3, 4, body mass index5, 6, diet5, 7–9, genetic background10, 11, sex5, 6, circadian rhythmicity12, disease activity13, medications14, and geographic environment15. Although these factors are difficult to control for technical and/or ethical reasons, not controlling them can lead to artifacts.

  3. While easy to sample, human fecal samples poorly represent microbiota of the upper and even lower gastrointestinal tract16. The microbiota composition varies along the cephalocaudal axis and between the mucus and luminal layers. Moreover, colonic transit takes hours, during which the microbiome undergoes changes in membership and function. Correlating temporal relationships to other events (meal time, sleep/wake cycles, blood glucose, activity, etc.) therefore becomes problematic. It is thus important to establish a priori whether fecal sample analyses will adequately address the proposed research question.

  4. Bioinformatic analytical approaches vary in their advantages and limitations. For example, 16S rRNA-based taxonomical profiling cannot be used to assess function of complex microbial communities. Metagenomic shotgun sequencing provides a plethora of data on community functions, but is costly, often limited by biomass, subject to human DNA contamination, and labor intensive to analyze. Investigators need to consider carefully, at the design stage, which analytical tools are most appropriate, feasible, and informative to answer their research questions.

  5. The common practice of grouping subjects into general categories (e.g. inflammatory bowel diseases (IBD), body weight, age group) to increase study population size for greater statistical power is tempting but can be counterproductive, obscuring correlations that may exist only among more defined subsets of that population. An important consideration is whether the study populations can be stratified into more homogeneous groups. As an example, Crohn’s Disease (CD) patients with ileal involvement represent a more well-defined subset than the broader descriptor of CD, which potentially includes patients with many types of disorders and genotypes sharing similar clinical features.
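
To make the contrast drawn in item 1 concrete, the following is a minimal sketch, not the authors' method, of how the number of subjects needed can differ between a two-group cross-sectional comparison and a paired longitudinal comparison. It uses a textbook normal approximation for a single continuous feature. The function names, the standardized effect size of 0.5, and the within-subject correlation of 0.8 are illustrative assumptions; real microbiome data are compositional and high-dimensional, so an actual power calculation would need a biostatistician's input.

```python
import math
from statistics import NormalDist


def _z_sum(alpha: float, power: float) -> float:
    """Sum of the two-sided alpha and power quantiles of the standard normal."""
    std_normal = NormalDist()
    return std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)


def n_per_group_cross_sectional(delta, sigma, alpha=0.05, power=0.8):
    """Subjects needed in EACH of two independent groups (normal approximation).

    delta: difference in means to detect; sigma: between-subject SD.
    """
    z = _z_sum(alpha, power)
    return math.ceil(2 * (z * sigma / delta) ** 2)


def n_subjects_longitudinal(delta, sigma, rho, alpha=0.05, power=0.8):
    """Subjects needed when each subject is sampled before and after.

    rho: assumed correlation between a subject's two measurements.
    The SD of the within-subject difference is sigma * sqrt(2 * (1 - rho)).
    """
    z = _z_sum(alpha, power)
    sigma_diff = sigma * math.sqrt(2 * (1 - rho))
    return math.ceil((z * sigma_diff / delta) ** 2)


# Hypothetical inputs: effect of 0.5 SD, within-subject correlation of 0.8.
cross_sectional_total = 2 * n_per_group_cross_sectional(delta=0.5, sigma=1.0)
longitudinal_total = n_subjects_longitudinal(delta=0.5, sigma=1.0, rho=0.8)
print("cross-sectional subjects (both groups):", cross_sectional_total)
print("longitudinal subjects (own controls):", longitudinal_total)
```

Under these assumed values the paired design needs far fewer subjects, because stable between-subject differences cancel when each subject is compared with their own baseline. The advantage shrinks as the within-subject correlation falls.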

Table 1.

Challenges in the human gut microbiome study and how to address them

Challenge: There are enormous interindividual variations in the gut microbiome
How to address:
  • Employ the longitudinal study design where subjects serve as their own controls
  • Conduct time sequence sampling matched to well-curated clinical metadata

Challenge: Various confounding factors affect the gut microbiome
How to address:
  • Control possible confounding factors among subjects
  • Consider technical and/or ethical limitations

Challenge: Fecal samples poorly represent microbiota of the upper and even lower gastrointestinal tract
How to address:
  • Collect adequate samples to answer the research question
  • Consider technical and/or ethical limitations

Challenge: Bioinformatic analytical methodologies have their advantages and limitations
How to address:
  • Understand the pros and cons of each methodology
  • Employ most appropriate, feasible, and informative approaches to answer the research question

Challenge: Grouping subjects into general categories can obscure characteristics of a more particular population
How to address:
  • Stratify subjects into more homogenous groups or more defined subsets

Figure 1.

Study designs of a cross-sectional study and a longitudinal study. (A) In a cross-sectional study, subjects are grouped (e.g. with/without an intervention) and samples obtained from all subjects are compared between the groups. (B) In a longitudinal study, samples are obtained at multiple time points from each subject (e.g. before/during/after an intervention). Samples obtained at the outset serve as each subject's own controls.

An example of an approach to these challenges is a prospective study of 17 ulcerative colitis (UC) subjects who underwent total colectomy with ileal pouch-anal anastomosis. The study started with all subjects being “disease-free” and off medications17, 18. They were followed over a two-year period with sequential endoscopic sampling of luminal and mucosa-associated microbiota of the pouch and “pre-pouch” (the region proximal to the ileal pouch). These endoscopically obtained samples were subjected to 16S rRNA gene and metagenomic sequencing analyses and compared to time-matched mucosal gene expression profiles from the same patient. All data were analyzed in the context of patient metadata, including clinical course and outcomes, with each subject serving as their own control. As predicted19, nearly half of the subjects developed pouchitis, providing a large effect size and enhancing the likelihood of establishing correlations between host and microbe. A high-resolution platform for metagenomic analysis and visualization20 revealed specific genomic clusters involved in the biosynthesis of capsular polysaccharides that differentiated mucosa-associated pouch microbes from their nearly isogenic luminal counterparts. Capsular polysaccharides have diverse physiological functions in cross-talk between microbes and the environment. This finding raised the possibility that commensal microbial strains or populations can transform to become more fit for a specific environment, leading to virulence and shifting the balance of host-microbe interactions to trigger active disease17. In addition, an anomalous genetically programmed response was identified in all patients with an ileal pouch. The gene expression changes were not associated with triggering pouchitis, i.e. they were not sufficient alone, but likely rendered these patients at risk18.
Our finding that the mucosa of the ileal pouch in UC patients came to exhibit a colon-like gene expression pattern was vetted by reanalysis of published datasets from a cross-sectional study by Morgan et al.21 with a much larger cohort of UC and familial adenomatous polyposis (FAP) pouch patients. This alteration was not observed in the ileal pouch mucosa of FAP patients. The genetic response pattern that we observed in our UC cohort probably extends to other forms of IBD, since Weiser et al.22 observed a nearly identical shift in the mucosal gene expression profile in a subset of CD patients. These studies are still ongoing to confirm the findings in a validation cohort and, equally important, to define the underlying basis for disease susceptibility and to identify, cultivate, and functionally test suspected pathobionts triggering UC pouchitis.

Whenever possible, clinical observations should be followed by experimental approaches that can test causality and define underlying mechanisms. Finally, these data must be taken back to the clinical setting to determine whether the experimental findings are relevant to, and found in, the human condition. This iterative reinforcement between clinical and experimental observations increases the likelihood of arriving at the correct answer to the clinical question.

What kind of samples are needed and how should they be analyzed for understanding the gut microbiota?: Sampling and analysis

Most studies rely on fecal samples to assess the gut microbiome because they require little other than patient volunteerism23 and minimal expertise in extracting microbial DNA for sequencing. However, feces are composed primarily of luminal, rather than mucosa-associated, colonic microbiota. They at best represent the colon and are neither surrogates for nor universal measures of dynamic and regional changes in the gut microbiome16, 24–26. Recent murine studies showed that diet and other environmental factors influence host metabolic, immune, and digestive functions, in part, via their effects on small bowel microbiota25, 27, 28. Newer technologies enabling studies of small bowel microbiota, such as less invasive sampling of small bowel mucosa and content, are urgently needed.

Although 16S rRNA gene amplicon analysis is affordable, relatively rapid, and therefore widely performed, it has several limitations. Analyzing 16S rRNA marker genes provides information about “who is there” (i.e. microbial population by general taxonomical groups or sequence clusters), but not “what are they doing” (i.e. microbial functions). Furthermore, the low resolution limits observations at best to the genus level, and the tremendous diversity of strains, even within the same species29, 30, makes it difficult to infer function by comparison to related reference genomes. Thus, 16S rRNA marker gene analysis provides limited mechanistic insights. Current alternative approaches, such as metabolomic and metatranscriptomic analyses, in vitro human organoids, and in vivo gnotobiotic animal model systems, need to be further exploited to assess microbial functions and their effects on host cells. However, these ‘omic technologies also have their downsides: they are costly and labor-intensive, and they require bioinformatic expertise and samples with significant biomass.

Human intestinal organoids derived from intestinal stem cells and animal models have been invaluable for testing hypotheses generated through clinical observation and for providing mechanistic and conceptual insights. However, the relevance of these findings to the original clinical observations must be confirmed. Discordance between observations in humans and animal models, arising from species-specific differences in host-microbe interactions, environment, diet, and genetic responses, has been reported (Ref). Thus, a combined approach involving parallel or coupled investigations using human samples and experimental models is recommended.

Bioinformatics: seeing the forest for the trees: Analytical approaches

The rapid evolution of next-generation sequencing and meta-’omic technologies has allowed for the integration and analyses of large datasets for the study of the human gut microbiome. While several open source bioinformatics platforms and reference databases are available, they pose challenges that the discerning investigator should consider when interpreting data.

  1. Bioinformatic analyses need more robust genomic and functional annotation of sequence reads and more user-friendly pipelines for data analysis, storage, and integration.

  2. The various platforms and reference databases developed for 16S rRNA analysis31–37 often employ different filters and algorithms. Therefore, the same data analyzed by different platforms can yield varied results38.

  3. Metagenomic and metatranscriptomic sequencing, which currently generates data by “shotgun sequencing” of short reads (50–150 nucleotides), and other large ‘omic datasets provide a plethora of information but require machine learning and artificial intelligence platforms to “see the forest for the trees”. The assembly and clustering steps needed to obtain genome bins (sorting short read sequences to functional subsystems), and the annotation of these analyses, can be challenging but are necessary to achieve species-level resolution. Meanwhile, as with 16S rRNA analysis, there are several ways to analyze these data20, 39–41, often resulting in varied interpretations. In addition, few studies do the diligence to follow up inferred conclusions through experimental or clinical validation (Figure 2A).
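
One hedged way to quantify how much two pipelines disagree on the same sample is to compute a dissimilarity between the taxonomic profiles they report. The sketch below uses Bray-Curtis dissimilarity, a standard ecological measure chosen here purely for illustration; it is not a procedure described by the platforms cited above. The pipeline names and relative abundances are hypothetical.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two {taxon: abundance} profiles.

    Returns 0.0 for identical profiles and 1.0 when no taxa are shared.
    Taxa missing from one profile are treated as having zero abundance.
    """
    taxa = set(profile_a) | set(profile_b)
    abs_diff = sum(abs(profile_a.get(t, 0.0) - profile_b.get(t, 0.0)) for t in taxa)
    total = sum(profile_a.get(t, 0.0) + profile_b.get(t, 0.0) for t in taxa)
    if total == 0:
        raise ValueError("both profiles are empty")
    return abs_diff / total


# Hypothetical genus-level relative abundances reported for ONE sample
# by two different (made-up) analysis pipelines.
pipeline_a = {"Bacteroides": 0.5, "Faecalibacterium": 0.3, "Escherichia": 0.2}
pipeline_b = {
    "Bacteroides": 0.4,
    "Faecalibacterium": 0.3,
    "Escherichia": 0.1,
    "Unclassified": 0.2,
}

disagreement = bray_curtis(pipeline_a, pipeline_b)
print(f"Bray-Curtis dissimilarity between pipelines: {disagreement:.2f}")
```

A value near zero would suggest the two pipelines tell the same story for that sample. A value comparable to the differences seen between study groups would be a warning that conclusions may depend on the choice of pipeline.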

Figure 2.

Metagenomic shotgun sequencing analysis. (A) Short-read based analysis and binning-based analysis. Machine-learning and artificial-intelligence platforms have the potential to improve the quality of data analysis. (B) An interdisciplinary approach is needed to obtain mechanistic insights and establish clinical relevance.

An assembly and visualization platform (anvi’o)20 has the potential to yield long-read contigs and more accurate annotations. With deeper shotgun sequencing of a sample, it has been possible to derive metagenome-assembled genomes (MAGs) that provide an extremely high level of genomic resolution of individual microbial populations17, 42. The ability to define specific MAGs in a clinical sample will provide nearly complete draft genomes, making metagenomic sequence analysis invaluable. Even with this high resolution, MAGs still need to be confirmed by traditional cultivation and whole genome sequencing of the MAG-relevant strains. This will reveal subtle genomic variations even among nearly isogenic microbial strains. Such variations could potentially account for differences in phenotype and function and provide mechanistic insights at genomic-level resolution. Cultivated microbial strains can be further characterized for phenotype and function in patient-derived intestinal organoids in vitro and in gnotobiotic animal models in vivo (Figure 2B). The undergirding premise should always be clinical relevance and physiological significance.

Gut microbiome dark matter – unknown and unexplored space

The bacterial kingdom of the gut microbiome has received the greatest attention because technologies, reference annotations, and bioinformatic platforms are available to study it. However, a large part of the gut microbiome, namely the fungal, archaebacterial, and bacteriophage kingdoms, and its relevance to human health and disease remain unexplored because of a paucity of necessary tools, genome annotations, and technology. The unmet need is to develop ways to navigate this space as new clinical opportunities and advances in experimental approaches emerge.

Conclusions and Future Directions

Human gut microbiome studies need to go beyond description and correlation. This starts with well-articulated studies having defined research questions, strong scientific merit, and a clear plan of execution including alternate strategies. Given the tremendous inter-individual variation of the human gut microbiome, changes in clinical states over time (e.g. remission and relapse), and other confounding factors inherent to cross-sectional human subject studies, we submit that more can be gained through longitudinal investigations involving time sequence analyses of the gut microbiome relative to clinical metadata and outcomes. Resultant hypotheses and mechanistic insights should then be vetted through experimental models. In turn, these findings should be evaluated in the context of clinical observations to elucidate their physiological relevance. New technological and bioinformatics advances should be applied when appropriate, with the recognition that each has limitations and challenges. The good news is that the field will move forward as we learn how to better navigate this space, honing existing paths and creating new avenues.

References

  • 1. Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 2012;486:207–14.
  • 2. Falony G, Joossens M, Vieira-Silva S, et al. Population-level analysis of gut microbiome variation. Science 2016;352:560–4.
  • 3. Saraswati S, Sitaraman R. Aging and the human gut microbiota-from correlation to causality. Front Microbiol 2014;5:764.
  • 4. O’Toole PW, Jeffery IB. Gut microbiota and aging. Science 2015;350:1214–5.
  • 5. Dominianni C, Sinha R, Goedert JJ, et al. Sex, body mass index, and dietary fiber intake influence the human gut microbiome. PLoS One 2015;10:e0124599.
  • 6. Haro C, Rangel-Zuniga OA, Alcala-Diaz JF, et al. Intestinal Microbiota Is Influenced by Gender and Body Mass Index. PLoS One 2016;11:e0154090.
  • 7. Wu GD, Chen J, Hoffmann C, et al. Linking long-term dietary patterns with gut microbial enterotypes. Science 2011;334:105–8.
  • 8. David LA, Maurice CF, Carmody RN, et al. Diet rapidly and reproducibly alters the human gut microbiome. Nature 2014;505:559–63.
  • 9. Kashtanova DA, Popenko AS, Tkacheva ON, et al. Association between the gut microbiota and diet: Fetal life, early childhood, and further life. Nutrition 2016;32:620–7.
  • 10. Goodrich JK, Waters JL, Poole AC, et al. Human genetics shape the gut microbiome. Cell 2014;159:789–99.
  • 11. Blekhman R, Goodrich JK, Huang K, et al. Host genetic variation impacts microbiome composition across human body sites. Genome Biol 2015;16:191.
  • 12. Leone V, Gibbons SM, Martinez K, et al. Effects of diurnal variation of gut microbes and high-fat feeding on host circadian clock function and metabolism. Cell Host Microbe 2015;17:681–9.
  • 13. Lewis JD, Chen EZ, Baldassano RN, et al. Inflammation, Antibiotics, and Diet as Environmental Stressors of the Gut Microbiome in Pediatric Crohn’s Disease. Cell Host Microbe 2015;18:489–500.
  • 14. Maier L, Pruteanu M, Kuhn M, et al. Extensive impact of non-antibiotic drugs on human gut bacteria. Nature 2018;555:623–628.
  • 15. Yatsunenko T, Rey FE, Manary MJ, et al. Human gut microbiome viewed across age and geography. Nature 2012;486:222–7.
  • 16. Zmora N, Zilberman-Schapira G, Suez J, et al. Personalized Gut Mucosal Colonization Resistance to Empiric Probiotics Is Associated with Unique Host and Microbiome Features. Cell 2018;174:1388–1405 e21.
  • 17. Vineis JH, Ringus DL, Morrison HG, et al. Patient-Specific Bacteroides Genome Variants in Pouchitis. MBio 2016;7.
  • 18. Huang Y, Dalal S, Antonopoulos D, et al. Early Transcriptomic Changes in the Ileal Pouch Provide Insight into the Molecular Pathogenesis of Pouchitis and Ulcerative Colitis. Inflamm Bowel Dis 2017;23:366–378.
  • 19. Mahadevan U, Sandborn WJ. Diagnosis and management of pouchitis. Gastroenterology 2003;124:1636–50.
  • 20. Eren AM, Esen OC, Quince C, et al. Anvi’o: an advanced analysis and visualization platform for ‘omics data. PeerJ 2015;3:e1319.
  • 21. Morgan XC, Kabakchiev B, Waldron L, et al. Associations between host gene expression, the mucosal microbiome, and clinical outcome in the pelvic pouch of patients with inflammatory bowel disease. Genome Biol 2015;16:67.
  • 22. Weiser M, Simon JM, Kochar B, et al. Molecular classification of Crohn’s disease reveals two clinically relevant subtypes. Gut 2018;67:36–42.
  • 23. Del Savio L, Prainsack B, Buyx A. Motivations of participants in the citizen science of microbiomics: data from the British Gut Project. Genet Med 2017;19:959–961.
  • 24. Donaldson GP, Lee SM, Mazmanian SK. Gut biogeography of the bacterial microbiota. Nat Rev Microbiol 2016;14:20–32.
  • 25. Martinez-Guryn K, Hubert N, Frazier K, et al. Small Intestine Microbiota Regulate Host Digestive and Absorptive Adaptive Responses to Dietary Lipids. Cell Host Microbe 2018;23:458–469 e5.
  • 26. Martinez-Guryn K, Leone V, Chang EB. Regional Diversity of the Gastrointestinal Microbiome. Cell Host Microbe 2019;26:314–324.
  • 27. Wang L, Fouts DE, Starkel P, et al. Intestinal REG3 Lectins Protect against Alcoholic Steatohepatitis by Reducing Mucosa-Associated Microbiota and Preventing Bacterial Translocation. Cell Host Microbe 2016;19:227–39.
  • 28. Wang Y, Kuang Z, Yu X, et al. The intestinal microbiota regulates body composition through NFIL3 and the circadian clock. Science 2017;357:912–916.
  • 29. Meyer F, Trimble WL, Chang EB, et al. Functional predictions from inference and observation in sequence-based inflammatory bowel disease research. Genome Biol 2012;13:169.
  • 30. Morgan XC, Segata N, Huttenhower C. Biodiversity and functional genomics in the human microbiome. Trends Genet 2013;29:51–8.
  • 31. Caporaso JG, Kuczynski J, Stombaugh J, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods 2010;7:335–6.
  • 32. PeerJ-Preprints. QIIME 2: Reproducible, interactive, scalable, and extensible microbiome data science, 2018.
  • 33. Schloss PD, Westcott SL, Ryabin T, et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 2009;75:7537–41.
  • 34. Eren AM, Morrison HG, Lescault PJ, et al. Minimum entropy decomposition: unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences. ISME J 2015;9:968–79.
  • 35. DeSantis TZ, Hugenholtz P, Larsen N, et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 2006;72:5069–72.
  • 36. Pruesse E, Quast C, Knittel K, et al. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 2007;35:7188–96.
  • 37. Cole JR, Wang Q, Cardenas E, et al. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 2009;37:D141–5.
  • 38. Sinha R, Abu-Ali G, Vogtmann E, et al. Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. Nat Biotechnol 2017;35:1077–1086.
  • 39. Meyer F, Paarmann D, D’Souza M, et al. The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 2008;9:386.
  • 40. Segata N, Waldron L, Ballarini A, et al. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat Methods 2012;9:811–4.
  • 41. Wattam AR, Davis JJ, Assaf R, et al. Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center. Nucleic Acids Res 2017;45:D535–D542.
  • 42. Lee STM, Kahn SA, Delmont TO, et al. Tracking microbial colonization in fecal microbiota transplantation experiments via genome-resolved metagenomics. Microbiome 2017;5:50.
