Abstract
Background:
Biological factors impact the human microbiome, highlighting the need for reasonably estimating sample sizes in future population studies.
Methods:
We assessed the temporal stability of fecal microbiome diversity, species composition, and genes and functional pathways through shallow shotgun metagenome sequencing. Using intraclass correlation coefficients (ICC), we measured biological variability over six months. We estimated case numbers for 1:1 or 1:3 matched case-control studies, considering significance levels of 0.05 and 0.001 with 80% power, based on the collected fecal specimens per participant.
Results:
The fecal microbiome’s temporal stability over six months varied (ICC <0.6) for most alpha and beta diversity metrics. Heterogeneity was seen in species, genes, and pathways stability (ICC 0.0–0.9). Detecting an odds ratio of 1.5 per standard deviation required 1,000–5,000 cases (0.05 significance for alpha and beta; 0.001 for species, genes, pathways) with equal cases and controls. Low-prevalent species needed 15,102 cases; high-prevalent species required 3,527. Similar needs applied to genes and pathways. In a 1:3 matched case-control study with one fecal specimen, 10,068 cases were needed for low-prevalent species; 2,351 for high-prevalent species. For odds ratios of 1.5 with multiple specimens, cases needed for low-prevalent species were 15,102 (one specimen), 8,267 (two specimens), and 5,989 (three specimens).
Conclusions:
Detecting disease associations requires a large number of cases. Repeating prediagnostic samples and matching cases to more controls could decrease the needed number of cases for such detections.
Impact:
Our results will help future epidemiologic studies design and implement well-powered microbiome studies.
Introduction
Human microbiome research is a rapidly expanding field with promising advances (1), allowing the assessment of associations between the human microbiome and several chronic diseases such as obesity (2,3), diabetes (4,5) and different types of cancer (6,7). However, our comprehension of the structure and function of the human microbiome is generally based on observations from cross-sectional studies (8–10). Thus, the current findings from these studies do not eliminate the threats of reverse causation and confounding bias (11,12).
In most current epidemiological studies, the exposure (e.g., the gut microbiome) was assessed only through samples collected at baseline or around the time of diagnosis. Because the relationship between the human microbiome and disease outcomes is complex, such designs cannot establish whether differences in microbial composition and taxa associations are a cause or a consequence of the disease. Particularly in the case of the human microbiome, a baseline assessment could differ from a sample collected months or years earlier or later due to the dynamic nature of the microbiome (13–15). Previous studies have shown that host lifestyle or age-dependent exposures, such as changes in diet (16,17) or use of drugs (18,19), continuously impact the microbiome over time. Indeed, microbiome data present unique challenges due to various biological factors, multiple molecular measurement types, and technical and biological variability (20). It is therefore essential to carefully evaluate statistical power and sample size requirements, particularly in the context of complex human populations and the inherent variability in microbiome data.
As a continuation of a previous study using 16S rRNA gene sequencing (21), we evaluated the temporal stability of fecal microbial diversity, composition, and functional profile based on shallow shotgun metagenome sequencing, which sequences at a shallower depth than whole genome sequencing while maintaining deep taxonomic resolution (22), from randomly selected participants in a population-based study in Guanacaste, Costa Rica (23) with two fecal samples separated by an interval of six months. Based on the estimated degree of temporal microbiome variability within individuals, we calculated sample size requirements for future case-control studies nested within prospective studies.
Materials and Methods
Study population.
The present study is nested within a population-based study conducted in the Guanacaste region of Costa Rica. This population was described in detail previously (24). Briefly, we included 40 participants with two fecal samples, separated by an interval of six months between the first and second clinic visits. All participants provided written informed consent. The study was approved by the Special Studies Institutional Review Board (IRB) at the National Institutes of Health (NIH) and the Costa Rican IRB CEC-UNA (Comité Ético Científico de la Universidad Nacional).
Fecal specimen collection.
Fecal samples were self-collected at home by the participants within 24 hours of their clinic visits, from their first bowel movement of the day. Fecal samples were collected using four cryovials with attached scoopers on the lids, containing 2.5 mL of RNAlater, and kept in a thermo-safe container with dry ice for immediate freezing of specimens. Fecal samples were returned to the clinic by the participants or retrieved by study staff within 24 hours of collection. All specimens were stored in liquid nitrogen at the Proyecto Epidemiológico Guanacaste repository and shipped to the NCI repository in Frederick, Maryland.
DNA extraction and sequencing.
DNA extraction and sequencing were performed at Diversigen, Inc. (New Brighton, Minnesota). Up to 250 µL of primary fecal sample was extracted with PowerSoil Pro (Qiagen) automated for high throughput on the QiaCube HT (Qiagen), using Powerbead Pro Plates (Qiagen) with 0.5 mm and 0.1 mm ceramic beads, following the manufacturer’s instructions. After purification, DNA concentrations were quantified using the Quant-iT PicoGreen dsDNA Assay (Invitrogen). Library preparation for shallow shotgun metagenome sequencing was performed with a procedure adapted from the Illumina DNA Prep kit (Illumina). Libraries were sequenced on a NovaSeq 6000 System (Illumina) using the 2 × 150 bp paired-end protocol following the manufacturer’s instructions at an average depth of 2,053,445 reads per sample. We included standard quality control samples from five volunteers with different phenotypes (i.e., one obese female, one healthy male, one male on a low-carb diet, one infant, and one male with Crohn’s Disease) at baseline. Technical reproducibility was good to excellent for most microbiome diversity metrics and species’ relative abundances, with ICCs varying between 0.8 and 0.9.
Bioinformatic processing.
After sequencing, DNA sequences were filtered for low quality (Q-Score < 30) and length (< 60 bp), and adapter sequences were trimmed using BBMap from the BBTools suite. After quality trimming and filtering, host filtering was performed by aligning reads to a human reference genome (hg19) obtained from Bowtie2 (v2.4.2) (25). Bacterial taxonomic annotations were performed using MetaPhlAn 3.0.2 (July 2020 version) (26) based on a custom marker gene database. The sequence table and phylogenetic tree were then generated with the QIIME2 (2020.8) package (27). Gene and pathway profiling was performed using HUMAnN3 (v3.0.0-alpha) with default parameters (28). Genes were annotated using the KEGG Orthology database (29). Pathways were annotated using the pathway databases included with HUMAnN3 for alignments to UniRef gene families. Unclassified and unmapped categories were removed, and data were re-normalized for further analyses.
Alpha and beta diversity measures were calculated after rarefaction to 474,445 reads per sample; no samples were excluded after rarefaction. Alpha diversity measures, i.e., observed species to estimate taxa richness, Shannon Diversity to estimate taxa richness and evenness, and Faith’s Phylogenetic Diversity to consider phylogenetic differences in samples, were calculated using the R phyloseq package (30). For beta diversity measures, the Bray–Curtis and Jaccard distances were calculated to capture differences in species abundances and presence/absence, respectively, between samples using the R vegan package (31). Unweighted and weighted UniFrac distances were calculated to capture differences between samples based on sequence distances using the R GUniFrac package (32). Using non-rarefied data, species, genes and pathways with relative abundance <0.001% and prevalence <5% were filtered out using the R microbiome package (http://microbiome.github.com/microbiome). The remaining species, genes and pathways were then categorized into the following prevalence categories: 5%–10%, 10%–25%, 25%–50%, 50%–75%, >75%.
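The prevalence filtering and binning described above can be illustrated with a short Python sketch. This is not the R microbiome package code used in the study. The function name, the detection threshold of 1e-5 (0.001% expressed as a proportion), and the handling of bin boundaries (right-closed intervals) are assumptions made for illustration.

```python
import numpy as np

# Prevalence category labels used in the text.
LABELS = ["5%-10%", "10%-25%", "25%-50%", "50%-75%", ">75%"]


def prevalence_categories(rel_abund, detection=1e-5, min_prevalence=0.05):
    """Compute per-feature prevalence and assign a prevalence category.

    rel_abund: array of shape (n_samples, n_features) holding relative
    abundances as proportions. A feature counts as present in a sample
    when its relative abundance exceeds `detection` (an assumed cut-off).
    Features with prevalence below `min_prevalence` get category None,
    meaning they would be filtered out.
    """
    rel_abund = np.asarray(rel_abund, dtype=float)
    prev = (rel_abund > detection).mean(axis=0)
    # right=True gives bins (.., 0.10], (0.10, 0.25], (0.25, 0.50],
    # (0.50, 0.75], (0.75, ..); this boundary convention is an assumption.
    idx = np.digitize(prev, [0.10, 0.25, 0.50, 0.75], right=True)
    cats = [LABELS[i] if p >= min_prevalence else None
            for i, p in zip(idx, prev)]
    return prev, cats
```

Calling `prevalence_categories(table)` on a samples-by-features table returns the prevalence of each feature together with its category label, so that ICCs can later be summarized within each category.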
Statistical analysis.
For each microbiome metric, we quantified the biological variability over six months using intraclass correlation coefficients (ICC) estimated by a linear mixed-effects model. The ICCs were calculated on the basis of (i) three alpha diversity metrics (observed species, Shannon Diversity, and Faith’s Phylogenetic Diversity); (ii) the first three principal coordinates (PC1, PC2 and PC3) of four beta diversity metrics (Bray-Curtis, 21.0%, 11.2% and 8.3% of explained variability for PC1, PC2 and PC3 respectively; Jaccard, 14.9%, 7.5% and 6.3% for PC1, PC2 and PC3; unweighted UniFrac, 17.0%, 12.3% and 8.2% for PC1, PC2 and PC3; weighted UniFrac, 39.7%, 13.4% and 10.0% for PC1, PC2 and PC3); (iii) the untransformed, the square root-transformed, and the centered log ratio-transformed relative abundance of the species which were present in at least 5% of samples with relative abundance of >0; (iv) relative abundance of genes and pathways which were present in at least 5% of samples with relative abundance of >0. The 95% confidence intervals for the estimated ICCs were approximated using 1,000 bootstrap samples. Based on the 95% confidence interval of the ICC estimate, values were interpreted as follows: less than 0.5 indicates poor reliability, between 0.5 and 0.75 suggests moderate reliability, between 0.75 and 0.9 implies good reliability, and greater than 0.90 reflects excellent reliability (33).
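As an illustration of the ICC and its bootstrap confidence interval, the following Python sketch uses the one-way random-effects ANOVA estimator in place of the linear mixed-effects model fitted in the study. For balanced data with a random participant intercept the two approaches are closely related, but this is a stand-in, not the authors' code. The function names, the truncation of negative estimates at zero, and the resampling of participants (rather than individual samples) are assumptions.

```python
import numpy as np


def icc_oneway(x):
    """One-way random-effects ICC for an (n_subjects, k_repeats) array."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    subj_mean = x.mean(axis=1)
    grand_mean = x.mean()
    # Between-subject and within-subject mean squares.
    msb = k * np.sum((subj_mean - grand_mean) ** 2) / (n - 1)
    msw = np.sum((x - subj_mean[:, None]) ** 2) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    if denom == 0:
        return 0.0
    # Truncate at zero, mirroring a non-negative variance component.
    return max(0.0, (msb - msw) / denom)


def bootstrap_icc_ci(x, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI, resampling subjects with replacement."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    stats = np.array([
        icc_oneway(x[rng.integers(0, n, size=n)]) for _ in range(n_boot)
    ])
    alpha = 1.0 - level
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

With two visits per participant, `x` would be a 40 × 2 array holding one microbiome metric (for example, Shannon diversity at the first and second visit).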
We evaluated the effect of microbiome variability on statistical power; the detailed statistical methods and R code for power calculation were previously provided (21). Briefly, we evaluated the impact of the temporal variability of each microbiome metric as measured by the ICC and the inclusion of multiple fecal samples (i.e., 1, 2, or 3) to better estimate the “long-term” microbiome metric on sample size calculations. For a logistic regression model, we estimated the sample size required to detect specified ORs (i.e., 1.5, 2.0, 2.5, 3.0, and 3.5) for the top quartile compared to the bottom quartile of the microbiome metrics based on the estimated ICCs. We calculated the required numbers of cases for 1:1 matched or 1:3 matched case-control studies at significance levels of 0.05, 0.01, 0.001, and 0.0001 with 80% power. We considered sample size requirements based on (i) the estimated ICCs from the three alpha diversity metrics (observed species, Shannon Diversity, and Faith’s Phylogenetic Diversity); (ii) the estimated ICCs of the first three principal coordinates (PC1, PC2 and PC3) of the four beta diversity metrics (Bray-Curtis, Jaccard, unweighted UniFrac, and weighted UniFrac); (iii) the estimated median ICC of the untransformed, the square root-transformed, and the centered log ratio-transformed relative abundance of the species in each prevalence category (i.e., 5%–10%, 10%–25%, 25%–50%, 50%–75%, >75%); and (iv) the estimated median ICC of the relative abundance of genes and pathways in each prevalence category.
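The way additional specimens and additional matched controls reduce the required number of cases can be sketched in Python. This is not the power-calculation code from reference 21. It rests on two assumptions: that the required number of cases is inversely proportional to the reliability of the averaged measurement (Spearman-Brown formula), and that matching m controls per case scales the 1:1 requirement by the standard factor (1 + 1/m)/2. The function names are hypothetical.

```python
def reliability_of_mean(icc, k):
    """Reliability of the mean of k specimens (Spearman-Brown formula)."""
    return k * icc / (1 + (k - 1) * icc)


def cases_with_k_specimens(n_one_specimen, icc, k):
    """Rescale a single-specimen case requirement to k specimens.

    Assumes the required number of cases is inversely proportional to
    the reliability of the averaged microbiome measurement.
    """
    return n_one_specimen * icc / reliability_of_mean(icc, k)


def cases_with_m_controls(n_one_to_one, m):
    """Rescale a 1:1 matched case requirement to m controls per case."""
    return n_one_to_one * (1 + 1 / m) / 2
```

Applied to the single-specimen requirement reported for low-prevalence species (15,102 cases, median ICC = 0.095), this approximation gives about 8,268 cases for two specimens and about 5,990 for three, close to the reported 8,267 and 5,989. With three controls per case it gives 10,068 cases, matching the reported 1:3 figure.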
Data availability.
Data are available at the National Center for Biotechnology Information Sequence Read Archive (accession number PRJNA1045701).
Results
Among the 40 participants with two fecal samples, age ranged from 21 to 81 years, 55% (N=22) were male, 62.5% (N=25) were married, and 62.5% (N=25) had completed primary school. Overall, 92.5% (N=37) of participants reported ever drinking alcohol, and 30% (N=12) reported ever smoking cigarettes. In addition, 80% (N=32) of participants did not consume probiotic supplements or yogurt at least once per week. Finally, none of the participants received antibiotics within the two to three months prior to the first fecal sample collection; 17.5% (N=7) of participants received antibiotics within the two to three months prior to the second fecal sample collection.
Temporal stability over six months.
Temporal stability as measured by ICC based on the alpha diversity metrics varied between 0.13 for Shannon diversity and 0.43 for the observed number of species and Faith’s phylogenetic diversity (Figure 1, Supplementary Table 1). Based on the first principal coordinate of beta diversity measures, the fecal microbiome showed poor to good stability over 6 months (ICC = 0.63 for Bray-Curtis and Jaccard, 0.90 for unweighted UniFrac, suggesting that the presence of taxa was reasonably stable, and 0.34 for weighted UniFrac; Figure 1 and Supplementary Table 1).
Figure 1.

ICC based on three alpha diversity metrics (observed species, Shannon Diversity, and Faith’s Phylogenetic Diversity) and the first three principal coordinates (PC1, PC2 and PC3) of four beta diversity metrics (Bray-Curtis, 21.0%, 11.2% and 8.3% of explained variability for PC1, PC2 and PC3 respectively; Jaccard, 14.9%, 7.5% and 6.3% for PC1, PC2 and PC3; unweighted UniFrac, 17.0%, 12.3% and 8.2% for PC1, PC2 and PC3; weighted UniFrac, 39.7%, 13.4% and 10.0% for PC1, PC2 and PC3). ICC are interpreted based on their value: values closer to 1 indicate excellent agreement or stability, values around 0.5 suggest moderate agreement, and values closer to 0 indicate poor agreement or stability.
For the untransformed, the square root-transformed, and the centered log ratio-transformed relative abundances of 160 species, there was a wide range of ICCs from 0.00 to >0.95 (Supplementary Tables 2–4). Based on prevalence categories (i.e., 5%–10%, 10%–25%, 25%–50%, 50%–75%, >75%), the median ICC varied from 0.09 for the 5%–10% category to 0.45 for the 50%–75% category for untransformed relative abundances; from 0.33 for the 5%–10% category to 0.56 for the 50%–75% category for square root-transformed relative abundances; or from 0.37 for the >75% category to 0.70 for the 50%–75% category for centered log ratio-transformed relative abundances (Figure 2A, Figure 2B, Figure 2C). Overall, the transformed relative abundances of species showed higher ICCs compared to the untransformed relative abundances of species.
Figure 2.

Box plot distribution of ICCs based on (A) the untransformed, (B) the square root-transformed, and (C) the centered log ratio-transformed relative abundance of the species which were present in at least 5% of samples with relative abundance of >0, categorized by prevalence categories: 5%–10%, 10%–25%, 25%–50%, 50%–75%, >75%. The box represents the interquartile range of ICC, the line inside the box shows the median ICC, and the whiskers indicate the range of ICC, while outliers are plotted as individual points. ICC are interpreted based on their value: values closer to 1 indicate excellent agreement or stability, values around 0.5 suggest moderate agreement, and values closer to 0 indicate poor agreement or stability.
For the relative abundance of 3,404 genes and 351 pathways, the ICCs ranged from 0.00 to >0.95 for both genes and pathways (Supplementary Tables 5–6). Based on prevalence categories (i.e., 5%–10%, 10%–25%, 25%–50%, 50%–75%, >75%), the median ICC varied from 0.22 for the 10%–25% category to 0.37 for the >75% category for the relative abundance of genes, and from 0.05 for the 5%–10% category to 0.46 for the >75% category for the relative abundance of pathways (Figure 3A, Figure 3B).
Figure 3.

Box plot distribution of ICCs based on (A) the relative abundance of genes and (B) the relative abundance of pathways, categorized by prevalence categories: 5%–10%, 10%–25%, 25%–50%, 50%–75%, >75%. The box represents the interquartile range of ICC, the line inside the box shows the median ICC, and the whiskers indicate the range of ICC, while outliers are plotted as individual points. ICC are interpreted based on their value: values closer to 1 indicate excellent agreement or stability, values around 0.5 suggest moderate agreement, and values closer to 0 indicate poor agreement or stability.
Estimates of sample size requirements for microbiome studies using metagenomics.
Assuming a 1:1 matched case-control study at a significance level of 0.05 based on one fecal specimen per participant, detecting a relatively weak association (i.e., odds ratio = 1.5) between alpha/beta diversity metrics and an outcome would require between 1,000 and 5,000 cases (Table 1). In contrast, for a stronger association (i.e., odds ratio = 3.5) with one fecal specimen at a significance level of 0.05, approximately 100–500 cases would be sufficient for all diversity metrics when matching one case to one control. Using the same study setting with multiple specimens per participant over time, the required sample size would be lower. Indeed, detection of an odds ratio of 1.5 for the number of observed species (ICC = 0.43) would require 1,518 cases with one specimen, 1,088 cases with two specimens, and 945 cases with three specimens. Detecting an odds ratio of 1.5 for the first principal coordinate of Bray-Curtis (ICC = 0.63) would require 1,047 cases with one specimen, 853 cases with two specimens, and 788 cases with three specimens. Assuming a 1:3 matched case-control study at a significance level of 0.05 based on one fecal specimen per participant, detecting an odds ratio of 1.5 between alpha/beta diversity metrics and an outcome would require between 1,000 and 3,500 cases (Supplementary Table 7).
Table 1.
Number of cases (assuming 1:1 matched case-control study) required to detect an association at a significance level of 0.05 with 80% power, based on 1, 2, or 3 fecal specimens per participant and intraclass correlation coefficients estimated from fecal samples collected at 6-months interval. Odds ratio for the top 25% diversity metric versus the bottom 25% diversity metric.
| Measure | Estimated ICC | No. of Specimens | OR 1.5 | OR 2.0 | OR 2.5 | OR 3.0 | OR 3.5 |
|---|---|---|---|---|---|---|---|
| Alpha Diversity | | | | | | | |
| No. of observed species | 0.43 | 1 | 1518 | 506 | 277 | 192 | 153 |
| | | 2 | 1088 | 362 | 198 | 138 | 110 |
| | | 3 | 945 | 315 | 172 | 120 | 95 |
| Shannon index | 0.13 | 1 | 5182 | 1727 | 946 | 657 | 524 |
| | | 2 | 2920 | 973 | 533 | 370 | 295 |
| | | 3 | 2166 | 722 | 395 | 275 | 219 |
| Faith’s PD | 0.43 | 1 | 1513 | 504 | 276 | 192 | 153 |
| | | 2 | 1086 | 362 | 198 | 137 | 109 |
| | | 3 | 943 | 314 | 172 | 119 | 95 |
| Beta Diversity | | | | | | | |
| Bray-Curtis PC1 | 0.63 | 1 | 1047 | 349 | 191 | 133 | 106 |
| | | 2 | 853 | 284 | 155 | 108 | 86 |
| | | 3 | 788 | 262 | 143 | 100 | 79 |
| Bray-Curtis PC2 | 0.61 | 1 | 1070 | 356 | 195 | 135 | 108 |
| | | 2 | 864 | 288 | 157 | 109 | 87 |
| | | 3 | 795 | 265 | 145 | 101 | 80 |
| Bray-Curtis PC3 | 0.45 | 1 | 1518 | 506 | 277 | 192 | 153 |
| | | 2 | 1088 | 362 | 198 | 138 | 110 |
| | | 3 | 945 | 315 | 172 | 120 | 95 |
| Jaccard PC1 | 0.62 | 1 | 5182 | 1727 | 946 | 657 | 524 |
| | | 2 | 2920 | 973 | 533 | 370 | 295 |
| | | 3 | 2166 | 722 | 395 | 275 | 219 |
| Jaccard PC2 | 0.66 | 1 | 1513 | 504 | 276 | 192 | 153 |
| | | 2 | 1086 | 362 | 198 | 137 | 109 |
| | | 3 | 943 | 314 | 172 | 119 | 95 |
| Jaccard PC3 | 0.49 | 1 | 1047 | 349 | 191 | 133 | 106 |
| | | 2 | 853 | 284 | 155 | 108 | 86 |
| | | 3 | 788 | 262 | 143 | 100 | 79 |
| Unweighted UniFrac PC1 | 0.90 | 1 | 1070 | 356 | 195 | 135 | 108 |
| | | 2 | 864 | 288 | 157 | 109 | 87 |
| | | 3 | 795 | 265 | 145 | 101 | 80 |
| Unweighted UniFrac PC2 | 0.78 | 1 | 1518 | 506 | 277 | 192 | 153 |
| | | 2 | 1088 | 362 | 198 | 138 | 110 |
| | | 3 | 945 | 315 | 172 | 120 | 95 |
| Unweighted UniFrac PC3 | 0.39 | 1 | 5182 | 1727 | 946 | 657 | 524 |
| | | 2 | 2920 | 973 | 533 | 370 | 295 |
| | | 3 | 2166 | 722 | 395 | 275 | 219 |
| Weighted UniFrac PC1 | 0.34 | 1 | 1513 | 504 | 276 | 192 | 153 |
| | | 2 | 1086 | 362 | 198 | 137 | 109 |
| | | 3 | 943 | 314 | 172 | 119 | 95 |
| Weighted UniFrac PC2 | 0.20 | 1 | 1047 | 349 | 191 | 133 | 106 |
| | | 2 | 853 | 284 | 155 | 108 | 86 |
| | | 3 | 788 | 262 | 143 | 100 | 79 |
| Weighted UniFrac PC3 | 0.65 | 1 | 1070 | 356 | 195 | 135 | 108 |
| | | 2 | 864 | 288 | 157 | 109 | 87 |
| | | 3 | 795 | 265 | 145 | 101 | 80 |
When matching one case to one control with one fecal specimen per participant at P = 0.001, to mimic multiple testing correction, small associations (i.e., odds ratio = 1.5) between individual species and an outcome could only be detected in studies with large sample sizes (Table 2). Specifically, for low-prevalence species with untransformed relative abundance (5%–10%, median ICC = 0.09), 15,102 cases would be required; in contrast, for species with high prevalence (>75%, median ICC = 0.41), 3,527 cases would be necessary. The estimates of sample size requirements to detect associations between rare species and an outcome were more optimistic when using transformations on relative abundances of species. Differences in the estimates of sample size requirements between transformed and untransformed relative abundances of species were smaller for species with the highest prevalence (>75%) (Supplementary Table 8).
Table 2.
Number of cases (assuming 1:1 matched case-control study) required to detect an association at a significance level of 0.001 with 80% power, based on 1, 2, or 3 fecal specimens per participant and median intraclass correlation coefficients estimated from fecal samples collected at 6-months interval. Odds ratio for the top 25% untransformed and transformed relative abundance of species versus the bottom 25% in each prevalence category (i.e., 5%–10%, 10%–25%, 25%–50%, 50%–75%, >75%).
| Measure and Prevalence | Median ICC | No. of Specimens | OR 1.5 | OR 2.0 | OR 2.5 | OR 3.0 | OR 3.5 |
|---|---|---|---|---|---|---|---|
| Relative Abundance of Species (no transformation) | | | | | | | |
| 5%–10% | 0.095 | 1 | 15102 | 5034 | 2757 | 1917 | 1528 |
| | | 2 | 8267 | 2755 | 1509 | 1049 | 836 |
| | | 3 | 5989 | 1996 | 1093 | 760 | 606 |
| 10%–25% | 0.33 | 1 | 4387 | 1462 | 801 | 557 | 444 |
| | | 2 | 2910 | 970 | 531 | 369 | 294 |
| | | 3 | 2417 | 805 | 441 | 306 | 244 |
| 25%–50% | 0.41 | 1 | 3485 | 1161 | 636 | 442 | 352 |
| | | 2 | 2459 | 819 | 449 | 312 | 248 |
| | | 3 | 2116 | 705 | 386 | 268 | 214 |
| 50%–75% | 0.45 | 1 | 3195 | 1065 | 583 | 405 | 323 |
| | | 2 | 2314 | 771 | 422 | 293 | 234 |
| | | 3 | 2020 | 673 | 368 | 256 | 204 |
| >75% | 0.41 | 1 | 3527 | 1175 | 644 | 447 | 357 |
| | | 2 | 2480 | 826 | 452 | 314 | 251 |
| | | 3 | 2130 | 710 | 389 | 270 | 215 |
| Relative Abundance of Species (square root transformation) | | | | | | | |
| 5%–10% | 0.33 | 1 | 4327 | 1442 | 790 | 549 | 438 |
| | | 2 | 2880 | 960 | 525 | 365 | 291 |
| | | 3 | 2397 | 799 | 437 | 304 | 242 |
| 10%–25% | 0.53 | 1 | 2719 | 906 | 496 | 345 | 275 |
| | | 2 | 2076 | 692 | 379 | 263 | 210 |
| | | 3 | 1861 | 620 | 339 | 236 | 188 |
| 25%–50% | 0.55 | 1 | 2604 | 868 | 475 | 330 | 263 |
| | | 2 | 2018 | 672 | 368 | 256 | 204 |
| | | 3 | 1823 | 607 | 332 | 231 | 184 |
| 50%–75% | 0.56 | 1 | 2542 | 847 | 464 | 322 | 257 |
| | | 2 | 1987 | 662 | 363 | 252 | 201 |
| | | 3 | 1802 | 600 | 329 | 228 | 182 |
| >75% | 0.44 | 1 | 3226 | 1075 | 589 | 409 | 326 |
| | | 2 | 2329 | 776 | 425 | 295 | 235 |
| | | 3 | 2030 | 676 | 370 | 257 | 205 |
| Relative Abundance of Species (centered log-ratio transformation) | | | | | | | |
| 5%–10% | 0.54 | 1 | 2667 | 889 | 487 | 338 | 270 |
| | | 2 | 2050 | 683 | 374 | 260 | 207 |
| | | 3 | 1844 | 614 | 336 | 234 | 186 |
| 10%–25% | 0.57 | 1 | 2513 | 837 | 459 | 319 | 254 |
| | | 2 | 1973 | 657 | 360 | 250 | 199 |
| | | 3 | 1793 | 597 | 327 | 227 | 181 |
| 25%–50% | 0.59 | 1 | 2414 | 804 | 440 | 306 | 244 |
| | | 2 | 1923 | 641 | 351 | 244 | 194 |
| | | 3 | 1760 | 586 | 321 | 223 | 178 |
| 50%–75% | 0.70 | 1 | 2050 | 683 | 374 | 260 | 207 |
| | | 2 | 1741 | 580 | 318 | 221 | 176 |
| | | 3 | 1638 | 546 | 299 | 208 | 165 |
| >75% | 0.37 | 1 | 3815 | 1271 | 696 | 484 | 386 |
| | | 2 | 2623 | 874 | 479 | 333 | 265 |
| | | 3 | 2226 | 742 | 406 | 282 | 225 |
As for species, assuming a 1:1 matched case-control study at a significance level of 0.001 based on one fecal specimen per participant, detecting a small association (i.e., odds ratio = 1.5) between microbial genes/pathways and an outcome would need studies with large sample sizes (Table 3). For example, for low-prevalence genes (5%–10%, median ICC = 0.23), 6,130 cases would be required, while for genes with high prevalence (>75%, median ICC = 0.37), 3,874 cases would be required. The same applies to pathways: for low-prevalence pathways (5%–10%, median ICC = 0.05), 30,884 cases would be necessary, compared to 3,085 cases for pathways with high prevalence (>75%, median ICC = 0.46). Including multiple specimens per participant over time with the same study settings would decrease the required sample size.
Table 3.
Number of cases (assuming 1:1 matched case-control study) required to detect an association at a significance level of 0.001 with 80% power, based on 1, 2, or 3 fecal specimens per participant and median intraclass correlation coefficients estimated from fecal samples collected at 6-months interval. Odds ratio for the top 25% relative abundance of genes or pathways versus the bottom 25% in each prevalence category (i.e., 5%–10%, 10%–25%, 25%–50%, 50%–75%, >75%).
| Measure and Prevalence | Median ICC | No. of Specimens | OR 1.5 | OR 2.0 | OR 2.5 | OR 3.0 | OR 3.5 |
|---|---|---|---|---|---|---|---|
| Relative abundance of Genes | | | | | | | |
| 5%–10% | 0.23 | 1 | 6130 | 2043 | 1119 | 778 | 620 |
| | | 2 | 3781 | 1260 | 690 | 480 | 382 |
| | | 3 | 2998 | 999 | 547 | 380 | 303 |
| 10%–25% | 0.22 | 1 | 6475 | 2158 | 1182 | 822 | 655 |
| | | 2 | 3954 | 1318 | 722 | 502 | 400 |
| | | 3 | 3113 | 1037 | 568 | 395 | 315 |
| 25%–50% | 0.36 | 1 | 3965 | 1321 | 724 | 503 | 401 |
| | | 2 | 2699 | 899 | 492 | 342 | 273 |
| | | 3 | 2277 | 759 | 415 | 289 | 230 |
| 50%–75% | 0.35 | 1 | 4046 | 1348 | 738 | 513 | 409 |
| | | 2 | 2739 | 913 | 500 | 347 | 277 |
| | | 3 | 2304 | 768 | 420 | 292 | 233 |
| >75% | 0.37 | 1 | 3874 | 1291 | 707 | 491 | 392 |
| | | 2 | 2653 | 884 | 484 | 336 | 268 |
| | | 3 | 2246 | 748 | 410 | 285 | 227 |
| Relative abundance of Pathways | | | | | | | |
| 5%–10% | 0.05 | 1 | 30884 | 10294 | 5639 | 3921 | 3126 |
| | | 2 | 16158 | 5386 | 2950 | 2051 | 1635 |
| | | 3 | 11250 | 3749 | 2054 | 1428 | 1138 |
| 10%–25% | 0.32 | 1 | 4410 | 1470 | 805 | 560 | 446 |
| | | 2 | 2921 | 973 | 533 | 370 | 295 |
| | | 3 | 2425 | 808 | 442 | 307 | 245 |
| 25%–50% | 0.30 | 1 | 4800 | 1600 | 876 | 609 | 485 |
| | | 2 | 3116 | 1038 | 569 | 395 | 315 |
| | | 3 | 2555 | 851 | 466 | 324 | 258 |
| 50%–75% | 0.39 | 1 | 3696 | 1232 | 674 | 469 | 374 |
| | | 2 | 2564 | 854 | 468 | 325 | 259 |
| | | 3 | 2187 | 729 | 399 | 277 | 221 |
| >75% | 0.46 | 1 | 3085 | 1028 | 563 | 391 | 312 |
| | | 2 | 2259 | 753 | 412 | 286 | 228 |
| | | 3 | 1983 | 661 | 362 | 251 | 200 |
In the supplementary tables, we have provided detailed sample-size calculations for 1:1 case-control matching (Supplementary Tables 9–11) and 1:3 case-control matching (Supplementary Tables 7, 8, 12–15) at different levels of significance (P = 0.05–0.0001).
Discussion
In this study, we quantified temporal variation in microbiome measurements obtained from shallow shotgun metagenome sequencing and provided the sample size requirements for future nested case-control studies based on these data. For most alpha and beta diversity metrics, the temporal stability of the fecal microbiome from samples collected at a six-month interval was low to moderate, with ICCs of 0.6 or less, except for the first principal coordinate of unweighted UniFrac, which was stable over time. However, we observed substantial heterogeneity in the temporal stability of the relative abundance of species, genes and pathways, with ICCs varying between 0.0 and 0.9. In addition, transformed relative abundances of species showed higher ICCs than untransformed relative abundances of species. Many of the species, genes and pathways from low prevalence categories showed lower temporal stability than those from high prevalence categories. For a single fecal collection per participant with most microbiome diversity metrics, assuming an equal number of cases and controls at a significance level of 0.05, detecting an odds ratio of 1.5 would require roughly 1,000 to 5,000 cases. The estimates of sample size requirements to detect associations between rare species and an outcome, at a significance level of 0.001, suggested that studies with very large sample sizes would be required. Similar results were observed for genes and pathways in low prevalence categories. Finally, when including multiple specimens per participant over time or several matched controls with the same study parameters, the required sample size would be lower for all microbiome metrics. Our findings suggest that temporal stability should be considered when interpreting associations between relatively unstable microbiome metrics and a disease outcome.
As large sample sizes will be required to detect such associations, future epidemiological studies should plan on collecting multiple samples or including several matched controls, as well as pooling or meta-analyzing data from multiple studies.
We previously quantified the temporal stability over six months in microbiome measurements assessed using 16S rRNA gene sequencing in three different studies: a National Cancer Institute colorectal cancer study, this Costa Rica study, and the Human Microbiome Project (21). Briefly, the study found that ICCs for temporal stability were generally 0.5 or less for most of the phylum-level relative abundances and alpha diversity metrics. For measures of beta diversity, unweighted UniFrac was relatively stable in fecal samples, while weighted UniFrac, which considers the relative abundances of taxa and the phylogenetic tree, had a lower ICC. In the present analysis, using shallow shotgun metagenome sequencing at an average depth of 2,053,445 reads per sample, we observed the same findings for alpha and beta diversity metrics. Specifically, based on the first principal coordinate of unweighted UniFrac, the fecal microbiome showed good stability over six months, with an ICC of 0.9. We previously concluded that unweighted UniFrac may be a useful metric for identifying disease associations, as compared to other microbiome metrics, which also seems true with metagenomics data. However, it is important to note that these ICCs were calculated only for the first principal coordinate and, as such, are not fully reflective of the beta diversity measure. For example, although the first principal coordinate of unweighted UniFrac showed good stability over time, the second and the third seemed less stable (ICC = 0.78 and 0.39, respectively). In addition, most diversity metrics reflect different characteristics of microbiome diversity and composition; thus, using only one metric to explore microbiome diversity may not represent the actual complexity of the gut microbiome.
Although sequencing costs can vary substantially across sequencing laboratories and countries, metagenomic analysis has become more affordable over the years, making it feasible for population-based microbiome projects. We used shallow shotgun metagenome sequencing to assess fecal microbial diversity, composition and functional profile. We were hence able to evaluate the temporal stability of the relative abundances of taxa at the species level and of functional profiles through genes and pathways. We found that temporal stability over six months was very heterogeneous depending on the specific bacterial species, genes or pathways. In addition, many of the species, genes and pathways from low-prevalence categories showed lower temporal stability than those from high-prevalence categories. These observations were expected, as we previously found that stability ICCs were lowest for the low-abundance phyla (21). Rare bacterial species, based on prevalence or relative abundance, are understudied in population-based cohorts (34). Although the conditions under which rare species contribute substantially to the etiology of some diseases remain unclear, advances in characterizing specific species and strains of the human microbiome have offered potential insights. For example, data from in vitro and murine models have demonstrated that the low-prevalence species Fusobacterium nucleatum might promote colorectal cancer cell proliferation and increase tumor growth rates (35,36). To detect associations between rare species, gene or pathway exposure and a disease outcome in case-control studies nested within prospective cohort studies, future investigators would need to recruit thousands of participants and follow the cohort for a long period to accrue a sufficient number of endpoints. Another approach would be to collect specimens at multiple time points for each participant. Indeed, we found that repeated collection of specimens per participant over time would help to decrease the required sample size for all microbiome metrics, specifically for rare species, genes and pathways. This second approach could also offer the possibility of assessing temporal shifts in microbiome diversity, composition and functional profile, which may help to understand the mechanisms underlying the relationship between the human microbiome and its host. A third approach, which seems easier to implement in future population studies, would be to include several matched controls for each case, as we found that a 1:3 matched case-control study setting would help to decrease the required number of cases.
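The effect of these two design choices on the required number of cases can be sketched with two standard approximations, both of which are assumptions here and not necessarily the calculation used in this study. First, averaging k specimens raises reliability according to the Spearman-Brown formula, and the required number of cases is taken to scale inversely with that reliability. Second, matching m controls to each case scales the required number of cases by (1 + 1/m)/2 relative to 1:1 matching. The function names and the ICC of 0.1 are hypothetical.

```python
def reliability_of_mean(icc, k):
    """Spearman-Brown reliability of the mean of k repeated specimens."""
    return k * icc / (1 + (k - 1) * icc)

def relative_cases(icc, k=1, controls_per_case=1):
    """Cases needed relative to a design with one specimen and 1:1 matching.

    Assumes required cases scale with 1 / reliability and that 1:m matching
    scales required cases by (1 + 1/m) / 2.
    """
    specimen_factor = reliability_of_mean(icc, 1) / reliability_of_mean(icc, k)
    matching_factor = (1 + 1 / controls_per_case) / 2
    return specimen_factor * matching_factor

# Hypothetical feature with low temporal stability (ICC = 0.1)
for k in (1, 2, 3):
    print(f"{k} specimen(s), 1:1 matching: {relative_cases(0.1, k):.2f}")
print(f"1 specimen, 1:3 matching: {relative_cases(0.1, 1, 3):.2f}")
```

Under these assumptions, the gain from additional specimens is largest when the ICC is low, while the gain from 1:3 matching is a fixed two-thirds factor regardless of the ICC, which is consistent with the 15,102 versus 10,068 cases reported for low-prevalent species.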
Because the relative abundances of all microbial taxa sum to one, they constitute compositional data. As explained in previous studies, standard statistical methods do not correctly take into consideration the compositional nature of microbiome data, leading to spurious interpretations (37–40). For example, an increase in the relative abundance of one highly prevalent species can induce spurious negative correlations with the abundances of other species. Normalization approaches, based on transformations of relative abundances, were developed to address the challenge of compositional data (41). These methods include the square root transformation (42) and the centered log ratio transformation (37) of the relative abundances of species, which we used in this study. Similar to our results, when using data from the Human Microbiome Project, McSkimming et al. reported much higher ICC values at the phylum level after centered log ratio transformation than with untransformed relative abundances (21,43). The high ICCs observed after centered log ratio transformation in our data and in the Human Microbiome Project data would suggest that microbiome stability over time has only a small impact on estimating an exposure’s effect. However, centered log ratio transformation may overestimate stability, and as such, effect size changes induced by this transformation could be an important factor affecting statistical power in future cohort studies. There is currently no consensus on the most appropriate statistical methods and transformations for analyzing genes and pathways derived from microbiome data. These features often follow highly skewed distributions that may not be suitable for linear mixed-effects models without appropriate transformation. Most studies have employed data-driven approaches tailored to the specific question at hand. In our study, we evaluated these features without transformations. Analyses using untransformed data should be interpreted cautiously due to potential deviations from statistical assumptions. Future research is needed to establish best practices for handling genes and pathways, particularly in contexts where their biological distributions and relationships with outcomes are poorly characterized. We recommend evaluating temporal stability under different transformations, both those that account for the compositional nature of the microbiome and those that do not. For example, if the risk of developing a disease depends linearly on the relative abundance of a species, the relevant ICC should be evaluated using the untransformed relative abundance. In contrast, if the disease risk depends non-linearly on the relative abundance of a species, the relevant ICC may be evaluated using the log-transformed relative abundance. Normalization of microbiome data should be compatible not only with the compositional structure of the data but also with the postulated disease model.
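As an illustration of the centered log ratio transformation discussed above, the following minimal sketch transforms a vector of relative abundances. The pseudocount used to handle zeros is an assumption made for this example, as are the function name `clr` and the sample values. The study's own handling of zeros is not specified in this section.

```python
import numpy as np

def clr(relative_abundances, pseudocount=1e-6):
    """Centered log ratio transform along the last axis (taxa).

    The pseudocount is a hypothetical choice to avoid log(0).
    """
    x = np.asarray(relative_abundances, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtracting the mean log abundance is the same as dividing each
    # abundance by the geometric mean of the sample before taking logs.
    return log_x - log_x.mean(axis=-1, keepdims=True)

# Hypothetical sample with four species, one of them undetected
sample = np.array([0.70, 0.20, 0.10, 0.0])
clr_values = clr(sample)
print(np.round(clr_values, 2))
```

The transformed values sum to zero within each sample, and the value assigned to an undetected species depends strongly on the chosen pseudocount. This dependence is one reason why stability estimates can differ between transformed and untransformed data.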
This study has several limitations. First, the interval between the two sample collections was limited to six months. Although our results suggest that it is crucial to systematically evaluate the temporal stability of microbiome measures, future prospective microbiome cohorts might collect repeated samples over longer intervals, which would imply a greater loss of power due to larger temporal variation. In addition, the lack of technical duplicates prevented the separation of temporal shifts from technical variability. However, we included standard quality control samples as described in the Methods section. Finally, in the current analysis, the mixed-effects models did not adjust for potential confounding factors due to limited data on covariates. As a result, potential confounding by these factors was not directly accounted for in the ICC estimation. We acknowledge that adjusting for these covariates may have reduced variability and improved ICC values.
Repeated prediagnostic samples from thousands of prospectively ascertained cases are required to detect modest disease associations with microbiome features, most notably for rare taxa and functional pathways. Researchers conducting population-based studies need to find cost-effective options that facilitate the inclusion of participants, the matching of several controls to each case, and the collection of sequential specimens to achieve adequate statistical power, and may want to consider pooling or meta-analyzing data from multiple studies.
Funding:
This work was supported by the Intramural Research Program of the National Cancer Institute at the National Institutes of Health.
Footnotes
Conflict of interest: The authors declare no potential conflicts of interest.
References
- 1. Proctor L, LoTempio J, Marquitz A, Daschner P, Xi D, Flores R, et al. A review of 10 years of human microbiome research activities at the US National Institutes of Health, Fiscal Years 2007–2016. Microbiome. 2019 Feb 26;7(1):31.
- 2. Turnbaugh PJ, Ley RE, Mahowald MA, Magrini V, Mardis ER, Gordon JI. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature. 2006 Dec 21;444(7122):1027–31.
- 3. Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE, et al. A core gut microbiome in obese and lean twins. Nature. 2009 Jan 22;457(7228):480–4.
- 4. Larsen N, Vogensen FK, Berg FWJ van den, Nielsen DS, Andreasen AS, Pedersen BK, et al. Gut Microbiota in Human Adults with Type 2 Diabetes Differs from Non-Diabetic Adults. PLOS ONE. 2010 Feb 5;5(2):e9085.
- 5. Qin J, Li Y, Cai Z, Li S, Zhu J, Zhang F, et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature. 2012 Oct;490(7418):55–60.
- 6. Huybrechts I, Zouiouich S, Loobuyck A, Vandenbulcke Z, Vogtmann E, Pisanu S, et al. The Human Microbiome in Relation to Cancer Risk: A Systematic Review of Epidemiologic Studies. Cancer Epidemiol Biomark Prev Publ Am Assoc Cancer Res Cosponsored Am Soc Prev Oncol. 2020 Oct;29(10):1856–68.
- 7. Wu Z, Byrd DA, Wan Y, Ansong D, Clegg-Lamptey JN, Wiafe-Addai B, et al. The oral microbiome and breast cancer and nonmalignant breast disease, and its relationship with the fecal microbiome in the Ghana Breast Health Study. Int J Cancer. 2022 Oct 15;151(8):1248–60.
- 8. Robinson CK, Brotman RM, Ravel J. Intricacies of assessing the human microbiome in epidemiologic studies. Ann Epidemiol. 2016 May;26(5):311–21.
- 9. Vandeputte D, Tito RY, Vanleeuwen R, Falony G, Raes J. Practical considerations for large-scale gut microbiome studies. FEMS Microbiol Rev. 2017 Aug 1;41(Supp_1):S154–67.
- 10. Gilbert JA, Blaser MJ, Caporaso JG, Jansson JK, Lynch SV, Knight R. Current understanding of the human microbiome. Nat Med. 2018 Apr;24(4):392–400.
- 11. Hanage WP. Microbiology: Microbiome science needs a healthy dose of scepticism. Nature. 2014 Aug;512(7514):247–8.
- 12. Cani PD. Human gut microbiome: hopes, threats and promises. Gut. 2018 Sep;67(9):1716–25.
- 13. Costello EK, Lauber CL, Hamady M, Fierer N, Gordon JI, Knight R. Bacterial Community Variation in Human Body Habitats Across Space and Time. Science. 2009 Dec 18;326(5960):1694–7.
- 14. Flores GE, Caporaso JG, Henley JB, Rideout JR, Domogala D, Chase J, et al. Temporal variability is a personalized feature of the human microbiome. Genome Biol. 2014;15(12):531.
- 15. Mehta RS, Abu-Ali GS, Drew DA, Lloyd-Price J, Subramanian A, Lochhead P, et al. Stability of the human faecal microbiome in a cohort of adult men. Nat Microbiol. 2018 Mar;3(3):347–55.
- 16. Conlon MA, Bird AR. The Impact of Diet and Lifestyle on Gut Microbiota and Human Health. Nutrients. 2014 Dec 24;7(1):17–44.
- 17. Strasser B, Wolters M, Weyh C, Krüger K, Ticinesi A. The Effects of Lifestyle and Diet on Gut Microbiota Composition, Inflammation and Muscle Performance in Our Aging Society. Nutrients. 2021 Jun 15;13(6):2045.
- 18. Weersma RK, Zhernakova A, Fu J. Interaction between drugs and the gut microbiome. Gut. 2020 Aug 1;69(8):1510–9.
- 19. Vich Vila A, Collij V, Sanna S, Sinha T, Imhann F, Bourgonje AR, et al. Impact of commonly used drugs on the composition and metabolic function of the gut microbiota. Nat Commun. 2020 Jan 17;11:362.
- 20. Sinha R, Abnet CC, White O, Knight R, Huttenhower C. The microbiome quality control project: baseline study design and future directions. Genome Biol. 2015;16:276.
- 21. Sinha R, Goedert JJ, Vogtmann E, Hua X, Porras C, Hayes R, et al. Quantification of Human Microbiome Stability Over 6 Months: Implications for Epidemiologic Studies. Am J Epidemiol. 2018 Jun;187(6):1282–90.
- 22. Hillmann B, Al-Ghalith GA, Shields-Cutler RR, Zhu Q, Gohl DM, Beckman KB, et al. Evaluating the Information Content of Shallow Shotgun Metagenomics. mSystems. 2018;3(6):e00069–18.
- 23. Gonzalez P, Hildesheim A, Herrero R, Katki H, Wacholder S, Porras C, et al. Rationale and design of a Long Term Follow-up study of women who did and did not receive HPV 16/18 vaccination in Guanacaste, Costa Rica. Vaccine. 2015 Apr 27;33(18):2141–51.
- 24. Farhat Z, Sampson JN, Hildesheim A, Safaeian M, Porras C, Cortés B, et al. Reproducibility, temporal variability, and concordance of serum and fecal bile acids and short chain fatty acids in a population-based study. Cancer Epidemiol Biomark Prev Publ Am Assoc Cancer Res Cosponsored Am Soc Prev Oncol. 2021 Oct;30(10):1875–83.
- 25. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012 Mar 4;9(4):357–9.
- 26. Truong DT, Franzosa EA, Tickle TL, Scholz M, Weingart G, Pasolli E, et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat Methods. 2015 Oct;12(10):902–3.
- 27. Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat Biotechnol. 2019 Aug;37(8):852–7.
- 28. Franzosa EA, McIver LJ, Rahnavard G, Thompson LR, Schirmer M, Weingart G, et al. Species-level functional profiling of metagenomes and metatranscriptomes. Nat Methods. 2018 Nov;15(11):962–8.
- 29. Kanehisa M, Furumichi M, Sato Y, Kawashima M, Ishiguro-Watanabe M. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res. 2023 Jan 6;51(D1):D587–92.
- 30. McMurdie PJ, Holmes S. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PloS One. 2013;8(4):e61217.
- 31. Dixon P. VEGAN, a package of R functions for community ecology. J Veg Sci. 2003;14(6):927–30.
- 32. Chen J, Bittinger K, Charlson ES, Hoffmann C, Lewis J, Wu GD, et al. Associating microbiome composition with environmental covariates using generalized UniFrac distances. Bioinforma Oxf Engl. 2012 Aug 15;28(16):2106–13.
- 33. Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med. 2016 Jun;15(2):155–63.
- 34. Jousset A, Bienhold C, Chatzinotas A, Gallien L, Gobet A, Kurm V, et al. Where less may be more: how the rare biosphere pulls ecosystems strings. ISME J. 2017 Apr;11(4):853–62.
- 35. Garrett WS. Cancer and the microbiota. Science. 2015 Apr 3;348(6230):80–6.
- 36. Shang FM, Liu HL. Fusobacterium nucleatum and colorectal cancer: A review. World J Gastrointest Oncol. 2018 Mar 15;10(3):71–81.
- 37. Mandal S, Van Treuren W, White RA, Eggesbø M, Knight R, Peddada SD. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb Ecol Health Dis. 2015;26:27663.
- 38. Gloor GB, Reid G. Compositional analysis: a valid approach to analyze microbiome high-throughput sequencing data. Can J Microbiol. 2016 Aug;62(8):692–703.
- 39. Gloor GB, Wu JR, Pawlowsky-Glahn V, Egozcue JJ. It’s all relative: analyzing microbiome data as compositions. Ann Epidemiol. 2016 May 1;26(5):322–9.
- 40. Morton JT, Marotz C, Washburne A, Silverman J, Zaramela LS, Edlund A, et al. Establishing microbial composition measurement standards with reference frames. Nat Commun. 2019 Jun 20;10(1):2719.
- 41. Weiss S, Xu ZZ, Peddada S, Amir A, Bittinger K, Gonzalez A, et al. Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome. 2017 Mar 3;5(1):27.
- 42. Scealy JL, de Caritat P, Grunsky EC, Tsagris MT, Welsh AH. Robust Principal Component Analysis for Power Transformed Compositional Data. J Am Stat Assoc. 2015 Jan 2;110(509):136–48.
- 43. McSkimming DI, Banack HR, Genco R, Wactawski-Wende J, LaMonte MJ. RE: “QUANTIFICATION OF HUMAN MICROBIOME STABILITY OVER 6 MONTHS: IMPLICATIONS FOR EPIDEMIOLOGIC STUDIES.” Am J Epidemiol. 2019 Apr 1;188(4):808–9.
Data Availability Statement
Data are available at the National Center for Biotechnology Information Sequence Read Archive (accession number PRJNA1045701).
