Skip to main content
PLOS Biology logoLink to PLOS Biology
. 2022 Mar 2;20(3):e3001556. doi: 10.1371/journal.pbio.3001556

Systematically assessing microbiome–disease associations identifies drivers of inconsistency in metagenomic research

Braden T Tierney 1,2,3,4, Yingxuan Tan 1,¤, Zhen Yang 2,3,4, Bing Shui 5, Michaela J Walker 6, Benjamin M Kent 7, Aleksandar D Kostic 2,3,4,*, Chirag J Patel 1,*
Editors: Jonathan A Eisen8, Marcus Munafò9
PMCID: PMC8890741  PMID: 35235560

Abstract

Evaluating the relationship between the human gut microbiome and disease requires computing reliable statistical associations. Here, using millions of different association modeling strategies, we evaluated the consistency—or robustness—of microbiome-based disease indicators for 6 prevalent and well-studied phenotypes (across 15 public cohorts and 2,343 individuals). We were able to discriminate between analytically robust versus nonrobust results. In many cases, different models yielded contradictory associations for the same taxon–disease pairing, some showing positive correlations and others negative. When querying a subset of 581 microbe–disease associations that have been previously reported in the literature, 1 out of 3 taxa demonstrated substantial inconsistency in association sign. Notably, >90% of published findings for type 1 diabetes (T1D) and type 2 diabetes (T2D) were particularly nonrobust in this regard. We additionally quantified how potential confounders—sequencing depth, glucose levels, cholesterol, and body mass index, for example—influenced associations, analyzing how these variables affect the ostensible correlation between Faecalibacterium prausnitzii abundance and a healthy gut. Overall, we propose our approach as a method to maximize confidence when prioritizing findings that emerge from microbiome association studies.


The human microbiome has been associated with many aspects of our health, but how many of these associations are truly reproducible? This study attempts to address this question by systematically testing the robustness of 581 microbial features that have been reported as being disease-associated.

Introduction

With its role in host health, the microbiome field is rapidly accelerating toward the clinic in the form of new diagnostics and therapeutics. An instrumental first step toward this lofty goal, however, is vetting individual microbial features (e.g., species abundance) for their association with disease. These microbiome association studies (MAS, i.e., identifying sets of microbiome features in observational datasets that are correlated with disease presence) have been critical in generating beaucoup hypotheses regarding the role of, specifically, the gut microbiome in many diseases, from autoimmune disorders to cancers [1,2]. However, MAS are subject to a litany of biases. From DNA extraction method to bioinformatics pipelines, every choice a researcher makes can potentially influence the conclusions derived from an observational analysis.

Additionally, variation in modeling strategy can lead to analogously varying findings [3,4]. One specific form of bias that can affect results is confounding, or not accounting or stratifying for variables that are associated with both a given microbial feature and a disease phenotype of interest [5]. Accounting for confounding in the microbiome space is particularly difficult due to the sheer volume of variables that can potentially affect microbiome composition as well as the millions of possible features (e.g., taxa, pathways, and genes) that can be identified in microbiomes.[6,7].

An analytic approach to address confounding includes adjusting by an a priori selection of potential confounders. Many choose a bespoke set of variables to control for a priori hypothesized confounding. However, this is a choice that must be justified. Sensitivity analyses (which fall under the broader umbrella of “multiverse” analyses) [8,9] allow investigators to explore the space of analytic choices (i.e., what specific variables to adjust for) that may influence modeling outcomes. These outcomes may include association sizes, predictions, and p-values, which can vary depending on modeling strategy, sampling size, and measurement error [3,1012]. These analyses may be particularly useful for discovery-based studies (very common in the microbiome and genomic fields), approaches designed to generate, rather than test specific candidate, hypotheses from complex datasets. We refer to the distribution of possible associations that emerge from different modeling scenarios when carrying out discovery as vibration of effects (VoE) [3,1315]. In some cases, slight changes to model specification yield polar opposite results (e.g., a particular microbiome feature being both negatively and positively associated with disease) [16]. In this manuscript, VoE is computed by, for each microbial feature–disease pairing, fitting all possible linear models, each adjusted by different features, while tabulating how the association between the microbial feature and disease changes. Robust microbial feature–disease associations are those whose association size does not change too much with respect to the number and type of adjustment variables in the model.

To date, the immense impact model choice and confounding can have on the microbiome has only been investigated in some isolated cases. For example, Forslund and colleagues found that patient use of metformin—a common antidiabetic medication—confounded the association between type 2 diabetes (T2D) and gut microbiome features, generating misleading and difficult-to-interpret conclusions [2]. Vieira-Silva and colleagues demonstrated that statins, a common cholesterol lowering therapeutic, confound associations between BMI and the gut microbiota [17]. Similarly, other studies have identified that certain features, like age and stool consistency, can confound associations with host phenotypes and microbiome data [1821]. Most recently, Vujkovic-Cvijin demonstrated that dietary variables, age, sex, and BMI can confound associations between a range of diseases and the gut microbiome [6]. These studies, however, consider limited and candidate groups of potential confounding adjustment variables, and they do not systematically assess how sets of confounders or varying study designs (when considered together or separately) influence association size and direction (e.g., associated with risk for disease or protective of disease) across published results.

Here, to gauge the impact of model specification in MAS, we deploy a systematic sensitivity analysis, measuring VoE in reported microbiome associations. Comparing modeling strategies, we quantify the robustness (variation as a function of model specification) in microbial taxon–disease associations across 6 different phenotypes. We counted how many associations (published and otherwise) are recovered (e.g., appear as statistically significant) or lost when undergoing sensitivity analysis. We propose modeling VoE as one of many potential steps in building association prioritization frameworks, metrics for prioritizing microbiome findings for in vivo validation.

Results

Meta-analysis combined with modeling vibration of effects recovers and prioritizes associations

From an existing database [22] of public metagenomic shotgun sequencing data, we accessed 15 studies comprising samples from patients with 6 diseases: colorectal cancer (CRC), type 1 diabetes (T1D), T2D, atherosclerotic cardiovascular disease (ACVD), inflammatory bowel disease (IBD), and liver cirrhosis (CIRR) (Fig 1A and 1B). We built a database of taxa that were prevalent (in >10% of samples) in each study. We then searched the literature, taxon by taxon, for reports of each feature being associated with any of the 6 diseases (see Methods). Importantly, many (214, 37.8%) of these findings were directly from papers present in the data used in this study (in the case of CRC, ACVD, CIRR, T2D, and T1D; S1 Table).

Fig 1.

Fig 1

(A) Overview of approach. We extract prevalent microbial features from our datasets and attempt to reproduce the findings from the literature by modeling VoE. We additionally review the literature for reported gut microbiome associations (their reported direction of correlation) with 6 diseases of interest. (B) Volcano plots showing the output from the initial, univariate associations. Point color corresponds to if an association was identified in our literature review solid line represents FDR significance (adjusted p < 0.05). (C–F) Examples of robust (C) and nonrobust associations. Each point represents a different modeling strategy. Solid line is nominal (p < 0.05) significance. This figure can be generated using the code deposited in https://github.com/chiragjp/ubiome_robustness and the data deposited in https://figshare.com/projects/Microbiome_robustness/127607. ACVD, atherosclerotic cardiovascular disease; CIRR, cirrhosis; CRC, colorectal cancer; FDR, false discovery rate; IBD, inflammatory bowel disease; T1D, type 1 diabetes; T2D, type 2 diabetes; VoE, vibration of effects.

We computed initial, univariate associations for each taxon in our aggregated dataset. We additionally benchmarked the data transformation and modeling strategies underlying these associations (S1 Text, S1 Fig). We refer to this model, which contains the phenotypic variable of interest as the sole covariate, as the baseline model (S1 Table). Three of these diseases (T2D, T1D, and CRC) had data spread across multiple cohorts. For these, we meta-analyzed across individual associations within each cohort to compute overall summary statistics. We found a total of 720 features that were statistically significant after adjusting for false discovery rate (FDR), 199 (24.8%) of which were reported in the literature (Fig 1B). The number of significant features was dependent on phenotype and number of cohort analyzed; for example, T1D had no statistically significant results, CRC had 29 (52.7% of which were reported in the literature), and CIRR had 298 (25% of which were reported in the literature).

We next executed a systematic VoE analysis, fitting a total of 6,035,110 models, each employing multiple linear regression with microbial feature abundances as the dependent variable (S1 Table contains information on the number and type of covariates per disease). Some associations (e.g., Fig 1C) were robust; that is to say, regardless of modeling strategy they yielded similar results (e.g., same direction of association across models). Others, however, demonstrated striking variation in output as a function of modeling strategy (e.g., Fig 1D). In some cases, very similar models (e.g., only different by inclusion/exclusion of a single variable or 2 in the model) yielded opposite and nominally significant results (Fig 1E and 1F).

We next aimed to stratify associations by their (1) presence in the literature; and (2) robustness/recovery as a function of vibration effects. A total of 509 features were FDR significant at baseline, additionally during VoE, and not reported in the literature review (Fig 2A, blue column). A total of 187 were FDR significant at baseline and reported in the literature (Fig 2A, orange column). An additional 61 of others (10% of all literature-based associations) found in the literature were FDR significant in at least 1 vibration but not in the baseline model (Fig 2A, red columns). This brought the total number of taxa–disease associations we were able to recover to 248. Additionally, 264 taxa were not found in the literature but were FDR significant at least once during a vibration (Fig 2A, green columns). In other words, we observed that modeling VoE was able to shed light on associations that would be potentially overlooked by single modeling strategies, in some cases recovering results reported in the literature.

Fig 2. Comparing single modeling approaches to modeling VoE.

Fig 2

(A) Stratifying features by their being identified in our literature review, p-value/FDR significant (<0.05) in our initial analysis, or having at least 1 significant model upon vibrations. The gray bars labeled “set size” indicate the number of features associated with a given row in the bottom of the plot (e.g., about 1000 features were p-value significant in the initial model). The gray bars in the upper portion of the panel are those we chose not to highlight, as they do not fall into any category indicated by the colors or referenced in the manuscript. (B) Distribution of fraction of models matching literature review–defined direction of associations. This figure can be generated using the code deposited in https://github.com/chiragjp/ubiome_robustness and the data deposited in https://figshare.com/projects/Microbiome_robustness/127607. ACVD, atherosclerotic cardiovascular disease; CRC, colorectal cancer; FDR, false discovery rate; IBD, inflammatory bowel disease; T1D, type 1 diabetes; T2D, type 2 diabetes; VoE, vibration of effects.

As another measure of robustness, for each taxon for each disease, we report the fraction of associations with signs matching the literature (Fig 2B). For all diseases except T1D, these distributions matched what was reported before was bimodal. The mode of the distribution was closer to 1, indicating a large frequency of high concordance associations and a moderate frequency of extremely low concordance (i.e., almost all models pointing the opposite direction as the literature) associations. Given the distribution of the data in Fig 2B, we defined a low concordance association as agreeing in direction in 50% of models fit. In total, 27.9% of all features fit into this category; in other words, 1 in 3 features were discordant with the literature in 50% of all vibrations.

Association robustness varies as a function of disease and cohort

For the remainder of the analysis, we opted to prioritize (and thereby analyze) only the 581 findings that we identified as reported in the literature (Fig 2A, orange columns, S2 Table). The robustness (i.e., number of associations with consistent direction and significance) of prior-reported associations varied for different disease phenotypes (Fig 3). CIRR had the greatest fraction of associations that achieved initial FDR significance (64/106, 60.4%), followed by ACVD (47/96, 49.0%), followed by IBD (46/140, 32.9%). T1D and T2D contained the least robust associations, with 0 features (out of 34) and 5 features (out of 96, 5.2%) achieving FDR significance, respectively. For T2D, this is particularly surprising as the species we selected to analyze had been reported as significantly associated with disease even when adjusting for metformin usage.

Fig 3. VoE for reported associations from the literature in the form of summarized modeling output.

Fig 3

(A) We measure association robustness in part by computing the fraction of associations with signs greater than 0. These values show the distributions of these values for each microbial taxon). (B) Summarized modeling output. Red blocks indicate organisms that were FDR significant in our study. The middle bar describes the fraction of association sizes greater than 0 per taxon: a highly confounded association will be closer to 0.5 and pink, whereas more robust associations will be closer to 0 or 1 and green. The gray bars in the upper bar plot corresponds the fraction of models that were nominally (p-value < 0.05) significant for the microbial feature–disease association, whereas the black bars correspond to the fraction of models that were FDR significant. Features marked as significant in our study but never FDR significant were only significant after the meta-analysis and did not have any nominal significant p-values. See S2 Fig for this plot reproduced with species names on the x-axis. This figure can be generated using the code deposited in https://github.com/chiragjp/ubiome_robustness and the data deposited in https://figshare.com/projects/Microbiome_robustness/127607. ACVD, atherosclerotic cardiovascular disease; CRC, colorectal cancer; FDR, false discovery rate; IBD, inflammatory bowel disease; T1D, type 1 diabetes; T2D, type 2 diabetes; VoE, vibration of effects.

Of the 325 features that, after vibrating, had at least 1 FDR significant model, 114 (35%) were highly nonrobust, with at least 20% of models conflicting in association direction. This was most striking for T1D and T2D (Fig 3, bottom row), where nearly all tested associations were nonrobust. The diseases for which we had only a single cohort dataset (ACVD, IBD, and CIRR; see Methods) had greater association consistency and higher fractions of statistically significant findings when compared to the multicohort, meta-analyzed, phenotypes. This indicates that the single cohort associations should be further validated/tested in other populations to confirm their robustness.

We next sought to probe researcher degrees of freedom: the probability that a statistically significant association would arise in the event that a researcher were to fit a single model instead of multiple. We calculated the fraction of FDR adjusted (using the cutoff from our prior analyses) and nominal p-values that were less than 0.05 for all models for a given feature from our literature review (Fig 3, S2 Fig). For all models for a given taxonomic feature, on average, 38% had nominally significant (or 16% FDR significant) p-values. Of the reported associations we tested, 488 (84.0%) had at least 1 model that was nominally significant, and 248 (42.6%) had at least 1 model that was FDR significant.

We additionally identified particular cases where our baseline modeling approach was clearly too conservative, initially excluding associations reported in the literature that, after modeling VoE, clearly should be of interest. For example, in the association between Roseburia and ACVD, the association sizes all pointed in the same direction, and 74/127 (58.6%) of models were FDR significant, despite the univariate association being not (S3 Fig). We examined the variables present in each model and identified that 64/74 of the FDR significant vibrations were adjusted for gender, whereas none of the nonsignificant models were.

A vast array of possible adjusting covariates influence model output in microbiome associations

Motivated in part by this Roseburia—gender result and similar observations, we next aimed to identify the sources of VoE in our associations en masse, computing how variation in estimate size can be attributed to the presence or absence of specific adjusting variables from a model. We used a mixed effect linear modeling approach (see Methods) to determine how associations with a given disease changed as a function of presence or absence of other adjusting variables (e.g., age, sex, and BMI). We hypothesized that this approach would identify different kinds of biasing adjusters, like confounder or collider variables. As a form of benchmarking, we estimated in the T2D cohort how the beta coefficients on the adjusting variables from the mixed modeling approach changed as a function of the number of vibrations executed (S4 Fig). We found 100,000 vibrations (our upper limit) to be sufficient to identify consistent correlation between these beta coefficients (Pearson >0.9).

Accounting for BMI, age, sequencing depth, and gender all had strong influence on both the size and direction of microbial feature–disease associations. While these variables altered the absolute value of association size across all diseases, the direction and order of magnitude of change (i.e., inflating or deflating associations) were, as expected, disease specific. Adjusting for gender, for example, consistently increased association sizes across diseases, with the only exception being CIRR. In the case of T2D, country of origin inflated association sizes, whereas it deflated association sizes for IBD and CRC.

T2D and T1D were influenced by a wide array of adjusting variables, some on the pathway for disease, others not. First, adjustment for glucose levels inflated association sizes for both diseases. We found feeding practice, as well as HLA genotype (Fig 1E), to have a substantial influence on associations. Feeding practice has been reported as a confounder of T1D [23]. T2D associations were also substantially affected by blood pressure, BMI, cholesterol, creatinine, and HbA1c. In prior studies [2,24], metformin usage also confounded associations; however, its effect was not as strong as some of the other adjusting variables (Fig 4A), such as adiponectin. Similarly, alcohol use appeared to bias CIRR associations and cholesterol levels influenced associations in ACVD. In Fig 4B and 4D, we indicate the impact of these and other adjustment variables visually, showing which models accounted for them during the vibration analysis. For example, models including delivery mode or feeding practice in Streptococcus and T1D associations tended to yield positive associations (e.g., an increase in Streptococcus associated with higher T1D odds), whereas adjusting for cholesterol yielded negative associations (an increase in Streptococcus associated with lower T1D odds). Similarly, in T2D, adjusting for HbA1c in Roseburia associations yielded positive associations, whereas adjusting for creatinine yielded the opposite, on average.

Fig 4. The effects of different adjusters on human microbiome associations.

Fig 4

(A) Various adjusters for our diseases of interest. For each disease in our study, we report the change in the association sizes between microbiome features and disease as a function of adjusting variable presence or absence (See Methods). Each individual plot summarizes the output for the 2^n models fit for each feature within a given disease, where n = number of adjusters. The y-axis corresponds to the mean change in Beta coefficient (in units of relative abundance) on the independent, binary disease outcome when a given adjusting variable (x-axis) is included in the model. (B–D) Visualization of the impact of the presence/absence of different confounders for 3 organisms and their associations with T1D/T2D. This figure can be generated using the code deposited in https://github.com/chiragjp/ubiome_robustness and the data deposited in https://figshare.com/projects/Microbiome_robustness/127607. ACVD, atherosclerotic cardiovascular disease; BP, blood pressure; CRC, colorectal cancer; IBD, inflammatory bowel disease; T1D, type 1 diabetes; T2D, type 2 diabetes.

Vibration of effects reveals disease-specific variation in Faecalibacterium prausnitzii associations

We additionally took interest in the microbe F. prausnitzii, as it was reported in the literature [2529] as negatively associated with 5 out of 6 diseases except T1D. We found this negative association to be highly robust for 3/5 diseases; however, CRC and T2D exhibited notable inconsistency in association direction (Fig 5A).

Fig 5. Exploring the impact of variable adjustment strategies on F. prausnitzii ~ disease associations.

Fig 5

(A) VoE for the 5 phenotypes that had associations with F. prausnitzii reported in the literature. (B, C) The impact of variable adjusting strategies for (B) T2D and (C) CRC. This figure can be generated using the code deposited in https://github.com/chiragjp/ubiome_robustness and the data deposited in https://figshare.com/projects/Microbiome_robustness/127607. ACVD, atherosclerotic cardiovascular disease; BP, blood pressure; CRC, colorectal cancer; HDL, high-density lipoprotein; HSCRP, high-sensitivity C-reactive protein; IBD, inflammatory bowel disease; LDL, low-density lipoprotein; T2D, type 2 diabetes; VoE, vibration of effects.

We identified that the association between T2D and F. prausnitzii tended to be positive when we adjusted for glucose, high-sensitivity C-reactive protein (HSCRP), and cholesterol (Fig 5B). For CRC, the association tended to appear positive when models were adjusted for cholesterol (Fig 5C). It should be noted that it is possible that some of these variables could be colliders (e.g., BMI and T2D). Comparably, the immediately robust associations in the other 3 diseases should be treated with higher priority and perhaps are worthy of validation in wet lab experiments.

Discussion

In this study, we explored the utility of modeling VoE for MAS. Across a number of diseases with varying number of cohorts, sample sizes, and variables, we showed how massive-scale, automated sensitivity analysis can be used to (1) query how associations found in the literature may change as a function of model choice; (2) identify overlooked associations (such as the association between Roseburia and ACVD, which a single modeling approach may miss); and (3) identify sets of potentially important, disease-specific confounding variables that should be accounted for in current and future MAS. We do not claim that our approach is the be-all-and-end-all of microbiome statistical sensitivity analysis. It is merely one approach for quantifying association robustness. It is for this reason that we hesitate to claim that we are holistically measuring reproducibility, as the requirements for rigorously doing so are nebulous [30] and would likely require considering all analytic biases that pervade microbiome analyses, something outside the scope of this project. For example, to assess reproducibility in totality, we should have recreated modeling strategies (e.g., using particular adjusting variables or different regression approaches) used in published results as opposed to instead fitting univariate models.

A few microbiome–disease associations, such as Helicobacter pylori and gastric cancer [31], Akkermansia muciniphila and obesity [32], and Fusobacterium nucleatum and CRC [33], have been replicated in multiple studies or experimentally validated (e.g., in animal models or clinical trials). Until we have a framework for the sensitivity of the resultant observational host disease–microbiome associations, the thousands of associations published will be of limited value in experimental settings.

We claim that one step—out of many—toward translating microbiome findings into biological understanding is determining how best to prioritize for future (e.g., in vivo) investigation associations arising from MAS. There is a need for association prioritization frameworks: the contexts in which of these manifold associations are most worthy of wet lab experimentation [3436]. In this study, we do so by identifying which associations are consistent in direction and statistically significant across multiple cohorts. Furthermore, we propose modeling VoE to pressure test existing associations, and, sidestep reliance on single models or metrics (e.g., p-values), potentially recovering or identifying features that would be overlooked with only one modeling approach (as in S3 Fig). Further, VoE provides a path forward for identifying adjusting variables that may influence association size and direction.

That said, modeling VoE is certainly not the only way to identify an association worth prioritizing. Furthermore, an ostensibly robust association viewed only through the lens of VoE still could be a false positive or dependent on, for example, data processing pipeline choice (e.g., the decision to average repeated measures data versus selecting one sample per individual) [37]. A more comprehensive framework could rely on a number of heuristics, for example, putting the greatest emphasis on associations that have the best model fit, are reported across multiple large cohorts, and/or have undergone sensitivity analysis via VoE.

Of course, our approach is not without other drawbacks. First, we did not preregister our study, which would set an even higher bar for reproducible results. This is particularly difficult for hypothesis-generating studies, though, as the outcomes cannot be preset—for example, our decision to focus on F. prausnitzii as a case study could not have been made until seeing the results.

Second, as we observed in S1 Fig, vibration may be contingent on the number and type of variables measured, the size of the cohort, and measurement error. Further, sources of variables that influence associations that we identified could themselves hold interesting biological signals. For example, diet, BMI, and cholesterol had strong influence on associations and are worthy of further investigation. Specifically, these variables may interact with microbes in their associations with disease (e.g., m higher BMI and presence of a microbe may have larger associations than the association across all BMI groups).

Third, our approach, at present, relies on exclusively linear modeling due to its speed and the ease of performing statistical inference on its association. Many microbiome studies have adopted random forests, which can capture nonlinearities but can be difficult to perform inference on or interpret individual variables [3840]. Moreover, we ourselves made choices in our modeling strategy that could influence statistical power such as averaging repeated samples from the same individual. Finally, associations can also vary due to data processing choices, like different approaches quantifying taxon abundances, such log transforming or discretizing, yielding more or less reproducible results. Study design characteristics, such as error in the measurement, inclusion criteria of individuals, and sample size, will also influence associations. Alternatively, different studies use different methods or cutoffs for computing FDR (e.g., 0.15 or 0.07) [41,42]. Many of the associations that we extracted (214 of the 581) came directly from the cohorts analyzed here; therefore, disparities in associations, we claim, can be attributed to modeling approach and parameters and not to other differences in study design, such as sample size.

Finally, our approach does not in any way address causality. Indeed, many of the adjusters we identify could be on the pathway for a human disease, or a mediator [43], as in the case of alcohol usage and liver CIRR or BMI and T2D. In this sense, it is crucial to note that modeling VoE with this approach does not distinguish between certain forms of bias—like collider bias—that could be mistaken for confounding. As we stated before, however, this application of modeling VoE is meant to only identify potential variables of interest that are worth consideration and may or may not be confounders.

Notably, the systematic nature of VoE stands apart from many traditional modeling approaches. Instead of finding one correct model with a priori knowledge, it assumes a limited theoretical basis for how to model a question (e.g., what adjusters to include). While it is useful young fields like the microbiome, it may also introduce other biases (e.g., colliders) as a function of its use and may not be relevant for all disciplines, especially those steeped in theory (e.g., economics). It complements existing approaches, such as Bayesian model averaging [44], whose primary goal is to provide an optimal single predictor by averaging across the many different models. Therefore, we posit here that VoE, which in future work should be compared to these other methods, should at present be used primarily as a way to probe associations from different modeling strategies to systematically assess the combination of potential adjustments. Further still, models need to be checked to ensure that they do not violate any assumptions. For example, we claim that our F. prausnitzii results indicate the need for further exploration to determine the biological relationship between F. prausnitzii and glucose levels.

Fundamentally, it is a broad scientific challenge to prioritize the best findings from observational studies in human populations to test their causal mechanism in an experimental setting. The history of science in this field is long, dating back from Fisher [45] to Bradford Hill [46] and to the current day of counterfactual and causal modeling.[47] However, these heuristics or approaches have yet to be extended to high-throughput observational research (many input variables and outcome variables) such as microbiome research [48]. Further still, criteria such as the Bradford Hill criteria, may not be compatible with high-throughput research. For example, Bradford Hill criteria such as “strength of association” may be impossible to fulfill, as microbiome–human disease association sizes are generally small and sensitive to adjustments.

In short, fitting and reporting a single model that encapsulates a few assumptions will not be conducive to efficiently and consistently delivering clinically actionable biology from microbiome association studies. However, if—in part by modeling VoE—we are able to identify robust-to-model assumption associations that reproduce across cohorts, we are one step closer to achieving clinical relevance for microbiome-based diagnostics and a deeper understanding of the role of the microbiome in human biology.

Methods

Data collection

curated Metagenomic DataV1.14.1 [22] contains a large number of shotgun sequencing microbiome datasets preprocessed with HUMAnN2 [49] output for each sample. We used this dataset for our analysis, using only the taxa-level (MetaPhlAn2) data. We selected for diseases that had either greater than 100 samples or had multiple cohorts, excluding 2—infectious gastroenteritis and otitis—due to their limited presence in the microbiome literature. This resulted in our including 2,343 samples, 15 cohorts, and 6 diseases (CRC, T1D, T2D, ACVD, IBD, and liver CIRR).

We would additionally like to note that our study was not preregistered. While this is typical for both microbiome studies and hypothesis generating (versus hypothesis testing) studies in general, we do acknowledge (and we thank a reviewer of this manuscript for pointing this out) that it would be useful for these metascience applications going forward. Indeed, given that our literature review involved searching for single reports of microbial species already in our dataset associated with disease (as opposed to looking for all possible reported associations), some form of preregistration could have clarified this approach. Going forward, preregistration would set an even higher bar for robust MAS.

Literature review

We used PubMed as the source of our literature review. We did not aim to make this literature systematic in the sense that we were searching for every reported finding in the literature across multiple databases. Rather, in order to target our core question (what fraction of reported associations are reproducible in the datasets we had gathered), we were interested strictly if a given species found in curated metagenomic data (with greater than 0% abundance in at least 10% of samples for a given cohort) had been reported in the literature as ever associated with 1 of our 6 diseases of interest. As a result, we took the following general approach:

  1. We used NCBI’s Entrez search utility to download information on all papers matching disease-specific criteria. This script is available at https://github.com/chiragjp/ubiome_robustness.

  2. We then filtered out papers that were either not in human systems or involved clinical endpoints not relevant to the disease of interest (e.g., the association between the microbiome and a specific disease treatment).

  3. We then read the remaining publications and looked specifically for reports that the microbes present in curatedMetagenomicData were associated with a given disease. Upon identifying a given association, we recorded the microbe, the direction of association, and the source paper. We opted to not record all of the different sources in which a microbe had been reported in the literature, instead only recording the reference where we initially encountered it. The exception to this rule was if we found conflicting reports of the direction of the association (i.e., positively correlated with disease, negatively in another). In this case, we recorded the second publication that reported the opposite sign association.

In the case of reviews, we reported the associations the review described, recording as well the publications cited by the review that contained the original reporting of a given association.

We had additionally specific subprotocols for the phenotypes in certain cases, which we detail in the following 4 blocks of text:

  • T2D: Given the reported confounding between T2D and metformin and massive variation in modeling strategies reported in the literature, we specifically recorded not just the initial association we encountered, but all of them found within our downloaded papers from PubMed. We then only included associations that had been reported consistently as either (1) adjusted for metformin usage across all studies reporting them or (2) consistently not adjusted for metformin across all studies. We did not include microbes that were, for example, associated with T2D in one study, but not found to be associated with T2D in another study where the modeling strategy was adjusted for metformin.

  • IBD: We included in our analysis microbes associated with either Crohn disease, Ulcerative Colitis, or both.

  • CIRR: We only included cohorts that included patients with the same description as those in the paper our data came from [50]. This was limited to patients with any kind of liver CIRR, but it excluded those where the disease had advanced to or was comorbid with other liver conditions (e.g., carcinoma).

  • T1D: Our metadata indicated if patients were ever diabetic or only healthy. Given that we chose to average across samples, creating one sample for each individual, we opted to only select findings from the literature that compared T1D cases to T1D controls. We did not, for example, include studies that attempted to compare T1D cases prior to onset to healthy individuals.

Modeling vibration of effects

We used the quantvoe package (https://github.com/chiragjp/quantvoe) to compute associations, meta-analyses, and model VoE for each disease–microbe pair of interest. We first computed an initial, univariate association for each pair. These took the form of a standard linear regression, ln (microbial_feature + f) ~ disease, where the disease variable is a binary variable indicating disease status and the microbial_feature variable corresponds to the relative abundance of a particular taxon. f is a fudge factor of 0.00000001 to account for 0 values prior to logging our data. For diseases with repeated sampling per individuals (T1D and IBD), we computed the average abundance of each feature within an individual during the entire observation window. For diseases found in multiple cohorts (T1D, T2D, and CRC), we computed a random effects meta-analysis using R’s metafor [51] package over the initial association outputs (estimates and standard errors) for each input cohort (parameters: comb. fixed = FALSE, comb. random = TRUE, method. tau = ‘REML’,hakn = FALSE, prediction = TRUE, sm = “SMD”, control = list(maxiter = 1000)).

After computing these initial associations, we adjusted for FDR across all 6 diseases simultaneously (i.e., generating just 1 cutoff instead of 6) using the Benjamini–Yekutieli (BY) method, selecting 0.05 as our significance threshold.

We then computed the VoE for each microbial feature—disease pairing of interest by fitting an array of models with varying specification strategies (e.g., differing covariates). We used clinical/human phenotypic covariates present in curated metagenomic data’s combined metadata file as our database of adjusting variables. We only used those for which we had recorded information for 90% of samples. We only allowed a maximum of 20 adjusting variables per model.

The only case in which we did not fit every possible model given the available metadata was for one of the T2D cohorts, which, given the number of potential adjusting variables, yielded millions of possible models. Given that we were to vibrate over many thousands of features associated in our initial meta-analysis with T2D, we found computing so many models for each one to be computationally intractable. As such, we selected, for each feature, 10,000 models to fit at random.

Identification of adjusting variables that heavily confounded associations

For each disease, we modeled the association between the presence or absence of a given adjuster and the change in the absolute value of the average beta coefficient on the independent, binary disease variable across all microbial features. To account for shifts in the average association between a feature and disease, we used a mixed effect model, with a random effect for individual taxa (Eq 1, n corresponds to the number of possible adjusters for a given disease, and i corresponds to the number of taxa investigated in relation to that disease).

abs(estimate_on_disease_variable)adjuster1++adjustern+(1|taxon1..i) (1)

The resulting estimate size on each of the binary adjuster variables in Eq 1 indicates the change in average microbial_feature ~ disease association when a given adjuster is present. These values are reported in the bar plots in Fig 4.

Benchmarking data transformations, adjusting variables, and vibration numbers

We used the T2D datasets (which have, in total, the most number of potential adjusting variables recorded) to compare the impact of different data transformations, adjusting variables, and vibration numbers on our results. We compared center logged ratio (CLR) transformations on each dataset, our logging strategy described above, and the raw abundance data, running our entire pipeline with 10,0000 vibrations. We additionally compared our results when 3, 6, or 9 variables were selected at random from each cohort.

As a measure of robustness, we computed the fraction of all associations that were positive, with fractions approaching 1 or 0 being highly robust, and fractions approaching 0.5 being nonrobust (e.g., reporting conflicting results close to half of the time). For S2D Fig, we computed the Pearson correlation between these fractions.

Finally, we additionally compared the results of the mixed effects confounder analysis as a function of number of vibrations fit. In S4 Fig, we report the output of this analysis and the correlation between the estimate impact of each adjusting variable on model output.

Plotting and figure generation

All plots were made with R’s ggplot2 [52] package. We assembled figures in Adobe Illustrator.

Other software information

All statistical analyses were conducted in R. We ran the VoE pipeline on Harvard Research Computing’s O2 system.

Supporting information

S1 Table. Cohort summary statistics and adjusters present for each disease and references for species of interest from associations in the literature.

This figure can be generated using the code deposited in https://github.com/chiragjp/ubiome_robustness and the data deposited in https://figshare.com/projects/Microbiome_robustness/127607.

(XLSX)

S2 Table. Output of initial associations and summary of VoE for findings reported in the literature.

This figure can be generated using the code deposited in https://github.com/chiragjp/ubiome_robustness and the data deposited in https://figshare.com/projects/Microbiome_robustness/127607. VoE, vibration of effects.

(XLSX)

S1 Fig. Benchmarking data transformations and the number of adjusting variables using our T2D datasets.

(A–C) The impact of the different numbers of vibrations and data transformation methods on VoE. We plot the number of features that were FDR significant at least once upon vibration with different numbers of adjusting variables considered as well as different data transformation strategies (i.e., logged versus raw abundances versus center log ratio transformations). (D) The number of p-value and (E) FDR significant findings for the T2D cohort with the largest number of possible adjusters (using only log-transformed data, as opposed to the previous 3 panels). (F) The robustness of associations as a function of number of vibration variables and modeling strategy. We computed the fraction of associations that were positive for any given microbial feature—a highly robust association is 100% positive or 0% positive (i.e., negative), whereas a nonrobust association is closer to 50% positive (i.e., inconsistent in direction). In this heatmap, we correlated these associations for all features to gauge if the different data transformations and numbers of adjusting variables yielded similar measures of robustness across all datasets. This figure can be generated using the code deposited in https://github.com/chiragjp/ubiome_robustness and the data deposited in https://figshare.com/projects/Microbiome_robustness/127607. FDR, false discovery rate; T2D, type 2 diabetes; VoE, vibration of effects.

(PDF)

S2 Fig. VoE for reported associations from the literature. Bolded species names on the x-axis correspond to organisms that were FDR significant in our study. N corresponds to the number of models that converged for a given organism.

The middle bar describes the fraction of association sizes greater than 0 for a given association: A highly confounded association will be closer to 0.5 and pink, whereas more robust associations will be closer to 0 or 1 and blue. The gray bars in upper bar plot corresponds the fraction of models that were nominally (p-value < 0.05) significant for the microbial feature–disease association, whereas the black bars correspond to the fraction of models that were FDR significant. This figure can be generated using the code deposited in https://github.com/chiragjp/ubiome_robustness and the data deposited in https://figshare.com/projects/Microbiome_robustness/127607. FDR, false discovery rate; VoE, vibration of effects.

(PDF)

S3 Fig. The association between Roseburia and ACVD, which we could have overlooked if only fitting a single modeling strategy.

Each point represents a different model specification, the x-axis is the beta coefficient on the binary disease variable, the y-axis is the −log10(p-value). The dotted line represents FDR adjusted significance. The solid line represents nominal significance. This figure can be generated using the code deposited in https://github.com/chiragjp/ubiome_robustness and the data deposited in https://figshare.com/projects/Microbiome_robustness/127607. ACVD, atherosclerotic cardiovascular disease; FDR, false discovery rate.

(PDF)

S4 Fig. Benchmarking the number of vibrations needed to estimate the effect of confounding on microbiome associations.

(A) The output of our confounder analysis (e.g., in Fig 4). The x-axis is the number of vibrations. The y-axis is each possible adjusting variable in the T2D associations. The values correspond to the beta coefficient (from our mixed effects analysis) describing the average change in microbiome–disease associations when a given adjusting variable is present in a model. (B) The correlation between the values in panel A at different numbers of vibrations. This figure can be generated using the code deposited in https://github.com/chiragjp/ubiome_robustness and the data deposited in https://figshare.com/projects/Microbiome_robustness/127607. T2D, type 2 diabetes.

(PDF)

S1 Text. The impact of the number of adjusting variables and data transformation method on VoE. VoE, vibration of effects.

(DOCX)

Acknowledgments

We thank Harvard Research Computing for providing compute resources for this work.

Abbreviations

ACVD

atherosclerotic cardiovascular disease

BY

Benjamini–Yekutieli

CIRR

cirrhosis

CLR

center logged ratio

CRC

colorectal cancer

FDR

false discovery rate

HSCRP

high-sensitivity C-reactive protein

IBD

inflammatory bowel disease

MAS

microbiome association studies

T1D

type 1 diabetes

T2D

type 2 diabetes

VoE

vibration of effects

Data Availability

Scripts used in the analysis (i.e. to generate figures and download literature review information) can be found at https://github.com/chiragjp/ubiome_robustness. All the relevant microbiome datasets can be downloaded from the R package associated with curatedMetagenomicData [Pasolli et al. Accessible, curated metagenomic data through ExperimentHub. Nat Methods. 2017;14: 1023–1024. ]. All analyzed data from this study can be located at https://figshare.com/projects/Microbiome_robustness/127607.

Funding Statement

CJP and BTT were funded by the National Science Foundation (#1636870), the National Institute of Allergy and Infectious Diseases (R01AI127250), and the National Institute of Health Sciences (R01ES032470). ADK was funded by an American Diabetes Association Pathway to Stop Diabetes Initiator Award (#1-17-INI-13) and the Smith Family Foundation Award for Excellence in Biomedical Research. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Wirbel J, Pyl PT, Kartal E, Zych K, Kashani A, Milanese A, et al. Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer. Nat Med. 2019;25:679–89. doi: 10.1038/s41591-019-0406-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Forslund K, Hildebrand F, Nielsen T, Falony G, Le Chatelier E, Sunagawa S, et al. Disentangling type 2 diabetes and metformin treatment signatures in the human gut microbiota. Nature. 2015;528:262–6. doi: 10.1038/nature15766 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Patel CJ, Burford B, Ioannidis JPA. Assessment of vibration of effects due to model specification can demonstrate the instability of observational associations. J Clin Epidemiol. 2015;68:1046–58. doi: 10.1016/j.jclinepi.2015.05.029 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Nearing JT, Douglas GM, Hayes M, MacDonald J, Desai D, Allward N, et al. Microbiome differential abundance methods produce disturbingly different results across 38 datasets. bioRxiv. 2021:p. 2021.05.10.443486. doi: 10.1101/2021.05.10.443486 [DOI] [Google Scholar]
  • 5.Forstmeier W, Wagenmakers E-J, Parker TH. Detecting and avoiding likely false-positive findings—a practical guide. Biol Rev Camb Philos Soc. 2017;92:1941–68. doi: 10.1111/brv.12315 [DOI] [PubMed] [Google Scholar]
  • 6.Vujkovic-Cvijin I, Sklar J, Jiang L, Natarajan L, Knight R, Belkaid Y. Host variables confound gut microbiota studies of human disease. Nature. 2020. doi: 10.1038/s41586-020-2881-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Rothschild D, Weissbrod O, Barkan E, Kurilshikov A, Korem T, Zeevi D, et al. Environment dominates over host genetics in shaping human gut microbiota. Nature. 2018. doi: 10.1038/nature25973 [DOI] [PubMed] [Google Scholar]
  • 8.Steegen S, Tuerlinckx F, Gelman A, Vanpaemel W. Increasing Transparency Through a Multiverse Analysis. Perspect Psychol Sci. 2016;11:702–12. doi: 10.1177/1745691616658637 [DOI] [PubMed] [Google Scholar]
  • 9.Harder JA. The Multiverse of Methods: Extending the Multiverse Analysis to Address Data-Collection Decisions. Perspect Psychol Sci. 2020;15:1158–77. doi: 10.1177/1745691620917678 [DOI] [PubMed] [Google Scholar]
  • 10.Le Goallec A, Tierney BT, Luber JM, Cofer EM, Kostic AD, Patel CJ. A systematic machine learning and data type comparison yields metagenomic predictors of infant age, sex, breastfeeding, antibiotic usage, country of origin, and delivery type. PLoS Comput Biol. 2020;16:e1007895. doi: 10.1371/journal.pcbi.1007895 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Klau S, Hoffmann S, Patel CJ, Ioannidis JPA, Boulesteix A-L. Examining the robustness of observational associations to model, measurement and sampling uncertainty with the vibration of effects framework. Int J Epidemiol. 2020. doi: 10.1093/ije/dyaa164 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Ioannidis JPA, Tarone R, McLaughlin JK. The False-positive to False-negative Ratio in Epidemiologic Studies. Epidemiology. 2011;22:450–6. doi: 10.1097/EDE.0b013e31821b506e [DOI] [PubMed] [Google Scholar]
  • 13.Chu L, Ioannidis JPA, Egilman AC, Vasiliou V, Ross JS, Wallach JD. Vibration of effects in epidemiologic studies of alcohol consumption and breast cancer risk. Int J Epidemiol. 2020;49:608–18. doi: 10.1093/ije/dyz271 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Palpacuer C, Hammas K, Duprez R, Laviolle B, Ioannidis JPA, Naudet F. Vibration of effects from diverse inclusion/exclusion criteria and analytical choices: 9216 different ways to perform an indirect comparison meta-analysis. BMC Med. 2019;17:174. doi: 10.1186/s12916-019-1409-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Tierney BT, Anderson E, Tan Y, Claypool K, Tangirala S, Kostic AD, et al. Leveraging vibration of effects analysis for robust discovery in observational biomedical data science. PLoS Biol. 2021;19(9):e3001398. doi: 10.1371/journal.pbio.3001398 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Tierney BT, Tan Y, Kostic AD, Patel CJ. Gene-level metagenomic architectures across diseases yield high-resolution microbiome diagnostic indicators. Nat Commun. 2021;12:1–12. doi: 10.1038/s41467-020-20314-w [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Vieira-Silva S, Falony G, Belda E, Nielsen T, Aron-Wisnewsky J, Chakaroun R, et al. Statin therapy is associated with lower prevalence of gut microbiota dysbiosis. Nature. 2020;1–6. doi: 10.1038/s41586-020-2269-x [DOI] [PubMed] [Google Scholar]
  • 18.Gilbert JA, Alverdy J. Stool consistency as a major confounding factor affecting microbiota composition: an ignored variable? Gut. 2016:1–2. doi: 10.1136/gutjnl-2015-310043 [DOI] [PubMed] [Google Scholar]
  • 19.Ghosh TS, Das M, Jeffery IB, O’Toole PW. Adjusting for age improves identification of gut microbiome alterations in multiple diseases. Elife. 2020;9. doi: 10.7554/eLife.50240 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Kim D, Hofstaedter CE, Zhao C, Mattei L, Tanes C, Clarke E, et al. Optimizing methods and dodging pitfalls in microbiome research. Microbiome. 2017;5:52. doi: 10.1186/s40168-017-0267-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Duvallet C, Gibbons SM, Gurry T, Irizarry RA, Alm EJ. Meta-analysis of gut microbiome studies identifies disease-specific and shared responses. Nat Commun. 2017;8:1784. doi: 10.1038/s41467-017-01973-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Pasolli E, Schiffer L, Manghi P, Renson A, Obenchain V, Truong DT, et al. Accessible, curated metagenomic data through ExperimentHub. Nat Methods. 2017;14:1023–4. doi: 10.1038/nmeth.4468 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Siljander H, Honkanen J, Knip M. Microbiome and type 1 diabetes. EBioMedicine. 2019;46:512–21. doi: 10.1016/j.ebiom.2019.06.031 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Mardinoglu A, Boren J, Smith U. Confounding Effects of Metformin on the Human Gut Microbiome in Type 2 Diabetes. Cell Metab. 2016:10–2. doi: 10.1016/j.cmet.2015.12.012 [DOI] [PubMed] [Google Scholar]
  • 25.Sokol H, Seksik P, Furet JP, Firmesse O, Nion-Larmurier I, Beaugerie L , et al. Low counts of Faecalibacterium prausnitzii in colitis microbiota. Inflamm Bowel Dis. 2009;15:1183–9. doi: 10.1002/ibd.20903 [DOI] [PubMed] [Google Scholar]
  • 26.Remely M, Hippe B, Zanner J, Aumueller E, Brath H, Haslberger AG. Gut Microbiota of Obese, Type 2 Diabetic Individuals is Enriched in Faecalibacterium prausnitzii, Akkermansia muciniphila and Peptostreptococcus anaerobius after Weight Loss. Endocr Metab Immune Disord Drug Targets. 2016;16:99–106. doi: 10.2174/1871530316666160831093813 [DOI] [PubMed] [Google Scholar]
  • 27.Miquel S, Martín R, Rossi O, Bermúdez-Humarán LG, Chatel JM, Sokol H, et al. Faecalibacterium prausnitzii and human intestinal health. Curr Opin Microbiol. 2013;16:255–61. doi: 10.1016/j.mib.2013.06.003 [DOI] [PubMed] [Google Scholar]
  • 28.Ferreira-Halder CV, Faria AV de S, Andrade SS. Action and function of Faecalibacterium prausnitzii in health and disease. Best Pract Res Clin Gastroenterol. 2017;31:643–648. doi: 10.1016/j.bpg.2017.09.011 [DOI] [PubMed] [Google Scholar]
  • 29.El-Semman IE, Karlsson FH, Shoaie S, Nookaew I, Soliman TH, Nielsen J. Genome-scale metabolic reconstructions of Bifidobacterium adolescentis L2-32 and Faecalibacterium prausnitzii A2-165 and their interaction. BMC Syst Biol. 2014;8:41. doi: 10.1186/1752-0509-8-41 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Goodman SN, Fanelli D, Ioannidis JPA. What does research reproducibility mean? Sci Transl Med. 2016;8:341ps12. doi: 10.1126/scitranslmed.aaf5027 [DOI] [PubMed] [Google Scholar]
  • 31.Parsonnet J, Friedman GD, Orentreich N, Vogelman H. Risk for gastric cancer in people with CagA positive or CagA negative Helicobacter pylori infection. Gut. 1997;40:297–301. doi: 10.1136/gut.40.3.297 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Everard A, Belzer C, Geurts L, Ouwerkerk JP, Druart C, Bindels LB, et al. Cross-talk between Akkermansia muciniphila and intestinal epithelium controls diet-induced obesity. Proc Natl Acad Sci U S A. 2013;110:9066–71. doi: 10.1073/pnas.1219451110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Kostic AD, Gevers D, Pedamallu CS, Michaud M, Duke F, Earl AM, et al. Genomic analysis identifies association of Fusobacterium with colorectal carcinoma. Genome Res. 2012;22:292–8. doi: 10.1101/gr.126573.111 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat Biotechnol. 2019;37:852–7. doi: 10.1038/s41587-019-0209-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Moossavi S, Fehr K, Khafipour E, Azad MB. Repeatability and reproducibility assessment in a large-scale population-based microbiota study: case study on human milk microbiota. Microbiome. 2021;9:41. doi: 10.1186/s40168-020-00998-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Schloss PD. Identifying and Overcoming Threats to Reproducibility, Replicability, Robustness, and Generalizability in Microbiome Research. mBio. 2018;9. doi: 10.1128/mBio.00525-18 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Nearing JT, Douglas GM, Hayes MG, MacDonald J, Desai DK, Allward N, et al. Microbiome differential abundance methods produce different results across 38 datasets. Nat Commun. 2022. Jan 17; 13(1):342. doi: 10.1038/s41467-022-28034-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Loomba R, Seguritan V, Li W, Long T, Klitgord N, Bhatt A, et al. Gut Microbiome-Based Metagenomic Signature for Non-invasive Detection of Advanced Fibrosis in Human Nonalcoholic Fatty Liver Disease. Cell Metab. 2017;25:1054–1062.e5. doi: 10.1016/j.cmet.2017.04.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Hannigan GD, Duhaime MB, Ruffin MT 4th, Koumpouras CC, Schloss PD. Diagnostic Potential and Interactive Dynamics of the Colorectal Cancer Virome. mBio. 2018;9. doi: 10.1128/mBio.02248-18 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Sze MA, Schloss PD. Looking for a Signal in the Noise: Revisiting Obesity and the Microbiome. mBio. 2016;7. doi: 10.1128/mBio.01018-16 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Lloyd-Price J, Arze C, Ananthakrishnan AN, Schirmer M, Avila-Pacheco J, Poon TW, et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature. 2019;569:655–62. doi: 10.1038/s41586-019-1237-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Gevers D, Kugathasan S, Denson LA, Vázquez-Baeza Y, Van Treuren W, Ren B, et al. The treatment-naive microbiome in new-onset Crohn’s disease. Cell Host Microbe. 2014;15:382–92. doi: 10.1016/j.chom.2014.02.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Baron RM, Kenny DA. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol. 1986;51:1173–82. doi: 10.1037//0022-3514.51.6.1173 [DOI] [PubMed] [Google Scholar]
  • 44.Fragoso TM, Neto FL. Bayesian model averaging: A systematic review and conceptual classification. Available from: http://arxiv.org/abs/math.PR/0000000. [Google Scholar]
  • 45.Armitage P. Fisher, Bradford Hill, and randomization. Int J Epidemiol. 2003;32: 925–8; discussion 945–8. doi: 10.1093/ije/dyg286 [DOI] [PubMed] [Google Scholar]
  • 46.Hill AB. THE ENVIRONMENT AND DISEASE: ASSOCIATION OR CAUSATION? Proc R Soc Med. 1965;58:295–300. [PMC free article] [PubMed] [Google Scholar]
  • 47.Höfler M. Causal inference based on counterfactuals. BMC Med Res Methodol. 2005;5:28. doi: 10.1186/1471-2288-5-28 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Ioannidis JPA. Exposure-wide epidemiology: revisiting Bradford Hill. Stat Med. 2016;35:1749–62. doi: 10.1002/sim.6825 [DOI] [PubMed] [Google Scholar]
  • 49.Abubucker S, Segata N, Goll J, Schubert AM, Izard J, Cantarel BL, et al. HUMAnN2: the HMP Unified Metabolic Analysis Network 2. 2017. [Google Scholar]
  • 50.Qin N, Yang F, Li A, Prifti E, Chen Y, Shao L, et al. Alterations of the human gut microbiome in liver cirrhosis. Nature. 2014;513:59–64. doi: 10.1038/nature13568 [DOI] [PubMed] [Google Scholar]
  • 51.Viechtbauer W. Conducting Meta-Analyses in R with the metafor Package. J Stat Softw. 2010;36:1–48. [Google Scholar]
  • 52.Wickham H. ggplot2: Elegant Graphics for Data Analysis. [cited 2019 Dec 19]. Available from: https://ggplot2-book.org. [Google Scholar]

Decision Letter 0

Roland G Roberts

7 Jul 2021

Dear Chirag,

Thank you for submitting your manuscript entitled "Evaluating 581 reported gut microbiome-disease associations across 1.1 million model specifications reveals sources of inconsistency in metagenomic research" for consideration as a Meta-Research Article by PLOS Biology.

Your manuscript has now been evaluated by the PLOS Biology editorial staff, as well as by an academic editor with relevant expertise, and I'm writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Please re-submit your manuscript within two working days, i.e. by Jul 09 2021 11:59PM.

Login to Editorial Manager here: https://www.editorialmanager.com/pbiology

During resubmission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF when you re-submit.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. Once your manuscript has passed all checks it will be sent out for review.

Given the disruptions resulting from the ongoing COVID-19 pandemic, please expect delays in the editorial process. We apologise in advance for any inconvenience caused and will do our best to minimize impact as far as possible.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Roli

Roland Roberts

Senior Editor

PLOS Biology

rroberts@plos.org

Decision Letter 1

Roland G Roberts

20 Aug 2021

Dear Dr. Patel,

Thank you for submitting your manuscript "Evaluating 581 reported gut microbiome-disease associations across 1.1 million model specifications reveals sources of inconsistency in metagenomic research" for consideration as a Meta-Research Article at PLOS Biology. Your manuscript has been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by several independent reviewers.

You will see that reviewer #1 has a large number of queries, regretting your failure to pre-register the study, asking you to tone down some conclusions, and looking for improvements in both text and presentation. Reviewer #2 raises a large number of questions about the method, and some specific points about diseases and features of the microbiome.

In light of the reviews (below), we will not be able to accept the current version of the manuscript, but we would welcome re-submission of a much-revised version that takes into account the reviewers' comments. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent for further evaluation by the reviewers.

We expect to receive your revised manuscript within 3 months.

Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension. At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may end consideration of the manuscript at PLOS Biology.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point by point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually, point by point.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Related" file type.

*Re-submission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this re-submission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Roland Roberts

Senior Editor

PLOS Biology

rroberts@plos.org

*****************************************************

REVIEWS:

Reviewer #1: Florian NAUDET

Reviewer #2: Katie Pollard & Annamarie Bustion

Reviewer #1: I think that this manuscript has the potential to become an important contribution to the field. I have however mixed feelings about some aspects that I will try to detail in my review.

Please note that I am also reviewer for another paper of this team in PLOS Biology and that I was also very enthusiastic about this other paper. Please also note that I am a reviewer involved in meta-research and that I strongly suggest that a reviewer involved in the specific field of microbiome research should be invited (a reviewer used to the methods used in this field).

First of all, this is a stellar piece of research performing various "case studies", applying the concept of vibration of effects in the field of gut microbiome disease associations. VoE is a form of multiverse analysis and is the subject of many research efforts across the world and the team submitting this paper is at the forefront of these efforts. The microbiome field is rapidly expanding and may have serious problem of reproduciblity due to somewhat small sample size and an universe of researcher's degrees of freedom. The application of VoE concept in this case is therefore timely and innovative. One may also note that VoE is proposed here in a very positive manner, as a tool to identify associations worth of interest and not as a "brute force" tool to disqualify any positive association. This paper represent an heroic effort not solely in running various models but also in being an attempt to synthesize certain aspects the field of microbiome research. It is, in my opinion an important contribution to the field of meta-research.

I however have two major concerns :

- First the research was not pre-registered (or perhaps I missed it). Of course, authors may acknowledge that this is not an hypothesis generating study. But, in my opinion, the paper

presents many specific examples as cases studies and there is room for cheery picking. I don't say that authors cherry picked the results but, rather that, without pre-registration, one cannot affirm that it was not the case ;

. Several choices were made in choosing the examples/case studies. Pre registration would have added more transparency on these choices ;

. I regret that the authors didn't adopt a systematic review approach :

A/ It could have helped again in providing more transparency concerning selection of the included topics / case studies ;

B/ Some aspect of the systematic review approach may have been useful to improve both reporting and reproducibility (e.g. a PRISMA flow chart may have been much more intuitive and easy to follow that the current information reported in the various web appendices) ;

. I anticipate that authors will say that registration may not be mandatory because this is not an hypothesis testing study, and I can understand this point of view. But I also think that there are some degrees of freedom in their study and that it is always best to fix those a priori, as far as it is possible of course ;

. At the very least, non-registration must be explicitly stated in the methods / and / possible shortcomings must be discussed in the limitations ;

. It seems that the authors used a protocol (p.13, l.377) but they do not provide this protocol / I would be interested in a specific pargraph detaillin any change to the protocol;

- My second major concern is rather conceptual :

. Authors suggest that their approach is an advance in identifying association worth of interest. At least this is the tone of the discussion and the conclusion. I do think that there is some spin here and that the design cannot garantee this.

. Indeed the present study nicely details a path for identifying associations that can be considered as robust regarding various criteria ;

. But it does not mean that the robust associations identified will necessarily be transformed in more robust discoveries ;

. It does not explore nor compare with other methods, how much of these associations will end in being false positive / false negative.

. At the very least the discussion section should be toned done, and further developments necessary to develop and adopt this new approach must be suggested;

I leave the editor judge whether these concerns rather qualify for a major revision or for a rejection. I may suggest major revision but I think that an in depth revision of the discussion will be needed.

The paper is clearly written, understandable for a large audience despite many technical aspects. I do think that authors did a good job in this regard. Of course it could be improved and I hope that the following minor comments will be helpful.

- COI disclosure: I would be interested to know more about the 2 companies (FitBiomics) and Micro Bioscience, and especially how links with these companies may represent a conflicting interest.

- Authors must give a close attention to their figures :

. In one hand figures are very nice to read and nicely summarize many very important information ;

. In the other hand, figures are very difficult to read (for each figure, it took me some times to understand the key concepts as these figures are not so intuitive) ;

. For example, in Figure 1, information about the search strategy is somewhat hard to understand, especially if you did not have looked at the method section. Figure 1 could be more aligned with the data reported in the text to make its reading easier.

A/ I appreciate that y axis are in a log scale / however, the figure should provide p-values (on a log scale) rather than log p-values. It would be easier to read. This could apply to all figures.

B/ In Figure 1 A, there is no information of the y and y axis for the VoE, taxon graphs. I think that it could be difficult to read.

C/ In figure 1B and 1G, there is no label for the x-axis. The number 1 to 6 could be difficult to follow. I would suggest to think in more user friendly representation.

. For example, in Figure 2 :

A/ The Figure use the term Janus Effect in its legend. It is of course defined in the figure but it does not appear in the text. I would suggest that authors could add this in the text;

B/ There is no information about the axes (no label for the y nor for the x axes). It could be difficult to follow. I don't have any good suggestion for improving the readility and adding more information about the associations being studied / but I suggest that authors may think in a more user friendly figure ;

. For example, in figure 3A, I would use the same scale for all effect size across all diseases (e.g. using a log scale) ;

- P.4 l.98 37.8 findings were from the datasets used in this analysis. I would be interested to know more about reproducibility of a finding in the dataset where this finding is from versus in other datasets (e.g. in line 109-110, I would be interested to break this by original versus new datasets). It could be out of scope but I would like to hear the authors'point of view about this point.

- P.4. l.111, I'm not sure that the title is adequate. Instead of "additional potential", I would suggest "different". In this paragraph I would also be interested in Venn's diagram showing the overlap between usual approach and the approach proposed by the authors ;

- p5. l.119 : "we hypothesized" OK : was a it a priori ? Or was it a posteriori (see my previous comment) in major concerns ;

- p.6. l.159-161. This is rather an interpretation and should be moved in the discussion section. And, also, following my second major concern, authors must provide convincing evidence to support such a statement. I'm not sure that it is possible based on their data.

- p7. l193: Here again I'm surprised that the author selected a specific example (i.e. F. Prausnitizii). Was this done a priori. Why this specific association? I would have preferred using specific criteria and a random selection of example using the predefined criteria. I may have missed something however.

. P.8 l.206: I think that the very first sentence of the discussion must summarize the findings. I would therefore suggest to delete (or move in the introduction) the very first sentences.

. P. 10 l.261-263: In line with my previous comment, I would be interested to know more about discrepancies between re-analysis in a given dataset and the original analysis.

. p.14, l.422: How the 20 adjusting variables were selected. I think that this point could benefit from more details.

Reviewer #2: This paper explores the microbiome field's need to account for the portion of bias introduced by confounding variables when choosing models of association between bacterial taxa and human disease. The authors chose to address this through a process called "Vibration of Effects." In addition to identifying associations that change when adjusting for potential confounding variables, the authors promise to:

Measure reproducibility of previous literature findings of association,

Identify new biological associations between bacterial taxa and human disease,

Provide recommendations on feature prioritization in future studies, based on their ability to recapitulate associations in this work.

Identify whether an adjuster is of the confounder or collider variety.

Major comments:

The authors state that VoE can be used to assess reproducibility, but the most non-robust associations were also those that had the most available covariates, and therefore yielded the highest number of models. This process seems highly dependent on the type and number of adjusters available in sequencing metadata. VoE seems better suited to evaluating associations after a study without attempting to narrow in on a one, best model.

In addition to the number of measured covariates, VoE appears to be sensitive to the number of cohorts, sample size, and variability in measurements. These differ across the studied diseases. When comparing results over diseases (Lines 140-145, for example) these factors should be discussed even more explicitly. Single-cohort associations are mentioned, but it is not clear if cohort is one of the covariates included in VoE or could be included. Providing simulations or down-sampling experiments on the larger data sets to show how these variables affect VoE in this application would be extremely useful.

The premise for using VoE needs further justification. Why would a marginal association be inherently better if it differs little from all possible conditional associations? Identifying confounding variables makes sense, but it seems like this method is broader than that. Justification needs to be provided for seeking reproducibility over models with all possible/measured variables given that the authors acknowledge that some covariates should not be adjusted for. Similarly, why is a microbe-disease association inherently more interesting if it is reproduced across multiple diseases (Lines 226-228 for example)?

An association-prioritization framework of variable adjusters that must be considered across multiple diseases and studies is promised as a result, but not provided by the manuscript.

How are the findings and measurements of statistical significance affected by only analyzing previously significant microbe-disease associations (Lines 252-257)? Regression to the mean type thinking and extreme value statistics suggest that these will be biased towards non-significance and smaller effect sizes. It would be interesting to include non-significant results and look at how many become more significant. What is known about VoE conditional on starting with previously reported associations versus vomiting this conditioning?

Other comments:

Line 63: Explain VoE in more detail (text on Lines 419-422 seems helpful, for example) and any statistical theory justifying it as a metric for evaluating measured associations. Is there a causal inference framework underlying VoE?

Line 85/102-110: Using a baseline with no covariates seems simplistic compared to the modeling used in the published studies. It would be great if the authors included whether or not their baseline model matched the models explored in the original studies. That is, it doesn't seem fair to mount the comparison in Figure 1B, unless the reader is assured that the baseline model is a fair point of reference.

Line 93: Provide a citation for the existing database, or use a different word here.

Line 96: Rheumatoid arthritis and obesity might be additional indications to explore. Related, IBD could be stratified by UC and CD.

Line 112: Provide more detail about the models in terms of parametric forms, outcomes, and covariates.

Lines 119-130: Figures 1B and 1G do not seem dissimilar, yet the authors state that VoE helped recover reported results lost by the baseline model. The purpose of the baseline model as a comparison merits discussion here. In Figure 1G, might it be better to report the proportion of findings matched, rather than the median association from all models generated?

Line 131: Could association robustness also be varying as a function of number of adjusters, rather than disease/cohort? T1D and T2D are reported to be the least robust, but these studies also resulted in the most models generated due to their high number of available covariates.

Line 148: What is an arbitrary model? It is not clear if this statement refers to a null distribution or probability under the empirical distribution or something else?

Line 150/296: Figure 2 does not have an A and B panel. Also, is the baseline comparison necessary to include?

Line 157-159: It would be helpful to explain why the marginal association is not significant.

Lines 155-161/337/403: The previously reported association between Roseburia and ACVD was found via an ACVD-microbiome association based on gene and KEGG content, rather than taxa. We wonder if the writers of this manuscript have considered exploring more than relative abundance of species. Gene content and pathway presence/absence could also be explored. Related, because this is already a known association, we are assuming the authors believe this to be potentially overlooked based on their baseline model in this work.

Lines 162-192: The authors present the influence of adjuster variables on these particular datasets, but they do not show that these can be prioritized generally in a so-called association-prioritization framework across multiple studies and diseases.

Lines 201-203: Is this a reference to confounders, colliders, or something else?

Lines 226-228: We disagree that an association-prioritization framework has been provided, but maybe VoE could be applied in the future to do so. In this work, there do not seem to be adjusters that are consistently significant across multiple diseases and studies.

Line 245: Geography didn't seem significant for F. prausnitzii associations.

Line 256/405: It might make sense to pick one time point from longitudinal studies, rather than averaging.

Line 258/403: We recommend performing CLR before comparing abundances across multiple studies of the same disease.

Line 311: We believe the authors meant to say "y-axis."

Line 312: Figure 3 does not have colors for adjusters used across multiple diseases, as written.

Line 325: Needs a y-axis label

Line 371: What is the justification for using the second publication?

Line 402: We expected the models to be set up as disease~microbial feature + covariates. What is the justification for making the microbial feature the outcome?

Line 428: 10,000 is a lot of models. This data set could be used to explore how the number of covariates affects the number of robust associations. What happens if a random 10, 100, 1000 models are used? Or if all models for a random subset of covariates are used?

Style comments:

In general, there is overuse of quotation marks throughout the manuscript that do not enhance understanding. Examples: "consistent" on line 25, "non-robust" on line 83, "recovered" on line 84, etc.

Line 59-62: This is a run-on sentence that hinders understanding.

Line 86: The phrase after the colon is missing words.

It is our policy to sign reviews: Annamarie Bustion and Katie Pollard

Decision Letter 2

Roland G Roberts

17 Jan 2022

Dear Chirag,

Thank you for submitting your revised Meta-Research Article entitled "Evaluating 581 reported gut microbiome-disease associations across millions of model specifications reveals sources of inconsistency in metagenomic research" for publication in PLOS Biology. I have now obtained advice from the original reviewers and have discussed their comments with the Academic Editor. 

Based on the reviews, we will probably accept this manuscript for publication, provided you satisfactorily address the remaining points raised by the reviewers. Please also make sure to address the following data and other policy-related requests.

IMPORTANT:

a) Please attend to the remaining requests from reviewer(s) #2.

b) Please re-name your supplementary Fig files "S1_Fig," "S2_Fig," etc.

c) Please address my Data Policy requests below; specifically, we need you to supply the numerical values underlying Figs 1BCDEF, 2AB, 3, 4ABCDEFGH, 5ABC, S1ABCD, S2, S3, S4AB. If these can all be generated from the data and code deposited in Github/Figshare, please cite the location of the data clearly in each relevant main and supplementary Fig legend, e.g. “This Figure can be generated using the data and code deposited in https://github.com/chiragjp/ubiome_reproducibility”).

As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.

We expect to receive your revised manuscript within two weeks.

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:

-  a cover letter that should detail your responses to any editorial requests, if applicable, and whether changes have been made to the reference list

-  a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable)

-  a track-changes file indicating any changes that you have made to the manuscript. 

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information  

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*Early Version*

Please note that an uncorrected proof of your manuscript will be published online ahead of the final version, unless you opted out when submitting your manuscript. If, for any reason, you do not want an earlier version of your manuscript published online, uncheck the box. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please do not hesitate to contact me should you have any questions.

Sincerely,

Roli

Roland G Roberts, PhD,

Senior Editor,

rroberts@plos.org,

PLOS Biology

------------------------------------------------------------------------

DATA POLICY:

You may be aware of the PLOS Data Policy, which requires that all data be made available without restriction: http://journals.plos.org/plosbiology/s/data-availability. For more information, please also see this editorial: http://dx.doi.org/10.1371/journal.pbio.1001797 

Note that we do not require all raw data. Rather, we ask that all individual quantitative observations that underlie the data summarized in the figures and results of your paper be made available in one of the following forms:

1) Supplementary files (e.g., excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. Multiple panels of a single or even several figures can be included as multiple sheets in one excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).

2) Deposition in a publicly available repository. Please also provide the accession code or a reviewer link so that we may view your data before publication. 

Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in the following figure panels as they are essential for readers to assess your analysis and to reproduce it: Figs 1BCDEF, 2AB, 3, 4ABCDEFGH, 5ABC, S1ABCD, S2, S3, S4AB. NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).

IMPORTANT: Please also ensure that figure legends in your manuscript include information on where the underlying data can be found, and ensure your supplemental data file/s has a legend.

Please ensure that your Data Statement in the submission system accurately describes where your data can be found.

------------------------------------------------------------------------

DATA NOT SHOWN?

- Please note that per journal policy, we do not allow the mention of "data not shown", "personal communication", "manuscript in preparation" or other references to data that is not publicly available or contained within this manuscript. Please either remove mention of these data or provide figures presenting the results and the data underlying the figure(s).

------------------------------------------------------------------------

REVIEWERS' COMMENTS:

Reviewer #1:

[identifies himself as Florian Naudet]

Thank you for your response. I found these responses very appropriate and all my previous concerns are now well addressed.

I really think that this manuscript is an important contribution to the field of meta-research.

Reviewer #2:

[identify themselves as Annamarie Bustion and Katie Pollard]

The paper is improved; the results are presented more clearly and the conclusions drawn are more metered. The authors re-identified their research goals as the following: identification of confounders, recovery and robustness assessments of previous literature findings, and identification of new associations. The text answers these goals, and the authors appropriately de-emphasized their method's ability to assess reproducibility or provide an umbrella feature prioritization framework. Also, the authors present the rationale for using VoE more comprehensively, and they better describe their rationale for using a simple baseline model rather than recapitulating the models from their literature findings.

Remaining critiques:

Authors' response to 2.1: The authors now provide an assessment of VoE sensitivity to number of variables available, using T2D as an example. This does provide confidence that VoE can provide information on non-robust associations independent of number of variables. However, a stronger response would have looked at larger values for the number of variables, not just a comparison of 3, 6, and 9 variables. Further, an assessment of only three variable sizes seems insufficient to state that there could be a "a potential leveling-off with increasing adjusting variables." We recommend adding results for some higher numbers of variables.

Authors' response to 2.2: It is now clear how the authors made use of multiple cohorts. But it would have been useful for the authors to instead use additional disease cohorts as a means of external validation in addition to the use of previous literature findings. If possible, we recommend adding this analysis.

Abstract: Before reading the main text, it is not clear what "model specifications" refers to. This could be different covariates, a different parametric form, or a data transformation, amongst other things. It would be very helpful to clearly state that the focus of this paper is exploring all the possible combinations of covariates. Note: In the Introduction (L74-76), this is pretty clear. Our recommendation is to include a similar definition / scope statement in the abstract.

Abstract/Introduction: We appreciate that the authors now also look at non-associations with VoE. We recommend that you check the places where you talk about focusing on 581 published associations: some of the text sounds like you only analyze the associations but not the non-associations. Alternative language might be "studies" or "cohorts" "in which disease-microbiome associations were previously detected".

Figure 2A: The provided Upset plot is not fully explained. What do the gray bars represent?

Figure 2B: These results are more interpretable now that the authors are exploring fraction of matching associations. But these graphs would be more legible as count histograms rather than density plots.

It is our policy to sign reviews: Annamarie Bustion and Katie Pollard

Decision Letter 3

Roland G Roberts

27 Jan 2022

Dear Chirag,

On behalf of my colleagues and the two Academic Editors, Jonathan Eisen and Marcus Munafo, I'm pleased to say that we can in principle accept your Meta-Research Article "Systematically assessing microbiome-disease associations identifies drivers of inconsistency in metagenomic research" for publication in PLOS Biology, provided you address any remaining formatting and reporting issues. These will be detailed in an email that will follow this letter and that you will usually receive within 2-3 business days, during which time no action is required from you. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have any requested changes.

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS: We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have not yet opted out of the early version process, we ask that you notify us immediately of any press plans so that we may do so on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for choosing PLOS Biology for publication and supporting Open Access publishing. We look forward to publishing your study. 

Sincerely, 

Roli

Roland G Roberts, PhD 

Senior Editor 

PLOS Biology

rroberts@plos.org

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Cohort summary statistics and adjusters present for each disease and references for species of interest from associations in the literature.

    This figure can be generated using the code deposited in https://github.com/chiragjp/ubiome_robustness and the data deposited in https://figshare.com/projects/Microbiome_robustness/127607.

    (XLSX)

    S2 Table. Output of initial associations and summary of VoE for findings reported in the literature.

    This figure can be generated using the code deposited in https://github.com/chiragjp/ubiome_robustness and the data deposited in https://figshare.com/projects/Microbiome_robustness/127607. VoE, vibration of effects.

    (XLSX)

    S1 Fig. Benchmarking data transformations and the number of adjusting variables using our T2D datasets.

    (A–C) The impact of the different numbers of vibrations and data transformation methods on VoE. We plot the number of features that were FDR significant at least once upon vibration with different numbers of adjusting variables considered as well as different data transformation strategies (i.e., logged versus raw abundances versus center log ratio transformations). (D) The number of p-value and (E) FDR significant findings for the T2D cohort with the largest number of possible adjusters (using only log-transformed data, as opposed to the previous 3 panels). (F) The robustness of associations as a function of number of vibration variables and modeling strategy. We computed the fraction of associations that were positive for any given microbial feature—a highly robust association is 100% positive or 0% positive (i.e., negative), whereas a nonrobust association is closer to 50% positive (i.e., inconsistent in direction). In this heatmap, we correlated these associations for all features to gauge if the different data transformations and numbers of adjusting variables yielded similar measures of robustness across all datasets. This figure can be generated using the code deposited in https://github.com/chiragjp/ubiome_robustness and the data deposited in https://figshare.com/projects/Microbiome_robustness/127607. FDR, false discovery rate; T2D, type 2 diabetes; VoE, vibration of effects.

    (PDF)

    S2 Fig. VoE for reported associations from the literature. Bolded species names on the x-axis correspond to organisms that were FDR significant in our study. N corresponds to the number of models that converged for a given organism.

    The middle bar describes the fraction of association sizes greater than 0 for a given association: A highly confounded association will be closer to 0.5 and pink, whereas more robust associations will be closer to 0 or 1 and blue. The gray bars in upper bar plot corresponds the fraction of models that were nominally (p-value < 0.05) significant for the microbial feature–disease association, whereas the black bars correspond to the fraction of models that were FDR significant. This figure can be generated using the code deposited in https://github.com/chiragjp/ubiome_robustness and the data deposited in https://figshare.com/projects/Microbiome_robustness/127607. FDR, false discovery rate; VoE, vibration of effects.

    (PDF)

    S3 Fig. The association between Roseburia and ACVD, which we could have overlooked if only fitting a single modeling strategy.

    Each point represents a different model specification, the x-axis is the beta coefficient on the binary disease variable, the y-axis is the −log10(p-value). The dotted line represents FDR adjusted significance. The solid line represents nominal significance. This figure can be generated using the code deposited in https://github.com/chiragjp/ubiome_robustness and the data deposited in https://figshare.com/projects/Microbiome_robustness/127607. ACVD, atherosclerotic cardiovascular disease; FDR, false discovery rate.

    (PDF)

    S4 Fig. Benchmarking the number of vibrations needed to estimate the effect of confounding on microbiome associations.

    (A) The output of our confounder analysis (e.g., in Fig 4). The x-axis is the number of vibrations. The y-axis is each possible adjusting variable in the T2D associations. The values correspond to the beta coefficient (from our mixed effects analysis) describing the average change in microbiome–disease associations when a given adjusting variable is present in a model. (B) The correlation between the values in panel A at different numbers of vibrations. This figure can be generated using the code deposited in https://github.com/chiragjp/ubiome_robustness and the data deposited in https://figshare.com/projects/Microbiome_robustness/127607. T2D, type 2 diabetes.

    (PDF)

    S1 Text. The impact of the number of adjusting variables and data transformation method on VoE. VoE, vibration of effects.

    (DOCX)

    Attachment

    Submitted filename: robustness_response_to_reviewers.pdf

    Attachment

    Submitted filename: robustness_response_to_reviewers2.pdf

    Data Availability Statement

    Scripts used in the analysis (i.e. to generate figures and download literature review information) can be found at https://github.com/chiragjp/ubiome_robustness. All the relevant microbiome datasets can be downloaded from the R package associated with curatedMetagenomicData [Pasolli et al. Accessible, curated metagenomic data through ExperimentHub. Nat Methods. 2017;14: 1023–1024. ]. All analyzed data from this study can be located at https://figshare.com/projects/Microbiome_robustness/127607.


    Articles from PLoS Biology are provided here courtesy of PLOS

    RESOURCES