Skip to main content

This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

Research Square logoLink to Research Square
[Preprint]. 2023 May 2:rs.3.rs-2778207. [Version 1] doi: 10.21203/rs.3.rs-2778207/v1

Multi-group Analysis of Compositions of Microbiomes with Covariate Adjustments and Repeated Measures

Huang Lin 1, Shyamal Das Peddada 1,*
PMCID: PMC10187376  PMID: 37205444

Abstract

Microbiome differential abundance analysis methods for a pair of groups are well established in the literature. However, many microbiome studies involve multiple groups, sometimes even ordered groups, such as stages of a disease, and require different types of comparisons. Standard pairwise comparisons are not only inefficient in terms of power and false discovery rates, but they may not address the scientific question of interest. In this paper, we propose a general framework for performing a wide range of multi-group analyses with covariate adjustments and repeated measures. We demonstrate the effectiveness of our methodology through two real data sets. The first example explores the effects of aridity on the soil microbiome, and the second example investigates the effects of surgical interventions on the microbiome of IBD patients.

Introduction

Increasingly researchers are recognizing the role of the microbiome in human health and diseases1, 2. Accordingly, it has become a common practice for researchers to investigate differences in microbial compositions and to identify differentially abundant taxa among two or more study or experimental groups. Most existing methods for differential abundance (DA) analysis, such as ANCOM3, ANCOM-BC4, LOCOM5, etc., are designed for discovering differentially abundant taxa between two groups. A variety of strategies are used in the literature for performing multi-group differential abundance analysis. For example, some researchers use the above methods to analyze two groups at a time using a false discovery rate (FDR) threshold within each pairwise comparison and pool the results from all such pairwise comparisons to interpret the data. Such a strategy does not account for the fact that multiple tests and multiple pairwise comparisons are being performed when controlling for false discovery rates. Some researchers perform global beta diversity analyses for multiple groups and follow up with pairwise differential abundance analyses for groups that appear to be different. While all such approaches are intuitive and reasonable, they may not be designed to answer the specific scientific question of interest. Secondly, the statistical properties of such procedures are not well understood. For example, they may not control the overall FDR for multiple tests and multiple pairwise comparisons.

It is well recognized in gene expression studies that when there are more than two groups, the definitions of FDR depend upon the study design and the hypotheses of interest. Standard procedures, such as the BH procedure6, are designed for testing multiple hypotheses between two groups. When there are multiple groups, the standard concept of FDR, and methods controlling the corresponding error rates, need to be modified according to the study design and type of analyses to be performed79. A variety of multi-group studies are performed by researchers, such as: (1) Multiple pairwise comparisons: A dietitian may be interested in making all pairwise comparisons of the gut microbial compositions among subjects receiving diets D1,D2 or D3. Furthermore, for each pairwise comparison, D1 vs. D2,D1 vs. D3, and D2 vs. D3, the goal is often to identify taxa whose abundance increased (or decreased). (2) Multiple pairwise comparisons against a specific reference group: Same as in scenario (1), but the investigator is only interested in a subset of pairwise comparisons. For example, compare each group with a reference group. Suppose D1 is the standard diet, and the researcher may be interested in identifying taxa whose abundance may have increased (or decreased) for subjects receiving a new diet D2, compared to D1, and similarly new diet D3, compared to D1. (3) Pattern analysis over ordered study groups: In some instances, an investigator may be interested in discovering trends or patterns in abundances of taxa over ordered groups, such as the health of subjects, changes in climate, doses of a drug, and others. For instance, during normal pregnancy, women not only undergo major physiological and hormonal changes, but they also experience significant changes in their gut and vaginal microbiome10. These changes are necessary for maternal metabolism, immune response, and hormonal changes to support pregnancy and to provide healthy flora for babies at birth11, 12. In fact, the alpha diversity of the microbiome dramatically decreases temporally during pregnancy13. During the first trimester of a healthy pregnancy, the gut microbiota resembles that of a healthy non-pregnant woman13. However, as the pregnancy progresses from the first to the third trimester, there is a reduction in the relative abundance of anti-inflammatory butyrate-producing bacteria and an increase in the pro-inflammatory phylum Proteobacteria, Bifidobacterium spp. in the phylum Actinobacteria, and lactic acid-producing bacteria in preparation for energy demands of lactation12. Thus, in many scientific investigations, researchers may be interested in studying changes in the microbiome over an ordered set of conditions. The patterns of microbial abundance may not always be monotonic. They may display other shapes, such as an umbrella or an inverted umbrella with the location of the peak or trough unknown a priori. Additionally, depending upon the scientific question of interest, such as microbial changes during pregnancy, repeated measures are taken on the same subject. Although the pattern analyses mentioned here could be accomplished by conducting a sequence of pairwise tests over adjacent ordered groups, such a strategy may have lower power than a test designed for pattern analysis, as will be demonstrated in the analysis of soil aridity data described later in this paper.

The focus of this paper is to develop methodologies for performing multi-group differential abundance analyses for studies such as the above-noted ones. To the best of our knowledge, there does not exist a formal methodology for performing such analyses, with the exception of ANCOM-II14. While ANCOM-II considered some of the above testing problems, it does not develop a formal framework for bias correction and is heuristic. The more recent methodology LinDA15, which uses a model similar to the one developed in ANCOM-II, does not address the above multi-group testing problems. Thus, there is a major gap in the literature for analyzing multi-group microbiome studies, which will be filled by the methodology developed in this paper called ANCOM-BC2.

Before developing ANCOM-BC2, we first make some modifications to ANCOM-BC4 for testing multiple hypotheses regarding two groups which would result in a better control of FDR. This framework will then be extended to the multi-group testing problem considered in this paper. Although the ANCOM-BC methodology accounted for sample-specific bias, we now also account for taxon-specific bias. This is important because sequencing efficiencies can vary across taxa, leading to a taxon-specific bias when some taxa are preferentially measured over others during the sequencing experimental workflow. For example, gram-positive bacteria have stronger cell walls than gram-negative bacteria, making them harder to extract during the data preparation step. As a result, gram-positive bacteria may be underrepresented in the observed abundances, leading to biased results if not properly accounted for in the analysis16. Also, it is well-known in the analysis of other omics data that small signals are often associated with small variances. Consequently, in such cases, the value of the test statistics is inflated which potentially results in a highly significant p-value, even though the effect sizes are trivially small. Inspired by Significance Analysis of Microarrays (SAM)17 methodology, we regularize the variance to avoid inflated values for the test statistics and hence moderate the p-values to avoid false positives and false discoveries. Lastly, zeros are a common problem for many DA methods, including ANCOM-BC, that work with log-abundance data. Often such methods use pseudo-counts to deal with zero before taking logarithms. However, the choice of pseudo-count can impact the results of rare taxa containing excess zeros, potentially leading to an increase in false discoveries14, 18, 19 and inflate FDR. To mitigate this issue, we conduct a sensitivity analysis for pseudo-count addition and assign a sensitivity score to each taxon, indicating the likelihood of a false positive result for a particular taxon that is declared significant. A larger sensitivity score indicates a higher risk of a false positive, thus providing a tool for a researcher when interpreting the results.

Using constrained statistical inference-based methods7 and mixed directional false discovery rate (mdFDR) methods for multiple pairwise comparisons8, 9, along with the above-noted modifications to ANCOM-BC, we develop ANCOM-BC2 for multi-group microbiome studies in this paper. ANCOM-BC2 allows modeling covariates as well as repeated measures. Details of the method are described in the Methods Section. The performance of ANCOM-BC2 is evaluated using extensive simulation studies under a variety of settings. The results are described in the Results Section. The study designs for the simulation experiments are provided in the Supplementary Methods. ANCOM-BC2 is illustrated using two publicly available data, namely, soil microbiome data and irritable bowel disease data.

Results

Simulations: Continuous and Binary Exposures

In light of practical applications, we conducted simulation studies under two common scenarios, when the exposure variable is continuous or it is binary while adjusting for covariates in the model. To benchmark the performance of ANCOM-BC2, we compared it to its predecessor, ANCOM-BC4, which was originally developed for binary covariates, as well as two other state-of-the-art DA methods: 1) LinDA15, which is based on linear regression and employs centered log-ratio transformed abundance data, and 2) LOCOM5, a logistic regression-based approach that obviates the need for pseudo-counts and utilizes permutation methods to address overdispersion and small sample sizes.

We utilized a subset of data from the Quantitative Microbiome Project (QMP)20 consisting of 106 samples and 91 operational taxonomic units (OTUs) to generate simulated data. By using real data as a template, we ensured that the data-generating process did not favor our methods, enabling a fair comparison across all methods. We applied the effects of the exposure and adjusting covariates to the log abundance data based on the QMP data template. The log fold-changes (based on the natural log) of these variables ranged from −2 to 2, which corresponds to a fold-change between 0.14 to 7.4 in the original scale. To evaluate the robustness of differential abundance (DA) methods, we conducted a comprehensive simulation study that incorporated two sources of bias into the synthetic data. To account for sequencing efficiency differences, we applied a feature-specific bias that was sampled from a uniform distribution (U[0.1,1]) and applied to each taxon. Additionally, we included a sample-specific bias that ensured the presence of rare taxa (with more than 50% zeros across samples) to test the robustness of the DA methods to pseudo-count addition. The sample-specific bias was also highly correlated with the exposure of interest to assess the methods’ performance against batch effects, which are a concern in large-scale omics studies. To test the robustness of the DA methods regarding the violation of the assumption that most taxa are not differentially abundant in most DA methods, we considered four different cases of the underlying true proportion of DA taxa, ranging from 10% to 90%. For ANCOM-BC2, we used a sensitivity analysis to pseudo-count addition, setting the cutoff value at one, which means any taxon with a sensitivity score greater than one would be automatically declared not significant. To control the false discovery rate due to multiple testing, we utilized the Holm-Bonferroni method21 instead of the Benjamini-Hochberg (BH) procedure6, as it does not rely on assumptions about the dependence structure among the underlying p-values and is known to be more robust when dealing with inaccurate p-values22. Supplementary Figures 1 and 2 provide additional details on the simulation study design and the results obtained. More information is available in the Supplementary Methods.

Figure 1a displays the simulation results for DA analysis with respect to a continuous exposure. ANCOM-BC2 outperformed all other methods by maintaining a FDR control under the nominal level of 0.05, even with highly dynamic microbial environments, where up to 90% of taxa were differentially abundant. Also, as expected, ANCOM-BC2 demonstrated increased power with larger sample sizes, achieving comparable power to the other methods when the sample size was relatively large (> 50). In contrast, all competing methods had substantially higher FDR than ANCOM-BC2. LOCOM had severely inflated FDR and reduced power with limited sample sizes, reaching up to 50% FDR and as low as 20% power in simulations with 10 samples and 10% of taxa being differentially abundant. ANCOM-BC and LinDA had the highest power among all methods, but both methods exhibited FDR exceeding the nominal level in most simulation scenarios. This increase in FDR correlated with the increase in sample size, suggesting a systematic bias in their test statistics due to the addition of pseudo-counts. Proper sensitivity analysis regarding pseudo-count addition is necessary to prevent random false positives and FDR inflation for rare taxa in both methods.

Figure 1.

Figure 1.

Comparisons of FDR (mdFDR) and power in identifying DA taxa in (a) continuous, (b) binary, or (c) categorical exposure. Synthetic data were generated based on the QMP data20. The X-axis shows the sample size (or sample size per group for the categorical covariate), and the Y-axis shows the FDR (mdFDR) or power. The proportion of true DA taxa is indicated in the panel title. Results are represented by the average of the corresponding measure (FDR, mdFDR, or power) ± standard errors (shown as error bars) across 100 simulation runs for each setting. The results demonstrate that ANCOM-BC2 outperformed all competing methods in terms of uniformly small FDR (mdFDR) and comparable power.

Simulation results for differential abundance analysis (DA) with respect to a binary exposure are shown in Figure 1b. The results are consistent with those in Figure 1a. The competing methods had substantially inflated FDR as compared to ANCOM-BC2, and their FDRs increased monotonically with the increase in sample size. ANCOM-BC2 outperformed all other methods in terms of maintaining a uniformly small FDR and reasonable power. Note that given the striking differences in FDR, it may not be reasonable to compare powers.

Simulations: Multiple Groups

The simulation settings for multi-group comparisons were consistent with those outlined in the previous section. Similarly, we conducted a sensitivity analysis for ANCOM-BC2 with a cutoff value of one for pseudo-count addition and controlled the FDR due to multiple testing using the Holm-Bonferroni method21.

Multiple pairwise comparisons against a reference group.

In this simulation study, we evaluated the performance of ANCOM-BC2 in multiple comparisons against a reference group, where the exposure variable consisted of three groups and adjusting covariates were present. We compared ANCOM-BC2 against ANCOM-BC and LinDA. We did not include LOCOM because, currently, it is not designed for multiple groups. Figure 1c illustrates that ANCOM-BC2 effectively controlled the mdFDR8, 9, the combination of both Type I and the directional errors in the FDR framework (see the Methods section for details), below the nominal level of 0.05 and maintained a power greater than 0.8 when the sample size per group exceeded 30. On the other hand, neither ANCOM-BC nor LinDA controlled the FDR, highlighting the advantage of ANCOM-BC2 over existing methods in controlling mdFDR in multi-group comparisons.

Multiple pairwise comparisons.

In this simulation study, we evaluated the performance of ANCOM-BC2 for all possible pairwise comparisons, rather than focusing on comparisons against a specific group. As previously mentioned, the competing methods considered in this paper are not equipped to handle pairwise comparisons in their existing forms, which led to their exclusion from this simulation study. Figure 2a demonstrates that, on average, ANCOM-BC2 effectively controlled the mdFDR well below the nominal level of 0.05 while maintaining substantial power (>0.8) when the sample size per group was not exceedingly small (n>30).

Figure 2.

Figure 2.

Comparisons of FDR (mdFDR) and power in identifying DA taxa in (a) multiple pairwise comparisons, and (b) pattern analysis. Synthetic data were generated based on the QMP data20. The X-axis shows the sample size per group, and the Y-axis shows the FDR (mdFDR) or power. The proportion of true DA taxa is indicated in the panel title. The box plot shows the distribution of the corresponding measure (FDR or power) across 100 simulation runs for each setting. Each box represents the interquartile range (IQR) of the data, with the horizontal line in the box representing the median. The whiskers extend to the furthest data point which is within 1.5 times the IQR. Any data points beyond this range are shown as individual points. Additionally, the data points are jittered to avoid overlap and provide a better visualization of the data. The results demonstrate that ANCOM-BC2 controlled FDR (mdFDR) while maintaining adequate power.

Pattern analysis.

Similar to pairwise comparisons, pattern analysis stands out as an important feature of ANCOM-BC2. In this simulation study, we benchmarked the scenario of a monotonically increasing pattern, where the log fold-change between the second group and the reference group, denoted as δ, ranged from 0.5 to 2, and the third group had the log fold-change equal to δ+1 with respect to the reference group. The “discovery” in the pattern analysis was defined as the identification of a taxon that was monotonically increasing across the three groups. As shown in Figure 2b, ANCOM-BC2 was successful in controlling the false discovery rate (FDR) in this simulation setting and maintained high power (>0.8) in most cases. However, under the most extreme scenario where 90% of taxa were truly differentially abundant, ANCOM-BC2 experienced a loss of power. It is important to note that in this simulation study, ANCOM-BC2 pattern analysis controlled the FDR in a very conservative manner, with an FDR equal to 0. Increasing the cutoff value for the sensitivity analysis of pseudo-count addition or using a less stringent FDR controlling method instead of the Holm-Bonferroni method could improve the power and the FDR/power tradeoff (data not shown).

Simulations: Correlated Samples

We also performed benchmarks comparing ANCOM-BC2 to LinDA with respect to correlated microbiome data. Note that ANCOM-BC and LOCOM were not included since neither of them is designed for correlated experimental groups, such as repeated measures. We considered two scenarios in this simulation study. The first scenario contained only a random intercept effect in the data, while the second had both random intercept and random slope effects. Both random effects had variances equal to 1, and a correlation coefficient of 0.5 was set if both were present. The exposure variable had three levels, i.e., there were three experimental groups. The simulation study included a continuous covariate. The remaining simulation settings were the same as indicated in the previous section, and we provide further details in the Supplementary Methods. The results were similar in both scenarios (Fig. 3). ANCOM-BC2 controlled the FDR well below the nominal level of 0.05 in all simulation settings while maintaining adequate power as the sample size increased. Meanwhile, although LinDA had uniformly larger power than ANCOM-BC2, it suffered inflated FDR, particularly when the sample size was large (e.g., the FDR could reach 70% when the sample size per group was 100).

Figure 3.

Figure 3.

Comparisons of FDR and power in identifying DA taxa in (a) a random intercept model, and (b) a random coefficients model. Synthetic data were generated based on the QMP data20. The X-axis shows the sample size per group, and the Y-axis shows the FDR or power. The proportion of true DA taxa is indicated in the panel title. Results are represented by the average of the corresponding measure (FDR or power) ± standard errors (shown as error bars) across 100 simulation runs for each setting. The results demonstrate that ANCOM-BC2 outperformed LinDA in terms of uniformly small FDR and comparable power.

Illustration: Soil Microbiome and Aridity

The threat of climate change not only threatens the dryland ecosystems with increasing temperatures and aridity but there is a concern that some of the microbes native to such conditions may be trans-locating to other environments with the expansion of deserts and desert ecologies. There is growing literature on desert microbiome23. The translocation of such bacteria may affect vegetation, plant life, and human health. Most importantly, there are concerns regarding the spread of antimicrobial or antibacterial resistance-causing microbes, which may make infections harder to treat and increase the risk of spreading diseases. Accordingly, there is interest in identifying microbial species that are native to extremely hot conditions, such as hyper-arid soils.

Recently, Neilson et al.2 investigated the differences in soil microbiomes according to soil aridity in the Atacama Desert in Chile. They classified soil into three ordered categories based on aridity, namely, arid, margin, and hyper-arid, and sequenced data from 63 sample pits from 18 sites in the desert. Since they did not conduct formal differential abundance analyses of those data, we reanalyzed those data using the ANCOM-BC2 methodology. To begin with, we conducted a trend analysis of species richness with respect to the ordered aridity categories (arid to hyper-arid) (Fig. 4a). The trend test using the ORIOGEN24 software yielded a p-value <0.0001, suggesting a significant loss of species richness with the increase in aridity. This finding regarding the species richness is consistent with the original publication of Neilson et al.2, except that we are able to provide a p-value for trend using ORIOGEN.

Figure 4.

Figure 4.

Differential abundance analysis for the impact of aridity in desert soil samples. (a) Violin plot of the relationship between aridity and microbial richness. The black dot in the middle of each violin represents the median value, while the black bar represents the interquartile range (IQR). The width of each violin reflects the density of data points at each value of microbial richness. Individual data points are also shown as jittered dots. The ORIOGEN software was utilized to conduct a trend test, which revealed a significant reduction in species richness with increasing aridity with a p-value <0.0001. (b) Heatmap of ANCOM-BC2 pattern analysis with respect to aridity. Monotonic increasing and decreasing trends were evaluated across ordered soil categories, with arid soil as the reference category. The X-axis represents soil categories, while the Y-axis displays significant genera identified by ANCOM-BC2 pattern analysis. Each cell is color-coded with blue representing reduced abundance and red representing increased abundance, with the numbers on each cell indicating the log fold-change relative to the reference group (arid group). The Holm-Bonferroni method was utilized to correct for multiple testing, and genera colored in brown on the Y-axis were found to be significant after correcting for multiple testing.

Next, we performed a pattern analysis to discover patterns in microbial taxa abundances over the ordered soil categories using the arid soil as the reference category. We discovered Thermobaculum, Geodermatophilus, and Rubrobacter to increase in mean abundance with the aridity of the soil (p<0.05), and among these, Thermobaculum had a significant trend after correcting for multiple testing with a multiple testing adjusted p-value <0.0001 (Fig. 4b). Thermobaculum spp. is well-known to exist in temperatures as high as 90 degrees Celsius25, and it is documented to contain antibacterial or antimicrobial-resistant genes26, 27. Similarly, the two Actinobacteria genera, Geodermatophilus and Rubrobacter, are also known to be antibacterial-resistant genera. For a review of these taxa, one may refer to Montero-Calasanz et al.28 and Li et al.29, respectively. Thus, using our proposed methodology, we discover genera that have an increase in abundance with respect to aridity and which may have antibacterial resistance properties.

Not surprisingly, nitrogen-cycling microbes are significantly diminished by increasing aridity in desert soils. For example, the presence/absence results (Supplementary Table 1) show that Nitrobacter, a common contributor to nitrification, along with putative broadly distributed N2 fixers Sinorhizobium, Rhizobium, and Azospirillum, were not detected in hyper-arid samples. The results of the ANCOM-BC2 pattern analysis also show that increasing aridity correlates with significant reductions in the abundance of taxa typically associated with fertile soils (Fig. 4b). Genera such as Ca. Nitrososphaera, which are chemolithoautotrophs and have essential biogeochemical roles as nitrifying organisms30; Paenibacillus, which contains many species that promote crop growth through nitrogen fixation, phosphate solubilization, production of the phytohormone indole-3-acetic acid (IAA), and release of siderophores that enable iron acquisition31; and Pseudonocardia, which has been reported to achieve associative nitrogen fixation and protect their hosts against soil-borne pathogenic infection32, monotonically decreased with aridity.

ANCOM-BC2 sensitivity analysis revealed that none of the significant taxa exceeded the threshold of 1, indicating that these discoveries are likely true positives (Supplementary Fig. 3).

Illustration: Gut Microbial Composition of IBD Patients after Surgery

Numerous studies have delved into the role of the microbiome in inflammatory bowel disease (IBD); however, only a few have specifically addressed the impact of surgery on the gut microbiome and its subsequent consequences. We employ the ANCOM-BC2 to analyze a longitudinal dataset obtained from Fang et al.33 to investigate the changes in the gut microbiome following gastrointestinal surgery in IBD patients. The dataset utilized in this study consists of 322 stool samples collected from 125 patients. Of these, 46 patients were diagnosed with ulcerative colitis (UC) and 79 with Crohn’s disease (CD). Stool samples were obtained from each subject at approximately 6-month intervals, beginning at the baseline time point. Specifically, 21 patients provided one sample, 38 patients provided two samples, 41 patients provided three samples, 23 patients provided four samples, and 2 patients provided five samples. Of the total patient population, 87 (70.0%) had no history of intestinal surgery, while 22 CD patients had undergone ileocolonic resection, and 13 CD patients and 3 UC patients had undergone different types of colectomies. These surgeries had occurred prior to the collection of the baseline stool sample. For the purposes of this study, we focused on comparing the microbial compositions between patients who had not undergone gastrointestinal surgery, those who had undergone ileocolonic resection, and those who had undergone colectomies. We adjusted the ANCOM-BC2 model for IBD disease type (UC vs. CD) and two potential confounders, namely disease state (inactive vs. active) and antibiotic use (absent vs. present). The results are depicted in Fig. 5.

Figure 5.

Figure 5.

Heatmap of ANCOM-BC2 pairwise analysis for the effect of surgical resection in a cohort of patients with IBD. The analysis included multiple pairwise comparisons among the three groups: ileocolonic resection, colectomy, and no intestinal surgery, while controlling the overall mdFDR at 0.05. The X-axis represents the specific comparisons made between ileocolonic resection vs. no intestinal surgery, colectomy vs. no intestinal surgery, and ileocolonic resection vs. colectomy. The Y-axis displays the significant species identified by ANCOM-BC2. Each cell in the heatmap is color-coded with blue representing reduced abundance and red representing increased abundance, and the numbers on each cell indicate the log fold-change. The Holm-Bonferroni method was utilized to correct for multiple testing.

We performed multiple pairwise comparisons among the three groups controlling the overall mdFDR at 0.05 using ANCOM-BC2. Ileocoloic section is the surgical removal of the diseased section of the ileum, which is the junction area between the small and last intestines. In contrast, colectomy is the surgical removal of most or all of the large intestine. Interestingly, our analysis revealed that almost no microbial species were differentially abundant between the two surgical groups of patients, except for H. parainfluenzae and C. perfringens, which are more abundant in the colectomy group, and conversely, C. aldense, and B. producta are more abundant only in the ileocolonic section group.

We saw reduced abundances of several native gut bacterial species in patients who underwent either ileocolonic resection or colectomy compared to patients with no intestinal surgery. Specifically, These species include Bacteroides spp. (ovatus and uniformis), Butyricicoccus pullicaecorum, Dorea (formicigenerans and longicatena), Faecalibacterium prausnitzii, Roseburia spp. (faecis and inulinivorans), and Ruminococcus torques. Most of these bacterial species are involved in the production of short-chain fatty acids (SCFAs), such as butyrate, propionate, and acetate3440. SCFAs play essential roles in maintaining gut health, supporting gut barrier function, possessing anti-inflammatory properties, and providing energy sources for colonocytes. Thus, although the surgical intervention was necessary for these patients, the surgery may have impacted the host immune response and health of these patients. The IBD patients who underwent any of these surgeries may require probiotics or other forms of microbial supplementation.

We also conducted an ANCOM-BC2 sensitivity analysis, revealing that none of the significant taxa exceeded a threshold of 1. This suggests that the observed findings are likely true positives (Supplementary Fig. 4).

Discussion

In this article, we introduced a general framework called ANCOM-BC2 for performing differential abundance analysis when the exposure variable is continuous, binary, or (ordered) categorical. The proposed methodology allows for adjusting for covariates and repeated measures (longitudinal measures) while controlling for FDR or mdFDR when the exposure variable has more than two groups, and the researcher is interested in inferring if the abundance of a taxon increased or decreased for each pairwise comparison. Furthermore, using the theory of constrained statistical inference, ANCOM-BC2 allows researchers to infer patterns in microbial abundance over ordered categories of exposure variables. For example, it allows a researcher to test if a particular microbe increased (or decreased) in abundance over ordered disease categories (very healthy to least healthy). This is a unique feature of ANCOM-BC2. Lastly, ANCOM-BC2 overcomes some of the limitations of ANCOM-BC for controlling the FDR while maintaining high power, especially with the presence of rare taxa with excess zeros.

The power of ANCOM-BC2’s pattern analysis was demonstrated in the soil microbiome data analyzed in this paper. When standard pairwise analyses were performed, we only discovered Paenibacillus that was significantly differentially abundant across different groups (data not shown). However, using the pattern analysis, we discovered several taxa display increasing or decreasing trends over the ordered soil aridity groups. This is because, unlike pairwise comparisons, pattern analysis uses constrained inference methods, which “borrow” information from neighboring ordered groups. Thus, increasing the effective sample size and the power7, 41, 42.

Intrinsically, the ileocolonic section and colectomy are surgically removing different regions of the intestines, and yet based on our analysis of the IBD data, there were no significant differences in the abundance of the majority of the gut bacteria. Furthermore, the two groups of patients have similarly reduced abundances of these bacteria relative to those who did not undergo either of the two surgeries. Based on these findings, it may be reasonable to hypothesize that most species of gut microbiota are spatially uniformly distributed in the ileum and large intestines.

Methods

Notation

The notations described in ANCOM-BC2 methodology are summarized in Table 1.

Table 1.

Summary of notations

Notation Description

i Sample index, i=1,2,,n.
j Tax on index, j=1,2,,d.
k Index of fixed effects, k=1,2,,p.
l Index of Random effects, l=1,2,,q.
xik The kth fixed effect of interest for the ith sample.
zil The lth random effect of interest for the ith sample.
Aij Unobserved abundance of jth taxon in a unit volume of ecosystem of ith sample.
Oij Observed abundance of jth taxon in a random specimen taken from a unit volume of ecosystem of ith sample.
Eij Random error for taxon j in sample i.
Si Sample-specific sampling fraction.
Cj Taxon-specific sequencing efficiency.
aij log Aij.
oij log Oij.
eij Random error for taxon j in sample i in log scale.
si Sample-specific sampling fraction in log scale.
cj Taxon-specific sequencing efficiency in log scale.

Parameter;

Random variable.

ANCOM-BC2 for fixed effects models

Model assumptions

Assumption 1 (Multiplicative model for observed abundances):

Oij=SiCjAijEij.

Assumption 1 indicates that, in expectation, the observed abundance of a taxon in a random sample is in constant proportion to the abundance in a unit volume of the ecosystem of the sample. This proportion can be decomposed into two parts: (1) sample-specific sampling fraction, and (2) taxon-specific sequencing efficiency.

According to Assumption 1, for non-zero observed abundance, the above multiplicative model can be transformed into an additive model by log transformation

oij=si+cj+aij+eij.

Assumption 2 (Linear model for log abundances): For each taxon j,aij,i=1,,n are independently distributed, and

aij=bjTxi+eij(a),

where

  1. xi=1,xi1,xi2,,xipT are the covariates of interest (including the intercept) for the ith sample,

  2. bj=bj0,bj1,bj2,,bjpT are the corresponding coefficients for xi.

  3. eij(a),i=1,,n are independently distributed random errors for log abundances with E(eij(a))=0,Var(eij(a))=σjj(a).

Assumption 3 (Independent random error for log observed abundances): Assume there are random errors, eij(o),i=1,,n,j=1,,d, for log observed abundances oij, which are independently distributed with heteroskedasticity:

E(eij(o))=0,Var(eij(o))=σij(o),eij(o)eij(a).

Regression framework

Based on the Assumptions 2 and 3, oij can be modeled as:

oij=si+cj+bjTxi+eij(a)+eij(o):=si+cj+bjTxi+eij, (1)

with

E(oij)=si+cj+bjTxi,Var(oij)=Var(eij)=σjj(a)+σij(o):=σij(t).

where σij(t) denotes the total variance.

Model (1) can also be written in a vector notation as follows:

oj=s+cj1+Xbj+ej, (2)

with

E(ej)=(0,,0)T,E(oj)=s+cj1+Xbj,Cov(oj)=[σ1j(t)000σ2j(t)000σnj(t)].

where

  1. 1=(1,1,,1)T,

  2. oj=o1j,o2j,,onjT,

  3. s=s1,s2,,snT,

  4. bj=bj0,bj1,bj2,,bjpT,

  5. ej=e1j,e2j,,enjT,

  6. X=1x11x12x1p1x21x22x2p1xn1xn2xnp.

It is important to note that within each sample i, for taxa lm,oil and oim are not necessarily independent due to correlations between ail and aim. Thus vectors ol and om are not independent random vectors.

Remove the effect of taxon-specific sequencing efficiency

To eliminate the effect of cj, we first center the log observed abundances across samples, i.e.

yij:=oijo¯j=(sis¯)+bjT(xix¯)+(eije¯.j),:=θi+βj Txi+εij, (3)

where

  1. βjk=bjk for k=1,,p, and βj0=k=1pbjkxk,

  2. Var(εij)=(n1)2n2σij(t)+1n2iiσij(t):=σij.

Algorithm 1.

Iterative MLE

Initialize:
    For j=1,,d
    θ0
    yj(crt)yjθ=yj
    βj(XTX)1XTyj(crt)=(XTX)1XTyj
while not converge do
    θ1dj=1d(yjXβj)
    yj(crt)yjθ
    βj(XTX)1XTyj(crt)
end while
Estimation of sample-specific bias

As can be seen from (3), βj are not identifiable without determining the nuisance parameter θi. We define bias-corrected log abundance yij(crt)=yijθi, then the ordinary least squares (OLS) estimators of θi and βj can be obtained by iteratively solving the following equations. For ease of exposition, the algorithm is described in the vector form, i.e. yj=(y1j,y2j,,ynj)T,θ=(θ1,θ2,,θn)T, etc.

Upon convergence,

θ=1dj=1d(yjXβj),yj(crt)=yjθ,βj=(XTX)1XTyj(crt). (4)

Therefore

θ=1dj=1d(yjXβj)=1dj=1d(yjPyj(crt))=1dj=1d(yjPyj+Pθ)=1dj=1d[yj(crt)+θP(yj(crt)+θ)+Pθ]=(IP)θ+Pθ+1dj=1d(IP)yj(crt)=(IP)θ+Pθ+1dj=1dεj, (5)

where

  1. P=XXTX1XT is the projection matrix onto 𝒞(X), the column space of X,

  2. εj=(IP)yj(crt) with E(εj)=0.

Rearranging (5), we see that

(IP)θ=(IP)θ+1dj=1dεj.

Taking expectations on both sides leads to

(IP)[E(θ)θ]=0.

As IP is an orthogonal projector onto 𝒞(X), the above equation holds as long as either of the following is valid:

  1. E(θ)θ=0,

  2. E(θ)θ𝒞(X).

It is sufficient to consider (2) because (1) is the trivial case. If (1) were true then from (4) we deduce that there is no sample-specific effect and that E(βj)=βj. Suppose (2) is true, then there exists a vector δ0p, such that

E(θ)=θXδ. (6)

Then by combining with equation (4), we have

E(βj)=δ+βj. (7)

We shall denote θ and βj obtained from the above iterative algorithm as preliminary estimators of θ and βj, respectively. Without loss of generality, throughout this paper we assume XTX is a full rank matrix. If it is not a full rank matrix, then we may use any generalized inverse of XTX because Xβj in (5) is invariant of the choice of generalized inverse XTXg used in βj=XTXgXTyj (crt). Thus the preliminary estimator θ provided above is invariant of the choice of generalized inverse used in deriving βj. Furthermore, throughout this paper we are interested in testing hypothesis regarding linearly estimable parameters Aβj, i.e. 𝒞AT𝒞XT43. Consequently, the estimator Aβj is invariant of the generalized inverse used in the estimation of βj. Hence throughout this text, for simplicity of exposition, we shall assume XTX is of full rank.

For each taxon j=1,,d, by (7), βj is a biased estimator if δ0. Suppose we wish test the following hypothesis

H0:Aβj=Aβj0,H1:AβjAβj0.

Under the null hypothesis, E(Aβj)Aβj0=Aδ0 and hence biased. The next step is to estimate this bias δ and accordingly modify the estimator Aβj so that the resulting estimator is asymptotically centered at Aβj0 under the null hypothesis and hence the test statistic is asymptotically centered at zero.

First we make the following observations. Since E(βj)=δ+βj, we note that as n, for finite dimension d,

j12(βj(δ+βj))dNp(0,I), (8)

where

j=limn(XTX)1(i=1nσij2xixiT)(XTX)1. (9)

Since

E(θ+Xβj)=θXδ+X(δ+βj)=θ+Xβj,

i.e. θ+Xβj is an unbiased estimator of θ+Xβj, hence a possible estimator of Σj is given by

Σˆj=(XTX)1(i=1n(yijθiβjTxi)2xixiT)(XTX)1. (10)

Under some mild regularity conditions44, with finite d, we have the following consistency result

n(ΣˆjΣj)p0,asn. (11)

Therefore, replacing Σj with Σˆj in (8) and appealing to Slutsky’s theorem, we have

Σˆj12(βj(δ+βj))dNp(0,I),asn.

By (9) and (11), under some mild regularity conditions44, for finite d, we obtain

Σˆjp0,asn.

Consequently,

βjpδ+βj,asn. (12)

The above observation regarding the convergence of βj plays a critical role in the following. Since the sampling fraction is constant for all taxa within a sample, we pool information across taxa within each sample when estimating δ. We model each taxon abundance using the following Gaussian mixture model. For the jth taxon and the kth covariate, let C0 denote the set of taxa that are not differentially abundant with respect to xik, i.e. C0={j(1,2,,d):βjk=0},C1 denote the set of taxa whose abundance decreases with xik, i.e. C1={j(1,2,,d):βjk<0}, and let C2 denote the set of taxa whose abundance increases with xik, i.e. C2={j(1,2,,d):βjk>0}. Let πr denote the probability that a taxon belongs to set Cr,r=0,1,2. For simplicity of estimation of parameters, similar to GEE, we shall assume that βjk,j=1,2,,d, are independently distributed. As commonly done in the analyses of various ’omics data, we ignore the underlying correlation structure when estimating δ. Thus, we model the distribution of βjk by Gaussian mixture model as follows:

f(βjk)=π0ϕ(βjkδkvj0)+π1ϕ(βjk(δk+l1)vj1)+π2ϕ(βjk(δk+l2)vj2), (13)

where

  1. ϕ is the standard normal density function,

  2. δk,δk+l1, and δk+l2 are means for βjk|C0,βjk|C1, and βjkC2, respectively. l1<0,l2>0,

  3. vj0,vj1, and vj2 are variances of βjk|C0,βjk|C1, and βjkC2, respectively.

Note that instead of fitting a multivariate Gaussian mixture model for all covariates together, we choose to fit a univariate Gaussian mixture model repeatedly for every single covariate. This repetition is simply because the sets of taxa C0,C1,C2 are not necessarily the same for different covariates. Also, note that for a categorical covariate of s levels, it contains s coefficients, e.g. βj1,,βjs, and we shall fit the Gaussian mixture model for these s coefficients separately.

For computational simplicity, we assume that vj1>vj0,vj2>vj0. Thus, Without loss of generality for κ1,κ2>0, let vj1=vj0+κ1 and vj2=vj0+κ2. While this assumption is not a requirement for our method, it is reasonable to assume that variability among differentially abundant taxa is larger than that among the null taxa. By making this assumption, we simplify the computation.

Assuming samples are independent, we begin by first estimating vj02=Var(βjk). Note that vj02 is the function of heteroscedastic variances, a consistent estimator of vj02, which we refer to as vˆj02, is the kth diagonal element of Σˆj stated in (10). In all future calculations, we plug in vˆj02 for vj02. This is similar in spirit to many statistical procedures involving nuisance parameters. The following lemma is useful in the sequel.

Lemma 1 (Introducing the latent variable in calculating log-likelihood45):

logf(xθ)=Ef(zx,θ)[logf(zθ)+logf(xz,θ)].

Let Θ=δk,π1,π2,π3,l1,l2,κ1,κ2T denote the set of unknown parameters, then for each taxon the log-likelihood can be reformulated using Lemma 1, as follows:

Θarg maxΘj=1dr=02pr,j[logPr(jCr)+logf(βjkjCr)]. (14)

Then the E-M algorithm is described as follows:

  • E-step: Compute conditional probabilities of latent variables. Define pr,j=Pr(jCrβjk,Θ)=πrϕ(βjkδk+lrvj)rπrϕ(βjkδk+lrvjr),r=0,1,2;j=1,,d, which are conditional probabilities representing the probability that an observed value follows each distribution. Note that l0=0.

  • M-step: Maximize the likelihood function with respect to the parameters, given the conditional probabilities.

We shall denote the resulting estimator of δk upon convergence of the algorithm by δˆkEM.

As stated in Lin and Peddada4, compared to vˆj02, the variance and covariance contributed by δˆkEM is negligible when the number of non-differentially abundant taxa is large, such as when analyzing the microbiome data at the OTU/ASV or species level of the phylogenetic tree.

The above procedure is applied to every βjk,k=1,,p, eventually, we obtain the estimator of δ as

δˆEM=(δˆ1EM,δˆ2EM,,δˆpEM)T. (15)

Therefore, the final estimator of βj is defined as

βˆj=βjδˆEM, (16)

with

βˆjpβj,asn, (17)

given that δˆEM is a good approximation of δ.

The estimation procedure is summarized in Algorithm 2.

Algorithm 2.

E-M algorithm

  1:   Input:
  βj,j,j=1,,d
  2:   procedure EM(βj,j)
  3:   return δ^kEM,k=1,,p
  4:   end procedure
  5:   for k=1,,p do
  6:   β^jkβjkδ^kEM
  7:   end for

For taxon j, we now describe our methodology for testing the following hypotheses

H0:Aβj=Aβj0,H1:AβjAβj0.

From Slutsky’s theorem, as n, the following test statistic is approximately central chi-square distributed under the null hypothesis

Wj=(AβˆjAβj0)T(AΣˆjAT)1(AβˆjAβj0)=(AβjAδˆEMAβj0)T(AΣˆjAT)1(AβjAδˆEMAβj0)dχq2,

where q=rank(A).

To control the FDR due to multiple testing, we recommend applying Holm-Bonferroni method21 instead of Benjamini-Hochberg (BH) procedure6 because the Holm-Bonferroni method does not require any assumptions regarding the dependence structure in the underlying p-values, and is also known to be a better method to control FDR when p-values are not accurate22.

Sampling-specific biases estimation

After obtaining δˆEM, the estimator of sample-specific biases θ is defined as follows:

θˆ=1dj=1d(yjXβˆj). (18)

Let Σ(i)=[σlm(i)]l,m=1,,d denote the d×d covariance matrix of ε(i)=(ε1(i),ε2(i),,εd(i))T, where σlm(i) is the (l,m)th element of Σ(i) and σjj(i) is the jth diagonal element of Σ(i). Furthermore, suppose

Assumption 4 (Sparse correlations among taxa):

σjj(i)<K<,lmdσlm(i)d2=o(1).

From Assumption 4, we have

01TΣ(i)1=l=1dm=1dσlm(i)=j=1dσjj(i)+lmdσlm(i)dK+lmdσlm(i).

Hence

01TΣ(i)1d2Kd+lmdσlm(i)d2=o(1).

Thus, for each taxon j=1,2,,d, we have

1dj=1d(yj(θ+Xβj))p0,asd. (19)

Therefore, according to (17) and (19), as both n,d,

θˆθ. (20)

Remark 1. Regularization of variance: To avoid the spurious detection of significance due to extremely small standard errors, particularly for rare taxa, we incorporated a small positive constant in the denominator of the ANCOM-BC2 test statistic for each taxon. This approach was inspired by the Significance Analysis of Microarrays (SAM) methodology17. Specifically, the regularization factor was set as the 5th percentile of the distribution of standard errors for each fixed effect, unless otherwise specified.

Remark 2. Sensitivity analysis for the pseudo-count addition: To mitigate the risk of inflated false positive rates resulting from the choice of pseudo-count in ANCOM-BC2, we conducted a sensitivity analysis to assess the impact of varying pseudo-count values on differential abundance results. This is particularly important, as several studies have shown that the choice of pseudo-count can significantly influence the results of differential abundance analysis methods18, 19. Specifically, we surveyed a range of pseudo-count values from 0 to 1 with 0.01 increments, and a sensitivity score was assigned to each taxon to evaluate its sensitivity to the choice of pseudo-count. The sensitivity score was defined as the standard deviation of the resulting negative log p-values, with a larger score indicating greater sensitivity to the pseudo-count choice. Consequently, a significant result for a taxon with a high sensitivity score is likely to be a false positive.

Multi-group comparison

In some applications, for a given taxon, researchers are interested in drawing inferences regarding differential abundance among different pairs of experimental groups. We refer to this kind of problem as a multi-group comparison problem, and extra caution needs to be exercised to correct p-values due to multiple comparisons. For simplicity, we drop the subscript j (taxon index) in the following discussions.

Global test

For a given taxon and a total of g+1 experimental groups (including the reference group), researchers may want to test whether there exists at least one group that is significantly different from others. For ease of exposition, we split the covariates X into two parts, where X1 stands for the group assignment, and X2 denotes the remaining covariates. Note that the difference of group effects against the reference group is estimable, while the individual group effect is not. For simplicity, in the discussions of multi-group comparisons among group 0 to group g, we assume group 0 is the reference group. We use βk,k=1,,g to denote the group effect, but notice that it actually estimates βkβ0. We rewrite the model stated in (3) as

y=θ+X1β+X2γ+ε, (21)

where

  1. θ is the sample-specific bias,

  2. β is the vector of group effects (as compared to group 0) of the order g×1,

  3. X1 is the design matrix of the order n×g consisting of 0s and 1s,

  4. X2 is the known matrix of other covariates (including the intercept) of the order n×(pg+1) with the corresponding regression parameter vector γ of the order (pg+1)×1.

The global test intends to test

H0:k{1,,g}βk=0,H1:k{1,,g}βk0,

which can be reformulated as

H0:Aβ=0,H1:Aβ0,

where

A=Ig=[100001000001]

with the test statistic

W=(Aβˆ)T(AΣˆ(g)AT)1(Aβˆ)dχg2,asn,

where Σˆ(g) is the corresponding sub-matrix of Σˆ defined in equation (10).

Similarly, to control the FDR due to multiple testing, we recommend applying Holm-Bonferroni method21 instead of Benjamini-Hochberg (BH) procedure6 due to the underlying complex dependence structure between taxa.

Example 0.1. Suppose there are 3 groups, namely, groups 0 (reference), 1, and 2, and no other covariates. For each sample i,i=1,,n, we have:

yi=θi+μ+β1I{group=1}+β2I{group=2}+εi.

To test whether there is at least one group among 0, 1, and 2, that is significantly different from others, we would like to test:

H0:β1=β2=0,H1:β10β20,

which is the same as testing:

H0:Aβ=0,H1:Aβ0,

where A=1001, and β=β1,β2T.

Multiple pairwise comparisons

If we are interested in knowing whether the abundance increased or decreased between various pairs of groups, then it amounts to testing the following hypotheses:

H0,k,k:βk=βkH1,k,k:{βk<βk}{βk>βk},

where kk{1,,g}. Denote the test statistic for a given pairwise comparison as

Wkk=βˆkβˆkVar^(βˆk)+Var^(βˆk)dN(0,1),asn,

where Var^(βˆk),Var^(βˆk) are the kth and kth diagonal elements of Σˆ(g), respectively. Thus, the raw p-value for comparing group k and group k is defined as:

pkk=2[1ϕ(|Wkk|)].

For comparing with the reference group (group 0), the hypotheses become:

H0,k:βk=0H1,k:{βk<0}{βk>0}.

We also replace βˆk and Var^(βˆk) with 0s in the test statistic.

Note that the null and alternative hypotheses for the global test are denoted as H0 and H1, a Type I error might occur due to wrongly rejecting H0 or correctly rejecting H0 but wrongly rejecting H0,k,k. A directional error might occur due to correctly rejecting H0 but wrong assignment of the direction between βk and βk while correctly rejecting H0,k,k. In this case, we need to control the error rate combining both Type I and the directional errors in the FDR framework, which is referred to as mixed directional FDR (mdFDR)8, 9.

Definition 1 (mdFDR): Let V(j) denote the indicator function of at least one Type I error or directional error committed, i.e.

V(j)={1if Type I or directional error occurs0otherwise.

Then, mdFDR is defined as the expected proportion of Type I and directional errors among all discovered taxa.

mdFDR=E(j=1dV(j)max(R,1)),

where R denote the number of taxa discovered.

To control the mdFDR for all pairwise tests, we adopt the general mdFDR controlling procedure9, and do the following:

  1. Apply global test method stated above to obtain the p-value for each taxon. We denote these p-values as screening p-values. Apply BH procedure to identify taxa that are differentially abundant in at least one pairwise comparison. Let R denote the number of taxa discovered.

  2. For each taxon discovered in step 1, apply any mixed directional family wise error (mdFWER) controlling procedure, such as Holm-Bonferroni (default), Hochberg, etc., to the pairwise p-values (pkk) at level Rα/d.

  3. For a given taxon discovered in step 1, if a pairwise hypothesis is rejected in step 2, then we declare βk<βk or βk>βk according to Wkk<0or>0.

It has been proved that under the assumption of independence of p-values obtained from the global test, the mdFDR of the above procedure is strongly controlled at level α9.

Example 0.2. Suppose there are 3 groups, namely, groups 0 (reference), 1, and 2, and no other covariates. For each sample i,i=1,,n, we have:

yi=θi+μ+β1I{group=1}+β2I{group=2}+εi.

To test whether the taxon is differentially abundant between group 1 and 0 (reference), we would like to test:

H0:β1=0,H1:{β1<0}{β1>0},

with the test statistic:

W10=βˆ1Var^(βˆ1).

Additionally, if we want to test whether the taxon is differentially abundant between group 1 and 2:

H0:β1=β2,H1:{β1<β2}{β1>β2}.

The test statistic is:

W12=βˆ1βˆ2Var^(βˆ1)+Var^(βˆ2).
Test against a specific group

Often researchers are interested in knowing whether the abundance increased or decreased in an ecosystem relative a pre-specified group, say the control group. Again, assume group 0 is the reference group and β0=0, then one may be interested in testing the following hypotheses:

H0,k:βk=0,H1,k:{βk<0}{βk>0},

where k{1,,g}.

As before, the pairwise test statistic is defined as follows:

Wk=βˆkVar^(βˆk)dN(0,1),asn,

where Var^(βˆk) is the kth diagonal elements of Σˆ(g). Thus, the raw p-value for comparing group k and group 1 is defined as

pk=2[1ϕ(|Wk|)].

Likewise, we apply the mdFDR controlling procedure for all pairwise tests. To improve power, we modify the global test mentioned earlier to a Dunnet-based4648 test as described below:

  1. The test statistic W=maxk{1,,g}Wk,

  2. Generate Wk(b)N(0,1),k=1,,g.

  3. Compute W(b)=maxk{1,,g}|Wk(b)|.

  4. Repeat the above steps B times, we get the null distribution of W.

The screening p-value is calculated as:

p=1Bb=1BI(W(b)>W).

Pattern analysis

When the experimental groups are ordered naturally, such as doses of exposure or duration of exposure or stages of a disease, etc., for a given taxon, researchers may be interested in testing whether the abundance of the taxon is changing with the ordered experimental groups according to some specific pattern. Thus, the null and alternative hypotheses one wants to test become (assume group 0 is the reference group):

H0:β1=β2==βg=0,H1:β=(β1,,βg)T,

where is one or a collection of patterns. Examples of patterns are given below.

Example 0.3 (Simple order).

1={0β1β2βg}with at least one strict inequality. (22)

Example 0.4 (Tree order).

2={βk0,k=1,,g}with at least one strict inequality. (23)

Example 0.5 (Umbrella order).

4={0β1βk1βkβk+1βg}with at least one strict inequality. (24)

Estimation of β under a certain pattern (constraint) can be obtained by solving the following convex optimization problem49:

βˆopt=arg minβ(βˆβ)TΣˆ(g)1(βˆβ), (25)

where Σˆ(g) is the corresponding sub-matrix of Σˆ defined in (10). The solution to (25) can be numerically obtained by using a suitable convex optimization algorithm, such as CVRX50.

Example 0.6. Suppose there are 3 groups, namely, groups 0 (reference), 1, and 2, and no other covariates. For each sample i,i=1,,n, we have:

yi=θi+μ+β1I{group=1}+β2I{group=2}+εi.

To test whether the group effect is monotonically increasing, we would like to test:

H0:β1=β2=0,H1:β={0β1β2},with at least one strict inequality.

The estimation of β under can be obtained by solving:

βˆopt=arg minβ2(βˆβ)TΣˆ(g)1(βˆβ),s.t.Aβ0,

where A=1011, and β=(β1,β2)T.

Once the constrained estimator is obtained, there exist a variety of options to test the above hypotheses. For example, one may consider William’s type of statistic51. We adopt the following definitions from Peddada et al. (2002)7 to facilitate the construction of the test statistic.

Definition 2 (Linked parameters): Two parameters in a given pattern are said to be linked if the inequality between them is specified a priori.

Definition 3 (Nodal parameter): For a given pattern, a parameter is said to be nodal if it is linked with every other parameter in the profile.

For example, every parameter is a nodal parameter in 1; no nodal parameter in 2; and βk is the only nodal parameter in 3.

Definition 4 (Norm of maximum difference): Define the norm l() of pattern as the maximum difference between the estimates of two linked parameters.

For example, l(3)=max{βˆk,βˆkβˆg}.

Given a collection of potential patterns, 1,2,,T, the William’s type of test statistic is defined as:

W=max{l(t),t=1,,T},withtopt=arg max{l(t),t=1,,T},

where topt is regarded as the optimal pattern for the microbial abundance of a specific taxon.

Under null hypothesis, the expectations for βˆk,k=1,,g are 0s; thus, we can construct the null distribution of W as follows:

  1. Generate βˆk(b)Var^(βˆk)N(0,1),l=1,,g.

  2. Obtain constrained regression estimators for βˆkopt,(b) using the convex optimization problem described above.

  3. Compute W(b)=maxlt,t=1,,T using the simulated data under pre-specified patterns.

  4. Repeat the above steps B times, and we get the null distribution of W.

The raw p-value is calculated as

p=1Bb=1BI(W(b)>W).

We then apply the Holm-Bonferroni correction or BH procedure on raw p-values to control the FDR.

ANCOM-BC2 for mixed effects models

Similar to the fixed effects model stated in (3), for each taxon j,j=1,,d, and each sample i,i=1,,n, suppose each sample has ni observations and ini=n. The offset-based mixed effects log-linear model is set up as

yij=θi1ni+Xiβj+Ziαi+εij, (26)

where

  1. yij is the ni-vector centered observed abundances,

  2. 1ni=(1,,1)Tni is a vector of 1s,

  3. Xi is the ni×p design matrix for fixed effects,

  4. βj is the p-vector of fixed effects regression coefficients to be estimated,

  5. Zi is the ni×q design matrix for the random effects,

  6. αi is the q-vector random effects,

  7. εij is the ni-vector residuals.

The following distributional assumptions are made

αiN(0,Dq×q),εijN(0,σj2Ini),αiεijfori=1,,n.

Thus, for each taxon j,j=1,,d, and each sample i,i=1,,n, we have

yijN(θi1ni+Xiβj,Hij(τ)),

where Hij(τ)=ZiDZiT+σj2Ini (or Hij for short) denotes a general covariance matrix parametrized by τ.

Stack up observations across samples, we have:

yj=θ+Xβj+Zα+εj, (27)

where

yj=[y1jy2jynj], θ=[θ11n1θ21n2θn1nn], X=[X1X2Xn], βj=[βj1βj2βjp],Z=[Z1000Z10000Z1], α=[ε1jα2ε2jαn],εj=[ε1jε2jεnj].

That is,

yjN(θ+Xβj,Hj(τ)=[H1j(τ)000H2j(τ)0000Hnj(τ)]),

where Hj(τ) (or Hj for short) is a block diagonal matrix.

Similarly, we estimate θ and βj iteratively to obtain the corresponding preliminary estimators. As compared to Algorithm 1, the Maximum Likelihood (ML) is replaced with Restricted Maximum Likelihood (ReML)52, 53.

Algorithm 3.

Iterative ReMLE

  1:   Initialize:
  For j=1,,d
  θ0
  yj(crt)yjθ=yj
  βjReML(yj(crt))=ReML(yj)
  2:   while not converge do
  3:   θ1dj=1d(yjXβj)
  4:   yj(crt)yjθ
  5:   βjReML(yj(crt))
  6:   end while

Note that the estimators for regression coefficients βj and variance components τ are obtained iteratively by maximizing the following log-likelihood function:

(τyj)=i=1nlog|Hij|i=1nlog|XiTHij1Xi|i=1n(yijXiβj)THij1(yijXiβj), (28)

where βj(XTHj1X)1XTHj1yj. As close-form solutions of (28) do not exist, Newton-Raphson method is usually employed54.

Suppose on convergence, θθ,yj(crt)yj(crt),HH,βjβj, we have

θ=1dj=1d(yjXβj),yj(crt)=yjθ,βj=(XTHj1X)1XTHj1yj(crt).

It is easy to show that there exists a vector δp, such that

E(θ)=θXδ,E(βj)=δ+βj.

i.e., βj is a biased estimator for βj.

Similar to the case of fixed effects model, we fit the Gaussian mixture model to each βjk,k=1,,p separately, to correct the bias δ, and final estimators for βj and θ are given by

βˆj=βjδˆEM,θˆ=1dj=1d(yjXβˆj).

Thus, for taxon j, the Wald statistic for the following hypotheses

H0:Aβj=Aβj0,H1:AβjAβj0,

is given by

Wj=(AβˆjAβj0)T(AΣˆjAT)1(AβˆjAβj0)=(AβjAδˆEMAβj0)T(AΣˆjAT)1(AβjAδˆEMAβj0)dχq2,

where

  1. Σˆj=XTHj1X1,

  2. q=rank(A).

To draw statistical inference under inequality constraints in linear mixed effects model, we use William’s type test statistic as specified in the last section, and adopt the Constrainted Linear Mixed Effects (CLME) framework42 into ANCOM-BC. The procedure is summarized as follows:

  1. Obtain βˆj, the estimate of βj under the null hypothesis,

  2. Compute the observed values of residuals and random effects: εˆj=(IX(XTHˆj1X)1XTHˆj1)yj(crt), and αˆ=θˆZTHˆj1εˆj,

  3. Standardize the observed values of the random effects and residuals. Define aˆ=SE(αˆ)1αˆ, and eˆ=SE(εˆ)1εˆ, where SE() denotes the standard error,

  4. Obtain bootstrap samples. Let a(b) and e(b) denote the bootstrap samples of a and e, respectively. Then define αˆ(b)=SE(αˆ)a(b) and εˆ(b)=SE(εˆ)e(b). Finally construct the final bootstrap sample as: yj (b)=θˆ+Xβˆj+Zαˆ(b)+εˆ(b),

  5. Repeat B times (b=1,,B), we construct the null distribution for the test statistic.

  6. Compute raw p-values,

  7. Apply the Holm-Bonferroni correction (default) or BH procedure on raw p-values to control the FDR.

Acknowledgements

This research was supported [in part] by funding from NIEHS intramural program ZIA ES103389-01.

Footnotes

Additional Declarations: There is NO Competing Interest.

Code availability

ANCOM-BC2 has been implemented in the R package ANCOMBC, which is available on Bioconductor at https://www.bioconductor.org/packages/release/bioc/html/ANCOMBC.html. All analyses shown in the paper can be found on the corresponding GitHub repository55.

Competing interests

Authors declare no conflicts of interest.

Data availability

Simulation data can be found on the corresponding GitHub repository55. The soil microbiome data for aridity can be found in Qiita: https://qiita.ucsd.edu/study/description/10360. The gut microbiome data for IBD can be found in Qiita: https://qiita.ucsd.edu/study/description/11546.

References

  • 1.Tierney B. T. et al. Multidrug-resistant acinetobacter pittii is adapting to and exhibiting potential succession aboard the international space station. Microbiome 10, 1–14 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Neilson J. W. et al. Significant impacts of increasing aridity on the arid soil microbiome. MSystems 2, e00195–16 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Mandal S. et al. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb. ecology health disease 26, 27663 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Lin H. & Peddada S. D. Analysis of compositions of microbiomes with bias correction. Nat. communications 11, 1–11 (2020) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Hu Y., Satten G. A. & Hu Y.-J. Locom: A logistic regression model for testing differential abundance in compositional microbiome data with false discovery rate control. Proc. Natl. Acad. Sci. 119, e2122788119 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Benjamini Y. & Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Royal statistical society: series B (Methodological) 57, 289–300 (1995). [Google Scholar]
  • 7.Peddada S. D. et al. Gene selection and clustering for time-course and dose–response microarray experiments using order-restricted inference. Bioinformatics 19, 834–841 (2003). [DOI] [PubMed] [Google Scholar]
  • 8.Guo W., Sarkar S. K. & Peddada S. D. Controlling false discoveries in multidimensional directional decisions, with applications to gene expression data on ordered categories. Biometrics 66, 485–492 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Grandhi A., Guo W. & Peddada S. D. A multiple testing procedure for multi-dimensional pairwise comparisons with application to gene expression studies. BMC bioinformatics 17, 104 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Gohir W. et al. Pregnancy-related changes in the maternal gut microbiota are dependent upon the mother’s periconceptional diet. Gut microbes 6, 310–320 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Wu H.-J. & Wu E. The role of gut microbiota in immune homeostasis and autoimmunity. Gut microbes 3, 4–14 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Koren O. et al. Host remodeling of the gut microbiome and metabolic changes during pregnancy. Cell 150, 470–480 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Mesa M. D. et al. The evolving microbiome from pregnancy to early infancy: a comprehensive review. Nutrients 12, 133 (2020) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Kaul A., Mandal S., Davidov O. & Peddada S. D. Analysis of microbiome data in the presence of excess zeros. Front. microbiology 8, 2114 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Zhou H., He K., Chen J. & Zhang X. Linda: linear models for differential abundance analysis of microbiome compositional data. Genome biology 23, 1–23 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.McLaren M. R., Willis A. D. & Callahan B. J. Consistent and correctable bias in metagenomic sequencing experiments. Elife 8, e46923 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Tusher V. G., Tibshirani R. & Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. 98, 5116–5121 (2001). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Costea P. I., Zeller G., Sunagawa S. & Bork P. A fair comparison. Nat. methods 11, 359–359 (2014). [DOI] [PubMed] [Google Scholar]
  • 19.Paulson J. N., Bravo H. C. & Pop M. Reply to:" a fair comparison". Nat. methods 11, 359–360 (2014). [DOI] [PubMed] [Google Scholar]
  • 20.Vandeputte D. et al. Quantitative microbiome profiling links gut community variation to microbial load. Nature 551, 507–511 (2017). [DOI] [PubMed] [Google Scholar]
  • 21.Holm S. A simple sequentially rejective multiple test procedure. Scand. journal statistics 65–70 (1979). [Google Scholar]
  • 22.Lim C., Sen P. K. & Peddada S. D. Robust analysis of high throughput screening (hts) assay data. Technometrics 55, 150–160 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Belov A. A., Cheptsov V. S. & Vorobyova E. A. Soil bacterial communities of sahara and gibson deserts: Physiological and taxonomical characteristics. AIMS microbiology 4, 685 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Peddada S., Harris S., Zajd J. & Harvey E. Oriogen: order restricted inference for ordered gene expression data. Bioinformatics 21, 3933–3934 (2005). [DOI] [PubMed] [Google Scholar]
  • 25.Botero L. M. et al. Thermobaculum terrenum gen. nov., sp. nov.: a non-phototrophic gram-positive thermophile representing an environmental clone group related to the chloroflexi (green non-sulfur bacteria) and thermomicrobia. Arch. microbiology 181, 269–277 (2004). [DOI] [PubMed] [Google Scholar]
  • 26.Lau C. H.-F., van Engelen K., Gordon S., Renaud J. & Topp E. Novel antibiotic resistance determinants from agricultural soil exposed to antibiotics widely used in human medicine and animal farming. Appl. Environ. Microbiol. 83, e00989–17 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Oyejobi G. K., Sule W. F., Akinde S. B., Khan F. M. & Ogolla F. Multidrug-resistant enteric bacteria in nigeria and potential use of bacteriophages as biocontrol. Sci. The Total. Environ. 153842 (2022). [DOI] [PubMed] [Google Scholar]
  • 28.Montero-Calasanz M. d. C. et al. Geodermatophilus poikilotrophi sp. nov.: a multitolerant actinomycete isolated from dolomitic marble. BioMed research international 2014 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Li J. L. et al. Antichlamydial dimeric indole derivatives from marine actinomycete rubrobacter radiotolerans. Planta Medica 83, 805–811 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Walker C. B. et al. Nitrosopumilus maritimus genome reveals unique mechanisms for nitrification and autotrophy in globally distributed marine crenarchaea. Proc. Natl. Acad. Sci. 107, 8818–8823 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Grady E. N., MacDonald J., Liu L., Richman A. & Yuan Z.-C. Current knowledge and perspectives of paenibacillus: a review. Microb. cell factories 15, 1–18 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Chen H. et al. One-time nitrogen fertilization shifts switchgrass soil microbiomes within a context of larger spatial and temporal variation. Plos one 14, e0211310 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Fang X. et al. Gastrointestinal surgery for inflammatory bowel disease persistently lowers microbiome and metabolome diversity. Inflamm. bowel diseases 27, 603–616 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.López-Almela I. et al. Bacteroides uniformis combined with fiber amplifies metabolic and immune benefits in obese mice. Gut Microbes 13, 1–20 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Horvath T. D. et al. Bacteroides ovatus colonization influences the abundance of intestinal short chain fatty acids and neurotransmitters. Iscience 25, 104158 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Chang S.-C. et al. A gut butyrate-producing bacterium butyricicoccus pullicaecorum regulates short-chain fatty acid transporter and receptor to reduce the progression of 1, 2-dimethylhydrazine-associated colorectal cancer. Oncol. Lett. 20, 1–1 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Peterson C. T. et al. Short-chain fatty acids modulate healthy gut microbiota composition and functional potential. Curr. Microbiol. 79, 128 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Zhou Y. et al. F. prausnitzii and its supernatant increase scfas-producing bacteria to restore gut dysbiosis in tnbs-induced colitis. AMB express 11, 1–10 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Nie K. et al. Roseburia intestinalis: A beneficial gut organism from the discoveries in genus and species. Front. Cell. Infect. Microbiol. 1147 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Rau M. et al. Fecal scfas and scfa-producing bacteria in gut microbiome of human nafld as a putative link to systemic t-cell activation and advanced disease. United Eur. gastroenterology journal 6, 1496–1507 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Farnan L., Ivanova A. & Peddada S. Constrained inference in biological sciences: linear mixed effects models under constraints. PLoS ONE 9, e84778 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Jelsema C. M. & Peddada S. D. Clme: an r package for linear mixed effects models under inequality constraints. J. statistical software 75 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Rao C. R. Linear statistical inference and its applications, vol. 2 (Wiley New York, 1973). [Google Scholar]
  • 44.Peddada S. D. & Smith T. Consistency of a class of variance estimators in linear models under heteroscedasticity. Sankhyā: The Indian J. Stat. Ser. B 1–10 (1997). [Google Scholar]
  • 45.McLachlan G. & Krishnan T. The EM algorithm and extensions, vol. 382 (John Wiley & Sons, 2007). [Google Scholar]
  • 46.Dunnett C. W. A multiple comparison procedure for comparing several treatments with a control. J. Am. Stat. Assoc. 50, 1096–1121 (1955) [Google Scholar]
  • 47.Dunnett C. W. & Tamhane A. C. Step-down multiple tests for comparing treatments with a control in unbalanced one-way layouts. Stat. medicine 10, 939–947 (1991). [DOI] [PubMed] [Google Scholar]
  • 48.Dunnett C. W. & Tamhane A. C. A step-up multiple test procedure. J. Am. Stat. Assoc. 87, 162–170 (1992). [Google Scholar]
  • 49.Silvapulle M. J. & Sen P. K. Constrained statistical inference: Order, inequality, and shape constraints (John Wiley & Sons, 2011). [Google Scholar]
  • 50.Fu A., Narasimhan B. & Boyd S. Cvxr: An r package for disciplined convex optimization. arXiv preprint arXiv:1711.07582 (2017). [Google Scholar]
  • 51.Williams D. A. Some inference procedures for monotonically ordered normal means. Biometrika 64, 9–14 (1977). [Google Scholar]
  • 52.Patterson H. D. & Thompson R. Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554 (1971) [Google Scholar]
  • 53.Harville D. A. Bayesian inference for variance components using only error contrasts. Biometrika 61, 383–385 (1974). [Google Scholar]
  • 54.Lindstrom M. J. & Bates D. M. Newton—raphson and em algorithms for linear mixed-effects models for repeated-measures data. J. Am. Stat. Assoc. 83, 1014–1022 (1988). [Google Scholar]
  • 55.Lin H. Multi-group analysis of compositions of microbiomes with covariate adjustments and repeated measures. https://github.com/FrederickHuangLin/ANCOM-BC2-Code-Archive (2023). [DOI] [PMC free article] [PubMed]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

Simulation data can be found on the corresponding GitHub repository55. The soil microbiome data for aridity can be found in Qiita: https://qiita.ucsd.edu/study/description/10360. The gut microbiome data for IBD can be found in Qiita: https://qiita.ucsd.edu/study/description/11546.


Articles from Research Square are provided here courtesy of American Journal Experts

RESOURCES