Multigroup analysis of compositions of microbiomes with covariate adjustments and repeated measures

Huang Lin; Shyamal Das Peddada

doi:10.1038/s41592-023-02092-7

2023 Dec 29;21(1):83–91. doi: 10.1038/s41592-023-02092-7


PMCID: PMC10776411 PMID: 38158428

Abstract

Microbiome differential abundance analysis methods for two groups are well-established in the literature. However, many microbiome studies involve more than two groups, sometimes even ordered groups such as stages of a disease, and require different types of comparison. Standard pairwise comparisons are inefficient in terms of power and false discovery rates. In this Article, we propose a general framework, ANCOM-BC2, for performing a wide range of multigroup analyses with covariate adjustments and repeated measures. We illustrate our methodology through two real datasets. The first example explores the effects of aridity on the soil microbiome, and the second example investigates the effects of surgical interventions on the microbiome of patients with inflammatory bowel disease.

Subject terms: Statistical methods, Medical research

ANCOM-BC2 is developed to perform multigroup differential abundance analysis and allows modeling of covariates and longitudinal measures while controlling false discovery rate (FDR) or mixed directional FDR.

Main

The differential abundance (DA) analysis of microbial taxa between two study groups is well-studied in the literature. Often two types of parameter are considered, namely the relative abundance and the absolute abundance of a taxon in a unit volume of an ecosystem. There exist several methods in the literature that can be used for performing differential relative abundance analysis between two groups such as count regression for correlated observations with the beta-binomial (CORNCOB)¹. While relative abundance (same as relative proportions) is a natural measure to consider, the DA of relative abundances has an important limitation. Specifically, differences in the absolute abundance of a single taxon between two groups may result in differential relative abundances of all taxa between the two groups^2,3. While this is mathematically correct, it does not help the researcher to discover the specific taxon that was DA between the two groups.

As an alternative to differential relative abundance analysis, several methods proposed in the literature can be used for differential absolute abundance analysis (hereafter referred to as DA analysis), which is the focus of this Article. Some examples include analysis of composition of microbiomes (ANCOM)², analysis of compositions of microbiomes with bias correction (ANCOM-BC)³, linear models for DA analysis (LinDA)⁴ and logistic compositional analysis (LOCOM)⁵. However, the methodology for multigroup DA analysis is not well-developed in the literature. Some researchers perform a series of pairwise tests with a false discovery rate (FDR) control within each pairwise comparison and pool the results from all such pairwise comparisons to interpret the data. Such a strategy does not account for the fact that multiple tests and multiple pairwise comparisons are being performed and hence the overall FDR is not controlled.

Standard procedures, such as the Benjamini–Hochberg procedure⁶, are designed for testing multiple hypotheses between two groups. When there are more than two groups, the standard concept of FDR, and methods controlling the corresponding error rates, need to be modified according to the study design and type of analyses to be performed^7–9. Some examples of interest include the following. (1) Multiple pairwise comparisons, in which a dietitian may be interested in making all pairwise comparisons of the gut microbial compositions among participants receiving diets D₁, D₂ or D₃. Furthermore, for each pairwise comparison, the goal is often to identify taxa whose abundance increased (or decreased). (2) Multiple pairwise comparisons against a specific reference group, the same as in scenario (1), but the investigator is only interested in comparing groups D₂ and D₃ against D₁, the reference group. (3) Pattern analysis over ordered study groups, where, in some instances, an investigator may be interested in discovering trends or patterns in abundances of taxa over ordered groups, such as the health of participants, changes in climate, doses of a drug and so on. For instance, during normal pregnancy, women experience major changes in their gut and vaginal microbiome¹⁰. These changes are necessary for maternal metabolism, immune response and hormonal changes to support pregnancy and to provide healthy flora for babies at birth^11,12. Thus, as the pregnancy progresses from the first to the third trimester, a researcher may be interested in discovering temporal changes in microbiota. More generally, in many scientific investigations, researchers are interested in studying changes in the microbiome over ordered conditions. The patterns of microbial abundance may not always be monotonic. They may display other shapes, such as an umbrella or an inverted umbrella with the location of the peak or trough unknown a priori. Additionally, depending on the scientific question of interest, repeated measures may be taken on the same participant. Although the pattern analyses mentioned here could be accomplished by conducting a sequence of pairwise tests over adjacent ordered groups, such a strategy may have lower power than a test designed for pattern analysis, as will be demonstrated in the analysis of soil aridity data described later in this Article.

The objective of this Article is to develop methodologies for performing multigroup DA analyses. A formal methodology for performing such analyses does not appear to be available in the literature, with a few exceptions, such as ANCOM-II (ref. ¹³). While ANCOM-II considered the above testing problems, it does not develop a formal framework for bias correction. The more recent methodology LinDA⁴, which uses a model similar to the one developed in ANCOM-II, does not address the above multigroup testing problems. Thus, there is a major gap in the literature for analyzing multigroup microbiome studies, which will be filled by the methodology developed in this Article called analysis of compositions of microbiomes with bias correction 2 (ANCOM-BC2).

Although the ANCOM-BC methodology accounted for sample-specific bias, for better control of FDR, ANCOM-BC2 also accounts for taxon-specific bias. This is important because sequencing efficiencies can vary across taxa, leading to a taxon-specific bias when some taxa are preferentially measured over others during sequencing. For example, gram-positive bacteria have stronger cell walls than gram-negative bacteria, making them harder to extract during the data preparation step. Consequently, gram-positive bacteria may be underrepresented in the observed counts, leading to biased results if taxon-specific biases are not properly accounted for in the analysis¹⁴. Also, it is well-known that small effect sizes are associated with small variances in high throughput data¹⁵. Consequently, in such cases, the value of the test statistic is inflated, resulting in a highly significant P value. Inspired by the significance analysis of microarrays (SAM)¹⁵ methodology, we regularize the variance to avoid inflated values for the test statistics and hence moderate the P values for a better control of FDR. Lastly, zeros are a common problem for log-abundance based DA methods, including ANCOM-BC. Often such methods use pseudo-counts to deal with zeros before taking logarithms. However, the choice of pseudo-count can affect the results for rare taxa containing excess zeros, which potentially leads to an inflated FDR^13,16,17. To mitigate this issue, we conduct a sensitivity analysis to filter out DA taxa that are potentially false positives. Details of the procedure are provided in the Methods section.
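The effect of a SAM-style regularization on the test statistic can be illustrated with a small sketch. It is not the ANCOM-BC2 implementation: the function names are hypothetical, the numbers are made up, and taking the added constant s0 to be the 5th percentile of the standard errors is an assumption made here for illustration (the actual procedure is given in the Methods section).

```python
def percentile(values, q):
    """Linear-interpolation percentile of a list, for 0 <= q <= 1."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def regularized_statistics(estimates, std_errors, q=0.05):
    """Divide each estimate by (standard error + s0) rather than by the standard error alone.

    s0 is taken as the q-th percentile of the standard errors (an assumption of this sketch).
    """
    s0 = percentile(std_errors, q)
    return [b / (se + s0) for b, se in zip(estimates, std_errors)]


# Made-up values: the first taxon has a tiny effect but an even tinier standard error.
estimates = [0.02, 0.5, 1.0]
std_errors = [0.001, 0.1, 0.2]

naive = [b / se for b, se in zip(estimates, std_errors)]
regularized = regularized_statistics(estimates, std_errors)
```

In this toy example the unregularized statistic of the first taxon is the largest of the three even though its effect size is negligible; after adding s0 to the denominator it becomes the smallest.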

Using constrained statistical inference-based methods⁷ and mixed directional FDR (mdFDR) methods for multiple pairwise comparisons^8,9, along with the above-noted modifications to ANCOM-BC, in this Article we develop ANCOM-BC2 for multigroup microbiome studies. ANCOM-BC2 allows modeling covariates as well as repeated measures. The performance of ANCOM-BC2 is evaluated using extensive simulation studies under a variety of settings. ANCOM-BC2 is also illustrated using two publicly available datasets, namely soil microbiome data and inflammatory bowel disease data.

Results

Simulations: settings

Inspired by applications, we conducted simulation studies under various scenarios incorporating different exposure types and covariate adjustments. We compared the performance of ANCOM-BC2 with that of ANCOM-BC (ref. ³), as well as state-of-the-art DA methods for absolute abundances: (1) LinDA⁴ and (2) LOCOM⁵. Although designed for relative abundances, CORNCOB, a DA method based on a beta-binomial regression model, was also included in the simulation studies.

The absolute abundances were simulated using the Poisson log-normal (PLN) model, as done in the linear decomposition model framework¹⁸. The PLN model postulates that absolute abundance follows a Poisson distribution with a multivariate log-normal distribution for the mean. The population mean and covariance matrix for absolute abundance in the PLN model were derived from the upper respiratory tract (URT) microbiome data, featuring 60 samples and 382 operational taxonomic units (OTUs), extracted from the original 856-OTU dataset¹⁹. OTUs present in less than 5% of samples were omitted. It is important to note that ANCOM-BC2 is not based on the PLN model and thus this simulation set-up does not inherently favor ANCOM-BC2 over the competing methods described in this Article.
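The PLN generating mechanism described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the simulation code used in this Article: the three-taxon mean vector and covariance matrix are made-up values (the study estimated them from the URT data), and all function names are hypothetical.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate only for the small means used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_pln(log_mean, log_cov, n_samples, seed=0):
    """Draw a samples-by-taxa count table from a Poisson log-normal model."""
    rng = random.Random(seed)
    L = cholesky(log_cov)
    n_taxa = len(log_mean)
    counts = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_taxa)]
        # Latent log-abundances: multivariate normal with the given mean and covariance.
        log_lam = [log_mean[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                   for i in range(n_taxa)]
        # Observed abundances: Poisson with mean exp(latent log-abundance).
        counts.append([sample_poisson(math.exp(v), rng) for v in log_lam])
    return counts


# Illustrative parameters only (log scale); not estimated from any dataset.
log_mean = [1.0, 2.0, 3.0]
log_cov = [[0.25, 0.05, 0.00],
           [0.05, 0.25, 0.05],
           [0.00, 0.05, 0.25]]
counts = simulate_pln(log_mean, log_cov, n_samples=60)
```

Because the Poisson mean is itself random, the simulated counts are overdispersed and correlated across taxa, which is the feature of the PLN model that the simulations rely on.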

Motivated by the limitations of ANCOM-BC identified through our experience and in the literature, we conducted an exhaustive simulation study that includes edge cases where ANCOM-BC performs poorly. Additional details regarding the simulation design are provided in Extended Data Fig. 1. Many DA methods implicitly assume that many taxa (for example, more than 50%) are not DA. To understand the breakdown point of various methods, we varied the proportion of DA taxa from 5 to 90%. Our evaluation of pseudo-count effects on zeros led to two ANCOM-BC2 versions: ANCOM-BC2 (no filter) and ANCOM-BC2 (SS filter, where SS denotes sensitivity score), detailed in the Methods section. Notably, ANCOM-BC2 (SS filter) is intrinsically more conservative. For the control of FDR due to multiple testing, we favored the Holm–Bonferroni method²⁰ over the Benjamini–Hochberg procedure⁶ for all DA methods. The Holm–Bonferroni method, which allows an arbitrary dependence structure among the underlying P values, is recognized to be robust, to some extent, to inaccurate P values²¹, a common problem with all DA methods. Further information regarding the simulation study set-up is provided in the Supplementary Methods.
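To make the difference between the two corrections concrete, the following sketch computes Holm–Bonferroni and Benjamini–Hochberg adjusted P values for a handful of made-up P values. The function names are hypothetical and the code is a plain textbook implementation, not the code used in the simulation study.

```python
def holm_adjust(pvals):
    """Holm-Bonferroni step-down adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest P value (k = rank + 1) is multiplied by m - k + 1.
        val = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, val)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted


def bh_adjust(pvals):
    """Benjamini-Hochberg step-up adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank of this P value in ascending order
        val = min(1.0, pvals[i] * m / rank)
        running_min = min(running_min, val)  # enforce monotonicity
        adjusted[i] = running_min
    return adjusted


# Made-up P values for five taxa.
pvals = [0.001, 0.01, 0.03, 0.04, 0.20]
holm = holm_adjust(pvals)
bh = bh_adjust(pvals)
```

For any set of P values the Holm-adjusted values are at least as large as the Benjamini–Hochberg-adjusted values, so the Holm–Bonferroni method never declares more taxa significant at a given threshold.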

Extended Data Fig. 1 — **(a)** Continuous exposure versus sampling fractions. Scatter plot for 150 simulated samples reveals the positive linear relationship between continuous exposure (X-axis) and sampling fractions (Y-axis). The regression fit is shown in blue. The strong correlation is emphasized by a Pearson’s R of 1 and a two-sided p-value < 2.2 × 10⁻¹⁶. **(b)** Binary exposure versus sampling fractions. Box plots detail distributions of sampling fractions (Y-axis) across two groups (X-axis) based on 150 simulated samples (75 per group). Each box signifies the interquartile range (IQR) of the data, the median is indicated by the interior line, and whiskers extend to the maximum and minimum values within 1.5 times the IQR from the box. Potential outliers are represented as points outside the whiskers, and jittered points indicate individual data points. A two-sided p-value < 2.2 × 10⁻¹⁶ from a Wilcoxon rank-sum test denotes significant group differences. **(c)** Categorical exposure versus sampling fractions. Box plots showcase distributions of sampling fractions (Y-axis) for three groups (X-axis) using 150 samples (50 per group). Each box, line, whisker, and point represents the same elements as in **(b)**. Pairwise significant differences are denoted by two-sided p-values < 2.2 × 10⁻¹⁶ following a Wilcoxon rank-sum test.

Simulations: continuous and binary exposures

Figure 1a presents the simulation results when the exposure variable is continuous. Both versions of ANCOM-BC2 had smaller FDR compared to other methods. ANCOM-BC2 (SS filter) consistently controlled FDR below the nominal level of 0.05. By contrast, the FDR of ANCOM-BC2 (no filter) increased with sample size, a consequence of excess zeros across the distribution of the exposure variable, which is more likely to generate false positives with a larger sample size. Both versions of ANCOM-BC2 generally outperformed all other methods, with ANCOM-BC2 (no filter) achieving the highest power. Conversely, all competing methods had considerably higher FDR than both versions of ANCOM-BC2. For instance, the FDR of LOCOM ranged from 5 to 40%. Similarly, LinDA and ANCOM-BC had FDRs ranging from 5 to 70%. LOCOM experienced a substantial decrease in power for small sample sizes. For example, the power was as low as 20% for n = 10. Although ANCOM-BC and LinDA had larger powers, they suffered from high FDR, exceeding the nominal level in most scenarios. We further note that as the sample size increased, the FDR of ANCOM-BC, LinDA and LOCOM increased. This suggests a systematic bias within these test statistics. The FDR of CORNCOB, a method designed for DA of relative abundances, consistently exceeded the nominal level and reached its maximum when a large number of taxa were differentially abundant (between 20 and 50%). This is attributed to the fact that differential absolute abundance in a single taxon could induce differential relative abundance of many null taxa^2,22.

Fig. 1 — a,b, The FDR and power of various DA methods for continuous (a) and binary exposures (b) are summarized. Synthetic datasets were generated using the PLN model¹⁸ based on the mean vector and covariance matrix estimated from the URT dataset¹⁹. The x axis represents the sample size (or sample size per group for the binary exposure), and the y axis shows the FDR or power. The dashed lines denote the nominal level of FDR (FDR = 0.05). The proportion of true DA taxa is provided at the top of each panel. The mean estimated FDR (or power) ± standard errors (indicated by error bars) derived from 100 simulation runs are provided in each panel.

Figure 1b presents the simulation results for DA analysis for a binary exposure. These results are generally consistent with those presented in Fig. 1a. The FDRs of competing methods were substantially inflated compared to the two versions of ANCOM-BC2, and those FDRs monotonically increased with sample size. The two versions of ANCOM-BC2 consistently maintained lower FDR than all competing methods. Similar to the continuous exposure variable case, ANCOM-BC2 (SS filter) always controlled the FDR at the nominal level, whereas ANCOM-BC2 (no filter) controlled FDR at the nominal level for small to moderate sample sizes. For large sample sizes (for example, more than 50), it failed to control FDR within the nominal level but still had substantially lower FDR than LOCOM, LinDA, ANCOM-BC and CORNCOB. However, ANCOM-BC2 (no filter) had the highest power among all the methods. On the other hand, ANCOM-BC2 (SS filter) sacrificed about 10% of power, a concession that enables the control of FDR across all simulation settings.

To evaluate the power and FDR trade-off across the diverse DA methods, we computed the FDR adjusted power (FAP), as detailed in the Supplementary Methods. This measure (not a probability) is represented in relation to power in Extended Data Fig. 2. An elevated FAP indicates a superior power and FDR trade-off for a given power. Extended Data Fig. 2a corresponds to the continuous exposure case and Extended Data Fig. 2b pertains to the binary exposure case. From the cumulative distribution plots, we see that for any given power, both versions of ANCOM-BC2 have stochastically larger FAP values than all other methods (that is, their cumulative distribution functions are more to the right), with ANCOM-BC2 (SS filter) being stochastically the largest. Since in practice not all methods have the same FDR, we advocate the use of FAP, which accounts for the power and FDR trade-off, as a measure for comparing DA methods.
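As a rough illustration, taking FAP to be the log ratio of power to FDR (the definition stated in the caption of Extended Data Fig. 2), the measure can be computed as in the sketch below. The function name, the `floor` guard against a zero FDR and the example numbers are assumptions of this sketch; the exact computation is described in the Supplementary Methods and may differ.

```python
import math


def fdr_adjusted_power(power, fdr, floor=1e-4):
    """Log ratio of power to FDR.

    `floor` is a hypothetical guard so that a power or FDR of exactly zero
    does not lead to log(0) or a division by zero.
    """
    return math.log(max(power, floor) / max(fdr, floor))


# Made-up operating points: a method with slightly lower power but a controlled
# FDR, and a method with higher power but an inflated FDR.
conservative = fdr_adjusted_power(power=0.80, fdr=0.05)
liberal = fdr_adjusted_power(power=0.90, fdr=0.30)
```

Here the first method has the larger FAP (about 2.8 versus about 1.1), even though its power is lower, because its FDR is six times smaller.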

Extended Data Fig. 2 — FAP, defined as the log ratio of power and FDR, was employed to illustrate the power/FDR trade-off among all DA methods. FAP values were calculated using power and FDR metrics obtained from the simulation studies carried out for both **(a)** continuous and **(b)** binary exposure scenarios utilizing the URT dataset¹⁹. The far left panels of this figure present scatter plots of FAP (Y-axis) corresponding to the power (X-axis) for all DA methods considered in the simulation study reported in Fig. 1 in the main text. FAPs are expressed as mean values deduced from 100 simulation iterations per setting, with the linear regression line of FAP against power superimposed over the points. On the right of the scatter plots in each panel are the three cumulative distribution function (CDF) plots of FAP scores of various DA methods corresponding to powers exceeding 0.5, 0.8, and 0.9, respectively. These results underscore that both versions of ANCOM-BC2 have stochastically larger FAP scores than the competitors, with ANCOM-BC2 (SS filter) being stochastically the largest.

Simulations: multiple groups

The simulation settings for multigroup comparisons mimic those outlined in the previous section.

Multiple pairwise comparisons against a reference group

We assessed the performance of ANCOM-BC2 (SS filter), ANCOM-BC2 (no filter), ANCOM-BC and LinDA across three experimental groups with covariate adjustments. LOCOM and CORNCOB were not included because they are not designed for multiple groups. As illustrated in Fig. 2a, both versions of ANCOM-BC2 yielded smaller mixed directional FDR (mdFDR)^8,9 compared to other methods. Note that mdFDR accounts for errors due to multiple testing, multiple comparisons and directional errors. Specifically, ANCOM-BC2 (SS filter) effectively controlled mdFDR below the nominal level of 0.05. Although in some cases this results in a loss of about 10–20% power, it ensures more stringent mdFDR control. Even with this power reduction, ANCOM-BC2 (SS filter) maintains a robust power (more than 0.8) in most scenarios. Without the filter, ANCOM-BC2 (no filter) remains the most powerful DA method of all. Despite its mdFDR occasionally surpassing 0.05 for larger sample sizes (more than 50), it was still markedly better than both LinDA and ANCOM-BC, which struggled to control mdFDR efficiently.

Fig. 2 — a–c, The FDR (mdFDR) and power of various DA methods for multiple pairwise comparisons against a reference group (a), multiple pairwise comparisons (b) and pattern analysis (c) are summarized. Synthetic datasets were generated using the PLN model¹⁸ based on the mean vector and covariance matrix estimated from the URT dataset¹⁹. The x axis represents the sample size per group, and the y axis shows the FDR (mdFDR) or power. The dashed lines denote the nominal level of FDR (FDR = 0.05) or mdFDR (mdFDR = 0.05). The proportion of true DA taxa is provided at the top of each panel. The mean estimated FDR (or power) ± standard errors (indicated by error bars) derived from 100 simulation runs are provided in each panel. Within the context of multiple pairwise comparisons, ANCOM-BC2 (SS filter) effectively controlled FDR (mdFDR) while maintaining power similar to ANCOM-BC2 (no filter).

Multiple pairwise comparisons

We assessed ANCOM-BC2’s performance when making all possible pairwise comparisons instead of comparing against a specific reference group as done above. Since the competing methods considered in this Article are not currently designed for multiple pairwise comparisons, they are excluded. As depicted in Fig. 2b, ANCOM-BC2 (SS filter) effectively controlled the mdFDR below the nominal level of 0.05 while maintaining substantial power (more than 0.8) in most scenarios. As seen above, ANCOM-BC2 (no filter) controlled mdFDR within the nominal level for small sample sizes or when a large proportion of taxa were differentially abundant. However, when the sample sizes were large (for example, more than 50), it had an inflated mdFDR exceeding the nominal level.

Pattern analysis

Pattern analysis is another unique feature of ANCOM-BC2. In this simulation study, we modeled a scenario demonstrating a monotonically increasing pattern. Here, the log fold-change (denoted by δ) among the DA (or nonnull) taxa between the second group and the reference group ranged from 0.5 to 2.0, and the log fold-change of the third group relative to the first group was taken to be δ + 1. In this setting, a ‘discovery’ in pattern analysis refers to the identification of a taxon that displays a monotonically increasing pattern across all three groups. As shown in Fig. 2c, both versions of ANCOM-BC2 controlled the FDR while maintaining high power exceeding 0.8 in most scenarios. Nonetheless, under the most extreme scenario where 90% of taxa were truly differentially abundant, ANCOM-BC2 encountered a power loss. The observed power loss is largely due to ANCOM-BC2’s built-in bias correction, which assumes that there is a sufficient number of null taxa.
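The effect-size structure of this scenario can be written down in a few lines. In the sketch below, the function names are hypothetical and drawing δ uniformly from [0.5, 2.0] is an assumption (the text only states the range); the sketch generates true log fold-changes only and does not perform the ANCOM-BC2 pattern test.

```python
import random


def simulate_pattern_lfc(n_taxa, prop_da, seed=0):
    """True log fold-changes relative to group 1 for three ordered groups.

    DA taxa get (0, delta, delta + 1); null taxa get (0, 0, 0).
    """
    rng = random.Random(seed)
    n_da = round(n_taxa * prop_da)
    lfc = []
    for j in range(n_taxa):
        if j < n_da:
            delta = rng.uniform(0.5, 2.0)  # assumed sampling scheme for delta
            lfc.append((0.0, delta, delta + 1.0))
        else:
            lfc.append((0.0, 0.0, 0.0))
    return lfc


def is_strictly_increasing(values):
    """True if every value is smaller than the one that follows it."""
    return all(a < b for a, b in zip(values, values[1:]))


lfc = simulate_pattern_lfc(n_taxa=100, prop_da=0.2)
n_increasing = sum(is_strictly_increasing(row) for row in lfc)
```

Because δ is at least 0.5, every DA taxon satisfies 0 < δ < δ + 1, so the set of taxa with a truly increasing pattern coincides with the set of DA taxa; a correct ‘discovery’ is a taxon from this set.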

Simulations: correlated samples

In this section, we evaluated the performance of ANCOM-BC2 in comparison to LinDA when the samples across experimental groups were correlated, such as in a repeated measurement design. We also considered a linear mixed model (LMM) on centered log-ratio (CLR)-transformed data (LMM-CLR), a method commonly used for repeated measurements. The interpretation of LMM-CLR results differs from that of the previously mentioned DA methods. According to LMM-CLR, a taxon is nonnull if it is differentially abundant relative to the geometric mean of all taxa, not in terms of its absolute abundance. We included this method in our simulation study due to its frequent application in repeated measures analyses of microbiome data. ANCOM-BC, LOCOM and CORNCOB were excluded from this simulation as none of them are equipped to handle correlated experimental groups. We considered mixed-effects models with: (1) a random intercept and (2) a random intercept and a random slope. The random intercept had a standard deviation of 1 and the random slope had a standard deviation of 1.5, and both had mean zero. If both random effects were present, the correlation coefficient between them was set to 0.5. In each of these scenarios, the exposure variable consisted of three levels (that is, three experimental groups). The simulation study also included a continuous covariate. The remaining simulation settings adhered to those described in the previous sections (details in Supplementary Methods section). The simulation results for both scenarios are provided in Fig. 3. In each case, as in all previous settings, ANCOM-BC2 (SS filter) effectively controlled the mdFDR at or below the nominal level of 0.05, while maintaining substantial power (more than 0.8) in most of the simulation settings. On the other hand, ANCOM-BC2 (no filter) consistently exceeded the nominal mdFDR level of 0.05. Despite this, it had a larger power and smaller mdFDR than LinDA and LMM-CLR across all settings. LMM-CLR generally exhibited the lowest power among all methods while having inflated mdFDR across all simulation scenarios. Notably, LMM-CLR’s mdFDR rose most rapidly with increasing sample size relative to the other methodologies.
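The random-effects specification stated above (standard deviations of 1 and 1.5, mean zero, correlation 0.5) implies a 2 × 2 covariance matrix with off-diagonal entry 0.5 × 1 × 1.5 = 0.75. The sketch below builds that matrix and draws subject-level random effects from the corresponding bivariate normal distribution. The function names and the number of subjects are hypothetical, and the sketch covers only the random effects, not the full simulation design.

```python
import math
import random

SD_INTERCEPT, SD_SLOPE, RHO = 1.0, 1.5, 0.5  # values stated in the text


def random_effects_cov(sd_int, sd_slope, rho):
    """Covariance matrix of (random intercept, random slope)."""
    c = rho * sd_int * sd_slope
    return [[sd_int ** 2, c], [c, sd_slope ** 2]]


def draw_random_effects(n_subjects, sd_int, sd_slope, rho, seed=0):
    """Draw mean-zero bivariate normal (intercept, slope) pairs, one per subject."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n_subjects):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b0 = sd_int * z1
        # Mix z1 into the slope so that corr(b0, b1) equals rho.
        b1 = sd_slope * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        effects.append((b0, b1))
    return effects


cov = random_effects_cov(SD_INTERCEPT, SD_SLOPE, RHO)
effects = draw_random_effects(50, SD_INTERCEPT, SD_SLOPE, RHO)
```

For the random-intercept-only scenario, only the first component of each pair would be used.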

Fig. 3 — a,b, The mdFDR and power of various DA methods in a random intercept model (a) and a random coefficients model (b) are summarized. Synthetic datasets were generated using the PLN model¹⁸ based on the mean and covariance estimated from the URT dataset¹⁹. The x axis represents the sample size per group, and the y axis shows the mdFDR or power. The dashed lines denote the nominal level of FDR (FDR = 0.05) or mdFDR (mdFDR = 0.05). The proportion of true DA taxa is provided at the top of each panel. The mean estimated mdFDR (or power) ± standard errors (indicated by error bars) derived from 100 simulation runs are provided in each panel.

Additional simulation studies

In addition to the URT data, we also analyzed a subset from the Quantitative Microbiome Project²³, comprising 106 samples and 91 OTUs. The findings paralleled those from the URT dataset (Extended Data Figs. 3–5).

Extended Data Fig. 3 — Synthetic datasets were generated using the PLN model¹⁸ based on the mean and covariance estimated from the QMP dataset²³. The X-axis shows the sample size (or sample size per group for the categorical covariate), and the Y-axis shows the FDR (mdFDR) or power. Each panel title designates the proportion of true DA taxa. The depicted metrics represent mean values ± standard errors (indicated by error bars) derived from 100 simulation runs for each setting. This visualization underscores the superiority of ANCOM-BC2, both with and without the sensitivity score (SS) filter, in consistently preserving minimal FDR or mdFDR while attaining satisfactory power, outpacing all other assessed methods.

Extended Data Fig. 5 — Synthetic datasets were generated using the PLN model¹⁸ based on the mean and covariance estimated from the QMP dataset²³. The X-axis shows the sample size per group, and the Y-axis shows the FDR (mdFDR) or power. Each panel title designates the proportion of true DA taxa. The depicted metrics represent mean values ± standard errors (indicated by error bars) derived from 100 simulation runs for each setting. The outcomes accentuate that, when integrated with the SS filter, ANCOM-BC2 effectively moderates FDR (mdFDR) while retaining power parallel to its performance without the SS filter. In the absence of the SS filter, ANCOM-BC2 surpasses LinDA and LMM-CLR in maintaining consistently low FDR and equivalent power.

Soil microbiome and aridity

Recently, Neilson et al.²⁴ investigated the differences in soil microbiomes according to soil aridity in the Atacama Desert in Chile. They classified soil samples into three ordered categories based on aridity, namely, arid, margin and hyper-arid, and sequenced data from 63 sample pits from 18 sites in the desert. Since they did not perform DA analyses of those data, we reanalyzed those data using the ANCOM-BC2 methodology. To begin with, we conducted a pattern analysis of richness with respect to the ordered aridity categories (arid to hyper-arid) (Fig. 4a). Using a constrained inference-based trend test⁷, executed using ORIOGEN²⁵ with 10,000 bootstraps, we discovered a significant loss of richness with the increase in aridity (P = 0.0001). This finding is consistent with Neilson et al.²⁴.

Fig. 4 — a, Violin plot illustrating the relationship between aridity and microbial richness. Samples encompass 63 biologically independent pits obtained from 18 distinct Atacama Desert sites in Chile²⁴. Each violin’s median value is signified by a central black dot, while the interquartile range is represented by a black bar. The violin’s width mirrors the density of data points at each richness value. Individual data points are also displayed as jittered dots. A trend test using the constrained inference-based approach⁷ suggests a significant decline in richness with increase in soil aridity (P = 0.0001). b, ANCOM-BC2 (no filter) pattern analysis heatmap in relation to aridity. Monotonically increasing and decreasing trends were evaluated across ordered soil categories, with arid soil as the reference. The columns denote soil categories and the significant genera identified by ANCOM-BC2 pattern analysis are provided in the rows. Each cell color represents abundance change: blue indicates reduction and red signifies increase. The log fold-changes relative to the reference group (arid group) are noted in each cell. The Holm–Bonferroni method was used for multiple testing correction. Genera represented in black are significant without a multiple testing correction, whereas those highlighted in green are significant after multiple testing correction. Additionally, genera marked with an asterisk are also significant after applying the ANCOM-BC2 (SS filter).

Next, we conducted a pattern analysis using ANCOM-BC2 (no filter) to identify trends in microbial abundance across the ordered soil categories, with arid soil serving as the reference group. Significant genera are presented in Fig. 4b. Genera in green were determined to be significant after adjusting for multiple testing. Additionally, genera denoted by an asterisk were also identified as significant when the conservative ANCOM-BC2 (SS filter) was applied. Blastococcus, Rubrobacter and Thermobaculum increased in mean absolute abundance with soil aridity (P < 0.05). The trend in Blastococcus was significant even after adjusting for multiple testing (adjusted P < 0.05) (Fig. 4b). Thermobaculum is known for its thermophilic properties, with some species thriving in temperatures up to 90 °C (ref. ²⁶). It has also been documented to possess antimicrobial-resistant genes^27,28. Similarly, the two Actinobacteria genera, Blastococcus and Rubrobacter, are also known for their antibacterial resistance^29,30. Thus, using ANCOM-BC2, we discovered genera that increased in abundance with aridity and may be antibacterial-resistant.

Elevated aridity in desert ecosystems has profound implications for soil health. For instance, increasing aridity in desert soils has been found to significantly diminish nitrogen-cycling microbes. Notable among the affected microbial taxa are Nitrobacter, a common contributor to nitrification, and potential widespread nitrogen fixers such as Sinorhizobium, Rhizobium and Azospirillum. These taxa were not detected in samples obtained from hyper-arid environments based on the results of the presence and absence test (Supplementary Table 1). In agreement with these findings, the ANCOM-BC2 (no filter) pattern analysis also revealed that increasing aridity correlates with significant reductions in beneficial genera (Fig. 4b). The ANCOM-BC2 trend analysis revealed a significant decrease in the mean absolute abundance of Jiangella, Kaistobacter, Planctomyces and Pseudonocardia in relation to soil aridity (P < 0.05). Among them, Kaistobacter and Pseudonocardia remained significant after adjusting for multiple testing, and the result for Pseudonocardia did not change when the conservative ANCOM-BC2 (SS filter) was used. Pseudonocardia has been recognized for its nitrogen-fixing properties³¹ and its significance to biotechnology stems from its ability to synthesize secondary metabolites with antibacterial, antifungal and antitumor properties³². Likewise, Kaistobacter is known to foster homeostasis within soil microbial communities and acts as a suppressor of soil-borne pathogens³³. Moreover, Jiangella, a halotolerant actinobacterium, is distinguished by its association with nitrate solution, sulfonate transport systems, nitrite reductase and nitrogen fixation³⁴.

Gut microbial composition of patients with IBD

We illustrate ANCOM-BC2 using a longitudinal inflammatory bowel disease (IBD) dataset obtained from Fang et al.³⁵ to investigate the changes in the gut microbiome following gastrointestinal surgery in patients with IBD. The data in this study are based on 322 stool samples collected from 125 patients. Of these, 46 patients were diagnosed with ulcerative colitis and 79 with Crohn’s disease. Stool samples were obtained from each participant at approximately 6-month intervals, beginning at the baseline time point. Specifically, 21 patients provided one sample, 38 patients provided two samples, 41 patients provided three samples, 23 patients provided four samples and two patients provided five samples. Of the total patient population, 87 (69.6%) had no history of intestinal surgery, while 22 patients with Crohn’s disease had undergone ileocolonic resection and 13 patients with Crohn’s disease and three patients with ulcerative colitis had undergone different types of colectomy. These surgeries occurred before the collection of the baseline stool sample. For the purposes of this study, we focused on comparing the microbial compositions between patients who had not undergone gastrointestinal surgery, those who had undergone ileocolonic resection and those who had undergone colectomies. We adjusted the ANCOM-BC2 model for IBD disease type (ulcerative colitis versus Crohn’s disease) and two potential confounders, namely disease state (inactive versus active) and antibiotic use (absent versus present).

We performed multiple pairwise comparisons among the three groups controlling the overall mdFDR at 0.05 using ANCOM-BC2 (no filter). The results are depicted in Fig. 5. The log fold-changes emphasized in green are significant after adjusting for mdFDR. Further, changes marked with an asterisk were also significant by the ANCOM-BC2 (SS filter) method. Ileocolonic resection is the surgical removal of the diseased section of the ileum, which is the junction area between the small and large intestines. By contrast, colectomy is the surgical removal of most or all of the large intestine. Our analysis revealed that almost no microbial species were differentially abundant between the two surgical groups of patients, except for F. prausnitzii, which is more abundant in the colectomy group.

Fig. 5 — In a cohort of patients with IBD³⁵, the analysis entailed multiple pairwise comparisons among three distinct groups: ileocolonic resection, colectomy and no intestinal surgery, while maintaining an overall mdFDR at 0.05. The columns denote the specific comparisons: ileocolonic resection versus no intestinal surgery, colectomy versus no intestinal surgery and ileocolonic resection versus colectomy. The rows list significant species as identified by ANCOM-BC2. Each cell is color-coded to represent significant changes in absolute abundance: blue represents reduced abundance and red indicates increased abundance. Multiple testing corrections were performed using the Holm–Bonferroni method. The text within each cell represents the log fold-change value. The log fold-change values displayed in black represent significant changes without adjustment for mdFDR, whereas those in green are significant after applying mdFDR control. Furthermore, values with an asterisk are significant following the application of the ANCOM-BC2 (SS filter).

We observed marked reductions in the absolute abundance of several commensal gut bacterial species in patients who had undergone either ileocolonic resection or colectomy, in comparison to patients without any history of intestinal surgery. The affected species included Bacteroides spp. (ovatus and uniformis), Faecalibacterium prausnitzii and Roseburia faecis. Of particular note is the significant decrease in Faecalibacterium prausnitzii in patients subjected to ileocolonic resection. This reduction remained noteworthy even after using the conservative ANCOM-BC2 (SS filter) together with multiple testing corrections. A crucial aspect to consider is that most of these bacterial species are intrinsically involved in the production of short-chain fatty acids such as acetate, propionate and butyrate^36–42. These short-chain fatty acids are essential for maintaining gut health, bolstering gut barrier function, exhibiting anti-inflammatory properties and serving as energy sources for colonocytes. Thus, the surgical intervention on these patients, which was necessary, may have unintended effects on the host’s immune response and overall health due to the reduction of some important gut microbiota.

Discussion

In this article, we introduced a general framework called ANCOM-BC2 for performing DA analysis when the exposure variable is continuous, binary or (ordered) categorical. The proposed methodology allows for adjusting for covariates and repeated measures (longitudinal measures) while controlling for FDR, or mdFDR when the exposure variable has more than two groups and the researcher is interested in inferring whether the absolute abundance of a taxon increased or decreased within each pairwise comparison. Furthermore, using the theory of constrained statistical inference, ANCOM-BC2 allows researchers to infer patterns in microbial absolute abundance over ordered categories of exposure variables. For example, it allows a researcher to test whether a particular microbe increased (or decreased) in absolute abundance over ordered disease categories (very healthy to least healthy). This is a unique feature of ANCOM-BC2.

Driven by observed shortcomings of ANCOM-BC in specific edge cases, highlighted in our work and recent literature, we tailored our simulation study to evaluate ANCOM-BC2’s performance in these scenarios as well. The results of our simulation study demonstrate that ANCOM-BC2 provides a better FDR control over competing methods tested here while maintaining high power. In particular, ANCOM-BC2 (SS filter) consistently controlled the FDR or mdFDR below the nominal level in all simulation settings considered in this Article while maintaining high power. By contrast, ANCOM-BC2 (no filter) emerged as the DA method with the highest power, displaying a smaller FDR or mdFDR when compared with competing methods other than ANCOM-BC2 (SS filter). According to the FAP score introduced in this Article, ANCOM-BC2 (SS filter) and ANCOM-BC2 (no filter) had stochastically larger FAP scores than competitors with ANCOM-BC2 (SS filter) having the highest score. In terms of practical application, we endorse the use of ANCOM-BC2 (no filter) for small to moderate sample sizes (for example, n ≤ 50) when repeated measurements are absent. For larger sample sizes (for example, n > 50) or in cases of repeated measures, ANCOM-BC2 (SS filter) is recommended due to its superior FDR control. In pattern analyses, both ANCOM-BC2 (no filter) and ANCOM-BC2 (SS filter) perform equally well in terms of FDR control within the nominal level, although ANCOM-BC2 (no filter) demonstrates a marginally superior power.

The power of ANCOM-BC2’s pattern analysis was demonstrated in the soil microbiome data analyzed in this Article. When standard pairwise analyses were performed, only Pseudonocardia was differentially abundant across different groups (data not shown). However, using the pattern analysis, we discovered that several taxa display increasing or decreasing trends over the ordered soil aridity groups. This is because, unlike pairwise comparisons, pattern analysis uses constrained inference methods, which ‘borrow’ information from ordered groups, thus increasing the effective sample size and the power^7,43,44.

Ileocolonic resection and colectomy are procedures that surgically remove different regions of the intestines, and yet based on our analysis of the IBD data, there were no significant differences in the absolute abundance of most of the gut bacteria in these two groups. Furthermore, the two groups of patients have similarly reduced absolute abundances of certain bacteria relative to those who did not undergo either of the two surgeries. Based on these findings, it may be reasonable to hypothesize that most species of gut microbiota are spatially uniformly distributed in the ileum and large intestines.

Methods

Notation

Notations used in the ANCOM-BC2 methodology are summarized in Table 1. The overall procedure of the ANCOM-BC2 methodology is summarized in Extended Data Fig. 6.

Table 1.

Summary of notation

Notation	Description
i	Sample index, i = 1, 2, …, n.
j	Taxon index, j = 1, 2, …, d.
k	Index of fixed effects, k = 1, 2, …, p.
l	Index of random effects, l = 1, 2, …, q.
x_ik	The kth fixed effect of interest for the ith sample.
z_il	The lth random effect of interest for the ith sample.
A_ij^b	True absolute abundance of jth taxon in a unit volume of ecosystem of ith sample.
O_ij ^b	Observed count of jth taxon in a random specimen taken from a unit volume of ecosystem of ith sample.
E_ij ^b	Random error for taxon j in sample i.
S_i ^a	Sample-specific sampling fraction.
C_j ^a	Taxon-specific sequencing efficiency.
a_ij ^b	$\log A_{i j}$ .
o_ij ^b	$\log O_{i j}$ .
e_ij ^b	Random error for taxon j in sample i in log scale.
s_i ^a	Sample-specific sampling fraction in log scale.
c_j ^a	Taxon-specific sequencing efficiency in log scale.

^aParameter.

^bRandom variable.

Extended Data Fig. 6 — **Flowchart of the ANCOM-BC2 overall procedure**.

ANCOM-BC2 for fixed-effects models

Model assumptions

Assumption 1

Multiplicative model for observed counts:

O_{i j} = S_{i} C_{j} A_{i j} E_{i j} .

Assumption 1 indicates that, in expectation, the observed count of a taxon in a random sample is in constant proportion to the true absolute abundance in a unit volume of the ecosystem of the sample. This proportion can be decomposed into two parts: (1) sample-specific sampling fraction and (2) taxon-specific sequencing efficiency.

According to Assumption 1, for nonzero observed count, the above multiplicative model can be transformed into an additive model by log transformation

o_{i j} = s_{i} + c_{j} + a_{i j} + e_{i j}^{(o)} .

Assumption 2

Linear model for log true absolute abundances: for each taxon j, a_ij, i = 1, …, n are independently distributed, and

a_{i j} = {b_{j}}^{T} x_{i} + e_{i j}^{(a)},

where

$x_{i} = {(1, x_{i 1}, x_{i 2}, \dots, x_{i p})}^{T}$ are the covariates of interest (including the intercept) for the ith sample,
$b_{j} = {(b_{j 0}, b_{j 1}, b_{j 2}, \dots, b_{j p})}^{T}$ are the corresponding coefficients for x_i.
$e_{i j}^{(a)}, i = 1, \dots, n$ are independently distributed random errors for log true absolute abundances with $E (e_{i j}^{(a)}) = 0, Var (e_{i j}^{(a)}) = σ_{j j}^{(a)}$ .

Assumption 3

(Independent random error for log observed counts): assume there are random errors, $e_{i j}^{(o)}, i = 1, \dots, n, j = 1, \dots, d$ , for log observed counts o_ij, which are independently distributed with heteroskedasticity:

E (e_{i j}^{(o)}) = 0, Var (e_{i j}^{(o)}) = σ_{i j}^{(o)}, e_{i j}^{(o)} \perp\!\!\!\perp e_{i j}^{(a)} .

Regression framework

Based on the Assumptions 2 and 3, o_ij can be modeled as:

o_{i j} = s_{i} + c_{j} + {b_{j}}^{T} x_{i} + e_{i j}^{(a)} + e_{i j}^{(o)} := s_{i} + c_{j} + {b_{j}}^{T} x_{i} + e_{i j},

with

E (o_{i j}) = s_{i} + c_{j} + {b_{j}}^{T} x_{i}, Var (o_{i j}) = Var (e_{i j}) = σ_{j j}^{(a)} + σ_{i j}^{(o)} := σ_{i j}^{(t)} ,

where $σ_{i j}^{(t)}$ denotes the total variance.

Equation (1) can also be written in a vector notation as follows:

o_{j} = s + c_{j} 1 + X b_{j} + e_{j},

with

\begin{matrix} E (e_{j}) = {(0, \dots, 0)}^{T}, \\ E (o_{j}) = s + c_{j} 1 + X b_{j}, \\ Cov (o_{j}) = [\begin{matrix} σ_{1 j}^{(t)} & 0 & \dots & 0 \\ 0 & σ_{2 j}^{(t)} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & σ_{n j}^{(t)} \end{matrix}] . \end{matrix}

where

1 = (1, 1, …, 1)^T,
$o_{j} = {(o_{1 j}, o_{2 j}, \dots, o_{n j})}^{T}$ ,
$s = {(s_{1}, s_{2}, \dots, s_{n})}^{T}$ ,
$b_{j} = {(b_{j 0}, b_{j 1}, b_{j 2}, \dots, b_{j p})}^{T}$ ,
$e_{j} = {(e_{1 j}, e_{2 j}, \dots, e_{n j})}^{T}$ ,
$X = [\begin{matrix} 1 & x_{11} & x_{12} & \dots & x_{1 p} \\ 1 & x_{21} & x_{22} & \dots & x_{2 p} \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ 1 & x_{n 1} & x_{n 2} & \dots & x_{n p} \end{matrix}]$ .

It is important to note that within each sample i, for taxa l ≠ m, o_il and o_im are not necessarily independent due to correlations between a_il and a_im. Thus vectors o_l and o_m are not independent random vectors.

Remove the effect of taxon-specific sequencing efficiency

To eliminate the effect of c_j, we first center the log observed counts across samples, that is

\begin{matrix} y_{i j} : = o_{i j} - ō_{\cdot j} & = (s_{i} - \bar{s}) + {b_{j}}^{T} (x_{i} - \bar{x}) + (e_{i j} - ē_{\cdot j}), \\ : = θ_{i} + 𝛃_{j}^{T} x_{i} + ϵ_{i j}, \end{matrix}

where

β_jk = b_jk for k = 1, …, p, and $β_{j 0} = {b_{j}}^{T} \bar{x}$ ,
$Var (ϵ_{i j}) = \frac{{(n - 1)}^{2}}{n^{2}} σ_{i j}^{(t)} + \frac{1}{n^{2}} \sum_{i^{'} \neq i} σ_{i^{'} j}^{(t)} : = σ_{i j}$ .
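The effect of the centering step can be checked numerically. The following is a minimal sketch, assuming NumPy and toy values for s_i, c_j and the remaining signal (all simulated quantities here are illustrative, not from the datasets in this Article): shifting every c_j leaves the centered values y_ij unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4                        # toy numbers of samples and taxa
s = rng.normal(size=n)             # log sampling fractions s_i (simulated)
c = rng.normal(size=d)             # log sequencing efficiencies c_j (simulated)
signal = rng.normal(size=(n, d))   # stands in for b_j^T x_i + e_ij

# o_ij = s_i + c_j + signal_ij, then center each taxon across samples
o = s[:, None] + c[None, :] + signal
y = o - o.mean(axis=0, keepdims=True)

# Shift every c_j by an arbitrary constant and repeat the centering
o_shifted = s[:, None] + (c + 5.0)[None, :] + signal
y_shifted = o_shifted - o_shifted.mean(axis=0, keepdims=True)

# The centered values no longer depend on c_j
print(np.allclose(y, y_shifted))
```

The sample-specific term s_i − s̄ is still present in y, which is why the bias term θ_i has to be handled separately.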

Estimation of sample-specific bias

As can be seen from equation (3), β_j are not identifiable without determining the nuisance parameter θ_i. We define bias-corrected log absolute abundance $y_{i j}^{(c r t)} = y_{i j} - θ_{i}$ , then the ordinary least squares estimators of θ_i and β_j can be obtained by iteratively solving the following equations. For ease of exposition, the algorithm is described in the vector form, that is $y_{j} = {(y_{1 j}, y_{2 j}, \dots, y_{n j})}^{T}, 𝛉 = {(θ_{1}, θ_{2}, \dots, θ_{n})}^{T}$ and so on.

Algorithm 1. Iterative maximum likelihood estimation

Initialize:

For j = 1, …, d

θ ← 0

$y_{j}^{(crt)} \leftarrow y_{j} - 𝛉 = y_{j}$

$𝛃_{j} \leftarrow {(X^{T} X)}^{- 1} X^{T} y_{j}^{(crt)} = {(X^{T} X)}^{- 1} X^{T} y_{j}$

While not converged do

$𝛉 \leftarrow \frac{1}{d} \sum_{j = 1}^{d} (y_{j} - X 𝛃_{j})$

$y_{j}^{(crt)} \leftarrow y_{j} - 𝛉$

$𝛃_{j} \leftarrow {(X^{T} X)}^{- 1} X^{T} y_{j}^{(crt)}$

end while
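The loop above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the function name `iterative_estimation`, the convergence tolerance and the simulated inputs are assumptions made for the example.

```python
import numpy as np

def iterative_estimation(Y, X, tol=1e-8, max_iter=100):
    """Y: (n, d) centered log counts, one column per taxon.
    X: (n, p+1) design matrix including the intercept column.
    Returns preliminary estimates theta (n,) and B (p+1, d)."""
    ols = np.linalg.solve(X.T @ X, X.T)        # (X^T X)^{-1} X^T
    theta = np.zeros(Y.shape[0])               # theta <- 0
    B = ols @ Y                                # initial beta_j for every taxon
    for _ in range(max_iter):
        theta_new = (Y - X @ B).mean(axis=1)   # average residual across taxa
        B = ols @ (Y - theta_new[:, None])     # refit on bias-corrected data
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return theta, B

# Toy usage with simulated data (hypothetical sizes)
rng = np.random.default_rng(1)
n, d = 30, 8
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = rng.normal(size=(n, d))
Y = Y - Y.mean(axis=0)                         # centered as in the text
theta_star, B_star = iterative_estimation(Y, X)
```

Fitting all taxa at once with a matrix product is a design choice for brevity; a per-taxon loop would give the same estimates.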

On convergence,

𝛉^{*} = \frac{1}{d} \sum_{j = 1}^{d} (y_{j} - X 𝛃_{j}^{*}), {y_{j}^{(crt)}}^{*} = y_{j} - 𝛉^{*}, 𝛃_{j}^{*} = {(X^{T} X)}^{- 1} X^{T} {y_{j}^{(crt)}}^{*} .

Therefore

\begin{matrix} 𝛉^{*} & = \frac{1}{d} \sum_{j = 1}^{d} (y_{j} - X 𝛃_{j}^{*}) = \frac{1}{d} \sum_{j = 1}^{d} (y_{j} - P {y_{j}^{(crt)}}^{*}) \\ = \frac{1}{d} \sum_{j = 1}^{d} (y_{j} - P y_{j} + P 𝛉^{*}) = \frac{1}{d} \sum_{j = 1}^{d} [y_{j}^{(crt)} + 𝛉 - P (y_{j}^{(crt)} + 𝛉) + P 𝛉^{*}] \\ = (I - P) 𝛉 + P 𝛉^{*} + \frac{1}{d} \sum_{j = 1}^{d} (I - P) y_{j}^{(crt)} \\ = (I - P) 𝛉 + P 𝛉^{*} + \frac{1}{d} \sum_{j = 1}^{d} 𝛆_{j}, \end{matrix}

where

$P = X {(X^{T} X)}^{- 1} X^{T}$ is the projection matrix onto $C (X)$ , the column space of X,
$ε_{j} = (I - P) y_{j}^{(c r t)}$ with E(ε_j) = 0.

Rearranging equation (5), we see that

(I - P) 𝛉^{*} = (I - P) 𝛉 + \frac{1}{d} \sum_{j = 1}^{d} 𝛆_{j} .

Taking expectations on both sides leads to

(I - P) [E (𝛉^{*}) - 𝛉] = 0 .

As I − P is an orthogonal projector onto $C (X)^{\perp}$ , the orthogonal complement of $C (X)$ , the above equation holds as long as either of the following is valid:

(1) E(θ*) − θ = 0,
(2) $E (θ^{*}) - θ \in C (X)$ .

It is sufficient to consider (2) because (1) is the trivial case. If (1) were true then from (4) we deduce that there is no sample-specific effect and that $E (𝛃_{j}^{*}) = 𝛃_{j}$ . Suppose (2) is true, then there exists a vector $𝛅 \neq 0 \in R^{p}$ , such that

E (𝛉^{*}) = 𝛉 - X 𝛅 .

Then by combining with equation (4), we have

E (β_{j}^{*}) = δ + β_{j} .

We shall denote θ* and $𝛃_{j}^{*}$ obtained from the above iterative algorithm as preliminary estimators of θ and β_j, respectively. Without loss of generality, throughout this Article we assume X^TX is a full rank matrix. If it is not a full rank matrix, then we may use any generalized inverse of X^TX because $X 𝛃_{j}^{*}$ in equation (5) is invariant of the choice of generalized inverse ${(X^{T} X)}^{g}$ used in $β_{j}^{*} = {(X^{T} X)}^{g} X^{T} y_{j}^{(c r t)}$ . Thus the preliminary estimator θ* provided above is invariant of the choice of generalized inverse used in deriving $𝛃_{j}^{*}$ . Furthermore, throughout this Article, we are interested in testing a hypothesis regarding linearly estimable parameters Aβ_j, that is $C (A^{T}) \subset C (X^{T})$ (ref. ⁴⁵). Consequently, the estimator $A β_{j}^{*}$ is invariant of the generalized inverse used in the estimation of $β_{j}^{*}$ . Hence, throughout this text, for simplicity of exposition, we shall assume X^TX is of full rank.

For each taxon j = 1, …, d, by equation (7), $β_{j}^{*}$ is a biased estimator if δ ≠ 0. Suppose we wish to test the following hypothesis

\begin{matrix} H_{0} : A β_{j} = A β_{j}^{0}, \\ H_{1} : A β_{j} \neq A β_{j}^{0} . \end{matrix}

Under the null hypothesis, $E (A 𝛃_{j}^{*}) - A 𝛃_{j}^{0} = A 𝛅 \neq 0$ and hence biased. The next step is to estimate this bias δ and accordingly modify the estimator $A β_{j}^{*}$ so that the resulting estimator is asymptotically centered at $A β_{j}^{0}$ under the null hypothesis and hence the test statistic is asymptotically centered at zero.

First we make the following observations. As $E (β_{j}^{*}) = δ + β_{j}$ , we note that as n → ∞, for finite dimension d,

{Σ_{j}}^{- \frac{1}{2}} (𝛃_{j}^{*} - (𝛅 + 𝛃_{j})) \to_{d} N_{P} (0, I),

where

Σ_{j} = \lim_{n \to \infty} {(X^{T} X)}^{- 1} (\sum_{i = 1}^{n} σ_{i j}^{2} x_{i} {x_{i}}^{T}) {(X^{T} X)}^{- 1} .

Moreover, from the expressions for $E (θ^{*})$ and $E (β_{j}^{*})$ given above,

E (θ^{*} + X β_{j}^{*}) = θ - X δ + X (δ + β_{j}) = θ + X β_{j},

that is $θ^{*} + X β_{j}^{*}$ is an unbiased estimator of θ + Xβ_j, hence a possible estimator of Σ_j is given by

{\hat{Σ}}_{j} = {(X^{T} X)}^{- 1} (\sum_{i = 1}^{n} {(y_{i j} - θ_{i}^{*} - {𝛃_{j}^{*}}^{T} x_{i})}^{2} x_{i} {x_{i}}^{T}) {(X^{T} X)}^{- 1} .
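The estimator above has the familiar heteroskedasticity-robust "sandwich" form. A minimal NumPy sketch for a single taxon is given below; the function name `sandwich_cov` and its argument layout are assumptions for illustration.

```python
import numpy as np

def sandwich_cov(y_j, X, theta, beta_j):
    """Sandwich covariance estimate for one taxon.
    y_j: (n,) centered log counts; X: (n, p+1) design matrix;
    theta: (n,) preliminary bias estimates; beta_j: (p+1,) coefficients."""
    resid = y_j - theta - X @ beta_j          # y_ij - theta_i - beta_j^T x_i
    bread = np.linalg.inv(X.T @ X)            # (X^T X)^{-1}
    meat = (X * resid[:, None] ** 2).T @ X    # sum_i resid_i^2 x_i x_i^T
    return bread @ meat @ bread
```

The kth diagonal element of the returned matrix is the variance estimate for the kth coefficient of that taxon.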

Under some mild regularity conditions⁴⁶, with finite d, we have the following consistency result

n ({\hat{Σ}}_{j} - Σ_{j}) \to_{P} 0, as n \to \infty .

Therefore, replacing Σ_j with ${\hat{Σ}}_{j}$ in equation (8) and appealing to Slutsky’s theorem, we have

{\hat{Σ}}_{j}^{- \frac{1}{2}} (β_{j}^{*} - (δ + β_{j})) \to_{d} N_{P} (0, I), as n \to \infty .

By equations (9) and (11), under some mild regularity conditions, for finite d, we obtain

{\hat{Σ}}_{j} \to_{p} 0, as n \to \infty .

Consequently,

β_{j}^{*} \to_{P} δ + β_{j}, as n \to \infty .

The above observation regarding the convergence of $𝛃_{j}^{*}$ plays a critical role in the following. Since the sampling fraction is constant for all taxa within a sample, we pool information across taxa within each sample when estimating δ. We model each taxon abundance using the following Gaussian mixture model. For the jth taxon and the kth covariate, let C₀ denote the set of taxa that are not differentially abundant with respect to x_ik, that is, C₀ = {j ∈ (1, 2, …, d): β_jk = 0}; let C₁ denote the set of taxa whose abundance decreases with x_ik, that is, C₁ = {j ∈ (1, 2, …, d): β_jk < 0}, and let C₂ denote the set of taxa whose abundance increases with x_ik, that is, C₂ = {j ∈ (1, 2, …, d): β_jk > 0}. Let π_r denote the probability that a taxon belongs to set C_r, r = 0, 1, 2. For simplicity of estimation of parameters, similar to generalized estimating equations, we shall assume that $β_{j k}^{*}, j = 1, 2, \dots, d$ , are independently distributed. As commonly done in the analyses of various omics data, we ignore the underlying correlation structure when estimating δ. Thus, we model the distribution of $β_{j k}^{*}$ by Gaussian mixture model as follows:

f (β_{j k}^{*}) = π_{0} ϕ (\frac{β_{j k}^{*} - δ_{k}}{ν_{j 0}}) + π_{1} ϕ (\frac{β_{j k}^{*} - (δ_{k} + l_{1})}{ν_{j 1}}) + π_{2} ϕ (\frac{β_{j k}^{*} - (δ_{k} + l_{2})}{ν_{j 2}}),

where

ϕ is the standard normal density function,
δ_k, δ_k + l₁ and δ_k + l₂ are means for $β_{j k}^{*} ∣ C_{0}, β_{j k}^{*} ∣ C_{1}$ and $β_{j k}^{*} ∣ C_{2}$ , respectively. l₁ < 0, l₂ > 0,
$ν_{j 0}^{2}$ , $ν_{j 1}^{2}$ and $ν_{j 2}^{2}$ are variances of $β_{j k}^{*} ∣ C_{0}, β_{j k}^{*} ∣ C_{1}$ and $β_{j k}^{*} ∣ C_{2}$ , respectively.

Note that instead of fitting a multivariate Gaussian mixture model for all covariates together, we choose to fit a univariate Gaussian mixture model repeatedly for every single covariate. This repetition is simply because the sets of taxa {C₀, C₁, C₂} are not necessarily the same for different covariates. Also, note that for a categorical covariate of s + 1 levels, this contains s coefficients, for example β_j1, …, β_js, and we shall fit the Gaussian mixture model for these s coefficients separately.

For computational simplicity, we assume that ν_j1 > ν_j0, ν_j2 > ν_j0. Thus, without loss of generality for κ₁, κ₂ > 0, let ν_j1 = ν_j0 + κ₁ and ν_j2 = ν_j0 + κ₂. While this assumption is not a requirement for our method, it is reasonable to assume that variability among differentially abundant taxa is larger than that among the null taxa. By making this assumption, we simplify the computation.

Assuming samples are independent, we begin by first estimating $ν_{j 0}^{2} = Var (β_{j k}^{*})$ . Note that $ν_{j 0}^{2}$ is the function of heteroscedastic variances, a consistent estimator of $ν_{j 0}^{2}$ , which we refer to as ${\hat{ν}}_{j 0}^{2}$ , is the kth diagonal element of ${\hat{Σ}}_{j}$ stated in equation (10). In all future calculations, we plug in ${\hat{ν}}_{j 0}^{2}$ for $ν_{j 0}^{2}$ . This is similar in spirit to many statistical procedures involving nuisance parameters. The following lemma⁴⁷ is useful in the sequel.

Lemma 1

Introducing the latent variable in calculating log-likelihood:

\log f (x ∣ θ) = E_{f (z ∣ x, θ)} [\log f (z ∣ θ) + \log f (x ∣ z, θ)] .

Let $Θ = {(δ_{k}, π_{0}, π_{1}, π_{2}, l_{1}, l_{2}, κ_{1}, κ_{2})}^{T}$ denote the set of unknown parameters, then for each taxon the log-likelihood can be reformulated using Lemma 1, as follows:

Θ \leftarrow \arg \max_{Θ} \sum_{j = 1}^{d} \sum_{r = 0}^{2} P_{r, j} [\log \Pr (j \in C_{r}) + \log f (β_{j k} ∣ j \in C_{r})] .

Then the EM algorithm is described as follows:

E step: compute conditional probabilities of latent variables. Define $P_{r, j} = \Pr (j \in C_{r} ∣ β_{j k}, Θ) = \frac{π_{r} ϕ (\frac{β_{j k} - (δ_{k} + l_{r})}{ν_{j r}})}{\sum_{r} π_{r} ϕ (\frac{β_{j k} - (δ_{k} + l_{r})}{ν_{j r}})}, r = 0, 1, 2; j = 1, \dots, d$ , which are conditional probabilities representing the probability that an observed value follows each distribution. Note that l₀ = 0.
M step: maximize the likelihood function with respect to the parameters, given the conditional probabilities.

We shall denote the resulting estimator of δ_k on convergence of the algorithm by ${\hat{δ}}_{k}^{EM}$ .
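The E and M steps can be illustrated with a deliberately simplified sketch. The assumptions here are loud ones: κ₁ and κ₂ are held fixed rather than estimated, the sign constraints l₁ < 0 < l₂ are not enforced, the starting values are a heuristic, the component densities are proper normal densities with standard deviation ν_jr, and the function names are hypothetical. It is not the estimation routine used by ANCOM-BC2.

```python
import numpy as np

def norm_pdf(x, mean, sd):
    # Normal density with the given mean and standard deviation
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def em_bias(beta_star, nu0, kappa=(1.0, 1.0), n_iter=200):
    """beta_star: (d,) preliminary estimates for one covariate.
    nu0: (d,) null standard errors, treated as known.
    Returns (delta, pi, l1, l2)."""
    # Component variances: nu0^2, nu0^2 + kappa1, nu0^2 + kappa2 (kappa fixed)
    var = np.stack([nu0**2, nu0**2 + kappa[0], nu0**2 + kappa[1]])
    # Heuristic start: component means delta, delta + l1, delta + l2
    means = np.quantile(beta_star, [0.5, 0.05, 0.95])
    pi = np.array([0.8, 0.1, 0.1])
    for _ in range(n_iter):
        # E step: posterior membership probabilities P[r, j]
        dens = pi[:, None] * norm_pdf(beta_star[None, :], means[:, None], np.sqrt(var))
        P = dens / dens.sum(axis=0, keepdims=True)
        # M step: mixing proportions and precision-weighted component means
        pi = P.mean(axis=1)
        w = P / var
        means = (w * beta_star[None, :]).sum(axis=1) / w.sum(axis=1)
    delta = means[0]
    return delta, pi, means[1] - delta, means[2] - delta
```

The first returned value plays the role of the bias estimate for the kth covariate; subtracting it from each preliminary estimate gives the bias-corrected coefficient.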

As stated in Lin and Peddada³, compared to ${\hat{ν}}_{j 0}^{2}$ , the variance and covariance contributed by ${\hat{δ}}_{k}^{EM}$ is negligible when the number of nondifferentially abundant taxa is large, such as when analyzing the microbiome data at the OTU, amplicon sequence variant (ASV) or species level of the phylogenetic tree.

The above procedure is applied to every β_jk, k = 1, …, p. Eventually, we obtain the estimator of δ as

{\hat{δ}}^{E M} = {({\hat{δ}}_{1}^{E M}, {\hat{δ}}_{2}^{E M}, \dots, {\hat{δ}}_{p}^{E M})}^{T} .

Therefore, the final estimator of β_j is defined as

{\hat{β}}_{j} = β_{j}^{*} - {\hat{δ}}^{E M},

with

{\hat{β}}_{j} \to_{P} β_{j}, as n \to \infty,

given that ${\hat{δ}}^{E M}$ is a good approximation of δ.

The estimation procedure is summarized in Algorithm 2.

Algorithm 2. EM algorithm

(1) input:

$β_{j}^{*}, Σ_{j}, j = 1, \dots, d$

(2) procedure EM $(β_{j}^{*}, Σ_{j})$

(3) return ${\hat{δ}}_{k}^{E M}, k = 1, \dots, p$

(4) end procedure

(5) for k = 1, …, p do

(6) ${\hat{β}}_{j k} \leftarrow β_{j k}^{*} - {\hat{δ}}_{k}^{EM}$

(7) end for

For taxon j, we now describe our methodology for testing the following hypotheses

\begin{matrix} H_{0} : A β_{j} = A β_{j}^{0}, \\ H_{1} : A β_{j} \neq A β_{j}^{0} . \end{matrix}

From Slutsky’s theorem, as n → ∞, the following test statistic is approximately central chi-square distributed under the null hypothesis

\begin{matrix} W_{j} = {(A {\hat{β}}_{j} - A β_{j}^{0})}^{T} {(A {\hat{Σ}}_{j} A^{T})}^{- 1} (A {\hat{β}}_{j} - A β_{j}^{0}) \\ = {(A β_{j}^{*} - A {\hat{δ}}^{E M} - A β_{j}^{0})}^{T} {(A {\hat{Σ}}_{j} A^{T})}^{- 1} (A β_{j}^{*} - A {\hat{δ}}^{E M} - A β_{j}^{0}) \\ \to_{d} χ_{q}^{2}, \end{matrix}

where q = rank(A).
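As an illustration of the test statistic above, the following sketch computes W_j and its asymptotic chi-square P value for one taxon. It assumes NumPy and SciPy are available; the function name `wald_test` and the toy inputs are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def wald_test(beta_hat, Sigma_hat, A, beta0=None):
    """Wald-type test of H0: A beta = A beta0 for a single taxon.
    beta_hat: (p+1,) bias-corrected estimate; Sigma_hat: its covariance estimate."""
    A = np.atleast_2d(A)
    if beta0 is None:
        beta0 = np.zeros_like(beta_hat)
    diff = A @ (beta_hat - beta0)
    W = float(diff @ np.linalg.solve(A @ Sigma_hat @ A.T, diff))
    q = np.linalg.matrix_rank(A)              # degrees of freedom
    return W, float(chi2.sf(W, df=q))

# Toy usage: test the two non-intercept coefficients jointly
W, p_value = wald_test(np.array([0.0, 1.0, 2.0]),
                       0.25 * np.eye(3),
                       [[0, 1, 0], [0, 0, 1]])
```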

To control the FDR due to multiple testing, we recommend applying the Holm–Bonferroni method²⁰ instead of the Benjamini–Hochberg procedure⁶ because the Holm–Bonferroni method does not require any assumptions regarding the dependence structure of the underlying P values, and is also known to be a better method to control the FDR when P values are not accurate²¹.
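For reference, the Holm–Bonferroni step-down adjustment can be written in a few lines of plain Python. This is a generic sketch of the standard procedure, with a hypothetical function name; it is not taken from the ANCOM-BC2 software.

```python
def holm_adjust(pvals):
    """Holm-Bonferroni adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the (rank+1)-th smallest P value by (m - rank), cap at 1,
        # and enforce monotonicity with a running maximum
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted

adjusted = holm_adjust([0.01, 0.04, 0.03])
```

A taxon is then declared significant when its adjusted P value is at most the nominal level.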

Sample-specific biases estimation

After obtaining ${\hat{δ}}^{E M}$ , the estimator of sample-specific biases θ is defined as follows:

\hat{𝛉} = \frac{1}{d} \sum_{j = 1}^{d} (y_{j} - X {\hat{𝛃}}_{j}) .

Let $Σ^{(i)} = {[σ_{l m}^{(i)}]}_{l, m = 1, \dots, d}$ denote the d × d covariance matrix of $ε^{(i)} = (ϵ_{1}^{(i)}, ϵ_{2}^{(i)}, \dots, ϵ_{d}^{(i)})^{T}$ , where $σ_{l m}^{(i)}$ is the (l, m)th element of Σ⁽ⁱ⁾ and $σ_{j j}^{(i)}$ is the jth diagonal element of Σ⁽ⁱ⁾. Furthermore, suppose the following assumption holds.

Assumption 4

Sparse correlations among taxa:

\begin{matrix} σ_{j j}^{(i)} < K < \infty, \\ \frac{\sum_{l \neq m}^{d} σ_{l m}^{(i)}}{d^{2}} = o (1) . \end{matrix}

From Assumption 4, we have

0 \leq 1^{T} Σ^{(i)} 1 = \sum_{l = 1}^{d} \sum_{m = 1}^{d} σ_{l m}^{(i)} = \sum_{j = 1}^{d} σ_{j j}^{(i)} + \sum_{l \neq m}^{d} σ_{l m}^{(i)} \leq d K + \sum_{l \neq m}^{d} σ_{l m}^{(i)} .

Hence

0 \leq \frac{1^{T} Σ^{(i)} 1}{d^{2}} \leq \frac{K}{d} + \frac{\sum_{l \neq m}^{d} σ_{l m}^{(i)}}{d^{2}} = o (1) .

Thus, for each taxon j = 1, 2, …, d, we have

\frac{1}{d} \sum_{j = 1}^{d} (y_{j} - (𝛉 + X 𝛃_{j})) \to_{P} 0, as d \to \infty .

Therefore, according to equations (17) and (19), as both n, d → ∞,

\hat{𝛉} \to 𝛉 .

Remark 1

Regularization of variance: to avoid the spurious detection of significance due to extremely small standard errors, particularly for rare taxa, we incorporated a small positive constant in the denominator of the ANCOM-BC2 test statistic for each taxon. This approach was inspired by the significance analysis of microarray methodology¹⁵. Specifically, the regularization factor was set as the fifth percentile of the distribution of standard errors for each fixed effect, unless otherwise specified.
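A minimal sketch of this kind of regularization is shown below. It assumes the constant is added to the standard error in the denominator, as in the significance analysis of microarray approach; the exact form used in the software may differ, and the function name is hypothetical.

```python
import numpy as np

def regularized_stat(beta_hat, se, percentile=5):
    """Test statistics for one fixed effect across taxa, with a small
    constant s0 added to every standard error (assumed additive form)."""
    s0 = np.percentile(se, percentile)   # fifth percentile of the standard errors
    return beta_hat / (se + s0)

# Toy usage: 21 taxa with standard errors 1, 2, ..., 21
se = np.arange(1.0, 22.0)
w = regularized_stat(np.ones(21), se)
```

Taxa with very small standard errors are affected most, which is the intended damping of spuriously large statistics.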

Remark 2

Sensitivity analysis for the pseudo-count addition: to mitigate the risk of inflated false-positive rates resulting from the choice of pseudo-count in ANCOM-BC2, we conducted a sensitivity analysis to assess the impact of varying pseudo-count values on DA results. This is particularly important, as several studies have shown that the choice of pseudo-count can significantly influence the results of DA analysis methods^16,17. For details regarding the sensitivity analysis and the definitions of the two versions of ANCOM-BC2, refer to the section ‘Strategies implemented in ANCOM-BC2 to handle zeros’ below.

Multigroup comparison

In some applications, for a given taxon, researchers are interested in drawing inferences regarding DA among different pairs of experimental groups. We refer to this kind of problem as a multigroup comparison problem, and extra caution needs to be exercised to correct P values due to multiple comparisons. For simplicity, we drop the subscript j (taxon index) in the following discussions.

Global test

For a given taxon and a total of g + 1 experimental groups (including the reference group), researchers may want to test whether there exists at least one group that is significantly different from others. For ease of exposition, we split the covariates X into two parts, where X₁ stands for the group assignment and X₂ denotes the remaining covariates. Note that the difference of group effects against the reference group is estimable, while the individual group effect is not. For simplicity, in the discussions of multigroup comparisons among group 0 to group g, we assume group 0 is the reference group. We use β_k, k = 1, …, g to denote the group effect, but notice that it actually estimates β_k − β₀. We rewrite the model stated in equation (3) as

y = θ + X_{1} β + X_{2} γ + ε,

where

θ is the sample-specific bias,
β is the vector of group effects (as compared to group 0) of the order g × 1,
X₁ is the design matrix of the order n × g consisting of 0s and 1s,
X₂ is the known matrix of other covariates (including the intercept) of the order n × (p − g + 1) with the corresponding regression parameter vector γ of the order (p − g + 1) × 1.

The global test intends to test

\begin{matrix} H_{0} : \cap_{k \in {1, \dots, g}} β_{k} = 0, \\ H_{1} : \cup_{k \in {1, \dots, g}} β_{k} \neq 0, \end{matrix}

which can be reformulated as

\begin{matrix} H_{0} : A β = 0, \\ H_{1} : A β \neq 0, \end{matrix}

where

A = I_{g} = [\begin{matrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ & ⋮ \\ 0 & 0 & \dots & 0 & 1 \end{matrix}]

with the test statistic

W = (A \hat{𝛃})^{T} (A {\hat{Σ}}^{(g)} A^{T})^{- 1} (A \hat{𝛃}) \to_{d} χ_{g}^{2}, as n \to \infty,

where ${\hat{Σ}}^{(g)}$ is the corresponding submatrix of $\hat{Σ}$ defined in equation (10).

Similarly, to control the FDR due to multiple testing, we recommend applying Holm–Bonferroni method²⁰ instead of the Benjamini–Hochberg procedure⁶ due to the underlying complex dependence structure between taxa.

Example 1

Suppose there are three groups, namely, groups 0 (reference), 1 and 2, and no other covariates. For each sample i, i = 1, …, n, we have:

y_{i} = θ_{i} + μ + β_{1} I {group = 1} + β_{2} I {group = 2} + ϵ_{i} .

To test whether there is at least one group among 0, 1 and 2, that is significantly different from others, we test:

\begin{matrix} H_{0} : β_{1} = β_{2} = 0, \\ H_{1} : β_{1} \neq 0 \cup β_{2} \neq 0, \end{matrix}

which is the same as testing:

\begin{matrix} H_{0} : A β = 0, \\ H_{1} : A β \neq 0, \end{matrix}

where $A = [\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}]$ , and $β = {(β_{1}, β_{2})}^{T}$ .

Multiple pairwise comparisons

If we are interested in knowing whether the abundance increased or decreased between various pairs of groups, then it amounts to testing the following hypotheses:

\begin{matrix} H_{0, k, k^{'}} : β_{k} = β_{k^{'}} \\ H_{1, k, k^{'}} : {β_{k} < β_{k^{'}}} \cup {β_{k} > β_{k^{'}}}, \end{matrix}

where $k \neq k^{'} \in {1, \dots, g}$ . Denote the test statistic for a given pairwise comparison as

W_{k k^{'}} = \frac{{\hat{β}}_{k} - {\hat{β}}_{k^{'}}}{\sqrt{\hat{Var} ({\hat{β}}_{k}) + \hat{Var} ({\hat{β}}_{k^{'}})}} \to_{d} N (0, 1), as n \to \infty,

where $\hat{Var} ({\hat{β}}_{k})$ , $\hat{Var} ({\hat{β}}_{k^{'}})$ are the kth and $k^{'} th$ diagonal elements of ${\hat{Σ}}^{(g)}$ , respectively. Thus, the raw P value for comparing group k and group $k^{'}$ is defined as:

P_{k k^{'}} = 2 [1 - Φ (∣ W_{k k^{'}} ∣)] ,

where Φ denotes the standard normal cumulative distribution function.

For comparing with the reference group (group 0), the hypotheses become:

\begin{matrix} H_{0, k} : β_{k} = 0 \\ H_{1, k} : {β_{k} < 0} \cup {β_{k} > 0} . \end{matrix}

We also replace ${\hat{β}}_{k^{'}}$ and $\hat{Var} ({\hat{β}}_{k^{'}})$ with 0s in the test statistic.

Note that the null and alternative hypotheses for the global test are denoted as H₀ and H₁. A Type I error might occur due to wrongly rejecting H₀, or due to correctly rejecting H₀ but wrongly rejecting $H_{0, k, k^{'}}$ . A directional error might occur when H₀ and $H_{0, k, k^{'}}$ are both correctly rejected but the direction between β_k and $β_{k^{'}}$ is wrongly assigned. In this case, we need to control an error rate that combines both Type I and directional errors in the FDR framework, which is referred to as mixed directional FDR (mdFDR)^8,9.

Definition 1

mdFDR: let V(j) denote the indicator function of at least one type I error or directional error committed, that is

V (j) = \begin{cases} 1 & \text{if a Type I or directional error occurs}, \\ 0 & \text{otherwise}. \end{cases}

Then, mdFDR is defined as the expected proportion of Type I and directional errors among all discovered taxa:

mdFDR = E (\frac{\sum_{j = 1}^{d} V (j)}{\max (R, 1)}),

where R denotes the number of taxa discovered.

To control the mdFDR for all pairwise tests, we adopt the general mdFDR controlling procedure⁹, and do the following:

(1) Apply the global test method stated above to obtain the P value for each taxon. We denote these P values as screening P values. Apply the Benjamini–Hochberg procedure to identify taxa that are differentially abundant in at least one pairwise comparison. Let R denote the number of taxa discovered.
(2) For each taxon discovered in step (1), apply any mixed directional family-wise error controlling procedure, such as Holm–Bonferroni (default), Hochberg and so on, to the pairwise P values ( $P_{k k^{'}}$ ) at level Rα/d.
(3) For a given taxon discovered in step (1), if a pairwise hypothesis is rejected in step (2), then we declare $β_{k} < β_{k^{'}}$ or $β_{k} > β_{k^{'}}$ according to whether $W_{k k^{'}} < 0$ or $W_{k k^{'}} > 0$ .

It has been proved that under the assumption of independence of P values obtained from the global test, the mdFDR of the above procedure is strongly controlled at level α (ref. ⁹).
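The screening and pairwise stages described above can be sketched in plain Python. The sketch assumes Holm–Bonferroni in the second stage, and all function names and the input layout (one list of pairwise P values and statistics per taxon) are hypothetical choices for illustration.

```python
def bh_reject(pvals, alpha):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank                      # largest rank passing its threshold
    return set(order[:k])

def holm_reject(pvals, level):
    """Indices rejected by the Holm-Bonferroni step-down procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    rejected = set()
    for rank, i in enumerate(order):
        if pvals[i] <= level / (m - rank):
            rejected.add(i)
        else:
            break
    return rejected

def mdfdr_procedure(screen_p, pair_p, pair_w, alpha=0.05):
    """screen_p[j]: global-test P value of taxon j.
    pair_p[j], pair_w[j]: pairwise P values and statistics W_kk' of taxon j.
    Returns {taxon: {comparison index: '<' or '>'}}."""
    d = len(screen_p)
    discovered = bh_reject(screen_p, alpha)        # screening stage
    R = len(discovered)
    calls = {}
    for j in sorted(discovered):
        rejected = holm_reject(pair_p[j], R * alpha / d)   # level R*alpha/d
        # Direction follows the sign of the pairwise statistic
        calls[j] = {c: (">" if pair_w[j][c] > 0 else "<") for c in sorted(rejected)}
    return calls
```

Here '<' stands for declaring β_k < β_k' and '>' for β_k > β_k' in the corresponding comparison.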

Example 2

Suppose there are three groups, namely, groups 0 (reference), 1 and 2, and no other covariates. For each sample i, i = 1, …, n, we have:

y_{i} = θ_{i} + μ + β_{1} I {group = 1} + β_{2} I {group = 2} + ϵ_{i} .

To test whether the taxon is differentially abundant between group 1 and 0 (reference), we test:

\begin{matrix} H_{0} : β_{1} = 0, \\ H_{1} : {β_{1} < 0} \cup {β_{1} > 0}, \end{matrix}

with the test statistic:

W_{10} = \frac{{\hat{β}}_{1}}{\sqrt{\hat{Var} ({\hat{β}}_{1})}} .

Additionally, if we want to test whether the taxon is differentially abundant between group 1 and 2:

\begin{matrix} H_{0} : β_{1} = β_{2}, \\ H_{1} : {β_{1} < β_{2}} \cup {β_{1} > β_{2}} . \end{matrix}

The test statistic is:

W_{12} = \frac{{\hat{β}}_{1} - {\hat{β}}_{2}}{\sqrt{\hat{Var} ({\hat{β}}_{1}) + \hat{Var} ({\hat{β}}_{2})}} .

Test against a specific group

Often, researchers are interested in knowing whether the abundance increased or decreased in an ecosystem relative to a prespecified group, say the control group. Again, assume group 0 is the reference group and β₀ = 0; then one may be interested in testing the following hypotheses:

\begin{matrix} H_{0, k} : β_{k} = 0, \\ H_{1, k} : {β_{k} < 0} \cup {β_{k} > 0}, \end{matrix}

where k ∈ {1, …, g}.

As before, the pairwise test statistic is defined as follows:

W_{k} = \frac{{\hat{β}}_{k}}{\sqrt{\hat{Var} ({\hat{β}}_{k})}} \to_{d} N (0, 1), as n \to \infty,

where $\hat{Var} ({\hat{β}}_{k})$ is the kth diagonal element of ${\hat{Σ}}^{(g)}$ . Thus, the raw P value for comparing group k with the reference group (group 0) is defined as

P_{k} = 2 [1 - ϕ (∣ W_{k} ∣)] .
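As a small numerical illustration of the statistic and P value above, the following sketch reads ϕ as the standard normal cumulative distribution function and evaluates it with the error function from the Python standard library. The function names and the example inputs are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal cumulative distribution function via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_sided_p(beta_hat, var_hat):
    """Return (W_k, P_k) for an estimate and its estimated variance."""
    w = beta_hat / math.sqrt(var_hat)
    p = 2.0 * (1.0 - std_normal_cdf(abs(w)))
    return w, p


# Example: an estimate of 0.98 with variance 0.25 gives W = 1.96.
w, p = two_sided_p(0.98, 0.25)
```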

Likewise, we apply the mdFDR controlling procedure for all pairwise tests. To improve power, we modify the global test mentioned earlier to a Dunnett-based test^48–50 as described below:

Compute the test statistic $W = \max_{k \in {1, \dots, g}} ∣ W_{k} ∣$ .
Generate $W_{k}^{(b)} \sim N (0, 1), k = 1, \dots, g$ .
Compute $W^{(b)} = \max_{k \in {1, \dots, g}} ∣ W_{k}^{(b)} ∣$ .
Repeat the above two steps B times to obtain the null distribution of W.

The screening P value is calculated as:

P = \frac{1}{B} \sum_{b = 1}^{B} I (W^{(b)} > W) .
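The simulation steps and the screening P value above can be sketched as follows. This is a minimal illustration that draws the null statistics independently from N(0, 1), as written in the steps; the function name, the default number of simulations and the seed are assumptions made for the example.

```python
import random


def dunnett_type_screen_p(w_stats, n_sim=10000, seed=0):
    """Monte Carlo screening P value for W = max_k |W_k|."""
    rng = random.Random(seed)
    g = len(w_stats)
    w_obs = max(abs(w) for w in w_stats)
    exceed = 0
    for _ in range(n_sim):
        # One simulated null statistic W^(b): maximum of g absolute normals.
        w_b = max(abs(rng.gauss(0.0, 1.0)) for _ in range(g))
        if w_b > w_obs:
            exceed += 1
    return exceed / n_sim
```

For instance, `dunnett_type_screen_p([4.5, 0.3, -1.0])` is driven entirely by the largest absolute statistic, so a single strong contrast against the reference is enough to yield a small screening P value.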

Pattern analysis

When the experimental groups are naturally ordered, such as doses of exposure, durations of exposure or stages of a disease, researchers may be interested in testing whether the abundance of a given taxon changes across the ordered groups according to some specific pattern. Thus, the null and alternative hypotheses one wants to test become (assuming group 0 is the reference group):

\begin{matrix} H_{0} : β_{1} = β_{2} = \dots = β_{g} = 0, \\ H_{1} : 𝛃 = {(β_{1}, \dots, β_{g})}^{T} \in C, \end{matrix}

where $C$ is one or a collection of patterns. Examples of patterns are given below.

Example 3

Simple order

C_{1} = {0 \leq β_{1} \leq β_{2} \leq \dots \leq β_{g}} with at least one strict inequality.

Example 4

Tree order

C_{2} = {β_{k} \geq 0, k = 1, \dots, g} with at least one strict inequality.

Example 5

Umbrella order

\begin{matrix} C_{3} = {0 \leq β_{1} \leq \dots \leq β_{k - 1} \leq β_{k} \geq β_{k + 1} \geq \dots \geq β_{g}} \\ with at least one strict inequality. \end{matrix}

Estimation of β under a certain pattern (constraint) can be obtained by solving the following convex optimization (opt) problem⁵¹:

{\hat{β}}^{opt} = \arg \min_{β \in C} {(\hat{β} - β)}^{T} {\hat{Σ}}^{{(g)}^{- 1}} (\hat{β} - β),

where ${\hat{Σ}}^{(g)}$ is the corresponding submatrix of $\hat{Σ}$ defined in equation (10). The solution to equation (25) can be obtained numerically with a suitable convex optimization solver, such as the R package CVXR (ref. ⁵²).

Example 6

Suppose there are three groups, namely, groups 0 (reference), 1 and 2, and no other covariates. For each sample i, i = 1, …, n, we have:

y_{i} = θ_{i} + μ + β_{1} I {group = 1} + β_{2} I {group = 2} + ϵ_{i} .

To test whether the group effect is monotonically increasing, we test:

\begin{matrix} H_{0} : β_{1} = β_{2} = 0, \\ H_{1} : β \in C = {0 \leq β_{1} \leq β_{2}}, with at least one strict inequality. \end{matrix}

The estimation of β under $C$ can be obtained by solving:

\begin{matrix} {\hat{𝛃}}^{opt} = \arg \min_{𝛃 \in R^{2}} {(\hat{𝛃} - 𝛃)}^{T} {\hat{Σ}}^{{(g)}^{- 1}} (\hat{𝛃} - 𝛃), \\ s.t. A 𝛃 \geq 0, \end{matrix}

where $A = [\begin{matrix} 1 & 0 \\ - 1 & 1 \end{matrix}]$ , and $β = {(β_{1}, β_{2})}^{T}$ .
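For the two-parameter problem in Example 6, the constrained minimizer can be found without a general-purpose solver by enumerating the possible sets of active constraints and keeping the best feasible candidate. The sketch below swaps in this active-set enumeration for a convex solver such as CVXR; it is valid only for this 2 × 2 case, and the function name and inputs are hypothetical.

```python
def constrained_simple_order_2d(beta_hat, sigma):
    """Minimise (beta_hat - b)^T sigma^{-1} (beta_hat - b) s.t. 0 <= b1 <= b2.

    sigma is a symmetric positive-definite 2x2 covariance (nested tuples).
    """
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 * s12
    # Entries of the precision matrix sigma^{-1}.
    p11, p12, p22 = s22 / det, -s12 / det, s11 / det
    h1, h2 = beta_hat

    def objective(b):
        d1, d2 = h1 - b[0], h2 - b[1]
        return p11 * d1 * d1 + 2.0 * p12 * d1 * d2 + p22 * d2 * d2

    # Minimiser along the line b1 = b2 = t.
    t = ((p11 + p12) * h1 + (p12 + p22) * h2) / (p11 + 2.0 * p12 + p22)
    candidates = [
        (h1, h2),                    # no constraint active
        (0.0, h2 + p12 * h1 / p22),  # b1 = 0 active
        (t, t),                      # b1 = b2 active
        (0.0, 0.0),                  # both constraints active
    ]
    tol = 1e-12
    feasible = [b for b in candidates
                if b[0] >= -tol and b[1] - b[0] >= -tol]
    # The constrained optimum is the feasible candidate with the smallest
    # objective; (0, 0) is always feasible, so the list is never empty.
    return min(feasible, key=objective)
```

With an identity covariance and estimates (1.0, 0.5), which violate β₁ ≤ β₂, the two estimates are pooled to their average, as in isotonic regression.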

Once the constrained estimator is obtained, there exist a variety of options to test the above hypotheses. For example, one may consider a Williams-type statistic⁵³. We adopt the following definitions from Peddada et al.⁷ to facilitate the construction of the test statistic.

Definition 2

Linked parameters: two parameters in a given pattern are said to be linked if the inequality between them is specified a priori.

Definition 3

Nodal parameter: for a given pattern, a parameter is said to be nodal if it is linked with every other parameter in the profile.

For example, every parameter is a nodal parameter in $C_{1}$ ; there is no nodal parameter in $C_{2}$ ; and β_k is the only nodal parameter in $C_{3}$ .

Definition 4

Norm of maximum difference: define the norm $l_{\infty} (C)$ of pattern $C$ as the maximum difference between the estimates of two linked parameters.

For example, $l_{\infty} (C_{3}) = \max {{\hat{β}}_{k}, {\hat{β}}_{k} - {\hat{β}}_{g}}$ .

Given a collection of potential patterns, $C_{1}, C_{2}, \dots, C_{T}$ , the Williams-type test statistic is defined as:

\begin{matrix} W = \max {l_{\infty} (C_{t}), t = 1, \dots, T}, \\ with t^{opt} = \arg \max {l_{\infty} (C_{t}), t = 1, \dots, T}, \end{matrix}

where t^opt is regarded as the optimal pattern for the microbial abundance of a specific taxon.

Under the null hypothesis, the expectations of ${\hat{β}}_{k}, k = 1, \dots, g$ are all 0; thus, we can construct the null distribution of W as follows:

Generate ${\hat{β}}_{k}^{(b)} \sim \sqrt{\hat{Var} ({\hat{β}}_{k})} N (0, 1), k = 1, \dots, g$ .
Obtain the constrained regression estimators ${\hat{β}}_{k}^{opt, (b)}$ using the convex optimization problem described above.
Compute $W^{(b)} = \max {l_{\infty} (C_{t}), t = 1, \dots, T}$ using the simulated data under prespecified patterns.
Repeat the above steps B times to obtain the null distribution of W.

The raw P value is calculated as

P = \frac{1}{B} \sum_{b = 1}^{B} I (W^{(b)} > W) .

We then apply the Holm–Bonferroni correction or Benjamini–Hochberg procedure on raw P values to control the FDR.
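The pattern test can be sketched end to end for a single pattern, the simple order $C_{1}$. The sketch makes a simplifying assumption that is not part of the method: the covariance of the estimates is taken to be diagonal, so the constrained estimator reduces to weighted isotonic regression (pool adjacent violators) clipped at 0, and no general convex solver is needed. Because the reference group is fixed at 0 and every pair is linked, the norm of maximum difference is simply the last fitted value. All function names are hypothetical.

```python
import random


def simple_order_fit(beta_hat, var_hat):
    """Minimiser over 0 <= b_1 <= ... <= b_g for a diagonal covariance:
    weighted pool-adjacent-violators (weights 1/var), then clip at 0."""
    blocks = []  # each block is [weighted mean, total weight, size]
    for b, v in zip(beta_hat, var_hat):
        blocks.append([b, 1.0 / v, 1])
        # Pool adjacent blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, n1 + n2])
    fit = []
    for m, _, n in blocks:
        fit.extend([max(m, 0.0)] * n)
    return fit


def l_inf_simple_order(fit):
    """Largest difference between linked estimates under the simple order."""
    return fit[-1]  # last fitted value minus the reference value 0


def pattern_p_value(beta_hat, var_hat, n_sim=2000, seed=0):
    """Return (W, raw P value) using a simulated null distribution."""
    rng = random.Random(seed)
    w_obs = l_inf_simple_order(simple_order_fit(beta_hat, var_hat))
    exceed = 0
    for _ in range(n_sim):
        sim = [rng.gauss(0.0, v ** 0.5) for v in var_hat]
        if l_inf_simple_order(simple_order_fit(sim, var_hat)) > w_obs:
            exceed += 1
    return w_obs, exceed / n_sim
```

With several candidate patterns, the same loop would compute the norm for each pattern and take the maximum, both for the observed estimates and for each simulated draw.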

ANCOM-BC2 for mixed-effects models

Similar to the fixed-effects model stated in equation (3), for each taxon j, j = 1, …, d, and each sample i, i = 1, …, n, suppose each sample has n_i observations and ∑_i n_i = n. The offset-based mixed-effects log-linear model is set up as

y_{i j} = θ_{i} 1_{n_{i}} + X_{i} 𝛃_{j} + Z_{i} 𝛂_{i} + 𝛆_{i j},

where

y_ij is the n_i-vector of centered observed counts,
$1_{n_{i}} = {(1, \dots, 1)}^{T} \in R^{n_{i}}$ is a vector of 1s,
X_i is the n_i × p design matrix for the fixed effects,
β_j is the p-vector of fixed-effects regression coefficients to be estimated,
Z_i is the n_i × q design matrix for the random effects,
α_i is the q-vector of random effects,
ϵ_ij is the n_i-vector of residuals.

The following distributional assumptions are made

\begin{matrix} α_{i} \sim N (0, D_{q \times q}), \\ ε_{i j} \sim N (0, σ_{j}^{2} I_{n_{i}}), \\ α_{i} \perp\!\!\!\perp ε_{i j} \text{ for } i = 1, \dots, n . \end{matrix}

Thus, for each taxon j, j = 1, …, d, and each sample i, i = 1, …, n, we have

y_{i j} \sim N (θ_{i} 1_{n_{i}} + X_{i} β_{j}, H_{i j} (τ)),

where $H_{i j} (τ) = Z_{i} D {Z_{i}}^{T} + σ_{j}^{2} I_{n_{i}}$ (or H_ij for short) denotes a general covariance matrix parametrized by τ.
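To make the covariance structure concrete, the sketch below builds $H_{i j} = Z_{i} D {Z_{i}}^{T} + σ_{j}^{2} I_{n_{i}}$ from plain nested lists. The random-intercept example (a column of 1s for $Z_{i}$ and a 1 × 1 matrix D) and the numerical values are assumptions chosen for illustration; in that case every pair of repeated measures shares the same covariance.

```python
def marginal_covariance(Z, D, sigma2):
    """Return H = Z D Z^T + sigma2 * I for one subject (nested lists)."""
    n_i, q = len(Z), len(D)
    # ZD = Z @ D, an n_i x q matrix.
    ZD = [[sum(Z[r][k] * D[k][c] for k in range(q)) for c in range(q)]
          for r in range(n_i)]
    # H = ZD @ Z^T, an n_i x n_i matrix.
    H = [[sum(ZD[r][k] * Z[c][k] for k in range(q)) for c in range(n_i)]
         for r in range(n_i)]
    for r in range(n_i):
        H[r][r] += sigma2  # residual variance on the diagonal
    return H


# Random intercept with variance 2.0, three repeated measures, sigma^2 = 0.5.
H = marginal_covariance([[1.0], [1.0], [1.0]], [[2.0]], 0.5)
```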

Stacking the observations across samples, we have:

y_{j} = θ + X β_{j} + Z α + ε_{j},

where

\begin{matrix} y_{j} = [\begin{matrix} y_{1 j} \\ y_{2 j} \\ ⋮ \\ y_{n j} \end{matrix}], θ = [\begin{matrix} θ_{1} 1_{n_{1}} \\ θ_{2} 1_{n_{2}} \\ ⋮ \\ θ_{n} 1_{n_{n}} \end{matrix}], X = [\begin{matrix} X_{1} \\ X_{2} \\ ⋮ \\ X_{n} \end{matrix}], β_{j} = [\begin{matrix} β_{j 1} \\ β_{j 2} \\ ⋮ \\ β_{j p} \end{matrix}], \\ Z = [\begin{matrix} Z_{1} & 0 & \dots & 0 \\ 0 & Z_{2} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & Z_{n} \end{matrix}], α = [\begin{matrix} α_{1} \\ α_{2} \\ ⋮ \\ α_{n} \end{matrix}], ε_{j} = [\begin{matrix} ϵ_{1 j} \\ ϵ_{2 j} \\ ⋮ \\ ϵ_{n j} \end{matrix}] . \end{matrix}

That is,

y_{j} \sim N (θ + X β_{j}, H_{j} (τ) = [\begin{matrix} H_{1 j} (τ) & 0 & \dots & 0 \\ 0 & H_{2 j} (τ) & 0 & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & H_{n j} (τ) \end{matrix}]),

where H_j(τ) (or H_j for short) is a block diagonal matrix.

Similarly, we estimate θ and β_j iteratively to obtain the corresponding preliminary estimators. Compared to Algorithm 1, the maximum likelihood is replaced with restricted maximum likelihood (ReML)^54,55.

Algorithm 3. Iterative ReML estimation

(1) Initialize:

For j = 1, …, d

θ ← 0

$y_{j}^{(crt)} \leftarrow y_{j} - θ = y_{j}$

$β_{j} \leftarrow \mathrm{ReML} (y_{j}^{(crt)}) = \mathrm{ReML} (y_{j})$

(2) While not converged do

(3) $𝛉 \leftarrow \frac{1}{d} \sum_{j = 1}^{d} (y_{j} - X 𝛃_{j})$

(4) $y_{j}^{(crt)} \leftarrow y_{j} - 𝛉$

(5) $β_{j} \leftarrow \mathrm{ReML} (y_{j}^{(crt)})$

(6) end while

Note that the estimators for regression coefficients β_j and variance components τ are obtained iteratively by maximizing the following log-likelihood function:

L (τ ∣ y_{j}) = - \sum_{i = 1}^{n} \log ∣ H_{i j} ∣ - \sum_{i = 1}^{n} \log ∣ {X_{i}}^{T} H_{i j}^{- 1} X_{i} ∣ - \sum_{i = 1}^{n} {(y_{i j} - X_{i} 𝛃_{j})}^{T} H_{i j}^{- 1} (y_{i j} - X_{i} 𝛃_{j}),

where $𝛃_{j} \leftarrow {(X^{T} {H_{j}}^{- 1} X)}^{- 1} X^{T} {H_{j}}^{- 1} y_{j}$ . As closed-form solutions of equation (28) do not exist, the Newton–Raphson method⁵⁶ is usually used.

Suppose that, on convergence, $θ \leftarrow θ^{*}, y_{j}^{(crt)} \leftarrow {y_{j}^{(crt)}}^{*}, H_{j} \leftarrow H_{j}^{*}, β_{j} \leftarrow β_{j}^{*}$ ; then we have

\begin{matrix} 𝛉^{*} = \frac{1}{d} \sum_{j = 1}^{d} (y_{j} - X 𝛃_{j}^{*}), \\ {y_{j}^{(crt)}}^{*} = y_{j} - 𝛉^{*}, \\ 𝛃_{j}^{*} = {(X^{T} {H_{j}^{*}}^{- 1} X)}^{- 1} X^{T} {H_{j}^{*}}^{- 1} y_{j}^{{(crt)}^{*}} . \end{matrix}

It is easy to show that there exists a vector $𝛅 \in R^{p}$ , such that

\begin{matrix} E (𝛉^{*}) & = 𝛉 - X 𝛅, \\ E (𝛃_{j}^{*}) & = 𝛅 + 𝛃_{j} . \end{matrix}

that is, $β_{j}^{*}$ is a biased estimator for β_j.

Similar to the case of fixed-effects model, we fit the Gaussian mixture model to each β_jk, k = 1, …, p separately, to correct the bias δ, and final estimators for β_j and θ are given by

\begin{matrix} {\hat{𝛃}}_{j} = 𝛃_{j}^{*} - {\hat{𝛅}}^{EM}, \\ \hat{𝛉} = \frac{1}{d} \sum_{j = 1}^{d} (y_{j} - X {\hat{𝛃}}_{j}) . \end{matrix}

Statistical inference for mixed-effects models, including multigroup comparisons, follows the procedures outlined in the previous sections for fixed-effects models and is therefore not repeated here.

Strategies implemented in ANCOM-BC2 to handle zeros

ANCOM-BC2 deals with zero-related challenges in microbiome data as follows.

(1) Structural zero identification: taxa that are exclusively present in one ecosystem but absent in another result in structural zeros. For example, some taxa are exclusive to desert regions but entirely absent in rainforests. Hence, they are structural zeros in rainforests. Those zeros should not be imputed or ignored, and such taxa are DA between the two regions. As the first step, using ANCOM-II (ref. ¹³), ANCOM-BC2 identifies all DA taxa that are due to structural zeros; no further analysis is performed on such taxa and they are cataloged separately in the software output.

(2) Prevalence-based filtration: after filtering structural zeros, ANCOM-BC2 applies a prevalence-based filtration, akin to other DA methods. By default, taxa that feature in less than 10% of all samples are removed from further analysis.

(3) Sensitivity analysis for pseudo-count addition to zeros: for the remaining taxa with some zeros, we perform a sensitivity analysis to assess their robustness to pseudo-counts as follows. Like many DA analysis methodologies, ANCOM-BC2 log transforms the observed counts, so the counts need to be positive. Often pseudo-counts are added to deal with zeros. However, it is well-known that the choice of the pseudo-count can considerably influence the false-positive as well as false-negative rates^13,16,17. To mitigate this concern, we conduct a sensitivity analysis to evaluate the effect of varying pseudo-counts on zeros for each taxon. This procedure adds an array of pseudo-counts (ranging from 0.01 to 0.5 in increments of 0.01) to the zero counts for each taxon. For each pseudo-count, ANCOM-BC2 is applied to each taxon and P values for DA analysis are derived. The sensitivity score for each taxon is the proportion of instances where the P values exceed the specified significance level. If the proportion of significant (or non-significant) results is 1 and the significance (or non-significance) aligns with significance (or non-significance) using complete data (excluding zeros), then the taxon is regarded as insensitive to the pseudo-count addition. Otherwise, it is deemed sensitive. This step remains a recommendation and is at the discretion of the users.

We offer two versions of ANCOM-BC2 for flexibility:

(1) ANCOM-BC2 (no filter): this version only uses the first two steps for handling zeros and uses complete data (that is, excludes zeros by treating them as missing completely at random) for bias correction and inference. While it has larger power, it might display an inflated FDR, especially with larger sample sizes or repeated measures.

(2) ANCOM-BC2 (SS filter): this version uses all three aforementioned steps for dealing with zeros and also uses complete data for both bias correction and inference. Specifically, if a taxon is found to be sensitive to pseudo-counts, then it is declared a non-significant taxon. While more conservative, it provides rigorous control of FDR, albeit with a possible decrease in power.
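The sensitivity score and the insensitivity rule from step (3) can be sketched as below. This is an illustration of the decision rule only: the P values for each pseudo-count and for the complete data are assumed to have been computed elsewhere, and the function names and the default significance level are hypothetical.

```python
def pseudo_count_grid(start=0.01, stop=0.5, step=0.01):
    """Pseudo-counts from start to stop (inclusive) in increments of step."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(n)]


def sensitivity_score(p_values, alpha=0.05):
    """Proportion of pseudo-counts whose P value exceeds alpha."""
    return sum(p > alpha for p in p_values) / len(p_values)


def is_insensitive(p_values, p_complete, alpha=0.05):
    """True if every pseudo-count gives the same decision as the
    complete-data analysis (zeros excluded)."""
    score = sensitivity_score(p_values, alpha)
    if p_complete > alpha:
        return score == 1.0  # non-significant for every pseudo-count
    return score == 0.0      # significant for every pseudo-count
```

Under the SS filter, a taxon for which `is_insensitive` returns `False` would be declared non-significant regardless of its P value.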

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Online content

Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41592-023-02092-7.

Supplementary information

Supplementary Information (369KB, pdf)

Supplementary Methods and Table 1.

Reporting Summary (1.3MB, pdf)

Peer Review File (1.1MB, pdf)

Acknowledgements

This research by H.L. and S.D.P. was supported (in part) by funding from the National Institute of Environmental Health Sciences (NIEHS) intramural program no. ZIA ES103389-01.

Author contributions

S.D.P. and H.L. contributed equally to the theory and methodology described in this Article. They also contributed equally to writing and editing the article. All numerical work and computations were conducted by H.L., who developed the ANCOM-BC2 pipeline in R that is freely and publicly available. Please contact H.L. for software requests.

Peer review

Peer review information

Nature Methods thanks Christian Diener and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editors: Lei Tang and Hui Hua, in collaboration with the Nature Methods team.

Data availability

The URT data were sourced from the LOCOM R package (https://github.com/yijuanhu/LOCOM-Archive). The Quantitative Microbiome Project data are accessible via the SPRING R package (https://github.com/GraceYoon/SPRING) or the ANCOMBC package (https://www.bioconductor.org/packages/release/bioc/html/ANCOMBC.html). Data pertaining to the soil microbiome for aridity and the gut microbiome in patients with IBD are hosted on Qiita at https://qiita.ucsd.edu/study/description/10360 and https://qiita.ucsd.edu/study/description/11546, respectively. Please note that accessing data on Qiita requires account registration and sign-in.

Code availability

ANCOM-BC2 has been implemented in the R package ANCOMBC, which is available on Bioconductor at https://www.bioconductor.org/packages/release/bioc/html/ANCOMBC.html. The code used for all analyses, with the exception of the trend test related to soil microbiome richness, in this Article is available in the associated GitHub repository and the corresponding Code Ocean capsule 10.24433/CO.0628172.v1. The specific trend test was conducted using ORIOGEN 4.01, obtainable at https://www.niehs.nih.gov/research/resources/software/biostatistics/oriogen/index.cfm.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data is available for this paper at 10.1038/s41592-023-02092-7.

Supplementary information

The online version contains supplementary material available at 10.1038/s41592-023-02092-7.

References

1. Martin BD, Witten D, Willis AD. Modeling microbial abundances and dysbiosis with beta-binomial regression. Annals. Appl. Stats. 2020;14:94. doi: 10.1214/19-aoas1283.
2. Mandal S, et al. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microbial Ecol. Health Dis. 2015;26:27663. doi: 10.3402/mehd.v26.27663.
3. Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nat. Commun. 2020;11:3514. doi: 10.1038/s41467-020-17041-7.
4. Zhou H, He K, Chen J, Zhang X. LinDA: linear models for differential abundance analysis of microbiome compositional data. Genome Biol. 2022;23:95. doi: 10.1186/s13059-022-02655-5.
5. Hu Y, Satten GA, Hu Y-J. Locom: a logistic regression model for testing differential abundance in compositional microbiome data with false discovery rate control. Proc. Natl Acad. Sci. USA. 2022;119:e2122788119. doi: 10.1073/pnas.2122788119.
6. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 1995;57:289–300.
7. Peddada SD, et al. Gene selection and clustering for time-course and dose–response microarray experiments using order-restricted inference. Bioinformatics. 2003;19:834–841. doi: 10.1093/bioinformatics/btg093.
8. Guo W, Sarkar SK, Peddada SD. Controlling false discoveries in multidimensional directional decisions, with applications to gene expression data on ordered categories. Biometrics. 2010;66:485–492. doi: 10.1111/j.1541-0420.2009.01292.x.
9. Grandhi A, Guo W, Peddada SD. A multiple testing procedure for multi-dimensional pairwise comparisons with application to gene expression studies. BMC Bioinform. 2016;17:104. doi: 10.1186/s12859-016-0937-5.
10. Gohir W, et al. Pregnancy-related changes in the maternal gut microbiota are dependent upon the mother’s periconceptional diet. Gut Microbes. 2015;6:310–320. doi: 10.1080/19490976.2015.1086056.
11. Wu H-J, Wu E. The role of gut microbiota in immune homeostasis and autoimmunity. Gut Microbes. 2012;3:4–14. doi: 10.4161/gmic.19320.
12. Koren O, et al. Host remodeling of the gut microbiome and metabolic changes during pregnancy. Cell. 2012;150:470–480. doi: 10.1016/j.cell.2012.07.008.
13. Kaul A, Mandal S, Davidov O, Peddada SD. Analysis of microbiome data in the presence of excess zeros. Front. Microbiol. 2017;8:2114. doi: 10.3389/fmicb.2017.02114.
14. McLaren MR, Willis AD, Callahan BJ. Consistent and correctable bias in metagenomic sequencing experiments. eLife. 2019;8:e46923. doi: 10.7554/eLife.46923.
15. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA. 2001;98:5116–5121. doi: 10.1073/pnas.091062498.
16. Costea PI, Zeller G, Sunagawa S, Bork P. A fair comparison. Nat. Methods. 2014;11:359–359. doi: 10.1038/nmeth.2897.
17. Paulson JN, Bravo HC, Pop M. Reply to: ‘a fair comparison’. Nat. Methods. 2014;11:359–360. doi: 10.1038/nmeth.2898.
18. Hu Y-J, Satten GA. Testing hypotheses about the microbiome using the linear decomposition model (LDM). Bioinformatics. 2020;36:4106–4115. doi: 10.1093/bioinformatics/btaa260.
19. Charlson ES, et al. Disordered microbial communities in the upper respiratory tract of cigarette smokers. PLoS ONE. 2010;5:e15216. doi: 10.1371/journal.pone.0015216.
20. Holm S. A simple sequentially rejective multiple test procedure. Scandi. J. Stat. 1979;6:65–70.
21. Lim C, Sen PK, Peddada SD. Robust analysis of high throughput screening (HTS) assay data. Technometrics. 2013;55:150–160. doi: 10.1080/00401706.2012.749166.
22. Morton JT, et al. Establishing microbial composition measurement standards with reference frames. Nat. Commun. 2019;10:2719. doi: 10.1038/s41467-019-10656-5.
23. Vandeputte D, et al. Quantitative microbiome profiling links gut community variation to microbial load. Nature. 2017;551:507–511. doi: 10.1038/nature24460.
24. Neilson JW, et al. Significant impacts of increasing aridity on the arid soil microbiome. MSystems. 2017;2:e00195–16. doi: 10.1128/mSystems.00195-16.
25. Peddada S, Harris S, Zajd J, Harvey E. Oriogen: order restricted inference for ordered gene expression data. Bioinformatics. 2005;21:3933–3934. doi: 10.1093/bioinformatics/bti637.
26. Botero LM, et al. Thermobaculum terrenum gen. nov., sp. nov.: a non-phototrophic gram-positive thermophile representing an environmental clone group related to the chloroflexi (green non-sulfur bacteria) and thermomicrobia. Archiv. Microbiol. 2004;181:269–277. doi: 10.1007/s00203-004-0647-7.
27. Lau CH-F, van Engelen K, Gordon S, Renaud J, Topp E. Novel antibiotic resistance determinants from agricultural soil exposed to antibiotics widely used in human medicine and animal farming. Appl. Environmental Microbiol. 2017;83:e00989–17. doi: 10.1128/AEM.00989-17.
28. Oyejobi GK, Sule WF, Akinde SB, Khan FM, Ogolla F. Multidrug-resistant enteric bacteria in Nigeria and potential use of bacteriophages as biocontrol. Sci. Total Environ. 2022;824:153842.
29. Chouaia B, et al. Genome sequence of Blastococcus saxobsidens DD2, a stone-inhabiting bacterium. J. Bacteriol. 2012;194:2752–2753.
30. Li JL, et al. Antichlamydial dimeric indole derivatives from marine actinomycete Rubrobacter radiotolerans. Planta Medica. 2017;83:805–811. doi: 10.1055/s-0043-100382.
31. Chen H, et al. One-time nitrogen fertilization shifts switchgrass soil microbiomes within a context of larger spatial and temporal variation. PLoS ONE. 2019;14:e0211310. doi: 10.1371/journal.pone.0211310.
32. Riahi HS, Heidarieh P, Fatahi-Bafghi M. Genus Pseudonocardia: what we know about its biological properties, abilities and current application in biotechnology. J. Appl. Microbiol. 2022;132:890–906. doi: 10.1111/jam.15271.
33. Liu X, et al. Using community analysis to explore bacterial indicators for disease suppression of tobacco bacterial wilt. Sci. Rep. 2016;6:36773. doi: 10.1038/srep36773.
34. Jiao J-Y, et al. Complete genome sequence of Jiangella gansuensis strain YIM 002T (DSM 44835T), the type species of the genus Jiangella and source of new antibiotic compounds. Standards Genom. Sci. 2017;12:21. doi: 10.1186/s40793-017-0226-6.
35. Fang X, et al. Gastrointestinal surgery for inflammatory bowel disease persistently lowers microbiome and metabolome diversity. Inflam. Bowel Dis. 2021;27:603–616. doi: 10.1093/ibd/izaa262.
36. López-Almela I, et al. Bacteroides uniformis combined with fiber amplifies metabolic and immune benefits in obese mice. Gut Microbes. 2021;13:1–20. doi: 10.1080/19490976.2020.1865706.
37. Horvath TD, et al. Bacteroides ovatus colonization influences the abundance of intestinal short chain fatty acids and neurotransmitters. Iscience. 2022;25:104158. doi: 10.1016/j.isci.2022.104158.
38. Chang S-C, et al. A gut butyrate-producing bacterium Butyricicoccus pullicaecorum regulates short-chain fatty acid transporter and receptor to reduce the progression of 1,2-dimethylhydrazine-associated colorectal cancer. Oncol. Lett. 2020;20:327. doi: 10.3892/ol.2020.12190.
39. Peterson CT, et al. Short-chain fatty acids modulate healthy gut microbiota composition and functional potential. Curr. Microbiol. 2022;79:128. doi: 10.1007/s00284-022-02825-5.
40. Zhou Y, et al. F. prausnitzii and its supernatant increase SCFAs-producing bacteria to restore gut dysbiosis in tnbs-induced colitis. AMB Expr. 2021;11:33. doi: 10.1186/s13568-021-01197-6.
41. Nie K, et al. Roseburia intestinalis: a beneficial gut organism from the discoveries in genus and species. Front. Cellul. Infect. Microbiol. 2021;11:757718.
42. Rau M, et al. Fecal SCFAs and SCFA-producing bacteria in gut microbiome of human NAFLD as a putative link to systemic T-cell activation and advanced disease. United Euro. Gastroenterol. J. 2018;6:1496–1507. doi: 10.1177/2050640618804444.
43. Farnan L, Ivanova A, Peddada S. Constrained inference in biological sciences: linear mixed effects models under constraints. PLoS ONE. 2014;9:e84778. doi: 10.1371/journal.pone.0084778.
44. Jelsema CM, Peddada SD. CLME: an R package for linear mixed effects models under inequality constraints. J. Stat. Softw. 2016. doi: 10.18637/jss.v075.i01.
45. Rao CR. Linear Statistical Inference and its Applications. 2nd edn. Wiley Series in Probability and Statistics. John Wiley & Sons; 1973.
46. Peddada SD, Smith T. Consistency of a class of variance estimators in linear models under heteroscedasticity. Sankhyā: Indian J. Stat. Ser. B. 1997:1–10.
47. McLachlan G, Krishnan T. The EM Algorithm and Extensions. 2nd edn. Wiley Series in Probability and Statistics. John Wiley & Sons; 2007.
48. Dunnett CW. A multiple comparison procedure for comparing several treatments with a control. J. Am. Stat. Assoc. 1955;50:1096–1121.
49. Dunnett CW, Tamhane AC. Step-down multiple tests for comparing treatments with a control in unbalanced one-way layouts. Stat. Med. 1991;10:939–947.
50. Dunnett CW, Tamhane AC. A step-up multiple test procedure. J. Am. Stat. Assoc. 1992;87:162–170.
51. Silvapulle MJ, Sen PK. Constrained Statistical Inference: Order, Inequality, and Shape Constraints. Wiley Series in Probability and Statistics. John Wiley & Sons; 2011.
52. Fu A, Narasimhan B, Boyd S. CVXR: An R package for disciplined convex optimization. J. Stat. Softw. 2020. doi: 10.18637/jss.v094.i14.
53. Williams DA. Some inference procedures for monotonically ordered normal means. Biometrika. 1977;64:9–14.
54. Patterson HD, Thompson R. Recovery of inter-block information when block sizes are unequal. Biometrika. 1971;58:545–554.
55. Harville DA. Bayesian inference for variance components using only error contrasts. Biometrika. 1974;61:383–385.
56. Lindstrom MJ, Bates DM. Newton–Raphson and EM algorithms for linear mixed-effects models for repeated measures data. J. Am. Stat. Assoc. 1988;83:1014–1022.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information^{(369KB, pdf)}

Supplementary Methods and Table 1.

Reporting Summary^{(1.3MB, pdf)}

Peer Review File^{(1.1MB, pdf)}

Data Availability Statement

[CR1] 1.Martin BD, Witten D, Willis AD. Modeling microbial abundances and dysbiosis with beta-binomial regression. Annals. Appl. Stats. 2020;14:94. doi: 10.1214/19-aoas1283. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR2] 2.Mandal S, et al. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microbial Ecol. Health Dis. 2015;26:27663. doi: 10.3402/mehd.v26.27663. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR3] 3.Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nat. Commun. 2020;11:3514. doi: 10.1038/s41467-020-17041-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR4] 4.Zhou H, He K, Chen J, Zhang X. LinDA: linear models for differential abundance analysis of microbiome compositional data. Genome Biol. 2022;23:95. doi: 10.1186/s13059-022-02655-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR5] 5.Hu Y, Satten GA, Hu Y-J. Locom: a logistic regression model for testing differential abundance in compositional microbiome data with false discovery rate control. Proc. Natl Acad. Sci. USA. 2022;119:e2122788119. doi: 10.1073/pnas.2122788119. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR6] 6.Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 1995;57:289–300. [Google Scholar]

[CR7] 7.Peddada SD, et al. Gene selection and clustering for time-course and dose–response microarray experiments using order-restricted inference. Bioinformatics. 2003;19:834–841. doi: 10.1093/bioinformatics/btg093. [DOI] [PubMed] [Google Scholar]

[CR8] 8.Guo W, Sarkar SK, Peddada SD. Controlling false discoveries in multidimensional directional decisions, with applications to gene expression data on ordered categories. Biometrics. 2010;66:485–492. doi: 10.1111/j.1541-0420.2009.01292.x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR9] 9.Grandhi A, Guo W, Peddada SD. A multiple testing procedure for multi-dimensional pairwise comparisons with application to gene expression studies. BMC Bioinform. 2016;17:104. doi: 10.1186/s12859-016-0937-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR10] 10.Gohir W, et al. Pregnancy-related changes in the maternal gut microbiota are dependent upon the mother’s periconceptional diet. Gut Microbes. 2015;6:310–320. doi: 10.1080/19490976.2015.1086056. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] 11.Wu H-J, Wu E. The role of gut microbiota in immune homeostasis and autoimmunity. Gut Microbes. 2012;3:4–14. doi: 10.4161/gmic.19320. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] 12.Koren O, et al. Host remodeling of the gut microbiome and metabolic changes during pregnancy. Cell. 2012;150:470–480. doi: 10.1016/j.cell.2012.07.008. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Kaul A, Mandal S, Davidov O, Peddada SD. Analysis of microbiome data in the presence of excess zeros. Front. Microbiol. 2017;8:2114. doi: 10.3389/fmicb.2017.02114.

[CR14] 14.McLaren MR, Willis AD, Callahan BJ. Consistent and correctable bias in metagenomic sequencing experiments. eLife. 2019;8:e46923. doi: 10.7554/eLife.46923.

[CR15] 15.Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA. 2001;98:5116–5121. doi: 10.1073/pnas.091062498.

[CR16] 16.Costea PI, Zeller G, Sunagawa S, Bork P. A fair comparison. Nat. Methods. 2014;11:359–359. doi: 10.1038/nmeth.2897.

[CR17] 17.Paulson JN, Bravo HC, Pop M. Reply to: ‘a fair comparison’. Nat. Methods. 2014;11:359–360. doi: 10.1038/nmeth.2898.

[CR18] 18.Hu Y-J, Satten GA. Testing hypotheses about the microbiome using the linear decomposition model (LDM). Bioinformatics. 2020;36:4106–4115. doi: 10.1093/bioinformatics/btaa260.

[CR19] 19.Charlson ES, et al. Disordered microbial communities in the upper respiratory tract of cigarette smokers. PLoS ONE. 2010;5:e15216. doi: 10.1371/journal.pone.0015216.

[CR20] 20.Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 65–70 (1979).

[CR21] 21.Lim C, Sen PK, Peddada SD. Robust analysis of high throughput screening (HTS) assay data. Technometrics. 2013;55:150–160. doi: 10.1080/00401706.2012.749166.

[CR22] 22.Morton JT, et al. Establishing microbial composition measurement standards with reference frames. Nat. Commun. 2019;10:2719. doi: 10.1038/s41467-019-10656-5.

[CR23] 23.Vandeputte D, et al. Quantitative microbiome profiling links gut community variation to microbial load. Nature. 2017;551:507–511. doi: 10.1038/nature24460.

[CR24] 24.Neilson JW, et al. Significant impacts of increasing aridity on the arid soil microbiome. mSystems. 2017;2:e00195–16. doi: 10.1128/mSystems.00195-16.

[CR25] 25.Peddada S, Harris S, Zajd J, Harvey E. ORIOGEN: order restricted inference for ordered gene expression data. Bioinformatics. 2005;21:3933–3934. doi: 10.1093/bioinformatics/bti637.

[CR26] 26.Botero LM, et al. Thermobaculum terrenum gen. nov., sp. nov.: a non-phototrophic gram-positive thermophile representing an environmental clone group related to the Chloroflexi (green non-sulfur bacteria) and Thermomicrobia. Archiv. Microbiol. 2004;181:269–277. doi: 10.1007/s00203-004-0647-7.

[CR27] 27.Lau CH-F, van Engelen K, Gordon S, Renaud J, Topp E. Novel antibiotic resistance determinants from agricultural soil exposed to antibiotics widely used in human medicine and animal farming. Appl. Environmental Microbiol. 2017;83:e00989–17. doi: 10.1128/AEM.00989-17.

[CR28] 28.Oyejobi, G. K., Sule, W. F., Akinde, S. B., Khan, F. M. & Ogolla, F. Multidrug-resistant enteric bacteria in Nigeria and potential use of bacteriophages as biocontrol. Sci. Total Environ. 824, 153842 (2022).

[CR29] 29.Chouaia, B. et al. Genome sequence of Blastococcus saxobsidens DD2, a stone-inhabiting bacterium. J. Bacteriol. 194, 2752–2753 (2012).

[CR30] 30.Li JL, et al. Antichlamydial dimeric indole derivatives from marine actinomycete Rubrobacter radiotolerans. Planta Medica. 2017;83:805–811. doi: 10.1055/s-0043-100382.

[CR31] 31.Chen H, et al. One-time nitrogen fertilization shifts switchgrass soil microbiomes within a context of larger spatial and temporal variation. PLoS ONE. 2019;14:e0211310. doi: 10.1371/journal.pone.0211310.

[CR32] 32.Riahi HS, Heidarieh P, Fatahi-Bafghi M. Genus Pseudonocardia: what we know about its biological properties, abilities and current application in biotechnology. J. Appl. Microbiol. 2022;132:890–906. doi: 10.1111/jam.15271.

[CR33] 33.Liu X, et al. Using community analysis to explore bacterial indicators for disease suppression of tobacco bacterial wilt. Sci. Rep. 2016;6:36773. doi: 10.1038/srep36773.

[CR34] 34.Jiao J-Y, et al. Complete genome sequence of Jiangella gansuensis strain YIM 002T (DSM 44835T), the type species of the genus Jiangella and source of new antibiotic compounds. Standards Genom. Sci. 2017;12:21. doi: 10.1186/s40793-017-0226-6.

[CR35] 35.Fang X, et al. Gastrointestinal surgery for inflammatory bowel disease persistently lowers microbiome and metabolome diversity. Inflam. Bowel Dis. 2021;27:603–616. doi: 10.1093/ibd/izaa262.

[CR36] 36.López-Almela I, et al. Bacteroides uniformis combined with fiber amplifies metabolic and immune benefits in obese mice. Gut Microbes. 2021;13:1–20. doi: 10.1080/19490976.2020.1865706.

[CR37] 37.Horvath TD, et al. Bacteroides ovatus colonization influences the abundance of intestinal short chain fatty acids and neurotransmitters. iScience. 2022;25:104158. doi: 10.1016/j.isci.2022.104158.

[CR38] 38.Chang S-C, et al. A gut butyrate-producing bacterium Butyricicoccus pullicaecorum regulates short-chain fatty acid transporter and receptor to reduce the progression of 1,2-dimethylhydrazine-associated colorectal cancer. Oncol. Lett. 2020;20:327. doi: 10.3892/ol.2020.12190.

[CR39] 39.Peterson CT, et al. Short-chain fatty acids modulate healthy gut microbiota composition and functional potential. Curr. Microbiol. 2022;79:128. doi: 10.1007/s00284-022-02825-5.

[CR40] 40.Zhou Y, et al. F. prausnitzii and its supernatant increase SCFAs-producing bacteria to restore gut dysbiosis in TNBS-induced colitis. AMB Expr. 2021;11:33. doi: 10.1186/s13568-021-01197-6.

[CR41] 41.Nie, K. et al. Roseburia intestinalis: a beneficial gut organism from the discoveries in genus and species. Front. Cellul. Infect. Microbiol. 11, 757718 (2021).

[CR42] 42.Rau M, et al. Fecal SCFAs and SCFA-producing bacteria in gut microbiome of human NAFLD as a putative link to systemic T-cell activation and advanced disease. United Euro. Gastroenterol. J. 2018;6:1496–1507. doi: 10.1177/2050640618804444.

[CR43] 43.Farnan L, Ivanova A, Peddada S. Constrained inference in biological sciences: linear mixed effects models under constraints. PLoS ONE. 2014;9:e84778. doi: 10.1371/journal.pone.0084778.

[CR44] 44.Jelsema, C. M. & Peddada, S. D. CLME: an R package for linear mixed effects models under inequality constraints. J. Stat. Softw. doi: 10.18637/jss.v075.i01 (2016).

[CR45] 45.Rao, C. R. Linear Statistical Inference and its Applications 2nd edn Wiley Series in Probability and Statistics (John Wiley & Sons, 1973).

[CR46] 46.Peddada, S. D. & Smith, T. Consistency of a class of variance estimators in linear models under heteroscedasticity. Sankhyā: Indian J. Stat. Ser. B 1–10 (1997).

[CR47] 47.McLachlan, G. & Krishnan, T. The EM Algorithm and Extensions 2nd edn Wiley Series in Probability and Statistics (John Wiley & Sons, 2007).

[CR48] 48.Dunnett, C. W. A multiple comparison procedure for comparing several treatments with a control. J. Am. Stat. Assoc. 50, 1096–1121 (1955).

[CR49] 49.Dunnett, C. W. & Tamhane, A. C. Step-down multiple tests for comparing treatments with a control in unbalanced one-way layouts. Stat. Med. 10, 939–947 (1991).

[CR50] 50.Dunnett, C. W. & Tamhane, A. C. A step-up multiple test procedure. J. Am. Stat. Assoc. 87, 162–170 (1992).

[CR51] 51.Silvapulle, M. J. & Sen, P. K. Constrained Statistical Inference: Order, Inequality, and Shape Constraints Wiley Series in Probability and Statistics (John Wiley & Sons, 2011).

[CR52] 52.Fu, A., Narasimhan, B. & Boyd, S. CVXR: An R package for disciplined convex optimization. J. Stat. Softw. doi: 10.18637/jss.v094.i14 (2020).

[CR53] 53.Williams, D. A. Some inference procedures for monotonically ordered normal means. Biometrika 64, 9–14 (1977).

[CR54] 54.Patterson, H. D. & Thompson, R. Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554 (1971).

[CR55] 55.Harville, D. A. Bayesian inference for variance components using only error contrasts. Biometrika 61, 383–385 (1974).

[CR56] 56.Lindstrom, M. J. & Bates, D. M. Newton–Raphson and EM algorithms for linear mixed-effects models for repeated measures data. J. Am. Stat. Assoc. 83, 1014–1022 (1988).
