Abstract
Motivation
Microbial communities have been shown to be associated with many complex diseases, such as cancers and cardiovascular diseases. Identifying differentially abundant taxa is clinically important: it can help in understanding the pathology of complex diseases and may suggest preventive and therapeutic strategies. Appropriate differential analyses of microbiome data are challenging because of the data's unique characteristics, including the compositional constraint, excessive zeros and high dimensionality. Most existing approaches either ignore these characteristics or account only for the compositional constraint by using log-ratio transformations with zero observations replaced by a pseudocount. However, there is no consensus on how to choose a pseudocount. More importantly, ignoring the excessive zeros may result in poorly powered analyses and therefore yield misleading findings.
Results
We develop a novel microbiome-based direction-assisted test for the detection of overall difference in microbial relative abundances between two health conditions, which simultaneously incorporates the characteristics of relative abundance data. The proposed test (i) divides the taxa into two clusters by the directions of mean differences of relative abundances and then combines them at cluster level, in light of the compositional characteristic; and (ii) contains a burden type test, which collapses multiple taxa into a single one to account for excessive zeros. Moreover, the proposed test is an adaptive procedure, which can accommodate high-dimensional settings and yield high power against various alternative hypotheses. We perform extensive simulation studies across a wide range of scenarios to evaluate the proposed test and show its substantial power gain over some existing tests. The superiority of the proposed approach is further demonstrated with real datasets from two microbiome studies.
Availability and implementation
An R package for MiDAT is available at https://github.com/zhangwei0125/MiDAT.
Supplementary information
Supplementary data are available at Bioinformatics online.
1 Introduction
The human microbiome, the community of microorganisms (such as bacteria, fungi and viruses) living in or on the human body, has been shown to be an important contributor to many complex diseases, including Crohn’s disease (Joossens et al., 2011), type 2 diabetes (Qin et al., 2012), hypertension (Wilck et al., 2017) and metastatic melanoma (Matson et al., 2018). A major goal of human microbiome studies is to identify microbial taxa that are differentially abundant across biological or clinical conditions, such as disease statuses. The identified taxa can help in understanding the pathology of diseases and in developing preventive and therapeutic strategies (Virgin and Todd, 2011).
The development of next-generation sequencing (NGS) techniques has enabled culture-free profiling of microbial communities via direct DNA sequencing, without the need for laborious cultivation (Gill et al., 2006). Two NGS approaches are commonly used to quantify microbial abundances: 16S rRNA gene sequencing and shotgun metagenomic sequencing. In both approaches, sequencing reads are clustered by similarity into operational taxonomic units (OTUs), which are treated as surrogates for microbial taxa. The resulting data are called OTU tables, where each cell represents the read count observed for an OTU in a sample. Because sequencing depths (i.e. total read counts for all OTUs) may vary across samples, normalization procedures that convert raw read counts into relative abundances are routinely used to ensure meaningful comparison of data from different samples in downstream analyses. The relative abundances of a sample form a vector of non-negative values that sum to a constant (usually one).
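As a minimal illustration of this normalization, the sketch below converts a small, made-up OTU count table into relative abundances by dividing each sample's counts by its sequencing depth. The function name `to_relative_abundance` and the toy counts are illustrative assumptions, not part of any package mentioned in this article.

```python
from typing import List


def to_relative_abundance(counts: List[List[int]]) -> List[List[float]]:
    """Convert an OTU count table (rows = samples) into relative abundances."""
    rel = []
    for row in counts:
        depth = sum(row)  # sequencing depth: total reads of this sample
        if depth == 0:
            raise ValueError("a sample with zero total reads cannot be normalized")
        rel.append([c / depth for c in row])
    return rel


# Toy OTU table (hypothetical counts): 2 samples, 4 OTUs, unequal depths.
otu_table = [[10, 0, 30, 60], [5, 5, 0, 0]]
rel_table = to_relative_abundance(otu_table)
```

Each row of `rel_table` is non-negative and sums to one, so samples with different sequencing depths become comparable, at the price of the compositional constraint discussed next.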
Statistical analyses of microbial relative abundances are challenging because of their complex data characteristics. First, the relative abundances are compositional: they reside in a simplex rather than in the Euclidean space. Directly applying statistical methods intended for unconstrained data to compositional data is inappropriate and may result in misleading inferences (Aitchison, 2003). Second, the relative abundance data are highly sparse, i.e. they contain a disproportionately large number of zeros due to either the absence of OTUs or undersampling of microbial communities. For example, nearly 80% of the data entries in the global gut data of Yatsunenko et al. (2012) at the genus level are zero. The excessive zeros can influence differential analyses of relative abundances and therefore need to be handled with caution. Third, the number of OTUs can be much larger than the sample size, depending on the taxonomic rank (kingdom, phylum, class, order, family, genus and species) at which the data are organized. This limits the use of conventional two-sample tests designed for low-dimensional settings, such as Hotelling’s T² test.
Many statistical approaches have been proposed in the literature to compare the relative abundances between two or more conditions (e.g. diseased and non-diseased), ranging from simple variants of the t-test, such as Metastats (White et al., 2009), to more sophisticated statistical tests, such as metagenomeSeq, based on zero-inflated Gaussian models (Paulson et al., 2013) and ANCOM based on log-ratio of abundance (Mandal et al., 2015), as well as some tests developed specifically for RNA-Seq data, such as DESeq2 (Love et al., 2014), edgeR (Robinson et al., 2010) and Voom (Law et al., 2014). Most of these methods are univariate tests, which examine whether a single OTU is differentially abundant between two conditions. In order to adjust for multiple individual P-values, multiple testing correction procedures (such as the Bonferroni and Benjamini–Hochberg procedure) are often employed (Xiao et al., 2017). However, the high-dimensional characteristic of microbiome data could greatly increase the burden of multiple testing correction, thereby reducing the power of detecting differentially abundant OTUs.
To address this limitation, multivariate tests, which simultaneously compare the relative abundances of multiple OTUs, are advocated (Banerjee et al., 2019; Cao et al., 2018). Most existing multivariate tests either take no account of the characteristics of relative abundance data or account only for the compositional characteristic; few tests account for excessive zeros, let alone all the characteristics together. In the literature, a common strategy to relax the compositional constraint is to use log-ratio transformations (Wang et al., 2020). Since the log transformation cannot be applied to zero, a small value termed a pseudocount is often used to replace the zeros. However, there is no consensus on how to choose a pseudocount, and different pseudocount values can lead to very different results. Moreover, substituting a pseudocount ignores the information carried by truly absent OTUs, which may yield unreliable study findings.
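The sensitivity to the pseudocount can be seen in a small sketch. It uses the centered log-ratio (CLR) transform as one representative log-ratio transformation; that choice, the function name `clr`, the toy composition and the two pseudocount values are all assumptions made for illustration.

```python
import math
from typing import List


def clr(x: List[float], pseudocount: float) -> List[float]:
    """Centered log-ratio transform after replacing zeros with a pseudocount."""
    filled = [v if v > 0 else pseudocount for v in x]
    logs = [math.log(v) for v in filled]
    center = sum(logs) / len(logs)  # log of the geometric mean
    return [value - center for value in logs]


# Toy composition (hypothetical): two observed OTUs and two zeros.
sample = [0.6, 0.4, 0.0, 0.0]
clr_small = clr(sample, 1e-6)  # very small pseudocount
clr_large = clr(sample, 1e-2)  # larger pseudocount
```

The transformed value of the first OTU changes by several log units between the two pseudocounts, although the observed data are identical. Any downstream test based on such values inherits this arbitrariness.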
It is worth pointing out that, although microbial abundance data share the count data characteristic of RNA expression data, the methods of differential analysis for RNA expression data cannot be directly applied to taxa abundances because their assumptions differ. Standard RNA-Seq differential expression analysis generally assumes that the majority of genes are unchanged, whereas in differential abundance analysis both absolute and relative abundances are commonly subject to drastic changes. For example, diet and other influential factors, including host genetics, birth mode and antibiotics, can cause rapid changes of microbial abundances in humans (David et al., 2014; Hasan and Yang, 2019). In addition, without knowledge of the total microbial load, changes in relative abundance do not reveal changes in the absolute abundance of OTUs.
In this article, we propose a multivariate adaptive test for the detection of overall difference in microbial relative abundances between two conditions by taking all characteristics of relative abundance data into account. Due to the compositional characteristic, the mean differences of relative abundance between two conditions always sum to zero over the taxa. In light of this, the proposed test divides the mean differences of relative abundances of the taxa into two clusters by their directions (positive and negative) and combines the mean differences with the same direction. To account for excessive zeros, the proposed test includes, as a special case, a burden type test, which collapses multiple OTUs into a single one. In addition, the proposed test is an adaptive procedure, which can accommodate high-dimensional settings and yield high power against various alternative hypotheses. Extensive simulation studies show the high power of the proposed test relative to existing tests across a wide range of scenarios. Applications to two real microbiome datasets further demonstrate its effectiveness.
2 Materials and methods
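Consider two groups of individuals, each with only one of two health conditions. Assume that microbiome data, representing a microbial community profile for p OTUs, have been collected from n1 individuals in Group 1 and n2 individuals in Group 2. Write $n = n_1 + n_2$. Let $X^{(k)} = (X_1^{(k)}, \ldots, X_{n_k}^{(k)})^T$ be the observed data matrix for group k, where the superscript T stands for the transpose of a matrix or vector, $X_i^{(k)} = (X_{i1}^{(k)}, \ldots, X_{ip}^{(k)})^T$ is the vector of relative abundances of p OTUs for individual i, which lies in the $(p-1)$-dimensional simplex $\mathcal{S}^{p-1} = \{(x_1, \ldots, x_p)^T : x_j \ge 0, \sum_{j=1}^{p} x_j = 1\}$, $i = 1, \ldots, n_k$ and k = 1, 2. Hence, the $X_i^{(k)}$ are compositional. Assume that $X_1^{(k)}, \ldots, X_{n_k}^{(k)}$ are independent samples from a p-dimensional distribution with mean $\mu^{(k)} = (\mu_1^{(k)}, \ldots, \mu_p^{(k)})^T$ for k = 1, 2. Our interest is to test whether the mean relative abundances of the OTUs differ between the two groups, i.e.
$$H_0: \mu^{(1)} = \mu^{(2)} \quad \text{versus} \quad H_1: \mu^{(1)} \ne \mu^{(2)}. \tag{1}$$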
Rejection of the null hypothesis indicates that group difference occurs in at least one OTU; otherwise, for all OTUs, the two groups have equal mean relative abundances.
2.1 Direction-assisted two-sample tests
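A natural test statistic for the hypothesis (1) is based on the sample mean differences $D = (D_1, \ldots, D_p)^T$ with $D_j = \bar{X}_j^{(1)} - \bar{X}_j^{(2)}$, where $\bar{X}_j^{(k)} = n_k^{-1} \sum_{i=1}^{n_k} X_{ij}^{(k)}$ is the sample mean of relative abundance of OTU j for group k, $j = 1, \ldots, p$, and k = 1, 2. Write $\delta = \mu^{(1)} - \mu^{(2)}$ for the true mean difference. Due to the compositional nature, the relative abundances of each individual sum to 1, i.e. $\sum_{j=1}^{p} X_{ij}^{(k)} = 1$. It follows that the sample mean differences of the OTUs always sum to zero, i.e. $\sum_{j=1}^{p} D_j = 0$, and the same holds for δ because each mean vector also sums to 1. This implies that, whenever the two groups differ, the true mean difference δ has both positive and negative non-zero components (signals), and the sum of the positive components equals the absolute value of the sum of the negative components. To circumvent the cancelation effect of positive and negative components, we consider combining the non-zero components of D by their signs.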
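The sum-to-zero property is easy to check numerically. The sketch below uses two small, made-up groups of compositions; the function name `mean_differences` and the data are assumptions for illustration only.

```python
from typing import List


def mean_differences(group1: List[List[float]], group2: List[List[float]]) -> List[float]:
    """Per-OTU difference of sample means (group 1 minus group 2)."""
    p = len(group1[0])
    mean1 = [sum(row[j] for row in group1) / len(group1) for j in range(p)]
    mean2 = [sum(row[j] for row in group2) / len(group2) for j in range(p)]
    return [a - b for a, b in zip(mean1, mean2)]


# Hypothetical compositions: every row sums to one.
group1 = [[0.5, 0.3, 0.2], [0.7, 0.1, 0.2]]
group2 = [[0.2, 0.3, 0.5], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4]]

diffs = mean_differences(group1, group2)
positive_total = sum(d for d in diffs if d > 0)
negative_total = sum(d for d in diffs if d < 0)
```

Here the first OTU is more abundant in group 1 and the other two are less abundant, and `positive_total` and `negative_total` cancel exactly (up to floating-point error). A statistic that simply sums the differences would therefore carry no information, which motivates combining them by sign.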
The sum-of-squares type test (Bai and Saranadasa, 1996) and maximum-type (MAX) test (Cai et al., 2014; Liu et al., 2010) are two well-known types of two-sample tests for high-dimensional data in the Euclidean space. The former is constructed based on the L2-norm of the vector of sample mean differences $D = (D_1, \ldots, D_p)^T$, i.e. $\|D\|_2^2 = \sum_{j=1}^{p} D_j^2$, and is usually powerful under dense alternative hypotheses where there is a high proportion (HP) of non-zero components in the true mean difference δ. In contrast, the latter is based on the $L_\infty$-norm of the sample mean differences, i.e. $\|D\|_\infty = \max_{1 \le j \le p} |D_j|$, and performs well under sparse alternative hypotheses where δ has few non-zero components. These two types of tests represent two extremes, with one type using all of the components and the other type only one component of D as evidence against the null hypothesis. When the signals are neither too dense nor too sparse, they may suffer from substantial power loss.
To ensure robustness against various strengths of signals, we propose a class of microbiome-based direction-assisted tests (MiDAT), which combine the sample mean differences in OTUs with the same direction. The proposed tests are given by
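$$T_\lambda^{+} = \sum_{j=1}^{p} \left(\frac{D_j}{\hat\sigma_j}\right)^{\lambda} I(D_j > 0), \qquad T_\lambda^{-} = \sum_{j=1}^{p} \left(\frac{-D_j}{\hat\sigma_j}\right)^{\lambda} I(D_j < 0), \tag{2}$$
where λ is a positive integer, $I(\cdot)$ is the indicator function, $D_j$ is the sample mean difference of OTU j, $\hat\sigma_j^2 = s_{1j}^2/n_1 + s_{2j}^2/n_2$, and $s_{kj}^2$ is the sample variance of relative abundance of OTU j for group k, $j = 1, \ldots, p$, k = 1, 2. We scale the components of D in $T_\lambda^{+}$ and $T_\lambda^{-}$ to account for possibly varying variances. For both $T_\lambda^{+}$ and $T_\lambda^{-}$, as λ increases, the larger components of the scaled D become more important, while the smaller components become negligible. An extreme case is that, as $\lambda \to \infty$,
$$(T_\lambda^{+})^{1/\lambda} \to \max_{1 \le j \le p} \frac{D_j}{\hat\sigma_j}, \qquad (T_\lambda^{-})^{1/\lambda} \to \max_{1 \le j \le p} \frac{-D_j}{\hat\sigma_j},$$
so that the limiting tests, denoted by $T_\infty^{+}$ and $T_\infty^{-}$, only use the largest and smallest component of the scaled D, respectively. It is evident that the MAX test is a special case of $T_\lambda^{+}$ and $T_\lambda^{-}$ with $\lambda = \infty$, irrespective of the direction of sample mean difference and potentially varying variances of the OTUs.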
The proposed tests also include the sum-of-squares type test described above as a special case. Let $D_j/\hat\sigma_j$ denote the scaled sample mean difference of OTU j. When λ = 2, $T_\lambda^{+}$ and $T_\lambda^{-}$ become $T_2^{+} = \sum_{j=1}^{p} (D_j/\hat\sigma_j)^2 I(D_j > 0)$ and $T_2^{-} = \sum_{j=1}^{p} (D_j/\hat\sigma_j)^2 I(D_j < 0)$, which are the squared L2-norms of the scaled positive and negative sample mean differences, respectively. Moreover, the proposed tests can be reduced to the burden type tests, which are widely used in genetic association studies for rare variants (Madsen and Browning, 2009; Price et al., 2010). By collapsing multiple rare variants into a single variant, the burden type tests are much more powerful than the sum-of-squares type and MAX tests when the signals are dense and almost in the same direction. When λ = 1, $T_1^{+} = \sum_{j=1}^{p} (D_j/\hat\sigma_j) I(D_j > 0)$, which collapses the OTUs with positive sample mean differences into a single OTU; in contrast, $T_1^{-}$ combines the OTUs with negative sample mean differences into one. Hence $T_\lambda^{+}$ and $T_\lambda^{-}$ with λ = 1 are burden type tests, which can handle excessive zeros in the relative abundance data.
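The special cases λ = 1, 2 and ∞ can be made concrete with a short sketch. It is not the MiDAT package implementation: the function names are hypothetical, and the scaling of each mean difference by a Welch-type standard error (the square root of the sum of the group variances divided by their sample sizes) is an assumption made here for illustration.

```python
import math
from statistics import mean, variance
from typing import List, Tuple


def scaled_differences(g1: List[List[float]], g2: List[List[float]]) -> List[float]:
    """Standardized mean difference per OTU (assumed Welch-type scaling)."""
    n1, n2 = len(g1), len(g2)
    z = []
    for j in range(len(g1[0])):
        c1 = [row[j] for row in g1]
        c2 = [row[j] for row in g2]
        se = math.sqrt(variance(c1) / n1 + variance(c2) / n2)
        # An OTU that is constant in both groups carries no signal.
        z.append((mean(c1) - mean(c2)) / se if se > 0 else 0.0)
    return z


def direction_stats(z: List[float], lam: float) -> Tuple[float, float]:
    """Combine positive and negative scaled differences separately."""
    pos = [v for v in z if v > 0]
    neg = [-v for v in z if v < 0]
    if math.isinf(lam):  # MAX-type limit: only the most extreme component
        return max(pos, default=0.0), max(neg, default=0.0)
    return sum(v ** lam for v in pos), sum(v ** lam for v in neg)


# Hypothetical scaled differences for four OTUs.
z_toy = [2.0, -1.0, 0.5, -3.0]
burden = direction_stats(z_toy, 1)          # (2.5, 4.0)
sum_squares = direction_stats(z_toy, 2)     # (4.25, 10.0)
maximum = direction_stats(z_toy, math.inf)  # (2.0, 3.0)
```

With λ = 1 all same-direction components contribute equally, with λ = 2 larger components dominate, and in the limit only the single most extreme component in each direction remains.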
2.2 An adaptive direction-assisted two-sample test
As noted, although $T_\lambda^{+}$ and $T_\lambda^{-}$ are valid for arbitrary λ, a good choice of λ can lead to improved testing power. However, the optimal λ depends on the true pattern of signals (including sparsity and strength) in the true mean difference δ, which is usually unknown a priori. Therefore, we develop an adaptive direction-assisted test (aMiDAT), which extends MiDAT to simultaneously consider multiple choices for λ. The aMiDAT is given by
$$T_{\mathrm{aMiDAT}} = \min_{\lambda \in \Theta} \min\{p_{T_\lambda^{+}}, p_{T_\lambda^{-}}\},$$
where $p_{T_\lambda^{+}}$ and $p_{T_\lambda^{-}}$ are the P-values of $T_\lambda^{+}$ and $T_\lambda^{-}$, respectively, and Θ is a set of candidate values for λ. In practice, one needs to determine the set Θ. To include the burden type test, sum-of-squares type test and MAX test, which have been shown empirically to perform well under some scenarios, we suggest including λ = 1, 2 and ∞ in Θ. Moreover, given that the test with $2 < \lambda < \infty$ could be more powerful than the sum-of-squares type and MAX tests when only weakly dense signals are present, we suggest also including some values >2 in Θ. The results of the simulation studies and real data analyses below show that a candidate set of this kind suffices in most scenarios.
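Here, we simply apply the minimum P-value approach, which has been widely used in the literature, to combine the P-values of multiple MiDATs. Other combination approaches can also be used, such as the Fisher combination approach (Fisher, 1932), Berk–Jones test (Berk and Jones, 1979), group-combined test (Hu et al., 2016) and Cauchy combination approach (Liu and Xie, 2020). The determination of the optimal combination approach is an important and complicated statistical question, which is not the focus of this article. To calculate the P-value of $T_{\mathrm{aMiDAT}}$, one usually needs a two-layer permutation procedure, with the inner layer for the calculation of the P-values of $T_\lambda^{+}$ and $T_\lambda^{-}$, and the outer layer for the adjustment for multiple testing over multiple values of λ. Such a permutation procedure can be computationally intensive when the number of OTUs is relatively large. To address this issue, following the idea of Ge et al. (2003), we propose a one-layer permutation procedure to calculate the significance level of $T_{\mathrm{aMiDAT}}$:
Step 1. Based on the observed data, calculate the statistics $T_\lambda^{+}$ and $T_\lambda^{-}$ for each candidate value of λ in Θ.
Step 2. Permute the relative abundance data of the two groups (i.e. reassign the individuals to the two groups at random) to generate B permuted datasets. Based on the bth permuted dataset, calculate the statistics for each candidate value of λ in Θ, denoted by $T_\lambda^{+(b)}$ and $T_\lambda^{-(b)}$, $b = 1, \ldots, B$.
Step 3. Calculate the P-values of $T_\lambda^{+}$ and $T_\lambda^{-}$, denoted by $p_{T_\lambda^{+}}$ and $p_{T_\lambda^{-}}$, from the permuted statistics obtained in Step 2, and the corresponding adaptive MiDAT statistic as $T_{\mathrm{aMiDAT}} = \min_{\lambda \in \Theta} \min\{p_{T_\lambda^{+}}, p_{T_\lambda^{-}}\}$.
Step 4. For the bth permuted dataset, $b = 1, \ldots, B$, calculate the P-values of $T_\lambda^{+(b)}$ and $T_\lambda^{-(b)}$, denoted by $p^{(b)}_{T_\lambda^{+}}$ and $p^{(b)}_{T_\lambda^{-}}$, from the same set of permuted statistics, and the corresponding adaptive MiDAT statistic as $T^{(b)}_{\mathrm{aMiDAT}} = \min_{\lambda \in \Theta} \min\{p^{(b)}_{T_\lambda^{+}}, p^{(b)}_{T_\lambda^{-}}\}$.
Step 5. The final P-value of the aMiDAT is obtained by comparing $T_{\mathrm{aMiDAT}}$ with its permutation counterparts $T^{(1)}_{\mathrm{aMiDAT}}, \ldots, T^{(B)}_{\mathrm{aMiDAT}}$.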
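The steps above can be sketched as follows. This is an illustrative one-layer scheme, not the MiDAT package code. Three things are assumptions made here: the Welch-type scaling of the mean differences, the candidate set (1, 2, ∞), and the counting convention, in which the observed dataset is ranked together with the B permuted ones so that all P-values share the same grid. All function names are hypothetical.

```python
import math
import random
from statistics import mean, variance
from typing import List, Sequence

Table = List[List[float]]


def direction_stats(g1: Table, g2: Table, lambdas: Sequence[float]) -> List[float]:
    """Return [T+, T-] for every lambda, flattened into one list."""
    n1, n2 = len(g1), len(g2)
    z = []
    for j in range(len(g1[0])):
        c1 = [row[j] for row in g1]
        c2 = [row[j] for row in g2]
        se = math.sqrt(variance(c1) / n1 + variance(c2) / n2)  # assumed scaling
        z.append((mean(c1) - mean(c2)) / se if se > 0 else 0.0)
    pos = [v for v in z if v > 0]
    neg = [-v for v in z if v < 0]
    stats: List[float] = []
    for lam in lambdas:
        if math.isinf(lam):
            stats += [max(pos, default=0.0), max(neg, default=0.0)]
        else:
            stats += [sum(v ** lam for v in pos), sum(v ** lam for v in neg)]
    return stats


def amidat_pvalue(g1: Table, g2: Table, lambdas: Sequence[float] = (1, 2, math.inf),
                  n_perm: int = 200, seed: int = 0) -> float:
    """One-layer permutation P-value of the minimum-P adaptive statistic."""
    rng = random.Random(seed)
    pooled = list(g1) + list(g2)
    n1 = len(g1)
    # Index 0 holds the observed data; indices 1..B hold the permuted datasets.
    all_stats = [direction_stats(g1, g2, lambdas)]
    for _ in range(n_perm):
        rng.shuffle(pooled)
        all_stats.append(direction_stats(pooled[:n1], pooled[n1:], lambdas))
    total = len(all_stats)

    def min_pvalue(stats: List[float]) -> float:
        # P-value of each statistic: share of datasets with a value at least as large.
        return min(
            sum(1 for other in all_stats if other[m] >= value) / total
            for m, value in enumerate(stats)
        )

    adaptive = [min_pvalue(stats) for stats in all_stats]
    # Small adaptive statistics are evidence against the null hypothesis.
    return sum(1 for t in adaptive if t <= adaptive[0]) / total


def toy_group(weights: List[float], n: int, rng: random.Random) -> Table:
    """Hypothetical Dirichlet-like compositions, used only to exercise the code."""
    rows = []
    for _ in range(n):
        raw = [rng.gammavariate(w, 1.0) for w in weights]
        rows.append([v / sum(raw) for v in raw])
    return rows


data_rng = random.Random(1)
group_a = toy_group([8, 2, 2, 2], 15, data_rng)
group_b = toy_group([2, 8, 2, 2], 15, data_rng)
p_value = amidat_pvalue(group_a, group_b)
```

The same B permutations serve both to compute the individual P-values and to calibrate their minimum, which is what removes the inner permutation layer. The cost is one pass over the permuted statistics per dataset rather than a nested set of permutations.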
3 Simulations
We conduct simulation studies to investigate the performance of the proposed test (aMiDAT) by comparing its type I error rates and powers with those of the permutational multivariate analysis of variance (PERMANOVA) test (Anderson, 2001), the microbiome regression-based kernel association (MiRKAT) test (Zhao et al., 2015), the microbiome-based sum of powered score (MiSPU) test (Wu et al., 2016) and the MAX test (Cao et al., 2018). The PERMANOVA, MiRKAT and MiSPU tests were originally developed for microbiome association studies. Here, we use them in the framework of microbiome differential abundance analysis by coding the outcomes of the two groups as 0 and 1, respectively.
3.1 Simulation settings
We generate microbiome relative abundance data from two distribution models: a two-part model and a Dirichlet-multinomial (DM) model. In the two-part model, we first generate the abundances of OTUs as the elementwise product of U and V, where U is the continuous part, which characterizes the magnitude of positive observations, and V is the binary part, which governs the occurrence of non-zero observations. The continuous part, $U_i^{(k)} = (U_{i1}^{(k)}, \ldots, U_{ip}^{(k)})^T$, is independently and identically sampled from a multivariate lognormal distribution with a group-specific mean parameter vector and covariance parameter Σ. For the binary part, we assume that the observations for any two variables are independent. Specifically, the observations for the jth variable are simulated independently from the Bernoulli distribution with probability rj, i.e. $V_{ij}^{(k)} \sim \mathrm{Bernoulli}(r_j)$. The binary observations of the two groups are simulated using the same distribution. Let $W_{ij}^{(k)} = U_{ij}^{(k)} V_{ij}^{(k)}$. We then obtain the observations for relative abundance as $X_{ij}^{(k)} = W_{ij}^{(k)} / \sum_{l=1}^{p} W_{il}^{(k)}$, k = 1, 2. The resulting relative abundances for each individual are non-negative and sum to 1, and therefore satisfy the compositional constraint. Following the simulation settings of Cao et al. (2018), we consider a banded covariance structure for Σ, built from a banded matrix A, which has diagonal entries ajj = 1 and a band of non-zero off-diagonal entries, and a diagonal matrix D with each entry randomly sampled from a uniform distribution. The components of the mean parameter vector of Group 1 are drawn independently from a uniform distribution. Under the null hypothesis, the two groups share the same mean parameter vector. Under the alternative hypothesis, we choose a subset S of the p OTUs and let the mean parameters of the two groups differ for $j \in S$ by an amount governed by the signal size δ, and coincide for $j \notin S$. Three levels of signal density are considered for the size of S by setting $|S|$ to 10%, 30% and 50% of p. Moreover, we consider two simulation configurations that differ in the proportion of non-zero observations: the low proportion (LP) and high proportion (HP) configurations. The probabilities rj are independently drawn from a range of smaller values under the LP configuration and from a range of larger values under the HP configuration. We vary the values of the signal size δ to assess the statistical powers of the tests across all simulation scenarios.
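A simplified version of the two-part generator can be sketched as below. To stay short it draws the continuous part from independent lognormal distributions, so the banded covariance structure described above is deliberately left out. All numerical values (log-scale means, presence probabilities, sample size) and the function name are hypothetical choices for illustration, not the settings of the study.

```python
import random
from typing import List


def simulate_two_part(n: int, log_means: List[float], presence_probs: List[float],
                      rng: random.Random, log_sd: float = 1.0) -> List[List[float]]:
    """Draw n compositional samples from a simplified two-part model."""
    samples: List[List[float]] = []
    while len(samples) < n:
        abundances = []
        for mu, r in zip(log_means, presence_probs):
            magnitude = rng.lognormvariate(mu, log_sd)  # continuous part U
            present = rng.random() < r                  # binary part V
            abundances.append(magnitude if present else 0.0)
        total = sum(abundances)
        if total == 0.0:
            continue  # every OTU absent: redraw (a choice made for this sketch)
        samples.append([a / total for a in abundances])
    return samples


rng = random.Random(2024)
p = 10
log_means = [rng.uniform(0.0, 2.0) for _ in range(p)]       # hypothetical values
presence_probs = [rng.uniform(0.2, 0.5) for _ in range(p)]  # hypothetical LP-like range
group1 = simulate_two_part(20, log_means, presence_probs, rng)
zero_fraction = sum(v == 0.0 for row in group1 for v in row) / (20 * p)
```

Lowering the presence probabilities increases `zero_fraction`, which corresponds to moving from the HP toward the LP configuration. A signal can be introduced by shifting some entries of `log_means` for the second group.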
In addition, we conduct simulation studies that mimic the settings of a real microbiome dataset. For the DM model, we first generate microbiome count data from a DM distribution and then calculate the relative abundances. The parameter values of the DM distribution are specified from a real microbiome dataset concerning the upper respiratory tract (Charlson et al., 2010), which consists of 856 OTUs measured on 60 individuals. Specifically, for a given p, we first randomly select p OTUs from the real microbiome dataset and estimate the concentration parameters via maximum likelihood based on the data for the selected OTUs. Denote the resulting concentration parameter estimator by $\hat\alpha = (\hat\alpha_1, \ldots, \hat\alpha_p)^T$. Then, for each individual in the kth group, we generate the observed OTU counts from the DM distribution with the parameters $\alpha^{(k)}$ and set the total count per individual to be 1000, where $\alpha^{(k)}$ is the vector of concentration parameters for the kth group, k = 1, 2. Under the null hypothesis, we set $\alpha^{(1)} = \alpha^{(2)} = \hat\alpha$. Under the alternative hypothesis, the concentration parameters are specified using a similar setting to the two-part model, i.e. the parameters of the two groups differ for $j \in S$ by an amount governed by the signal size η and coincide for $j \notin S$, where S is a subset of the p OTUs and $|S|$ is chosen from $\{0.1p, 0.3p, 0.5p\}$. Denote the observed OTU count data for the ith individual in the kth group by $Y_i^{(k)} = (Y_{i1}^{(k)}, \ldots, Y_{ip}^{(k)})^T$, k = 1, 2. The OTU relative abundance observations are thus obtained as $X_{ij}^{(k)} = Y_{ij}^{(k)} / \sum_{l=1}^{p} Y_{il}^{(k)}$, $j = 1, \ldots, p$ and k = 1, 2.
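The DM sampling step can be sketched with the standard library alone: a Dirichlet draw is obtained by normalizing independent gamma variates, and the counts follow by multinomial sampling at the fixed depth of 1000 reads. The concentration parameters below are made-up values, not estimates from the upper-respiratory-tract data, and the function name is hypothetical.

```python
import random
from collections import Counter
from typing import List


def simulate_dm(n: int, alpha: List[float], depth: int,
                rng: random.Random) -> List[List[float]]:
    """Relative abundances from Dirichlet-multinomial counts at a fixed depth."""
    p = len(alpha)
    samples = []
    for _ in range(n):
        gammas = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(gammas)
        probs = [g / total for g in gammas]  # one Dirichlet(alpha) draw
        reads = Counter(rng.choices(range(p), weights=probs, k=depth))  # multinomial counts
        samples.append([reads[j] / depth for j in range(p)])
    return samples


rng = random.Random(7)
alpha_hat = [0.2, 0.5, 1.0, 3.0, 0.1]  # hypothetical concentration parameters
group1 = simulate_dm(30, alpha_hat, 1000, rng)
```

Small concentration parameters produce OTUs that are absent from many samples, so this model generates excess zeros without a separate binary component.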
In all simulation scenarios, we use the Euclidean distance for PERMANOVA. For MiRKAT, we consider the Bray–Curtis kernel and the Euclidean distance-based kernel, and use the one with the higher power. The parameter λ in the proposed aMiDAT is chosen from a prespecified candidate set Θ. As commented in Wu et al. (2016), MiSPU with γ = 1 often has low power, so we exclude γ = 1 from the range of values for γ. This is because positive and negative score components are always present in MiSPU for microbiome compositional data and they cancel each other out in the summing stage. Since not all the tests have asymptotic P-values, we adopt permutation procedures to calculate the P-values of all the tests for a fair comparison. The PERMANOVA, MiRKAT and MiSPU tests are implemented using the R software packages vegan (v2.5-7), MiRKAT (v1.1.2) and MiSPU (v1.0), respectively. It takes ∼1.86 s to compute the P-value of the proposed test for n = 100, p = 100 and B = 1000 using the one-layer permutation procedure on a PC with an Intel Core(TM) i9-9900 CPU @3.10 GHz.
Throughout the simulations, we keep the sample sizes of the two groups fixed and set the nominal significance level at 0.05. The total number p of OTUs is chosen from $\{50, 100, 200, 500\}$. The type I error rates of the tests are examined based on 5000 simulations and the powers are evaluated based on 1000 simulations. For each simulation, 1000 permutations are used to calculate the P-values of the tests. We present the power results for p = 50, 100 here; the results for p = 200, 500 are similar and relegated to the Supplementary Material.
3.2 Simulation results
Table 1 summarizes the empirical type I error rates of the tests under the two distribution models. At the nominal significance level of 0.05, all the tests control the type I error adequately across all simulation scenarios. For example, under the DM model with p = 500, the type I error rates of PERMANOVA, MiRKAT, MAX, MiSPU and aMiDAT are 0.054, 0.050, 0.052, 0.054 and 0.052, respectively.
Table 1.
Empirical type I error rates of five tests under the two-part models with LP and HP of non-zeros and the DM models
| Model | Scheme | p | PERMANOVA | MiRKAT | MAX | MiSPU | aMiDAT |
|---|---|---|---|---|---|---|---|
| Two-part | LP | 50 | 0.054 | 0.053 | 0.050 | 0.054 | 0.050 |
| | | 100 | 0.049 | 0.046 | 0.052 | 0.048 | 0.052 |
| | | 200 | 0.046 | 0.047 | 0.051 | 0.047 | 0.053 |
| | | 500 | 0.051 | 0.053 | 0.052 | 0.058 | 0.051 |
| | HP | 50 | 0.057 | 0.053 | 0.056 | 0.056 | 0.057 |
| | | 100 | 0.047 | 0.045 | 0.055 | 0.051 | 0.048 |
| | | 200 | 0.047 | 0.045 | 0.050 | 0.051 | 0.053 |
| | | 500 | 0.050 | 0.052 | 0.052 | 0.052 | 0.053 |
| DM | | 50 | 0.053 | 0.051 | 0.049 | 0.049 | 0.049 |
| | | 100 | 0.046 | 0.047 | 0.048 | 0.049 | 0.051 |
| | | 200 | 0.050 | 0.048 | 0.052 | 0.052 | 0.053 |
| | | 500 | 0.054 | 0.050 | 0.052 | 0.054 | 0.052 |
Note: The nominal significance level is 0.05. p is the total number of OTUs.
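To judge how close to 0.05 an empirical rate based on 5000 simulations can be expected to fall, one can compute the Monte Carlo standard error of a proportion. The short calculation below is a generic binomial approximation added for orientation; it is not part of the original analysis.

```python
import math

nominal = 0.05  # nominal significance level
n_sim = 5000    # number of simulated datasets per scenario

# Standard error of an empirical rejection rate under the null hypothesis.
se = math.sqrt(nominal * (1 - nominal) / n_sim)
lower = nominal - 1.96 * se
upper = nominal + 1.96 * se
```

The resulting approximate 95% band runs from about 0.044 to 0.056. The rates in Table 1, which range from 0.045 to 0.058, lie within or just outside this band, in line with adequate type I error control.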
Figures 1 and 2 display the empirical powers of the tests under the two-part model with the LP and HP configurations for various levels of signal density. The horizontal axis in the figures is the signal size δ. For all the tests under comparison, the power increases with the signal size. The figures show that the proposed test yields the highest power among all five tests, regardless of the level of signal density. This advantage becomes more evident as the level of signal density increases, which indicates that aMiDAT can substantially boost the power by aggregating the signals with the same direction, especially when the signal is relatively dense. When the signal is sparse (signal density of 10%) and the dimension is relatively low (p = 50), MAX has power comparable to that of the proposed test. However, the difference between them tends to grow as the dimension or the level of signal density increases. This is because MAX, which only utilizes the strongest signal, is beneficial only when the number of signals is extremely small. As the level of signal density increases, the performance of MiRKAT improves and its gap to aMiDAT narrows. When the signal is dense, it becomes the second most powerful test among all the tests under comparison. The MiSPU test appears to underperform under various scenarios, probably because it combines the mean differences of relative abundance without taking account of their directions, so that the mean differences in different directions cancel each other out. Moreover, by comparing the results for the LP and HP configurations, we find that the gain in power of the proposed test tends to be more substantial for the data with LP of non-zeros. For p = 200 and p = 500, the trend is similar; more details are presented in Supplementary Figures S1 and S2.
Fig. 1.

Empirical powers of the tests under the two-part models with LP of non-zeros for p = 50 (first row) and p = 100 (second row). The left, middle and right panels are for the signal density levels of 10%, 30% and 50%, respectively
Fig. 2.

Empirical powers of the tests under the two-part models with HP of non-zeros for p = 50 (first row) and p = 100 (second row). The left, middle and right panels are for the signal density levels of 10%, 30% and 50%, respectively
Figure 3 presents the empirical powers of the tests as functions of the signal size η under the DM model for the three levels of signal density. It shows that the powers of the tests increase monotonically with the signal size. As the level of signal density gets larger, the powers of the tests increase. For example, when p = 100 and the signal size η is held fixed, the powers of the proposed test at signal density levels of 10%, 30% and 50% are 0.117, 0.367 and 0.802, respectively. As under the two-part model, the proposed test outperforms the other four tests, and this advantage becomes more evident as the level of signal density increases. Unlike aMiDAT, the other four tests underperform in most simulation settings. This may be because more zeros are generated under the DM model, which lowers the proportion of useful signals. The results for p = 200, 500 follow similar patterns and are shown in Supplementary Figure S3.
Fig. 3.

Empirical powers of the tests under the DM models for p = 50 (first row) and p = 100 (second row). The left, middle and right panels are for the signal density levels of 10%, 30% and 50%, respectively
4 Real data analyses
In this section, we apply PERMANOVA, MiRKAT, MAX, MiSPU and aMiDAT to two real microbiome datasets to detect the overall difference in microbial relative abundances between two health conditions. As in the simulation studies, we consider the PERMANOVA test with the Euclidean distance and the MiRKAT test with the Bray–Curtis kernel and the Euclidean distance-based kernel. The parameter γ in MiSPU and the parameter λ in aMiDAT are each chosen from a prespecified candidate set. The P-values of the tests are calculated based on 50 000 permutations.
4.1 Analysis of obesity microbiome data
Obesity, defined as a body mass index (BMI) higher than 30 kg/m², is a complex disease with a multifactorial etiology, including genetics, metabolism, excessive caloric intake, sedentary lifestyle, environment and behavior (The GBD 2015 Obesity Collaborators, 2017). Obesity is a significant risk factor for increased morbidity and mortality from many chronic diseases, such as diabetes mellitus, cardiovascular disease, kidney disease and an array of musculoskeletal disorders. The prevalence of obesity is growing rapidly worldwide and reaching epidemic proportions, although extensive research efforts have been undertaken to curb it. Over the past decade, the gut microbiome has emerged as an important contributor to the pathogenesis of obesity and its related metabolic disorders (Hartstra et al., 2015). Recently, Doumatey et al. (2020) investigated the role of gut microbiota in relation to metabolic disorders, such as obesity and type 2 diabetes, in West Africans. In this study, 291 participants were enrolled from an urban center in Nigeria, and demographic, anthropometric and medical history information was collected using standardized questionnaires. BMI was calculated as weight (kg) divided by the square of height (m²). For each participant, fecal samples were collected, from which DNA samples were extracted using the MoBio PowerMag Microbiome kit (Carlsbad, CA). The bacterial 16S rRNA gene was amplified using fusion primers for the V4 region and the resulting amplicons were sequenced using the Illumina MiSeq platform. With the UPARSE pipeline, sequences were clustered at the 97% similarity level into OTUs. After dropping the taxa that appear in <4 samples, a total of 1151 OTUs were obtained. These OTUs were further taxonomically classified into 147 genera.
We are interested in whether the microbial compositions differ between lean and obese individuals at the genus level. To this end, we divide 144 female participants into two groups according to their BMI (kg/m²): a lean group and an obese group. Univariate comparison of the two groups is first conducted to provide some insights into each individual genus. Table 2 presents the scaled sample mean differences of the top six genera in microbial composition between the lean and obese groups. These genera are Brevibacillus, Olsenella, Marvinbryantia, Megamonas, Anaerococcus and Bacillus, which belong to the most abundant phyla Firmicutes and Actinobacteriota reported in Doumatey et al. (2020). It can be seen from Table 2 that the top mean differences of microbial compositions lie in different directions. We then apply PERMANOVA, MiRKAT, MAX, MiSPU and the proposed test to conduct the overall comparison of the genera between the lean and obese groups. The P-values of PERMANOVA, MiRKAT, MAX and MiSPU are 0.689, 0.654, 0.073 and 0.811, respectively; in contrast, aMiDAT yields a P-value of 0.017. This indicates that, at the significance level of 0.05, the proposed test is able to detect a statistically significant difference between the lean and obese groups in at least one genus, while all the other tests fail to do so. This result is consistent with the conclusion of Doumatey et al. (2020) that obesity is associated with compositional changes in gut microbiota.
Table 2.
Scaled sample mean differences of the top six genera in microbial composition between two health groups for the obesity and CRC microbiome data
| Genus (obesity microbiome data) | Scaled mean difference | Genus (CRC microbiome data) | Scaled mean difference |
|---|---|---|---|
| Brevibacillus | −3.26 | Eisenbergiella | −3.40 |
| Olsenella | −2.35 | Lachnoclostridium | −3.34 |
| Marvinbryantia | −2.31 | Ruthenibacterium | −3.22 |
| Megamonas | 2.22 | Erysipelatoclostridium | −3.17 |
| Anaerococcus | −2.08 | Fusicatenibacter | 3.05 |
| Bacillus | 2.01 | Faecalibacterium | 2.84 |
4.2 Analysis of colorectal cancer microbiome data
Colorectal cancer (CRC), as the third most common cancer in the world, accounts for >1.9 million newly diagnosed cancer cases and >0.9 million cancer deaths in 2020 (Sung et al., 2021). The etiology of CRC is complex and consists of genetic, lifestyle and environmental factors. Despite massive efforts in whole-genome sequencing and genome-wide association studies for CRC, genetic factors are only responsible for a minority of CRC cases. Accumulating evidence indicates that environmental factors, such as smoking, weight gain, obesity and heavy alcohol consumption, play major roles in causing CRC (Coker et al., 2019). Among the environmental factors, microbial dysbiosis is a major risk for the initiation, progression and metastasis of CRC. Yu et al. (2017) examined the association of CRC with changes in the gut microbial composition using metagenomic sequencing of fecal microbiomes. This study enrolled 128 participants (75 patients with CRC and 53 control subjects) from China. For each participant, stool samples were collected and DNA extraction from stool samples was performed using Qiagen QIAamp DNA Stool Mini Kit (Qiagen). Metagenomic sequencing was performed using Illumina HiSeq 2000 platform and reads were mapped to the Integrated Microbial Genome (IMG) reference database (v400) to generate IMG species and genus profiles. The relative abundance data of OTUs are downloaded using the R package ‘curatedMetagenomicData’, which provides a Bioconductor and command-line interface to thousands of metagenomic profiles from the Human Microbiome Project and some other publicly available datasets (Pasolli et al., 2017). The data include a total of 575 OTUs, which can be further merged into 190 genera. The scaled sample mean differences of these genera are in different directions, as presented in Table 2.
We apply PERMANOVA, MiRKAT, MAX, MiSPU and aMiDAT to test for an overall difference in microbial relative abundance between CRC cases and controls. The P-values of PERMANOVA, MiRKAT, MAX, MiSPU and the proposed test are 0.232, 0.0245, 0.022, 0.303 and 0.00014, respectively. At the significance level of 0.05, MiRKAT, MAX and aMiDAT detect a significant overall difference in relative abundance between CRC cases and controls, with the strongest evidence provided by the proposed test. At the significance level of 0.001, however, only the proposed test detects the difference in relative abundance between the two groups, which suggests that the proposed test is more powerful than the other tests under comparison on this dataset. These results indicate that changes in the gut microbial composition are associated with CRC, as has been reported in many studies (Castellarin et al., 2012; Kostic et al., 2012; Yu et al., 2017).
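The comparison across significance levels can be checked directly from the reported P-values. The short sketch below only re-uses the five P-values quoted in the text; the helper name `significant_tests` is introduced here for illustration and is not part of the MiDAT package.

```python
# P-values reported in the text for the CRC dataset.
p_values = {
    "PERMANOVA": 0.232,
    "MiRKAT": 0.0245,
    "MAX": 0.022,
    "MiSPU": 0.303,
    "aMiDAT": 0.00014,
}


def significant_tests(p_values, alpha):
    """Return the names of the tests whose P-value is below alpha."""
    return [name for name, p in p_values.items() if p < alpha]


print(significant_tests(p_values, 0.05))   # ['MiRKAT', 'MAX', 'aMiDAT']
print(significant_tests(p_values, 0.001))  # ['aMiDAT']
```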
5 Discussion
In this article, we develop an adaptive test for detecting overall differences in mean relative abundance between two conditions across multiple OTUs. The proposed test aggregates signals of mean relative abundance differences effectively by incorporating their directions. The test accommodates the characteristics of microbiome data, including the compositional constraint and excessive zeros, and is highly powered under various alternative hypotheses in high-dimensional settings. Through extensive simulations, we show that the proposed test can substantially increase power over some existing tests under a wide range of scenarios, with the type I error rates well controlled. Applications of the proposed test to two real microbiome datasets further demonstrate its ability to detect the overall group difference in microbiome relative abundance in settings where some competing tests fail to do so.
OTUs with equal mean difference of relative abundance between two groups may have very different magnitudes of signals due to possibly varying variances and therefore contribute differently to the power of the test. Accordingly, the proposed test uses the standardized sample mean differences as the signals of individual OTUs. We conduct an additional simulation study to explore the performance of the tests based on standardized and non-standardized sample mean differences, with detailed simulation settings and results presented in the Supplementary Material. The simulation results show that both tests can improve power over several competing approaches and the test based on standardized sample mean differences is often more powerful than that based on non-standardized sample mean differences.
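To illustrate why standardization matters, the following minimal sketch computes raw and standardized mean differences for toy data in which two OTUs share the same raw mean difference but have very different variances. The standardization used here (an unpooled, Welch-type standard error) is an assumption made for illustration; the exact form used by the proposed test is defined in the Methods, and the function name `mean_differences` and the toy abundances are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def mean_differences(group1, group2):
    """Per-OTU raw and standardized mean differences between two groups.

    group1, group2: lists of samples; each sample is a list of relative
    abundances with one entry per OTU.
    """
    n1, n2 = len(group1), len(group2)
    raw, standardized = [], []
    # zip(*group) transposes samples-by-OTUs into OTUs-by-samples.
    for x, y in zip(zip(*group1), zip(*group2)):
        diff = mean(x) - mean(y)
        # Assumed standardization: unpooled (Welch-type) standard error.
        se = sqrt(variance(x) / n1 + variance(y) / n2)
        raw.append(diff)
        standardized.append(diff / se if se > 0 else 0.0)
    return raw, standardized


# Toy compositions (each row sums to 1); values are illustrative only.
cases = [[0.30, 0.30, 0.40], [0.32, 0.10, 0.58], [0.28, 0.50, 0.22]]
controls = [[0.20, 0.20, 0.60], [0.22, 0.00, 0.78], [0.18, 0.40, 0.42]]

raw, standardized = mean_differences(cases, controls)
print(raw)           # the first two OTUs have the same raw difference
print(standardized)  # but the low-variance OTU carries a much stronger signal
```

In this toy example the first OTU has a standardized signal roughly ten times that of the second, although their raw mean differences coincide.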
Although the proposed method is developed for comparing relative abundances between two groups, it can be easily extended to comparisons among multiple groups. For example, one can first perform comparisons among all pairs of groups using the proposed test and then combine the pairwise comparisons using a multiple-testing correction procedure, such as the Bonferroni or Benjamini–Hochberg procedure.
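The multiple-group extension can be sketched as follows. The pairwise P-values are hypothetical placeholders for the output of the proposed test on each pair of groups, and the two adjustment functions are generic textbook implementations written for this illustration, not functions from the MiDAT package.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P-values."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P-values from the two-group test applied to each pair
# of three groups A, B and C.
pairwise = {("A", "B"): 0.004, ("A", "C"): 0.03, ("B", "C"): 0.20}
pairs = list(pairwise)
raw = [pairwise[pair] for pair in pairs]

for pair, p_bonf, p_bh in zip(pairs, bonferroni(raw), benjamini_hochberg(raw)):
    print(pair, round(p_bonf, 4), round(p_bh, 4))
```

An overall difference among the groups can then be declared when the smallest Bonferroni-adjusted P-value falls below the chosen significance level.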
To prevent positive and negative mean differences of OTUs from offsetting each other, our test divides the OTUs into two clusters by their directions. Besides direction, signal strength can also be used to divide the OTUs. Hu et al. (2016) showed that combining P-values with similar significance levels can be more powerful than combining P-values with varying significance levels. Rather than two clusters, one can divide the signals of OTUs into multiple clusters by their directions and strengths, and then integrate them at cluster level. For example, given a threshold $\xi > 0$ and writing $T_j$ for the standardized sample mean difference of the $j$th OTU, the signals of mean differences can be divided into four clusters, i.e. $G_1 = \{j: T_j \le -\xi\}$, $G_2 = \{j: -\xi < T_j \le 0\}$, $G_3 = \{j: 0 < T_j \le \xi\}$ and $G_4 = \{j: T_j > \xi\}$. The choice of ξ depends on the pattern of true mean difference and we can simply set it to be 1.645, 1.960 and 3.090, which are the 95%, 97.5% and 99.9% quantiles of the standard normal distribution, respectively. Furthermore, given that the OTUs with weak signals of mean difference may contribute little to the power, one can consider including only the OTUs with strong signals (say $G_1$ and $G_4$) in the test. When there are many weak signals of mean difference in OTUs, such a test is expected to achieve high power by substantially reducing the number of taxa. Further research is warranted to investigate how the grouping mechanism of mean differences of OTUs affects the performance of the proposed test.
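The four-cluster split by direction and strength can be sketched as below. The assignment of the labels G1 to G4 to the four intervals (strong negative, weak negative, weak positive, strong positive) and the handling of boundary values are assumptions made for this illustration; only the fact that G1 and G4 hold the strong signals is taken from the text. The signal values are hypothetical.

```python
from statistics import NormalDist


def split_by_direction_and_strength(signals, xi):
    """Assign OTU indices to four clusters by sign and by |signal| versus xi."""
    clusters = {"G1": [], "G2": [], "G3": [], "G4": []}
    for j, t in enumerate(signals):
        if t <= -xi:
            clusters["G1"].append(j)  # strong negative signals
        elif t <= 0:
            clusters["G2"].append(j)  # weak negative signals (zero included)
        elif t <= xi:
            clusters["G3"].append(j)  # weak positive signals
        else:
            clusters["G4"].append(j)  # strong positive signals
    return clusters


# Candidate thresholds: 95%, 97.5% and 99.9% standard normal quantiles.
thresholds = [NormalDist().inv_cdf(q) for q in (0.95, 0.975, 0.999)]
print([round(xi, 3) for xi in thresholds])  # [1.645, 1.96, 3.09]

# Hypothetical standardized mean differences for six OTUs.
signals = [-2.5, -0.4, 0.0, 1.2, 2.1, 3.5]
print(split_by_direction_and_strength(signals, thresholds[1]))
# {'G1': [0], 'G2': [1, 2], 'G3': [3], 'G4': [4, 5]}
```

Restricting the test to the strong-signal clusters then amounts to keeping only the indices in `G1` and `G4`.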
An important issue that has not been considered here is the effect of genome sequence similarity, which plays a crucial role in metagenomics studies. Due to the presence of highly similar genome sequences, taxa quantification may be subject to substantial biases, especially at relatively low taxonomic levels, such as the strain level. Ignoring such biases may lead to many false-positive detections in subsequent differential abundance analyses. This issue has been addressed by several researchers in different contexts. For example, Fischer et al. (2017) proposed a generalized model framework that resolves shared read counts, which cause significant biases at the strain level due to the presence of highly similar genome sequences, and integrates the variance of abundance estimates to enable accurate detection of differentially abundant taxa in metagenomics. Penzlin et al. (2014) developed a similarity correction tool for identification and quantification results in metaproteomics. Xia et al. (2011) introduced a unified probabilistic framework, based on mixture model theory, which incorporates reference sequence similarities to compute genome relative abundances of microbial communities. Further research is needed on the effect of genome sequence similarity on MiDAT and on how to correct for such an effect in differential testing.
Although the proposed test is powerful in detecting statistically significant changes of OTUs, it is unable to pinpoint the biologically relevant changes. From a biological perspective, statistical significance is neither a sufficient nor a necessary condition for biological relevance (Nakagawa and Cuthill, 2007; Parks and Beiko, 2010), but it can be used as a filter to remove unimportant OTUs. For some microbes in the gut, an increase (or decrease) of more than 10% in relative abundance may be needed to indicate a biologically relevant change, while for some other microbes, an increase of 1% may be sufficient. Sometimes the presence or absence of a species, such as Helicobacter pylori, is more important than its quantitative changes. In this case, one can still use the proposed test to identify statistically significant species and pursue further evidence for their biological relevance. A possible solution is transforming raw P-values from a statistical hypothesis test to alternative measures with possibly superior interpretations (Storey and Tibshirani, 2003) or interactively filtering results through publications with sufficient information to infer biological relevance (Parks and Beiko, 2010). Biological interpretation of statistical results is an important question that requires further research.
Beyond the overall difference between two condition groups, there is considerable interest in detecting the specific OTUs that are differentially abundant. The individual sample mean differences between the two groups, which the proposed test computes for each OTU, can be used to rank the OTUs and thus provide some insight into which OTUs are more likely to be differentially abundant. An important feature of microbiome data is that OTUs are usually organized in a phylogenetic tree based on their evolutionary relationships. Incorporating the phylogenetic tree information among OTUs into microbiome differential analyses can potentially improve testing power. Further investigation is warranted along this line.
It is common to include potential confounders in real microbiome studies. While it is relatively easy to incorporate confounders in a parametric model, it is generally very challenging for a non-parametric test. We suggest the following strategy to extend the proposed test to adjust for confounders. When the confounders are categorical variables, one can employ a stratification strategy to adjust for confounding effects. Specifically, one first stratifies the subjects by the levels of the confounders and, within each stratum, compares the relative abundances of the subjects in the two condition groups. An overall P-value can then be obtained by combining the P-values from different strata, using methods such as the Fisher combination approach. When the confounders are continuous, a regression model seems to offer the best solution. A plausible approach is to regress the relative abundances of the subjects on the potential confounders separately for the two groups, and then apply the proposed test to the resulting residuals. For certain continuous variables, e.g. patients’ disease severity index, converting them into categorical variables may be more meaningful. In this case, one can use the method for categorical variables (i.e. stratification) to adjust for confounding effects. Further work is needed to examine the performance of these (or other) strategies.
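Both adjustment strategies can be sketched in a few lines. The first function combines stratum-level P-values with Fisher's method, using the closed-form chi-square tail for even degrees of freedom and assuming the strata are independent (they contain different subjects). The second removes the linear effect of a single continuous confounder by simple least squares. The function names, the stratum P-values and the toy data are hypothetical; this is an illustration of the strategy, not the authors' implementation.

```python
from math import exp, factorial, log


def fisher_combine(p_values):
    """Fisher's combined P-value for k independent P-values.

    The statistic X = -2 * sum(log p) follows a chi-square distribution
    with 2k degrees of freedom under the null; for even degrees of freedom
    the upper tail is exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    if any(p <= 0 or p > 1 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # equals X / 2
    return exp(-half_stat) * sum(half_stat**i / factorial(i) for i in range(k))


def residualize(abundance, confounder):
    """Residuals from a least-squares fit of abundance on one confounder."""
    n = len(abundance)
    x_bar = sum(confounder) / n
    y_bar = sum(abundance) / n
    sxx = sum((x - x_bar) ** 2 for x in confounder)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(confounder, abundance))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(confounder, abundance)]


# Categorical confounder: hypothetical P-values from three strata.
stratum_p = [0.04, 0.10, 0.30]
print(fisher_combine(stratum_p))

# Continuous confounder: residualize one OTU within one group (toy data).
age = [35, 42, 50, 61, 70]
abundance = [0.10, 0.12, 0.15, 0.18, 0.22]
print(residualize(abundance, age))
```

Note that permutation-based P-values can be exactly zero when computed naively, which Fisher's statistic cannot handle; adding one to both the numerator and denominator of the permutation P-value is a common remedy.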
Supplementary Material
Acknowledgements
The authors are grateful to the Editor, Associate Editor and three referees for their helpful constructive comments, which have helped to improve the quality and presentation of the article. The authors would like to thank Yolanda L. Jones, NIH Library, for manuscript editing assistance. Research of Aiyi Liu was supported by the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health.
Funding
Research of W.Z. was partially supported by the National Natural Science Foundation of China Grants [grant numbers 12001522, 72091212].
Conflict of Interest: none declared.
Contributor Information
Wei Zhang, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China.
Aiyi Liu, Biostatistics and Bioinformatics Branch, Division of Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20817, USA.
Zhiwei Zhang, Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA.
Guanjie Chen, Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.
Qizhai Li, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China.
References
- Aitchison J. (2003) The Statistical Analysis of Compositional Data. Blackburn Press, Caldwell.
- Anderson M.J. (2001) A new method for non-parametric multivariate analysis of variance. Austral Ecol., 26, 32–46.
- Bai Z.D., Saranadasa H. (1996) Effect of high dimension: by an example of a two sample problem. Stat. Sin., 6, 311–329.
- Banerjee K. et al. (2019) An adaptive multivariate two-sample test with application to microbiome differential abundance analysis. Front. Genet., 10, 350.
- Berk R.H., Jones D.H. (1979) Goodness-of-fit test statistics that dominate the Kolmogorov statistics. Probab. Theory Relat. Fields, 47, 47–59.
- Cai T.T. et al. (2014) Two-sample test of high-dimensional means under dependence. J. R. Stat. Soc. B, 76, 349–372.
- Cao Y. et al. (2018) Two-sample tests of high-dimensional means for compositional data. Biometrika, 105, 115–132.
- Castellarin M. et al. (2012) Fusobacterium nucleatum infection is prevalent in human colorectal carcinoma. Genome Res., 22, 299–306.
- Charlson E.S. et al. (2010) Disordered microbial communities in the upper respiratory tract of cigarette smokers. PLoS One, 5, e15216.
- Coker O.O. et al. (2019) Enteric fungal microbiota dysbiosis and ecological alterations in colorectal cancer. Gut, 68, 654–662.
- David L.A. et al. (2014) Diet rapidly and reproducibly alters the human gut microbiome. Nature, 505, 559–563.
- Doumatey A.P. et al. (2020) Gut microbiome profiles are associated with type 2 diabetes in urban Africans. Front. Cell. Infect. Microbiol., 10, 63.
- Fischer M. et al. (2017) Abundance estimation and differential testing on strain level in metagenomics data. Bioinformatics, 33, i124–i132.
- Fisher R.A. (1932) Statistical Methods for Research Workers. 4th edn. Oliver and Boyd, London.
- Ge Y. et al. (2003) Resampling-based multiple testing for microarray data analysis. Test, 12, 1–44.
- Gill S.R. et al. (2006) Metagenomic analysis of the human distal gut microbiome. Science, 312, 1355–1359.
- Hartstra A.V. et al. (2015) Insights into the role of the microbiome in obesity and type 2 diabetes. Diabetes Care, 38, 159–165.
- Hasan N., Yang H. (2019) Factors affecting the composition of the gut microbiota, and its modulation. PeerJ, 7, e7502.
- Hu X. et al. (2016) Group-combined p-values with applications to genetic association studies. Bioinformatics, 32, 2737–2743.
- Joossens M. et al. (2011) Dysbiosis of the faecal microbiota in patients with Crohn’s disease and their unaffected relatives. Gut, 60, 631–637.
- Kostic A.D. et al. (2012) Genomic analysis identifies association of Fusobacterium with colorectal carcinoma. Genome Res., 22, 292–298.
- Law C.W. et al. (2014) Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol., 15, R29.
- Liu A. et al. (2010) A rank-based test for comparison of multidimensional outcomes. J. Am. Stat. Assoc., 105, 578–587.
- Liu Y., Xie J. (2020) Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. J. Am. Stat. Assoc., 115, 393–402.
- Love M.I. et al. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol., 15, 550.
- Madsen B.E., Browning S.R. (2009) A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet., 5, e1000384.
- Mandal S. et al. (2015) Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb. Ecol. Health Dis., 26, 27663.
- Matson V. et al. (2018) The commensal microbiome is associated with anti–PD-1 efficacy in metastatic melanoma patients. Science, 359, 104–108.
- Nakagawa S., Cuthill I.C. (2007) Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol. Rev. Camb. Philos. Soc., 82, 591–605.
- Parks D.H., Beiko R.G. (2010) Identifying biologically relevant differences between metagenomic communities. Bioinformatics, 26, 715–721.
- Pasolli E. et al. (2017) Accessible, curated metagenomic data through ExperimentHub. Nat. Methods, 14, 1023–1024.
- Paulson J.N. et al. (2013) Differential abundance analysis for microbial marker-gene surveys. Nat. Methods, 10, 1200–1202.
- Penzlin A. et al. (2014) Pipasic: similarity and expression correction for strain-level identification and quantification in metaproteomics. Bioinformatics, 30, i149–i156.
- Price A.L. et al. (2010) Pooled association tests for rare variants in exon-resequencing studies. Am. J. Hum. Genet., 86, 832–838.
- Qin J. et al. (2012) A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature, 490, 55–60.
- Robinson M.D. et al. (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26, 139–140.
- Storey J.D., Tibshirani R. (2003) Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA, 100, 9440–9445.
- Sung H. et al. (2021) Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin., 71, 209–249.
- The GBD 2015 Obesity Collaborators. (2017) Health effects of overweight and obesity in 195 countries over 25 years. N. Engl. J. Med., 377, 13–27.
- Virgin H.W., Todd J.A. (2011) Metagenomics and personalized medicine. Cell, 147, 44–56.
- Wang C. et al. (2020) Estimating and testing the microbial causal mediation effect with high-dimensional and compositional microbiome data. Bioinformatics, 36, 347–355.
- White J.R. et al. (2009) Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput. Biol., 5, e1000352.
- Wilck N. et al. (2017) Salt-responsive gut commensal modulates TH17 axis and disease. Nature, 551, 585–589.
- Wu C. et al. (2016) An adaptive association test for microbiome data. Genome Med., 8, 56.
- Xia L.C. et al. (2011) Accurate genome relative abundance estimation based on shotgun metagenomic reads. PLoS One, 6, e27992.
- Xiao J. et al. (2017) False discovery rate control incorporating phylogenetic tree increases detection power in microbiome-wide multiple testing. Bioinformatics, 33, 2873–2881.
- Yatsunenko T. et al. (2012) Human gut microbiome viewed across age and geography. Nature, 486, 222–227.
- Yu J. et al. (2017) Metagenomic analysis of faecal microbiome as a tool towards targeted non-invasive biomarkers for colorectal cancer. Gut, 66, 70–78.
- Zhao N. et al. (2015) Testing in microbiome-profiling studies with MiRKAT, the microbiome regression-based kernel association test. Am. J. Hum. Genet., 96, 797–807.