mSystems. 2023 Jul 25;8(4):e00040-23. doi: 10.1128/msystems.00040-23

A mixed-model approach for estimating drivers of microbiota community composition and differential taxonomic abundance

Amy R Sweeny 1,2,✉,#, Hannah Lemon 1,#, Anan Ibrahim 3, Kathryn A Watt 1, Kenneth Wilson 4, Dylan Z Childs 2, Daniel H Nussey 1, Andrew Free 3, Luke McNally 1
Editor: Sarah M Hird 5
PMCID: PMC10469806  PMID: 37489890

ABSTRACT

Next-generation sequencing (NGS) and metabarcoding approaches are increasingly applied to wild animal populations, but there is a disconnect between the generalized linear mixed model (GLMM) approaches widely used to study phenotypic variation and the statistical toolkit from community ecology typically applied to metabarcoding data. Here, we describe the suitability of a novel GLMM-based approach for analyzing the taxon-specific sequence read counts derived from standard metabarcoding data. This approach allows decomposition of the contribution of different drivers to variation in community composition (e.g., age, season, individual) via interaction terms in the model random-effects structure. We provide guidance on implementing this approach and show how these models can identify the extent to which specific taxonomic groups are responsible for the effects attributed to different drivers. We applied this approach to two cross-sectional data sets from the Soay sheep population of St. Kilda. GLMMs agreed with dissimilarity-based approaches in highlighting the substantial contribution of age and minimal contribution of season to microbiota community composition, and simultaneously estimated the contribution of other technical and biological factors. We further used model predictions to show that age effects were principally due to increases in taxa of the phylum Bacteroidetes and declines in taxa of the phylum Firmicutes. This approach offers a powerful means for understanding the influence of drivers of community structure derived from metabarcoding data. We discuss how our approach could be readily adapted to allow researchers to estimate contributions of additional factors, such as host or microbe phylogeny, to answer emerging questions surrounding the ecological and evolutionary roles of within-host communities.

IMPORTANCE

NGS and fecal metabarcoding methods have provided powerful opportunities to study the wild gut microbiome. A wealth of data is, therefore, amassing across wild systems, generating the need for analytical approaches that can appropriately investigate the simultaneous host- and environmental-scale factors that determine the composition of these communities. Here, we describe a generalized linear mixed-effects model (GLMM) approach to analyze read count data from metabarcoding of the gut microbiota, allowing us to quantify the contributions of multiple host and environmental factors to within-host community structure. Our approach provides outputs that are familiar to most field ecologists and can be run using any standard mixed-effects modeling package. We illustrate this approach with worked examples investigating age and season effects in two metabarcoding data sets from the Soay sheep population of St. Kilda.

KEYWORDS: microbiota, metabarcoding, 16S, amplicon sequence variants, generalized linear mixed-effects model, community composition, differential abundance, Bayesian estimation

INTRODUCTION

The ecological dynamics of within-host communities of parasites and commensal microbes can have dramatic effects on host health and fitness (1, 2). One increasingly well-studied example of such a within-host community is the so-called gut microbiota: the often complex and diverse community of commensal bacteria resident in the gastrointestinal tracts of their animal hosts. As well as playing a crucial role in the digestion of food, the gut microbiota affects host behavior and metabolism, as well as endocrine and immune homeostasis, as highlighted by studies from humans and model laboratory animals (3 - 6). A growing number of studies within ecology and evolutionary biology investigate the dynamics of the gut microbiota of natural systems using a combination of fecal sampling and next-generation sequencing (NGS) metabarcoding approaches. Understanding the role of within-host communities in underpinning host phenotypic variation, as well as wider ecological and evolutionary dynamics in the wild, will require statistical approaches that allow us to robustly quantify the contribution of different environmental and host-related factors to such metabarcoding data. Generalized linear mixed models (GLMMs) are a well-established and widely used suite of statistical models within ecology and evolution that provide a flexible means of dealing with complex data structures and relationships between predictors of interest (7). Although they have yet to be widely applied in this context, GLMMs have huge potential to help dissect and understand the drivers of within-host community dynamics revealed by metabarcoding data.

Standard methodologies for investigating hypotheses concerning gut microbiota dynamics in the wild typically include the collection of fecal samples from selected study subjects and the application of NGS techniques for metabarcoding of informative bacterial genes for taxonomic assignment of sequenced reads (8). Microbiota community analysis commonly relies on transforming operational taxonomic unit or amplicon sequence variant (ASV) counts into relative proportions per sample, or on rarefaction, in which a set library size is randomly subsampled from all samples (9 - 11). Hypothesis testing using transformed counts from 16S taxonomic assignments typically focuses on community-level differences in taxonomic diversity and composition between experimental groups or time points of interest. Statistical approaches to this end include estimation of alpha diversity (the number of distinguishable taxa within a sample), distance measures (e.g., Bray–Curtis dissimilarity) (12), and ordination with dimensionality reduction (e.g., principal coordinates analysis [PCoA]). Data transformations and hypothesis tests in these approaches have several limitations. Standardizations based on proportions ignore the heteroscedasticity arising from different library sizes across samples, while rarefaction discards data, limiting the reads considered for each sample to the minimum number of reads across all samples (9). This in turn can significantly elevate false-positive rates or reduce the performance of microbiome clustering approaches. In addition to these statistical pitfalls, traditional approaches for assessing community-level differences differ philosophically from GLMM-based approaches that partition complex sources of variance.
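To make the two standardizations concrete, the following is a minimal Python/NumPy sketch on a hypothetical count matrix. The numbers and the helper names `rarefy` and `proportions` are illustrative assumptions, not taken from this study. Rarefying to the smallest library discards most reads from deeper samples. Proportions keep every read, but they give each sample equal weight regardless of how many reads support it.

```python
import numpy as np

def rarefy(counts, depth, rng):
    """Subsample each sample (row) without replacement down to `depth` reads.

    counts: 2-D integer array, samples x ASVs (hypothetical layout).
    """
    counts = np.asarray(counts, dtype=np.int64)
    out = np.empty_like(counts)
    for i, row in enumerate(counts):
        # A multivariate hypergeometric draw = picking `depth` reads without replacement
        out[i] = rng.multivariate_hypergeometric(row, int(depth))
    return out

def proportions(counts):
    """Relative abundance of each ASV within each sample (row)."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
# Hypothetical counts: 3 samples x 4 ASVs with unequal library sizes (1000, 100, 2000)
counts = np.array([[500, 300, 150, 50],
                   [ 60,  25,  10,  5],
                   [900, 700, 300, 100]])
depth = counts.sum(axis=1).min()    # rarefaction depth = smallest library
rare = rarefy(counts, depth, rng)
print(rare.sum(axis=1))             # every sample now has `depth` reads
print(counts.sum() - rare.sum())    # number of reads discarded by rarefaction
print(proportions(counts)[1])       # sample 2's proportions carry no trace of its small library
```

In this toy example, rarefaction throws away 2,800 of 3,100 reads.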
Although traditional approaches have provided substantial insights into microbiota community composition, they fall short of the flexibility and power offered by GLMM-based approaches to dissect the manifold and complex contributors to variation in measured phenotypes in natural populations. There has, therefore, been a movement among community ecologists toward such GLMM-based methods (13). Here, we develop a GLMM-based approach to decompose the sources of variation in count data derived from metabarcoding approaches and discuss the advantages of this approach for the analysis of microbiota and other community data.

The application of mixed-effects models to microbiota data sets is not new. The “Hierarchical Modeling of Species Communities” approach uses latent variable modeling and random effects to model community compositions and has been previously used to examine urbanization effects on fungal environmental microbiota (14). Similar approaches developing Joint Species Distribution Modeling for microbiota data sets have also shown insights into microbiota composition in the wild (15). Our suggested approach differs from these in several ways. First, these approaches focus on modeling correlations among microbial taxa using latent variables to model residual correlation; this adds substantial complexity to the modeling process. Our approach does not attempt to model these correlations, and it focuses on variance decomposition of the sort familiar to ecologists and evolutionary biologists working on wild systems. If correlations among microbial taxa are of primary interest, we would direct readers to these approaches. Second, our approach does not require the use of any particular modeling package or a high degree of proficiency in coding. There are two central ideas in our approach—using sample-level random effects in over-dispersed Poisson models to account for variability in library size and using random effects of microbial taxonomy to allow for effects of host and environment on microbiota composition—that can be implemented in almost any random-effects modeling software or packages with which the reader is familiar. Thus, at the expense of modeling residual correlation among species, our approach offers a familiar method to decompose sources of variance in the microbiota for field scientists. Below we outline the motivation for this approach and illustrate this via an application to two 16S metabarcoding data sets from a wild mammal.

MATERIALS AND METHODS

A GLMM approach

As gut microbes have such important effects on host physiology, behavior, and health, much research has sought to identify individual microbial taxa that are responsible for alterations of host phenotype and state. This has in many ways mirrored the goals of many genome-wide association study (GWAS) analyses, which have sought to identify particular genetic variants associated with phenotypes of interest, often with a goal of developing diagnostics or drug targets (16). However, just as GWAS analyses have shown that most phenotypes are highly polygenic, being determined by a complex combination of genetic variants of small effect (17 - 19), the study of host-associated microbiomes has often failed to find single taxa associated with host states (20, 21). Instead, many changes in host state are associated with general shifts in microbiome composition, often termed dysbiosis when accompanied by negative health consequences (20, 22). Phenomena such as dysbiosis shift the level at which we look for associations with host phenotype from a small number of microbial taxa to the whole microbiota. In addition, the most pressing questions about host-associated microbiota in ecology and evolution are very general and focused on the entire microbial community (2). For example, what are the relative roles of host physiology and environment in shaping the microbiota? How heritable is the microbiota? How much does microbiota composition affect fitness? This shift in focus from individual taxa to the complete community poses an important conceptual and statistical challenge.

As previously discussed, host-associated microbiota often comprise hundreds or thousands of different taxa. Whenever we need to estimate a large ensemble of related parameters, a common statistical approach is to treat them as random variables drawn from some distribution (23). To understand how this approach applies to the microbiota, consider the concrete question of how the composition of the gut microbiota might change with season in a wild mammal. In traditional approaches to analyzing microbiota data sets, it would be common to visualize an ordination of the data, distinguishing points by season. One would then perform a permutational analysis of variance (ANOVA) on a dissimilarity matrix to test whether microbiota from different seasons are more dissimilar than those from the same season. One would then go on to test for differential abundance of individual taxa across seasons to identify taxa with a major role in these changes (24, 25). In this approach, the estimates of how individual taxa differ by season are all independent of each other. A random-effects model approaches this in a fundamentally different manner: the effect of season across microbial taxa is treated as drawn from a random distribution. This has the advantage that all taxa inform the estimates of the mean and variance of the distribution from which the effects across taxa come. The estimates of parameters for individual taxa are then “shrunk” toward this distribution. This “shrinkage” is known to improve the accuracy of parameter estimation as long as there are large numbers of groups for the random effects. That condition generally holds for host-associated microbiota, which contain large numbers of taxa.
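The benefit of shrinkage can be illustrated with a small simulation. This Python sketch draws hypothetical "true" season effects for many taxa and adds sampling noise to get independent per-taxon estimates. It then estimates the mean and variance of the effect distribution from all taxa and pulls each estimate toward that mean. It uses a simple empirical-Bayes normal–normal approximation with a known sampling error. That is a simplification: a GLMM estimates the distribution and the per-taxon effects jointly. All parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_taxa, tau, se = 2000, 0.5, 1.0            # hypothetical: many taxa, noisy per-taxon estimates
mu = 0.2                                     # hypothetical community-wide mean season effect
true_eff = rng.normal(mu, tau, n_taxa)       # true season effect of each taxon
raw = true_eff + rng.normal(0, se, n_taxa)   # independent per-taxon estimates (no pooling)

# Partial pooling: all taxa inform the mean and variance of the effect distribution
mu_hat = raw.mean()
tau2_hat = max(raw.var(ddof=1) - se**2, 0.0)  # method-of-moments estimate of tau^2
w = tau2_hat / (tau2_hat + se**2)             # weight kept on each taxon's own estimate
shrunk = mu_hat + w * (raw - mu_hat)          # estimates pulled toward the common mean

mse_raw = np.mean((raw - true_eff) ** 2)
mse_shrunk = np.mean((shrunk - true_eff) ** 2)
print(f"shrinkage weight {w:.2f}; MSE raw {mse_raw:.3f}, shrunk {mse_shrunk:.3f}")
```

With thousands of taxa the community-level mean and variance are estimated precisely, so the shrunk estimates have a much lower mean squared error than the independent ones.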

While fitting such random-effects models is known to improve parameter estimation owing to shrinkage, its biggest advantage is in allowing us to shift the questions we ask to the whole microbiota level, and partition complex and inter-related sources of variance. GLMM approaches have been used across other ecological and evolutionary contexts to estimate repeatability, relative levels of spatiotemporal variance (26), social and common environment effects (27, 28), as well as heritability and the role of host genetics (29). Answering such questions has proved hugely challenging in the microbiota field as most analyses rely on tools which are not multi-level, from which it is extremely difficult to decompose the relative contribution of simultaneous processes at the host and environmental scale. However, multi-level models have been shown to offer significant advantages over many other compositional methods in community ecology for species abundance data (30). Here, we develop and illustrate a method to appropriately structure random effects across microbial taxa within a community using a GLMM, and thus partition the sources of variance driving microbiota composition.

To see how such a model can be structured, let us again return to the example of estimating the effect of season on the gut microbiota of a wild mammal (Fig. 1). Consider a scenario with two samples taken per host from a sample of hosts in a population, one in winter and one in summer, and with samples appropriately sequenced and reads bioinformatically assigned to ASVs. This will yield data in the form of a count of reads belonging to each ASV (the focal taxonomic group) within each sample (Fig. 1A), with two samples per individual host, one from each season. We can directly analyze such count data by fitting an over-dispersed Poisson family GLMM with a log-link. The predicted values on the link scale are given according to the following model (Fig. 1B).

Fig 1.


Overview of mixed-model approach to wild microbiota analysis. Data processing (A) generates amplicon sequence variant (ASV)–level abundances for each sample. These raw abundances are used as the response for generalized linear mixed-effects models with Poisson error families. In the example illustrated, data include sampling time points for a group of individuals taken during two seasons. Model syntax therefore specifies a fixed effect of age, and random effects for taxonomy (asv), sample id (host:season, h:s), individual differential abundance of ASVs (asv:h), differential abundance of ASVs across seasons (asv:s), and a residual variance at the row level (asv:h:s). GLMM output can be used to partition the variance explained by each random-effect term (B). These variance components can be interpreted as the relative contributions of both technical variation and host or environmental contributions to differential abundance as illustrated in (C). Created with BioRender.com.

$$\log(y_{h,asv,s}) = \beta_0 + \beta_1 s + u_h + u_{asv} + u_{h:s} + u_{asv:h} + u_{asv:s} + u_{asv:h:s}$$

where $y_{h,asv,s}$ is the read count, $\beta_0$ is a global intercept, and the remaining terms account for technical variation in read counts (abundance) as well as biological variation in taxonomic composition. The fixed and random terms dealing with technical variation are as follows:

- $\beta_1$ is the effect of season ($s$) on total read count.
- $u_h$ is a random effect describing variation in total read count among individual hosts ($h$, where there are multiple samples per host).
- $u_{asv}$ is a random effect describing variation in the total read count of each ASV across samples and hosts.
- $u_{h:s}$ is a random effect accounting for library size by describing variation in mean read count in each sample (i.e., host by season).
- $u_{asv:h:s}$ is an additional random effect accounting for row-level variation (over-dispersion).

In this example, the biological effects of interest are $u_{asv:h}$, a random effect describing the abundance (read count) of an ASV in host $h$, and $u_{asv:s}$, a random effect describing how ASV abundances (read counts) differ between seasons. By apportioning the variance attributed to these different random effects, we can assess the relative contributions of these different factors to microbiota composition (Fig. 1B and C). For example, a high variance associated with $u_{h:s}$ would indicate a high degree of technical variation due to library size variation across samples. A high variance associated with $u_{asv}$ could indicate high variation in read counts across ASVs due to over-dispersion of total abundance between common and rare taxa (Fig. 1C). With regard to biological inference, the variance associated with $u_{asv:h}$ can be interpreted as indicative of the individual “repeatability” of ASV community composition. The variance associated with $u_{asv:s}$ can be interpreted as reflecting compositional shifts across seasons.
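To make the structure of this model concrete, the following minimal NumPy sketch simulates read counts from it. Each random effect is drawn from a zero-mean normal distribution and the terms are summed on the log scale. Counts are then drawn from a Poisson distribution, with the observation-level term $u_{asv:h:s}$ supplying the over-dispersion. The standard deviations, host and ASV numbers, and the function name `simulate_counts` are hypothetical. This illustrates the data-generating model only; it is not the authors' code, whose analyses use R. Data simulated this way can help check whether a fitted GLMM recovers known variance components.

```python
import numpy as np

def simulate_counts(n_hosts, n_asv, sd, beta0=0.0, beta1=0.3, seed=0):
    """Simulate y ~ Poisson(exp(eta)) with
    eta = b0 + b1*s + u_h + u_asv + u_h:s + u_asv:h + u_asv:s + u_asv:h:s.

    sd: dict of (hypothetical) standard deviations, one per random-effect term.
    Returns long-format rows (host, season, asv, count).
    """
    rng = np.random.default_rng(seed)
    n_season = 2
    u_h = rng.normal(0, sd["h"], n_hosts)
    u_asv = rng.normal(0, sd["asv"], n_asv)
    u_hs = rng.normal(0, sd["h:s"], (n_hosts, n_season))     # library size of each sample
    u_asv_h = rng.normal(0, sd["asv:h"], (n_asv, n_hosts))   # host-specific composition
    u_asv_s = rng.normal(0, sd["asv:s"], (n_asv, n_season))  # season-specific composition
    rows = []
    for h in range(n_hosts):
        for s in range(n_season):
            for a in range(n_asv):
                eta = (beta0 + beta1 * s + u_h[h] + u_asv[a] + u_hs[h, s]
                       + u_asv_h[a, h] + u_asv_s[a, s]
                       + rng.normal(0, sd["asv:h:s"]))        # row-level over-dispersion
                rows.append((h, s, a, rng.poisson(np.exp(eta))))
    return rows

sd = {"h": 0.2, "asv": 1.5, "h:s": 0.3, "asv:h": 0.5, "asv:s": 0.2, "asv:h:s": 1.0}
data = simulate_counts(n_hosts=10, n_asv=50, sd=sd)
print(len(data))   # 1000 rows: 10 hosts x 2 seasons x 50 ASVs
```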

Continuing with the above example, we can further use random-effect estimates from model outputs to explore which specific taxa drive differential abundance between groups of interest (e.g., season). This question is commonly of great interest in microbiome studies, but many existing methods for answering it may be affected by library size and normalization methods (10, 31). In this example (and with a Bayesian implementation), differential abundance can be estimated using each ASV-by-season level of the random effect $u_{asv:s}$, comparing the posterior distributions for each ASV across factor levels. This differs fundamentally from established significance tests of specific taxon abundances between groups of interest (e.g., references [25, 32, 33]). Here, information from all taxa informs the variance of the random effect, so each ASV's estimate is shrunk toward the community-wide distribution, giving a holistic inference. For example, the mean of the posterior distribution for ASV1:summer minus that for ASV1:spring can be interpreted as the differential abundance of ASV1 between spring and summer. This allows identification of the ASVs that deviate most from the mean for further hypothesis generation and investigation.

While the Poisson model accounts for variation in library size across samples, there has also been a shift in microbiota research toward explicitly compositional data analysis, which removes any effects of library size (other than in quantifying uncertainty) prior to analysis. The centered log-ratio (CLR) transformation described by Aitchison (34) is one such transformation. It may be useful for difficult data distributions or when complex random-effect structures are necessary. Our supplementary files give full details of how to implement the above GLMM approach using CLR-transformed data, as an example of the approach's flexibility across error families and data transformations. They also apply this alternative parameterization to the example data described below.
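The CLR transformation itself is short. For each sample it takes the log of every count and subtracts the mean log count of that sample. The NumPy sketch below illustrates this on hypothetical counts. The pseudocount used to handle zeros is an assumption (0.5 is a common but arbitrary choice) and is not a setting taken from this study's supplementary files.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of each sample (row):
    clr(x)_i = log(x_i) - mean_j log(x_j).
    A pseudocount is added because log(0) is undefined (hypothetical default).
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

# Hypothetical counts: sample 2 has the same composition as sample 1,
# but a library 10 times smaller.
counts = np.array([[200, 40, 10, 750],
                   [ 20,  4,  1,  75]])
z = clr(counts, pseudocount=0.0)          # no zeros here, so no pseudocount is needed
print(np.allclose(z[0], z[1]))            # True: CLR values do not depend on library size
print(np.allclose(z.sum(axis=1), 0.0))    # True: each transformed sample sums to zero
```

With real data containing zeros, the default pseudocount keeps the logarithm finite. The choice of pseudocount can, however, influence results for rare taxa.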

A worked example: age and season effects on gut microbiota in wild sheep

To test and illustrate our approach, we obtained fecal samples from Soay sheep (Ovis aries) from the island of Hirta in the St. Kilda archipelago of the Outer Hebrides of Scotland. These animals are free-living and are part of a long-term study in which individuals have been marked and monitored longitudinally since 1985 (35). All animals sampled had been caught and uniquely tagged within a few days of birth, so their age and sex were known with certainty. Each year, fieldwork teams visit St. Kilda in spring to monitor lambing and to capture, mark, and sample newborn lambs within a few days of birth. Subsequently, each August a larger field team visits to capture, mark, and sample animals living in the study area using a series of corral traps (35).

Two sets of fecal samples were collected, in 2013 and 2016. The 2013 set allowed comparison of the gut microbiota of individuals of different ages, and the 2016 set allowed comparison of the same individuals sampled in different seasons. The 2013 samples were collected during the August catch and included 30 samples from lambs (around 4 months old) and 28 samples from older adults (ages 2–13 years). The 2016 samples were collected from a set of 36 females aged 1–13 years who were sampled in both spring (around the time of parturition) and then 3–4 months later in August. Microbial DNA was extracted from samples, amplified using bacterial 16S rRNA V4 region primers, and sequenced on the Illumina MiSeq platform to generate 250 base pair (bp) paired-end reads. Sequences were processed using the DADA2 pipeline in R (v1.12.1) to call ASVs (Callahan et al., 2016). Full details of sampling, sequencing, and data processing methods are provided in the electronic supplementary material (ESM 1.1 through 1.4). We conducted standard dissimilarity analysis and permutational multivariate analysis of variance (PERMANOVA) tests on the effects of age and season for comparison with the results of our GLMM approach (see ESM 1.4.1 for full details). All data and code are available from GitHub (36).
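For readers less familiar with the comparison method, the following is a minimal NumPy sketch of Bray–Curtis dissimilarity and a one-way PERMANOVA, returning the pseudo-F, $R^2$, and a permutation P value. It is not the code used in this study (see ESM 1.4.1). The data and function names are hypothetical, and in practice a tested implementation (for example, the `permanova` function in scikit-bio or `adonis2` in the R package vegan) would normally be used.

```python
import numpy as np

def bray_curtis(x):
    """Pairwise Bray-Curtis dissimilarity between samples (rows).
    Assumes no sample has zero total reads."""
    x = np.asarray(x, dtype=float)
    num = np.abs(x[:, None, :] - x[None, :, :]).sum(axis=2)
    den = (x[:, None, :] + x[None, :, :]).sum(axis=2)
    return num / den

def pseudo_f(d, groups):
    """One-way PERMANOVA pseudo-F and R^2 from a dissimilarity matrix d."""
    groups = np.asarray(groups)
    n = len(groups)
    labels = np.unique(groups)
    ss_total = (d[np.triu_indices(n, k=1)] ** 2).sum() / n
    ss_within = 0.0
    for g in labels:
        idx = np.flatnonzero(groups == g)
        sub = d[np.ix_(idx, idx)]
        ss_within += (sub[np.triu_indices(len(idx), k=1)] ** 2).sum() / len(idx)
    ss_among = ss_total - ss_within
    a = len(labels)
    f = (ss_among / (a - 1)) / (ss_within / (n - a))
    return f, ss_among / ss_total

def permanova(x, groups, n_perm=999, seed=0):
    """Pseudo-F, R^2 and a permutation P value obtained by shuffling group labels."""
    d = bray_curtis(x)
    f_obs, r2 = pseudo_f(d, groups)
    rng = np.random.default_rng(seed)
    f_perm = np.array([pseudo_f(d, rng.permutation(groups))[0] for _ in range(n_perm)])
    p = (1 + np.sum(f_perm >= f_obs)) / (n_perm + 1)
    return f_obs, r2, p

# Hypothetical data: 6 "lamb" and 6 "adult" samples x 4 ASVs
rng = np.random.default_rng(3)
lambs = rng.poisson([50, 5, 30, 10], size=(6, 4))
adults = rng.poisson([10, 40, 30, 10], size=(6, 4))
x = np.vstack([lambs, adults])
groups = ["lamb"] * 6 + ["adult"] * 6
f, r2, p = permanova(x, groups)
print(f"pseudo-F = {f:.2f}, R^2 = {r2:.3f}, P = {p:.3f}")
```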

Specification of GLMMs

A tutorial describing the workflow for analysis can be found at https://arsweeny.github.io/microbiome-glmm/. We applied separate GLMMs to the 2013 and 2016 data sets. First, an aggregate data set for each year was created from the sample metadata, taxonomic classifications for each ASV, and an ASV-by-sample abundance matrix (number of observations: 2016, 169,488; 2013, 117,102). We used a Bayesian framework and fitted GLMMs with Poisson errors and log links to each data set using the package “MCMCglmm” (37), following the approach introduced above (A GLMM approach). Those wishing to use maximum likelihood estimation can instead use lme4 or ASReml (38, 39). Models fitted to the 2013 data, which included samples from hosts of two age classes from a single season (one sample per host), were specified as follows:
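The aggregation step described above can be sketched as follows. The three inputs (sample metadata, ASV taxonomy, and an ASV-by-sample abundance matrix) are joined into a long-format table with one row per ASV per sample, which is the response layout the GLMMs require. This is a plain-Python illustration with hypothetical inputs and a hypothetical function name (`to_long`), not the workflow from the linked tutorial. Keeping zero counts is an assumption of the sketch; with zeros kept, the number of rows equals the number of ASVs times the number of samples.

```python
import numpy as np

def to_long(abundance, asv_ids, sample_ids, sample_meta, asv_taxonomy):
    """Reshape an ASV-by-sample abundance matrix into long format.

    abundance: 2-D array of read counts, shape (n_asv, n_samples).
    sample_meta: dict sample_id -> dict of sample/host covariates.
    asv_taxonomy: dict asv_id -> dict of taxonomic ranks.
    Zero counts are kept, so the result has n_asv * n_samples rows.
    """
    abundance = np.asarray(abundance)
    rows = []
    for i, asv in enumerate(asv_ids):
        for j, sample in enumerate(sample_ids):
            row = {"asv": asv, "sample": sample, "count": int(abundance[i, j])}
            row.update(sample_meta[sample])   # e.g. host ID, season, age class
            row.update(asv_taxonomy[asv])     # e.g. phylum, family
            rows.append(row)
    return rows

# Hypothetical inputs: 3 ASVs x 2 samples from one host
abundance = [[120, 0], [35, 60], [0, 8]]
sample_meta = {"S1": {"host": "H1", "season": "spring"},
               "S2": {"host": "H1", "season": "summer"}}
asv_taxonomy = {"ASV1": {"phylum": "Firmicutes"},
                "ASV2": {"phylum": "Bacteroidetes"},
                "ASV3": {"phylum": "Firmicutes"}}
long_rows = to_long(abundance, ["ASV1", "ASV2", "ASV3"], ["S1", "S2"],
                    sample_meta, asv_taxonomy)
print(len(long_rows))   # 6
```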

$$\log(y_{h,asv}) = \beta_0 + a\beta_a + u_h + u_{asv} + u_{asv:a} + u_{asv:h}$$

where:

- $y_{h,asv}$ is the read count per ASV within each host.
- $\beta_0$ is a global intercept.
- $\beta_a$ is the effect of age ($a$, binary factor: lamb versus adult) on total read count.
- $u_h$ is a random effect describing variation in total read count among individual hosts ($h$; equivalent to sample here, because each host was sampled once).
- $u_{asv}$ is a random effect describing variation in the total read count of each ASV across hosts/samples.
- $u_{asv:a}$ is a random effect describing how ASV abundances (read counts) differ between host age classes.
- $u_{asv:h}$ is an additional row-level random effect describing residual variation.

Models fit to 2016 data, which included samples from individual hosts of similar age sampled in both spring and summer of the same year (two samples per host), were specified as follows:

$$\log(y_{h,asv,s}) = \beta_0 + s\beta_s + u_h + u_{asv} + u_{h:s} + u_{asv:h} + u_{asv:s} + u_{asv:h:s}$$

Here:

- $y_{h,asv,s}$ is the read count.
- $\beta_0$ is a global intercept.
- $\beta_s$ is the effect of season ($s$, binary factor: spring versus summer) on total read count.
- $u_h$ is a random effect describing variation in total read count among individual hosts ($h$, where there are multiple samples per host).
- $u_{asv}$ is a random effect describing variation in the total read count of each ASV across samples and hosts.
- $u_{h:s}$ is a random effect accounting for library size by describing variation in mean read count in each sample (i.e., host by season).
- $u_{asv:h}$ is a random effect describing the abundance (read count) of an ASV in host $h$.
- $u_{asv:s}$ is a random effect describing how ASV abundances (read counts) differ between seasons.
- $u_{asv:h:s}$ is an additional row-level random effect describing residual variation.

Several of these terms capture technical variation in read counts: $\beta_s$, $u_h$, $u_{asv}$, $u_{h:s}$ (library size, as variation in mean read count in each sample, i.e., host by season), and $u_{asv:h:s}$ (row-level variation, or over-dispersion). In contrast, $u_{asv:h}$ and $u_{asv:s}$ capture the biological variation of interest: differences in ASV abundance among hosts and between seasons, respectively.

Using this GLMM approach, we calculated the relative contributions of technical and biological model components to the variance in the data. We followed Nakagawa and Schielzeth (40) for the calculation of $R^2$ from GLMMs with Poisson error distributions. Under this formulation, the proportion of variance not attributed to any model component (1 minus the summed proportions of all components) represents variance arising from the Poisson distribution itself. Where multiple samples are present per individual (2016), we can estimate the repeatability of ASV community composition. It is the variance attributable to differential taxonomic composition across individuals, divided by the summed variance of the terms describing ASV-level compositional variation:

$$\frac{\sigma^2_{asv:h}}{\sigma^2_{asv:s}+\sigma^2_{asv:h}+\sigma^2_{asv:h:s}}$$

where $\sigma^2_x$ denotes the variance of random effect $u_x$.
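The variance partitioning and repeatability calculation can be sketched from posterior draws of the variance components. The sketch makes two assumptions. First, the Poisson distribution-specific variance is approximated as $\ln(1 + 1/e^{\beta_0})$, a common choice for log-link Poisson models with additive over-dispersion in Nakagawa and Schielzeth's framework; whether the authors used exactly this term is not stated here. Second, the fixed-effect variance is omitted for brevity. Because proportions and repeatability are ratios, they are computed per draw and then summarized. All draws are hypothetical and are not the study's estimates; the function names are illustrative.

```python
import numpy as np

def variance_proportions(var_draws, beta0_draws):
    """Per-draw proportion of total variance for each random-effect term.

    var_draws: dict term -> 1-D array of posterior variance draws.
    beta0_draws: 1-D array of posterior draws of the global intercept.
    Poisson variance approximated as ln(1 + 1/exp(beta0)) (assumption);
    fixed-effect variance omitted for brevity.
    """
    poisson_var = np.log1p(1.0 / np.exp(beta0_draws))
    total = sum(var_draws.values()) + poisson_var
    props = {term: v / total for term, v in var_draws.items()}
    props["poisson"] = poisson_var / total
    return props

def gamma_draws(rng, mean, sd, n):
    """Hypothetical positive 'posterior' draws with a given mean and SD."""
    return rng.gamma(shape=(mean / sd) ** 2, scale=sd ** 2 / mean, size=n)

rng = np.random.default_rng(7)
n = 2000
# Hypothetical variance draws for a 2016-style model ("asv:h:s" plays the role of MCMCglmm's units)
var_draws = {
    "h":       gamma_draws(rng, 0.05, 0.01, n),
    "asv":     gamma_draws(rng, 2.50, 0.10, n),
    "h:s":     gamma_draws(rng, 0.15, 0.02, n),
    "asv:h":   gamma_draws(rng, 0.40, 0.03, n),
    "asv:s":   gamma_draws(rng, 0.10, 0.01, n),
    "asv:h:s": gamma_draws(rng, 3.30, 0.10, n),
}
beta0_draws = rng.normal(0.5, 0.05, n)

props = variance_proportions(var_draws, beta0_draws)
for term, p in props.items():
    print(f"{term:8s} {100 * p.mean():5.2f}%")

# Repeatability of composition: ratio computed per draw, then summarized
rep = var_draws["asv:h"] / (var_draws["asv:s"] + var_draws["asv:h"] + var_draws["asv:h:s"])
print("repeatability:", rep.mean().round(3), np.quantile(rep, [0.025, 0.975]).round(3))
```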

We investigated differential abundances as outlined above. To extract information on specific bacterial taxa contributing to differential abundance across age groups or seasons, we used the Poisson model outputs. For each ASV, we subtracted the posterior distributions of its random-effect levels between group levels (2013: age; 2016: season). We used the resulting distribution to calculate a mean difference and the highest posterior density interval (HPDI) to estimate differential abundance for each ASV. For example, the mean of the posterior distribution for ASV1:summer minus that for ASV1:spring can be interpreted as the differential abundance of ASV1 between spring and summer. A difference can be considered robust when its credible interval does not span zero.
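The differential-abundance calculation for a single ASV can be sketched as follows. The steps are:

1. Subtract the posterior draws of the two random-effect levels draw by draw.
2. Summarize the resulting distribution by its mean and HPDI.
3. Flag the difference as robust if the interval excludes zero.

The HPDI here is the shortest interval containing 95% of the sorted draws, which assumes a unimodal posterior. The draws below are hypothetical normal samples, not output from the fitted models, and the function names are illustrative. With real MCMC output, the two levels would come from the same iterations, so their joint posterior correlation is preserved.

```python
import numpy as np

def hpdi(samples, prob=0.95):
    """Shortest interval containing a fraction `prob` of the draws (unimodal posterior assumed)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    k = int(np.ceil(prob * n))            # number of draws the interval must contain
    widths = x[k - 1:] - x[: n - k + 1]   # width of every candidate interval
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

def differential_abundance(draws_level_a, draws_level_b, prob=0.95):
    """Posterior mean and HPDI of (level b - level a), computed draw by draw;
    'robust' is True when the interval excludes zero."""
    diff = np.asarray(draws_level_b) - np.asarray(draws_level_a)
    lo, hi = hpdi(diff, prob)
    return diff.mean(), (lo, hi), bool(lo > 0 or hi < 0)

# Hypothetical posterior draws of u_asv:s for ASV1 in each season
rng = np.random.default_rng(11)
asv1_spring = rng.normal(0.1, 0.2, 4000)
asv1_summer = rng.normal(0.9, 0.2, 4000)
mean_diff, (lo, hi), robust = differential_abundance(asv1_spring, asv1_summer)
print(f"ASV1 summer - spring: {mean_diff:.2f} [{lo:.2f}, {hi:.2f}], robust={robust}")
```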

RESULTS

The gut microbiota communities of Soay sheep were dominated by two phyla, Firmicutes and Bacteroidetes (Fig. S1), as has been previously observed in most vertebrates (41). PCoA based on Bray–Curtis dissimilarity indicated clustering of samples by age and season (Fig. 2). The result of a PERMANOVA test on the 2013 data set showed a significant difference in group centroids for lambs and adults (pseudo-F = 7.161, P < 0.001), with 11.34% of the variance in gut microbiota composition ($R^2$) explained by differences between lambs and adults. PERMANOVA results for the 2016 data showed that group centroids for April and August were significantly distinct (pseudo-F = 2.026, P = 0.002), but season explained only 2.81% ($R^2$) of the observed variance.

Fig 2.


Soay sheep gut microbiota beta diversity in adults and lambs from 2013 (A) and from April and August of 2016 (B). Principal coordinates analysis (PCoA) plots of Bray–Curtis dissimilarity indicate clustering of samples by group. Ellipses represent 95% confidence intervals surrounding each group.

Poisson GLMMs from 2013 data showed comparable results to ordination approaches, where community composition differed substantially between age classes (proportion of variance, $u_{asv:a}$: 19.88%, CI 18.47%–21.39%; Fig. 3). Additional effects estimated by the model showed a substantial proportion of variance explained by taxonomic variation in ASV abundance ($u_{asv}$, 2013: 17.35%), a small portion of variation explained by variation in mean library size across samples ($u_h$, 2013: 1.59%), and considerable residual variance (estimated by the “units” term in MCMCglmm; $u_{asv:h}$, 2013: 49.42%; Fig. 3).

Fig 3.


Proportion of variance in bacterial read counts from different ASVs explained by GLMM component terms for two data sets. The 2013 data set (A) compared gut microbiota across two age classes from individuals sampled once at the same time point, while the 2016 data set (B) compared samples taken from the same individuals over two seasons.

Poisson GLMMs from 2016 data likewise showed comparable results to ordination approaches, where community composition changed negligibly between seasons ($u_{asv:s}$: 1.24%, CI 0.99%–1.44%; Fig. 3). The 2016 model showed a substantial proportion of variance explained by taxonomic variation in ASV abundance ($u_{asv}$, 2016: 34.74%), a small portion of variation explained by variation in mean library size across samples ($u_{h:s}$, 2016: 1.99%), and considerable residual variance ($u_{asv:h:s}$, 2016: 46.67%; Fig. 3). Repeated sampling of individuals in 2016 additionally showed moderate variance explained by inter-individual variation in community composition ($u_{asv:h}$, 2016: 5.49%). This equated to an individual repeatability of 11.1% for their microbiota community composition across sampling time points.

As outlined in Materials and Methods (Specification of GLMMs), we calculated differential abundances using the posterior distributions for each random-effect level of specific taxa across age classes (2013 data set; Fig. 4A and B) and seasons (2016 data set; Fig. 4C and D). For the ASV-by-age effect in the 2013 data ($u_{asv:a}$; Fig. 3), the estimates of taxon-specific differential abundances suggest that ASVs demonstrating strong shifts between lambs and adults belong primarily to two phyla, Firmicutes and Bacteroidetes (Fig. 4A and B). In total, 683 of the 2,023 ASVs (33.76%) present in the 2013 data showed shifts between lambs and adults whose credible intervals did not span zero (50.81% positive shifts, 49.19% negative shifts). Bacteroidetes represented 52.16% of these positive shifts into adulthood, and Firmicutes represented 72.02% of the negative shifts (Fig. 4B; Table S4). For the ASV-by-season effect in the 2016 data ($u_{asv:s}$; Fig. 3; Table S4), very few ASVs (24 of 2,364; 1.02%) had differential abundance effects with credible intervals that did not span zero between spring and summer sampling (Fig. 4C and D; Table S4).
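Once an HPDI is available for every ASV, the tallies reported above are simple to compute. These are the share of ASVs whose intervals exclude zero and each phylum's share of the positive and negative shifts. The sketch below uses a handful of hypothetical intervals; the function name `summarize_shifts` and all values are illustrative and do not reproduce the study's results.

```python
from collections import Counter

def summarize_shifts(results):
    """Summarize robust shifts (HPDI excluding zero) by phylum.

    results: list of (asv_id, phylum, hpdi_low, hpdi_high) for adult-minus-lamb differences.
    Returns the fraction of ASVs with robust shifts and, separately for positive
    and negative shifts, the share contributed by each phylum.
    """
    pos = Counter(ph for _, ph, lo, hi in results if lo > 0)
    neg = Counter(ph for _, ph, lo, hi in results if hi < 0)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    frac_robust = (n_pos + n_neg) / len(results)
    pos_share = {ph: c / n_pos for ph, c in pos.items()} if n_pos else {}
    neg_share = {ph: c / n_neg for ph, c in neg.items()} if n_neg else {}
    return frac_robust, pos_share, neg_share

# Hypothetical HPDIs for five ASVs
results = [
    ("ASV1", "Bacteroidetes", 0.4, 1.2),
    ("ASV2", "Bacteroidetes", 0.1, 0.9),
    ("ASV3", "Firmicutes", -1.5, -0.3),
    ("ASV4", "Firmicutes", -0.2, 0.6),   # spans zero: not counted
    ("ASV5", "Firmicutes", 0.2, 0.7),
]
frac, pos_share, neg_share = summarize_shifts(results)
print(frac, pos_share, neg_share)
```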

Fig 4.


Differential abundances across age classes (A and B) or season (C and D) for individual ASVs calculated from GLMMs with Poisson error families and taxonomic levels specified as ASV only. (A and C) represent all ASV-level effects. Violin plots represent the distribution of effect estimates, and size of the point represents the inverse variance of the estimate. Rectangles indicate the ASVs with the highest magnitude (positive or negative) differential abundances in forest plots (B and D). Forest plots represent point estimates and HPDI for the ASVs involved in the 50 (age class) or 10 (season) strongest increases and decreases of abundance.

Our results indicate that there are developmental shifts in the Soay sheep gut microbiota between lambs and adults and that the majority of taxa shifting in abundance belong to the Bacteroidetes and Firmicutes. However, this raises the question of whether this shift occurs because Bacteroidetes are generally more abundant in adults and Firmicutes more abundant in lambs, or because the ASVs that show these patterns just happen to be in these phyla. To illustrate how GLMMs can address questions of this sort, we modified our models for the 2013 data set to include additional taxonomic effects of family and phylum, allowing us to identify the taxonomic levels most responsible for differential abundances. Details of these phylogenetically more explicit GLMMs, their results, and their implications are presented in ESM 1.4.3 (Fig. S4; Table S3). Results suggest considerable variation with respect to age across families and ASVs within both the Bacteroidetes and Firmicutes phyla, and that most of the age effects on microbiota community composition occur at these lower taxonomic levels (Fig. S4; Table S3).

DISCUSSION

We have described a novel approach for analyzing NGS-derived metabarcoding data within a GLMM framework, illustrated it using data on variation in the gut microbiota community of wild sheep, and provided a user guide for implementing several versions of the approach (https://arsweeny.github.io/microbiome-glmm/). Our approach is an important step forward for researchers who wish to use metataxonomic approaches to understand variation in community structure in complex, non-experimental settings. It harnesses the well-established power and flexibility of GLMMs to decompose the drivers of variation in NGS-derived data on the taxonomic composition of samples. Our example analyses and tutorial provide very simple illustrations of how the approach can be used to estimate the contribution of host-related or environmental factors (specifically, age and season) to variation in gut microbiota community structure. We show that the results are comparable to those from widely applied ordination-based approaches, and we briefly discuss below the implications of the observed variation in gut microbiota structure with age and season. However, these analyses are intended mainly as templates to illustrate the approach, and they barely scratch the surface of the important outstanding questions the method could tackle with larger data sets. Applying GLMMs to taxon-level sequence count data opens up a rich toolkit, drawn from fields such as quantitative genetics, for dissecting the contributions of different environmental and host factors to variation in community structure, with the potential to advance our understanding of community ecology, host–microbe and host–pathogen interactions, and evolutionary dynamics.

In our illustrative analyses of wild Soay sheep, both the GLMM and ordination-based approaches identified a stronger effect of age than of season on the gut microbiota. Changes in gut microbiota structure during early-life development and during senescence in later adulthood are well established in human studies (42) but remain poorly understood in natural populations. Our data clearly show that gut microbiota community structure changes between recently weaned lambs and adults, and they argue for further longitudinal studies in natural systems to test whether shifts in gut community structure could play a role in patterns of demographic aging in the wild. A growing number of studies in the wild have documented seasonal changes in the gut microbiota (43–46). The absence of strong seasonal differences in our data may be related to the relative homogeneity of the herbivorous Soay sheep diet, as most previous wild studies have examined omnivores with strong seasonal shifts in diet preference. Alternatively, it may reflect the relatively small sample size of this pilot data set, or the fact that the seasons we sampled, spring and summer, are both periods of relatively high food abundance and quality compared with autumn and winter, when habitat quality and food availability change more dramatically. In future studies, repeated sampling of the same individuals over time will be crucial for understanding the effects of age, environment, and other variables on gut community structure. Applying our approach to such longitudinal metataxonomic data sets will help researchers robustly estimate within-individual patterns of change in community structure over time or space, while also estimating how repeatable community structure is within individual hosts.

Our novel GLMM approach allows key ecological and evolutionary parameters to be estimated from metabarcoding data sets, which can advance our understanding of host–microbe evolutionary dynamics. Individual repeatability of measured phenotypes is an important and widely estimated parameter in ecology and quantitative genetics (47). Estimating the within-host repeatability of microbiota community structure over time can offer insight into the extent to which host control and environmental selection determine species composition (48). The GLMM structure presented here directly estimates this repeatability across two seasons in our 2016 data set at around 11%, although the small sample size and the temporal proximity of the samples mean that this estimate should be interpreted with caution. Nonetheless, the model is illustrative and is readily extensible to emerging questions in the field. For example, the effects of host relatedness and inbreeding on microbiota composition have been explored previously in microbiome studies, but via ordination methods using a small number of genetic clusters as grouping units (49). Including host genetic relatedness matrices as random effects in a mixed-effects model (so-called “animal models”) within our GLMM framework can offer insights into heritability and inbreeding effects on community composition relative to other forces in the population (50). Factors that are of considerable ecological and evolutionary interest but can also confound heritability estimates, such as maternal effects or shared nest or litter effects, can likewise be incorporated into these models (51). Statistical advances that address maternal and social effects, as well as spatial autocorrelation in ecological data sets, can also be incorporated into microbiota analyses (52, 53).
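Repeatability estimates of this kind are typically computed from each posterior draw as a ratio of variance components and then summarized. The Python sketch below assumes, for illustration only, that the repeatability of community composition is the ASV-by-individual variance divided by the sum of all random-effect variances on the latent scale. The study's exact formula may differ, and the component names (`asv`, `asv:season`, `asv:id`, `units`) and draws are hypothetical stand-ins for the posterior variance samples that MCMCglmm stores in a fitted model's `VCV` element.

```python
import random
import statistics


def proportion_per_draw(vcv, focal):
    """Per-draw share of the summed variance that belongs to the `focal` component."""
    n = len(vcv[focal])
    return [vcv[focal][i] / sum(comp[i] for comp in vcv.values()) for i in range(n)]


def summarize(values, prob=0.95):
    """Posterior mean and an approximate equal-tailed credible interval."""
    xs = sorted(values)
    last = len(xs) - 1
    lo = xs[int((1 - prob) / 2 * last)]
    hi = xs[int((1 + prob) / 2 * last)]
    return statistics.fmean(xs), (lo, hi)


# Hypothetical posterior draws of latent-scale variance components;
# abs() only keeps these made-up variances positive.
rng = random.Random(42)
n_draws = 2000
vcv = {
    "asv": [abs(rng.gauss(3.0, 0.3)) for _ in range(n_draws)],
    "asv:season": [abs(rng.gauss(0.2, 0.05)) for _ in range(n_draws)],
    "asv:id": [abs(rng.gauss(1.0, 0.1)) for _ in range(n_draws)],
    "units": [abs(rng.gauss(1.3, 0.1)) for _ in range(n_draws)],
}
mean_r, (lo, hi) = summarize(proportion_per_draw(vcv, "asv:id"))
print(f"repeatability ~ {mean_r:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Summarizing the per-draw ratio, rather than taking a ratio of posterior means, carries the uncertainty in every variance component through to the interval.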
Applied to larger longitudinal data sets, our GLMM approach can allow researchers to directly estimate how different aspects of host state, genotype, and environment influence the structure of within-host communities and address many outstanding questions about the evolutionary and ecological causes and consequences of host–microbe interactions.

Fully understanding the role of microbiota communities in the ecology and evolution of wild organisms depends both on identifying factors with important effects on overall microbiota composition and on being able to test whether and how individual taxa or taxonomic groups underpin those effects. Our GLMM approach readily lends itself to addressing both questions. We have illustrated how the approach can be used to identify the ASVs involved in the community-level age shifts detected in the random-effects structure of the 2013 models (Fig. 4) and to further decompose the contribution of different taxonomic levels to community-level effects (ESM 1.4.3; Table S3). For example, our results highlight that analysis at the phylum level could give a misleading view of the compositional shifts associated with age in Soay sheep, and that there is substantial variation at the family level within each phylum (Fig. S4). This approach should offer insights similar to those from the linear discriminant analysis (LDA) used in tools such as LEfSe (24), with the advantage that this information can be extracted for multiple factors of interest at once, rather than requiring a priori knowledge of the effect of interest before running a differential abundance analysis. Because variances are often unequal across taxonomic classes, and nested taxonomic levels assume equal phylogenetic distances, our approach could be developed further to identify the taxonomic levels associated with the greatest variance (ESM 1.4.3) by explicitly including the microbial phylogeny within the GLMMs. This would allow environmental and host effects on community composition to be estimated while accounting more accurately for phylogenetic distances between ASVs (54), in a manner similar to UniFrac clustering approaches (55).
A GLMM-based approach capable of simultaneously dissecting the contributions of host environment, state, and genetics, alongside microbial phylogeny, to variation in microbial community structure would, in our view, represent a powerful step toward robustly addressing emerging questions surrounding the role of the microbiome in ecology and evolution.

Despite its advantages, this approach has challenges and assumptions that must be considered in its application. In this paper, we use several means of specifying model parameters and assessing model performance. In addition to carefully specifying a random-effect structure aligned with the nature of the predictors, we encourage users of MCMCglmm to inspect the traces of model terms for autocorrelation or poor mixing, which can indicate convergence problems. In either Bayesian or frequentist frameworks, inspecting model residuals can also reveal performance problems. Over-dispersion caused by zero inflation or aggregated counts commonly poses problems for GLMMs (56), and over-dispersion is common in read count data describing microbiome community abundance (41). The degree of zero inflation and count aggregation will vary by site and system, so the choice of error family and data processing should be considered carefully to ensure good model performance. In this manuscript, we use an over-dispersed Poisson distribution with an observation-level random effect; however, this does not always capture over-dispersion and can inflate R² values (57). We therefore fitted Poisson models to data subsets defined by several abundance thresholds to test the sensitivity of our results to these choices (ESM 1.4.2). Additionally, we calculated the ratio of observed to model-predicted zeros for both data sets presented in the main text and found that this ratio is close to 1 and that the models predict true abundance means with very little deviation (ESM 1.4.4; Fig. S5). Where over-dispersed models may not suit investigators’ data, or where more complex random-effect structures introduce computational limitations, we also provide some discussion in the supplementary methods, with a worked example of an alternative approach (ESM Section 2) that uses the centered log-ratio (CLR) transformation and Gaussian error families.
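For the alternative Gaussian approach mentioned above, the CLR transformation expresses each taxon's count relative to the geometric mean of all counts in its sample. A minimal Python sketch follows; the pseudocount of 0.5 used to handle zero counts is an assumption and not necessarily the choice made in ESM Section 2, and the read counts are hypothetical.

```python
import math


def clr(counts, pseudocount=0.5):
    """CLR-transform one sample: log(count + pseudocount) minus the mean log."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]


sample = [120, 30, 0, 850]  # hypothetical read counts for four ASVs in one sample
values = clr(sample)
print([round(v, 3) for v in values])
print(round(sum(values), 10))  # CLR values sum to zero within a sample
```

The transformed values, one per ASV per sample, could then serve as the response in a Gaussian GLMM.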
Where zero inflation is notably high, or where researchers are interested in both the abundance and the prevalence of taxa within the microbiome, zero-inflated Poisson models are an additional option, although they can be difficult to fit and fall outside the scope of this introduction. As with any GLMM approach, the limitations of this method will depend on sample replication and on how the data are distributed across levels of the random effects (58). We also note that most implementations of mixed-effects models assume that the random effects come from a Gaussian distribution. While this may at first appear a strong assumption, GLMMs are generally quite robust to violations of it, although variance estimates are biased upward if the true distribution of effects is bimodal but modeled as Gaussian (59). Such problems should be identifiable by plotting the data across levels of the random effects of interest.
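One way to judge whether zero inflation is a concern is the ratio of observed to model-predicted zeros described above (ESM 1.4.4). The Python sketch below is not the authors' code: it assumes an over-dispersed Poisson model whose observation-level random effect is normal on the log scale, so each observation's zero probability must be averaged over that effect, which is done here by trapezoidal integration. The linear predictors, variance, and counts are hypothetical.

```python
import math


def prob_zero(eta, olre_var, n_grid=801, z_max=8.0):
    """P(Y = 0) for a Poisson count whose log-mean is eta + e, e ~ N(0, olre_var).

    The expectation over e is approximated with the trapezoid rule on a grid of
    standard-normal values z in [-z_max, z_max].
    """
    sd = math.sqrt(olre_var)
    step = 2 * z_max / (n_grid - 1)
    total = 0.0
    for j in range(n_grid):
        z = -z_max + j * step
        weight = 0.5 if j in (0, n_grid - 1) else 1.0  # trapezoid end-point weights
        density = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        total += weight * density * math.exp(-math.exp(eta + sd * z))
    return total * step


def zero_ratio(counts, etas, olre_var):
    """Observed number of zeros divided by the model-expected number of zeros."""
    observed = sum(1 for y in counts if y == 0)
    expected = sum(prob_zero(eta, olre_var) for eta in etas)
    return observed / expected


# Hypothetical counts and fitted linear predictors (log scale) from one model.
counts = [0, 2, 0, 7, 1, 0, 15, 3]
etas = [0.1, 0.5, -0.3, 1.8, 0.2, -0.5, 2.6, 1.0]
print(f"observed/predicted zeros: {zero_ratio(counts, etas, olre_var=0.8):.2f}")
```

A ratio near 1 indicates that the fitted model accounts for the zeros in the data, whereas a ratio well above 1 would point toward zero inflation and toward the alternatives discussed in this paragraph.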

Beyond microbiota community analyses, the approaches outlined in this manuscript apply more broadly to the many types of metataxonomic data being collected across myriad systems and research disciplines. For example, there has been great interest in describing the dynamics of the parasite community as an ecosystem and in understanding its influence on host health (60–62). A growing number of studies apply metabarcoding to fecal samples to estimate the community structure of gastrointestinal parasites (63, 64), and our GLMM approach could readily be applied to such data sets to dissect the drivers of variation in parasite community composition. Another area of interest in ecology and evolution is the use of fecal metabarcoding to estimate diet composition and its relationship to host phenotypes. Bayesian mixed-model approaches have also recently been applied to the presence and absence of diet components of Cyanistes caeruleus (blue tit) and align conceptually with the approaches presented in this article (65). GLMM approaches to metabarcoding data retain key similarities to other multivariate community ecology approaches for abundance data (66) while integrating the benefits of ecological and evolutionary approaches to quantifying phenotypic variation. We therefore suggest that the approaches presented in this article can be applied across a range of systems and data types to build a powerful and flexible understanding of the complex drivers of community dynamics.


ACKNOWLEDGMENTS

This work was funded by a large Natural Environment Research Council (NERC) grant (NE/R016801/1), and the long-term study on St. Kilda was funded principally by responsive mode grants from NERC. L.M. was supported by HFSP Young Investigator Project Grant RGY0072/21.

We thank Adam Hayward and Jill Pilkington for sample collection, the National Trust for Scotland for support of our work on St. Kilda, and QinetiQ and Kilda Cruises for logistical support. We also thank the Ecology Within Team for input in the analysis and manuscript and Josephine Pemberton for support and management of the field project. Figure 1 was created with Biorender.com. Photos on which sheep icons in Fig. 1 are based are by Hannah Vallin and Martin Stoffel.

L.M. and A.R.S. conceived and developed the statistical methods. A.R.S., L.M., A.F., and H.L. conducted the analyses. A.F., A.I., K.A.W., K.W., and D.H.N. oversaw and undertook sample collection and laboratory work. A.R.S. and H.L. wrote the first draft of the manuscript, and all authors contributed to the writing of the final manuscript.

Contributor Information

Amy R. Sweeny, Email: amyr.sweeny@gmail.com.

Sarah M. Hird, University of Connecticut, Storrs, Connecticut, USA

DATA AVAILABILITY

Data and code for this manuscript are available on GitHub. Sequences and metadata on which the analysis is based can be found under accession number PRJEB39322 in the European Nucleotide Archive.

SUPPLEMENTAL MATERIAL

The following material is available online at https://doi.org/10.1128/msystems.00040-23.

Figure S1. msystems.00040-23-s0001.tif.

Relative abundance of phyla and families in example data sets.

DOI: 10.1128/msystems.00040-23.SuF1
Figure S2. msystems.00040-23-s0002.tif.

Proportion variance across multiple levels of initial abundance filtering.

DOI: 10.1128/msystems.00040-23.SuF2
Figure S3. msystems.00040-23-s0003.tif.

Proportion variance for CLR models.

DOI: 10.1128/msystems.00040-23.SuF3
Figure S4. msystems.00040-23-s0004.tif.

Differential abundances across age classes accounting for multiple levels of taxonomy.

DOI: 10.1128/msystems.00040-23.SuF4
Figure S5. msystems.00040-23-s0005.tif.

Excess zero distributions.

DOI: 10.1128/msystems.00040-23.SuF5
Table S1. msystems.00040-23-s0006.xlsx.

MCMCglmm output for Poisson GLMMs.

DOI: 10.1128/msystems.00040-23.SuF6
Table S2. msystems.00040-23-s0007.xlsx.

MCMCglmm output for CLR Gaussian GLMMs.

DOI: 10.1128/msystems.00040-23.SuF7
Table S3. msystems.00040-23-s0008.xlsx.

MCMCglmm output for Poisson GLMMs with nested taxonomic structure.

DOI: 10.1128/msystems.00040-23.SuF8
Table S4. msystems.00040-23-s0009.xlsx.

Representation across phyla of significant ASV-level compositional shifts.

DOI: 10.1128/msystems.00040-23.SuF9
Electronic Supplemental Information. msystems.00040-23-s0010.docx.

Additional information regarding methods, model specification, model validation, and alternate model implementations. Supplemental figure and table legends.

DOI: 10.1128/msystems.00040-23.SuF10
OPEN PEER REVIEW. reviewer-comments.pdf.

An accounting of the reviewer comments and feedback.

DOI: 10.1128/msystems.00040-23.SuF11

ASM does not own the copyrights to Supplemental Material that may be linked to, or accessed through, an article. The authors have granted ASM a non-exclusive, world-wide license to publish the Supplemental Material files. Please contact the corresponding author directly for reuse.

REFERENCES

  • 1. Alberdi A, Aizpurua O, Bohmann K, Zepeda-Mendoza ML, Gilbert MTP. 2016. Do vertebrate gut metagenomes confer rapid ecological adaptation? Trends Ecol Evol 31:689–699. doi: 10.1016/j.tree.2016.06.008
  • 2. Koskella B, Hall LJ, Metcalf CJE. 2017. The microbiome beyond the horizon of ecological and evolutionary theory. Nat Ecol Evol 1:1606–1615. doi: 10.1038/s41559-017-0340-2
  • 3. Sudo N, Chida Y, Aiba Y, Sonoda J, Oyama N, Yu X-N, Kubo C, Koga Y. 2004. Postnatal microbial colonization programs the hypothalamic-pituitary-adrenal system for stress response in mice. J Physiol 558:263–275. doi: 10.1113/jphysiol.2004.063388
  • 4. Cox-Foster DL, Conlan S, Holmes EC, Palacios G, Evans JD, Moran NA, Quan P-L, Briese T, Hornig M, Geiser DM, Martinson V, vanEngelsdorp D, Kalkstein AL, Drysdale A, Hui J, Zhai J, Cui L, Hutchison SK, Simons JF, Egholm M, Pettis JS, Lipkin WI. 2007. A metagenomic survey of microbes in honey bee colony collapse disorder. Science 318:283–287. doi: 10.1126/science.1146498
  • 5. Desbonnet L, Clarke G, Shanahan F, Dinan TG, Cryan JF. 2014. Microbiota is essential for social development in the mouse. Mol Psychiatry 19:146–148. doi: 10.1038/mp.2013.65
  • 6. Wahlström A, Sayin SI, Marschall H-U, Bäckhed F. 2016. Intestinal crosstalk between bile acids and microbiota and its impact on host metabolism. Cell Metab 24:41–50. doi: 10.1016/j.cmet.2016.05.005
  • 7. Bolker BM, Brooks ME, Clark CJ, Geange SW, Poulsen JR, Stevens MHH, White J-S. 2009. Generalized linear mixed models: a practical guide for ecology and evolution. Trends Ecol Evol 24:127–135. doi: 10.1016/j.tree.2008.10.008
  • 8. Pollock J, Glendinning L, Wisedchanwet T, Watson M. 2018. The madness of microbiome: attempting to find consensus "best practice" for 16S microbiome studies. Appl Environ Microbiol 84:e02627-17. doi: 10.1128/AEM.02627-17
  • 9. McMurdie PJ, Holmes S. 2014. Waste not, want not: why rarefying microbiome data is inadmissible. PLoS Comput Biol 10:e1003531. doi: 10.1371/journal.pcbi.1003531
  • 10. Weiss S, Xu ZZ, Peddada S, Amir A, Bittinger K, Gonzalez A, Lozupone C, Zaneveld JR, Vázquez-Baeza Y, Birmingham A, Hyde ER, Knight R. 2017. Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome 5:27. doi: 10.1186/s40168-017-0237-y
  • 11. McMurdie PJ. 2018. Normalization of microbiome profiling data. Methods Mol Biol 1849:143–168. doi: 10.1007/978-1-4939-8728-3_10
  • 12. Bray JR, Curtis JT. 1957. An ordination of the upland forest communities of Southern Wisconsin. Ecological Monographs 27:325–349. doi: 10.2307/1942268
  • 13. Niku J, Hui FKC, Taskinen S, Warton DI, Goslee S. 2019. gllvm: Fast analysis of multivariate abundance data with generalized linear latent variable models in R. Methods Ecol Evol 10:2173–2182. doi: 10.1111/2041-210X.13303
  • 14. Abrego N, Crosier B, Somervuo P, Ivanova N, Abrahamyan A, Abdi A, Hämäläinen K, Junninen K, Maunula M, Purhonen J, Ovaskainen O. 2020. Fungal communities decline with urbanization—more in air than in soil. ISME J 14:2806–2815. doi: 10.1038/s41396-020-0732-1
  • 15. Björk JR, Hui FKC, O’Hara RB, Montoya JM. 2018. Uncovering the drivers of host‐associated microbiota with joint species distribution modelling. Mol Ecol 27:2714–2724. doi: 10.1111/mec.14718
  • 16. Visscher PM, Brown MA, McCarthy MI, Yang J. 2012. Five years of GWAS discovery. Am J Hum Genet 90:7–24. doi: 10.1016/j.ajhg.2011.11.029
  • 17. Goldstein DB. 2009. Common genetic variation and human traits. N Engl J Med 360:1696–1698. doi: 10.1056/NEJMp0806284
  • 18. Purcell SM, Wray NR, Stone JL, Visscher PM, O’Donovan MC, Sullivan PF, Sklar P, International Schizophrenia Consortium. 2009. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460:748–752. doi: 10.1038/nature08185
  • 19. Loh P-R, Bhatia G, Gusev A, Finucane HK, Bulik-Sullivan BK, Pollack SJ, de Candia TR, Lee SH, Wray NR, Kendler KS, O’Donovan MC, Neale BM, Patterson N, Price AL, Schizophrenia Working Group of Psychiatric Genomics Consortium. 2015. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat Genet 47:1385–1392. doi: 10.1038/ng.3431
  • 20. Clemente JC, Ursell LK, Parfrey LW, Knight R. 2012. The impact of the gut microbiota on human health: an integrative view. Cell 148:1258–1270. doi: 10.1016/j.cell.2012.01.035
  • 21. Vayssier-Taussat M, Albina E, Citti C, Cosson J-F, Jacques M-A, Lebrun M-H, Le Loir Y, Ogliastro M, Petit M-A, Roumagnac P, Candresse T. 2014. Shifting the paradigm from pathogens to pathobiome: new concepts in the light of meta-omics. Front Cell Infect Microbiol 4:29. doi: 10.3389/fcimb.2014.00029
  • 22. Carding S, Verbeke K, Vipond DT, Corfe BM, Owen LJ. 2015. Dysbiosis of the gut microbiota in disease. Microb Ecol Health Dis 26:26191. doi: 10.3402/mehd.v26.26191
  • 23. Gelman A, Hill J. 2006. Data analysis using regression and Multilevel/Hierarchical models. Cambridge University Press. doi: 10.1017/CBO9780511790942
  • 24. Segata N, Izard J, Waldron L, Gevers D, Miropolsky L, Garrett WS, Huttenhower C. 2011. Metagenomic biomarker discovery and explanation. Genome Biol 12:R60. doi: 10.1186/gb-2011-12-6-r60
  • 25. Mandal S, Van Treuren W, White RA, Eggesbø M, Knight R, Peddada SD. 2015. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb Ecol Health Dis 26:27663. doi: 10.3402/mehd.v26.27663
  • 26. Albery GF, Becker DJ, Kenyon F, Nussey DH, Pemberton JM. 2019. The fine-scale landscape of immunity and parasitism in a wild ungulate population. Integr Comp Biol 59:1165–1175. doi: 10.1093/icb/icz016
  • 27. Rushmore J, Caillaud D, Matamba L, Stumpf RM, Borgatti SP, Altizer S. 2013. Social network analysis of wild chimpanzees provides insights for predicting infectious disease risk. J Anim Ecol 82:976–986. doi: 10.1111/1365-2656.12088
  • 28. Froy H, Börger L, Regan CE, Morris A, Morris S, Pilkington JG, Crawley MJ, Clutton-Brock TH, Pemberton JM, Nussey DH. 2018. Declining home range area predicts reduced late-life survival in two wild ungulate populations. Ecol Lett 21:1001–1009. doi: 10.1111/ele.12965
  • 29. Hayward AD, Garnier R, Watt KA, Pilkington JG, Grenfell BT, Matthews JB, Pemberton JM, Nussey DH, Graham AL. 2014. Heritable, heterogeneous, and costly resistance of sheep against nematodes and potential feedbacks to epidemiological dynamics. Am Nat 184:S58–S76. doi: 10.1086/676929
  • 30. Jackson MM, Turner MG, Pearson SM, Ives AR. 2012. Seeing the forest and the trees: multilevel models reveal both species and community patterns. Ecosphere 3:art79. doi: 10.1890/ES12-00116.1
  • 31. Nearing JT, Douglas GM, Hayes MG, MacDonald J, Desai DK, Allward N, Jones CMA, Wright RJ, Dhanani AS, Comeau AM, Langille MGI. 2022. Microbiome differential abundance methods produce different results across 38 datasets. Nat Commun 13:777. doi: 10.1038/s41467-022-28401-w
  • 32. Anders S, Huber W. 2010. Differential expression analysis for sequence count data. Genome Biol 11:R106. doi: 10.1186/gb-2010-11-10-r106
  • 33. Fernandes AD, Macklaim JM, Linn TG, Reid G, Gloor GB. 2013. ANOVA-like differential expression (ALDEx) analysis for mixed population RNA-seq. PLoS One 8:e67019. doi: 10.1371/journal.pone.0067019
  • 34. Aitchison J. 1982. The statistical analysis of compositional data. J R Stat Soc 44:139–160. doi: 10.1111/j.2517-6161.1982.tb01195.x
  • 35. Clutton-Brock TH, Pemberton JM. 2003. Soay sheep: Dynamics and selection in an island population. Cambridge University Press. doi: 10.1017/CBO9780511550669
  • 36. Sweeny AR. Microbiome-GLMM. GitHub. https://github.com/arsweeny/microbiome-glmm
  • 37. Hadfield JD. 2010. MCMC methods for multi-response generalized linear mixed models: the MCMCglmm R package. J Stat Soft 33:1–22. doi: 10.18637/jss.v033.i02
  • 38. Bates D, Mächler M, Bolker B, Walker S. 2015. Fitting linear mixed-effects models using lme4. J Stat Soft 67:1–48. doi: 10.18637/jss.v067.i01
  • 39. Ar G 1999. ASREML reference manual. NSW Agriculture Biometric Bulletin 3:1–210.
  • 40. Nakagawa S, Schielzeth H, O’Hara RB. 2013. A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods Ecol Evol 4:133–142. doi: 10.1111/j.2041-210x.2012.00261.x
  • 41. Ley RE, Hamady M, Lozupone C, Turnbaugh PJ, Ramey RR, Bircher JS, Schlegel ML, Tucker TA, Schrenzel MD, Knight R, Gordon JI. 2008. Evolution of mammals and their gut microbes. Science 320:1647–1651. doi: 10.1126/science.1155725 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Popkes M, Valenzano DR. 2020. Microbiota-host interactions shape ageing dynamics. Philos Trans R Soc Lond B Biol Sci 375:20190596. doi: 10.1098/rstb.2019.0596 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Amato KR, Leigh SR, Kent A, Mackie RI, Yeoman CJ, Stumpf RM, Wilson BA, Nelson KE, White BA, Garber PA. 2015. The gut microbiota appears to compensate for seasonal diet variation in the wild black howler monkey (Alouatta pigra). Microb Ecol 69:434–443. doi: 10.1007/s00248-014-0554-7 [DOI] [PubMed] [Google Scholar]
  • 44. Maurice CF, Knowles SCL, Ladau J, Pollard KS, Fenton A, Pedersen AB, Turnbaugh PJ. 2015. Marked seasonal variation in the wild mouse gut microbiota. ISME J 9:2423–2434. doi: 10.1038/ismej.2015.53 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Ren T, Boutin S, Humphries MM, Dantzer B, Gorrell JC, Coltman DW, McAdam AG, Wu M. 2017. Seasonal, spatial, and maternal effects on gut microbiome in wild red squirrels. Microbiome 5:163. doi: 10.1186/s40168-017-0382-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Orkin JD, Campos FA, Myers MS, Cheves Hernandez SE, Guadamuz A, Melin AD. 2019. Seasonality of the gut microbiota of free-ranging white-faced capuchins in a tropical dry forest. ISME J 13:183–196. doi: 10.1038/s41396-018-0256-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Wilson AJ. 2018. How should we interpret estimates of individual repeatability? Evol Lett 2:4–8. doi: 10.1002/evl3.40 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Foster KR, Schluter J, Coyte KZ, Rakoff-Nahoum S. 2017. The evolution of the host microbiome as an ecosystem on a leash. Nature 548:43–51. doi: 10.1038/nature23292 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Yuan ML, Dean SH, Longo AV, Rothermel BB, Tuberville TD, Zamudio KR. 2015. Kinship, inbreeding and fine-scale spatial structure influence gut microbiota in a hindgut-fermenting tortoise. Mol Ecol 24:2521–2536. doi: 10.1111/mec.13169 [DOI] [PubMed] [Google Scholar]
  • 50. Wilson AJ, Réale D, Clements MN, Morrissey MM, Postma E, Walling CA, Kruuk LEB, Nussey DH. 2010. An ecologist's guide to the animal model. J Anim Ecol 79:13–26. doi: 10.1111/j.1365-2656.2009.01639.x [DOI] [PubMed] [Google Scholar]
  • 51. Kruuk LEB, Hadfield JD. 2007. How to separate genetic and environmental causes of similarity between relatives. J Evol Biol 20:1890–1903. doi: 10.1111/j.1420-9101.2007.01377.x [DOI] [PubMed] [Google Scholar]
  • 52. Hayward AD, Pilkington JG, Pemberton JM, Kruuk LEB. 2010. Maternal effects and early-life performance are associated with parasite resistance across life in free-living Soay sheep. Parasitology 137:1261–1273. doi: 10.1017/S0031182010000193 [DOI] [PubMed] [Google Scholar]
  • 53. Albery GF, Kirkpatrick L, Firth JA, Bansal S. 2021. Unifying spatial and social network analysis in disease ecology. J Anim Ecol 90:45–61. doi: 10.1111/1365-2656.13356 [DOI] [PubMed] [Google Scholar]
  • 54. Hadfield JD, Nakagawa S. 2010. General quantitative genetic methods for comparative biology: phylogenies, taxonomies and multi-trait models for continuous and categorical characters. J Evol Biol 23:494–508. doi: 10.1111/j.1420-9101.2009.01915.x [DOI] [PubMed] [Google Scholar]
  • 55. Lozupone C, Hamady M, Knight R. 2006. Unifrac--an online tool for comparing microbial community diversity in a phylogenetic context. BMC Bioinformatics 7:371. doi: 10.1186/1471-2105-7-371 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Zuur AF, Ieno EN, Walker N, Saveliev AA, Smith GM. 2009. Mixed effects models and extensions in Ecology with R. Springer, New York, NY. doi: 10.1007/978-0-387-87458-6 [DOI] [Google Scholar]
  • 57. Harrison XA. 2014. Using observation-level random effects to model overdispersion in count data in ecology and evolution. PeerJ 2:e616. doi: 10.7717/peerj.616 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58. Harrison XA, Donaldson L, Correa-Cano ME, Evans J, Fisher DN, Goodwin CED, Robinson BS, Hodgson DJ, Inger R. 2018. A brief introduction to mixed effects modelling and multi-model inference in ecology. PeerJ 6:e4794. doi: 10.7717/peerj.4794 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59. Schielzeth H, Dingemanse NJ, Nakagawa S, Westneat DF, Allegue H, Teplitsky C, Réale D, Dochtermann NA, Garamszegi LZ, Araya‐Ajoy YG, Sutherland C. 2020. Robustness of linear mixed-effects models to violations of distributional assumptions. Methods Ecol Evol 11:1141–1152. doi: 10.1111/2041-210X.13434 [DOI] [Google Scholar]
  • 60. Pedersen AB, Fenton A. 2007. Emphasizing the ecology in parasite community ecology. Trends Ecol Evol 22:133–139. doi: 10.1016/j.tree.2006.11.005 [DOI] [PubMed] [Google Scholar]
  • 61. Graham AL. 2008. Ecological rules governing helminth–microparasite coinfection. Proc Natl Acad Sci U S A 105:566–570. doi: 10.1073/pnas.0707221105 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62. Rynkiewicz EC, Pedersen AB, Fenton A. 2015. An ecosystem approach to understanding and managing within-host parasite community dynamics. Trends Parasitol 31:212–221. doi: 10.1016/j.pt.2015.02.005 [DOI] [PubMed] [Google Scholar]
  • 63. Avramenko RW, Redman EM, Lewis R, Yazwinski TA, Wasmuth JD, Gilleard JS. 2015. Exploring the gastrointestinal "Nemabiome": deep amplicon sequencing to quantify the species composition of parasitic nematode communities. PLoS One 10:e0143559. doi: 10.1371/journal.pone.0143559 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64. Aivelo T, Medlar A. 2018. Opportunities and challenges in metabarcoding approaches for helminth community identification in wild mammals. Parasitology 145:608–621. doi: 10.1017/S0031182017000610 [DOI] [PubMed] [Google Scholar]
  • 65. Shutt JD, Nicholls JA, Trivedi UH, Burgess MD, Stone GN, Hadfield JD, Phillimore AB. 2020. Gradients in richness and turnover of a forest passerine’s diet prior to breeding: a mixed model approach applied to faecal metabarcoding data. Mol Ecol 29:1199–1213. doi: 10.1111/mec.15394 [DOI] [PubMed] [Google Scholar]
  • 66. Wang Y, Naumann U, Wright ST, Warton DI. 2012. mvabund - an R package for model-based analysis of multivariate abundance data. Methods Ecol Evol 3:471–474. doi: 10.1111/j.2041-210X.2012.00190.x

Associated Data

Supplementary Materials

Figure S1. msystems.00040-23-s0001.tif.

Relative abundance of phyla and families in example data sets.

DOI: 10.1128/msystems.00040-23.SuF1
Figure S2. msystems.00040-23-s0002.tif.

Proportion variance across multiple levels of initial abundance filtering.

DOI: 10.1128/msystems.00040-23.SuF2
Figure S3. msystems.00040-23-s0003.tif.

Proportion variance for CLR models.

DOI: 10.1128/msystems.00040-23.SuF3
Figure S4. msystems.00040-23-s0004.tif.

Differential abundances across age classes accounting for multiple levels of taxonomy.

DOI: 10.1128/msystems.00040-23.SuF4
Figure S5. msystems.00040-23-s0005.tif.

Excess zero distributions.

DOI: 10.1128/msystems.00040-23.SuF5
Table S1. msystems.00040-23-s0006.xlsx.

MCMCglmm output for Poisson GLMMs.

DOI: 10.1128/msystems.00040-23.SuF6
Table S2. msystems.00040-23-s0007.xlsx.

MCMCglmm output for CLR Gaussian GLMMs.

DOI: 10.1128/msystems.00040-23.SuF7
Table S3. msystems.00040-23-s0008.xlsx.

MCMCglmm output for Poisson GLMMs with nested taxonomic structure.

DOI: 10.1128/msystems.00040-23.SuF8
Table S4. msystems.00040-23-s0009.xlsx.

Representation across phyla of significant ASV-level compositional shifts.

DOI: 10.1128/msystems.00040-23.SuF9
Electronic Supplemental Information. msystems.00040-23-s0010.docx.

Additional information regarding methods, model specification, model validation, and alternate model implementations. Supplemental figure and table legends.

DOI: 10.1128/msystems.00040-23.SuF10
OPEN PEER REVIEW. reviewer-comments.pdf.

An accounting of the reviewer comments and feedback.

DOI: 10.1128/msystems.00040-23.SuF11

Data Availability Statement

Data and code for this manuscript are available on GitHub. The sequences and metadata on which the analysis is based can be found under accession number PRJEB39322 in the European Nucleotide Archive.

