Proceedings of the National Academy of Sciences of the United States of America. 2020 Mar 26;117(15):8539–8545. doi: 10.1073/pnas.1918741117

DNA metabarcoding reveals metacommunity dynamics in a threatened boreal wetland wilderness

Alex Bush a,b,1, Wendy A Monk c, Zacchaeus G Compson a,d, Daniel L Peters e, Teresita M Porter f,g,h, Shadi Shokralla g,h, Michael T G Wright g,h, Mehrdad Hajibabaei g,h,2, Donald J Baird a,2
PMCID: PMC7165428  PMID: 32217735

Significance

Too often, ecological monitoring studies are designed without understanding whether they have sufficient statistical power to detect changes beyond natural variability. The Peace–Athabasca Delta is North America’s largest inland delta, within a World Heritage area, and is currently threatened by human development. Using multispecies occupancy models we show that the wetland macroinvertebrate community is highly diverse, and spatial and temporal turnover are so high that composition is nearly random, emphasizing stochastic processes of assembly. Using DNA metabarcoding, our study detected more taxa, both overall and per sample, than traditional morphology-based sample processing, increasing our power to detect ecosystem change. Improving data quality and quantifying error are key to delivering effective monitoring and understanding the dynamic structure of the metacommunity.

Keywords: occupancy, detectability, taxonomic resolution, stochasticity, power analysis

Abstract

The complexity and natural variability of ecosystems present a challenge for reliable detection of change due to anthropogenic influences. This issue is exacerbated by necessary trade-offs that reduce the quality and resolution of survey data for assessments at large scales. The Peace–Athabasca Delta (PAD) is a large inland wetland complex in northern Alberta, Canada. Despite its geographic isolation, the PAD is threatened by encroachment of oil sands mining in the Athabasca watershed and hydroelectric dams in the Peace watershed. Methods capable of reliably detecting changes in ecosystem health are needed to evaluate and manage risks. Between 2011 and 2016, aquatic macroinvertebrates were sampled across a gradient of wetland flood frequency, applying both microscope-based morphological identification and DNA metabarcoding. By using multispecies occupancy models, we demonstrate that DNA metabarcoding detected a much broader range of taxa and more taxa per sample compared to traditional morphological identification and was essential to identifying significant responses to flood and thermal regimes. We show that family-level occupancy masks high variation among genera and quantify the bias of barcoding primers on the probability of detection in a natural community. Interestingly, patterns of community assembly were nearly random, suggesting a strong role of stochasticity in the dynamics of the metacommunity. This variability seriously compromises effective monitoring at local scales but also reflects resilience to hydrological and thermal variability. Nevertheless, simulations showed the greater efficiency of metabarcoding, particularly at a finer taxonomic resolution, provided the statistical power needed to detect change at the landscape scale.


Tackling the global loss of biodiversity (1) is hindered by a lack of basic biological information needed to guide sustainable management strategies (2). Despite legal protections, freshwater ecosystems are increasingly degraded by multiple stressors (3). In addition, the quality and volume of data collected by monitoring programs often fail to support evidence-based management decisions (4–6). Here, we demonstrate how DNA metabarcoding can resolve challenges faced by traditional monitoring, alter our perspectives on ecosystem dynamics, and improve our understanding of natural variation and sampling error, supporting evidence-based decision making.

DNA barcoding uses short genetic sequences to identify individual taxa. By contrast, DNA metabarcoding supports simultaneous identification of entire assemblages via high-throughput sequencing (7, 8). Using metabarcoding for ecosystem monitoring provides an opportunity to identify organisms in bulk samples at a high taxonomic resolution consistently and accurately (Biomonitoring 2.0; ref. 9). The accuracy, consistency, and resolution of taxonomic identification remain a constraint for many biomonitoring programs that must trade off data quality to make assessment protocols rapid and cost-effective (10). Aquatic macroinvertebrates exemplify this challenge, as their diversity of forms and functions is sensitive to multiple drivers of ecosystem condition. Thus, ecosystem degradation can be identified based on changes in assemblage composition due to environmental filtering (5). Despite decades of development, the challenges associated with traditional methods of sample processing limit inference of biomonitoring programs to gross status classifications (e.g., ref. 11). Metabarcoding presents an opportunity to describe community composition more accurately and consistently, supporting more effective and informative biomonitoring (12, 13).

The Peace–Athabasca Delta (PAD) in northern Alberta, Canada (Fig. 1 and ref. 14) is North America’s largest inland delta (∼6,000 km²) and is located at the confluence of the Peace and Athabasca Rivers, consisting of hundreds of lakes and wetlands that become connected during flood events, particularly when spring snowmelt leads to ice jams (15, 16). The PAD is a Ramsar wetland, protected within Wood Buffalo National Park, a United Nations Educational, Scientific and Cultural Organization World Heritage site. Nonetheless, there have been concerns that the PAD could be affected by upstream developments, including current and proposed hydroelectric dams on the Peace River, continued expansion of oil sands mining on the Athabasca River to within 30 km of the park boundary, and climate change (17). Assessing how such factors influence the integrity of a natural wilderness is made more challenging by the paucity of biological surveys that have been conducted and the logistics of working in such a remote region. To gain a better understanding of the PAD’s ecology, rapid assessments of aquatic macroinvertebrates have been conducted since 2011 to establish a baseline understanding of the ecosystem’s diversity and structure (14, 18). Importantly, while surveys have followed established protocols from the Canadian Aquatic Biomonitoring Network (hereafter CABIN) (19), samples were processed using both traditional and DNA metabarcoding approaches, allowing us to test the power of each approach to support environmental management of the delta.

Fig. 1.

Location of sampling sites in the PAD. (Inset) The full extent of Wood Buffalo National Park in Alberta (AB), and boundaries of neighboring provinces and territories: British Columbia (BC), Saskatchewan (SK), and the Northwest Territories (NWT). Photo taken at Rocher River wetland (PAD 37).

Sampling error is a ubiquitous feature of any ecological survey, irrespective of the methodology, and of particular concern is the frequency of false absences (20). Depending on the covariance of species’ detectability with other environmental characteristics, models can be structurally biased and their confidence overestimated (21). Although imperfect detection is very common, and freshwater biomonitoring protocols have a long history of standardization to maintain comparability (5), there are few examples of research explicitly quantifying the nature of sampling error (e.g., ref. 22). Instead, variability due to sampling error is usually combined with that from natural sources (i.e., as “noise”; ref. 23). An alternative is to specify the likelihood of detection (the observation process model) and simultaneously correct our estimates of species occurrence (the ecological state model) within a single hierarchical framework (24). In this study, we employed multispecies occupancy models (MSOMs; refs. 25 and 26) to account for the effects of imperfect detection on estimates of macroinvertebrate diversity, drawing upon data from 6 y of macroinvertebrate surveys in the PAD. We quantify the efficiency with which the macroinvertebrate community can be surveyed using both traditional morphological identification and DNA metabarcoding and demonstrate that these approaches make a qualitative difference to our view of how the metacommunity is structured, to the efficiency of monitoring, and consequently to our power to detect change (27).

Results

A key difference between our sampling approaches was that the standard CABIN wetland protocol (19) provided estimates of relative abundance based on counts from a subset of each sample, whereas sequences identified using DNA metabarcoding were converted to presence–absence data (13, 28). In addition, CABIN identified 74 families based on morphological features, but metabarcoding could identify 109 families, as well as 263 genera (SI Appendix, Fig. S1.6). As a result, we trained four hierarchical MSOMs, one for each data type: 1) counts of macroinvertebrate families from CABIN (CABIN Fcount), 2) the presence–absence of macroinvertebrate families from CABIN (CABIN Fpa), 3) the presence–absence of macroinvertebrate families from DNA data (DNA Fpa), and 4) the presence–absence of macroinvertebrate genera from DNA data (DNA Gpa). Although metabarcoding can discriminate among taxa at even finer resolution (i.e., species), the prevalence of species was lower than that of genera and, given the available sample size, we did not feel the detectability and occupancy of those taxa could be estimated reliably.

Occupancy and Detectability.

The CABIN Fcount model predicted total abundance was dominated by four taxa (two Chironomidae subfamilies, Oligochaeta and Planorbidae) but also suggested that almost all taxa were present everywhere within the PAD (i.e., site occupancy ∼1), with no environmental covariates retained in the final model. This scenario is plausible, but if we apply the predicted probabilities of detection and same survey effort (number of individuals counted), and assume taxa are sampled at random from the pool of individuals, the CABIN Fcount model suggested we should have observed 38 taxa on average instead of 18. Nonrandom aggregation of individuals is typical of ecological communities (29) and may be why the model appeared to be misspecified.

In contrast to the count-based model, the presence–absence models all suggested taxon site occupancy was below 1 (Fig. 2 and SI Appendix, Fig. S1.10), although the “U-shaped” form of the hyperparameters in Fig. 2 A and C was an artifact of the bounded distribution (29). The CABIN Fpa model estimated a lower probability of detecting macroinvertebrate families than the DNA Fpa model (Fig. 2B right-skewed relative to Fig. 2 D and E; see also Fig. 3). Because models must balance expected occupancy against the probability of detection to fit the detections made in each survey, the CABIN Fpa model also predicted higher occupancy than the DNA Fpa model (Fig. 2A left-skewed relative to Fig. 2C). The differences in occupancy and detectability of specific families were not associated with prevalence, although many taxa were not recorded by both approaches and therefore cannot be compared (red points in Fig. 3; see SI Appendix, Appendix 1 for detail). In addition, detectability using DNA metabarcoding is intrinsically linked to the genetic primer used (30), and the importance of primer bias is well known from mock laboratory samples (e.g., ref. 31). Here we show biases in detectability can be quantified as part of the observation model, either at the community level (Fig. 2 D and E) or for individual taxa (SI Appendix, Fig. S1.11). Finally, neither the CABIN Fcount nor Fpa model retained environmental variables to explain changes in occurrence, whereas both DNA occupancy models did so consistently. The covariates retained were 1) the frequency of spring and summer floods (i.e., connections between the wetland and river), 2) time since the ice melt, and 3) maximum water temperature prior to each survey. Responses to environment at the community level were almost neutral (SI Appendix, Fig. S1.13), and the posterior distribution of coefficients differed from zero for only a minority of taxa (SI Appendix, Fig. S1.14), but their inclusion in the model suggests the high interannual turnover (SI Appendix, Fig. S1.7) may be explained in part by deterministic factors.

Fig. 2.

Predicted occupancy (A and C) and detectability (B, D, and E) of taxa based on the presence–absence data collected using the CABIN protocol (A and B) and DNA metabarcoding (C–E) at the family level. Detectability using metabarcoding is further split by primer pair (D and E). The shaded polygons describe the probability density of the community hyperparameters, and the gray bars indicate the underlying frequency of the values estimated for each taxon. See SI Appendix, Fig. S1.10 for the CABIN Fcount and DNA Gpa model distributions.

Fig. 3.

Comparison of (A) occupancy and (B) detectability estimates in models trained by CABIN data and DNA metabarcode data at the family level (n = 50). Red points indicate taxa not observed by the complementary method, that is, 18 and 59 families were unique to CABIN and metabarcoding, respectively. See SI Appendix, Appendix 1 for further information on the identities of unique taxa.

Alpha, Beta, and Gamma Diversity.

Recognizing that imperfect detection is commonplace in ecological surveys, it follows that regional (gamma diversity; SI Appendix, Fig. S1.8) and local (alpha diversity; SI Appendix, Fig. S1.9) diversity are routinely underestimated. As the CABIN Fcount model effectively assumed alpha and gamma diversity were equal, it estimated that only two families were likely to have gone undetected in the metacommunity. Conversely, the CABIN Fpa model estimated ∼20 families were missed (i.e., γ = 95), a 28% increase on the observed total. Interestingly, this estimate was still short of the richness observed using metabarcoding (n = 109), and based on the distribution of detection probabilities, the DNA Fpa and Gpa models estimated the metacommunity could potentially contain 130 families and 360 genera, increases of 19% and 37%, respectively (SI Appendix, Fig. S1.8).
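The percentage increases quoted above follow directly from the observed and estimated richness values. As a quick arithmetic check, using only figures stated in the text (74, 109, and 263 observed; 95, 130, and 360 estimated), with dictionary and function names chosen purely for illustration:

```python
# Observed and model-estimated richness, as quoted in the text.
observed = {"CABIN families": 74, "DNA families": 109, "DNA genera": 263}
estimated = {"CABIN families": 95, "DNA families": 130, "DNA genera": 360}


def percent_increase(obs: int, est: int) -> int:
    """Relative increase of estimated over observed richness, as a whole percent."""
    return round(100 * (est - obs) / obs)


increases = {
    name: percent_increase(observed[name], estimated[name]) for name in observed
}
print(increases)
```

The rounded values reproduce the 28%, 19%, and 37% increases reported in the text.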

Although imperfect detection always underestimates richness, its effect on the observed compositional dissimilarity between sites (beta diversity) is less predictable. The observed pairwise dissimilarity of samples consistently exceeded 40%, both within and between years, with no consistent increase over time (SI Appendix, Fig. S1.7). Our analysis showed that compositional turnover in the CABIN dataset was overestimated, whereas for the DNA models the corrected and observed dissimilarities were similar (SI Appendix, Fig. S1.15), although temporal turnover (i.e., interannual, within-site dissimilarity) was marginally overestimated by the DNA dataset. This implies that although metabarcoding underestimated alpha diversity at each site, the proportions of the taxa missed that were shared or unique to site pairs were similar. Finally, one predictable aspect of turnover is that as the taxonomic resolution is increased subtaxa are on average less prevalent than their parental ranks (SI Appendix, Fig. S1.12A), typically harder to detect (presumably because they are also less abundant than parental ranks), and therefore dissimilarity among sites at the genus level was 7% higher compared to the family level.
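To illustrate why the effect of missed detections on dissimilarity is hard to predict, the following minimal sketch compares two hypothetical sites. It assumes Jaccard dissimilarity on presence–absence data, which may not be the index used in the SI Appendix, and the taxon labels are placeholders rather than taxa from the study.

```python
def jaccard_dissimilarity(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B| for two sets of taxa (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


# Hypothetical "true" composition of two sites sharing three of five taxa.
site1_true = {"A", "B", "C", "D"}
site2_true = {"A", "B", "C", "E"}
true_d = jaccard_dissimilarity(site1_true, site2_true)

# Missing a SHARED taxon at one site inflates observed dissimilarity.
missed_shared = jaccard_dissimilarity(site1_true - {"C"}, site2_true)

# Missing the UNIQUE taxa at both sites deflates it instead.
missed_unique = jaccard_dissimilarity(site1_true - {"D"}, site2_true - {"E"})

print(true_d, missed_shared, missed_unique)
```

Whether observed turnover is biased up or down therefore depends on whether the undetected taxa tend to be shared between sites or unique to one of them.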

Power Analysis.

The power to detect statistically significant changes depends on the strength of the ecological signal relative to other natural variability, as well as the efficiency with which we can accurately describe ecological state, factors directly related to taxonomic resolution and detectability (27). We simulated the PAD metacommunity based on a fitted distribution of occupancy and estimated gamma diversity to represent its baseline condition and then took subsamples that reflected the observed biases in each sampling approach. Note that the true state and behavior of the system are unknown, and underlying processes were instead inferred by the MSOMs after quantifying observation biases. Human impacts that might affect the PAD system in the future were also unknown, and this analysis therefore aimed to identify our power to observe a generalized stressor effect. To keep the process model consistent, we based simulations on the most detailed DNA Gpa model and then aggregated taxa to higher ranks to compare power among sampling approaches. A complete description of the simulation and power analysis is provided in SI Appendix, Appendix 2.
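The step of aggregating genus-level simulations to higher ranks can be sketched as follows. This is an illustrative assumption about how such aggregation might be done (a family is present if any of its genera is present); the genus and family labels are hypothetical and the authors' actual procedure is described in SI Appendix, Appendix 2.

```python
def aggregate_to_family(genus_presence: dict, genus_to_family: dict) -> dict:
    """Collapse genus presence-absence (0/1) to family presence-absence."""
    family_presence = {}
    for genus, present in genus_presence.items():
        family = genus_to_family[genus]
        # A family counts as present once any member genus is present.
        family_presence[family] = family_presence.get(family, 0) or int(present)
    return family_presence


# Hypothetical taxonomy and one simulated sample.
genus_to_family = {"genus_a1": "family_a", "genus_a2": "family_a", "genus_b1": "family_b"}
sample = {"genus_a1": 0, "genus_a2": 1, "genus_b1": 0}
family_sample = aggregate_to_family(sample, genus_to_family)
print(family_sample)
```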

A natural consequence of high, near-random, background variation in composition is that degradation of a wetland site would need to be severe to raise concerns. Instead, it is more effective to measure when there is a shift away from our expectation of the PAD metacommunity aggregated across sites (i.e., changes in occupancy of many taxa). Even so, based on the high natural variability of the PAD, the survey effort needed to confidently detect shifts in occupancy in any year would be prohibitive. As a result, we considered a monitoring system to be adequate if significant differences in composition were detected within 2 y (at least 50% of the time; SI Appendix, Fig. S2.4). Our results demonstrated that our power increased 1) as the number of sites sampled increased (but the rate of increase declined beyond 8 to 10 sites per year); 2) with DNA metabarcoding compared to CABIN sampling, and with genus- compared to family-level data; and 3) if we sampled sites multiple times (but gains depended on the number of sites and sampling approach) (Fig. 4). Statistical power also varied by stressor type because metacommunity shifts were readily apparent if the stressor impacted prevalent taxa, whereas changes were challenging to observe if prevalent taxa were also tolerant. The relationship between taxa occupancy and their sensitivity to a stressor was therefore most influential when sample sizes, and hence our power to detect rare taxa, were low (SI Appendix, Fig. S2.6).

Fig. 4.

Minimum reduction to community occupancy that is detectable >50% of the time with 95% confidence in response to number of sites surveyed annually. Lines show the average of 100 simulations based on the CABIN Fpa (blue), DNA Fpa (red), and DNA Gpa (green) occupancy-detection models, with either single (open symbol) or triplicate (closed symbol) samples per site. Taxon tolerance was not correlated with occupancy. See SI Appendix, Appendix 2 for further information.
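The logic of a simulation-based power analysis can be sketched in a few lines. The example below is a deliberately simplified stand-in, not the procedure in SI Appendix, Appendix 2: it compares mean detected richness per site (rather than composition) between a baseline and an impacted year with a one-sided z-type test, and every distribution and parameter value (the Beta distributions, `n_taxa`, `n_sims`, the 1.645 threshold) is a hypothetical choice made for illustration.

```python
import random
import statistics


def detected_richness(occupancy, detect, n_reps, rng):
    """Number of taxa detected at one site after n_reps replicate samples."""
    count = 0
    for psi, p in zip(occupancy, detect):
        present = rng.random() < psi
        if present and any(rng.random() < p for _ in range(n_reps)):
            count += 1
    return count


def power(n_sites, reduction, n_reps=1, n_taxa=100, n_sims=200, seed=1):
    """Fraction of simulations in which a drop in occupancy is detected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Hypothetical community: many rare taxa, moderate detectability.
        occupancy = [rng.betavariate(0.5, 1.5) for _ in range(n_taxa)]
        detect = [rng.betavariate(2, 2) for _ in range(n_taxa)]
        impacted = [psi * (1 - reduction) for psi in occupancy]
        base = [detected_richness(occupancy, detect, n_reps, rng) for _ in range(n_sites)]
        after = [detected_richness(impacted, detect, n_reps, rng) for _ in range(n_sites)]
        se = (statistics.variance(base) / n_sites + statistics.variance(after) / n_sites) ** 0.5
        if se > 0 and (statistics.mean(base) - statistics.mean(after)) / se > 1.645:
            hits += 1
    return hits / n_sims


power_strong = power(n_sites=10, reduction=0.5)
power_null = power(n_sites=10, reduction=0.0)
print(power_strong, power_null)
```

Varying `n_sites`, `n_reps`, and the detection distribution in such a sketch gives a rough feel for the trade-offs summarized in Fig. 4, although the absolute values are not comparable with the published results.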

Discussion

The PAD represents one of Canada’s national biodiversity treasures. However, multiple external pressures, including the development of oil sands, hydroelectric power, wildfires, and climate change are potentially affecting biodiversity through modification of natural physical processes in the area and threaten its World Heritage listing (17). Our study demonstrates that the PAD is an immensely rich habitat, including over 25% and 20%, respectively, of all aquatic macroinvertebrate families and genera recorded by the CABIN national biomonitoring program (32). This total may still underestimate the total diversity present, and we demonstrate the importance sampling errors can have for modeling this community. Communities exhibited near-random patterns of spatial and temporal turnover, a property rarely observed in freshwater systems (33). As a result, impacts on the wetland macroinvertebrate community are difficult to establish at local scales because occurrence is weakly related to environmental factors and site composition can fluctuate rapidly over time (SI Appendix, Fig. S1.7). Properties of the metacommunity must therefore be aggregated across sites, and directional shifts can only be inferred when dissimilarities are unlikely to be explained by stochastic differences in our null baseline model (34). Our analysis shows that detecting a decline in metacommunity condition would depend on both sample size and stressor type, and that further changes to sampling design may be required to detect change earlier or at specific locations of concern.

The most significant finding of this study was the value added to biomonitoring data generated by DNA metabarcoding of bulk community samples. Our analysis supports previous studies that have shown the breadth and resolution of taxonomic information achievable with metabarcoding (e.g., refs. 14 and 35). Clear differences in occupancy and detectability profiles with metabarcoding (Fig. 3) influence our description of baseline reference conditions (36, 37). Further differences in estimates of occupancy with increasing taxonomic resolution (SI Appendix, Fig. S1.12) may also indicate differential environmental responses (38, 39). We did not find evidence to suggest the count data (“relative abundance”) in CABIN samples were necessary to detect changes in ecological structure. In fact, only the presence–absence DNA metabarcoding models identified significant relationships with the major environmental covariates of this region (16). These effects could be estimated precisely because detectability, and thereby sampling efficiency, was higher for so many taxa using DNA metabarcoding (13). We also used the occupancy model framework to compare detectability of each taxon with different primers, a more robust measure of their complementarity than lists of taxa observed. Quantifying detectability is vital to making the results of this study comparable to others with varying protocols, and this approach could be used to refine and select complementary primers (28, 31). Crucially, DNA metabarcoding, particularly at the genus level, substantially improved our power to detect ecosystem-scale changes compared to traditional CABIN sampling (Fig. 4 and SI Appendix, Appendix 2). Extending this approach to the species level could improve overall power further still, as long as a sufficient number of species have a similar probability of detection as their parent genera.

A second significant outcome was the importance of explicitly considering imperfect detection. Practitioners are well aware of sampling differences (e.g., ref. 40) but have typically focused on how those errors propagated to aggregated metrics, rather than explicitly quantifying the sources of uncertainty (23). Hierarchical occupancy models accommodate irregularly sampled data, estimate community properties (that extend inference to rare taxa), and allow straightforward biological interpretations of those parameter estimates (41). There have been few examples of hierarchical models accounting for detectability in freshwater ecology (e.g., refs. 42 and 43), despite studies showing it can bias our interpretation of taxonomic, functional, and phylogenetic diversity at the community level (e.g., ref. 44). Given the high prevalence of false absences it is not surprising occupancy models are becoming commonplace for analyzing eDNA data (45), although it appears multispecies models are still rare (46). Importantly, what these and other studies have shown is that the time and expense of adding replicate samples may be the most efficient way to improve the statistical power of a study (24, 47). Inferences about the number of taxa missed in the metacommunity naturally carry some uncertainty (48), but by acknowledging imperfect detection, risks can be quantified, and decision makers’ overall efficiency can be improved. This analysis minimizes the likelihood of management agencies responding to a false signal of degradation (type 1 error) and identifies how to optimize survey design to ensure we have the necessary power to detect a desired degree of change (type 2 error) (27, 49).

Although our analysis provided evidence of environmental filtering, the distribution of beta diversity was equivalent to that expected from random assembly, suggesting that the metacommunity was operating in a quasi-neutral manner at the scale of our analysis (50). Quasi-neutral dynamics are expected to be commonly observed in taxon-rich communities, but given the high degree of landscape connectivity, we would expect mass effects, rather than dispersal limitation, to underlie the low habitat specificity of the community. Our dataset was insufficient to identify which mechanisms underlie metacommunity assembly, because the same patterns of turnover may be the result of different assembly processes (51). While further studies could reduce this uncertainty, currently models of coexistence that combine stochasticity with niche theory may be the most suitable option to explain the structure and dynamics of aquatic invertebrate communities in the PAD, without relying on the fragile premise of ecological equivalence in neutral theory (50, 52). Although many ecologists have acknowledged stochastic processes are likely to have a role in understanding community composition (53, 54), we are unaware of any biomonitoring programs that incorporate, or even acknowledge, community assembly mechanisms other than environmental filtering (e.g., ref. 37). Our results firmly challenge that traditional perspective, and if we wish to understand the resilience of the PAD, we must adopt a metacommunity perspective (55). More broadly, a metacommunity perspective of the PAD could indicate which assembly processes are absent from more managed landscapes, therefore providing critical insights into the mitigation of biodiversity loss at the landscape scale.

The isolation of wilderness areas like the PAD implies a pristine nature, but that isolation has also hindered our appreciation of the sheer magnitude of diversity which occurs there and has until now precluded a basic description of how community structure changes over space and time. Near-random patterns of assembly and substantial sampling error pose a challenge to detecting ecosystem change. Without evaluating data quality and statistical power at the start, many monitoring programs are unable to confidently reject a false null hypothesis, undermining project goals and providing a misleading sense of achievement (27, 56). Despite the high turnover, we show the statistical power of data generated by DNA metabarcoding was superior to traditional biomonitoring approaches for the detection of large-scale ecosystem change. Although macroinvertebrate composition provides a wealth of information, the power to detect and draw inference from taxonomic changes will be improved by further refining the list of taxa that respond to particular threats (e.g., oil sands contaminants), particularly by linking metabarcoding to trait databases (57), and this remains a major focus of our ongoing research.

Materials and Methods

Field Surveys.

Field survey methods followed the CABIN wetland macroinvertebrate protocol (14, 19). Briefly, aquatic invertebrates were sampled by sweeping submerged and emergent aquatic vegetation at wetland edges for 2 min. A sterile 400-μm-mesh net was steadily moved in a zig-zag pattern, from the surface of the sediment to the water surface, to capture disturbed organisms and minimize the amount of sediment collected. Excess vegetation was carefully rinsed and removed, and samples with excess sediment were sieved. Material was placed in sterile 1-L polyethylene sample jars, filled no more than half full, and immediately preserved in 95% ethanol in the field. Samples were stored in a cooler with ice in the field and transferred to a freezer at the field station before shipment. Nets were disinfected between each new site, and field crews wore nitrile gloves to collect and handle samples, minimizing the risk of cross-site contamination.

Sample Processing.

In total, 126 and 138 samples were collected from 72 separate site visits for the CABIN and DNA metabarcoding datasets, respectively (SI Appendix, Table S1.2). Samples identified using morphological characteristics were processed and identified in accordance with the CABIN laboratory manual (19). Briefly, material from each 2-min sweep was subsampled using a 100-cell Marchant box. Successive cells were processed until at least 300 individuals were identified and a minimum of five cells were processed. Most taxa were identified to the family level, although for some groups only class- or order-level identification was recovered, and given the importance and diversity of Chironomidae, we retained four subfamily divisions that could be reliably identified (SI Appendix, Appendix 1) (58).
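The subsampling stopping rule described above (at least 300 individuals and at least five cells) can be expressed as a short loop. This is a minimal sketch that assumes the cells are already listed in the order they would be processed; the function name and the example counts are hypothetical.

```python
def subsample_cells(cell_counts, min_individuals=300, min_cells=5):
    """Process cells in order until both stopping thresholds are met.

    Returns (number of cells processed, number of individuals identified).
    """
    total = 0
    processed = 0
    for count in cell_counts:
        total += count
        processed += 1
        if total >= min_individuals and processed >= min_cells:
            break
    return processed, total


# Sparse sample: 40 individuals per cell, so the 300-individual rule binds.
sparse = subsample_cells([40] * 100)

# Dense sample: 100 individuals per cell, so the five-cell minimum binds.
dense = subsample_cells([100] * 100)
print(sparse, dense)
```

Because the number of individuals counted varies among samples under this rule, it is a natural candidate covariate for detectability, as noted in the model description.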

The laboratory protocol for processing samples for DNA metabarcoding followed the same procedure as outlined in Gibson et al. (14). This targeted the CO1 amplicon using two complementary primers, BE/BR5 and F230R (30, 59). All DNA samples were analyzed using BE or BR5, which target the same CO1 region, and F230R was introduced in 2012. While field and laboratory protocols have remained consistent since the study began, there have been a number of advances made in bioinformatic tools, as well as expansion of the reference sequence libraries supporting the identification of taxa (60). The bioinformatic pipeline used to process all samples in this study, as well as the CO1 classifier that allocates sequences to the most likely taxa, is described in SI Appendix, Appendix 1 and available on GitHub (61). The sequences generated have been deposited in the NCBI Sequence Read Archive, project PRJNA603969.

Hierarchical MSOM.

MSOMs employ a flexible hierarchical framework that allows for imperfect detection to predict species’ occurrence (25). The hierarchy consists of an underlying state model that describes the probability of species’ occurrence and a second observation model to describe the probability of detecting that species when it is present (informed by detection across replicates). The fitted state model is thereby updated to account for the probability of false negatives. MSOMs extend this single-species approach by assuming species’ coefficients are related and can be treated as random effects, drawn from a common distribution (hyperparameters). Data augmentation extends the community approach a step further by using the hyperparameters for occupancy and detectability to estimate the possibility additional taxa may have been present but by chance were never observed. Our analysis adapted the notation and code provided by ref. 41 as the basis for this study (see SI Appendix, Appendix 1 for model code):

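  1. Data augmentation process: $w_k \sim \mathrm{Bernoulli}(\Omega)$

  2. State process: $z_{ik} \sim \mathrm{Bernoulli}(w_k \psi_k)$

  3. Observation process: $y_{ijk} \mid z_{ik} \sim \mathrm{Bernoulli}(z_{ik}\, p_{ijk})$

  4. Models of taxon heterogeneity:

     $\mathrm{logit}(\psi_k) = \mathrm{lpsi}_k + \mathrm{betalpsi}_k \times \mathrm{covariate}_i + \dots$

     $\mathrm{logit}(p_{ijk}) = \mathrm{lp}_k + \mathrm{betalp}_k \times \mathrm{covariate}_{ij} + \dots$

     Given:

     $\mathrm{lpsi}_k \sim \mathrm{Normal}(\mu_{\mathrm{lpsi}}, \sigma^2_{\mathrm{lpsi}})$

     $\mathrm{betalpsi}_k \sim \mathrm{Normal}(\mu_{\mathrm{betalpsi}}, \sigma^2_{\mathrm{betalpsi}})$

     $\mathrm{lp}_k \sim \mathrm{Normal}(\mu_{\mathrm{lp}}, \sigma^2_{\mathrm{lp}})$

     $\mathrm{betalp}_k \sim \mathrm{Normal}(\mu_{\mathrm{betalp}}, \sigma^2_{\mathrm{betalp}})$.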
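The generative structure of the model above can be made concrete by simulating from it. The sketch below is an intercept-only illustration (no covariates) written with the Python standard library; it is not the JAGS code in SI Appendix, Appendix 1, and all default parameter values and the function name `simulate_msom` are hypothetical.

```python
import math
import random


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def simulate_msom(M=50, n_sites=20, n_reps=3, omega=0.8,
                  mu_lpsi=0.0, sd_lpsi=1.0, mu_lp=-0.5, sd_lp=1.0, seed=42):
    """Draw w (taxon in community), z (site occupancy), and y (detections)."""
    rng = random.Random(seed)
    # 1. Data augmentation: is taxon k part of the metacommunity?
    w = [int(rng.random() < omega) for _ in range(M)]
    # 4. Taxon-level intercepts drawn from community hyperparameters.
    psi = [inv_logit(rng.gauss(mu_lpsi, sd_lpsi)) for _ in range(M)]
    p = [inv_logit(rng.gauss(mu_lp, sd_lp)) for _ in range(M)]
    # 2. State process: z[i][k] ~ Bernoulli(w_k * psi_k).
    z = [[int(rng.random() < w[k] * psi[k]) for k in range(M)]
         for _ in range(n_sites)]
    # 3. Observation process: y[i][j][k] ~ Bernoulli(z_ik * p_k).
    y = [[[int(rng.random() < z[i][k] * p[k]) for k in range(M)]
          for _ in range(n_reps)] for i in range(n_sites)]
    return w, z, y


w, z, y = simulate_msom()
gamma_true = sum(w)  # taxa truly in the metacommunity
gamma_observed = sum(
    any(y[i][j][k] for i in range(len(y)) for j in range(len(y[0])))
    for k in range(len(w))
)
print(gamma_true, gamma_observed)
```

Comparing `gamma_observed` with `gamma_true` in such simulations shows how taxa that are present but hard to detect can go entirely unrecorded, which is the gap that data augmentation is intended to estimate.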

The observed data $y_{ijk}$ describe the detection or nondetection of taxon $k$ at site $i$ in replicate sample $j$. Replicate observations, in our case simultaneous independent samples (21), allowed the model to discriminate between processes that determine the system’s state (occupancy) and the observation process (detectability). The occupancy of each taxon at each site $z_{ik}$ is described by a Bernoulli trial with probability $\psi_{ik}$, and the likelihood of detecting the respective taxa in each replicate sample is described by another set of Bernoulli processes with probability $p_{ijk}$. Seven water temperature and flood regime variables were tested as covariates within a multiple logistic regression for occupancy, and measures of sample processing effort were tested for detectability (sequencing depth and the number of individuals identified). Individual intercepts and slopes represented species-specific random effects, governed by a common prior distribution whose mean and variance were estimated as community-level hyperparameters.
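A short worked example shows why replicate samples help separate occupancy from detectability. Assuming, for illustration only, that detection probability is constant across replicates and that replicates are independent, the chance of detecting a present taxon at least once in J replicates is 1 − (1 − p)^J, and the naive (uncorrected) occupancy is that quantity multiplied by the true occupancy. The function names and the values 0.6 and 0.5 are hypothetical.

```python
def prob_detected_at_least_once(p: float, n_reps: int) -> float:
    """P(at least one detection in n_reps independent replicates | present)."""
    return 1 - (1 - p) ** n_reps


def naive_occupancy(psi: float, p: float, n_reps: int) -> float:
    """Expected proportion of sites with >= 1 detection, ignoring false absences."""
    return psi * prob_detected_at_least_once(p, n_reps)


# Hypothetical taxon: true occupancy 0.6, per-sample detection probability 0.5.
single = naive_occupancy(0.6, 0.5, 1)
triplicate = naive_occupancy(0.6, 0.5, 3)
print(single, triplicate)
```

With a single sample the taxon would appear to occupy about 30% of sites, and with three replicates about 52.5%, against a true value of 60%. The occupancy model uses the pattern of detections across replicates to estimate p and correct for this shortfall.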

The statistical distributions of the parameters governing occupancy and detectability shared by the community were used to consider the possibility of other taxa in the metacommunity that were not recorded in any visit to any site, a process known as data augmentation (48, 62). Given a sufficiently large total pool of M potential taxa, a set of binary indicators, $w_k$, each a Bernoulli trial with probability Ω, denotes whether each taxon is part of the metacommunity. The total number of taxa in the metacommunity (γ diversity) is therefore simply the sum of the $w_k$.
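In practice, data augmentation pads the observed detection array with all-zero histories for taxa that were never recorded, and γ diversity is then summarized from the posterior draws of the indicators. A minimal Python sketch, assuming the detection data are stored as a sites × replicates × taxa array and that posterior draws of the indicators are available as a draws × M array (the function names `augment` and `gamma_diversity` are hypothetical):

```python
import numpy as np

def augment(y_obs, n_aug):
    """Append n_aug never-detected taxa (all-zero histories) to y_obs.

    y_obs has shape (sites, replicates, observed taxa); the result has
    M = observed taxa + n_aug taxa along the last axis.
    """
    n_sites, n_reps, _ = y_obs.shape
    zeros = np.zeros((n_sites, n_reps, n_aug), dtype=y_obs.dtype)
    return np.concatenate([y_obs, zeros], axis=2)

def gamma_diversity(w_draws):
    """Metacommunity richness per posterior draw: the sum of w_k."""
    return np.asarray(w_draws).sum(axis=1)

# Hypothetical toy input: 4 sites, 3 replicates, 5 observed taxa.
y_obs = np.ones((4, 3, 5), dtype=int)
y_aug = augment(y_obs, n_aug=10)
print(y_aug.shape)
```

The pool size M must be large enough that the posterior of the summed indicators is not truncated at M; that choice is left to the analyst.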

The occupancy model above was suitable for presence/absence observations of taxa, but CABIN samples also included information on the relative abundances of taxa. To utilize all information available, we constructed a community-level N-mixture model that estimates the latent abundance $N_{ik}$ of each taxon, rather than its occurrence ($z_{ik}$), with latent abundance following a Poisson distribution and counts arising from it by binomial sampling:

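  • 2. State process: $N_{ik} \sim \mathrm{Poisson}(w_k \lambda_k)$

  • 3. Observation process: $y_{ijk} \mid N_{ik} \sim \mathrm{Binomial}(N_{ik},\, p_{ijk})$

  • 4. Models of taxon heterogeneity: $\log(\lambda_k) = \mathrm{lpsi}_k + \mathrm{betalpsi}_k \times \mathrm{covariate}_i + \dots$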
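The N-mixture variant can be sketched in the same generative way. This Python example assumes a single site-level covariate and, as a simplification, a detection probability that differs among taxa but not among replicates. The name `simulate_nmixture` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_nmixture(n_sites=20, n_reps=3, M=30, omega=0.7, seed=1):
    """Simulate latent abundance N_ik and replicate counts y_ijk."""
    rng = np.random.default_rng(seed)
    site_cov = rng.normal(size=n_sites)
    # Taxon-specific intercepts and slopes on the log scale (hypothetical).
    intercept = rng.normal(0.5, 0.8, M)
    slope = rng.normal(0.3, 0.3, M)
    # Taxon-specific per-individual detection probability (simplified).
    p = 1.0 / (1.0 + np.exp(-rng.normal(-0.5, 0.7, M)))
    # Metacommunity membership, as in the occupancy model.
    w = rng.binomial(1, omega, M)
    # State process: Poisson abundance, zero for taxa with w_k = 0.
    lam = np.exp(intercept[None, :] + slope[None, :] * site_cov[:, None])
    N = rng.poisson(w[None, :] * lam)
    # Observation process: each individual is counted with probability p.
    y = rng.binomial(N[:, None, :], p[None, None, :],
                     size=(n_sites, n_reps, M))
    return {"w": w, "N": N, "y": y}

sim = simulate_nmixture()
print(sim["N"].shape, sim["y"].shape)
```

Counts in any replicate can never exceed the latent abundance, so repeated counts at a site carry information on both abundance and detectability.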

Finally, model selection for covariates of taxon heterogeneity in both the occupancy and N-mixture models was determined by a set of binary indicator variables $V_{x_1}, \ldots, V_{x_n}$, one for each of the n predictor variables used (63). Using $V_x \sim \mathrm{Bernoulli}(0.5)$ as standard priors, variables had an equal prior probability of being included in or excluded from the likelihood, and model selection was therefore based on which combination of indicators had the highest joint posterior probability. Note that convergence of the $V_x$ indicators was very slow, particularly in the most complex models, and a “slab and spike” approach did not improve mixing (see 7.6.2 in ref. 41).
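Once indicator variables have been sampled, both the marginal inclusion probability of each covariate and the most frequently visited combination can be read directly from the MCMC draws. A minimal Python sketch, assuming the draws are available as a draws × n array of 0/1 values (the function name `summarize_indicators` and the toy draws are hypothetical):

```python
from collections import Counter

import numpy as np

def summarize_indicators(V):
    """Summarize posterior draws of binary inclusion indicators.

    Returns the marginal inclusion probability of each covariate, the
    most frequently sampled combination, and that combination's
    posterior frequency.
    """
    V = np.asarray(V, dtype=int)
    inclusion = V.mean(axis=0)
    combos = Counter(tuple(int(v) for v in row) for row in V)
    best, count = combos.most_common(1)[0]
    return inclusion, best, count / len(V)

# Hypothetical draws for three covariates.
draws = [[1, 0, 1], [1, 0, 1], [1, 1, 1], [0, 0, 1]]
inclusion, best, prob = summarize_indicators(draws)
print(inclusion, best, prob)
```

When the indicators mix slowly, as reported above, these frequencies are estimated from few effective transitions between models and should be interpreted with matching caution.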

Analyses were conducted using the R package jagsUI (64). We assessed convergence of all monitored parameters across chains by visual inspection of trace plots and by using the Gelman–Rubin statistic (65), taking values <1.1 to indicate convergence. As overdispersion cannot be estimated from the binary responses in occupancy models (41), plots of Dunn–Smyth residuals for fitted estimates of occupancy and detectability were used to evaluate the fit of separate taxa (66). Although the plots suggested the models fit well in most cases, the pattern of residuals indicated that other covariates, or nonlinear effects, missing from the models may have influenced the occupancy of some taxa.
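For reference, the classic form of the Gelman–Rubin statistic compares between-chain and within-chain variance for one parameter. The Python sketch below implements that textbook form for a chains × iterations array; it is an illustration of the diagnostic, not a reproduction of the values reported by jagsUI, which may apply additional corrections. The function name `gelman_rubin` is hypothetical.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter.

    chains: array of shape (m chains, n iterations), burn-in removed.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    # W: mean of the within-chain variances.
    W = chains.var(axis=1, ddof=1).mean()
    # B: n times the variance of the chain means.
    B = n * chains.mean(axis=1).var(ddof=1)
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * W + B / n
    return float(np.sqrt(var_hat / W))

# Hypothetical chains: three well-mixed chains, then three offset chains.
rng = np.random.default_rng(0)
mixed = rng.normal(size=(3, 2000))
stuck = mixed + np.array([[0.0], [3.0], [6.0]])
print(gelman_rubin(mixed) < 1.1, gelman_rubin(stuck) < 1.1)
```

Values near 1 indicate that the chains are sampling the same distribution; chains stuck in different regions inflate the between-chain term and push the statistic above the 1.1 threshold.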

Simulation and Power Analysis.

The code and process used to simulate communities are described in detail in SI Appendix, Appendix 2. In summary, a hypothetical presence–absence matrix of the metacommunity was derived from estimates of gamma diversity and occupancy in the DNA Gpa occupancy model, from which we could manipulate sampling designs. Environmental covariates were varied according to the mean and SD of values observed from the surveys available to us (SI Appendix, Fig. S2.1), but the simulation was not spatially explicit. While occupancy covariates drove some temporal turnover (SI Appendix, Fig. S2.2; ∼10 to 27%), this was insufficient to replicate the turnover observed (SI Appendix, Fig. S2.3), so permutation of the presence–absence matrix was used to simulate further stochastic changes in composition (i.e., local extinction/colonization; ref. 67). Replicating observed turnover required the complete redistribution of occurrences (i.e., random assembly patterns). Taxon occupancy (row sums) and site richness (column sums) were held constant during permutation. The metacommunity was modified by successively removing occurrences of taxa based on a hypothetical distribution of tolerances, which were themselves generated to covary with the distribution of occupancy. Sampling error was applied by a binomial function weighted by the taxon’s probability of detection, and the “detected” compositions of the reference and modified metacommunities were then compared using mvabund (68). Power of DNA Gpa was compared to the DNA Fpa and CABIN Fpa approaches by aggregating genera to the family level and subsequently applying the family-level detection probabilities.
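Two steps of this simulation lend themselves to a short illustration: permuting a presence–absence matrix while holding row and column sums constant, and applying detection error. The Python sketch below uses a generic checkerboard-swap permutation as a stand-in; it is an assumption for illustration and not necessarily the permutation algorithm of ref. 67 or of SI Appendix, Appendix 2. The names `swap_permute` and `apply_detection`, the matrix dimensions, and the detection probabilities are hypothetical.

```python
import numpy as np

def swap_permute(mat, n_swaps=1000, seed=0):
    """Shuffle a taxa x sites 0/1 matrix, preserving all margins.

    Repeatedly picks two rows and two columns; if the 2 x 2 submatrix is
    a checkerboard ([[1, 0], [0, 1]] or its mirror), it is flipped, which
    leaves every row sum and column sum unchanged.
    """
    rng = np.random.default_rng(seed)
    m = np.array(mat, dtype=int)
    n_rows, n_cols = m.shape
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        rows = rng.choice(n_rows, 2, replace=False)
        cols = rng.choice(n_cols, 2, replace=False)
        sub = m[np.ix_(rows, cols)]
        if (sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0]
                and sub[0, 0] != sub[0, 1]):
            m[np.ix_(rows, cols)] = 1 - sub
            done += 1
    return m

def apply_detection(occ, p_detect, seed=0):
    """Binomial sampling error: keep each occurrence with probability p_k."""
    rng = np.random.default_rng(seed)
    return rng.binomial(1, occ * np.asarray(p_detect)[:, None])

# Hypothetical metacommunity: 20 taxa (rows) by 15 sites (columns).
rng = np.random.default_rng(3)
occ = rng.binomial(1, 0.4, size=(20, 15))
perm = swap_permute(occ, n_swaps=200, seed=7)
detected = apply_detection(perm, np.full(20, 0.5), seed=1)
print(perm.sum(), detected.sum())
```

Repeating the detection step over many simulated reference and modified metacommunities, and recording how often a difference is found, gives the empirical power for a given sampling design.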

Data Availability.

All datasets needed to evaluate our conclusions are publicly available as referenced within the article and described in SI Appendix.

Acknowledgments

We acknowledge support for this work from a Large-Scale Applied Research Project award from Genome Canada to M.H. and D.J.B. D.J.B. also received support from the Natural Sciences and Engineering Research Council of Canada Discovery Grants Program and through Environment and Climate Change Canada program funds, including the Genomics Research & Development Initiative that supported both A.B. and T.M.P. through the Ecobiomics Project. Access to sites was supported by the Canada-Alberta Oil Sands Monitoring Program. This work was funded in part under the Oil Sands Monitoring Program and is a contribution to the Program but does not necessarily reflect the position of the Program. Field support was provided by Parks Canada (Jeff Shatford, Queenie Gray, Jason Straka, Sharon Irwin, and air-boat pilots Ronnie and David Campbell) and Environment and Climate Change Canada’s Watershed Hydrology and Ecology Research Division (Kristie Heard, Colin Curry, Daryl Halliwell, Tom Carter, Adam Bliss, Adam Martens, Cath Choung, and Steff Connor).

Footnotes

The authors declare no competing interest.

This article is a PNAS Direct Submission.

Data deposition: Data have been deposited in the NCBI Sequence Read Archive, https://www.ncbi.nlm.nih.gov/sra (accession ID PRJNA603969).

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1918741117/-/DCSupplemental.

References

  • 1. IPBES, Global assessment report on biodiversity and ecosystem services of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services, Brondizio E. S., Settele J., Díaz S., Ngo H. T., Eds. (IPBES Secretariat, Bonn, Germany, 2019).
  • 2. Pereira H. M., et al., Ecology. Essential biodiversity variables. Science 339, 277–278 (2013).
  • 3. Ormerod S. J., Dobson M., Hildrew A. G., Townsend C. R., Multiple stressors in freshwater ecosystems. Freshw. Biol. 55 (suppl. 1), 1–4 (2010).
  • 4. Gray C., et al., FORUM: Ecological networks: The missing links in biomonitoring science. J. Appl. Ecol. 51, 1444–1449 (2014).
  • 5. Friberg N., et al., “Biomonitoring of human impacts in freshwater ecosystems: The Good, the Bad and the Ugly” in Advances in Ecological Research, Guy W., Ed. (Academic Press, 2011), vol. 44, pp. 1–68.
  • 6. Bailey R. C., Linke S., Yates A. G., Bioassessment of freshwater ecosystems using the reference condition approach: Comparing established and new methods with common data sets. Freshw. Sci. 33, 1204–1211 (2014).
  • 7. Taberlet P., Coissac E., Hajibabaei M., Rieseberg L. H., Environmental DNA. Mol. Ecol. 21, 1789–1793 (2012).
  • 8. Yu D. W., et al., Biodiversity soup: Metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring. Methods Ecol. Evol. 3, 613–623 (2012).
  • 9. Baird D. J., Hajibabaei M., Biomonitoring 2.0: A new paradigm in ecosystem assessment made possible by next-generation DNA sequencing. Mol. Ecol. 21, 2039–2044 (2012).
  • 10. Jones F. C., Taxonomic sufficiency: The influence of taxonomic resolution on freshwater bioassessments using benthic macroinvertebrates. Environ. Rev. 16, 45–69 (2008).
  • 11. Strachan S. A., Reynoldson T. B., Performance of the standard CABIN method: Comparison of BEAST models and error rates to detect simulated degradation from multiple data sets. Freshw. Sci. 33, 1225–1237 (2014).
  • 12. Chariton A. A., et al., Emergent technologies and analytical approaches for understanding the effects of multiple stressors in aquatic environments. Mar. Freshw. Res. 67, 414–428 (2015).
  • 13. Bush A., et al., Studying ecosystems with DNA metabarcoding: Lessons from biomonitoring of aquatic macroinvertebrates. Front. Ecol. Evol. 7, 1–12 (2019).
  • 14. Gibson J. F., et al., Large-scale biomonitoring of remote and threatened ecosystems via high-throughput sequencing. PLoS One 10, e0138432 (2015).
  • 15. Peters D. L., Prowse T. D., Pietroniro A., Leconte R., Flood hydrology of the Peace-Athabasca Delta, northern Canada. Hydrol. Processes 20, 4073–4096 (2006).
  • 16. Peters D. L., Caissie D., Monk W. A., Rood S. B., St-Hilaire A., An ecological perspective on floods in Canada. Can. Water Resour. J. 41, 288–306 (2016).
  • 17. WBNP, Action plan to protect the World Heritage values of Wood Buffalo National Park (Parks Canada, Fort Smith, NT, Canada, 2019).
  • 18. Hajibabaei M., Baird D. J., Fahner N. A., Beiko R., Golding G. B., A new way to contemplate Darwin’s tangled bank: How DNA barcodes are reconnecting biodiversity science and biomonitoring. Philos. Trans. R. Soc. London B Biol. Sci. 371, 20150330 (2016).
  • 19. ECCC, CABIN Wetland Macroinvertebrate Protocol (Environment and Climate Change Canada, Gatineau, QC, Canada, 2018).
  • 20. Ugland K. I., Gray J. S., Ellingsen K. E., The species–accumulation curve and estimation of species richness. J. Anim. Ecol. 72, 888–897 (2003).
  • 21. Guillera-Arroita G., Modelling of species distributions, range dynamics and communities under imperfect detection: Advances, challenges and opportunities. Ecography 40, 281–295 (2017).
  • 22. Cao Y., Hawkins C. P., Larsen D. P., Van Sickle J., Effects of sample standardization on mean species detectabilities and estimates of relative differences in species richness among assemblages. Am. Nat. 170, 381–395 (2007).
  • 23. Clarke R., Uncertainty in WFD assessments for rivers based on macroinvertebrates and RIVPACS (Integrated Catchment Science Programme Science Report SC060044/SR4, Environment Agency, Bristol, UK, 2009).
  • 24. MacKenzie D. I., Royle J. A., Designing occupancy studies: General advice and allocating survey effort. J. Appl. Ecol. 42, 1105–1114 (2005).
  • 25. Dorazio R. M., Royle J. A., Estimating size and composition of biological communities by modeling the occurrence of species. J. Am. Stat. Assoc. 100, 389–398 (2005).
  • 26. Dorazio R. M., Royle J. A., Söderström B., Glimskär A., Estimating species richness and accumulation by modeling species occurrence and detectability. Ecology 87, 842–854 (2006).
  • 27. Legg C. J., Nagy L., Why most conservation monitoring is, but need not be, a waste of time. J. Environ. Manage. 78, 194–199 (2006).
  • 28. Braukmann T. W. A., et al., Metabarcoding a diverse arthropod mock community. Mol. Ecol. Resour. 19, 711–727 (2019).
  • 29. McGill B. J., Towards a unification of unified theories of biodiversity. Ecol. Lett. 13, 627–642 (2010).
  • 30. Gibson J., et al., Simultaneous assessment of the macrobiome and microbiome in a bulk sample of tropical arthropods through DNA metasystematics. Proc. Natl. Acad. Sci. U.S.A. 111, 8007–8012 (2014).
  • 31. Elbrecht V., Leese F., Validation and development of COI metabarcoding primers for freshwater macroinvertebrate bioassessment. Front. Environ. Sci. 5 (2017).
  • 32. Curry C. J., Gibson J. F., Shokralla S., Hajibabaei M., Baird D. J., Identifying North American freshwater invertebrates using DNA barcodes: Are existing COI sequence libraries fit for purpose? Freshw. Sci. 37, 178–189 (2018).
  • 33. Gregoire Taillefer A., Wheeler T. A., Tracking wetland community evolution using Diptera taxonomic, functional and phylogenetic structure. Insect Conserv. Divers. 11, 276–293 (2018).
  • 34. Hillebrand H., et al., Biodiversity change is uncoupled from species richness trends: Consequences for conservation and monitoring. J. Appl. Ecol. 55, 169–184 (2018).
  • 35. Leese F., et al., “Chapter two - Why we need sustainable networks bridging countries, disciplines, cultures and generations for aquatic biomonitoring 2.0: A perspective derived from the DNAqua-net cost action” in Advances in Ecological Research, Bohan D. A., Dumbrell A. J., Woodward G., Jackson M., Eds. (Academic Press, 2018), vol. 58, pp. 63–99.
  • 36. Schmidt-Kloiber A., Nijboer R. C., The effect of taxonomic resolution on the assessment of ecological water quality classes. Hydrobiologia 516, 269–283 (2004).
  • 37. Hawkins C. P., Norris R. H., Hogue J. N., Feminella J. W., Development and evaluation of predictive models for measuring the biological integrity of streams. Ecol. Appl. 10, 1456–1477 (2000).
  • 38. Macher J. N., et al., Multiple-stressor effects on stream invertebrates: DNA barcoding reveals contrasting responses of cryptic mayfly species. Ecol. Indic. 61, 159–169 (2016).
  • 39. Beermann A. J., Zizka V. M. A., Elbrecht V., Baranov V., Leese F., DNA metabarcoding reveals the complex and hidden responses of chironomids to multiple stressors. Environ. Sci. Eur. 30, 26 (2018).
  • 40. Ramos-Merchante A., Prenda J., Macroinvertebrate taxa richness uncertainty and kick sampling in the establishment of Mediterranean rivers ecological status. Ecol. Indic. 72, 1–12 (2017).
  • 41. Kéry M., Royle A. J., Applied Hierarchical Modeling in Ecology: Analysis of Distribution, Abundance and Species Richness in R and BUGS: Volume 1: Prelude and Static Models (Academic Press, 2015).
  • 42. Reid S. M., Haxton T. J., Backpack electrofishing effort and imperfect detection: Influence on riverine fish inventories and monitoring. J. Appl. Ichthyol. 33, 1083–1091 (2017).
  • 43. Wedderburn S. D., Multi-species monitoring of rare wetland fishes should account for imperfect detection of sampling devices. Wetlands Ecol. Manage. 26, 1107–1120 (2018).
  • 44. Si X., et al., The importance of accounting for imperfect detection when estimating functional and phylogenetic community structure. Ecology 99, 2103–2112 (2018).
  • 45. Ficetola G. F., et al., Replication levels, false presences and the estimation of the presence/absence from eDNA metabarcoding data. Mol. Ecol. Resour. 15, 543–556 (2015).
  • 46. Doi H., et al., Evaluation of detection probabilities at the water-filtering and initial PCR steps in environmental DNA metabarcoding using a multispecies site occupancy model. Sci. Rep. 9, 3581 (2019).
  • 47. Guillera-Arroita G., Lahoz-Monfort J. J., Designing studies to detect differences in species occupancy: Power analysis under imperfect detection. Methods Ecol. Evol. 3, 860–869 (2012).
  • 48. Guillera-Arroita G., Kéry M., Lahoz-Monfort J. J., Inferring species richness using multispecies occupancy modeling: Estimation performance and interpretation. Ecol. Evol. 9, 780–792 (2019).
  • 49. Keizer-Vlek H. E., Verdonschot P. F. M., Verdonschot R. C. M., Goedhart P. W., Quantifying spatial and temporal variability of macroinvertebrate metrics. Ecol. Indic. 23, 384–393 (2012).
  • 50. Souza A. F., Bezerra A. D., Longhi S. J., Quasi-neutral community assembly: Evidence from niche overlap, phylogenetic, and trait distribution analyses of a subtropical forest in South America. Perspect. Plant Ecol. Evol. Syst. 23, 1–10 (2016).
  • 51. Leibold M., Chase J., Metacommunity Ecology (Princeton University Press, 2018).
  • 52. O’Sullivan J. D., Knell R. J., Rossberg A. G., Metacommunity-scale biodiversity regulation and the self-organised emergence of macroecological patterns. Ecol. Lett. 22, 1428–1438 (2019).
  • 53. Heino J., A macroecological perspective of diversity patterns in the freshwater realm. Freshw. Biol. 56, 1703–1722 (2011).
  • 54. Thompson R., Townsend C., A truce with neutral theory: Local deterministic factors, species traits and dispersal limitation together determine patterns of diversity in stream invertebrates. J. Anim. Ecol. 75, 476–484 (2006).
  • 55. Adler P. B., et al., Evidence for a general species-time-area relationship. Ecology 86, 2032–2039 (2005).
  • 56. Lindenmayer D. B., Likens G. E., The science and application of ecological monitoring. Biol. Conserv. 143, 1317–1328 (2010).
  • 57. Compson Z. G., et al., “Linking DNA metabarcoding and text mining to create network-based biomonitoring tools: A case study on Boreal wetland macroinvertebrate communities” in Advances in Ecological Research (Academic Press, 2018), vol. 59, pp. 33–74.
  • 58. Environment and Climate Change Canada, Benthic invertebrates deltaic ecosystem health macroinvertebrates. http://donnees.ec.gc.ca/data/substances/monitor/benthic-invertebrates-oil-sands-region/deltaic-ecosystem-health-oil-sands-region/. Accessed 1 January 2019.
  • 59. Folmer O., Black M., Hoeh W., Lutz R., Vrijenhoek R., DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol. Mar. Biol. Biotechnol. 3, 294–299 (1994).
  • 60. Porter T. M., Hajibabaei M., Over 2.5 million COI sequences in GenBank and growing. PLoS One 13, e0200177 (2018).
  • 61. Porter T. M., Hajibabaei M., Automated high throughput animal CO1 metabarcode classification. Sci. Rep. 8, 4226 (2018).
  • 62. Royle A. J., Dorazio R. M., Link W. A., Analysis of multinomial models with unknown index using data augmentation. J. Comput. Graph. Stat. 16, 67–85 (2007).
  • 63. Kuo L., Mallick B., Variable selection for regression models. Sankhya. Indian J. Stat. Ser. B 60, 65–81 (1998).
  • 64. Kellner K., jagsUI: A wrapper around ‘rjags’ to streamline ‘JAGS’ analyses. R package Version 1.4.9. https://cran.r-project.org/web/packages/jagsUI/index.html. Accessed 1 January 2019.
  • 65. Gelman A., Rubin D. B., Inference from iterative simulation using multiple sequences. Stat. Sci. 7, 457–511 (1992).
  • 66. Warton D. I., Stoklosa J., Guillera-Arroita G., MacKenzie D. I., Welsh A. H., Graphical diagnostics for occupancy models with imperfect detection. Methods Ecol. Evol. 8, 408–419 (2017).
  • 67. Gibert C., Escarguel G., PER-SIMPER—A new tool for inferring community assembly processes from taxon occurrences. Glob. Ecol. Biogeogr. 28, 374–385 (2019).
  • 68. Wang Y., Naumann U., Eddelbuettel D., Wilshire J., Warton D., mvabund: Statistical methods for analysing multivariate abundance data. R package Version 4.0.1 (2019). https://cran.r-project.org/web/packages/mvabund/index.html. Accessed 1 January 2019.
