PLOS One. 2021 Sep 21;16(9):e0257510. doi: 10.1371/journal.pone.0257510

Environmental DNA gives comparable results to morphology-based indices of macroinvertebrates in a large-scale ecological assessment

Jeanine Brantschen 1,2,*, Rosetta C Blackman 1,2,3, Jean-Claude Walser 4, Florian Altermatt 1,2,3,*
Editor: Hideyuki Doi 5
PMCID: PMC8454941  PMID: 34547039

Abstract

Anthropogenic activities are changing the state of ecosystems worldwide, affecting community composition and often resulting in loss of biodiversity. Rivers are among the most impacted ecosystems. Recording their current state with regular biomonitoring is important to assess the future trajectory of biodiversity. Traditional monitoring methods for ecological assessments are costly and time-intensive. Here, we compared monitoring of macroinvertebrates based on environmental DNA (eDNA) sampling with monitoring based on traditional kick-net sampling to assess biodiversity patterns at 92 river sites covering all major Swiss river catchments. From the kick-net community data, a biotic index (IBCH) based on 145 indicator taxa had been established. The index was matched by the taxonomically annotated eDNA data by using a machine learning approach. Our comparison of diversity patterns only uses the zero-radius Operational Taxonomic Units assigned to the indicator taxa. Overall, we found a strong congruence between both methods for the assessment of the total indicator community composition (gamma diversity). However, when assessing biodiversity at the site level (alpha diversity), the methods were less consistent and gave complementary data on composition. Specifically, environmental DNA retrieved significantly fewer indicator taxa per site than the kick-net approach. Importantly, however, the subsequent ecological classification of rivers based on the detected indicators resulted in similar biotic index scores for the kick-net and the eDNA data that was classified using a random forest approach. The majority of the predictions (72%) from the random forest classification resulted in the same river status categories as the kick-net approach. Thus, environmental DNA validly detected indicator communities and, combined with machine learning, provided reliable classifications of the ecological state of rivers. 
Overall, while environmental DNA gives complementary data on the macroinvertebrate community composition compared to the kick-net approach, the subsequently calculated indices for the ecological classification of river sites are nevertheless directly comparable and consistent.

Introduction

Human activities change natural habitats and thereby inherently affect biodiversity [1]. Freshwater ecosystems are among the most affected and are facing steep biodiversity declines due to anthropogenic pressures [2]. To quantify changes in diversity and community structure of natural communities, monitoring of species is essential [3]. Over decades, ecologists have built an understanding of the responsiveness of certain taxonomic groups to pressures based on assemblages of communities and changes therein [4]. This ecological knowledge of species allows biodiversity patterns to be interpreted in order to assess environmental pressures and impacts on ecosystems. Routine biomonitoring often classifies ecosystem states through biotic indices that succinctly summarize the information of species assemblages, allowing for comparison to reference states or systems [5,6].

The classification of ecological integrity based on biotic indices is generally focused on certain taxonomic groups. In freshwater ecosystems, the most commonly used are fish, macroinvertebrates, macrophytes, and diatoms [7,8]. For these groups, monitoring generally involves the capture, preservation, and morphological identification of specimens in the field or laboratory. This type of monitoring can be costly in terms of time and money, requiring expert taxonomic skills and often missing rare, small, or elusive species [9].

In the last decade, molecular approaches have proven effective for the assessment of distributions of individual species (species-specific approach) and community assemblages (metabarcoding approaches). In these approaches, DNA extracted from an environmental sample (so-called environmental DNA, eDNA) provides information about the possible occurrence and distribution of species [10–13]. Monitoring based on eDNA sampling has been explored for different indicator taxa, including fish (e.g., [14,15]), macroinvertebrates (e.g., [16–18]), macrophytes (e.g., [19,20]), and diatoms (e.g., [21,22]). In several instances, eDNA sampling was shown to complement traditional approaches for the assessment of biological indicators and the ecological state of ecosystems [23–25]. Most comparisons of monitoring indicator taxa have strongly focused on reproducing diversity patterns observed by traditional approaches and have paid less attention to possible challenges and opportunities of eDNA-based metabarcoding [26,27].

As eDNA and kick-net sampling capture fundamentally different units (DNA vs. specimen) [28], the processing of samples and the interpretation of detections cannot be compared one-to-one. The community recovered from an eDNA sample strongly depends on hydrological conditions of the body of water (e.g., transport of DNA) and choices in the downstream processing (e.g., the barcoding regions, primer choice). In contrast, traditional monitoring employs morphologically identifiable indicator organisms, thereby often using a subset of species as a para- or even polyphyletic group (e.g., [29]). For example, the widely used organisms belonging to the “macroinvertebrates” are defined by their size and function, and not by their phylogenetic unity. Targeting the genetic material of these phylogenetically dispersed organisms therefore aims for the coverage of a large portion of metazoans. It is well known that eDNA-based metabarcoding using generic primers (e.g., targeting Cytochrome Oxidase I) often results in limited taxonomic resolution and extensive amplification of non-target groups [30] such as rotifers or other small eukaryotes [31].

Technical biases of the metabarcoding data (e.g., PCR bias, sequencing errors) restrict the comparability to traditional count data [32,33] and limit the implementation in frameworks of existing biotic indices. Novel approaches to fully exploit and interpret molecular data are thus in demand [34,35]. Machine learning has emerged as a promising approach for the analysis of high-dimensional and complex data in the era of big data [36], and recent studies have demonstrated the use of such approaches, for example random forest models [37], in an ecological context as well (e.g., [38–41]). A fundamental difference is the unit of OTUs vs. species used to calculate biotic indices, as not all OTUs are assigned to species. Taxonomy-free approaches accounting for the genetic diversity recovered in sequencing data can inform about features of communities outside of the classical indicator species concept. Such approaches may overcome the limitations of trying to match traditional and novel monitoring methods in a one-to-one manner and help to explore the opportunities, but also differences, between both approaches.

Here, we used eDNA metabarcoding for the detection of macroinvertebrate indicator taxa in a large-scale ecological assessment and assessed to what degree diversity patterns and the respective ecological index converge between methods. Kick-net and eDNA samples collected within the biomonitoring program for Swiss surface waters were used i) to compare community richness and composition estimates based on morphological identification versus eDNA metabarcoding data using the same indicator taxa, and ii) to evaluate how a supervised machine learning approach can be used for the prediction of the ecological state of rivers when also incorporating data on taxonomic groups not considered by the traditional approaches. We specifically focus on highly replicated and representative samplings based on a nationwide monitoring scheme run by the Swiss Federal Office for the Environment, in order to cover all major river systems in Switzerland and to directly provide stakeholder-relevant conclusions.

Methods

Sample collection

The Swiss Federal Office for the Environment (FOEN) carries out routine monitoring of freshwater quality in Switzerland ("Nationale Beobachtung Obergewässerqualität", hereafter: NAWA). The goal of NAWA is to gather long-term reference data of the ecological state of riverine systems. Approximately 100 sites distributed over the major catchments in Switzerland are monitored regularly. The sampling scheme involves physicochemical parameters and the assessment of biological indicators (fish, macroinvertebrates, macrophytes, and diatoms), whereby macroinvertebrates are sampled using a standardized kick-net approach with subsequent morphological identification of species (see also [42]). In 2019, along with standard kick-net sampling, water samples for eDNA analyses were also taken at 92 of these sampling sites and were analyzed amplifying a barcode region for macroinvertebrates (Fig 1). More details of the sampling sites can be found in the supplements (S1 Table in S1 File), also providing information about the scores, the predictions and the classification for each site.

Fig 1. Description of the study design.

A) The taxonomic composition of benthic macroinvertebrates at each sampling site was assessed with two methods: Kick-net and eDNA sampling. Subsequently, the focal index on the biological state (IBCH Index) was calculated from kick-net and eDNA data. B) Map of Switzerland showing the spatial setup of the biomonitoring sampling sites. Sampling sites are given as black points overlaid on the main network of rivers and lakes. Different blue shading highlights major catchments of Switzerland.

In brief: eDNA samples were collected in four replicates before the kick-net sampling. For each sampling site, two filter replicates were taken per riverbank (right and left bank, respectively; total n per site = 4). In these river systems, communities are not expected to differ systematically between the left and right riverbanks. Water was filtered on site (500 mL per filter, 2 L in total per site) using a disposable sterile 60 mL syringe and Sterivex filters with a 0.22 μm pore size (Merck Millipore, Merck KgaA, Darmstadt, Germany). Sterivex filters were then sealed with Luer caps (Merck Millipore, Merck KgaA, Darmstadt, Germany), put in a labeled plastic bag, and placed in a cool box for short-term storage. After transport to the lab, filters were stored at –20°C until further processing.

After the eDNA samples had been placed in the cool box, kick-net samples were collected upstream of the eDNA sampling sites in order to sample undisturbed habitats and to account for downstream transport of DNA, following standard procedures for the sampling of river systems [43]. The traditional sampling of macroinvertebrate groups by contractors followed commonly used kick-net methods [44]. In brief: at each site, the benthic fauna of eight microhabitats was sampled with a kick-net; each microhabitat was sampled for 30 seconds by disturbing the substrate by foot. Coarse organic material, sediment and gravel, and non-target organisms such as fish or amphibians were removed from the sample on site. The remaining sample was preserved with 95% EtOH on site. In the laboratory, macroinvertebrate specimens were sorted and classified into 145 pre-defined taxonomic indicator groups (S2 Table in S1 File). These groups were mostly at the family level, except for Porifera, Bryozoa, and Cnidaria, which were only grouped at the phylum level.

eDNA sample processing

The extraction of the eDNA samples took place in a specialized laboratory following clean lab procedures [45]. The DNA was extracted by using the Qiagen PowerWater Sterivex Extraction Kit (Qiagen, Germany) following the manufacturer’s protocol. Extractions took place in batches of twelve samples. Field filter controls served as negative extraction controls and were extracted randomly among the samples. The extracted DNA was eluted in 100 μl elution buffer and stored at –20°C until further processing.

Library preparation

A 313 bp fragment of the COI marker was amplified from all samples. The primer pair used was mICOIintF and jgHCO2198 [46,47] with a modification to include the Nextera® transposase sequences (S3 Table in S1 File). All samples, negative controls, and positive controls (the latter being a synthetic oligo, see S3 Table in S1 File) were randomized over four 96-well PCR plates. The first PCR was carried out in a total volume of 25 μL containing polymerase AmpliTaq Gold 360° (1.25 U/μL), 0.5 μM of each primer, 1x Buffer I (Thermo Fisher Scientific, MD, USA), BSA (0.1 mg/μL), dNTP (0.2 mM), MgCl2 (1 mM), SigmaFree water, and 2 μL of DNA template. The PCRs were performed on a thermal cycler (Biometra T1 Thermocycler, Analytik Jena GMBH, Ge) using the following touchdown protocol: initial denaturation at 95°C for 10 min, followed by 25 cycles of denaturation at 95°C for 15 s, annealing at 62°C for 30 s, and extension at 72°C for 30 s. The cycler then performed 16 cycles in which the annealing temperature was reduced by one degree per cycle, with the last cycle run at 45°C. Final extension was performed at 72°C for 5 min before the plates were cooled down to 10°C. All samples were tested for amplification success with the AM320 method on the QiAxcel Screening Cartridge (Qiagen, Germany). First-step PCR products were cleaned with the ZR DNA Sequencing clean-up Kit (Zymo Research, USA) following the manufacturer’s protocol, with the minor modification that the elution step was prolonged to 2 min at 4000 g.

The clean amplicons were indexed using the Illumina Nextera XT Index Kit A and D following the manufacturer’s protocol (Illumina, Inc., San Diego, CA, USA). A reaction contained 25 μL 2x KAPA HIFI HotStart ReadyMix (Kapa Biosystems, Inc., USA), 5 μL of each of the Nextera XT Index adaptors, and 15 μL of the DNA template. The second reaction used the following PCR protocol: initial activation at 95°C for 10 min, followed by 8 cycles of denaturation at 95°C for 30 s, annealing at 55°C for 30 s, and extension at 72°C for 30 s, and a final extension at 72°C for 5 minutes. Products were then cooled to 10°C and stored in the fridge at 4°C for downstream application. PCR products were then cleaned using the Thermo MG Magjet bead clean-up kit and a customized program for the KingFisher Flex Purification System (Thermo Fisher Scientific Inc., MA, USA) to remove excessive Nextera XT adaptors. The cleaned product was then eluted in 50 μl in a fresh plate and stored at 4°C.

DNA quantification and normalization

Clean PCR products were quantified using the Qubit BR DNA Assay Kit (Life Technologies, Carlsbad, CA, USA). DNA of samples and the standard dilution series were quantified in replicates on a Spark Multimode Microplate Reader (Tecan, US Inc., USA). Samples were united in a normalization step into equimolar pools, according to their respective concentration using the BRAND Liquid Handling Station (BRAND GMBH + CO KG, Wertheim, GE). Negative extraction and PCR controls were added according to their concentration. The final library was cleaned using SPRI beads (0.8 x) twice. Library concentration was quantified by the Qubit with the HS Assay Kit and amplicon size was verified on the Agilent 4200 TapeStation (Agilent Technologies, Inc., USA). The Nextera XT library prep Kit (Illumina, Inc. San Diego, CA, USA) was used before loading the library onto the flow cell with a 16 pM target concentration and 10% PhiX. Paired-end sequencing using v3 chemistry was performed on an Illumina MiSeq (Illumina, Inc. San Diego, CA, USA) at the Genetic Diversity Center (ETH, Zurich).

Bioinformatic amplicon sequence analysis and quality filtering

The bioinformatics workflow for post-sequencing data processing used the following approach: the data quality of the demultiplexed reads was checked using FastQC [48]. Raw reads were first end-trimmed and merged, and full-length primer sites were removed using usearch (v11.0.667_i86linux64) [43]. The merged and primer-trimmed reads were quality filtered using prinseq-lite (v0.20.4). The UNOISE3 (usearch v10.0.240) [44] method with an additional clustering at 99% identity was applied to obtain error-corrected and chimera-filtered sequence variants (zero-radius OTUs) [45]. The invertebrate mitochondrial genetic code was used to check for stop codons in sequences; only zOTUs (OTUs hereafter) with open reading frames were retained. These OTUs were mapped against a customized COI reference for taxonomic assignments (S6 Information in S1 File). A more detailed workflow report, including parameter settings and documentation of data loss, is provided in the supplements (S7 Information in S1 File).
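The stop-codon check described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (their settings are documented in S7 Information in S1 File). It assumes that sequences are given in forward orientation and that an OTU counts as having an open reading frame when at least one of the three forward frames is free of stop codons. The function names are hypothetical. Under NCBI translation table 5 (invertebrate mitochondrial), only TAA and TAG act as stop codons.

```python
# Stop codons under NCBI translation table 5 (invertebrate mitochondrial).
# In this code TGA encodes tryptophan and AGA/AGG encode serine, so only
# TAA and TAG terminate translation.
INVERT_MT_STOPS = {"TAA", "TAG"}


def has_stop_codon(seq: str, frame: int) -> bool:
    """Return True if reading `seq` from offset `frame` hits a stop codon."""
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in INVERT_MT_STOPS:
            return True
    return False


def has_open_reading_frame(seq: str) -> bool:
    """Return True if at least one forward frame contains no stop codon.

    Assumption: checking the three forward frames is sufficient because the
    amplicons are expected to be in a known orientation.
    """
    return any(not has_stop_codon(seq, frame) for frame in range(3))


def filter_orf(otus: dict) -> dict:
    """Keep only OTUs (name -> sequence) with an open forward reading frame."""
    return {name: seq for name, seq in otus.items()
            if has_open_reading_frame(seq)}
```

Calling `filter_orf` on a dictionary of OTU names and sequences returns the subset that would be retained under these assumptions.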

Statistical analysis

All analyses were performed in R (v3.6.0) [49]. The data were imported using the package (v1.28.0) [50]. In the first step, raw data were filtered based on the detection of OTUs in negative and positive controls. For this, a read threshold for each OTU detected in controls was calculated and subsequently subtracted from every sample. The threshold was based on the number of reads in controls relative to the overall number of reads in all samples for this OTU; thus, for every OTU we calculated a minimum number of reads in a sample to count as a detection, and lower read numbers were removed. Further, we checked the read depth of samples compared to controls. To reduce spurious and stochastically detected OTUs, the filter replicates per site were used to establish stringency filters. With the least stringent filter, an OTU had to be detected in 1 out of the 4 filters to be retained. Increasing stringency kept only OTUs detected in at least 2, 3, or 4 out of 4 filter replicates in the data. Here, we used the data with an OTU detection of 2 out of 4 filters. We tested the sensitivity of our results with respect to the two-out-of-four threshold and found that the results are qualitatively and quantitatively consistent under a less stringent threshold. To compare the detection of indicators from the kick-net and the eDNA data, a taxonomic filter removed non-target OTUs as defined by the monitoring framework (S2 Table in S1 File).
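The replicate-based stringency filter can be sketched in a few lines of Python. This is an illustration only (the original analysis was done in R). It assumes that a replicate counts as a detection when its read count is above zero after the control-based cleaning, and that reads of retained OTUs are pooled per site by summation. The function name and the data layout are hypothetical.

```python
def filter_by_replicates(site_reads, min_replicates=2):
    """Keep OTUs detected in at least `min_replicates` filter replicates.

    `site_reads` maps an OTU id to a list of read counts, one entry per
    filter replicate of a single site (four replicates in the study design).
    A replicate is treated as a detection when its read count is > 0.
    Returns a dict mapping each retained OTU id to its reads summed over
    replicates (summation is an assumption of this sketch).
    """
    kept = {}
    for otu, reads in site_reads.items():
        n_detected = sum(1 for r in reads if r > 0)
        if n_detected >= min_replicates:
            kept[otu] = sum(reads)
    return kept


# Hypothetical read counts for one site with four filter replicates.
example_site = {
    "OTU_1": [120, 0, 35, 0],   # detected in 2 of 4 replicates
    "OTU_2": [0, 0, 7, 0],      # detected in 1 of 4 replicates
    "OTU_3": [10, 12, 9, 14],   # detected in 4 of 4 replicates
}
```

With `min_replicates=2`, the setting used in the text, `OTU_2` is dropped from `example_site`, while raising the value to 4 keeps only `OTU_3`.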

After those filter steps, spatial patterns of indicator group richness based on presence-absence of indicators were plotted using Swissriverplot (v1.28.0) [51]. In a second step, the taxonomic composition was compared using ggalluvial (v0.12.0) [52] and ggplot2 (v3.3.2) [53]. For eDNA metabarcoding data, the proportion of a taxon was calculated as “reads of all OTUs within one indicator group/total number of reads”. For kick-net samples, proportions were calculated as “counts within one indicator group/total number of counts”.
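As a small illustration of the proportion calculation, the following Python sketch applies the same ratio to either reads per OTU (eDNA) or counts per taxon (kick-net). It assumes that the denominator is the total over exactly the units passed in. The function name and the example values are hypothetical.

```python
from collections import defaultdict


def group_proportions(abundance, group_of):
    """Return the proportion of total abundance held by each indicator group.

    `abundance` maps a unit (an OTU for eDNA, a taxon for kick-net data) to
    its reads or specimen counts; `group_of` maps each unit to its indicator
    group. The denominator is the total over all units in `abundance`.
    """
    totals = defaultdict(float)
    for unit, value in abundance.items():
        totals[group_of[unit]] += value
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {group: value / grand_total for group, value in totals.items()}


# Hypothetical eDNA reads per OTU and their indicator groups.
example_reads = {"otu1": 60, "otu2": 20, "otu3": 20}
example_groups = {"otu1": "Diptera", "otu2": "Diptera", "otu3": "Plecoptera"}
```

Here `group_proportions(example_reads, example_groups)` assigns 80 of 100 reads to Diptera and 20 of 100 to Plecoptera.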

Under the framework of the Swiss-wide NAWA biomonitoring, the ecological state of rivers is evaluated based on the calculation of a biotic index from kick-net data [42]. The index accounts for taxa diversity and taxa indicator values and is represented as a numerical score from 0 to 20. The higher the score, the less anthropogenic influence is recognizable at a site. The numerical scores are then grouped into five categories from “bad” to “very good”. For the calculation of the index, specimens captured in kick-net samples had been sorted, identified to phylum or family level (S2 Table in S1 File), and counted. The calculation is based on the diversity and indicator value of the taxa [44]. The eDNA data were also analyzed to predict the biotic index, using a supervised random forest model. In order to run the random forest model, we included only indicator macroinvertebrate groups; we therefore subset the eDNA data to the OTUs assigned to the following phyla: Arthropoda, Cnidaria, Porifera, Bryozoa, Mollusca. The presence-absence data of OTUs were used to train a random forest model based on a taxonomy-free approach [39]. A grid search was performed in caret (v6.0.86) [54] to establish the optimal value for the parameters mtry (n/3), node size (3), and ntrees (500). A random forest model with those optimal parameters was fitted for every sample using ranger (v0.12.1) [55]. Based on a random subset of the samples (mtry = n/3), the random forest classifier was trained using OTUs of indicator taxa as predictive features and the index score calculated from kick-net as the response variable. The classifier subsequently predicted the biotic index of a sample based on the OTU composition. This biotic index was predicted for every sample iteratively in ranger. The relationship between the observed and the predicted score was evaluated based on the adjusted R2 of a linear model and the goodness of fit based on Cohen’s Kappa κ [56].
The level of agreement is described for all values as: κ < 0.05: no agreement, 0.05 < κ < 0.20: very poor, 0.20 < κ < 0.40: poor, 0.40 < κ < 0.55: fair, 0.55 < κ < 0.70: good, 0.70 < κ < 0.85: very good, 0.85 < κ < 0.99: excellent, and κ = 1: perfect.
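For readers who want to reproduce the agreement measure, the following Python sketch computes an unweighted Cohen's kappa for two lists of categories and maps the value onto the verbal scale given above. It makes several assumptions that the text does not state: the unweighted form of kappa is used, the lower bound of each band is treated as inclusive (the text gives strict inequalities), and values between 0.99 and 1 are labelled "excellent". The function names are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items placed in the same category.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from the marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both lists contain one and the same single category
    return (p_observed - p_expected) / (1 - p_expected)


def agreement_level(kappa):
    """Map kappa to the verbal scale (lower bounds inclusive by assumption)."""
    if kappa >= 1.0:
        return "perfect"
    bands = [
        (0.85, "excellent"),
        (0.70, "very good"),
        (0.55, "good"),
        (0.40, "fair"),
        (0.20, "poor"),
        (0.05, "very poor"),
    ]
    for lower, label in bands:
        if kappa >= lower:
            return label
    return "no agreement"
```

In this setting, `rater_a` would hold the kick-net-based status categories and `rater_b` the categories derived from the random forest predictions.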

Results

Summary of raw amplicon sequencing data

Next generation sequencing of eDNA generated 26.64 million reads of which 24.60 million passed the quality filter. After bioinformatic processing, 15.8 million reads were left for downstream analysis. The average sequencing depth per sampling site (4 filter replicates pooled) was 166,827 reads (range: 30,543–660,048), covering in total 7,231 OTUs. After removal of weak samples and cleaning of the raw data based on positive and negative controls, only OTUs detected in at least 2 out of 4 filter replicates per sampling site were retained in the data. This step decreased the mean read depth per sampling site to 147,913 reads (range: 40,891–602,621; Fig 2A) and the mean number of OTUs per sampling site to 835 (range: 118–1698; Fig 2B). In total 4,599 OTUs were retained after this “2 out of 4” filter step, and of those, 205 OTUs were assigned to the 145 taxonomic levels of indicator groups used for the calculation of the biotic index. The read abundance distribution indicates that the indicator OTUs generally have a relatively high read coverage but were also interspersed by non-target OTUs (Fig 2C). In the kick-net sampling, a total of 145 possible indicator taxa were assessed (S2 Table in S1 File), of which 98 were detected in this monitoring campaign.

Fig 2. Filtering of raw sequencing data.

Distribution of A) mean read number per sampling site and B) mean OTU number per sampling site using thresholds based on detection rate in the four field filter replicates. In the violin plots, the black dots indicate A) the mean read number over all sampling sites and B) the mean number of OTUs over all sites, the black vertical lines span the 95%-quantiles of all values. The detection rate is given with increasing stringency: Detection of an OTU in at least 1, 2, 3, or 4 out of 4 filter replicates per site, respectively. C) Read abundance distribution of OTUs (n = 4599) detected in at least 2 out of 4 replicates per site. Read abundances of OTUs that were taxonomically assigned to indicator taxa are highlighted in green.

Local diversity pattern (alpha diversity)

To compare the alpha diversity pattern of macroinvertebrates from kick-net with eDNA sampling, indicator group richness was mapped for all 92 sampling sites in Switzerland (Fig 3). The observed mean richness at a site derived from kick-net was 23 indicator taxa (range: 11–41). Constraining the sequencing data to OTUs assigned to all possible indicator taxa, the eDNA approach led to the detection of 205 OTUs in 21 families (OTUs assigned to family, genus or species levels). A large portion of OTUs was not considered for the downstream analysis, either belonging to non-indicator taxa or lacking taxonomic assignments. The observed mean richness of indicators by eDNA sampling was 9 indicator taxa (range: 2–18, Fig 3). A linear model showed little agreement in the local richness patterns (adj. R2 = 0.026, p = 0.08) detected by the two methods (S5 Fig in S1 File). In addition, eDNA detected significantly fewer indicator taxa at the site level than kick-net sampling (p < 0.001).

Fig 3. Spatial richness pattern.

The taxonomic richness of indicator groups in Swiss rivers at each sampling site based on A) kick-net monitoring and B) eDNA monitoring. For the latter, only macroinvertebrates also considered in the traditional biological assessment are included. The color gradient is adjusted to the respective range of indicator richness values.

Overall diversity pattern (gamma diversity)

The overall composition of indicator taxa (gamma diversity) detected by the two methods corresponded adequately for common groups (i.e., Diptera, Ephemeroptera, Plecoptera, Amphipoda). Abundances of indicators were rendered comparable by using read and count data on a log-transformed proportional scale (Fig 4). Abundant indicator taxa in the kick-net sampling were also identified as common groups derived from eDNA. The most frequently detected indicator taxa were Diptera, Ephemeroptera, Plecoptera, Trichoptera, and Amphipoda. Of those, Diptera and Ephemeroptera ranked equally in kick-net and eDNA data. Proportionally, Trichoptera was more abundant than Plecoptera in kick-net samples, but this ranking was reversed in eDNA data. For less frequently observed groups in the community, their relative rank varied between the two methods (e.g., Cnidaria, Isopoda, Coleoptera, Oligochaeta, Mollusca, or Gastropoda) (S4 Table in S1 File).

Fig 4. Overall diversity pattern.

Proportions of indicator groups detected by kick-net versus eDNA monitoring. The stacked bars indicate proportions of indicator groups inferred by the two methods (proportion of counts for kick-net and of reads for eDNA). For the most common indicator groups, names are given. The flows between the two stacked bars connect families within indicator groups between the methods. A change in flow width indicates a change in proportion depending on the method used.

Inference of the biotic index from eDNA data

The biotic index is calculated based on two components: one is the taxa richness, the other is the occurrence of indicator groups. Kick-net and eDNA sampling methods picked up diverging richness patterns of indicator taxa, but largely showed a similar composition of indicator communities. The realized biotic index score was on average 13.3 (range: 8–17). The random forest model predictions were highly correlated with the biotic index scores observed from kick-net sampling (R2 = 0.61, p < 0.001) (Fig 5).

Fig 5. Biotic index based on bioindicators.

Comparison of the index on the biological state (IBCH index) based on kick-net-derived scores (IBCH index observed) versus the predicted index derived from eDNA data. The predictions are the output from a random forest model deriving IBCH index scores using OTU presence-absence as input. A linear regression model gives the relationship between observed and predicted values (adjusted R2 = 0.61, p-value < 0.001). The colored boxes summarize the numerical index scores ranging from 5 to 20 into categories ranging from “unsatisfactory” to “very good”.

Similar to the kick-net assessment, most predictions of the index were centered on the categories “intermediate” to “good”, and only a few were predicted to fall into the categories “bad” or “very good” (Fig 6A). The majority of the predictions (72%) placed sampling sites in the same ecological state category as the traditional kick-net-based estimates, and the remaining predictions diverged by at most one category (Fig 6B).

Fig 6. Distribution of predicted classifications of the biotic state.

Comparison of the biological state of sampling sites based on kick-net classifications versus random forest predictions. A) The density distributions for the observed (kick-net-based, grey) and the predicted (eDNA-based, green) IBCH index scores. The x-axis indicates the range of the biological index from 5 to 20. B) Barplot showing the percentage of sites that fell in the same (x = 0) or a different (x ≠ 0) category by the random forest predictions based on eDNA data compared to the kick-net-based classifications. The majority of sampling sites were classified in the same category (72%). All other sites (28%) deviated by at most one category.
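The category comparison summarized in Fig 6B can be expressed as a short Python sketch. It takes already-assigned status categories as input, because the score boundaries between categories are not given in the text. The ordering of the five category names from worst to best is an assumption based on the labels used in the text and captions, and the function names are hypothetical.

```python
from collections import Counter

# Status categories ordered from worst to best (ordering assumed; labels
# taken from the text and the figure captions).
CATEGORIES = ["bad", "unsatisfactory", "intermediate", "good", "very good"]


def category_deviation(observed, predicted):
    """Signed deviation in category steps (predicted minus observed)."""
    rank = {name: i for i, name in enumerate(CATEGORIES)}
    return [rank[p] - rank[o] for o, p in zip(observed, predicted)]


def deviation_shares(observed, predicted):
    """Share of sites per deviation value; 0 means same category."""
    deviations = category_deviation(observed, predicted)
    counts = Counter(deviations)
    n = len(deviations)
    return {d: counts[d] / n for d in sorted(counts)}


# Hypothetical classifications for four sites.
example_observed = ["good", "good", "intermediate", "very good"]
example_predicted = ["good", "intermediate", "intermediate", "good"]
```

For these four hypothetical sites, half of the predictions match the observed category and half fall one category lower.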

Discussion

Molecular surveys for traditionally established bioindicators

Despite the demonstrated suitability of eDNA for the survey of species and communities in aquatic ecosystems [10–13], routine implementation of molecular approaches in biomonitoring such as water quality assessments remains scarce. Here, we demonstrated the utility of water eDNA sampling for the assessment of macroinvertebrate-based ecological indicators on a national scale with a comparison to the traditional kick-net approach. Overall, the most common indicator taxa detected by eDNA were equally well covered by kick-net sampling at the national level (gamma diversity). However, local richness patterns (alpha diversity) were less consistent between the two methods, as eDNA samples on average detected fewer indicator taxa at the site level. Nevertheless, the composition of indicator OTUs at a site effectively informed about the ecological state, when using a machine learning algorithm. This study shows that eDNA is a valuable resource for the detection of macroinvertebrate indicator taxa and subsequent calculation of biotic indices, and it can be implemented for the ecological assessment of rivers.

Local and overall diversity patterns

Diversity patterns of macroinvertebrate communities in this study were restricted to the indicator groups as defined by the 145 taxonomic levels in the biotic IBCH index only. Significantly fewer indicator taxa were detected by eDNA sampling on a site level, although the ranking of the indicator taxa based on their overall relative abundance (proportions) in the community was similar for the most common groups. Diversity assessments of macroinvertebrate communities through COI metabarcoding have often reported similar or higher numbers of taxa compared to traditional methods (e.g., [17,18,57]); however, only a proportion of all reads is assigned to macroinvertebrates and even fewer to indicator taxa [58,59]. Arthropods are essential to the biotic assessment based on macroinvertebrates, and were the most dominant group among the OTUs assigned to bioindicators. These were consistently detected by eDNA and kick-net, with Diptera as the most common order, followed by orders with high indicator values, namely Ephemeroptera, Plecoptera, and Trichoptera. However, some target groups were starkly underrepresented by eDNA, namely Hemiptera, Arachnida, and Coleoptera. The ecology of these target species offers a possible explanation: for example, their hydrophobic exoskeletons and lower DNA shedding rates [60] may impair their detection [61].

In river systems, DNA fragments are transported with the water flow [62,63], potentially mixing signals of locally occurring species as well as species only occurring further upstream. Thus, the two methods differ in that the kick-net method is a truly local assessment, while the eDNA approach also integrates information on the communities along the stream [64]. The detection probability of target taxa can thus be influenced by the sampling strategy [65,66], which should generally consider the dendritic network structure of rivers [67]. This difference may explain the partial mismatches in local richness or identity of organisms measured with the eDNA method compared to the kick-net approach, and may result in diversity and composition estimates that are more complementary than directly comparable.

Another factor affecting the comparison between molecular and traditional methods is the taxonomic range of the bioindicators used. The macroinvertebrate indicator groups used to calculate the biotic index are selected for their known responses to stressors and for their size, and they do not form a monophyletic group. To target the metazoans traditionally surveyed, the metabarcoding approach relies on highly degenerate primers [68], which amplify a wide range of non-target DNA fragments from eDNA samples. This unspecific amplification of non-target organisms comes at the cost of target organism sequences [30]. Therefore, for the comparison of diversity with the traditional approach, only the small fraction of eDNA reads that corresponds to the macroinvertebrate indicator taxa of the biotic index is used. In combination with the high diversity detected with degenerate primers, this might hamper the detection of less abundant taxa at a site [16], as many reads are assigned to non-target taxa. As a possible solution, local richness measures may be improved by using more recently developed group-specific primers (e.g., [69]) or a combination of multiple markers for multiple groups [70]. The eDNA metabarcoding would then be targeted more closely at the classical macroinvertebrate indicator taxa, without amplifying non-target taxa and losing read depth. Alternatively, the set of taxa considered could be extended to include the many invertebrates that are amplified but not considered in the kick-net-based indices.

Inference of the biotic index from eDNA data

Although eDNA revealed lower local richness of indicator taxa, the overall community composition was similar for both methods, which is promising for the implementation of eDNA-based biotic indices in water quality assessments. However, reads from a metabarcoding approach cannot be translated into species counts [32,33], and thus we could not directly calculate the IBCH index, which is based on specimen counts, from eDNA reads. As molecular monitoring provides different data than traditional surveys, novel approaches are needed to fully exploit eDNA-derived community assemblages [34,35]. Here, a supervised machine learning algorithm, random forest [71], was used to predict the biotic index, with the composition of indicator OTUs serving as predictive features of the ecological state of a site. In contrast to the one-to-one filtering used for the diversity measures, this data-driven approach does not require taxonomically assigned OTUs as features and therefore allows a more comprehensive list of OTUs to be included. Instead of restricting the input data to OTUs assigned to the taxonomic levels of the biotic index (n = 205 OTUs), the machine learning algorithm included all OTUs belonging to the surveyed macroinvertebrate phyla (n = 693 OTUs). That is, the machine learning approach was based on the OTU-level diversity of the aquatic community at a site. In this way, we were able to include multiple OTUs that had previously been merged at the family level, describe the community at a finer resolution, and extract information not considered by the traditional index calculation. The kick-net-based index calculation includes only organisms that are captured by kick-net sampling, identified to family level at best, and known to respond to stressors, whereas the eDNA survey is not restricted to these groups.
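The difference between the two feature sets (OTUs matched to index taxa versus all OTUs of the surveyed phyla) can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the OTU records, the two lookup sets, and the presence/absence encoding are all hypothetical.

```python
from collections import namedtuple

# Hypothetical OTU annotation record; field names are illustrative only.
Otu = namedtuple("Otu", ["otu_id", "phylum", "family"])

otus = [
    Otu("zotu1", "Arthropoda", "Baetidae"),
    Otu("zotu2", "Arthropoda", "Baetidae"),     # second OTU within the same family
    Otu("zotu3", "Arthropoda", None),           # no family-level assignment
    Otu("zotu4", "Mollusca", "Sphaeriidae"),
    Otu("zotu5", "Rotifera", None),             # phylum assumed not to be surveyed
]

# Assumed stand-ins for the index taxa list and the surveyed phyla.
INDICATOR_FAMILIES = {"Baetidae", "Sphaeriidae"}
SURVEYED_PHYLA = {"Arthropoda", "Mollusca", "Annelida"}


def taxonomy_filtered(otu_list):
    """OTUs that can be matched one-to-one to an indicator taxon."""
    return [o.otu_id for o in otu_list if o.family in INDICATOR_FAMILIES]


def phylum_filtered(otu_list):
    """All OTUs from surveyed phyla, regardless of finer taxonomic assignment."""
    return [o.otu_id for o in otu_list if o.phylum in SURVEYED_PHYLA]


def feature_matrix(read_table, feature_ids):
    """One presence/absence row per site, restricted to the chosen OTUs."""
    return {
        site: [1 if reads.get(otu_id, 0) > 0 else 0 for otu_id in feature_ids]
        for site, reads in read_table.items()
    }


narrow_features = taxonomy_filtered(otus)
broad_features = phylum_filtered(otus)
reads_per_site = {"site_A": {"zotu1": 120, "zotu3": 8}, "site_B": {"zotu4": 35}}
X_broad = feature_matrix(reads_per_site, broad_features)
```

The broader filter keeps OTUs such as `zotu3` that lack a family-level assignment and keeps `zotu1` and `zotu2` as separate features instead of merging them at family level.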

Incorporating the OTU-level information resulted in highly comparable predictions of the ecological status of the individual river sites, despite the mismatch in local richness patterns described above. The majority of predictions (72%) of the state of the river at a site matched the classification from the classic approach, and the two methods never diverged by more than one category. Importantly, a mismatch between the two methods does not necessarily mean that the eDNA-based approach is less accurate, as both approaches are proxies (each with their inherent error) of a true state to be estimated.
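Agreement between two ordinal classifications can be summarized as sketched below. The class codes are hypothetical integers (1 = lowest, 5 = highest status) and the example vectors are toy data, not the study's results. Cohen's kappa [56] is included as one common chance-corrected agreement measure; how it was applied in the study is not assumed here.

```python
def agreement_summary(reference, predicted):
    """Share of identical classes and largest class difference between two methods."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [abs(r - p) for r, p in zip(reference, predicted)]
    return {
        "exact_match": sum(d == 0 for d in diffs) / len(diffs),
        "max_divergence": max(diffs),
    }


def cohens_kappa(reference, predicted):
    """Chance-corrected agreement for nominal classes (Cohen 1960)."""
    n = len(reference)
    labels = set(reference) | set(predicted)
    p_observed = sum(r == p for r, p in zip(reference, predicted)) / n
    p_expected = sum(
        (reference.count(label) / n) * (predicted.count(label) / n)
        for label in labels
    )
    if p_expected == 1:  # both methods assign a single identical class
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ordinal status classes for eight sites.
kicknet_class = [3, 4, 4, 5, 3, 4, 2, 4]
edna_class = [3, 4, 3, 5, 4, 4, 2, 4]

summary = agreement_summary(kicknet_class, edna_class)
kappa = cohens_kappa(kicknet_class, edna_class)
```

For the toy vectors, six of eight sites receive the same class and no site differs by more than one class.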

The random forest model is well suited to high-dimensional data [37], such as the community composition (observed OTUs) at each site. Using the kick-net-based ecological state as the response variable, the model is trained on a random subset of sites. After training, it is applied to infer the ecological state of any site from its community composition, without the need to pre-assign indicator values to the OTUs. This opens up the opportunity to use metabarcoding information more comprehensively and to move beyond the limited range of previously established indicator taxa. Machine learning as a data-driven approach could thus be used to identify sensitive taxa that were previously out of scope for ecological assessments because of limitations of the traditional methods. However, the predictive power of data-driven approaches is limited by the range of the data. In this study, the values of the biotic index for most study sites were centered on the categories “intermediate” and “good”; fewer observations were available to train models for moderate or very good sites, and none of the sites was classified as “poor”. With better coverage of all possible index categories, the predictive power of the machine learning algorithm could increase, as the distribution of averaged predictions from regression trees is narrower than the observed range of values [72]. Furthermore, the inference of causal links with random forest is limited: the data-driven predictions and the resulting classifications do not provide a mechanistic understanding of the biotic index calculation, and the interpretation of misclassifications relies on an ecological understanding of the importance of the input features, i.e., the OTUs, for the prediction. Despite these limitations, machine learning approaches have shown similar applicability for biotic indicator taxa such as diatoms [24] and macroinvertebrates [73], and for taxonomy-free approaches applied to prokaryotic and eukaryotic communities [39]. Overall, supervised machine learning can offer complementary or novel insights for the interpretation of big data in an ecological context, especially in the context of biotic indices when a like-for-like comparison is hindered by methodological differences.
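Why averaged tree predictions span a narrower range than the observed values can be seen in a deliberately simplified sketch. It uses a bagged ensemble of one-split regression trees on presence/absence features; this is not the random forest implementation used in the study, each tree splits on a randomly chosen feature instead of searching for the best split, and the data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(X, y, rng):
    """One-split regression tree on a randomly chosen presence/absence feature."""
    j = rng.randrange(len(X[0]))
    present = [yi for xi, yi in zip(X, y) if xi[j] == 1]
    absent = [yi for xi, yi in zip(X, y) if xi[j] == 0]
    fallback = mean(y)  # used when one side of the split is empty
    return (
        j,
        mean(absent) if absent else fallback,
        mean(present) if present else fallback,
    )


def fit_forest(X, y, n_trees=200, seed=1):
    """Fit each tree on a bootstrap sample of sites (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest


def predict(forest, x):
    """Average the leaf means of all trees for one site."""
    return mean(hi if x[j] == 1 else lo for j, lo, hi in forest)


# Hypothetical data: six sites, three OTU features, kick-net index scores.
X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
y = [15, 12, 9, 6, 13, 8]

forest = fit_forest(X, y)
preds = [predict(forest, x) for x in X]
observed_range = max(y) - min(y)
predicted_range = max(preds) - min(preds)
```

Every leaf value is a mean of observed scores, so the ensemble average cannot fall outside the observed minimum and maximum and is pulled towards the center. Sites at the extremes of the index are therefore predicted less extremely, which is why sparse coverage of the outer categories limits predictive power.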

Conclusion

By comparing eDNA sampling with kick-net sampling on a large scale, we take a crucial step towards the direct application of molecular methods in assessing ecological state in routine monitoring programs. This study shows that eDNA sampling from water can give different estimates of the composition and diversity of macroinvertebrate communities at a local scale than kick-net data. Importantly, however, when the community data are aggregated to the level of the biotic state of river systems, both kick-net and eDNA data yield very comparable classifications of the ecological state. Thus, while the two methods are complementary at the level of biodiversity estimates, they give comparable results at the level of ecological indices. Especially for biomonitoring programs in which data on community composition and diversity are summarized in a biological index, eDNA can thus be a valid method to obtain comparable assessments of ecological integrity.

Supporting information

S1 File. Additional information (S1-S7) supporting the study entitled “Environmental DNA gives comparable results to morphology-based indices of macroinvertebrates in a large-scale ecological assessment”.

(DOCX)

Acknowledgments

We thank Silvia Kobel and Aria Minder for technical advice in the laboratory. We thank the Swiss Federal Office for the Environment (BAFU/FOEN) and all the contractors for logistic support and the provision of the eDNA samples. The data analyzed in this paper were generated in collaboration with the Genetic Diversity Centre (GDC), ETH Zurich. We thank Noriko Uchida and the second, anonymous reviewer for their constructive feedback on our manuscript.

Data Availability

All raw Illumina sequencing data files are available from the European Nucleotide Archive (project number PRJEB44539).

Funding Statement

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Funding for the project (to FA) is provided by the Swiss National Science Foundation Grant No 31003A_173074 and the Swiss Federal Office for the Environment (BAFU/FOEN).

References

  • 1.Dirzo R, Young HS, Galetti M, Ceballos G, Isaac NJB, Collen B. Defaunation in the Anthropocene. Science. 2014;345: 401–406. doi: 10.1126/science.1251817 [DOI] [PubMed] [Google Scholar]
  • 2.Dudgeon D. Multiple threats imperil freshwater biodiversity in the Anthropocene. Current Biology. 2019;29: 960–967. doi: 10.1016/j.cub.2019.08.002 [DOI] [PubMed] [Google Scholar]
  • 3.Brauman KA, Garibaldi LA, Polasky S, Aumeeruddy-Thomas Y, Brancalion PHS, DeClerck F, et al. Global trends in nature’s contributions to people. Proceedings of the National Academy of Sciences. 2020; 202010473. doi: 10.1073/pnas.2010473117 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Karr J. Biological Integrity—A Long-Neglected Aspect of Water-Resource Management. Ecological Applications. 1991;1: 66–84. doi: 10.2307/1941848 [DOI] [PubMed] [Google Scholar]
  • 5.Borja A, Dauer DM, Gremare A. The importance of setting targets and reference conditions in assessing marine ecosystem quality. Ecological Indicators. 2012;12: 1–7. doi: 10.1016/j.ecolind.2011.06.018 [DOI] [Google Scholar]
  • 6.Pawlowski J, Apothéloz-Perret-Gentil L, Altermatt F. Environmental DNA: What’s behind the term? Clarifying the terminology and recommendations for its future use in biomonitoring. Molecular Ecology. 2020;29: 4258–4264. 10.1111/mec.15643 [DOI] [PubMed] [Google Scholar]
  • 7.Barbour M, Gerritsen J, Snyder B, Stribling J. Rapid bioassessment protocols for use in streams and wadeable rivers: Periphyton, benthic invertebrates and fish. Second Edition. U.S. Environmental Protection Agency; Office of Water; Washington, D.C.; 1999. Available: http://www.epa.gov/OWOW/monitoring/techmon.html.
  • 8.Lobo EA, Heinrich CG, Schuch M, Wetzel CE, Ector L. Diatoms as Bioindicators in Rivers. In: Necchi JR O, editor. River Algae. Cham: Springer International Publishing; 2016. pp. 245–271. doi: 10.1007/978-3-319-31984-1_11 [DOI] [Google Scholar]
  • 9.Sweeney BW, Battle JM, Jackson JK, Dapkey T. Can DNA barcodes of stream macroinvertebrates improve descriptions of community structure and water quality? Journal of the North American Benthological Society. 2011;30: 195–216. doi: 10.1899/10-016.1 [DOI] [Google Scholar]
  • 10.Mächler E, Deiner K, Steinmann P, Altermatt F. Utility of environmental DNA for monitoring rare and indicator macroinvertebrate species. Freshwater Science. 2014;33: 1174–1183. doi: 10.1086/678128 [DOI] [Google Scholar]
  • 11.Deiner K, Bik HM, Mächler E, Seymour M, Lacoursière-Roussel A, Altermatt F, et al. Environmental DNA metabarcoding: Transforming how we survey animal and plant communities. Molecular Ecology. 2017;26: 5872–5895. doi: 10.1111/mec.14350 [DOI] [PubMed] [Google Scholar]
  • 12.Altermatt F, Little CJ, Mächler E, Wang S, Zhang X, Blackman RC. Uncovering the complete biodiversity structure in spatial networks: the example of riverine systems. Oikos. 2020;129: 607–618. 10.1111/oik.06806. [DOI] [Google Scholar]
  • 13.Taberlet P, Coissac E, Pompanon F, Brochmann C, Willerslev E. Towards next-generation biodiversity assessment using DNA metabarcoding. Molecular Ecology. 2012;21: 2045–2050. doi: 10.1111/j.1365-294X.2012.05470.x [DOI] [PubMed] [Google Scholar]
  • 14.Aglieri G, Baillie C, Mariani S, Cattano C, Calò A, Turco G, et al. Environmental DNA effectively captures functional diversity of coastal fish communities. Molecular Ecology. 00: 1–13. 10.1111/mec.15661 [DOI] [PubMed] [Google Scholar]
  • 15.Pont D, Valentini A, Rocle M, Maire A, Delaigue O, Jean P, et al. The future of fish-based ecological assessment of European rivers: from traditional EU Water Framework Directive compliant methods to eDNA metabarcoding-based approaches. Journal of Fish Biology. 2019;98: 354–366. 10.1111/jfb.14176 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Carew ME, Pettigrove VJ, Metzeling L, Hoffmann AA. Environmental monitoring using next generation sequencing: rapid identification of macroinvertebrate bioindicator species. Frontiers in Zoology. 2013;10: 45. doi: 10.1186/1742-9994-10-45 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Fernández S, Rodríguez S, Martínez JL, Borrell YJ, Ardura A, García-Vázquez E. Evaluating freshwater macroinvertebrates from eDNA metabarcoding: A river Nalón case study. PLOS ONE. 2018;13: e0201741. doi: 10.1371/journal.pone.0201741 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Uchida N, Kubota K, Aita S, Kazama S. Aquatic insect community structure revealed by eDNA metabarcoding derives indices for environmental assessment. PeerJ. 2020;8: e9176. doi: 10.7717/peerj.9176 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Camargo JA. Responses of aquatic macrophytes to anthropogenic pressures: comparison between macrophyte metrics and indices. Environmental Monitoring and Assessment. 2018;190: 173. doi: 10.1007/s10661-018-6549-y [DOI] [PubMed] [Google Scholar]
  • 20.Ortega A, Geraldi NR, Duarte CM. Environmental DNA identifies marine macrophyte contributions to Blue Carbon sediments. Limnology and Oceanography. 2020;65: 3139–3149. 10.1002/lno.11579. [DOI] [Google Scholar]
  • 21.Apothéloz-Perret-Gentil L, Bouchez A, Cordier T, Cordonier A, Guéguen J, Rimet F, et al. Monitoring the ecological status of rivers with diatom eDNA metabarcoding: A comparison of taxonomic markers and analytical approaches for the inference of a molecular diatom index. Molecular Ecology. 2020;00: 1–10. 10.1111/mec.15646 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Rimet F, Abarca N, Bouchez A, Kusber W-H, Jahn R, Kahlert M, et al. The potential of High-Throughput Sequencing (HTS) of natural samples as a source of primary taxonomic information for reference libraries of diatom barcodes. Fottea. 2018;18: 37–54. doi: 10.5507/fot.2017.013 [DOI] [Google Scholar]
  • 23.Li F, Peng Y, Fang W, Altermatt F, Xie Y, Yang J, et al. Application of Environmental DNA Metabarcoding for Predicting Anthropogenic Pollution in Rivers. Environmental Science & Technology. 2018;52: 11708–11719. doi: 10.1021/acs.est.8b03869 [DOI] [PubMed] [Google Scholar]
  • 24.Pawlowski J, Kelly-Quinn M, Altermatt F, Apothéloz-Perret-Gentil L, Beja P, Boggero A, et al. The future of biotic indices in the ecogenomic era: Integrating (e)DNA metabarcoding in biological assessment of aquatic ecosystems. Science of The Total Environment. 2018;637–638: 1295–1310. doi: 10.1016/j.scitotenv.2018.05.002 [DOI] [PubMed] [Google Scholar]
  • 25.Bush A, Compson ZG, Monk WA, Porter TM, Steeves R, Emilson E, et al. Studying Ecosystems With DNA Metabarcoding: Lessons From Biomonitoring of Aquatic Macroinvertebrates. Frontiers in Ecology and Evolution. 2019;7: 434. doi: 10.3389/fevo.2019.00434 [DOI] [Google Scholar]
  • 26.Keck F, Blackman RC, Bossart R, Brantschen J, Couton M, Hürlemann S, et al. Meta-analysis shows both congruence and complementarity of DNA metabarcoding to traditional methods for biological community assessment. Ecology; 2021Jun. doi: 10.1101/2021.06.29.450286 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Cordier T, Alonso-Sáez L, Apothéloz-Perret-Gentil L, Aylagas E, Bohan DA, Bouchez A, et al. Ecosystems monitoring powered by environmental genomics: A review of current strategies with an implementation roadmap. Mol Ecol. 2021;30: 2937–2958. doi: 10.1111/mec.15472 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Ruppert KM, Kline RJ, Rahman MS. Past, present, and future perspectives of environmental DNA (eDNA) metabarcoding: A systematic review in methods, monitoring, and applications of global eDNA. Global Ecology and Conservation. 2019;17: e00547. doi: 10.1016/j.gecco.2019.e00547 [DOI] [Google Scholar]
  • 29.Carew ME, Miller AD, Hoffmann AA. Phylogenetic signals and ecotoxicological responses: potential implications for aquatic biomonitoring. Ecotoxicology. 2011;20: 595–606. doi: 10.1007/s10646-011-0615-3 [DOI] [PubMed] [Google Scholar]
  • 30.Hajibabaei M, Porter T, Robinson C, Baird D, Shokralla S, Wright M. Watered-down biodiversity? A comparison of metabarcoding results from DNA extracted from matched water and bulk tissue biomonitoring samples. 2019. doi: 10.1371/journal.pone.0225409 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Deiner K, Fronhofer EA, Mächler E, Walser J-C, Altermatt F. Environmental DNA reveals that rivers are conveyer belts of biodiversity information. Nature Communications. 2016;7: 12544. doi: 10.1038/ncomms12544 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Elbrecht V, Leese F. Can DNA-Based Ecosystem Assessments Quantify Species Abundance? Testing Primer Bias and Biomass—Sequence Relationships with an Innovative Metabarcoding Protocol. Hajibabaei M, editor. PLoS ONE. 2015;10: e0130324. doi: 10.1371/journal.pone.0130324 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Piñol J, Senar MA, Symondson WOC. The choice of universal primers and the characteristics of the species mixture determine when DNA metabarcoding can be quantitative. Molecular Ecology. 2019;28: 407–419. 10.1111/mec.14776 [DOI] [PubMed] [Google Scholar]
  • 34.Garcia C. From ecological indicators to ecological functioning: Integrative approaches to seize on ecological, climatic and socio-economic databases. Ecological Indicators. 2019;107: 105612. doi: 10.1016/j.ecolind.2019.105612 [DOI] [Google Scholar]
  • 35.Compson ZG, McClenaghan B, Singer GAC, Fahner NA, Hajibabaei M. Metabarcoding From Microbes to Mammals: Comprehensive Bioassessment on a Global Scale. Frontiers in Ecology and Evolution. 2020;8. doi: 10.3389/fevo.2020.581835 [DOI] [Google Scholar]
  • 36.Jordan MI, Mitchell TM. Machine learning: Trends, perspectives, and prospects. Science. 2015;349: 255–260. doi: 10.1126/science.aaa8415 [DOI] [PubMed] [Google Scholar]
  • 37.Cutler A, Cutler D, Stevens J. Random Forests. Machine Learning—ML. 2011. pp. 157–176. doi: 10.1007/978-1-4419-9326-7_5 [DOI] [Google Scholar]
  • 38.Bohan DA, Vacher C, Tamaddoni-Nezhad A, Raybould A, Dumbrell AJ, Woodward G. Next-Generation Global Biomonitoring: Large-scale, Automated Reconstruction of Ecological Networks. Trends in Ecology & Evolution. 2017;32: 477–487. doi: 10.1016/j.tree.2017.03.001 [DOI] [PubMed] [Google Scholar]
  • 39.Cordier T, Forster D, Dufresne Y, Martins C, Stoeck T, Pawlowski J. Supervised machine learning outperforms taxonomy-based environmental DNA metabarcoding applied to biomonitoring. Molecular Ecology Resources. 2018;18. doi: 10.1111/1755-0998.12694 [DOI] [PubMed] [Google Scholar]
  • 40.Keck F, Vasselon V, Rimet F, Bouchez A, Kahlert M. Boosting DNA metabarcoding for biomonitoring with phylogenetic estimation of operational taxonomic units’ ecological profiles. Molecular Ecology Resources. 2018;18: 1299–1309. 10.1111/1755-0998.12919 [DOI] [PubMed] [Google Scholar]
  • 41.Frühe L, Cordier T, Dully V, Breiner H, Lentendu G, Pawlowski J, et al. Supervised machine learning is superior to indicator value inference in monitoring the environmental impacts of salmon aquaculture using eDNA metabarcodes. Molecular Ecology. 2020;00: 1–19. doi: 10.1111/mec.15434 [DOI] [PubMed] [Google Scholar]
  • 42.FOEN, Bundesamt für Umwelt BAFU; Office fédéral de l’environnement OFEV; Ufficio federale dell’ambiente. NAWA–Nationale Beobachtung Oberflächengewässerqualität. Konzept Fliessgewässer. Umwelt-Wissen. 2013;Nr. 1327: 72 S.
  • 43.Pawlowski J, Apothéloz-Perret-Gentil L, Mächler E, Altermatt F. Environmental DNA applications in biomonitoring and bioassessment of aquatic ecosystems. Guidelines. Federal Office for the Environment, Bern. Environmental Studies. 2020; 71 pp.
  • 44.Stucki P. Methoden zur Untersuchung und Beurteilung der Fliessgewässer: Makrozoobenthos Stufe F. Bundesamt für Umwelt, Bern Umwelt-Vollzug. 2010;Umwelt-Vollzug Nr. 1026: 61 S.
  • 45.Deiner K, Walser J-C, Mächler E, Altermatt F. Choice of capture and extraction methods affect detection of freshwater biodiversity from environmental DNA. Biological Conservation. 2015;183: 53–63. doi: 10.1016/j.biocon.2014.11.018 [DOI] [Google Scholar]
  • 46.Leray M, Yang JY, Meyer CP, Mills SC, Agudelo N, Ranwez V, et al. A new versatile primer set targeting a short fragment of the mitochondrial COI region for metabarcoding metazoan diversity: application for characterizing coral reef fish gut contents. Frontiers in Zoology. 2013;10: 34. doi: 10.1186/1742-9994-10-34 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Geller J, Meyer C, Parker M, Hawk H. Redesign of PCR primers for mitochondrial cytochrome c oxidase subunit I for marine invertebrates and application in all-taxa biotic surveys. Molecular Ecology Resources. 2013;13: 851–861. doi: 10.1111/1755-0998.12138 [DOI] [PubMed] [Google Scholar]
  • 48.Andrews S, Krueger F, Segonds-Pichon A, Biggins L, Krueger C, Wingett S. FastQC A Quality Control tool for High Throughput Sequence Data. Babraham Bioinformatics. 2012. Available: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/. [Google Scholar]
  • 49.R Core Team. R: A language and environment for statistical computing. [cited 10 Mar 2021]. Available: https://www.R-project.org/.
  • 50.McMurdie PJ, Holmes S. phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data. PLoS ONE. 2013;8: e61217. doi: 10.1371/journal.pone.0061217 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Alther R, Altermatt F. SwissRiverPlot: Package to plot the Swiss river network in a customizable way. https://github.com/romanalther. 2020;R package version 0.2–14.
  • 52.Brunson CJ. ggalluvial: Alluvial Plots in ggplot2. In: R package ggalluvial. 2020. Available: https://cran.r-project.org/web/packages/ggalluvial/vignettes/ggalluvial.html.
  • 53.Wickham H, Chang W, Henry L, Pedersen TL, Takahashi K, Wilke C, et al. ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. 2020. Available: https://CRAN.R-project.org/package=ggplot2. [Google Scholar]
  • 54.Kuhn M, Wing J, Weston S, Williams A, Keefer C, Engelhardt A, et al. caret: Classification and Regression Training. 2020. Available: https://CRAN.R-project.org/package=caret. [Google Scholar]
  • 55.Wright MN, Ziegler A. ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software. 2017;77. doi: 10.18637/jss.v077.i01 [DOI] [Google Scholar]
  • 56.Cohen J. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement. 1960;20: 37–46. doi: 10.1177/001316446002000104 [DOI] [Google Scholar]
  • 57.Macher J-N, Vivancos A, Piggott JJ, Centeno FC, Matthaei CD, Leese F. Comparison of environmental DNA and bulk-sample metabarcoding using highly degenerate cytochrome c oxidase I primers. Molecular Ecology Resources. 2018;18: 1456–1468. doi: 10.1111/1755-0998.12940 [DOI] [PubMed] [Google Scholar]
  • 58.Mächler E, Little CJ, Wüthrich R, Alther R, Fronhofer EA, Gounand I, et al. Assessing different components of diversity across a river network using eDNA. Environmental DNA. 2019;1: 290–301. 10.1002/edn3.33. [DOI] [Google Scholar]
  • 59.Leese F, Sander M, Buchner D, Elbrecht V, Haase P, Zizka VMA. Improved freshwater macroinvertebrate detection from environmental DNA through minimized nontarget amplification. Environmental DNA. 2021;3: 261–276. doi: 10.1002/edn3.177 [DOI] [Google Scholar]
  • 60.Gleason JE, Elbrecht V, Braukmann TWA, Hanner RH, Cottenie K. Assessment of stream macroinvertebrate communities with eDNA is not congruent with tissue-based metabarcoding. Molecular Ecology. 2020;00: 1–13. doi: 10.1111/mec.15328 [DOI] [PubMed] [Google Scholar]
  • 61.Allan EA, Zhang WG, Lavery AC, Govindarajan AF. Environmental DNA shedding and decay rates from diverse animal forms and thermal regimes. Environmental DNA. 2021;3: 492–514. 10.1002/edn3.141. [DOI] [Google Scholar]
  • 62.Deiner K, Altermatt F. Transport Distance of Invertebrate Environmental DNA in a Natural River. PLOS ONE. 2014;9: e88786. doi: 10.1371/journal.pone.0088786 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Pont D, Rocle M, Valentini A, Civade R, Jean P, Maire A, et al. Environmental DNA reveals quantitative patterns of fish biodiversity in large rivers despite its downstream transportation. Sci Rep. 2018;8: 10361. doi: 10.1038/s41598-018-28424-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Carraro L, Mächler E, Wüthrich R, Altermatt F. Environmental DNA allows upscaling spatial patterns of biodiversity in freshwater ecosystems. Nature Communications. 2020;11: 3585. doi: 10.1038/s41467-020-17337-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Dickie IA, Boyer S, Buckley HL, Duncan RP, Gardner PP, Hogg ID, et al. Towards robust and repeatable sampling methods in eDNA-based studies. Molecular Ecology Resources. 2018;18: 940–952. 10.1111/1755-0998.12907 [DOI] [PubMed] [Google Scholar]
  • 66.Carraro L, Stauffer JB, Altermatt F. How to design optimal eDNA sampling strategies for biomonitoring in river networks. Environmental DNA. 2021;3: 157–172. doi: 10.1002/edn3.137 [DOI] [Google Scholar]
  • 67.Altermatt F. Diversity in riverine metacommunities: a network perspective. Aquatic Ecology. 2013;47: 365–377. doi: 10.1007/s10452-013-9450-3 [DOI] [Google Scholar]
  • 68.Elbrecht V, Leese F. Validation and Development of COI Metabarcoding Primers for Freshwater Macroinvertebrate Bioassessment. Frontiers in Environmental Science. 2017;5: 11. doi: 10.3389/fenvs.2017.00011 [DOI] [Google Scholar]
  • 69.Meyer A, Boyer F, Valentini A, Bonin A, Ficetola GF, Beisel J-N, et al. Morphological vs. DNA metabarcoding approaches for the evaluation of stream ecological status with benthic invertebrates: Testing different combinations of markers and strategies of data filtering. Molecular Ecology. 2020;00: 1–18. 10.1111/mec.15723 [DOI] [PubMed] [Google Scholar]
  • 70.Hajibabaei M, Porter TM, Wright M, Rudar J. COI metabarcoding primer choice affects richness and recovery of indicator taxa in freshwater systems. PLOS ONE. 2019;14: e0220953. doi: 10.1371/journal.pone.0220953 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Breiman L. Random Forests. Machine Learning. 2001;45: 5–32. doi: 10.1023/A:1010933404324 [DOI] [Google Scholar]
  • 72.Zhang X. Environmental DNA Shaping A New Era of Ecotoxicological Research. Environmental Science & Technology. 2019. [cited 29 Apr 2019]. doi: 10.1021/acs.est.8b06631 [DOI] [PubMed] [Google Scholar]
  • 73.Fan J, Wang S, Li H, Yan Z, Zhang Y, Zheng X, et al. Modeling the ecological status response of rivers to multiple stressors using machine learning: A comparison of environmental DNA metabarcoding and morphological data. Water Research. 2020;183: 116004. doi: 10.1016/j.watres.2020.116004 [DOI] [PubMed] [Google Scholar]

Decision Letter 0

Hideyuki Doi

27 Jul 2021

PONE-D-21-15608

Environmental DNA is comparable to morphology-based indices of macroinvertebrates in a large-scale ecological assessment

PLOS ONE

Dear Dr. Brantschen,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

I got the recommendations and comments from two expert reviewers on the field. The both reviewer agree that the manuscript is technically sound and the data support the conclusions. However, lack of  home message were suggested by the reviewer and many of minor points, and I totally share their comments. Therefore, I can invite you to submit a revised version of the manuscript that addresses the points raised by the reviewers.

==============================

Please submit your revised manuscript by Sep 10 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Hideyuki Doi

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match.

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

3. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“We thank Silvia Kobel and Aria Minder for technical advice in the laboratory. We thank the Swiss Federal Office for the Environment (BAFU/FOEN) and all the contractors for logistic support and the provision of the eDNA samples. The data analyzed in this paper were generated in collaboration with the Genetic Diversity Centre (GDC), ETH Zurich. Funding for the project (to FA) is from the Swiss National Science Foundation Grant No 31003A_173074 and the Swiss Federal Office for the Environment (BAFU/FOEN).”

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

Additional Editor Comments (if provided):

I received recommendations and comments from two expert reviewers in the field. Both reviewers agree that the manuscript is technically sound and that the data support the conclusions. However, the reviewers noted the lack of a clear take-home message and raised a number of minor points, and I fully share their comments. Therefore, I invite you to submit a revised version of the manuscript that addresses the points raised by the reviewers.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I am pleased to provide this review of the manuscript “Environmental DNA is comparable to morphology-based indices of macroinvertebrates in a large-scale ecological assessment” by Brantschen et al. The manuscript describes a macroinvertebrate survey using eDNA and traditional kick-sampling methods, characterizes each data set, and evaluates river health indices based on each data set, complemented by machine learning.

In particular, the manuscript concisely argues that eDNA and traditional methods are each a proxy of the true ecosystem status.

The manuscript is well structured, and the main statements are therefore clear. Experimental methods and results are objective. All figures are beautiful and easy to understand. I raise several questions and comments out of interest, not criticism. My main suggestion is to describe the characteristics of the sites where the IBCH index shows a category divergence between eDNA and traditional methods. If your team considers the content of my comments worthwhile, please add it to the manuscript.

Reviewer #2: In the article “Environmental DNA is comparable to morphology-based indices of macroinvertebrates in a large-scale ecological assessment”, the authors have introduced a machine learning approach in order to calculate water quality indices based on macroinvertebrate composition as bioindicators. I am pleased to see that the approach is comparable to the traditionally used method. However, I found the take-home message a bit diluted throughout the manuscript and would like to propose some changes that I hope will help to improve the current version.

Abstract: I would work a little more on the abstract; as it stands, I don't think it emphasizes enough what has been done. For example, the machine learning approach is introduced at the end of the last paragraph, and to me it seems like an additional step rather than part of the methods. It is also quite contradictory that it is stated first that eDNA found fewer indicator taxa but then that the indices were congruent. It is a bit difficult to follow if you don't read the manuscript.

Lines 74-75 This is a very good point, maybe more emphasis on this in the introduction section?

Line 93 Effectivity instead of effective.

Line 101 I miss references giving examples here; most of this sounds a bit vague.

Line 105 Very true, is there a reference to include here or just a personal statement?

Line 132 The predefined taxonomic groups: are not easy to understand.

Line 161 A diagram or figure summarizing sampling details would be really helpful, it can be supplementary or part of the Figure 1.

Line 165 State here how the 2L were taken: were 2L sampling site and n=4, then 500mL per sample/filter?

Line 169 Briefly explain here why upstream, or use a reference.

Line 218 Quantified instead of measured?

Line 246 How was this done?

Lines 247-248 What do you mean by low amplification?

Lines 258-259 I am not in favor of using read counts.

Line 303 Why do you use this primer set instead of the one developed by Elbrecht?

Line 320 I wouldn't call this a weak correlation at all.

Line 400-401 The comparison of abundances makes no sense to me.

Lines 419-420 I wouldn't say this is the explanation, as you are finding fewer indicator groups when using eDNA.

Lines 432-433 Could you use those non-targeted? Are they relevant? Flag species for example?

Line 444 I am confused about read counts, why did you calculate alpha and gamma div using relative abundances of reads and then not for the indices? I am not sure those diversities are giving relevant information when calculating using eDNA.

Line 466 So, if the machine learning approach is not employed, are the indices comparable?

Line 496 Is it worthwhile to treat both methods as complementary? Your reasoning throughout the manuscript is that the eDNA approach is comparable; why, then, complementary?

I find it contradictory to state that a one-to-one comparison is impossible when direct comparisons are then made with the diversity measures.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Noriko Uchida

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PONE-D-21-15608_reviewer.pdf

PLoS One. 2021 Sep 21;16(9):e0257510. doi: 10.1371/journal.pone.0257510.r002

Author response to Decision Letter 0


1 Sep 2021

Jeanine Brantschen

Überlandstr. 133

Department of Aquatic Ecology

Eawag

jeanine.brantschen@eawag.ch

27th of August 2021

Response letter for our manuscript

Dear Prof. Dr. Hide Doi

Firstly, we would like to thank you for your consideration and positive evaluation of our manuscript entitled “Environmental DNA gives comparable results to morphology-based indices of macroinvertebrates in a large-scale ecological assessment ”.

The two expert reviewers gave very constructive feedback. We carefully considered and implemented their comments, which increased the reproducibility and the effective communication of our scientific findings.

Please find below a copy of every comment made by the reviewers and our detailed responses; the line numbers refer to the revised manuscript. We have also added the revised manuscript and a version with all changes highlighted in yellow. We are confident that our changes have strengthened the manuscript.

On behalf of all authors, I would like to thank you for your time and for considering our study for publication in PLOS ONE. We look forward to hearing your decision.

Yours sincerely,

Jeanine Brantschen, on behalf of all authors

PONE-D-21-15608

Environmental DNA is comparable to morphology-based indices of macroinvertebrates in a large-scale ecological assessment

PLOS ONE

Dear Dr. Brantschen,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Response: Thank you for your positive and highly constructive evaluation and for the offer to submit a carefully revised version. We have addressed all of your and the reviewers' comments and are happy to submit herewith a fully revised version of the manuscript.

I received recommendations and comments from two expert reviewers in the field. Both reviewers agree that the manuscript is technically sound and that the data support the conclusions. However, the reviewers noted the lack of a clear take-home message and raised a number of minor points, and I fully share their comments. Therefore, I invite you to submit a revised version of the manuscript that addresses the points raised by the reviewers.

Response: We thank both reviewers for their careful and positive evaluation of our manuscript. We are especially happy to hear that both agree on the soundness and conclusions of our manuscript. We have integrated their helpful comments to increase especially reproducibility and clarity with respect to the generalization of our study.

===========================================================================

Please submit your revised manuscript by Sep 10 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

• A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

• A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

• An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

Response: We uploaded all the required documents, named 'Response to Reviewers', 'Revised Manuscript with Track Changes', and 'Manuscript'. Additionally, we expanded the supporting information file and therefore added the files 'Revised Supporting Information with Track Changes' and 'Supporting Information'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

Response: We respectfully decided not to upload our protocols, as all methods are described in detail in the method sections. The raw data are accessible on the open access repository of the European Nucleotide Archive. Furthermore, we happily provide protocols upon request to the corresponding authors.

We look forward to receiving your revised manuscript.

Kind regards,

Hideyuki Doi

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Response: Thank you for pointing out the style requirements. We adjusted the format of the manuscript to meet the PLOS ONE template and included information on the ‘Supporting information’, including the captions of the supplements, after the reference section. We also edited the title page, namely the authors and their affiliations, and removed the keywords and ORCID iDs to match the journal's format requirements.

2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match.

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

Response: We added the necessary information to the updated ‘Funding Information’ section further down to make sure that the ‘Financial Disclosure’ matches the ‘Funding Information’.

3. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“We thank Silvia Kobel and Aria Minder for technical advice in the laboratory. We thank the Swiss Federal Office for the Environment (BAFU/FOEN) and all the contractors for logistic support and the provision of the eDNA samples. The data analyzed in this paper were generated in collaboration with the Genetic Diversity Centre (GDC), ETH Zurich. Funding for the project (to FA) is from the Swiss National Science Foundation Grant No 31003A_173074 and the Swiss Federal Office for the Environment (BAFU/FOEN).”

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

Response: We appreciate the comment and the updating of the online submission form on our behalf. We removed all the funding information from the acknowledgments section. Here, we provide an updated version of the Funding Statement, which reads as follows: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Funding for the project (to FA) is provided by the Swiss National Science Foundation Grant No 31003A_173074 and the Swiss Federal Office for the Environment (BAFU/FOEN).”

Additional Editor Comments (if provided):

I received recommendations and comments from two expert reviewers in the field. Both reviewers agree that the manuscript is technically sound and that the data support the conclusions. However, the reviewers noted the lack of a clear take-home message and raised a number of minor points, and I fully share their comments. Therefore, I invite you to submit a revised version of the manuscript that addresses the points raised by the reviewers.

Response: Thank you for your positive evaluation of our manuscript and for the invitation for resubmission. We clarified and expanded the statements regarding the main take-home messages, and added also a statement about the general importance of our scientific findings in the abstract, the discussion and the conclusion section. We thank the editor and the reviewer for this comment; the alterations improved the communication of the relevance of our study. All further points were considered as detailed in the sections below.  

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I am pleased to provide this review of the manuscript “Environmental DNA is comparable to morphology-based indices of macroinvertebrates in a large-scale ecological assessment” by Brantschen et al. The manuscript describes a macroinvertebrate survey using eDNA and traditional kick-sampling methods, characterizes each data set, and evaluates river health indices based on each data set, complemented by machine learning. In particular, the manuscript concisely argues that eDNA and traditional methods are each a proxy of the true ecosystem status.

The manuscript is well structured, and the main statements are therefore clear. Experimental methods and results are objective. All figures are beautiful and easy to understand. I raise several questions and comments out of interest, not criticism. My main suggestion is to describe the characteristics of the sites where the IBCH index shows a category divergence between eDNA and traditional methods. If your team considers the content of my comments worthwhile, please add it to the manuscript.

Response: We would like to thank reviewer #1 for the positive feedback and the constructive comments on our manuscript. We are especially happy to hear that our figures and results are appreciated from a visual and scientific perspective. All suggestions given are very helpful, and we addressed them. Specifically, we implemented responses to the questions in our manuscript and provided the suggested characteristics of all sampled sites in the form of a table in the supplementary information. We added the text “More details of the sampling sites can be found in the supplements (Supporting information S1 Table).”

Comment Line 162: Were these sampling sites where the community would be expected to be different on the right and left banks?

Response: In the sampled river systems, we do not expect the communities within a single site to differ between the right and left bank because these rivers and streams are found in landscapes that have very similar land uses on both sides. We added this information to line 147, saying: “In these river systems, communities are not expected to systematically differ between the left and right river banks.”

Comment Line 295: How would you support that using two of four is a reasonable strategy?

Response: We thank the reviewer for this comment. Indeed, which thresholds to use for quality filtering of metabarcoding data, and how to apply them, is a matter of ongoing discussion (see also Mächler et al., 2021, Molecular Ecology). Often, a quality-filtering step includes the removal of singletons or the definition of a certain read threshold (see, for example, Blackman et al., 2020, Scientific Reports; Leese et al., 2021, Environmental DNA).

Here, we decided to make use of the true replication and to include only records that were detected in at least two independent filter replicates. This decision was based on preliminary analyses, and we added this information. However, based on your comment, we repeated the analysis with a less stringent (1 out of 4) threshold. The results are both qualitatively and quantitatively very consistent (see additional figure 1 below or in the submitted response document), and the comparison with the kick-net data fits well (adj. R2 for the 1 out of 4 threshold: 0.56; adj. R2 for the 2 out of 4 threshold: 0.61). This high consistency is a strong indication of the robustness of our approach and justifies the strategy.

We added a sentence to line 241 to include this: “We tested the sensitivity of our results with respect to the 2 out of 4 threshold, and found that the results are qualitatively and quantitatively consistent with a less stringent threshold.”
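For illustration only, the replicate threshold discussed in this response can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the pipeline used in the study: the function name `filter_by_replicates`, the OTU labels, and the example data are hypothetical.

```python
from typing import Dict, List, Set


def filter_by_replicates(site_replicates: List[Set[str]],
                         min_replicates: int = 2) -> Set[str]:
    """Keep the OTUs detected in at least `min_replicates` filter replicates."""
    counts: Dict[str, int] = {}
    for replicate in site_replicates:
        for otu in replicate:
            counts[otu] = counts.get(otu, 0) + 1
    return {otu for otu, n in counts.items() if n >= min_replicates}


# Hypothetical site with four filter replicates (made-up OTU labels).
replicates = [
    {"otu_a", "otu_b"},
    {"otu_a"},
    {"otu_a", "otu_c"},
    {"otu_b"},
]

strict = filter_by_replicates(replicates, min_replicates=2)   # 2 out of 4
lenient = filter_by_replicates(replicates, min_replicates=1)  # 1 out of 4
```

In this made-up example, the stricter threshold drops the OTU that occurs in a single replicate, while the lenient threshold keeps every detection.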

Additional Figure 1: Biotic index based on bioindicators. Comparison of the index on the biological state (IBCH index) based on kick-net-derived scores (IBCH index observed) versus the predicted index derived from eDNA data. Here, the eDNA data used the presence-absence data of all OTUs present in at least 1 out of 4 replicate filters per site (as opposed to 2 out of 4 in the main manuscript). The linear regression model gives the relationship between observed and predicted values (adj. R2 = 0.5629, p-value < 0.001).
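As a reading aid for the adjusted R2 values quoted in this response, the following minimal Python sketch computes the adjusted R2 of a simple linear regression with one predictor, using the standard formula 1 - (1 - R2)(n - 1)/(n - 2). It is not the authors' analysis code: the function name `adjusted_r_squared` and the example numbers are hypothetical, and the choice of which variable is regressed on which is an assumption (for a single predictor, R2 is the same either way).

```python
from typing import Sequence


def adjusted_r_squared(observed: Sequence[float],
                       predicted: Sequence[float]) -> float:
    """Adjusted R2 of a simple linear regression of `predicted` on `observed`."""
    n = len(observed)
    if n != len(predicted) or n < 3:
        raise ValueError("need two equally long sequences with at least 3 points")
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    ss_tot = sum((y - mean_y) ** 2 for y in predicted)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("both variables must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(observed, predicted))
    r2 = 1.0 - ss_res / ss_tot
    # One predictor, so the residual degrees of freedom are n - 2.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - 2)


# Made-up index scores, not data from the study.
example = adjusted_r_squared([1, 2, 3, 4], [2, 4, 5, 8])  # approximately 0.944
```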

Comment Line 304: Fig 3. Describe what the vertical lines and dots in violin plots are.

Response: We have added the following text to the Figure legend in line 297: “In the violin plots, the black points give A) the mean read number of all sampling sites and B) the mean number of OTUs over all sites; the black vertical lines indicate the 95%-quantiles of all values.”

Comment Line 321: Would the results be the same if taxa detected in one of four replicates were included? Since the replication in this manuscript is not very robust (two from the right and two from the left bank), it is possible that the indicator taxa are only included in one sample if the right and left banks are heterogeneous environments.

Response: We appreciate the reviewer’s feedback on our replication. As said above, the river systems at hand are intermediate to large in size and anthropogenically impacted (homogenised), and both riverbank sides are generally highly similar. We have now added a sentence stating that these systems generally have homogeneous left and right riverbank sides. Thus, we expect that sampling both riverbanks and combining these data will result in a representative coverage of the community. However, based on your comment, we also repeated the analysis and included all data (i.e., also using the 1 out of 4 threshold). The results of this additional analysis for the random forest classification are very similar when using a threshold of 1 out of 4 (adj. R2 = 0.56 versus adj. R2 = 0.61 for the 2 out of 4 threshold; see also the response above).

Comment Line 367: What are the characteristics of sampling sites that mismatch the results of the ecological state evaluation of traditional sampling method?

Response: This is indeed an intriguing question. We addressed this comment by adding a table with the characteristics of all sites, highlighting the sites that had a mismatch in the random forest classification. There were, however, no obvious environmental parameters associated with those sites. Thus, we cannot say whether these mismatches are simply sampling artifacts, reflect temporal changes, or are linked to further unmeasured factors. We added the text “More details of the sampling sites can be found in the supplements (S1 Table), also providing information about the scores, the predictions and the classification for each site.”

Reviewer #2: In the article “Environmental DNA is comparable to morphology-based indices of macroinvertebrates in a large-scale ecological assessment”, the authors have introduced a machine learning approach in order to calculate water quality indices based on macroinvertebrate composition as bioindicators. I am pleased to see that the approach is comparable to the traditionally used method. However, I found the take-home message a bit diluted throughout the manuscript and would like to propose some changes that I hope will help to improve the current version.

Abstract: I would work a little more on the abstract; as it stands, I don't think it emphasizes enough what has been done. For example, the machine learning approach is introduced at the end of the last paragraph, and to me it seems like an additional step rather than part of the methods. It is also quite contradictory that it is stated first that eDNA found fewer indicator taxa but then that the indices were congruent. It is a bit difficult to follow if you don't read the manuscript.

Response: We appreciate the evaluation of reviewer #2 and would like to thank them for the constructive comments on our manuscript. We changed the structure of the abstract to emphasize the use of the random forest model and provided concise conclusions about the relevance of our study for the field. We are especially thankful for your point about contradictory statements; we strengthened the take-home messages, improved the readability, and put our study in a broader context.

Lines 74-75 This is a very good point, maybe more emphasis on this in the introduction section?

Response: We appreciate the interest in our approach and we addressed this in the introduction in a section previously (lines 98-101), and we expanded on this point by adding the following sentences to the manuscript (lines 101-105): „A fundamental difference is the unit of OTUs vs. species used to calculate biotic indices, as not all OTUs are assigned to species. Taxonomy-free approaches using the genetic diversity covered by the sequencing could inform about important features of communities outside of the classical species concept”.

Line 93 Effectivity instead of effective.

Response: We thank the reviewer for this comment and we changed the sentence to „In the last decade, molecular approaches have proven effectivity for the…“ (line 69).

Line 101 I miss references giving examples here; most of this sounds a bit vague.

Response: To support our statement, we added a reference to the section (line 80). This statement is further supported by the examples for individual indicator taxa in lines 74-76.

Line 105 Very true, is there a reference to include here or just a personal statement?

Response: We appreciate the feedback and referenced our statement (line 82).

Line 132 The predefined taxonomic groups: are not easy to understand.

Response: We appreciate the reviewer pointing out the wordy description, and we changed the expression to “indicator taxa” in line 133.

Line 161 A diagram or figure summarizing sampling details would be really helpful, it can be supplementary or part of the Figure 1.

Response: To give more sampling details, we added information about the sampling scheme in the form of a table, which can be found in the supplementary information (S1 Table).

Line 165 State here how the 2L were taken: were 2L sampling site and n=4, then 500mL per sample/filter?

Response: We happily clarified this point by adding the following information to the text: „(500 mL per filter, 2L per site)” in line 149.

Line 169 Briefly explain here why upstream, or use a reference.

Response: We added information to the phrase as an explanation to line 155: „in order to sample undisturbed habitats and to account for downstream transport of DNA.”

Line 218 Quantified instead of measured?

Response: We changed the phrase to „measured“.

Line 246: How was this done?

Response: We added the following sentence to line 233 to clarify: „The threshold was based on the number of reads in controls versus the overall number of reads in all samples for this species, so for every OTU we calculated a minimum number of reads in a sample to count as a detection, lower read numbers were removed.”

Lines 247-248 What do you mean by low amplification?

Response: We clarified this sentence in line 236 by rephrasing to “Further, we checked the read depth of samples compared to the controls.”

Lines 258-259 I am not in favor of using read counts

Response: We agree that read counts need to be used cautiously. We are aware that there is an ongoing debate about how to use them, and it is not our goal to take a strong position in this discussion. We thus state this in the discussion section, lines 436-438:

„However, reads from a metabarcoding approach cannot be translated into species counts [29,30], and thus we could not directly calculate the IBCH index based on specimen counts for eDNA reads.”

Further, in the analysis of gamma diversity, we use proportions of reads as a proxy of the detectability of the indicators with the eDNA method and we do not use reads for individual species abundances. For the further analysis of alpha diversity of indicator groups and the calculation of the ecological indices, we used presence-absence data, especially as we also agree with you on the caution needed with read numbers from metabarcoding approaches.
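The two uses of read data described above can be sketched as follows. This is a minimal Python illustration rather than the analysis code of the study: the function names are hypothetical and the read counts are made up.

```python
def read_proportions(reads_by_group):
    """Share of all indicator reads that falls to each indicator group."""
    total = sum(reads_by_group.values())
    if total == 0:
        return {group: 0.0 for group in reads_by_group}
    return {group: reads / total for group, reads in reads_by_group.items()}


def presence_absence(reads_by_site):
    """Convert per-site read counts into 1 (detected) or 0 (not detected)."""
    return {
        site: {taxon: int(reads > 0) for taxon, reads in taxa.items()}
        for site, taxa in reads_by_site.items()
    }


# Illustrative values only, not data from the study.
pooled_reads = {"Baetidae": 600, "Gammaridae": 300, "Heptageniidae": 100}
proportions = read_proportions(pooled_reads)

site_reads = {
    "site_A": {"Baetidae": 550, "Gammaridae": 0, "Heptageniidae": 100},
    "site_B": {"Baetidae": 50, "Gammaridae": 300, "Heptageniidae": 0},
}
detections = presence_absence(site_reads)

# Site-level richness counts detected families and ignores read numbers.
richness = {site: sum(taxa.values()) for site, taxa in detections.items()}
```

Proportions are computed only on the pooled data set, whereas the site-level richness uses detections alone, so a family with 50 reads counts the same as one with 550.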

Line 303 Why do you use this primer set instead of the one developed by Elbrecht?

Response: The primers we use here target the same barcode region as the suggested primers by Elbrecht and were developed by Leray (2013) and Geller (2013). These primers are commonly used in eDNA metabarcoding to describe macroinvertebrates (see, for example, Cahill et al., 2018, Ecology & Evolution; Harper et al., 2021, Molecular Ecology; Nguyen et al., 2020, Scientific Reports), and we have used them in previous projects. We thus built on this knowledge of similar data sets.

Line 320 I wouldn't call this a weak correlation at all.

Response: We appreciate the comment and rephrased the sentence in line 313 to: “A linear model showed little agreement in local richness pattern (adj. R2 = 0.026, p = 0.08) detected by the two methods (S5 Figure).“

Line 400-401 The comparison of abundances makes no sense for me.

Response: We agree that there is a debate about how well read counts from eDNA metabarcoding reflect species abundances in the field. We also agree that this relationship is weak at best. Importantly, the comparison of alpha diversity and the index calculation is based on presence-absence data. We also emphasize this aspect of the abundance-estimate debate in our discussion (lines 436-438) and support our statement with recent literature (Elbrecht & Leese, 2015, Plos One; Piñol et al., 2019, Molecular Ecology). In our analysis, we do not use reads to reflect abundances of organisms at the site level, but we use proportions of the sequencing data to reflect the dominance of indicator groups in our data.

Lines 419-420 I wouldn't say this is the explanation, as you are finding less indicator groups when using eDNA.

Response: We thank the reviewer for this comment on our explanation. We clarified our statement by rephrasing this paragraph to the following: „as DNA is distributed homogenously in the water column and thus not all local organisms are detected.”

Lines 432-433 Could you use those non-targeted? Are they relevant? Flag species for example?

Response: We thank the reviewer for their interest in our discussion. Indeed, the non-targeted OTUs could represent features of communities that were neglected in the traditional monitoring. This extra information could be used to inform data-driven approaches, as discussed in lines 483-488. However, a robust ecological understanding of the species and the systems is needed to ground-truth the analysis.

Line 444 I am confused about read counts, why did you calculate alpha and gamma div using relative abundances of reads and then not for the indices? I am not sure those diversities are giving relevant information when calculating using eDNA.

Response: We used presence-absence data on the number of indicator families to calculate alpha diversity at the site level, and presence-absence data of OTUs assigned to indicators to calculate the ecological index. We ensured that this is clearly stated in the Methods section, in the sentence at line 246: “After those filter steps, spatial patterns of indicator group richness based on presence-absence of families were plotted using Swissriverplot (v1.28.0).”

Line 466 So, if the machine learning approach is not employed, are the indices comparable?

Response: It would indeed be interesting to compare the directly calculated index with the random forest approach. However, the biotic index cannot be calculated directly from metabarcoding data, as reads do not scale with counts. The formula used for the calculation from kick-net data categorizes specimen counts into discrete classes (<3, 3-10, 10-100, >100), and these classes are meaningless when applied to reads. Scores calculated from reads would therefore not be interpretable.
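To make the binning argument concrete, a minimal Python sketch is given here. It is only an illustration and not the official IBCH calculation: the class ranges listed in the response overlap at 10 and 100, so assigning those boundary values to the lower class is an assumption, and all specimen and read counts are made up.

```python
def abundance_class(count):
    """Assign a count to one of the discrete classes named in the response.

    ASSUMPTION: the boundary values 10 and 100 go to the lower class,
    because the ranges as written (3-10, 10-100) overlap.
    """
    if count < 3:
        return "<3"
    if count <= 10:
        return "3-10"
    if count <= 100:
        return "10-100"
    return ">100"


# Hypothetical kick-net specimen counts spread across the classes.
specimen_counts = [2, 7, 40, 150]
specimen_classes = [abundance_class(c) for c in specimen_counts]

# Hypothetical read counts: sequencing depth pushes nearly every detection
# into the top class, so the classes no longer separate taxa.
read_counts = [1500, 48000, 230]
read_classes = [abundance_class(r) for r in read_counts]
```

Here the specimen counts fall into four different classes, while all three read counts land in ">100", which shows why class thresholds defined for specimens carry no information when applied to reads.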

Line 496 Is it worthy to make both methods complementary? Your reasoning through the manuscript is that the eDNA approach can be comparable, then, why complementary? I find contradictory stating that one to one comparison is impossible but then directly comparisons are made with the diversity measures.

Response: In our results, we show that eDNA gives different measures of local diversity (Figure 3) when the taxa groups are restricted to the bioindicators. However, using a universal primer pair, we gathered information on organisms not represented in the extant ecological assessment (so-called „non-target taxa“). Therefore, we mentioned that eDNA can give complementary information to kick-net sampling at the level of biodiversity. However, when considering the results at the level of the biotic index, the eDNA indices were based on OTUs assigned to the indicator taxa. The random forest model gives results comparable to the kick-net approach, based on the family level of those indicator groups.

To clarify the messages of our conclusion, we added the following text to the conclusion (lines 490-501): „By carrying out a comparison of eDNA sampling with kick-net samples on a large scale, we take a crucial step in advancing the use of molecular methods for direct application in the assessment of the ecological state in routine monitoring programs. This study shows that eDNA sampling from water compared to kick-net data can give different estimates for the composition and diversity of macroinvertebrate communities at a local scale. Importantly, however, when assembling the community data on the level of the biotic states of river systems, both, kick-net and eDNA data indicate very comparable classifications of the ecological state. Thus, while the two methods are complementary at the level of biodiversity estimates, they still give comparable results at the level of ecological indices. Especially for biomonitorings, where the data on community composition and diversity is summarised into a biological index, eDNA can thus be a valid method to recover comparable assessments of ecological integrity.“

Further comments: We changed the acknowledgments in order to thank the two reviewers for their effort and feedback improving our manuscript. We added the following sentence to the Acknowledgement section (line 508): „We thank Noriko Uchida and a second, anonymous, reviewer for their valuable comments and constructive feedback on our manuscript.“

________________________________________

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

Yes, we would be open to publish the review history of our manuscript.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Noriko Uchida

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Response: We uploaded all six of our figures to PACE, and each was successfully converted to a valid file (.TIF), meaning the figures meet the PLOS requirements.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Hideyuki Doi

3 Sep 2021

Environmental DNA gives comparable results to morphology-based indices of macroinvertebrates in a large-scale ecological assessment

PONE-D-21-15608R1

Dear Dr. Brantschen,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Hideyuki Doi

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

I carefully checked the revised manuscript as well as the response letter. I agree with the revisions made in response to the reviewers’ comments and can now recommend publishing the paper in this journal.

Reviewers' comments:

Acceptance letter

Hideyuki Doi

10 Sep 2021

PONE-D-21-15608R1

Environmental DNA gives comparable results to morphology-based indices of macroinvertebrates in a large-scale ecological assessment

Dear Dr. Brantschen:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Hideyuki Doi

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Additional information (S1-S7) supporting the study entitled “environmental DNA gives comparable results to morphology-based indices of macroinvertebrates in a large-scale ecological assessment”.

    (DOCX)

    Attachment

    Submitted filename: PONE-D-21-15608_reviewer.pdf

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    All raw Illumina sequencing data files are available from the European Nucleotide Archive (project number PRJEB44539).


    Articles from PLoS ONE are provided here courtesy of PLOS
