Abstract
Background
This paper describes two datasets originating from a research project carried out along the Houdong River (猴洞坑), Jiaoxi Township, Yilan, Taiwan: species occurrences determined by environmental DNA (eDNA) metabarcoding, and their associated DNA sequences. The Houdong River rises at an elevation of 860 m and flows for approximately 9 km before emptying into the Pacific Ocean. Meandering through mountains, hills, plains and alluvial valleys, this short river system is representative of the fluvial systems in Taiwan. The primary objective of this study was to determine eukaryotic species occurrences in the riverine ecosystem through eDNA analysis. The second goal was to use the current dataset to establish a metabarcoding eDNA data template that is useful and replicable for all users, particularly the Taiwan community. The species occurrence data are accessible at the Global Biodiversity Information Facility (GBIF) portal and the associated DNA sequences have been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI. The 12 samples (three replicates from each of four sites) yielded an average of 1.5 million reads per sample. Of the 2,734 Operational Taxonomic Units (OTUs) obtained, 432 were taxonomically classified. Furthermore, 1,356 occurrences with taxon matches in GBIF were documented (excluding 4,941 incertae sedis; accessed 05-12-2023). These data will be valuable for future species and habitat monitoring along this short river, such as assessing biodiversity patterns across elevations, zones and time periods and their correlation with water quality, land use and anthropogenic activities. The datasets will also be important for regional ecological studies, in particular of freshwater ecosystems and their status under current global change.
New information
These datasets provide the first description of species diversity in the Houdong River system by either eDNA or traditional monitoring methods.
Keywords: river ecosystem, species occurrence, metabarcoding, cytochrome c oxidase I gene, eukaryota
Introduction
Environmental DNA (eDNA) metabarcoding is an emerging tool that can provide an accurate and comprehensive picture of biotic communities (Beng and Corlett 2020, Carvalho et al. 2021, Doi et al. 2021) by identifying multiple lineages from a single environmental sample (Taberlet et al. 2012). While the earliest uses of eDNA focused on detecting the DNA of microbial communities (Ogram et al. 1987), over the last decade eDNA has grown in scope to include macrobial communities and has become an important tool for biodiversity assessment, biomonitoring and the identification of temporal shifts in community assemblages in both terrestrial and aquatic ecosystems (Bista et al. 2017, Jeunen et al. 2019, West et al. 2020, Pawlowski et al. 2020, Burian et al. 2021, Gregorič et al. 2022).
Ecosystems are replete with the genetic material of both resident and transient species. This genetic material includes organismal and extra-organismal DNA from microorganisms and macroorganisms in the form of faeces, shed skin or hair, carcasses and living bodies (Stewart 2019). Using genomic approaches to detect these ‘genetic footprints’ in environmental samples such as water, snow, soil, air or leaf swabs (Stewart 2019) offers an opportunity to increase both the detection rate (Goldberg et al. 2016) and the temporal resolution (Ogram et al. 1987, Taberlet et al. 2012, Pawlowski et al. 2020, Alexander et al. 2023) of biological studies. Notably, eDNA analyses can surpass conventional methods in precision and resolution (e.g. Nakagawa et al. (2018), Rourke et al. (2021)) and are often more cost- and time-efficient than conventional biological monitoring approaches. These factors are crucial in advancing long-term ecological research on a global scale (Evans et al. 2017, Jerde 2019, Larson et al. 2020, Bruel and White 2021).
eDNA can be applied flexibly to species composition assessments, as either a general or a targeted tool depending on the organisms of interest (e.g. Thomas et al. (2019), Beng and Corlett (2020), Burian et al. (2021), Carvalho et al. (2021), Alexander et al. (2023)). Targeted analyses aim to establish the presence or absence of a specific focal species (i.e. “is the targeted species here?”), while general analyses aim to identify community composition (i.e. “what species are here?”). For both aims, because it relies on genetic footprints rather than direct observation, eDNA can outperform traditional methods by offering higher resolution of community assemblages (e.g. Alexander et al. (2022)), enhanced detection of rare, migratory, elusive and cryptic species (e.g. Barnes et al. (2014), Beng and Corlett (2020), Rourke et al. (2021), Shen et al. (2022)), and detection of community shifts (e.g. DiBattista et al. (2020)) and small-scale spatial turnover in assemblages (e.g. Jeunen et al. (2019)). Environmental sampling followed by eDNA analysis therefore offers a powerful way to characterise community composition more comprehensively.
Beyond these benefits, ecosystem assessments using eDNA also call for the collected data to be shared. Sharing eDNA data in accordance with the FAIR (Findable, Accessible, Interoperable and Reusable) principles, for instance, has clear value in enriching our understanding of global biodiversity. The efforts of international organisations to develop and promote tools for exchanging eDNA data are evident in the discussions initiated at the TDWG conference (Suominen et al. 2023).
Hence, in this paper, we describe two datasets:
- a species occurrence dataset derived from eDNA analysis of surface water from the Houdong River in north-eastern Taiwan; and
- the associated DNA sequences.
We aimed to determine community composition and diversity changes along a 6.5 km stretch of the river as it passes from its headwaters (primary subtropical forest) through residential areas, aquaculture farms and agricultural fields (e.g. rice paddies) before reaching the estuary and river mouth. This is the first known freshwater eDNA dataset from a representative turbulent river in Taiwan and will serve as baseline data for further studies and environmental monitoring of this ecosystem. Because publishing DNA-derived data openly is still new to the Taiwan community, this study collaborated with the Taiwan Biodiversity Information Facility (TaiBIF) to establish a data template for an eDNA open data workflow. TaiBIF is amongst the most active data hosting centres and nodes of the Global Biodiversity Information Facility (GBIF). We expect this collaboration to promote the dissemination and use of eDNA and DNA metabarcoding datasets from the Taiwan community.
Sampling methods
Study extent
Water samples were collected in a single sampling event from four sites along the Houdong River in Jiaoxi Township, Yilan County, Taiwan (Fig. 1). The Houdong River, a popular tourist destination, runs through Jiaoxi Township. It originates east of the Sidu Mountains and flows through primary and secondary forests, agricultural land (rice), aquaculture farms and developed areas (light industrial and residential) before draining into the Pacific Ocean. The water samples were used for eDNA analysis and for in-situ water quality measurements.
Figure 1.
The four water sampling locations along the Houdong riverine system in Yilan County, Taiwan.
Sampling description
Water samples for eDNA analysis were collected near the river surface. Before collection, the water containers were rinsed with local water at each sampling site. Approximately 3 litres of water were collected for eDNA analysis, of which approximately 200 ml were used for water quality measurements made on-site with hand-held probes. Temperature, pH and dissolved oxygen were measured with a multi-parameter meter (Multiline® Multi 3620 IDS, WTW, Weilheim, Germany) equipped with an IDS pH electrode (SenTix 940, WTW) and an optical IDS dissolved oxygen sensor (FDO® 925, WTW). Turbidity was measured with a turbidity meter (TUB-430, EZDO, Taiwan) and salinity with a salinity refractometer (2491 MASTER-S/Milla Salinity Refractometer, Atago, Japan). Each parameter was measured three times. Water samples were then transported to the Marine Research Station (MRS, Yilan County, Taiwan) of the Institute of Cellular and Organismic Biology for filtration. The water was first passed through a 75 µm sieve to remove larger particles. A 1 litre subsample from each site was then vacuum-filtered (PC651-0024, GeneDireX, USA) through a 0.22 µm filter membrane, which retained the particulate material. The filter membranes were placed in sterile Petri dishes and stored at -80°C until DNA extraction.
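Triplicate probe readings are usually reported as a mean and standard deviation per parameter and site. The sketch below shows that summary step under stated assumptions: the readings and the dictionary keys are hypothetical placeholders, not the study's measurements (those are given in Suppl. material 2), and `summarise` is an illustrative helper name.

```python
"""Summarise triplicate in-situ water quality readings for one site.

A sketch only: the readings below are hypothetical placeholders, not the
study's data (see Suppl. material 2 for the actual measurements).
"""
from statistics import mean, stdev

# Hypothetical triplicate readings for a single site; keys are illustrative.
readings = {
    "temperature_C": [24.1, 24.3, 24.2],
    "pH": [7.10, 7.12, 7.08],
    "dissolved_oxygen_mg_L": [8.0, 8.2, 8.1],
}


def summarise(triplicates):
    """Return {parameter: (mean, sample standard deviation)} for each parameter."""
    summary = {}
    for name, values in triplicates.items():
        if len(values) < 2:
            raise ValueError(f"{name}: at least two replicates are needed for a standard deviation")
        summary[name] = (mean(values), stdev(values))
    return summary


if __name__ == "__main__":
    for name, (m, sd) in summarise(readings).items():
        print(f"{name}: {m:.2f} ± {sd:.2f}")
```

The sample (n - 1) standard deviation is used because three readings are a sample of the probe's measurement variability rather than a full population.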
Step description
Wet lab process
DNA was extracted at the Biodiversity Research Center (Academia Sinica, Taipei, Taiwan). Each filter membrane was cut into quarters: three quarters were used as three experimental replicates and the fourth was kept as a backup. DNA was extracted from each quarter using the Presto™ Stool DNA Extraction Kit (STLD100, Geneaid Biotech Ltd., Taiwan) following the manufacturer's instructions (Instruction Manual Ver. 10.21.17). The quality and quantity of the extracted DNA were assessed using a Nanodrop 2000 (Thermo Fisher Scientific Inc., USA) and the Qubit 4 dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific Inc., USA).
The MinibarF1 (5'-TCCACTAATCACAARGATATTGGTAC-3') and MinibarR1 (5'-GAAAATCATAATGAAGGCATGAGC-3') primers designed by Meusnier et al. (2008) were used to amplify the 5' region (ca. 120-150 bp) of the mitochondrial cytochrome c oxidase I (COI) gene. These primers were chosen for their universality, which makes them suitable for detecting the highly diverse DNA present in environmental mixtures. PCR was conducted using a one-step single-indexed approach, with a 13 bp tag attached to the MinibarR1 primer. Each 16 μl PCR reaction contained 8 μl KAPA HiFi HotStart ReadyMix (KK2602, Roche Molecular Systems Inc., USA), 5 μl ddH2O, 1 μl of each primer (10 μM) and 1 μl of DNA template. To optimise the protocol, we ran a preliminary PCR across an annealing temperature gradient and found that 54°C gave the best results. The PCR mixture was denatured at 95°C for 15 minutes, followed by 35 cycles of 94°C for 30 seconds and 54°C for 30 seconds, and a final elongation at 72°C for 10 minutes.
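Two practical details of this PCR set-up can be made concrete in a short sketch. First, the "R" in MinibarF1 is an IUPAC degenerate base (A or G), so the primer is really a mixture of two sequences. Second, because the 13 bp sample tag sits on MinibarR1, each sample needs its own reverse primer; the sketch therefore assumes (this is not stated in the protocol) that only ReadyMix, water and MinibarF1 go into a shared master mix, with the tagged reverse primer and template added per well. The 10% pipetting overage and all helper names are hypothetical.

```python
"""Expand the degenerate MinibarF1 primer and scale the 16 µl PCR recipe.

Assumptions (not from the published protocol): a 10% pipetting overage and a
master mix that excludes the sample-specific, tagged MinibarR1.
"""
from itertools import product

# Standard IUPAC nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

MINIBAR_F1 = "TCCACTAATCACAARGATATTGGTAC"  # 5'->3', Meusnier et al. (2008)
MINIBAR_R1 = "GAAAATCATAATGAAGGCATGAGC"    # 5'->3', shown without its 13 bp tag


def expand_degenerate(primer):
    """List every concrete sequence encoded by an IUPAC-degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer.upper()))]


# Components shared by every reaction (µl per 16 µl reaction).
SHARED_UL = {
    "KAPA HiFi HotStart ReadyMix": 8.0,
    "ddH2O": 5.0,
    "MinibarF1 (10 µM)": 1.0,
}
# Components added well by well (each sample has its own tagged reverse primer).
PER_WELL_UL = {
    "tagged MinibarR1 (10 µM)": 1.0,
    "DNA template": 1.0,
}


def master_mix(n_reactions, overage=0.10):
    """Scale the shared components for n reactions plus a pipetting overage."""
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in SHARED_UL.items()}


if __name__ == "__main__":
    print(expand_degenerate(MINIBAR_F1))
    print(master_mix(12))  # e.g. 4 sites x 3 replicates, before any controls
```

Each well then receives 14 µl of master mix plus 1 µl of its tagged reverse primer and 1 µl of template, which adds back up to the 16 µl reaction volume.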
The PCR products were checked on a 1.5% agarose gel and quantified with an Invitrogen Qubit 4 fluorometer (Thermo Fisher Scientific Inc., USA). All PCR products were then pooled in a single tube for next-generation sequencing, which was performed by Genomics Co. (Taipei, Taiwan) on the Illumina NovaSeq 6000 platform with 2 × 150 bp paired-end reads.
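Because the ca. 120-150 bp amplicon is shorter than the combined 2 × 150 bp read length, the forward and reverse reads of each pair overlap and can be merged into one sequence. The sketch below illustrates the idea with exact matching and the 16 bp minimum overlap reported in the data processing step. It is a simplification, not the merger used by QIIME 2: real tools tolerate mismatches and use quality scores. It also assumes that read-through into adapters has already been trimmed, so neither read extends past the start of the other. Function names are hypothetical.

```python
"""Exact-overlap merging of a forward/reverse read pair (illustrative only)."""

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=16):
    """Return the merged amplicon, or None if no exact overlap >= min_overlap exists.

    The reverse read is reverse-complemented so that both reads share one
    orientation; the longest suffix of the forward read that equals a prefix
    of the flipped reverse read is taken as the overlap.
    """
    rc = reverse_complement(reverse)
    # Try the longest possible overlap first, shrinking down to min_overlap.
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-overlap:] == rc[:overlap]:
            return forward + rc[overlap:]
    return None
```

Pairs whose overlap is shorter than `min_overlap` return `None`, mirroring how denoising pipelines discard pairs that cannot be merged reliably.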
Data processing and analysis
The Illumina raw reads were demultiplexed by Genomics Co., Taipei, Taiwan. Read quality was checked with FastQC (v.0.11.9; https://github.com/s-andrews/FastQC). The forward and reverse primers were trimmed from the demultiplexed reads using Cutadapt (v.4.2; Martin (2011)), and the USEARCH platform (v.11.0.667; Edgar (2010)) was used to verify that the primer sequences had been completely removed. Amplicon sequence variants (ASVs) were then inferred from the demultiplexed reads with the "denoise-paired" function of the DADA2 plugin in QIIME 2 (v.2023.2.0; Bolyen et al. (2019)), which trims, filters, denoises and merges reads and removes chimeric reads in one step. The maximum expected error for both forward and reverse reads was set to 1.0, reads were truncated where the quality score fell below 20, and the minimum overlap for merging forward and reverse reads was set to 16 bp; all other parameters of "denoise-paired" were left at their defaults. No reads were trimmed or truncated to a fixed length during ASV creation. The ASVs were then clustered with the "cluster-features-de-novo" function of the QIIME 2 VSEARCH plugin, grouping ASVs with more than 97% sequence identity into one operational taxonomic unit (OTU). Sequences shorter than 100 bp were discarded from both the ASV and OTU results. Taxonomic assignments were made with CONSTAX (v.2.0.18; Liber et al. (2021)) against the MIDORI database (v.GB250; Machida et al. (2017)). The preprocessed sequencing data were analysed with the R package phyloseq (v.1.40.0; McMurdie and Holmes (2013)), and the pie chart was produced with ggplot2 (v.3.4.2; Villanueva and Chen (2019)). The lowest-rank taxonomic annotation of each OTU was extracted and mapped a second time to the GBIF Backbone Taxonomy (GBIF Secretariat 2011) on 05-12-2023 using the "name_backbone_checklist" function of the R package rgbif (v.3.7.7; Chamberlain et al. (2022), R Core Team (2023)). The map was visualised using the R package ggplot2 (v.3.4.2; Villanueva and Chen (2019)) and annotated using ggspatial (v.1.1.9; https://paleolimbot.github.io/ggspatial/), metR (v.0.14.0; https://github.com/eliocamp/metR) and ggrepel (v.0.9.3; https://CRAN.R-project.org/package=ggrepel).
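The quality-filtering and clustering settings above can be illustrated with a small, self-contained sketch. It is not the DADA2, QIIME 2 or VSEARCH code. Expected errors are the sum of per-base error probabilities 10^(-Q/10). Truncation follows DADA2's documented rule of cutting at the first base with quality ≤ truncQ; whether Q = 20 itself is cut is therefore an assumption relative to the "below 20" wording. Identity is simplified to 1 - edit distance / length of the longer sequence. Clustering is greedy and abundance-ordered, in the spirit of, but not identical to, VSEARCH. All function names are hypothetical.

```python
"""Expected-error filtering and greedy 97% OTU clustering, in miniature."""


def expected_errors(quals):
    """Sum of per-base error probabilities 10**(-Q/10) over Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


def truncate_at_quality(seq, quals, trunc_q=20):
    """Cut the read at the first base whose quality is <= trunc_q."""
    for i, q in enumerate(quals):
        if q <= trunc_q:
            return seq[:i], quals[:i]
    return seq, quals


def passes_filter(seq, quals, max_ee=1.0, trunc_q=20):
    """Truncate first, then keep the read only if its expected errors <= max_ee."""
    seq, quals = truncate_at_quality(seq, quals, trunc_q)
    return len(seq) > 0 and expected_errors(quals) <= max_ee


def edit_distance(a, b):
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def identity(a, b):
    """Simplified pairwise identity: 1 - edit distance / longer length."""
    return 1 - edit_distance(a, b) / max(len(a), len(b))


def cluster_otus(asv_abundance, threshold=0.97, min_length=100):
    """Greedy, abundance-ordered clustering of ASVs into OTUs.

    asv_abundance maps ASV sequence -> total read count. ASVs shorter than
    min_length are discarded. The most abundant remaining ASV seeds the first
    OTU; each later ASV joins the first centroid it matches at >= threshold
    identity, otherwise it becomes a new centroid.
    """
    centroids = {}  # centroid sequence -> summed read count
    ordered = sorted(
        (s for s in asv_abundance if len(s) >= min_length),
        key=lambda s: (-asv_abundance[s], s),
    )
    for seq in ordered:
        for centroid in centroids:
            if identity(seq, centroid) >= threshold:
                centroids[centroid] += asv_abundance[seq]
                break
        else:
            centroids[seq] = asv_abundance[seq]
    return centroids
```

Processing ASVs in order of decreasing abundance makes the most abundant variant the representative of each OTU, which is the usual rationale for abundance-sorted greedy clustering.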
Open data and code
Two datasets are associated with this study: DNA sequence data and occurrence data (see Data resources). We converted the occurrence data to the Darwin Core Archive standard (Darwin Core Task Group 2009) and validated the datasheet using the GBIF Data Validator (Global Biodiversity Information Facility 2017). We then published the dataset, comprising one occurrence core (i.e. the foundational table with information about each occurrence) and one DNA-derived data extension (Hoh 2023), using the GBIF Integrated Publishing Toolkit (IPT) installed at the Taiwan Biodiversity Information Facility (TaiBIF). Three supplementary files further describe the dataset: the attributes of the sampling events (Suppl. material 1), the water quality measurements (Suppl. material 2) and the relationship of each technical sample to its sampling event (Suppl. material 3). These files are provided separately because the current GBIF data model does not support linking an event core with a DNA-derived data extension. All source code used in the project can be found in the project's GitHub repository.
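The conversion from an OTU-by-sample read table to occurrence-core rows can be sketched as follows. The field names and the `[samp_name]:OTU[number]` identifier pattern follow the Data set 2 description; everything else is an assumption for illustration: the nested-dictionary input layout, the example sample name `WF1`, the eventID mapping, and `basisOfRecord = "MaterialSample"` (the value generally recommended for DNA-derived records, but not stated in this paper).

```python
"""Turn an OTU-by-sample read table into Darwin Core occurrence rows.

A sketch under assumptions: input layout, sample names and eventID mapping
are hypothetical; field names follow the Data set 2 column descriptions.
"""


def occurrence_rows(otu_table, event_of_sample):
    """Yield one dict per (sample, OTU) pair with at least one read.

    otu_table: {samp_name: {otu_number: read_count}}
    event_of_sample: {samp_name: eventID}
    """
    for samp_name, counts in otu_table.items():
        total_reads = sum(counts.values())  # sampleSizeValue: all reads in the sample
        for otu_number, reads in sorted(counts.items()):
            if reads == 0:
                continue  # OTUs with no reads are not reported as occurrences here
            yield {
                "occurrenceID": f"{samp_name}:OTU{otu_number}",
                "eventID": event_of_sample[samp_name],
                "basisOfRecord": "MaterialSample",  # assumption, see lead-in
                "occurrenceStatus": "present",
                "organismQuantity": reads,
                "organismQuantityType": "DNA sequence reads",
                "sampleSizeValue": total_reads,
                "sampleSizeUnit": "DNA sequence reads",
            }


if __name__ == "__main__":
    table = {"WF1": {1: 30, 2: 0, 3: 70}}  # hypothetical counts
    for row in occurrence_rows(table, {"WF1": "WF"}):
        print(row["occurrenceID"], row["organismQuantity"], row["sampleSizeValue"])
```

Storing both the per-OTU read count (organismQuantity) and the sample's total reads (sampleSizeValue) lets users compute relative read abundance without access to the raw sequence files.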
Geographic coverage
Description
We selected four sites along the Houdong River (猴洞坑), Jiaoxi Township, Yilan County, Taiwan (Table 1) for water sampling and in-situ water quality measurements: upstream waterfall (WF), downstream river (FR), estuary (ES) and river mouth (RM). The four sites span a 6.5 km length of the river.
Table 1.
Coordinates of the four sampling sites.
| Station | Site | Latitude (°N) | Longitude (°E) |
|---|---|---|---|
| 猴洞坑瀑布 (Houdongkeng Waterfall) | Upstream waterfall (WF) | 24.843580 | 121.781830 |
| 猴洞溪 (Houdong Creek) | Downstream river (FR) | 24.835094 | 121.799932 |
| 下埔排水線 (Xiapu drainage line) | Estuary (ES) | 24.835900 | 121.818660 |
| 竹安出海口 (Zhu'an river mouth) | River mouth (RM) | 24.840520 | 121.826640 |
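As a quick consistency check on Table 1, the straight-line (great-circle) distances between sites can be computed with the haversine formula and compared with the 6.5 km river length, and each site can be checked against the bounding box given under Coordinates. This is a sketch assuming a spherical Earth (mean radius 6371.0088 km); the helper names are hypothetical.

```python
"""Great-circle distances between the Table 1 sites (spherical approximation)."""
from math import asin, cos, radians, sin, sqrt

SITES = {  # (latitude °N, longitude °E), copied from Table 1
    "WF": (24.843580, 121.781830),
    "FR": (24.835094, 121.799932),
    "ES": (24.835900, 121.818660),
    "RM": (24.840520, 121.826640),
}
BBOX = (24.824, 24.871, 121.768, 121.846)  # lat_min, lat_max, lon_min, lon_max

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumes a spherical Earth


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def in_bbox(point, bbox=BBOX):
    """True if a (lat, lon) point lies inside the stated coverage box."""
    lat, lon = point
    return bbox[0] <= lat <= bbox[1] and bbox[2] <= lon <= bbox[3]


if __name__ == "__main__":
    order = ["WF", "FR", "ES", "RM"]
    for a, b in zip(order, order[1:]):
        print(f"{a} -> {b}: {haversine_km(SITES[a], SITES[b]):.2f} km")
    print(f"WF -> RM straight line: {haversine_km(SITES['WF'], SITES['RM']):.2f} km")
```

The straight line from WF to RM comes out at roughly 4.5 km, shorter than the 6.5 km river length, as expected for a meandering channel.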
Coordinates
24.824 and 24.871 Latitude; 121.768 and 121.846 Longitude.
Taxonomic coverage
Description
We detected eukaryotic organisms in the water samples using the mitochondrial COI gene. A total of 2,736 OTUs were identified, of which 421 were assigned to at least kingdom level using the MIDORI database (v.GB250; Machida et al. (2017); Fig. 2). On GBIF, this dataset (Hoh 2023) consists of 6,297 occurrences, of which 22% (1,356 occurrences; last accessed 05-12-2023) have a taxon match in the GBIF Backbone Taxonomy; the remaining occurrences are assigned to incertae sedis (i.e. of unknown taxonomic placement). The species occurrence dataset was standardised and published as a GBIF-annotated Darwin Core Archive (see Data resources), grouped by sampling event (i.e. site, identifiable via the eventID column).
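The secondary mapping step (lowest assigned rank of each OTU matched against the GBIF Backbone Taxonomy) can be sketched with GBIF's public species-match endpoint, `https://api.gbif.org/v1/species/match`, which is the web service that rgbif's name-matching functions query. The sketch assumes the taxonomy is available as a rank-to-name dictionary with blanks for unassigned ranks and with any database-specific suffixes already stripped; the exact CONSTAX/MIDORI output layout is not reproduced here. Helper names are hypothetical, and the network call is defined but not executed.

```python
"""Pick the lowest assigned rank and build a GBIF species-match request."""
import json
from urllib.parse import urlencode
from urllib.request import urlopen

RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]
MATCH_URL = "https://api.gbif.org/v1/species/match"


def lowest_assigned(taxonomy):
    """Return (rank, name) for the most specific non-empty rank, or None."""
    for rank in reversed(RANKS):
        name = (taxonomy.get(rank) or "").strip()
        if name:
            return rank, name
    return None


def match_request_url(taxonomy):
    """Build a species-match query for the lowest assigned name, or None if unassigned."""
    lowest = lowest_assigned(taxonomy)
    if lowest is None:
        return None
    rank, name = lowest
    params = {"name": name, "rank": rank.upper()}
    if taxonomy.get("kingdom"):
        # Supplying the kingdom helps GBIF disambiguate homonyms across kingdoms.
        params["kingdom"] = taxonomy["kingdom"]
    return f"{MATCH_URL}?{urlencode(params)}"


def fetch_match(url, timeout=30):
    """Query GBIF; the JSON reply includes fields such as usageKey, scientificName, rank and matchType."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def percent(part, whole):
    """Share of occurrences, in percent."""
    return 100 * part / whole
```

With the figures reported above, `percent(1356, 6297)` gives about 21.5, i.e. the 22% of occurrences with a backbone match; the other 4,941 occurrences (6,297 - 1,356) are the incertae sedis records.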
Figure 2.
The taxonomic ranking of 432 classified operational taxonomic units (OTUs). The colours represent different kingdoms.
Taxa included
| Rank | Scientific name |
|---|---|
| kingdom | Animalia |
| kingdom | Chromista |
| kingdom | Fungi |
| kingdom | Plantae |
| kingdom | Protozoa |
| phylum | Amoebozoa |
| phylum | Bryophyta |
| phylum | Cryptophyta |
| phylum | Nemertea |
| phylum | Sulcozoa |
| phylum | Annelida |
| phylum | Bryozoa |
| phylum | Gastrotricha |
| phylum | Ochrophyta |
| phylum | Tracheophyta |
| phylum | Arthropoda |
| phylum | Chaetognatha |
| phylum | Glomeromycota |
| phylum | Oomycota |
| phylum | Zygomycota |
| phylum | Ascomycota |
| phylum | Chlorophyta |
| phylum | Haptophyta |
| phylum | Platyhelminthes |
| phylum | Basidiomycota |
| phylum | Chordata |
| phylum | Mollusca |
| phylum | Rhodophyta |
| phylum | Blastocladiomycota |
| phylum | Cnidaria |
| phylum | Mycetozoa |
| phylum | Rotifera |
Temporal coverage
Notes
Water samples were collected, and the corresponding water quality parameters measured, on a single occasion on 28-04-2022.
Usage licence
Usage licence
Other
IP rights notes
Datasets produced by the current work are licensed under a Creative Commons Attribution (CC-BY) 4.0 Licence.
Data resources
Data package title
eDNA along Houdong riverine zonation in Taiwan
Number of data sets
2
Data set 1.
Data set name
eDNA along riverine zonation of Houdong River, Yilan, Taiwan [Project ID: PRJEB60905]
Data format
Genomic Standards Consortium MIxS water package
Download URL
Data format version
mixs6.1.0
Description
DNA sequence data have been deposited in ENA at EMBL-EBI under accession number PRJEB60905, following the Genomic Standards Consortium MIxS standard (Yilmaz et al. 2011). The nine default columns in the 'Read Files' section of the project (or dataset) page are described below; the same information can be obtained by downloading the TSV report from the project page.
Data set 1.
| Column label | Column description |
|---|---|
| study_accession | The project accession number created by ENA for this submission (PRJEB60905 for this dataset). |
| sample_accession | The sample accession number created by ENA for this submission. A total of 12 Biosamples (comprising three replicates from each of the four sampling sites) were registered. Each accession page at https://www.ebi.ac.uk/ena/browser/view/[sample_accession] describes basic information about the sample following the MIxS standard. |
| experiment_accession | The experiment accession number created by ENA for this submission. A total of 12 Experiments were registered. Each accession from the link https://www.ebi.ac.uk/ena/browser/view/[experiment_accession] describes sequencing instrument and library-associated information. |
| run_accession | The run accession number created by ENA for this submission. A total of 12 Runs were registered. Each accession from the link https://www.ebi.ac.uk/ena/browser/view/[run_accession] contains read and base count information. |
| tax_id | Taxon ID in ENA. Since this is a metabarcoding study, all entries are 256318, which corresponds to 'metagenome'. |
| scientific_name | Since this is a metabarcoding study, scientific name is not applicable and hence all entries are 'metagenome'. |
| fastq_ftp | The FTP link for downloading the DNA reads obtained from each Run, i.e. the file archived and generated by ENA. The files are gzip-compressed FASTQ files. Two FTP links are provided for each paired Run, separating the forward and reverse reads as [run_accession]_1.fastq.gz and [run_accession]_2.fastq.gz, respectively. |
| submitted_ftp | The FTP link for downloading the DNA reads as uploaded to ENA by the submitter, before the automated curation that produces the fastq_ftp files. |
| bam_ftp | The FTP link to download the BAM file of each Run. |
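The read-file columns above can also be retrieved programmatically. The sketch below uses ENA's public Portal API `filereport` service (`https://www.ebi.ac.uk/ena/portal/api/filereport`) to request the run, sample and FASTQ columns for PRJEB60905 as TSV, then pairs each run's forward (`_1`) and reverse (`_2`) files. It assumes the returned `fastq_ftp` values are semicolon-joined paths without a URL scheme, as ENA reports them; the helper names are hypothetical, and the network call is defined but not executed.

```python
"""List paired FASTQ links for PRJEB60905 via the ENA Portal API (sketch)."""
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession="PRJEB60905"):
    """Build a TSV file-report request for all read runs under an accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,sample_accession,fastq_ftp",
        "format": "tsv",
    }
    return f"{FILEREPORT}?{urlencode(params)}"


def parse_fastq_links(tsv_text):
    """Map run_accession -> (forward URL, reverse URL)."""
    links = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        paths = [p for p in row["fastq_ftp"].split(";") if p]
        forward = next(p for p in paths if p.endswith("_1.fastq.gz"))
        reverse = next(p for p in paths if p.endswith("_2.fastq.gz"))
        links[row["run_accession"]] = ("ftp://" + forward, "ftp://" + reverse)
    return links


def fetch_filereport(accession="PRJEB60905", timeout=60):
    """Download the file report as text (requires network access)."""
    with urlopen(filereport_url(accession), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

A typical call is `parse_fastq_links(fetch_filereport())`, which should yield one entry for each of the 12 Runs.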
Data set 2.
Data set name
eDNA along Houdong riverine zonation in Taiwan
Data format
Darwin Core Archive
Download URL
https://www.gbif.org/dataset/2615342d-7349-4e75-ae34-cda6cb403e2e ; https://ipt.taibif.tw/archive.do?r=houdongkeng_water_edna
Data format version
2021-07-15
Description
There are two links in the Download URL. The first links to the download page of the GBIF-annotated Darwin Core Archive and the second to the source Darwin Core Archive on the TaiBIF IPT. The second link is provided because the DNA-derived data extension associated with the occurrence datasheet is not available through the GBIF-annotated Darwin Core Archive download option, although the extension is included in the source archive, which is available either through the GBIF webpage for the dataset or directly from the TaiBIF IPT. Both links provide compressed archives containing the occurrence core and DNA-derived data extension files in TXT format. The table below describes all 76 data fields from the occurrence core and the DNA-derived data extension, sorted alphabetically. The field descriptions follow the List of Darwin Core terms (accessed April 2023; Darwin Core Task Group (2009)), modified where needed to fit the context of the current study. The occurrence core datasheet can also be downloaded via GBIF API-based tools such as rgbif (Chamberlain et al. 2022) for further analyses.
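For users working outside R, the occurrence core can also be paged through with GBIF's public occurrence search API (`https://api.gbif.org/v1/occurrence/search`), filtered by the dataset key from the first Download URL. This is a standard-library sketch; pygbif offers a Python wrapper for the same API. The search API accepts at most 300 records per page and caps deep paging, which is far above this dataset's 6,297 occurrences; very large datasets should use GBIF's asynchronous download service instead. The fetch function is injected so the paging logic can be tested offline. Note that, as explained above, the DNA-derived extension is only available in the source archive. Helper names are hypothetical.

```python
"""Page through this dataset's occurrences with the GBIF occurrence search API."""
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://api.gbif.org/v1/occurrence/search"
DATASET_KEY = "2615342d-7349-4e75-ae34-cda6cb403e2e"  # from the Download URL
PAGE_SIZE = 300  # largest page size the search API accepts


def page_url(offset, dataset_key=DATASET_KEY, limit=PAGE_SIZE):
    """Build the search URL for one page of results."""
    params = {"datasetKey": dataset_key, "limit": limit, "offset": offset}
    return f"{SEARCH_URL}?{urlencode(params)}"


def iter_occurrences(fetch_json, dataset_key=DATASET_KEY):
    """Yield occurrence records page by page until GBIF reports endOfRecords.

    fetch_json is any callable mapping a URL to parsed JSON.
    """
    offset = 0
    while True:
        page = fetch_json(page_url(offset, dataset_key))
        yield from page["results"]
        if page.get("endOfRecords", True) or not page["results"]:
            break
        offset += len(page["results"])


def fetch_json(url, timeout=60):
    """Fetch and parse one page from GBIF (requires network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

A live run would be `records = list(iter_occurrences(fetch_json))`. Each record is a GBIF-interpreted occurrence, with fields such as `occurrenceID` and `scientificName`.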
Data set 2.
| Column label | Column description |
|---|---|
| ampliconSize | The length of the amplicon in basepairs. |
| amplificationReactionVolume | PCR reaction volume. |
| amplificationReactionVolumeUnit | Unit used for PCR reaction volume. |
| basisOfRecord | The specific nature of the data record. |
| class | The full scientific name of the class in which the taxon is classified. |
| concentration | Concentration of DNA (weight ng/volume µl). |
| concentrationUnit | Unit used for concentration measurement. |
| continent | The name of the continent in which the Location occurs. |
| coordinateUncertaintyInMeters | The horizontal distance (in metres) from the given decimalLatitude and decimalLongitude describing the smallest circle containing the whole of the Location. |
| country | The name of the country in which the Location occurs. |
| countryCode | The standard code for the country in which the Location occurs. |
| county | The full, unabbreviated name of the smaller administrative region in which the Location occurs. |
| datasetName | The name identifying the dataset from which the record was derived. |
| dateIdentified | The date on which the subject was determined as representing the Taxon. |
| day | The integer day of the month on which the Event occurred. |
| decimalLatitude | The geographic latitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic centre of a Location. |
| decimalLongitude | The geographic longitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic centre of a Location. |
| DNA_sequence | The DNA sequence. |
| env_broad_scale | The major environmental system from which the sample came, described using ENVO's biome subclasses (https://ontobee.org/ontology/ENVO). |
| env_local_scale | The local environmental context of the sample, at a smaller spatial grain than the entry in env_broad_scale, described using ENVO's biome subclasses (https://ontobee.org/ontology/ENVO). |
| env_medium | The environmental material immediately surrounding the sample prior to sampling, described using subclasses of ENVO's environmental material class (https://ontobee.org/ontology/ENVO). |
| eventDate | The date-time when the event was recorded (i.e. water sampling time). |
| eventID | An identifier associated with a sampling event. |
| eventTime | The time when the event was recorded (i.e. water sampling time). |
| experimental_factor | The variable aspects of an experiment design to describe the experiment. The ontology terms determined from Experimental Factor Ontology (EFO). |
| family | The full scientific name of the family in which the taxon is classified. |
| genus | The full scientific name of the genus in which the taxon is classified. |
| geodeticDatum | The ellipsoid, geodetic datum or spatial reference system (SRS) upon which the geographic coordinates given in decimalLatitude and decimalLongitude are based. |
| habitat | A category or description of the habitat in which the Event occurred, described using subclasses of ENVO's environmental material class (https://ontobee.org/ontology/ENVO). |
| higherClassification | Taxa names terminating at the rank immediately superior to the taxon referenced in the taxon record. This metabarcoding study targeted eukaryotic organisms and hence the entry is "Eukaryota". |
| identificationReferences | Publication reference used in the Identification. |
| identificationRemarks | Comments or notes about the Identification. |
| kingdom | The full scientific name of the kingdom in which the taxon is classified. |
| lib_layout | Specify whether to expect single, paired or other configuration of reads. |
| license | A legal document giving official permission to do something with the resource. |
| locationID | An identifier for the set of location information as listed in Table 1. |
| materialSampleID | An identifier for the MaterialSample. |
| methodDeterminationConcentrationAndRatios | Method used for concentration measurement. |
| month | The integer month in which the Event occurred. |
| nucl_acid_amp | A link to electronic resource that describes the PCR amplification of specific nucleic acids. |
| nucl_acid_ext | A link to electronic resource that describes the material separation to recover the nucleic acid fraction from a sample. |
| occurrenceID | An identifier for the Occurrence. Format: [samp_name]:OTU[number]. |
| occurrenceStatus | A statement about the presence or absence of a Taxon at a Location. |
| order | The full scientific name of the order in which the taxon is classified. |
| organismQuantity | Number of reads of this OTU in the sample. |
| organismQuantityType | DNA sequence reads. |
| otu_class_appr | Cutoffs and approach used when clustering the "species-level" OTUs. |
| otu_db | Reference database for "species-level" OTUs. |
| otu_seq_comp_appr | Tool and thresholds used to compare sequences when computing "species-level" OTUs. |
| pcr_cond | Description of reaction conditions and components of PCR. |
| pcr_primer_forward | Forward PCR primer that was used to amplify the sequence of the targeted gene. |
| pcr_primer_name_forward | Name of the forward PCR primer used. |
| pcr_primer_name_reverse | Name of the reverse PCR primer used. |
| pcr_primer_reference | Reference for the PCR primers that were used to amplify the sequence of the targeted gene. |
| pcr_primer_reverse | Reverse PCR primer that was used to amplify the sequence of the targeted gene. |
| pcr_primers | PCR primers that were used to amplify the sequence of the targeted gene, locus or subfragment. |
| phylum | The full scientific name of the phylum or division in which the taxon is classified. |
| preparations | Preparation methods for the sample. |
| project_name | Name of the project within which the sequencing was organised. |
| rightsHolder | The organisation owning or managing rights over the resource. |
| samp_mat_process | The processing applied to the sample after retrieving the sample from environment. |
| samp_name | Unique sample name for each sample. Starts with abbreviation of sampling site as listed in Table 1 and ends with sample replicate number. |
| samp_size | Amount of sample (volume) that was collected. |
| sampleSizeUnit | DNA sequence reads. |
| sampleSizeValue | Total number of reads in the sample. |
| samplingProtocol | The names of, references to, or descriptions of the methods or protocols used during an Event. |
| scientificName | The full scientific name in the lowest level taxonomic rank that can be determined. The content in this field was obtained from secondary mapping of the lowest taxonomic rank in verbatimIdentification to the GBIF Backbone Taxonomy (see Methodology). |
| seq_meth | Sequencing method used. |
| size_frac | Filtering pore size used in sample preparation. |
| target_gene | Targeted gene or locus name for marker gene studies. |
| tax_ident | The phylogenetic marker(s) used to assign an organism name. |
| taxonRank | The taxonomic rank of the most specific name in the scientificName. |
| type | The nature or genre of the resource. |
| verbatimIdentification | The taxonomic identification as originally assigned using the reference database given in otu_db. |
| verbatimLocality | The original textual description of the place. |
| year | The four-digit year in which the Event occurred, according to the Common Era Calendar. |
Supplementary Material
Sampling event data
Daphne Z. Hoh
Data type
event
Brief description
A TSV datasheet in Darwin Core Archive format describing the four sampling events along the river.
File: oo_947548.tsv
Water quality measurements in each sampling event
Daphne Z. Hoh
Data type
measurement or fact
Brief description
A TSV datasheet in Darwin Core Archive format describing the water quality measurements in the four sampling events along the river.
File: oo_947549.tsv
Sample relationship
Daphne Z. Hoh
Data type
resource relationship
Brief description
A TSV datasheet in Darwin Core Archive format describing the relationship of each technical sample to the four sampling events.
File: oo_947550.tsv
Acknowledgements
This research was produced by the students in the Taiwan International Graduate Program (TIGP) Signature Course-Ecology Masterclass @ Taiwan (EMT course). Support was provided by the TIGP (Academia Sinica, Taipei, Taiwan), Biodiversity Research Center (Academia Sinica, Taipei, Taiwan), Marine Research Station (MRS) of the Institute of Cellular and Organismic Biology (Academia Sinica, Yilan, Taiwan) and Taiwan Biodiversity Information Facility (TaiBIF). We would like to thank Tzi-Yuan Wang for the lecture and teaching assistance. We would also like to thank Kiran Kumar Eripogu, Alyzza Calayag and Yue Rong Tan for assisting with the sample collection and DNA extraction process. We are grateful to all the members who arranged and participated in the EMT course.
Contributor Information
Min-Chen Wang, Email: mcwinlab@gmail.com.
Daphne Z. Hoh, Email: daphnehohzhiwei@gmail.com.
Author contributions
Conceptualisation: Min-Chen Wang, Ling Chiu and Yung-Che Tseng; Sample collection: Min-Chen Wang, Ling Chiu, Mark Angelo C. Bucay and Chung-Hsin Huang; Methodology and formal analysis: Chieh-Ping Lin, Chung-Hsin Huang, Cheng-Wei Chen, Daphne Z. Hoh and Min-Chen Wang; Investigation: Min-Chen Wang, Ling Chiu, Trevor Padgett, Daphne Z. Hoh, Mark Angelo C. Bucay, Chieh-Ping Lin, Chung-Hsin Huang, Cheng-Wei Chen, Zong-Yu Shen and John Wang; Visualisation: Chieh-Ping Lin and Daphne Z. Hoh; Writing (original draft preparation): Trevor Padgett, Daphne Z. Hoh, Mark Angelo C. Bucay, Chieh-Ping Lin, Chung-Hsin Huang and Cheng-Wei Chen; Writing (review and editing): Min-Chen Wang, Yung-Che Tseng, Jr-Kai Yu and John Wang; Supervision: Min-Chen Wang, Jr-Kai Yu and John Wang; Resources: Yung-Che Tseng, Jr-Kai Yu and John Wang; Funding acquisition: Jr-Kai Yu and John Wang. All authors commented on and approved the manuscript.
References
- Alexander Jason B., Marnane Michael J., Elsdon Travis S., Bunce Michael, Songploy Se, Sitaworawet Paweena, Harvey Euan S. Complementary molecular and visual sampling of fish on oil and gas platforms provides superior biodiversity characterisation. Marine Environmental Research. 2022;179 doi: 10.1016/j.marenvres.2022.105692. [DOI] [PubMed] [Google Scholar]
- Alexander Jason B., Marnane Michael J., McDonald Justin I., Lukehurst Sherralee S., Elsdon Travis S., Simpson Tiffany, Hinz Shawn, Bunce Michael, Harvey Euan S. Comparing environmental DNA collection methods for sampling community composition on marine infrastructure. Estuarine, Coastal and Shelf Science. 2023;283 doi: 10.1016/j.ecss.2023.108283. [DOI] [Google Scholar]
- Barnes Matthew A., Turner Cameron R., Jerde Christopher L., Renshaw Mark A., Chadderton W. Lindsay, Lodge David M. Environmental Conditions Influence eDNA Persistence in Aquatic Systems. Environmental Science & Technology. 2014;48(3):1819–1827. doi: 10.1021/es404734p.
- Beng Kingsly C., Corlett Richard T. Applications of environmental DNA (eDNA) in ecology and conservation: opportunities, challenges and prospects. Biodiversity and Conservation. 2020;29(7):2089–2121. doi: 10.1007/s10531-020-01980-0.
- Bista Iliana, Carvalho Gary R., Walsh Kerry, Seymour Mathew, Hajibabaei Mehrdad, Lallias Delphine, Christmas Martin, Creer Simon. Annual time-series analysis of aqueous eDNA reveals ecologically relevant dynamics of lake ecosystem biodiversity. Nature Communications. 2017;8(1). doi: 10.1038/ncomms14087.
- Bolyen Evan, Rideout Jai Ram, Dillon Matthew R, Bokulich Nicholas A, Abnet Christian C, Al-Ghalith Gabriel A, Alexander Harriet, Alm Eric J, Arumugam Manimozhiyan, Asnicar Francesco, Bai Yang, Bisanz Jordan E, Bittinger Kyle, Brejnrod Asker, Brislawn Colin J, Brown C Titus, Callahan Benjamin J, Caraballo-Rodríguez Andrés Mauricio, Chase John, Cope Emily K, Da Silva Ricardo, Diener Christian, Dorrestein Pieter C, Douglas Gavin M, Durall Daniel M, Duvallet Claire, Edwardson Christian F, Ernst Madeleine, Estaki Mehrbod, Fouquier Jennifer, Gauglitz Julia M, Gibbons Sean M, Gibson Deanna L, Gonzalez Antonio, Gorlick Kestrel, Guo Jiarong, Hillmann Benjamin, Holmes Susan, Holste Hannes, Huttenhower Curtis, Huttley Gavin A, Janssen Stefan, Jarmusch Alan K, Jiang Lingjing, Kaehler Benjamin D, Kang Kyo Bin, Keefe Christopher R, Keim Paul, Kelley Scott T, Knights Dan, Koester Irina, Kosciolek Tomasz, Kreps Jorden, Langille Morgan G I, Lee Joslynn, Ley Ruth, Liu Yong-Xin, Loftfield Erikka, Lozupone Catherine, Maher Massoud, Marotz Clarisse, Martin Bryan D, McDonald Daniel, McIver Lauren J, Melnik Alexey V, Metcalf Jessica L, Morgan Sydney C, Morton Jamie T, Naimey Ahmad Turan, Navas-Molina Jose A, Nothias Louis Felix, Orchanian Stephanie B, Pearson Talima, Peoples Samuel L, Petras Daniel, Preuss Mary Lai, Pruesse Elmar, Rasmussen Lasse Buur, Rivers Adam, Robeson Michael S, Rosenthal Patrick, Segata Nicola, Shaffer Michael, Shiffer Arron, Sinha Rashmi, Song Se Jin, Spear John R, Swafford Austin D, Thompson Luke R, Torres Pedro J, Trinh Pauline, Tripathi Anupriya, Turnbaugh Peter J, Ul-Hasan Sabah, van der Hooft Justin J J, Vargas Fernando, Vázquez-Baeza Yoshiki, Vogtmann Emily, von Hippel Max, Walters William, Wan Yunhu, Wang Mingxun, Warren Jonathan, Weber Kyle C, Williamson Charles H D, Willis Amy D, Xu Zhenjiang Zech, Zaneveld Jesse R, Zhang Yilong, Zhu Qiyun, Knight Rob, Caporaso J Gregory. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology. 2019;37(9):1091. doi: 10.1038/s41587-019-0252-6.
- Bruel Rosalie, White Easton R. Sampling requirements and approaches to detect ecosystem shifts. Ecological Indicators. 2021;121. doi: 10.1016/j.ecolind.2020.107096.
- Burian Alfred, Mauvisseau Quentin, Bulling Mark, Domisch Sami, Qian Song, Sweet Michael. Improving the reliability of eDNA data interpretation. Molecular Ecology Resources. 2021;21(5):1422–1433. doi: 10.1111/1755-0998.13367.
- Carvalho Carolina S., de Oliveira Marina Elisa, Rodriguez‐Castro Karen Giselle, Saranholi Bruno H., Galetti Pedro M. Efficiency of eDNA and iDNA in assessing vertebrate diversity and its abundance. Molecular Ecology Resources. 2021;22(4):1262–1273. doi: 10.1111/1755-0998.13543.
- Chamberlain Scott, Oldoni Damiano, Geffert Laurens, Desmet Peter, Barve Vijay, Ram Karthik, Blissett Matt, Waller John, McGlinn Dan, Ooms Jeroen, Ye Steven, Oksanen Jari, Marwick Ben, John, Sumner Michael, Sriram. ropensci/rgbif. Version 3.7.0. Zenodo; 2022 (released 2022-02-09). doi: 10.5281/zenodo.6023735.
- Darwin Core Task Group. Darwin Core. Biodiversity Information Standards (TDWG); 2009.
- DiBattista Joseph D., Reimer James D., Stat Michael, Masucci Giovanni D., Biondi Piera, De Brauwer Maarten, Wilkinson Shaun P., Chariton Anthony A., Bunce Michael. Environmental DNA can act as a biodiversity barometer of anthropogenic pressures in coastal ecosystems. Scientific Reports. 2020;10(1). doi: 10.1038/s41598-020-64858-9.
- Doi Hideyuki, Inui Ryutei, Matsuoka Shunsuke, Akamatsu Yoshihisa, Goto Masuji, Kono Takanori. Estimation of biodiversity metrics by environmental DNA metabarcoding compared with visual and capture surveys of river fish communities. Freshwater Biology. 2021;66(7):1257–1266. doi: 10.1111/fwb.13714.
- Edgar Robert C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26(19):2460–2461. doi: 10.1093/bioinformatics/btq461.
- Evans Nathan T., Shirey Patrick D., Wieringa Jamin G., Mahon Andrew R., Lamberti Gary A. Comparative Cost and Effort of Fish Distribution Detection via Environmental DNA Analysis and Electrofishing. Fisheries. 2017;42(2):90–99. doi: 10.1080/03632415.2017.1276329.
- GBIF Secretariat. GBIF Backbone Taxonomy. Checklist dataset. 2011. doi: 10.15468/39omei. Accessed via GBIF.org on 2023-11-14.
- Global Biodiversity Information Facility. GBIF data validator. 2017. https://www.gbif.org/tools/data-validator. Accessed on 2023-05-15.
- Goldberg Caren S., Turner Cameron R., Deiner Kristy, Klymus Katy E., Thomsen Philip Francis, Murphy Melanie A., Spear Stephen F., McKee Anna, Oyler‐McCance Sara J., Cornman Robert Scott, Laramie Matthew B., Mahon Andrew R., Lance Richard F., Pilliod David S., Strickler Katherine M., Waits Lisette P., Fremier Alexander K., Takahara Teruhiko, Herder Jelger E., Taberlet Pierre. Critical considerations for the application of environmental DNA methods to detect aquatic species. Methods in Ecology and Evolution. 2016;7(11):1299–1307. doi: 10.1111/2041-210x.12595.
- Gregorič Matjaž, Kutnjak Denis, Bačnik Katarina, Gostinčar Cene, Pecman Anja, Ravnikar Maja, Kuntner Matjaž. Spider webs as eDNA samplers: Biodiversity assessment across the tree of life. Molecular Ecology Resources. 2022;22(7):2534–2545. doi: 10.1111/1755-0998.13629.
- Hoh Daphne. eDNA along Houdong riverine zonation in Taiwan. Version 1.7. Taiwan Biodiversity Information Facility; 2023.
- Jerde Christopher L. Can we manage fisheries with the inherent uncertainty from eDNA? Journal of Fish Biology. 2019;98(2):341–353. doi: 10.1111/jfb.14218.
- Jeunen Gert‐Jan, Knapp Michael, Spencer Hamish G., Lamare Miles D., Taylor Helen R., Stat Michael, Bunce Michael, Gemmell Neil J. Environmental DNA (eDNA) metabarcoding reveals strong discrimination among diverse marine habitats connected by water movement. Molecular Ecology Resources. 2019;19(2):426–438. doi: 10.1111/1755-0998.12982.
- Larson Eric R, Graham Brittney M, Achury Rafael, Coon Jaime J, Daniels Melissa K, Gambrell Daniel K, Jonasen Kacie L, King Gregory D, LaRacuente Nicholas, Perrin‐Stowe Tolulope IN, Reed Emily M, Rice Christopher J, Ruzi Selina SA, Thairu Margaret W, Wilson Jared C, Suarez Andrew V. From eDNA to citizen science: emerging tools for the early detection of invasive species. Frontiers in Ecology and the Environment. 2020;18(4):194–202. doi: 10.1002/fee.2162.
- Liber Julian A, Bonito Gregory, Benucci Gian Maria Niccolò. CONSTAX2: improved taxonomic classification of environmental DNA markers. Bioinformatics. 2021;37(21):3941–3943. doi: 10.1093/bioinformatics/btab347.
- Machida Ryuji J., Leray Matthieu, Ho Shian-Lei, Knowlton Nancy. Metazoan mitochondrial gene sequence reference datasets for taxonomic assignment of environmental samples. Scientific Data. 2017;4(1). doi: 10.1038/sdata.2017.27.
- Martin Marcel. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17(1). doi: 10.14806/ej.17.1.200.
- McMurdie Paul J., Holmes Susan. phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data. PLoS ONE. 2013;8(4). doi: 10.1371/journal.pone.0061217.
- Meusnier Isabelle, Singer Gregory AC, Landry Jean-François, Hickey Donal A, Hebert Paul DN, Hajibabaei Mehrdad. A universal DNA mini-barcode for biodiversity analysis. BMC Genomics. 2008;9(1). doi: 10.1186/1471-2164-9-214.
- Nakagawa Hikaru, Yamamoto Satoshi, Sato Yukuto, Sado Tetsuya, Minamoto Toshifumi, Miya Masaki. Comparing local‐ and regional‐scale estimations of the diversity of stream fish using eDNA metabarcoding and conventional observation methods. Freshwater Biology. 2018;63(6):569–580. doi: 10.1111/fwb.13094.
- Ogram Andrew, Sayler Gary S., Barkay Tamar. The extraction and purification of microbial DNA from sediments. Journal of Microbiological Methods. 1987;7:57–66. doi: 10.1016/0167-7012(87)90025-x.
- Pawlowski Jan, Apothéloz‐Perret‐Gentil Laure, Altermatt Florian. Environmental DNA: What's behind the term? Clarifying the terminology and recommendations for its future use in biomonitoring. Molecular Ecology. 2020;29(22):4258–4264. doi: 10.1111/mec.15643.
- R Core Team. R: A language and environment for statistical computing. Version 4.2.2. R Foundation for Statistical Computing, Vienna, Austria; 2023.
- Rourke Meaghan L., Fowler Ashley M., Hughes Julian M., Broadhurst Matt K., DiBattista Joseph D., Fielder Stewart, Wilkes Walburn Jackson, Furlan Elise M. Environmental DNA (eDNA) as a tool for assessing fish biomass: A review of approaches and future considerations for resource surveys. Environmental DNA. 2021;4(1):9–33. doi: 10.1002/edn3.185.
- Shen Mei, Xiao Nengwen, Zhao Ziyi, Guo Ningning, Luo Zunlan, Sun Guang, Li Junsheng. eDNA metabarcoding as a promising conservation tool to monitor fish diversity in Beijing water systems compared with ground cages. Scientific Reports. 2022;12(1). doi: 10.1038/s41598-022-15488-w.
- Stewart Kathryn A. Understanding the effects of biotic and abiotic factors on sources of aquatic environmental DNA. Biodiversity and Conservation. 2019;28(5):983–1001. doi: 10.1007/s10531-019-01709-8.
- Suominen Saara, Frøslev Tobias Guldberg, Johansson Veronika, Endresen Dag, Schigel Dmitry, Obst Matthias, editors. Exchange and reuse of environmental DNA and metabarcoding data. Biodiversity Information Science and Standards. The Biodiversity Information Standards Conference (TDWG); Hobart, Tasmania, Australia; 2023.
- Taberlet Pierre, Coissac Eric, Hajibabaei Mehrdad, Rieseberg Loren H. Environmental DNA. Molecular Ecology. 2012;21(8):1789–1793. doi: 10.1111/j.1365-294x.2012.05542.x.
- Thomas Austen C., Tank Samantha, Nguyen Phong L., Ponce Jake, Sinnesael Mieke, Goldberg Caren S. A system for rapid eDNA detection of aquatic invasive species. Environmental DNA. 2019;2(3):261–270. doi: 10.1002/edn3.25.
- Villanueva Randle Aaron M., Chen Zhuo Job. ggplot2: Elegant Graphics for Data Analysis (2nd ed.). Measurement: Interdisciplinary Research and Perspectives. 2019;17(3):160–167. doi: 10.1080/15366367.2019.1565254.
- West Katrina M., Stat Michael, Harvey Euan S., Skepper Craig L., DiBattista Joseph D., Richards Zoe T., Travers Michael J., Newman Stephen J., Bunce Michael. eDNA metabarcoding survey reveals fine‐scale coral reef community variation across a remote, tropical island ecosystem. Molecular Ecology. 2020;29(6):1069–1086. doi: 10.1111/mec.15382.
- Yilmaz Pelin, Kottmann Renzo, Field Dawn, Knight Rob, Cole James R, Amaral-Zettler Linda, Gilbert Jack A, Karsch-Mizrachi Ilene, Johnston Anjanette, Cochrane Guy, Vaughan Robert, Hunter Christopher, Park Joonhong, Morrison Norman, Rocca-Serra Philippe, Sterk Peter, Arumugam Manimozhiyan, Bailey Mark, Baumgartner Laura, Birren Bruce W, Blaser Martin J, Bonazzi Vivien, Booth Tim, Bork Peer, Bushman Frederic D, Buttigieg Pier Luigi, Chain Patrick S G, Charlson Emily, Costello Elizabeth K, Huot-Creasy Heather, Dawyndt Peter, DeSantis Todd, Fierer Noah, Fuhrman Jed A, Gallery Rachel E, Gevers Dirk, Gibbs Richard A, San Gil Inigo, Gonzalez Antonio, Gordon Jeffrey I, Guralnick Robert, Hankeln Wolfgang, Highlander Sarah, Hugenholtz Philip, Jansson Janet, Kau Andrew L, Kelley Scott T, Kennedy Jerry, Knights Dan, Koren Omry, Kuczynski Justin, Kyrpides Nikos, Larsen Robert, Lauber Christian L, Legg Teresa, Ley Ruth E, Lozupone Catherine A, Ludwig Wolfgang, Lyons Donna, Maguire Eamonn, Methé Barbara A, Meyer Folker, Muegge Brian, Nakielny Sara, Nelson Karen E, Nemergut Diana, Neufeld Josh D, Newbold Lindsay K, Oliver Anna E, Pace Norman R, Palanisamy Giriprakash, Peplies Jörg, Petrosino Joseph, Proctor Lita, Pruesse Elmar, Quast Christian, Raes Jeroen, Ratnasingham Sujeevan, Ravel Jacques, Relman David A, Assunta-Sansone Susanna, Schloss Patrick D, Schriml Lynn, Sinha Rohini, Smith Michelle I, Sodergren Erica, Spor Aymé, Stombaugh Jesse, Tiedje James M, Ward Doyle V, Weinstock George M, Wendel Doug, White Owen, Whiteley Andrew, Wilke Andreas, Wortman Jennifer R, Yatsunenko Tanya, Glöckner Frank Oliver. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nature Biotechnology. 2011;29(5):415–420. doi: 10.1038/nbt.1823.
Associated Data
Supplementary Materials
Sampling event data
Daphne Z. Hoh
Data type: event
Brief description: A TSV datasheet in Darwin Core Archive format describing the four sampling events along the river.
File: oo_947548.tsv
Water quality measurements for each sampling event
Daphne Z. Hoh
Data type: measurement or fact
Brief description: A TSV datasheet in Darwin Core Archive format describing the water quality measurements taken during the four sampling events along the river.
File: oo_947549.tsv
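Measurement-or-fact data of this kind are normally stored in long format, with one row per measured variable per event. The Python sketch below shows one way to pivot such rows into a single record per sampling event. It assumes that oo_947549.tsv uses the Darwin Core terms eventID, measurementType, measurementValue and measurementUnit as column headers and that every value is numeric; these headers have not been checked against the published file, and the helper name measurements_by_event and the example rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def measurements_by_event(tsv_text):
    """Pivot long-format MeasurementOrFact rows into one dict per eventID.

    Hypothetical helper. Assumes standard Darwin Core column headers
    (eventID, measurementType, measurementValue, measurementUnit) and
    numeric measurement values.
    """
    table = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        unit = row["measurementUnit"].strip()
        # Attach the unit to the label so the same variable in different units never collides.
        label = f"{row['measurementType']} ({unit})" if unit else row["measurementType"]
        table[row["eventID"]][label] = float(row["measurementValue"])
    return dict(table)


# Hypothetical rows for illustration only, not values from the published dataset.
example = (
    "eventID\tmeasurementType\tmeasurementValue\tmeasurementUnit\n"
    "event_1\twater temperature\t18.5\t°C\n"
    "event_1\tpH\t7.2\t\n"
    "event_2\twater temperature\t20.1\t°C\n"
)
print(measurements_by_event(example))
```

A wide table of this shape is convenient for relating water quality to the occurrence records of the same event.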
Sample relationship
Daphne Z. Hoh
Data type: resource relationship
Brief description: A TSV datasheet in Darwin Core Archive format describing how each technical sample relates to the four sampling events.
File: oo_947550.tsv
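The resource relationship table is what ties each technical sample back to its sampling event. A minimal sketch of that lookup follows, assuming the file uses the Darwin Core ResourceRelationship terms resourceID (holding the technical sample) and relatedResourceID (holding the sampling event). The column orientation, the helper name samples_by_event and the example IDs and relationship labels are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def samples_by_event(tsv_text):
    """Group technical-sample IDs under the sampling event they relate to.

    Hypothetical helper. Assumes Darwin Core ResourceRelationship headers,
    with resourceID holding the technical sample and relatedResourceID
    holding the sampling event.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        grouped[row["relatedResourceID"]].append(row["resourceID"])
    return dict(grouped)


# Hypothetical IDs for illustration only, not taken from the published file.
example = (
    "resourceID\trelationshipOfResource\trelatedResourceID\n"
    "sample_A1\ttechnical replicate of\tevent_1\n"
    "sample_A2\ttechnical replicate of\tevent_1\n"
    "sample_B1\ttechnical replicate of\tevent_2\n"
)
print(samples_by_event(example))
```

If per-sample lookups are needed instead, the same rows can be read into a plain dict keyed by resourceID.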


