Biodiversity Data Journal. 2024 Apr 23;12:e116921. doi: 10.3897/BDJ.12.e116921

Environmental DNA-based biodiversity profiling along the Houdong River in north-eastern Taiwan

Chieh-Ping Lin 1,2, Chung-Hsin Huang 3,4, Trevor Padgett 3,4, Mark Angelo C Bucay 3,5, Cheng-Wei Chen 3,5, Zong-Yu Shen 3,5, Ling Chiu 6,7, Yung-Che Tseng 6, Jr-Kai Yu 6,8, John Wang 3,2, Min-Chen Wang 6,9, Daphne Z Hoh 10
PMCID: PMC11061556  PMID: 38694844

Abstract

Background

This paper describes two datasets from a research project carried out along the Houdong River (猴洞坑), Jiaoxi Township, Yilan, Taiwan: species occurrences determined by environmental DNA (eDNA) metabarcoding, and their associated DNA sequences. The Houdong River begins at an elevation of 860 m and flows for approximately 9 km before it empties into the Pacific Ocean. Meandering through mountains, hills, plains and alluvial valleys, this short river system is representative of the fluvial systems in Taiwan. The primary objective of this study was to determine eukaryotic species occurrences in the riverine ecosystem through eDNA analysis. The second goal was to use the current dataset to establish a metabarcoding eDNA data template that is useful and replicable for all users, particularly the Taiwan community. The species occurrence data are accessible at the Global Biodiversity Information Facility (GBIF) portal and the associated DNA sequences have been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI. A total of 12 water samples from the study yielded an average of 1.5 million reads. Taxonomic assignment classified 432 Operational Taxonomic Units (OTUs) out of a total of 2,734. Furthermore, a total of 1,356 occurrences with taxon matches in GBIF were documented (excluding 4,941 incertae sedis; accessed 05-12-2023). These data will be of substantial importance for future species and habitat monitoring within the river, such as assessment of biodiversity patterns across different elevations, zonations and time periods and their correlation with water quality, land uses and anthropogenic activities. Further, these datasets will be of importance for regional ecological studies, in particular of freshwater ecosystems and their status under current global change scenarios.

New information

The datasets are the first species diversity description of the Houdong River system using either eDNA or traditional monitoring processes.

Keywords: river ecosystem, species occurrence, metabarcoding, cytochrome c oxidase I gene, eukaryota

Introduction

Environmental DNA (eDNA) metabarcoding is an emerging tool that can provide an accurate and comprehensive representation of biotic communities (Beng and Corlett 2020, Carvalho et al. 2021, Doi et al. 2021) by identifying multiple lineages from a single environmental sample (Taberlet et al. 2012). While the earliest uses of eDNA focused specifically on the detection of microbial communities (Ogram et al. 1987), in the last decade, eDNA has grown in scope to include macrobial communities and, as a result, has become an important tool in biodiversity assessments, biomonitoring and identifying temporal shifts in community assemblages of both terrestrial and aquatic ecosystems (Bista et al. 2017, Jeunen et al. 2019, West et al. 2020, Pawlowski et al. 2020, Burian et al. 2021, Gregorič et al. 2022).

Ecosystems are replete with the genetic material of both resident and transient species. This genetic material includes organismal and extra-organismal DNA from microorganisms and macroorganisms in the form of faeces, shed skin or hair, carcasses and living bodies (Stewart 2019). Using genomic approaches to detect these ‘genetic footprints’ of organism life from environmental samples, for example, water, snow, soil, air or leaf swabs (Stewart 2019), provides an opportunity to increase the detection rate (Goldberg et al. 2016) and increase temporal resolution (Ogram et al. 1987, Pawlowski et al. 2020, Taberlet et al. 2012, Alexander et al. 2023) of biological studies. Notably, eDNA analyses surpass conventional methods in terms of precision and resolution (e.g. Nakagawa et al. (2018), Rourke et al. (2021)). Furthermore, they exhibit greater efficiency in the context of both cost and time when compared to conventional biological monitoring approaches. These factors are crucial in advancing long-term ecological research on a global scale (Evans et al. 2017, Jerde 2019, Larson et al. 2020, Bruel and White 2021).

The application of eDNA for species composition assessments is flexible depending on the organisms of interest and may function as a general or targeted instrument (e.g. Thomas et al. (2019), Beng and Corlett (2020), Burian et al. (2021), Carvalho et al. (2021), Alexander et al. (2023)). Targeted analyses aim to identify the presence or absence of a specific focal species (i.e. “is the targeted species here”), while general analyses aim to identify community composition (i.e. “what species are here”). For both aims, by relying on the genetic footprints and not direct observation, eDNA can be superior to traditional methods by allowing for higher resolution of community assemblage (e.g. Alexander et al. (2022)); the enhanced detection of rare, migratory, elusive and cryptic species (e.g. Barnes et al. (2014), Beng and Corlett (2020), Rourke et al. (2021), Shen et al. (2022)) and the detection of community shifts (e.g. DiBattista et al. (2020)) and small scale spatial assemblage changeovers (e.g. Jeunen et al. (2019)). Environmental sampling, followed by eDNA analyses, therefore provides a powerful approach for obtaining a more comprehensive picture of community composition.

Beyond these benefits, an eDNA-based ecosystem survey realises its full value only when the information collected is shared. For instance, there is undeniable value in sharing eDNA data in accordance with the FAIR (Findable, Accessible, Interoperable and Reusable) principles, thereby enriching our understanding of global biodiversity. International organisations committed to advancing and promoting instruments for the exchange of eDNA data have taken this up, notably in the discussions that commenced at the TDWG conference (Suominen et al. 2023).

Hence, in this paper, we describe two datasets:

  1. A species occurrence dataset derived from the eDNA analysis of surface water from the Houdong River in north-eastern Taiwan and

  2. their associated DNA sequences.

We aimed to determine community composition and diversity changes along a 6.5 km stretch of the river as it passes from headwaters (primary subtropical forest) through residential areas, aquaculture farms and agricultural fields (e.g. rice farm), before reaching the estuary/river mouth. This is the first known freshwater eDNA dataset originating from a representative turbulent river in Taiwan and will be important as baseline data for further studies and environmental monitoring of this ecosystem. Given the limited experience with open DNA data within the Taiwan community, this study collaborated with the Taiwan Biodiversity Information Facility (TaiBIF) to establish a data template for the eDNA open data workflow. TaiBIF is amongst the most active data hosting centres and nodes of the Global Biodiversity Information Facility (GBIF). We expect that, through this collaboration, we can promote the dissemination and use of eDNA and DNA metabarcoding datasets from the Taiwan community.

Sampling methods

Study extent

This was a one-time sampling event of water samples from four sites along the Houdong River in Jiaoxi Township, Yilan County, Taiwan (Fig. 1). The Houdong River is a popular tourist destination running across the Jiaoxi Township. The river system originates east of the Sidu Mountains and flows through primary and secondary forests, agricultural lands (rice), aquaculture farms and developed areas (light industrial and residential) until it eventually drains into the Pacific Ocean. Water samples were used for eDNA analysis and measurement of in-situ water quality.

Figure 1.

The four water sampling locations along the Houdong riverine system in Yilan County, Taiwan.

Sampling description

Water samples for eDNA were collected from near the surface of the river. Before collection, the water containers were rinsed with local water at each sampling site. Approximately 3 litres of water were collected at each site for eDNA analysis, and about 200 ml of the collected water was used for water quality measurements. Water quality was measured on-site at each sampling site using several hand-held probes. Temperature, pH and dissolved oxygen were measured with a multi-parameter meter (Multiline® Multi 3620 IDS, WTW, Weilheim, Germany) equipped with an IDS pH electrode (SenTix 940, WTW) and an optical IDS dissolved oxygen sensor (FDO® 925, WTW). Turbidity was measured with a turbidity meter (TUB-430, EZDO, Taiwan). Salinity was determined with a salinity refractometer (2491 MASTER-S/Milla Salinity Refractometer, Atago, Japan). Three replicate measurements of each parameter were taken. Water samples were transported to the Marine Research Station (MRS, Yilan County, Taiwan) of the Institute of Cellular and Organismic Biology for sample filtration. The water was first filtered through a 75 µm pore size sieve to eliminate larger particles. Afterwards, a 1 litre water sample from each site was filtered through a 0.22 µm filter under vacuum, with the retained material kept on the filter membrane (PC651-0024, GeneDireX, USA). The filter membranes were placed in sterile Petri dishes and stored at -80°C until DNA extraction.

Step description

Wet lab process

DNA was extracted at the Biodiversity Research Center (Academia Sinica, Taipei, Taiwan). Each filter membrane was cut into quarters. Three of the four quarters were used in the study as three experimental replicates and the final quarter was saved as a sample backup. DNA from each quarter membrane piece was extracted using the Presto™ Stool DNA Extraction Kit (STLD100, Geneaid Biotech Ltd., Taiwan) following the manufacturer's instructions (Instruction Manual Ver. 10.21.17). The quality and quantity of the extracted DNA were assessed using a Nanodrop 2000 (Thermo Fisher Scientific Inc., USA) and the Qubit 4 dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific Inc., USA).

The MinibarF1 (5'-TCCACTAATCACAARGATATTGGTAC) and MinibarR1 (5'-GAAAATCATAATGAAGGCATGAGC) primers designed by Meusnier et al. (2008) were used to amplify the 5' region (ca. 120-150 bp) of the mitochondrial Cytochrome c oxidase I (COI) gene. These primers were chosen for their universality, which is recommended for distinguishing the highly diverse DNA in an environmental mixture. We conducted PCR using a one-step single-indexed approach, with a 13 bp tag attached to the MinibarR1 primer. The PCR reaction volume was 16 μl, which included 8 μl KAPA HiFi HotStart ReadyMix (KK2602, Roche Molecular Systems Inc., USA), 5 μl ddH2O, 1 μl of each primer (10 μM) and 1 μl of DNA template. To optimise the protocol, we performed a preliminary PCR using an annealing temperature gradient and found that 54°C gave the best results. The PCR mixture was denatured at 95°C for 15 minutes, followed by 35 cycles of 94°C for 30 seconds, 54°C for 30 seconds and a final elongation at 72°C for 10 minutes.
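The reaction recipe and the degenerate base in the forward primer can be made concrete with a short sketch. The Python snippet below is illustrative only: the function names, the 10% pipetting overage and the choice to add the tagged reverse primer per tube are assumptions, not part of the protocol described above. It scales the stated per-reaction volumes to a batch and expands the IUPAC code R (A or G) in MinibarF1.

```python
from itertools import product

# Per-reaction volumes (µl) as stated in the text; they sum to 16 µl.
PER_REACTION_UL = {
    "KAPA HiFi HotStart ReadyMix": 8.0,
    "ddH2O": 5.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "DNA template": 1.0,
}

# Assumption: the template and the tagged reverse primer differ per sample,
# so they are pipetted into each tube rather than into the shared mix.
PER_TUBE = {"reverse primer", "DNA template"}

# Subset of IUPAC nucleotide codes; R (A or G) is the only degenerate
# base in the MinibarF1 sequence.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}


def master_mix(n_reactions, overage=0.10):
    """Scale the shared reagents to n reactions plus a pipetting overage."""
    factor = n_reactions * (1.0 + overage)
    return {
        reagent: round(volume * factor, 2)
        for reagent, volume in PER_REACTION_UL.items()
        if reagent not in PER_TUBE
    }


def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


if __name__ == "__main__":
    print("per-reaction total (µl):", sum(PER_REACTION_UL.values()))
    print("shared mix for 12 reactions:", master_mix(12))
    for variant in expand_degenerate("TCCACTAATCACAARGATATTGGTAC"):
        print(variant)
```

The single R in MinibarF1 means the synthesised primer is a mixture of two sequences, which is part of what makes it tolerant of template variation.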

The PCR products were checked on a 1.5% agarose gel and quantified with the Invitrogen Qubit 4 fluorometer (Thermo Fisher Scientific Inc., USA). Afterwards, all the PCR products were pooled in one tube for next-generation sequencing. Sequencing was performed on the Illumina NovaSeq 6000 platform with 2 × 150 bp paired-end reads by Genomics Co., Taipei, Taiwan.

Data processing and analysis

The Illumina raw reads were demultiplexed by Genomics Co., Taipei, Taiwan. FastQC (v.0.11.9; https://github.com/s-andrews/FastQC) was used to check quality. The forward and reverse primers of the demultiplexed reads were trimmed using Cutadapt (version 4.2; Martin (2011)). The USEARCH platform (v.11.0.667; Edgar (2010)) was used to verify that the primer sequences had been completely removed from the demultiplexed reads. The "denoise-paired" function in QIIME 2 (v.2023.2.0; Bolyen et al. (2019)), which can automatically trim, filter, denoise and merge reads and remove chimeric reads in one step, was used to create an amplicon sequence variant (ASV) data output from the demultiplexed reads. The maximum expected error of forward and reverse reads was set to 1.0. The reads with a quality score of less than 20 were truncated. The minimum overlap length for merging the forward and reverse reads was set to 16 bp. Other parameters followed the default settings in the "denoise-paired" function. No reads were trimmed or truncated during the ASV creation process. The ASV output was then further clustered through the "cluster-features-de-novo" function provided by QIIME 2. ASVs with more than 97% sequence identity were clustered into one operational taxonomic unit (OTU). The sequences that were shorter than 100 bp in the ASV and OTU results were discarded. Taxonomic assignments were conducted using Constax (v.2.0.18; Liber et al. (2021)) against the MIDORI database (v.GB250; Machida et al. (2017)). The R package phyloseq (v.1.40.0; McMurdie and Holmes (2013)) was used to analyse the preprocessed sequencing data. The pie chart figure was produced with ggplot2 (v.3.4.2; Villanueva and Chen (2019)). The lowest taxon-level annotation of each OTU was extracted to perform a secondary species mapping to the GBIF Backbone Taxonomy (GBIF Secretariat 2011) on 05-12-2023 using the "name_backbone_checklist" function in the R package rgbif (v.3.7.7; Chamberlain et al. (2022), R Core Team (2023)). The map was visualised using the R package ggplot2 (v.3.4.2; Villanueva and Chen (2019)) and annotated using ggspatial (v.1.1.9; https://paleolimbot.github.io/ggspatial/), metR (v.0.14.0; https://github.com/eliocamp/metR) and ggrepel (v.0.9.3; https://CRAN.R-project.org/package=ggrepel).
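To illustrate what the quality settings above mean for a single read, the following Python sketch computes the expected number of errors from Phred scores and applies the two thresholds mentioned (quality 20 and a maximum expected error of 1.0). It is a simplified stand-in written for this description, not the code that QIIME 2 runs internally: the function names, the Phred+33 encoding and the exact truncation rule (cut at the first base scoring below 20) are assumptions.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def truncate_at_low_quality(seq, quals, min_q=20):
    """Cut the read at the first base whose quality score is below min_q."""
    for i, q in enumerate(quals):
        if q < min_q:
            return seq[:i], quals[:i]
    return seq, quals


def expected_errors(quals):
    """Sum of per-base error probabilities, each P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_filter(seq, quality_string, max_ee=1.0, min_q=20):
    """Keep a read if, after truncation, it is non-empty and EE <= max_ee."""
    quals = phred_scores(quality_string)
    seq, quals = truncate_at_low_quality(seq, quals, min_q)
    return len(seq) > 0 and expected_errors(quals) <= max_ee


if __name__ == "__main__":
    # A 150-base read at a uniform Q40 ('I' in Phred+33) passes easily,
    # while the same read at a uniform Q20 ('5') accumulates too many
    # expected errors (150 x 0.01) to stay under the 1.0 threshold.
    print(passes_filter("A" * 150, "I" * 150))
    print(passes_filter("A" * 150, "5" * 150))
```

The example shows why an expected-error threshold is stricter than a per-base cutoff alone: a read can have every base at or above Q20 and still be rejected once the error probabilities are summed over its length.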

Open data and code

Two datasets were associated with this study: DNA sequence data and occurrence data (see Data resources). We converted the occurrence data into Darwin Core Archive standard (Darwin Core Task Group 2009) and validated the datasheet using the GBIF Data Validator (Global Biodiversity Information Facility 2017). We then published the dataset containing one occurrence core (i.e. foundational part of the dataset with information about each occurrence) and one DNA-derived data extension (Hoh 2023) using the Integrated Publishing Toolkit (IPT) of GBIF installed under the Taiwan Biodiversity Information Facility (TaiBIF). We have included three supplementary files to help describe the dataset. They are the attributes of the sampling event (Suppl. material 1), the water quality measurements (Suppl. material 2) and the relationship of each technical sample to the sampling event (Suppl. material 3). These files were attached as the current GBIF data model schema does not support event core matching with a DNA-derived data extension. All source code used in the project can be found in the project's GitHub repository.

Geographic coverage

Description

We selected four sites along the Houdong River (猴洞坑), Jiaoxi Township, Yilan County, Taiwan (Table 1) for water sample collections and in-situ water quality measurements: Upstream waterfall (WF), downstream river (FR), estuary (ES) and river mouth (RM). These four sites spanned a river length of 6.5 km.

Table 1.

Coordinates of the four sampling sites.

Station	Latitude (°N)	Longitude (°E)
猴洞坑瀑布 | Upstream waterfall (WF)	24.843580	121.781830
猴洞溪 | Downstream river (FR)	24.835094	121.799932
下埔排水線 | Estuary (ES)	24.835900	121.818660
竹安出海口 | River mouth (RM)	24.840520	121.826640

Coordinates

24.824 and 24.871 Latitude; 121.768 and 121.846 Longitude.
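As a quick consistency check on the coordinates, the sketch below verifies that the four sites in Table 1 fall inside the stated bounding box and computes straight-line (great-circle) distances between consecutive sites. This is an illustration rather than part of the study's workflow; the function names and the spherical-Earth radius are assumptions. Straight-line distances only give a lower bound on the length measured along the river channel.

```python
from math import asin, cos, radians, sin, sqrt

# Site coordinates (latitude, longitude in decimal degrees) from Table 1.
SITES = {
    "WF": (24.843580, 121.781830),
    "FR": (24.835094, 121.799932),
    "ES": (24.835900, 121.818660),
    "RM": (24.840520, 121.826640),
}

# Bounding box from the Coordinates entry: (min, max).
BBOX_LAT = (24.824, 24.871)
BBOX_LON = (121.768, 121.846)

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation (assumption)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def in_bbox(point):
    """True if a (lat, lon) point lies inside the dataset bounding box."""
    lat, lon = point
    return (BBOX_LAT[0] <= lat <= BBOX_LAT[1]
            and BBOX_LON[0] <= lon <= BBOX_LON[1])


if __name__ == "__main__":
    order = ["WF", "FR", "ES", "RM"]  # upstream to downstream
    for up, down in zip(order, order[1:]):
        d = haversine_km(SITES[up], SITES[down])
        print(f"{up} -> {down}: {d:.2f} km")
    print("all sites inside bounding box:",
          all(in_bbox(p) for p in SITES.values()))
```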

Taxonomic coverage

Description

We detected eukaryotic organisms in the water samples using the COI mitochondrial gene. A total of 2,734 OTUs were identified and 432 of the OTUs were assigned to at least the kingdom level using the MIDORI database (v.GB250; Machida et al. (2017); Fig. 2). On GBIF, this dataset (Hoh 2023) consists of a total of 6,297 occurrences, of which 22% (1,356 occurrences; last accessed 05-12-2023) have a taxon match in the GBIF Backbone Taxonomy; the remaining occurrences are assigned as incertae sedis (i.e. taxon unknown). The species occurrence dataset was standardised and is presented as a GBIF-annotated Darwin Core Archive (see Data resources), grouped by sampling event (i.e. site, identifiable via the eventID column).
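The occurrence counts quoted above can be cross-checked with simple arithmetic. The short Python sketch below uses only the figures given in the text (6,297 total occurrences and 1,356 with a backbone match); the variable names are illustrative.

```python
# Figures quoted in the text for the GBIF dataset (accessed 05-12-2023).
total_occurrences = 6297
matched_occurrences = 1356

# Occurrences without a backbone match are reported as incertae sedis.
incertae_sedis = total_occurrences - matched_occurrences

# Share of occurrences with a taxon match, as a percentage.
matched_share = 100 * matched_occurrences / total_occurrences

if __name__ == "__main__":
    print("incertae sedis:", incertae_sedis)
    print(f"matched share: {matched_share:.1f}%")
```

The difference equals the 4,941 incertae sedis occurrences given in the abstract, and the matched share rounds to the 22% stated above.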

Figure 2.

The taxonomic ranking of 432 classified operational taxonomic units (OTUs). The colours represent different kingdoms.

Taxa included

Rank Scientific Name
kingdom Animalia
kingdom Chromista
kingdom Fungi
kingdom Plantae
kingdom Protozoa
phylum Amoebozoa
phylum Bryophyta
phylum Cryptophyta
phylum Nemertea
phylum Sulcozoa
phylum Annelida
phylum Bryozoa
phylum Gastrotricha
phylum Ochrophyta
phylum Tracheophyta
phylum Arthropoda
phylum Chaetognatha
phylum Glomeromycota
phylum Oomycota
phylum Zygomycota
phylum Ascomycota
phylum Chlorophyta
phylum Haptophyta
phylum Platyhelminthes
phylum Basidiomycota
phylum Chordata
phylum Mollusca
phylum Rhodophyta
phylum Blastocladiomycota
phylum Cnidaria
phylum Mycetozoa
phylum Rotifera

Temporal coverage

Notes

This was a one-time sampling of water samples and corresponding water quality parameters on 28-04-2022.

Usage licence

Usage licence

Other

IP rights notes

Datasets produced by the current work are licensed under a Creative Commons Attribution (CC-BY) 4.0 Licence.

Data resources

Data package title

eDNA along Houdong riverine zonation in Taiwan

Number of data sets

2

Data set 1.

Data set name

eDNA along riverine zonation of Houdong River, Yilan, Taiwan [Project ID: PRJEB60905]

Data format

Genomic Standard Consortium MIxS water package

Download URL

https://www.ebi.ac.uk/ena/browser/view/PRJEB60905

Data format version

mixs6.1.0

Description

DNA sequence data have been deposited on ENA at EMBL-EBI under accession number PRJEB60905 following the Genomic Standard Consortium MIxS standard (Yilmaz et al. 2011). Below are described the nine default columns under the 'Read Files' section on the project (or dataset) page, which can also be obtained from downloading the TSV report on the Project page.

Data set 1.
Column label Column description
study_accession The project accession number created by ENA for this submission (PRJEB60905 for this dataset).
sample_accession The sample accession number created by ENA for this submission. A total of 12 Biosamples (comprised of three replicates from each of the four sampling sites) were registered. Each accession from the link https://www.ebi.ac.uk/ena/browser/view/[sample_accession] describes basic information about the sample following the MIxS standard.
experiment_accession The experiment accession number created by ENA for this submission. A total of 12 Experiments were registered. Each accession from the link https://www.ebi.ac.uk/ena/browser/view/[experiment_accession] describes sequencing instrument and library-associated information.
run_accession The run accession number created by ENA for this submission. A total of 12 Runs were registered. Each accession from the link https://www.ebi.ac.uk/ena/browser/view/[run_accession] contains read and base count information.
tax_id Taxon ID in ENA. Since this is a metabarcoding study, all entries are 256318, which corresponds to 'metagenome'.
scientific_name Since this is a metabarcoding study, scientific name is not applicable and hence all entries are 'metagenome'.
fastq_ftp The FTP link to download DNA reads obtained from each Run. This is the ENA Archived Generated File as described in the ENA documentation. The file is a gzip-compressed FASTQ file. Two FTP links are provided, separating the forward and reverse reads of each paired experiment as [run_accession]_1.fastq.gz and [run_accession]_2.fastq.gz, respectively.
submitted_ftp The FTP link to download the DNA reads as uploaded to ENA by the submitter, before the automated curation that produces the fastq_ftp files.
bam_ftp The FTP link to download the BAM file of each Run.
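As a sketch of how the fastq_ftp column might be consumed, the Python snippet below reads a TSV report and pairs the forward and reverse links for each run using the [run_accession]_1.fastq.gz and [run_accession]_2.fastq.gz naming described above. Several details are assumptions made for illustration: that the two links in a cell are separated by a semicolon, the function name, and the example row, whose accession and host are placeholders rather than real records from PRJEB60905.

```python
import csv
import io


def paired_fastq_links(tsv_text):
    """Map each run_accession to a (forward, reverse) tuple of FASTQ links.

    Assumes the two links in the fastq_ftp cell are separated by ';'.
    A missing link is returned as None.
    """
    pairs = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        run = row["run_accession"]
        links = [u for u in row["fastq_ftp"].split(";") if u]
        forward = next((u for u in links if u.endswith(f"{run}_1.fastq.gz")), None)
        reverse = next((u for u in links if u.endswith(f"{run}_2.fastq.gz")), None)
        pairs[run] = (forward, reverse)
    return pairs


# Hypothetical report content: the accession and host are placeholders.
EXAMPLE_TSV = (
    "run_accession\tfastq_ftp\n"
    "ERR0000001\thost.example/ERR0000001_1.fastq.gz;"
    "host.example/ERR0000001_2.fastq.gz\n"
)

if __name__ == "__main__":
    for run, (fwd, rev) in paired_fastq_links(EXAMPLE_TSV).items():
        print(run, fwd, rev)
```

Matching on the file suffix rather than on position in the cell keeps the pairing correct even if the order of the two links differs between rows.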

Data set 2.

Data set name

eDNA along Houdong riverine zonation in Taiwan

Data format

Darwin Core Archive

Download URL

https://www.gbif.org/dataset/2615342d-7349-4e75-ae34-cda6cb403e2e ; https://ipt.taibif.tw/archive.do?r=houdongkeng_water_edna

Data format version

2021-07-15

Description

There are two links in the Download URL. The first links to the download page of the GBIF-annotated Darwin Core Archive and the second links to the Source Darwin Core Archive from the TaiBIF IPT. The second link is provided because the DNA-derived data extension associated with the occurrence datasheet is not available through the GBIF-annotated Darwin Core Archive download option, although the extension is included in the source archive available either through the GBIF webpage for the dataset or directly from the TaiBIF IPT. Downloading from both links gives GZ-compressed files containing the occurrence core and DNA-derived data extension files in TXT format. The table below describes a total of 76 data fields from both the occurrence core and the DNA-derived data extension, sorted alphabetically. The data field descriptions are written as listed in the List of Darwin Core terms (accessed April 2023; Darwin Core Task Group (2009)), modified where needed to fit the current study context. The occurrence core datasheet can also be downloaded via GBIF API-based tools such as rgbif (Chamberlain et al. 2022) for further analyses.

Data set 2.
Column label Column description
ampliconSize The length of the amplicon in basepairs.
amplificationReactionVolume PCR reaction volume
amplificationReactionVolumeUnit Unit used for PCR reaction volume.
basisOfRecord The specific nature of the data record.
class The full scientific name of the class in which the taxon is classified.
concentration Concentration of DNA (weight ng/volume µl).
concentrationUnit Unit used for concentration measurement.
continent The name of the continent in which the Location occurs.
coordinateUncertaintyInMeters The horizontal distance (in metres) from the given decimalLatitude and decimalLongitude describing the smallest circle containing the whole of the Location.
country The name of the country in which the Location occurs.
countryCode The standard code for the country in which the Location occurs.
county The full, unabbreviated name of the smaller administrative region in which the Location occurs.
datasetName The name identifying the dataset from which the record was derived.
dateIdentified The date on which the subject was determined as representing the Taxon.
day The integer day of the month on which the Event occurred.
decimalLatitude The geographic latitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic centre of a Location.
decimalLongitude The geographic longitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic centre of a Location.
DNA_sequence The DNA sequence.
env_broad_scale The major environmental system from which the sample came. ENVO's biome subclasses determined in https://ontobee.org/ontology/ENVO.
env_local_scale The entity in the sample's local vicinity, at a smaller spatial grain than the entry in env_broad_scale. ENVO's biome subclasses determined in https://ontobee.org/ontology/ENVO.
env_medium Environmental material that immediately surrounded the sample prior to sampling, using subclasses of ENVO’s environmental material class determined in https://ontobee.org/ontology/ENVO.
eventDate The date-time when the event was recorded (i.e. water sampling time).
eventID An identifier associated with a sampling event.
eventTime The time when the event was recorded (i.e. water sampling time).
experimental_factor The variable aspects of an experiment design to describe the experiment. The ontology terms determined from Experimental Factor Ontology (EFO).
family The full scientific name of the family in which the taxon is classified.
genus The full scientific name of the genus in which the taxon is classified.
geodeticDatum The ellipsoid, geodetic datum or spatial reference system (SRS) upon which the geographic coordinates given in decimalLatitude and decimalLongitude are based.
habitat A category or description of the habitat in which the Event occurred. Using subclasses of ENVO’s environmental material class determined in https://ontobee.org/ontology/ENVO.
higherClassification Taxa names terminating at the rank immediately superior to the taxon referenced in the taxon record. The current metabarcoding study targeted eukaryotic organisms and hence the entry is "Eukaryota".
identificationReferences Publication reference used in the Identification.
identificationRemarks Comments or notes about the Identification.
kingdom The full scientific name of the kingdom in which the taxon is classified.
lib_layout Specify whether to expect single, paired or other configuration of reads.
license A legal document giving official permission to do something with the resource.
locationID An identifier for the set of location information as listed in Table 1.
materialSampleID An identifier for the MaterialSample.
methodDeterminationConcentrationAndRatios Method used for concentration measurement.
month The integer month in which the Event occurred.
nucl_acid_amp A link to electronic resource that describes the PCR amplification of specific nucleic acids.
nucl_acid_ext A link to electronic resource that describes the material separation to recover the nucleic acid fraction from a sample.
occurrenceID An identifier for the Occurrence. Format: [samp_name]:OTU[number].
occurrenceStatus A statement about the presence or absence of a Taxon at a Location.
order The full scientific name of the order in which the taxon is classified.
organismQuantity Number of reads of this OTU in the sample.
organismQuantityType DNA sequence reads.
otu_class_appr Cutoffs and approach used when clustering the "species-level" OTUs.
otu_db Reference database for "species-level" OTUs.
otu_seq_comp_appr Tool and thresholds used to compare sequences when computing "species-level" OTUs.
pcr_cond Description of reaction conditions and components of PCR.
pcr_primer_forward Forward PCR primer that was used to amplify the sequence of the targeted gene.
pcr_primer_name_forward Name of the forward PCR primer used.
pcr_primer_name_reverse Name of the reverse PCR primer used.
pcr_primer_reference Reference for the PCR primers that were used to amplify the sequence of the targeted gene.
pcr_primer_reverse Reverse PCR primer that was used to amplify the sequence of the targeted gene.
pcr_primers PCR primers that were used to amplify the sequence of the targeted gene, locus or subfragment.
phylum The full scientific name of the phylum or division in which the taxon is classified.
preparations Preparation methods for the sample.
project_name Name of the project within which the sequencing was organised.
rightsHolder The organisation owning or managing rights over the resource.
samp_mat_process The processing applied to the sample after retrieving the sample from environment.
samp_name Unique sample name for each sample. Starts with abbreviation of sampling site as listed in Table 1 and ends with sample replicate number.
samp_size Amount of sample (volume) that was collected.
sampleSizeUnit DNA sequence reads.
sampleSizeValue Total number of reads in the sample.
samplingProtocol The names of, references to, or descriptions of the methods or protocols used during an Event.
scientificName The full scientific name in the lowest level taxonomic rank that can be determined. The content in this field was obtained from secondary mapping of the lowest taxonomic rank in verbatimIdentification to the GBIF Backbone Taxonomy (see Methodology).
seq_meth Sequencing method used.
size_frac Filtering pore size used in sample preparation.
target_gene Targeted gene or locus name for marker gene studies.
tax_ident The phylogenetic marker(s) used to assign an organism name.
taxonRank The taxonomic rank of the most specific name in the scientificName.
type The nature or genre of the resource.
verbatimIdentification The taxonomic identification obtained from the reference database given in otu_db.
verbatimLocality The original textual description of the place.
year The four-digit year in which the Event occurred, according to the Common Era Calendar.
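Two of the fields above lend themselves to a small worked example: occurrenceID, whose format is [samp_name]:OTU[number], and the pair organismQuantity and sampleSizeValue, from which a relative read abundance can be derived. The Python sketch below is illustrative only. The function names, the assumption that the OTU number consists of digits, and the example values ("WF1", 150 reads, 1,500,000 reads) are hypothetical and not taken from the dataset.

```python
def parse_occurrence_id(occurrence_id):
    """Split an occurrenceID of the form [samp_name]:OTU[number].

    Returns (samp_name, otu_number). Assumes the OTU number is numeric.
    """
    samp_name, sep, otu_number = occurrence_id.partition(":OTU")
    if not sep or not samp_name or not otu_number.isdigit():
        raise ValueError(f"unexpected occurrenceID format: {occurrence_id!r}")
    return samp_name, int(otu_number)


def relative_abundance(organism_quantity, sample_size_value):
    """Fraction of a sample's reads assigned to one OTU.

    organism_quantity: reads of this OTU in the sample (organismQuantity).
    sample_size_value: total reads in the sample (sampleSizeValue).
    """
    if sample_size_value <= 0:
        raise ValueError("sampleSizeValue must be positive")
    return organism_quantity / sample_size_value


if __name__ == "__main__":
    # Hypothetical record: the sample name and read counts are made up.
    samp_name, otu = parse_occurrence_id("WF1:OTU12")
    print(samp_name, otu)
    print(relative_abundance(150, 1_500_000))
```

Normalising by sampleSizeValue makes read counts comparable between samples that were sequenced to different depths, although read abundance is not a direct measure of organism abundance.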

Supplementary Material

Supplementary material 1

Sampling event data

Daphne Z. Hoh

Data type

event

Brief description

A TSV datasheet in Darwin Core Archive format describing the four sampling events along the river.

File: oo_947548.tsv

Supplementary material 2

Water quality measurements in each sampling event

Daphne Z. Hoh

Data type

measurement or fact

Brief description

A TSV datasheet in Darwin Core Archive format describing the water quality measurements in the four sampling events along the river.

File: oo_947549.tsv

Supplementary material 3

Sample relationship

Daphne Z. Hoh

Data type

resource relationship

Brief description

A TSV datasheet in Darwin Core Archive format describing the relationship of each technical sample to the four sampling events.

File: oo_947550.tsv

Acknowledgements

This research was produced by the students in the Taiwan International Graduate Program (TIGP) Signature Course-Ecology Masterclass @ Taiwan (EMT course). Support was provided by the TIGP (Academia Sinica, Taipei, Taiwan), Biodiversity Research Center (Academia Sinica, Taipei, Taiwan), Marine Research Station (MRS) of the Institute of Cellular and Organismic Biology (Academia Sinica, Yilan, Taiwan) and Taiwan Biodiversity Information Facility (TaiBIF). We would like to thank Tzi-Yuan Wang for the lecture and teaching assistance. We would also like to thank Kiran Kumar Eripogu, Alyzza Calayag and Yue Rong Tan for assisting with the sample collection and DNA extraction process. We are grateful to all the members who arranged and participated in the EMT course.

Contributor Information

Min-Chen Wang, Email: mcwinlab@gmail.com.

Daphne Z. Hoh, Email: daphnehohzhiwei@gmail.com.

Author contributions

Conceptualisation: Min-Chen Wang, Ling Chiu and Yung-Che Tseng; Sample collection: Min-Chen Wang, Ling Chiu, Mark Angelo C. Bucay and Chung-Hsin Huang; Methodology and formal analysis: Chieh-Ping Lin, Chung-Hsin Huang, Cheng-Wei Chen, Daphne Z. Hoh and Min-Chen Wang; Investigation: Min-Chen Wang, Ling Chiu, Trevor Padgett, Daphne Z. Hoh, Mark Angelo C. Bucay, Chieh-Ping Lin, Chung-Hsin Huang, Cheng-Wei Chen, Zong-Yu Shen and John Wang; Visualisation: Chieh-Ping Lin and Daphne Z. Hoh; Writing – original draft preparation: Trevor Padgett, Daphne Z. Hoh, Mark Angelo C. Bucay, Chieh-Ping Lin, Chung-Hsin Huang and Cheng-Wei Chen; Writing – review and editing: Min-Chen Wang, Yung-Che Tseng, Jr-Kai Yu and John Wang; Supervision: Min-Chen Wang, Jr-Kai Yu and John Wang; Resources: Yung-Che Tseng, Jr-Kai Yu and John Wang; Funding acquisition: Jr-Kai Yu and John Wang. All authors commented on and approved the manuscript.

References

  1. Alexander Jason B., Marnane Michael J., Elsdon Travis S., Bunce Michael, Songploy Se, Sitaworawet Paweena, Harvey Euan S. Complementary molecular and visual sampling of fish on oil and gas platforms provides superior biodiversity characterisation. Marine Environmental Research. 2022;179. doi: 10.1016/j.marenvres.2022.105692.
  2. Alexander Jason B., Marnane Michael J., McDonald Justin I., Lukehurst Sherralee S., Elsdon Travis S., Simpson Tiffany, Hinz Shawn, Bunce Michael, Harvey Euan S. Comparing environmental DNA collection methods for sampling community composition on marine infrastructure. Estuarine, Coastal and Shelf Science. 2023;283. doi: 10.1016/j.ecss.2023.108283.
  3. Barnes Matthew A., Turner Cameron R., Jerde Christopher L., Renshaw Mark A., Chadderton W. Lindsay, Lodge David M. Environmental Conditions Influence eDNA Persistence in Aquatic Systems. Environmental Science & Technology. 2014;48(3):1819–1827. doi: 10.1021/es404734p.
  4. Beng Kingsly C., Corlett Richard T. Applications of environmental DNA (eDNA) in ecology and conservation: opportunities, challenges and prospects. Biodiversity and Conservation. 2020;29(7):2089–2121. doi: 10.1007/s10531-020-01980-0.
  5. Bista Iliana, Carvalho Gary R., Walsh Kerry, Seymour Mathew, Hajibabaei Mehrdad, Lallias Delphine, Christmas Martin, Creer Simon. Annual time-series analysis of aqueous eDNA reveals ecologically relevant dynamics of lake ecosystem biodiversity. Nature Communications. 2017;8(1). doi: 10.1038/ncomms14087.
  6. Bolyen Evan, Rideout Jai Ram, Dillon Matthew R, Bokulich Nicholas A, Abnet Christian C, Al-Ghalith Gabriel A, Alexander Harriet, Alm Eric J, Arumugam Manimozhiyan, Asnicar Francesco, Bai Yang, Bisanz Jordan E, Bittinger Kyle, Brejnrod Asker, Brislawn Colin J, Brown C Titus, Callahan Benjamin J, Caraballo-Rodríguez Andrés Mauricio, Chase John, Cope Emily K, Da Silva Ricardo, Diener Christian, Dorrestein Pieter C, Douglas Gavin M, Durall Daniel M, Duvallet Claire, Edwardson Christian F, Ernst Madeleine, Estaki Mehrbod, Fouquier Jennifer, Gauglitz Julia M, Gibbons Sean M, Gibson Deanna L, Gonzalez Antonio, Gorlick Kestrel, Guo Jiarong, Hillmann Benjamin, Holmes Susan, Holste Hannes, Huttenhower Curtis, Huttley Gavin A, Janssen Stefan, Jarmusch Alan K, Jiang Lingjing, Kaehler Benjamin D, Kang Kyo Bin, Keefe Christopher R, Keim Paul, Kelley Scott T, Knights Dan, Koester Irina, Kosciolek Tomasz, Kreps Jorden, Langille Morgan G I, Lee Joslynn, Ley Ruth, Liu Yong-Xin, Loftfield Erikka, Lozupone Catherine, Maher Massoud, Marotz Clarisse, Martin Bryan D, McDonald Daniel, McIver Lauren J, Melnik Alexey V, Metcalf Jessica L, Morgan Sydney C, Morton Jamie T, Naimey Ahmad Turan, Navas-Molina Jose A, Nothias Louis Felix, Orchanian Stephanie B, Pearson Talima, Peoples Samuel L, Petras Daniel, Preuss Mary Lai, Pruesse Elmar, Rasmussen Lasse Buur, Rivers Adam, Robeson Michael S, Rosenthal Patrick, Segata Nicola, Shaffer Michael, Shiffer Arron, Sinha Rashmi, Song Se Jin, Spear John R, Swafford Austin D, Thompson Luke R, Torres Pedro J, Trinh Pauline, Tripathi Anupriya, Turnbaugh Peter J, Ul-Hasan Sabah, van der Hooft Justin J J, Vargas Fernando, Vázquez-Baeza Yoshiki, Vogtmann Emily, von Hippel Max, Walters William, Wan Yunhu, Wang Mingxun, Warren Jonathan, Weber Kyle C, Williamson Charles H D, Willis Amy D, Xu Zhenjiang Zech, Zaneveld Jesse R, Zhang Yilong, Zhu Qiyun, Knight Rob, Caporaso J Gregory. 
Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature biotechnology. 2019;37(9):1091. doi: 10.1038/s41587-019-0252-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Bruel Rosalie, White Easton R. Sampling requirements and approaches to detect ecosystem shifts. Ecological Indicators. 2021;121 doi: 10.1016/j.ecolind.2020.107096. [DOI] [Google Scholar]
  8. Burian Alfred, Mauvisseau Quentin, Bulling Mark, Domisch Sami, Qian Song, Sweet Michael. Improving the reliability of eDNA data interpretation. Molecular Ecology Resources. 2021;21(5):1422–1433. doi: 10.1111/1755-0998.13367. [DOI] [PubMed] [Google Scholar]
  9. Carvalho Carolina S., de Oliveira Marina Elisa, Rodriguez‐Castro Karen Giselle, Saranholi Bruno H., Galetti Pedro M. Efficiency of eDNA and iDNA in assessing vertebrate diversity and its abundance. Molecular Ecology Resources. 2021;22(4):1262–1273. doi: 10.1111/1755-0998.13543. [DOI] [PubMed] [Google Scholar]
  10. Chamberlain Scott, Oldoni Damiano, Geffert Laurens, Desmet Peter, Barve Vijay, Ram Karthik, Blissett Matt, Waller John, McGlinn Dan, Ooms Jeroen, Ye Steven, Oksanen Jari, Marwick Ben, John, Sumner Michael, Sriram ropensci/rgbif. [2022-02-09T00:00:00+02:00];2022 doi: 10.5281/zenodo.6023735. v3.7.0. [DOI]
  11. Group Darwin Core Task. Biodiversity Information Standards (TDWG); 2009. Darwin Core. [Google Scholar]
  12. DiBattista Joseph D., Reimer James D., Stat Michael, Masucci Giovanni D., Biondi Piera, De Brauwer Maarten, Wilkinson Shaun P., Chariton Anthony A., Bunce Michael. Environmental DNA can act as a biodiversity barometer of anthropogenic pressures in coastal ecosystems. Scientific Reports. 2020;10(1) doi: 10.1038/s41598-020-64858-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Doi Hideyuki, Inui Ryutei, Matsuoka Shunsuke, Akamatsu Yoshihisa, Goto Masuji, Kono Takanori. Estimation of biodiversity metrics by environmental DNA metabarcoding compared with visual and capture surveys of river fish communities. Freshwater Biology. 2021;66(7):1257–1266. doi: 10.1111/fwb.13714. [DOI] [Google Scholar]
  14. Edgar Robert C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26(19):2460–2461. doi: 10.1093/bioinformatics/btq461. [DOI] [PubMed] [Google Scholar]
  15. Evans Nathan T., Shirey Patrick D., Wieringa Jamin G., Mahon Andrew R., Lamberti Gary A. Comparative Cost and Effort of Fish Distribution Detection via Environmental DNA Analysis and Electrofishing. Fisheries. 2017;42(2):90–99. doi: 10.1080/03632415.2017.1276329. [DOI] [Google Scholar]
  16. Secretariat GBIF. GBIF Backbone Taxonomy. Checklist dataset. 2011 doi: 10.15468/39omei. Accessed via GBIF.org on 2023-11-14. [DOI]
  17. Facility Global Biodiversity Information. GBIF data validator. https://www.gbif.org/tools/data-validator. 2017 Accessed on 2023-05-15.
  18. Goldberg Caren S., Turner Cameron R., Deiner Kristy, Klymus Katy E., Thomsen Philip Francis, Murphy Melanie A., Spear Stephen F., McKee Anna, Oyler‐McCance Sara J., Cornman Robert Scott, Laramie Matthew B., Mahon Andrew R., Lance Richard F., Pilliod David S., Strickler Katherine M., Waits Lisette P., Fremier Alexander K., Takahara Teruhiko, Herder Jelger E., Taberlet Pierre. Critical considerations for the application of environmental DNA methods to detect aquatic species. Methods in Ecology and Evolution. 2016;7(11):1299–1307. doi: 10.1111/2041-210x.12595. [DOI] [Google Scholar]
  19. Gregorič Matjaž, Kutnjak Denis, Bačnik Katarina, Gostinčar Cene, Pecman Anja, Ravnikar Maja, Kuntner Matjaž. Spider webs as eDNA samplers: Biodiversity assessment across the tree of life. Molecular ecology resources. 2022;22(7):2534–2545. doi: 10.1111/1755-0998.13629. [DOI] [PubMed] [Google Scholar]
  20. Hoh Daphne. Taiwan Biodiversity Information Facility; 2023. eDNA along Houdong riverine zonation in Taiwan. Version 1.7. [DOI] [Google Scholar]
  21. Jerde Christopher L. Can we manage fisheries with the inherent uncertainty from eDNA? Journal of Fish Biology. 2019;98(2):341–353. doi: 10.1111/jfb.14218. [DOI] [PubMed] [Google Scholar]
  22. Jeunen Gert‐Jan, Knapp Michael, Spencer Hamish G., Lamare Miles D., Taylor Helen R., Stat Michael, Bunce Michael, Gemmell Neil J. Environmental DNA (eDNA) metabarcoding reveals strong discrimination among diverse marine habitats connected by water movement. Molecular Ecology Resources. 2019;19(2):426–438. doi: 10.1111/1755-0998.12982. [DOI] [PubMed] [Google Scholar]
  23. Larson Eric R, Graham Brittney M, Achury Rafael, Coon Jaime J, Daniels Melissa K, Gambrell Daniel K, Jonasen Kacie L, King Gregory D, LaRacuente Nicholas, Perrin‐Stowe Tolulope IN, Reed Emily M, Rice Christopher J, Ruzi Selina SA, Thairu Margaret W, Wilson Jared C, Suarez Andrew V. From <scp>eDNA</scp> to citizen science: emerging tools for the early detection of invasive species. Frontiers in Ecology and the Environment. 2020;18(4):194–202. doi: 10.1002/fee.2162. [DOI] [Google Scholar]
  24. Liber Julian A, Bonito Gregory, Benucci Gian Maria Niccolò. CONSTAX2: improved taxonomic classification of environmental DNA markers. Bioinformatics. 2021;37(21):3941–3943. doi: 10.1093/bioinformatics/btab347. [DOI] [PubMed] [Google Scholar]
  25. Machida Ryuji J., Leray Matthieu, Ho Shian-Lei, Knowlton Nancy. Metazoan mitochondrial gene sequence reference datasets for taxonomic assignment of environmental samples. Scientific Data. 2017;4(1) doi: 10.1038/sdata.2017.27. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Martin Marcel. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17(1) doi: 10.14806/ej.17.1.200. [DOI] [Google Scholar]
  27. McMurdie Paul J., Holmes Susan. phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data. PLoS ONE. 2013;8(4) doi: 10.1371/journal.pone.0061217. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Meusnier Isabelle, Singer Gregory AC, Landry Jean-François, Hickey Donal A, Hebert Paul DN, Hajibabaei Mehrdad. A universal DNA mini-barcode for biodiversity analysis. BMC Genomics. 2008;9(1) doi: 10.1186/1471-2164-9-214. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Nakagawa Hikaru, Yamamoto Satoshi, Sato Yukuto, Sado Tetsuya, Minamoto Toshifumi, Miya Masaki. Comparing local‐ and regional‐scale estimations of the diversity of stream fish using <scp>eDNA</scp> metabarcoding and conventional observation methods. Freshwater Biology. 2018;63(6):569–580. doi: 10.1111/fwb.13094. [DOI] [Google Scholar]
  30. Ogram Andrew, Sayler Gary S., Barkay Tamar. The extraction and purification of microbial DNA from sediments. Journal of Microbiological Methods. 1987;7:57–66. doi: 10.1016/0167-7012(87)90025-x. [DOI] [Google Scholar]
  31. Pawlowski Jan, Apothéloz‐Perret‐Gentil Laure, Altermatt Florian. Environmental DNA: What's behind the term? Clarifying the terminology and recommendations for its future use in biomonitoring. Molecular Ecology. 2020;29(22):4258–4264. doi: 10.1111/mec.15643. [DOI] [PubMed] [Google Scholar]
  32. Team R Core. R Foundation for Statistical Computing, Vienna, Austria; 2023. R: A language and environment for statistical computing. Version 4.2.2. [Google Scholar]
  33. Rourke Meaghan L., Fowler Ashley M., Hughes Julian M., Broadhurst Matt K., DiBattista Joseph D., Fielder Stewart, Wilkes Walburn Jackson, Furlan Elise M. Environmental DNA (eDNA) as a tool for assessing fish biomass: A review of approaches and future considerations for resource surveys. Environmental DNA. 2021;4(1):9–33. doi: 10.1002/edn3.185. [DOI] [Google Scholar]
  34. Shen Mei, Xiao Nengwen, Zhao Ziyi, Guo Ningning, Luo Zunlan, Sun Guang, Li Junsheng. eDNA metabarcoding as a promising conservation tool to monitor fish diversity in Beijing water systems compared with ground cages. Scientific Reports. 2022;12(1) doi: 10.1038/s41598-022-15488-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Stewart Kathryn A. Understanding the effects of biotic and abiotic factors on sources of aquatic environmental DNA. Biodiversity and Conservation. 2019;28(5):983–1001. doi: 10.1007/s10531-019-01709-8. [DOI] [Google Scholar]
  36. Suominen Saara, Frøslev Tobias Guldberg, Johansson Veronika, Endresen Dag, Schigel Dmitry, Obst Matthias., editors. Exchange and reuse of environmental DNA and metabarcoding data; Biodiversity Information Science and Standards; The Biodiversity Information Standards Conference (TDWG); Hobart, Tasmania, Australia. 2023. [Google Scholar]
  37. Taberlet Pierre, Coissac Eric, Hajibabaei Mehrdad, Rieseberg Loren H. Environmental DNA. Molecular Ecology. 2012;21(8):1789–1793. doi: 10.1111/j.1365-294x.2012.05542.x. [DOI] [PubMed] [Google Scholar]
  38. Thomas Austen C., Tank Samantha, Nguyen Phong L., Ponce Jake, Sinnesael Mieke, Goldberg Caren S. A system for rapid eDNA detection of aquatic invasive species. Environmental DNA. 2019;2(3):261–270. doi: 10.1002/edn3.25. [DOI] [Google Scholar]
  39. Villanueva Randle Aaron M., Chen Zhuo Job. ggplot2: Elegant Graphics for Data Analysis (2nd ed.) Measurement: Interdisciplinary Research and Perspectives. 2019;17(3):160–167. doi: 10.1080/15366367.2019.1565254. [DOI] [Google Scholar]
  40. West Katrina M., Stat Michael, Harvey Euan S., Skepper Craig L., DiBattista Joseph D., Richards Zoe T., Travers Michael J., Newman Stephen J., Bunce Michael. eDNA metabarcoding survey reveals fine‐scale coral reef community variation across a remote, tropical island ecosystem. Molecular Ecology. 2020;29(6):1069–1086. doi: 10.1111/mec.15382. [DOI] [PubMed] [Google Scholar]
  41. Yilmaz Pelin, Kottmann Renzo, Field Dawn, Knight Rob, Cole James R, Amaral-Zettler Linda, Gilbert Jack A, Karsch-Mizrachi Ilene, Johnston Anjanette, Cochrane Guy, Vaughan Robert, Hunter Christopher, Park Joonhong, Morrison Norman, Rocca-Serra Philippe, Sterk Peter, Arumugam Manimozhiyan, Bailey Mark, Baumgartner Laura, Birren Bruce W, Blaser Martin J, Bonazzi Vivien, Booth Tim, Bork Peer, Bushman Frederic D, Buttigieg Pier Luigi, Chain Patrick S G, Charlson Emily, Costello Elizabeth K, Huot-Creasy Heather, Dawyndt Peter, DeSantis Todd, Fierer Noah, Fuhrman Jed A, Gallery Rachel E, Gevers Dirk, Gibbs Richard A, San Gil Inigo, Gonzalez Antonio, Gordon Jeffrey I, Guralnick Robert, Hankeln Wolfgang, Highlander Sarah, Hugenholtz Philip, Jansson Janet, Kau Andrew L, Kelley Scott T, Kennedy Jerry, Knights Dan, Koren Omry, Kuczynski Justin, Kyrpides Nikos, Larsen Robert, Lauber Christian L, Legg Teresa, Ley Ruth E, Lozupone Catherine A, Ludwig Wolfgang, Lyons Donna, Maguire Eamonn, Methé Barbara A, Meyer Folker, Muegge Brian, Nakielny Sara, Nelson Karen E, Nemergut Diana, Neufeld Josh D, Newbold Lindsay K, Oliver Anna E, Pace Norman R, Palanisamy Giriprakash, Peplies Jörg, Petrosino Joseph, Proctor Lita, Pruesse Elmar, Quast Christian, Raes Jeroen, Ratnasingham Sujeevan, Ravel Jacques, Relman David A, Assunta-Sansone Susanna, Schloss Patrick D, Schriml Lynn, Sinha Rohini, Smith Michelle I, Sodergren Erica, Spo Aymé, Stombaugh Jesse, Tiedje James M, Ward Doyle V, Weinstock George M, Wendel Doug, White Owen, Whiteley Andrew, Wilke Andreas, Wortman Jennifer R, Yatsunenko Tanya, Glöckner Frank Oliver. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nature biotechnology. 2011;29(5):415–20. doi: 10.1038/nbt.1823. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplementary material 1

Sampling event data

Daphne Z. Hoh

Data type

event

Brief description

A TSV datasheet in Darwin Core Archive format describing the four sampling events along the river.

File: oo_947548.tsv

Supplementary material 2

Water quality measurements in each sampling event

Daphne Z. Hoh

Data type

measurement or fact

Brief description

A TSV datasheet in Darwin Core Archive format describing the water quality measurements in the four sampling events along the river.

File: oo_947549.tsv

bdj-12-e116921-s002.tsv (11.3KB, tsv)

Supplementary material 3

Sample relationship

Daphne Z. Hoh

Data type

resource relationship

Brief description

A TSV datasheet in Darwin Core Archive format describing the relationship of each technical sample to the four sampling events.

File: oo_947550.tsv


Articles from Biodiversity Data Journal are provided here courtesy of Pensoft Publishers
