Abstract
MitoFish, MitoAnnotator, and MiFish Pipeline are, respectively, a comprehensive database of fish mitochondrial genomes (mitogenomes), accurate annotation software for fish mitogenomes, and a web platform for metabarcoding analysis of fish mitochondrial environmental DNA (eDNA). The MitoFish Suite currently receives over 48,000 visits worldwide every year; however, the performance and usefulness of the online platforms can still be improved. Here, we present essential updates to these platforms, including enriched reference data sets, an enhanced search function, substantially faster genome annotation, faster eDNA analysis with denoising of sequencing errors, and a multisample comparative analysis function. These updates have made our platforms more intuitive, effective, and reliable. The updated platforms are freely available at http://mitofish.aori.u-tokyo.ac.jp/.
Keywords: fish, mitochondrial genome, genome annotation, database, web service, environmental DNA
Introduction
Fish mitochondrial genomes (mitogenomes) provide fundamental resources for studying fish evolution, diversity, and genetics, as well as for conservation biology, fisheries, and food science (Miya and Nishida 2015). Notably, environmental DNA (eDNA) analysis of fish mitochondrial DNA has now been established as a powerful, efficient, and noninvasive technology to detect and monitor fish species based on the polymerase chain reaction (PCR) of environmental water samples (Yao et al. 2022). PCR primers used for eDNA analysis can target either species-specific or universally conserved sequences. In the latter case, universal primers and parallel sequencing allow eDNA metabarcoding to enumerate the fish species present in a given environment. Although several universal primers have been developed for fish eDNA metabarcoding, MiFish primers are the most widely used because they target and identify diverse fish species (Miya et al. 2015; Xiong et al. 2022).
MitoFish, MitoAnnotator, and MiFish Pipeline are, respectively, a comprehensive database of fish mitogenomes, accurate annotation software for fish mitogenomes, and a web platform for MiFish-primer metabarcoding analysis of fish mitochondrial eDNA (Iwasaki et al. 2013; Sato et al. 2018). The MitoFish Suite has been maintained and developed for more than 10 years and is frequently accessed by researchers worldwide, with over 48,000 annual visits and over 700 citations as of January 2023.
We have made major updates to MitoFish, MitoAnnotator, and MiFish Pipeline to enrich the reference data sets, enhance the search function, achieve faster genome annotation, enable faster eDNA analysis with denoising of sequencing errors, and add a multisample comparative analysis function. The updated platforms will continue to contribute to basic fish biology, as well as studies of fish resources and conservation, and are freely available at http://mitofish.aori.u-tokyo.ac.jp/.
Updates on MitoFish: Data Increase and Enhanced Search
When we published the original MitoFish paper in 2013, the database contained fish mitogenomes of 1,184 species as well as partial mitochondrial DNA sequences of 17,958 species. By 2022, these numbers had increased to 3,492 and 41,653, respectively. MitoFish has become a fundamental database for a variety of fish studies (fig. 1A). At the order level, MitoFish now includes mitogenomes from 56 orders of fish taxonomy (fig. 1B). The data increase has mainly been due to the widespread adoption of ultrafast sequencing platforms. Because improvements in sequencing platforms are expected to continue (Slatko et al. 2018), we envision that MitoFish will further expand, as there are more than 36,000 fish species globally (Fricke et al. 2022).
Fig. 1.
Updated MitoFish and MitoAnnotator. (A) Screenshot of MitoFish. (B) Taxonomic coverage of fish mitogenome data. (C) MitoFish search by synonyms and common names. (D) Speedup of MitoAnnotator with the updated algorithm flowchart on tRNA annotation. (E) Summary of differences in the results of the latest MitoAnnotator using ten fish mitochondrial genomes. Identical: seven annotations yielded identical results after the original and the updated versions of MitoAnnotator were used. Improved: three annotations yielded improved results upon using the updated version (see supplementary table S1, Supplementary Material online for details).
To allow more diverse users to easily access data in MitoFish, the latest version enables searches using synonyms and common names of fish species (fig. 1C). Many fish species have more than one synonym (e.g., 69.3% of the species in MitoFish v3.84), and some of these names are widely used. Examples include gray mullet fish (Planiliza affinis vs. Liza affinis) and keeled mullet fish (Chelon carinatus vs. Liza carinata). In the updated version, we fetched all synonyms and common names from the NCBI taxonomy database (Schoch et al. 2020) so that users can search by any of these names. In addition, synonyms and common names are shown in an extra collapsible row of the search result table (fig. 1C).
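The name-based search described above can be pictured as a lookup table that maps every known name to a single display name. The following is a minimal sketch of that idea, not the MitoFish implementation: the record layout, the field names (`display_name`, `synonyms`, `common_names`), and the choice of which scientific name serves as the display name are all assumptions made for illustration, and the two toy records are built from the examples in the text.

```python
from collections import defaultdict

# Toy records based on the two examples in the text. Which scientific name is
# treated as the display name is an arbitrary choice for this illustration.
RECORDS = [
    {"display_name": "Planiliza affinis",
     "synonyms": ["Liza affinis"],
     "common_names": ["gray mullet"]},
    {"display_name": "Chelon carinatus",
     "synonyms": ["Liza carinata"],
     "common_names": ["keeled mullet"]},
]


def build_name_index(records):
    """Map each lowercased name (display, synonym, common) to display names."""
    index = defaultdict(set)
    for rec in records:
        names = [rec["display_name"], *rec["synonyms"], *rec["common_names"]]
        for name in names:
            index[name.lower()].add(rec["display_name"])
    return index


def search(index, query):
    """Return the sorted display names matching a case-insensitive query."""
    return sorted(index.get(query.strip().lower(), set()))


INDEX = build_name_index(RECORDS)
print(search(INDEX, "Liza affinis"))
print(search(INDEX, "keeled mullet"))
```

Storing a set per key keeps the sketch correct when one name is shared by several records, which can happen with common names.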
We have also added a button to easily batch-download sequences of specific genes and the whole mitogenomes of multiple species (supplementary fig. S1, Supplementary Material online). The visual appearance of MitoFish, including a colored representation of the taxonomic levels, has been improved (fig. 1B).
Updates on MitoAnnotator: Substantially Faster Mitogenome Annotation
We have substantially improved the running speed of fish mitogenome annotation using MitoAnnotator. The original version of MitoAnnotator required an average of more than 3 min to annotate a single mitochondrial genome of approximately 16 kb (using a 2.1 GHz Intel Xeon Gold CPU with 48 cores). This time has been shortened more than 5-fold to approximately 30 s. The most time-consuming part of the original version was transfer RNA (tRNA) annotation by MiTFi (Juhling et al. 2012), which required approximately 160 s per mitogenome. Replacing MiTFi with tRNAscan-SE v2.0 (Chan et al. 2021), which has an updated search engine and supports multithreading, now completes tRNA annotation in approximately 5 s (fig. 1D). The BLASTX used in the coding sequence annotation step was also updated to version 2.9.0+ (Camacho et al. 2009).
We applied MitoAnnotator to ten fish mitochondrial genomes with known atypical structures (Iwasaki et al. 2013). The latest version of MitoAnnotator produced the same annotation results as the original version for seven of the genomes. For the other three genomes, the latest version of MitoAnnotator successfully identified more tRNA genes and made the start codon positions consistent with RefSeq entries (fig. 1E and supplementary table S1, Supplementary Material online).
We also improved the visual appearance of MitoAnnotator to show its progress in real time. An annotation file for direct submission to NCBI is now provided.
Updates on MiFish Pipeline: Sequencing Error Denoising, Updated Reference Database, Substantially Faster Analysis, and Comparative Analysis Function
In eDNA metabarcoding using parallel DNA sequencing, low-abundance DNA sequences can be observed because of either minor species (true positives) or artificial sequencing errors (false positives). Denoising is a technique used to discriminate between true and false positives based on the fact that erroneous sequences should be accompanied by abundant and almost identical sequences (Rosen et al. 2012). Although the original version of the MiFish Pipeline was not equipped with a denoising function, the latest MiFish Pipeline employs unoise3 (Edgar 2016) to discriminate meaningful low-abundance sequences from artificial errors in MiFish-primer metabarcoding (Miya et al. 2020). This denoising step replaces the mapping step in the original version (fig. 2A). Using sample data from a river in Okinawa, Japan (Sato et al. 2018), which contained 6,714 pairs of Illumina sequencing reads (NCBI SRA Accession ID: DRR126155), the denoising step highlighted two additional low-abundance species, which were estimated to be true positives by careful investigation using BLAT (supplementary table S2, Supplementary Material online).
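The principle behind denoising can be illustrated with a small sketch. It follows the abundance-skew rule described for UNOISE2 (Edgar 2016), in which a sequence with d differences from a more abundant sequence is treated as an error if its abundance ratio is at most 1 / 2^(alpha * d + 1). This is a simplified illustration, not the unoise3 implementation: it assumes equal-length sequences compared by Hamming distance, omits chimera filtering, and the function names, the `max_diffs` cutoff, and the toy read counts are all hypothetical.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def max_skew(d, alpha=2.0):
    """Largest abundance ratio at which a d-difference variant counts as an error."""
    return 1.0 / 2 ** (alpha * d + 1)


def denoise(abundances, alpha=2.0, min_size=8, max_diffs=3):
    """Split unique sequences into accepted centroids and inferred errors.

    abundances maps sequence -> read count. Returns (centroids, errors), where
    errors maps each error sequence to the centroid it was assigned to.
    Sequences that are neither errors nor at least min_size reads are dropped.
    """
    centroids = {}
    errors = {}
    # Process from most to least abundant so parents are seen before variants.
    ordered = sorted(abundances.items(), key=lambda kv: (-kv[1], kv[0]))
    for seq, size in ordered:
        parent = None
        for cen, cen_size in centroids.items():
            if len(cen) != len(seq):
                continue  # simplification: only compare equal-length sequences
            d = hamming(seq, cen)
            if 0 < d <= max_diffs and size / cen_size <= max_skew(d, alpha):
                parent = cen
                break
        if parent is not None:
            errors[seq] = parent
        elif size >= min_size:
            centroids[seq] = size
    return centroids, errors


# Hypothetical toy counts: one likely error and one rare but distinct sequence.
READS = {
    "ACGTACGTAC": 1000,
    "ACGTACGTAT": 10,   # 1 difference from the top sequence, very low ratio
    "ACGTTCGTAC": 300,  # 1 difference, but too abundant to be an error
    "TTGTACGTGC": 12,   # rare, yet 3 or more differences from every centroid
}
CENTROIDS, ERRORS = denoise(READS)
print(CENTROIDS)
print(ERRORS)
```

In this toy input the rare sequence with 12 reads is retained because it is too divergent from the abundant sequences to be explained as a sequencing error, which mirrors how a low-abundance species can survive denoising.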
Fig. 2.
Updated MiFish Pipeline. (A) Speedup of MiFish Pipeline with the updated flowchart. Third-party tools are marked below in brackets. (B) Flowchart of the updating of the MiFish Pipeline reference database. (C) Screenshots of sample group assignment, reference database selection, data upload, and result pages for a comparison of multiple samples. A tab-style navigation bar allows users to switch results among multiple samples and see diversity metrics among different groups.
We also updated the MiFish Pipeline reference database. Compared with the previous version constructed in 2019 (MiFish DB Ver. 1.00), the latest database contains the MiFish-primer regions of an additional 2,004 species, comprising 9,569 species in total (MiFish DB Ver. 3.85). To build the updated reference database, we used the complete and partial mitochondrial sequence data in MitoFish as templates and conducted in silico PCR to obtain MiFish amplicons. We also retrieved Sanger sequences of the MiFish amplicons from the GenBank database through a keyword search. These two data sets were combined, and redundant sequences from the same species and taxonomically misidentified sequences were removed after thorough manual investigation (fig. 2B). Although the MiFish Pipeline reference database will be updated periodically along with MitoFish, users can now select previous database versions to reproduce past analysis results. Using the same Okinawa River data, the updated reference database corrected the taxonomic identification results for 11 of the 35 haplotypes owing to redundancy reduction or newly released sequences (supplementary table S3, Supplementary Material online).
Despite the addition of the denoising step and the increase in the reference database size, the latest MiFish Pipeline runs substantially faster than its original version, owing to updated workflows and third-party tools (fig. 2A). With the same Okinawa River data, the running time decreased from approximately 40 s to approximately 5 s.
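In silico PCR, as used above to extract amplicon regions from template sequences, amounts to locating the forward primer and the reverse complement of the reverse primer in a template and returning the sequence between them. The sketch below illustrates this under stated assumptions: it tolerates only substitutions (no indels), searches a single strand, and uses short hypothetical primers and a made-up template rather than the actual MiFish primers or the tool used by the authors.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_site(template, primer, max_mismatch, start=0):
    """Return the first index >= start where primer binds, or -1 if none.

    A site counts as bound when it has at most max_mismatch substitutions.
    """
    k = len(primer)
    for i in range(start, len(template) - k + 1):
        mismatches = sum(a != b for a, b in zip(template[i:i + k], primer))
        if mismatches <= max_mismatch:
            return i
    return -1


def in_silico_pcr(template, fwd, rev, max_mismatch=1):
    """Return the region between the primer sites (primers excluded), or None."""
    f = find_site(template, fwd, max_mismatch)
    if f < 0:
        return None
    insert_start = f + len(fwd)
    # The reverse primer anneals to the opposite strand, so search the given
    # strand for its reverse complement downstream of the forward primer.
    r = find_site(template, revcomp(rev), max_mismatch, start=insert_start)
    if r < 0:
        return None
    return template[insert_start:r]


# Hypothetical primers and template, for illustration only.
FWD = "GTCGGT"
REV = "CCAGTT"
TEMPLATE = "TTTT" + "GTCGGT" + "ACACACGTAT" + "AACTGG" + "CCCC"
print(in_silico_pcr(TEMPLATE, FWD, REV))
```

Allowing a small number of mismatches matters for reference building because primer-binding sites are conserved but not identical across species; the tolerance used here is an arbitrary example value.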
Another feature of the latest MiFish Pipeline is the ease of comparison of multiple samples, including alpha and beta diversities (fig. 2C). Metrics that include Chao1, Shannon diversity, Simpson's diversity, and Bray–Curtis dissimilarity are automatically calculated using scikit-bio v0.5.6 (http://scikit-bio.org) to make downstream analyses easier. As a sample data set for comparing multiple samples, we analyzed MiFish eDNA metabarcoding data from 12 samples from 4 Korean rivers (Alam et al. 2020) (supplementary table S4, Supplementary Material online). The identified fish lists and alpha diversity metrics among the different rivers were highly consistent with those reported in the original study (supplementary table S5 and fig. S2, Supplementary Material online). The analysis required approximately 12 min compared with approximately 27 min in the original version.
We have also improved the visual appearance of the MiFish Pipeline to allow users to easily check the phylogenetic relationships of the identified species, haplotype information, fish photos, confidence values, and diversity statistics (fig. 2C). An informative sheet that summarizes ecological and conservation data of the detected fish species is now added to the exported Excel file to let users quickly interpret the results and develop subsequent research plans (supplementary fig. S3, Supplementary Material online).
Finally, a stand-alone version of MiFish Pipeline has been released at https://github.com/billzt/MiFish, which can be used with other eDNA primer sets and reference databases (supplementary table S6, Supplementary Material online).
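For readers unfamiliar with these metrics, the following dependency-free sketch shows how each can be computed from per-species read counts. It is not the scikit-bio code used by the pipeline, and its conventions are assumptions that may differ from the pipeline's settings: Shannon diversity uses the natural logarithm, Simpson's diversity is reported as 1 minus the sum of squared proportions, and Chao1 uses the bias-corrected form S_obs + F1(F1 - 1) / (2(F2 + 1)). The two count vectors are made-up examples.

```python
import math


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over species with nonzero counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def simpson(counts):
    """Simpson's diversity, 1 - sum(p^2)."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from singletons and doubletons."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


# Made-up read counts for the same six species in two samples.
SITE_A = [10, 5, 1, 1, 2, 0]
SITE_B = [0, 5, 3, 0, 2, 10]

print(shannon(SITE_A), simpson(SITE_A), chao1(SITE_A))
print(bray_curtis(SITE_A, SITE_B))
```

Shannon, Simpson, and Chao1 are alpha diversity measures computed within one sample, whereas Bray–Curtis is a beta diversity measure computed between a pair of samples, which is why it takes two vectors.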
Conclusion
Since the initial launch in 2013, MitoFish, MitoAnnotator, and MiFish Pipeline have continued to grow to support various fish studies. Sequencing technologies are expected to develop further, and our platforms will adapt to these changes to keep contributing to fish-related research.
Acknowledgments
The authors thank Tsukasa Fukunaga, Jiwei Yang, and Fei Xia for their contribution to this project and the editor and two anonymous reviewers for their valuable comments. This study was supported by MEXT Advancement of Technologies for Utilizing Big Data of Marine Life JPMXD1521474594, JSPS KAKENHI 22H04925, and JST CREST JPMJCR19S2.
Contributor Information
Tao Zhu, Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan.
Yukuto Sato, Research Laboratory Center, Faculty of Medicine, The University of the Ryukyus, Nishihara, Okinawa, Japan.
Tetsuya Sado, Department of Collection Management, Natural History Museum and Institute, Chiba, Chiba, Japan.
Masaki Miya, Department of Collection Management, Natural History Museum and Institute, Chiba, Chiba, Japan.
Wataru Iwasaki, Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan; Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Bunkyo-ku, Tokyo, Japan; Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan; Atmosphere and Ocean Research Institute, The University of Tokyo, Kashiwa, Chiba, Japan; Institute for Quantitative Biosciences, The University of Tokyo, Bunkyo-ku, Tokyo, Japan; Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Bunkyo-ku, Tokyo, Japan.
Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.
References
- Alam MJ, Kim NK, Andriyono S, Choi HK, Lee JH, Kim HW. 2020. Assessment of fish biodiversity in four Korean rivers using environmental DNA metabarcoding. PeerJ 8:e9508.
- Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. 2009. BLAST+: architecture and applications. BMC Bioinformatics 10:421.
- Chan PP, Lin BY, Mak AJ, Lowe TM. 2021. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 49:9077–9096.
- Edgar RC. 2016. UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing. bioRxiv:081257.
- Fricke R, Eschmeyer W, Van der Laan R. 2022. Catalog of fishes: genera, species, references. San Francisco (CA): The California Academy of Sciences. Electronic version.
- Iwasaki W, Fukunaga T, Isagozawa R, Yamada K, Maeda Y, Satoh TP, Sado T, Mabuchi K, Takeshima H, Miya M, et al. 2013. MitoFish and MitoAnnotator: a mitochondrial genome database of fish with an accurate and automatic annotation pipeline. Mol Biol Evol. 30:2531–2540.
- Juhling F, Putz J, Bernt M, Donath A, Middendorf M, Florentz C, Stadler PF. 2012. Improved systematic tRNA gene annotation allows new insights into the evolution of mitochondrial tRNA structures and into the mechanisms of mitochondrial genome rearrangements. Nucleic Acids Res. 40:2833–2845.
- Miya M, Gotoh RO, Sado T. 2020. MiFish metabarcoding: a high-throughput approach for simultaneous detection of multiple fish species from environmental DNA and other samples. Fish Sci. 86:939–970.
- Miya M, Nishida M. 2015. The mitogenomic contributions to molecular phylogenetics and evolution of fishes: a 15-year retrospect. Ichthyol Res. 62:29–71.
- Miya M, Sato Y, Fukunaga T, Sado T, Poulsen JY, Sato K, Minamoto T, Yamamoto S, Yamanaka H, Araki H, et al. 2015. MiFish, a set of universal PCR primers for metabarcoding environmental DNA from fishes: detection of more than 230 subtropical marine species. R Soc Open Sci. 2:150088.
- Rosen MJ, Callahan BJ, Fisher DS, Holmes SP. 2012. Denoising PCR-amplified metagenome data. BMC Bioinformatics 13:283.
- Sato Y, Miya M, Fukunaga T, Sado T, Iwasaki W. 2018. MitoFish and MiFish pipeline: a mitochondrial genome database of fish with an analysis pipeline for environmental DNA metabarcoding. Mol Biol Evol. 35:1553–1555.
- Schoch CL, Ciufo S, Domrachev M, Hotton CL, Kannan S, Khovanskaya R, Leipe D, McVeigh R, O’Neill K, Robbertse B, et al. 2020. NCBI Taxonomy: a comprehensive update on curation, resources and tools. Database (Oxford) 2020:baaa062.
- Slatko BE, Gardner AF, Ausubel FM. 2018. Overview of next-generation sequencing technologies. Curr Protoc Mol Biol. 122:e59.
- Xiong F, Shu L, Gan X, Zeng H, He S, Peng Z. 2022. Methodology for fish biodiversity monitoring with environmental DNA metabarcoding: the primers, databases and bioinformatic pipelines. Water Biol Secur. 1:100007.
- Yao M, Zhang S, Lu Q, Chen X, Zhang SY, Kong Y, Zhao J. 2022. Fishing for fish environmental DNA: ecological applications, methodological considerations, surveying designs, and ways forward. Mol Ecol. 31:5132–5164.