Abstract
MycoBank, a registration system for fungi established in 2004 to capture all taxonomic novelties, acts as a coordination hub between repositories such as Index Fungorum and Fungal Names. Since January 2013, registration of fungal names has been a mandatory requirement for valid publication under the International Code of Nomenclature for algae, fungi and plants (ICN). This review explains the database innovations implemented over the past few years, and discusses new features such as advanced queries, registration of typification events (MBT numbers for lecto-, epi- and neotypes), the multi-lingual database interface, the nomenclature discussion forum, the annotation system, and web services with links to third parties. MycoBank has also introduced novel identification services, linking DNA sequence data to numerous related databases to enable intelligent search queries. Although MycoBank fills an important void for taxon registration, challenges remain for the future: to improve links between taxonomic names and DNA data, and to introduce a formal system for naming fungi known only from DNA sequence data. To further improve the quality of MycoBank data, remote access using Citrix software will now allow registered mycologists to act as MycoBank curators.
Keywords: MycoBank, EUBOLD identification services, Forum, Fungi, International Nucleotide Sequence Database Collaboration, Next Generation Sequencing, Nomenclature, Registration, Repositories, Typification
INTRODUCTION
MycoBank was officially launched in 2004 as an online repository with the primary aim of registering all published fungal taxonomic novelties (including new names and combinations), and making them available to the mycological community in an open access database (Crous et al. 2004). One of the major constraints experienced by mycologists was that many newly published fungal names were either inaccessible to researchers in developing countries or simply overlooked, because they were published in obscure sources. Owing to the large number of names published each year in a range of publications, MycoBank curators were not always able to verify and include all of them in the database. To address this issue, we approached the editors of a large number of journals that publish taxonomic novelties, and suggested that, as good practice, they request authors to deposit nomenclatural data, descriptions and illustrations in MycoBank. This positions MycoBank as the phenotypic equivalent of GenBank, the main database for genotypic data. Authors would receive a unique identifier linking the registration to the name (equivalent to a GenBank accession number for sequence data), and would simultaneously be assured that no homonyms were being published, as the search engine would inform them if the name was already occupied (www.mycobank.org). Registration was seen as a two-step process: upon acceptance of the article, authors deposit their taxonomic novelties and cite the MB numbers in the protolog; upon publication, they notify MycoBank so that the taxonomic novelty can be released to the community with date, volume and page numbers.
So popular was the system with mycologists that proposals to make the deposit of key elements mandatory for the valid publication of new scientific names of fungi, at all ranks, were prepared (Hawksworth et al. 2010) and debated at the 9th International Mycological Congress in Edinburgh in 2010. These were put before the Nomenclature Section of the 18th International Botanical Congress in Melbourne in July 2011, and incorporated into the International Code of Nomenclature for algae, fungi, and plants (McNeill et al. 2012). MycoBank can do much more than meet the basic requirements of the Code, but the only mandatory elements are: the name; rank; authorship; bibliographic details of the anticipated place of publication; a diagnosis (or description) for names of novel taxa (which from 1 January 2012 can be in English or Latin); full bibliographic details of the basionym or replaced name for new combinations, names at new ranks, or replacement names; and, for names of novel taxa, details of the name-bearing type and the institution or other place in which it is permanently preserved.
Although MycoBank was initially set up by CBS-KNAW staff in close collaboration with Index Fungorum, in 2009 it was decided that ownership of the MycoBank system, database and website should be transferred to the International Mycological Association (IMA). In 2010 a new version of the MycoBank website was launched, based on the BioloMICS software (Robert et al. 2011). The advantage of this software is that the structure of the database can evolve according to the needs identified by the end-users and curators of MycoBank. The BioloMICS-based version of MycoBank has been regularly updated and improved since then. In this article, we present the major developments of the past four years, as well as some usage statistics of the MycoBank system. In the last section, we briefly describe how we see the database evolving in the coming years.
NEW DEVELOPMENTS
Infrastructure
The latest version of the MycoBank software was released in April 2012. It allows curators to create new tables and fields as their own needs and those of end-users evolve, without the intervention of software developers. This is essential as new types of data and the associated analytical tools are incorporated into the system.
In order to ensure a high level of security and availability of the MycoBank website, the whole MycoBank system (software, databases and website) has been transferred to a professional datacentre where power supply, Internet connections and backups are guaranteed.
In order to keep MycoBank users aware of the latest news and improvements related to the database and software, a “News” section was created that can be accessed at www.mycobank.org/BioloMICSNews.aspx.
Since the MycoBank website offers a large number of features, “Frequently Asked Questions” and “Help” sections are now available, providing answers and videos for commonly asked questions (both are available under the Help button on the main menu).
Queries
The new software interface was designed to make queries more flexible. Basic (www.mycobank.org/Biolomics.aspx?Table=Mycobank) and advanced queries (www.mycobank.org/Biolomics.aspx?Table=Mycobank_Advanced) are now possible. Advanced users can build complex Boolean queries by combining the operators AND, OR and NOT with brackets. This makes it possible to ask a question such as “find all Candida species published after 1990 by Kurtzman and not by Fell”, which is expressed as: (Taxon contains Candida) AND (Publication date is after 1990) AND (Authors contains Kurtzman) AND NOT (Authors contains Fell). Query results are displayed as lists that can be exported to MS-Excel spreadsheets or MS-Word documents.
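The way such a Boolean query combines individual conditions can be illustrated with a minimal Python sketch that composes predicates over records. The record fields (`taxon`, `year`, `authors`), the helper names and the example records are hypothetical placeholders chosen for illustration; this is not the MycoBank query engine or any MycoBank API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Record:
    """Hypothetical, simplified name record (not the MycoBank schema)."""
    taxon: str
    year: int
    authors: str


Predicate = Callable[[Record], bool]


def contains(field: str, text: str) -> Predicate:
    # Case-insensitive substring test, mirroring "<field> contains <text>".
    return lambda r: text.lower() in str(getattr(r, field)).lower()


def after(year: int) -> Predicate:
    # "Publication date is after <year>", read here as strictly later.
    return lambda r: r.year > year


def all_of(*preds: Predicate) -> Predicate:  # AND
    return lambda r: all(p(r) for p in preds)


def any_of(*preds: Predicate) -> Predicate:  # OR
    return lambda r: any(p(r) for p in preds)


def negate(pred: Predicate) -> Predicate:  # NOT
    return lambda r: not pred(r)


def run_query(records: Iterable[Record], query: Predicate) -> List[Record]:
    return [r for r in records if query(r)]


# (Taxon contains Candida) AND (after 1990) AND (Authors contains Kurtzman)
# AND NOT (Authors contains Fell)
candida_query = all_of(
    contains("taxon", "Candida"),
    after(1990),
    contains("authors", "Kurtzman"),
    negate(contains("authors", "Fell")),
)

records = [  # invented placeholder records
    Record("Candida sp. A", 1995, "Kurtzman"),
    Record("Candida sp. B", 1998, "Kurtzman & Fell"),
    Record("Candida sp. C", 1985, "Kurtzman"),
    Record("Pichia sp. D", 2000, "Kurtzman"),
]
print([r.taxon for r in run_query(records, candida_query)])  # ['Candida sp. A']
```

Brackets in a query correspond to nested calls: for example, `all_of(contains("taxon", "Candida"), any_of(contains("authors", "Kurtzman"), contains("authors", "Fell")))` would express “(Taxon contains Candida) AND ((Authors contains Kurtzman) OR (Authors contains Fell))”.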
In addition to the main taxonomic database, we have also added a bibliographic query system (www.mycobank.org/Biolomics.aspx?Table=Mycobank%20literature) as well as a thesaurus of terms commonly used in mycology (www.mycobank.org/Biolomics.aspx?Table=Thesaurus).
Name registration
The interface for registering the scientific names of new taxa and new names has also been redesigned and simplified, with fewer required steps than in the previous version (Fig. 1). Popup windows are presented to depositors to facilitate data entry, for example links to existing bibliographic records, country names, or higher taxonomic ranks.
The Code recommends (McNeill et al. 2012: Rec. 42A.1) that registration numbers be obtained only “after a work is accepted for publication”. This is a wise precaution, as it sometimes becomes necessary to change the chosen names during review.
Type registration
During the nomenclature discussion sessions at the 9th International Mycological Congress in Edinburgh (IMC9), the wish was expressed that MycoBank should start capturing typification events, as these are difficult to trace in the literature. Furthermore, without a clear overview of typification events, different authors might easily designate lectotypes, epitypes or neotypes for the same name, which would be unfortunate and could lead to the same name being applied in different senses. We strongly support this suggestion, and anticipate that proposals to make the registration of later typifications mandatory will be made to the Nomenclature Section of the 19th International Botanical Congress in 2017.
In the summer of 2013, a new typification registration system was therefore added to MycoBank. Mycologists can now log in to the system and choose to register a type specimen for an existing taxon (this new option appears directly below the normal “register new name” option, which delivers MB numbers). Mycologists can thus obtain “MBT” numbers (MycoBank Typification numbers) for the designation of lectotypes, epitypes, and neotypes. However, if a novel combination or new name is linked to a typification event, a normal MB number suffices, as the mycologist can indicate directly during the registration of the combination or new name that the typification event is based on an epitype, neotype, or holotype specimen.
MBT numbers are most appropriately cited in typification sections of papers as follows:
Type. Italy: Padua, on withering leaves of Hedera helix, July 1875, L. Ranger (L – lectotype designated here, MBT12345); Sardegna, Cologne near Oleina, leaf litter of H. helix, 31 Aug. 1970, I. Hulk (CBS H-16992 – epitype designated here, MBT176244; culture ex-epitype CBS 937.70).
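Because MBT numbers are cited in a fixed position within typification sections, they can be harvested from text mechanically, for example when checking a manuscript or indexing literature. A minimal Python sketch, assuming that MBT and MB numbers are written as the prefix followed by digits with at most one optional space; the regular expressions are our assumption, not a MycoBank specification.

```python
from __future__ import annotations

import re

# Assumed citation forms: "MBT12345" / "MBT 12345" and "MB123456" / "MB 123456".
MBT_PATTERN = re.compile(r"\bMBT\s?(\d+)\b")
MB_PATTERN = re.compile(r"\bMB\s?(\d+)\b")  # cannot match "MBT...": "T" is not a digit


def extract_mbt_numbers(text: str) -> list[int]:
    """Return typification (MBT) numbers in order of appearance."""
    return [int(n) for n in MBT_PATTERN.findall(text)]


def extract_mb_numbers(text: str) -> list[int]:
    """Return name registration (MB) numbers in order of appearance."""
    return [int(n) for n in MB_PATTERN.findall(text)]


citation = (
    "Type. Italy: Padua, on withering leaves of Hedera helix, July 1875, "
    "L. Ranger (L – lectotype designated here, MBT12345); Sardegna, Cologne "
    "near Oleina, leaf litter of H. helix, 31 Aug. 1970, I. Hulk (CBS H-16992 "
    "– epitype designated here, MBT176244; culture ex-epitype CBS 937.70)."
)
print(extract_mbt_numbers(citation))  # [12345, 176244]
```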
Multi-lingual system
A major complaint of some users of the earlier version of MycoBank was that it was available only in English, effectively excluding people who have difficulty with the language. The software was therefore modified to allow multiple languages to be displayed, and we asked native Arabic, Chinese, Dutch, French, German, Portuguese, Spanish and Thai-speaking mycologists to translate the standard text (Fig. 2). Additional languages will be added as required by the community; Japanese and Russian will be added in 2014.
Forum
Since the Amsterdam Declaration on Fungal Nomenclature (Hawksworth et al. 2011) and the introduction of the new Code (McNeill et al. 2012), mycologists face several new challenges in reaching consensus on “one fungus one name” nomenclature (Hawksworth 2011, Wingfield et al. 2012). Two years ago, when these discussions began, we felt there was a need for a forum to exchange ideas about dual nomenclature and about which name should be retained. The Forum option was therefore created, and a large number of topics and discussions have since been initiated (Fig. 3).
Annotations and remote curation
Like many working databases, MycoBank is incomplete and contains errors and omissions that require continuous updating by curators. However, it is virtually impossible for the small team of MycoBank curators to sustain such a huge task. The annotation system was therefore created to allow users (after registering, which is open to anyone) to post comments, suggest corrections or propose new data associated with already deposited taxa. Curators can then accept or reject the comments, or simply leave them pending (Fig. 4). It is not, however, the role of curators to impose a particular taxonomy, as differences in scientific opinion have to be accommodated.
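The moderation workflow described above amounts to a small state machine: every annotation starts as pending, and a curator may accept it, reject it, or leave it pending. A minimal Python sketch under that reading; the class and field names, and the rule that a decided annotation cannot be reviewed again, are hypothetical illustrations rather than MycoBank's implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Annotation:
    """Hypothetical user annotation attached to a deposited taxon record."""
    record_id: str
    user: str
    comment: str
    status: Status = Status.PENDING


def review(annotation: Annotation, decision: Status) -> Annotation:
    """Apply a curator decision; PENDING means 'leave it for later'."""
    if annotation.status is not Status.PENDING:
        # Assumption for this sketch: accepted/rejected annotations are final.
        raise ValueError("only pending annotations can be reviewed")
    if decision is not Status.PENDING:
        annotation.status = decision
    return annotation


note = Annotation("rec-001", "user42", "Basionym year should be checked")
review(note, Status.PENDING)   # curator defers the decision
review(note, Status.ACCEPTED)  # curator accepts the suggestion
print(note.status)
```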
The same reasons that led us to include an open annotation system also led to a call for additional curators. To achieve this goal, from April 2014 remote curation using the Citrix XenApp software will allow volunteer specialists (approved as curators by the MycoBank Advisory Board) to manage sections of the database within their expertise. The first workshop for new curators will be given at CBS in Utrecht on Saturday 26 April 2014, with a further session planned at the 10th International Mycological Congress (IMC10) in Bangkok.
Web services and central system for registration of fungal names
Many users and websites are interested in obtaining data in batches and incorporating them into their own databases. Since MycoBank is a public database used by many other repositories, it was important to provide web services that can be consumed by third-party machines. We therefore created several dynamic web services that can easily be changed or adapted if needed.
About one year ago, an additional registration website for fungal taxon names was established in China (Fungal Names, FN, at http://fungalinfo.im.ac.cn/fungalname/fungalname.html), in collaboration with the long-established Index Fungorum website in the UK (IF, at www.indexfungorum.org), which also offers online registration. The International Commission on the Taxonomy of Fungi (ICTF) and the Nomenclature Committee for Fungi (NCF) suggested that the three registrars should synchronize their data, and MycoBank was asked to create a central web service that would provide unique numbers to the three systems and exchange data among them. The system was released in June 2013, and IF and FN are currently implementing the changes needed in their own systems so that all three are fully synchronized.
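The core requirement of the central service, that no identifier can ever be issued twice whichever registrar asks for it, can be pictured as a single allocator shared by all three systems. The Python sketch below assumes an in-memory counter guarded by a lock; the class name, registrar labels and starting number are hypothetical, and the actual service (a networked web service that also exchanges data) is not described here in enough detail to reproduce.

```python
from __future__ import annotations

import itertools
import threading


class CentralNumberService:
    """Hypothetical allocator handing out unique numbers to registrars."""

    REGISTRARS = frozenset({"MycoBank", "IndexFungorum", "FungalNames"})

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)
        self._lock = threading.Lock()
        # number -> registrar that requested it (kept for later data exchange)
        self.issued: dict[int, str] = {}

    def issue(self, registrar: str) -> int:
        if registrar not in self.REGISTRARS:
            raise ValueError(f"unknown registrar: {registrar}")
        # The lock makes "take next number + record it" atomic, so
        # concurrent requests from different registrars never collide.
        with self._lock:
            number = next(self._counter)
            self.issued[number] = registrar
        return number


service = CentralNumberService(start=1000)
print(service.issue("MycoBank"), service.issue("FungalNames"))  # 1000 1001
```

In a production setting the in-memory counter would presumably be replaced by a persistent, transactional sequence, so that numbers also remain unique across restarts.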
Links to third parties
Many other websites are rich resources that can be associated with fungal names in the MycoBank system. Structural links have been created to the following websites: Catalogue of Life (CoL), Encyclopedia of Life (EOL), Global Biodiversity Information Facility (GBIF), Index Fungorum (IF), Integrated Taxonomic Information System (ITIS), Google Scholar, PubMed, Google, Wikimedia, Wikipedia, Wikispecies, BOLD Systems, EMBL, NCBI, the All-Russian Collection of Microorganisms (VKM), the CBS collection, and StrainInfo. Additional ad hoc links are available for some taxa.
Identification services
MycoBank is not only a repository of data associated with fungal names and vouchers; it also offers unique online pairwise sequence identification services (www.mycobank.org/biolomicssequences.aspx) against curated databases such as Q-bank (www.q-bank.eu), the CBS collection websites (www.cbs.knaw.nl: Fusarium, dermatophytes, indoor fungi, Calonectria, yeasts, etc.), Fungal Barcoding (www.fungalbarcoding.org), the EUBOLD system (www.eubold.org), the ISHAM ITS Database for Human and Animal Pathogenic Fungi (www.mycologylab.org) and UNITE (http://unite.ut.ee). The NCBI/GenBank databases can also be used for pairwise sequence alignments. Users wishing to identify unknown sequences can compare them against any selection of reference databases, either simultaneously or separately; the results are gathered centrally and presented as a single list of matches.
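The principle of aligning a query against several reference databases at once and merging the hits into a single ranked list can be sketched in Python as below. This is a toy illustration under stated assumptions: it uses a textbook Needleman–Wunsch global alignment with arbitrary scores (match +1, mismatch −1, gap −2) and in-memory dictionaries standing in for the reference databases; it is not the alignment algorithm or scoring used by the MycoBank identification service.

```python
from __future__ import annotations


def global_identity(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -2) -> float:
    """Percent identity of one optimal Needleman-Wunsch global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical alignment columns.
    i, j, same, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * same / columns if columns else 0.0


def identify(query: str, databases: dict[str, dict[str, str]],
             top: int = 5) -> list[tuple[float, str, str]]:
    """Align the query against every reference in every selected database
    and merge all hits into one list ranked by identity."""
    hits = [
        (global_identity(query, seq), db_name, ref_id)
        for db_name, refs in databases.items()
        for ref_id, seq in refs.items()
    ]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:top]


reference_dbs = {  # hypothetical toy databases, not real reference data
    "DB-A": {"ref1": "ACGTACGT", "ref2": "TTTTTTTT"},
    "DB-B": {"ref3": "ACGTACGA"},
}
for identity, db, ref in identify("ACGTACGT", reference_dbs):
    print(f"{db}/{ref}: {identity:.1f} %")
```

Selecting databases "separately" simply means passing a smaller `databases` dictionary; the merged ranking is the same operation either way.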
Other more advanced identification services are also possible using a combination of morphological, physiological and/or molecular data (see www.mycobank.org/DefaultInfo.aspx?Page=polyphasicID).
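Polyphasic identification combines evidence from several kinds of data. A minimal sketch of one possible scoring scheme, assuming categorical characters grouped by data type and user-chosen weights; the character names, weights and taxa are hypothetical, and the scheme is a simple weighted matching coefficient, not the method behind the MycoBank polyphasic service.

```python
from typing import Dict

Characters = Dict[str, object]   # character name -> observed state
Profile = Dict[str, Characters]  # data type -> characters


def character_similarity(query: Characters, reference: Characters) -> float:
    """Fraction of shared characters with identical states (0 if none shared)."""
    shared = [name for name in query if name in reference]
    if not shared:
        return 0.0
    return sum(query[name] == reference[name] for name in shared) / len(shared)


def polyphasic_score(query: Profile, reference: Profile,
                     weights: Dict[str, float]) -> float:
    """Weighted mean of the per-data-type similarities."""
    total = sum(weights.values())
    return sum(
        w * character_similarity(query.get(kind, {}), reference.get(kind, {}))
        for kind, w in weights.items()
    ) / total


weights = {"morphology": 1.0, "physiology": 1.0, "molecular": 2.0}  # hypothetical
query = {
    "morphology": {"conidia_septate": True, "colony_colour": "grey"},
    "physiology": {"growth_37C": True},
    "molecular": {"its_group": "I"},
}
references = {  # invented reference profiles
    "Taxon A": {
        "morphology": {"conidia_septate": True, "colony_colour": "white"},
        "physiology": {"growth_37C": True},
        "molecular": {"its_group": "I"},
    },
    "Taxon B": {
        "morphology": {"conidia_septate": True, "colony_colour": "grey"},
        "physiology": {"growth_37C": False},
        "molecular": {"its_group": "II"},
    },
}
ranking = sorted(references,
                 key=lambda t: polyphasic_score(query, references[t], weights),
                 reverse=True)
print(ranking)  # ['Taxon A', 'Taxon B']
```

Giving molecular data a higher weight, as here, lets a taxon that matches on sequence group outrank one that matches better on morphology; the choice of weights is a judgement call for the user.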
Statistics
In total, 254,120 unique visitors visited the English version of MycoBank between April 2012 and 3 December 2013. In December 2012 we launched several other language versions of the website: French (3992 unique visitors), Arabic (2466), Chinese (1953), Dutch (1079), German (1828), Portuguese (2141) and Spanish (2207). A Thai language version has recently been introduced.
On an average day, 1872 unique users visit one of the MycoBank portals, and the average visit lasts 6–10 min.
The MycoBank user base is truly global: 13.65 % of users are located in the USA, but people from 205 countries have used MycoBank since April 2012. Table 1 lists the top 10 countries using MycoBank.
Table 1.
Rank | Country | Percentage of total users |
---|---|---|
1 | USA | 13.65 |
2 | France | 6.31 |
3 | Germany | 6.03 |
4 | Spain | 4.87 |
5 | Italy | 4.2 |
6 | Brazil | 3.8 |
7 | Russia | 3.4 |
8 | India | 3.32 |
9 | Canada | 3.22 |
10 | China | 3.03 |
Researchers who wish to deposit new scientific names in MycoBank, take part in forum discussions, or annotate taxon records must register. To date, 5680 profiles have been registered since MycoBank was initiated in 2004. Between 1985 and 2012, 8031 different taxonomists published at least one new fungal species. The average number of authors per species was 1.86. The 50 most prolific authors contributed to 22.9 % of the new species, the 100 most prolific to 32.1 %, and the 1000 most prolific to 74.3 %, while 6077 authors published only between 1 and 5 new species. One hundred and seventeen authors published more than 100 new species during this period.
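Because species often have several authors, the share contributed by the most prolific authors is best counted per species (a species counts once if any top author is among its authors) rather than by summing per-author totals, which would count co-authored species more than once. The Python sketch below computes both statistics on hypothetical toy data; the text does not state exactly how the published figures were derived, so this is one plausible reading rather than the authors' procedure.

```python
from __future__ import annotations

from collections import Counter


def author_ranking(species_authors: dict[str, list[str]]) -> Counter:
    """Number of species (co)authored by each author."""
    return Counter(a for authors in species_authors.values() for a in set(authors))


def top_author_share(species_authors: dict[str, list[str]], n: int) -> float:
    """Percentage of species having at least one of the n most prolific authors."""
    top = {a for a, _ in author_ranking(species_authors).most_common(n)}
    covered = sum(1 for authors in species_authors.values() if top & set(authors))
    return 100.0 * covered / len(species_authors)


def mean_authors_per_species(species_authors: dict[str, list[str]]) -> float:
    return sum(len(set(a)) for a in species_authors.values()) / len(species_authors)


species_authors = {  # hypothetical toy data: species -> author list
    "sp1": ["A", "B"],
    "sp2": ["A"],
    "sp3": ["A", "C"],
    "sp4": ["B"],
    "sp5": ["D"],
}
print(top_author_share(species_authors, 1), top_author_share(species_authors, 2))
print(mean_authors_per_species(species_authors))
```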
The evolution of the number of newly described species between 1759 and 2009 is shown in Fig. 5. The number of new species grew steadily (except during the World War II period) despite the reduced number of fungal taxonomists. This is probably due to new technologies that allow mycologists to better distinguish specimens and cultures, and therefore to separate species, and to new techniques that allow them to process and handle larger numbers of specimens. Between 2003 and 2012, the number of newly described species per year varied from 1692 in 2005 to 3541 in 2012 (2003 to 2012, in order: 2436, 2450, 1692, 1868, 2271, 2391, 2724, 2155, 2374, and 3541).
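The summary figures in this paragraph can be checked directly from the listed annual counts. A short Python sketch, assuming (as the quoted minimum and maximum imply) that the ten values are given in chronological order from 2003 to 2012:

```python
# Annual counts of newly described species quoted above, assumed chronological.
new_species = dict(zip(range(2003, 2013),
                       [2436, 2450, 1692, 1868, 2271, 2391, 2724, 2155, 2374, 3541]))

low_year = min(new_species, key=new_species.get)
high_year = max(new_species, key=new_species.get)
mean_per_year = sum(new_species.values()) / len(new_species)
jump_2012 = 100.0 * (new_species[2012] - new_species[2011]) / new_species[2011]

print(low_year, new_species[low_year])    # 2005 1692
print(high_year, new_species[high_year])  # 2012 3541
print(round(mean_per_year, 1))            # 2390.2
print(round(jump_2012, 1))                # 49.2
```

On these values the mean is about 2390 new species per year over the decade, and the 2012 total is roughly 49 % higher than that of 2011.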
A more detailed analysis of changing patterns over time in the description of new fungal species will be presented elsewhere.
Future
MycoBank is one of the three repositories that fulfil the requirement for registration of scientific names now imposed by the Code. While MycoBank is increasingly becoming a rich source of knowledge at species, genus, family, and higher levels, the databases of the International Nucleotide Sequence Database Collaboration (INSDC), a consortium of NCBI, EMBL and DDBJ, serve as the international repository for molecular sequences. Linking MycoBank entries, which are based on reference material (specimens and strains), to INSDC sequences, many of which are known only from environmental samples, is a real challenge. The task involves subjective taxonomic interpretation, as many species were described and circumscribed on the basis of non-molecular criteria (morphology, physiology, ecology, etc.). Voucher data annotated consistently in all databases will probably remain the most effective way to link species names with their associated molecular data. The Darwin Core standard (http://rs.tdwg.org/dwc/) has been proposed as one way to standardize the formulation of such data. Links between species (and subsequently higher taxonomic ranks) and sequences can only be made via strains and specimens. The biological repositories of fungal voucher data, culture collections and herbaria (listed in Index Herbariorum), are therefore of major importance in housing reference material, information and strains (Durães Sette et al. 2013). Other initiatives such as the Barcode of Life (BOLD Systems, EUBOLD, China BOLD, etc.) or the UNITE database provide useful links between reference material and barcode sequences. Some projects deal with the establishment of reference databases in specific fields such as medical mycology (“DNA barcoding of pathogenic fungi as the basis for the development of novel standardized diagnostic tools”, W. Meyer, V. Robert, D. Ellis & S. Chen, Australian NH&MRC grant).
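Linking names to sequences via vouchers amounts to a join on a normalized specimen or strain identifier. The sketch below assumes Darwin Core-style specimen records (`scientificName`, `institutionCode` and `catalogNumber` are Darwin Core terms) and sequence records carrying an INSDC-style `culture_collection` or `specimen_voucher` value written as "institution:identifier" or "institution identifier". The normalization rule, record contents and accession placeholders are hypothetical; real voucher strings are far less uniform, which is precisely the problem described above.

```python
from __future__ import annotations

import re

# Hypothetical normalization: letters for the institution, then ":" or spaces.
VOUCHER = re.compile(r"^\s*([A-Za-z]+)[\s:]+(.+?)\s*$")


def voucher_key(voucher: str) -> tuple[str, str]:
    """Normalize 'CBS 937.70' or 'CBS:937.70' to ('CBS', '937.70')."""
    match = VOUCHER.match(voucher)
    if match is None:
        raise ValueError(f"unrecognized voucher: {voucher!r}")
    institution, identifier = match.groups()
    return institution.upper(), identifier.replace(" ", "")


def link_sequences(specimens: list[dict], sequences: list[dict]) -> dict[str, list[str]]:
    """Map scientific names to sequence accessions sharing the same voucher."""
    index = {
        (s["institutionCode"].upper(), s["catalogNumber"].replace(" ", "")): s["scientificName"]
        for s in specimens
    }
    links: dict[str, list[str]] = {}
    for seq in sequences:
        voucher = seq.get("culture_collection") or seq.get("specimen_voucher")
        if not voucher:
            continue  # no voucher: the sequence cannot be linked this way
        name = index.get(voucher_key(voucher))
        if name is not None:
            links.setdefault(name, []).append(seq["accession"])
    return links


specimens = [  # Darwin Core-style fields; values are hypothetical
    {"scientificName": "Species A", "institutionCode": "CBS", "catalogNumber": "937.70"},
]
sequences = [  # INSDC-style voucher values; accessions are placeholders
    {"accession": "SEQ0001", "culture_collection": "CBS:937.70"},
    {"accession": "SEQ0002", "specimen_voucher": "CBS 999.99"},
    {"accession": "SEQ0003"},
]
print(link_sequences(specimens, sequences))  # {'Species A': ['SEQ0001']}
```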
The StrainInfo and WDCM databases gather strain data from culture collections and provide links to INSDC databases, but their scope is limited to cultivable strains, and it is now commonly accepted that most fungal diversity is not present in culture collections or museums but is simply unknown (Hawksworth 2001, Kirk et al. 2008, Blackwell 2011, Mora et al. 2011). Projects to digitize herbaria will provide additional information; one recent example is the Mycoportal funded by the US National Science Foundation. Many research collections around the world still hold useful information on unique strains or specimens, but this is often unavailable to third parties. MycoBank also maintains an ex-type strain and specimen database linked to species descriptions, which is in the process of being linked to INSDC sequences, in order to objectify, or at least to provide a molecular background to, species circumscriptions. The co-authors of this paper, as well as other prominent mycologists and institutions, are therefore actively working on this matter and are preparing workshops, guidelines and tools to better bridge the gap between sequences and species. One way to address this problem is to encourage depositors of new species in MycoBank to provide molecular data, in addition to strains or specimens, or links to these data, during the registration process. We recognize that the voluntary deposit of additional data is a burden for many researchers, and it should be remembered that it is not mandatory, even though the options to deposit extra information appear on the input screens. On the other hand, reliable, openly available data from databases and associated websites are a cornerstone of scientific progress. There are several ways to obtain data for reference databases. The first is voluntary submission, but as already mentioned, this approach is only partly successful.
The second is through incentives such as funding, increased citations and improved visibility, facilitated by providing researchers with free, useful software and database tools. The third is enforcement, through new rules to be established by official bodies (for example, in the International Code of Nomenclature for algae, fungi, and plants), by journals, or by reference databases. A combination of the three options may be needed to achieve the goal of a reliable taxonomy based not only on phenotypic criteria but also on molecular data linked to accessible strains and specimens.
Linking species to molecular data via strains and specimens is important, but it will not address all the problems and opportunities raised by modern technologies. Next Generation Sequencing (NGS) methods and high-throughput screening technologies already allow us to obtain large datasets that would not be accessible using traditional sampling, isolation and collecting methods. New species are traditionally based on the isolation of one, or ideally several, specimens that are studied and deposited in reference collections. With NGS it is possible to obtain millions of sequences from a single soil sample in a few hours and get an idea of the relative abundance of the taxa present. It is also possible, and relatively easy, to monitor changes in ecosystems or hosts over time. The known diversity constitutes only a small fraction of the real fungal biodiversity (Hawksworth 2001, Kirk et al. 2008, Blackwell 2011, Mora et al. 2011). Given the drastic decline in the number of taxonomists and in financial support for systematics, it is unlikely that traditional taxonomic approaches will ever give us a near-complete picture of the scope of microbial diversity. Ignoring the impact of new technologies such as NGS on the discovery of existing diversity would therefore be a major mistake. Currently, there are no mechanisms allowing researchers to record, share and describe new taxa on the basis of such technologies, other than the recently proposed UNITE system (Kõljalg et al. 2013)¹. Although these technologies raise a number of issues of data quality, reproducibility and quantity, these are not sufficient reason to ignore them. MycoBank, in collaboration with INSDC, UNITE and other DNA barcoding initiatives (in the broad sense), will therefore propose mechanisms and tools to record non-specimen-based descriptions of candidate species (Taylor 2011).
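To make the idea of recording candidate species from NGS data more concrete, the sketch below groups reads into clusters whose centroids could serve as placeholders for candidate species. It is a deliberately simplified, hypothetical illustration: it assumes equal-length, pre-aligned reads, uses plain positional identity, and applies a single fixed threshold (97 % is a value often used for fungal ITS clusters). It is neither the UNITE species-hypothesis procedure nor a mechanism MycoBank has adopted.

```python
from __future__ import annotations

from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions; assumes equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("sketch assumes equal-length, pre-aligned reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads: list[str], threshold: float = 0.97) -> list[tuple[str, int]]:
    """Abundance-sorted greedy clustering into (centroid, read count) pairs."""
    centroids: list[list] = []  # each entry: [centroid sequence, number of reads]
    for seq, count in Counter(reads).most_common():
        for cluster in centroids:
            if identity(seq, cluster[0]) >= threshold:
                cluster[1] += count  # join the first sufficiently similar cluster
                break
        else:
            centroids.append([seq, count])  # seed a new candidate cluster
    return [(c, n) for c, n in centroids]


reads = ["AAAAAAAAAA"] * 3 + ["AAAAAAAAAT"] + ["CCCCCCCCCC"] * 2  # toy reads
print(greedy_cluster(reads, threshold=0.9))
# [('AAAAAAAAAA', 4), ('CCCCCCCCCC', 2)]
```

Processing the most abundant sequences first makes common variants the cluster centroids, so that rare reads, which are more likely to carry sequencing errors, are absorbed into existing clusters rather than seeding spurious ones.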
We are currently working on tools for the semi-automated curation of large datasets, enabling fast and accurate assignment of sequences to species or candidate species. Given the amount of data to be handled and analyzed, new technologies need to be developed. This can only be accomplished through collaboration among several groups of experts, ranging from ecologists, taxonomists, molecular researchers, bioinformaticians, informaticians, mathematicians and database specialists to technologists focused on molecular or information technologies and hardware devices such as CPUs, GPUs, or FPGAs.
Acknowledgments
The success of MycoBank is due to the depositors of new taxa, and the authors thank them for their contributions. Paul M. Kirk of Index Fungorum is a major contributor to MycoBank and is gratefully acknowledged for his huge contribution.
The MycoBank software developments were partially made within the framework of the Indexing for Life (i4Life) project, a European e-Infrastructure project co-funded by the European Commission’s Seventh Framework Program for Research and Technological Development.
Some data entries were made possible thanks to the EMbaRC project, also supported by the European Commission’s Seventh Framework Program for Research and Technological Development.
Parts of the developments were done in the framework of the “Inversteringproject NCB” supported by the “Fonds Economische Structuurversterking” (FES) of the Dutch Ministry of Education, Culture and Science (OCW).
Parts of the developments were also carried out within the framework of the project “DNA barcoding of pathogenic fungi as the basis for the development of novel standardized diagnostic tools”, Australian NH&MRC grant #APP1031952 to WM and VR.
CLS acknowledges support from the Intramural Research Program of the NIH, National Library of Medicine.
Footnotes
See IMA Fungus 4: (33), 2013.
REFERENCES
- Blackwell M. (2011) The Fungi: 1, 2, 3… 5.1 million species? American Journal of Botany 98: 426–438.
- Crous PW, Gams W, Stalpers JA, Robert V, Stegehuis G. (2004) MycoBank: an online initiative to launch mycology into the 21st century. Studies in Mycology 50: 19–22.
- Durães Sette L, Pagnocca FC, Rodrigues A. (2013) Microbial culture collections as pillars for promoting fungal diversity, conservation and exploitation. Fungal Genetics and Biology 60: 2–8.
- Hawksworth DL. (2001) The magnitude of fungal diversity: the 1.5 million species estimate revisited. Mycological Research 105: 1422–1432.
- Hawksworth DL. (2011) A new dawn for the naming of fungi: impacts of decisions made in Melbourne in July 2011 on the future publication and regulation of fungal names. MycoKeys 1: 7–20; IMA Fungus 2: 155–162.
- Hawksworth DL, Cooper JA, Crous PW, Hyde KD, Iturriaga T, et al. (2010) Proposals to make the pre-publication deposit of key nomenclatural information in a recognized repository a requirement for valid publication of organisms treated as fungi under the Code. Taxon 59: 660–662; Mycotaxon 111: 514–519.
- Hawksworth DL, Crous PW, Redhead SA, Reynolds DR, Samson RA, Seifert KA, Taylor JW, Wingfield MJ, et al. (2011) The Amsterdam Declaration on Fungal Nomenclature. IMA Fungus 2: 105–112.
- Kirk PM, Cannon PF, Minter DW, Stalpers JA. (2008) Ainsworth & Bisby’s Dictionary of the Fungi, 10th edn. Wallingford: CABI Publishing.
- Kõljalg U, Nilsson RH, Abarenkov K, Tedersoo L, Taylor AFS. [and 37 others] (2013) Towards a unified paradigm for sequence-based identification of fungi. Molecular Ecology 22: 5271–5277.
- McNeill J, Barrie FR, Buck WR, Demoulin V, Greuter W, Hawksworth DL, Herendeen PS, Knapp S, Marhold K, Prado J, Prud’homme van Reine WF, Smith GE, Wiersema JH, Turland NJ. (eds) (2012) International Code of Nomenclature for algae, fungi, and plants (Melbourne Code) adopted by the Eighteenth International Botanical Congress Melbourne, Australia, July 2011. [Regnum Vegetabile No. 154.] Königstein: Koeltz Scientific Books.
- Mora C, Tittensor DP, Adl S, Simpson AG, Worm B. (2011) How many species are there on Earth and in the ocean? PLoS Biology 9(8): e1001127.
- Robert V, Szoke S, Jabas J, Vu D, Chouchen O, Blom E, Cardinali G. (2011) BioloMICS software, Biological data Management, Identification, Classification and Statistics. The Open Applied Informatics Journal 5: 87–98.
- Taylor JW. (2011) One Fungus = One Name: DNA and fungal nomenclature twenty years after PCR. IMA Fungus 2: 113–120.
- Wingfield MJ, De Beer ZW, Slippers B, Wingfield BD, Groenewald JZ, Lombard L, Crous PW. (2012) One fungus, one name promotes progressive plant pathology. Molecular Plant Pathology 13: 604–613.